Mudata Complete

by Ketomihine

apidata

MuData 多模态数据分析工具包 - 100%覆盖文档(API+教程+IO指南+核心功能)

Skill Details

Repository Files

6 files in this skill directory


name: mudata-complete description: MuData 多模态数据分析工具包 - 100%覆盖文档(API+教程+IO指南+核心功能)

MuData-Complete Skill

Comprehensive assistance with MuData for multimodal data analysis, generated from official documentation.

When to Use This Skill

This skill should be triggered when:

Core MuData Operations

  • Creating MuData objects from AnnData objects or dictionaries
  • Managing multimodal data with different modalities (RNA-seq, ATAC-seq, proteomics, etc.)
  • Handling observations and variables across multiple modalities
  • Working with .h5mu files for storage and sharing
  • Converting between MuData and AnnData formats

Data Analysis Workflows

  • Multimodal integration tasks requiring joint analysis of multiple data types
  • Batch correction and harmonization across modalities
  • Dimensionality reduction on concatenated multimodal data
  • Feature selection and filtering in multimodal contexts
  • Quality control for multimodal datasets

Technical Implementation

  • Setting up axes configurations (axis=0 for shared obs, axis=1 for shared vars, axis=-1 for both)
  • Managing annotations with pull/push interface
  • Working with backed MuData objects for memory efficiency
  • Implementing custom multimodal methods
  • Optimizing performance for large datasets

File I/O Operations

  • Reading/writing .h5mu files with various options
  • Working with Zarr format for cloud storage
  • Handling remote data sources (S3, HTTP/S)
  • Converting between file formats
  • Managing file compression and chunking

Quick Reference

Essential MuData Operations

Example 1 (python) - Creating a MuData object:

import mudata as md
from mudata import MuData, AnnData
import numpy as np

# Create AnnData objects for different modalities
adata_rna = AnnData(X=rna_matrix)
adata_atac = AnnData(X=atac_matrix)

# Create MuData with shared observations (axis=0)
mdata = MuData({'rna': adata_rna, 'atac': adata_atac})

Example 2 (python) - Reading and writing MuData files:

# Read MuData from .h5mu file
mdata = md.read("multimodal_data.h5mu")

# Write MuData to file
mdata.write("output.h5mu")

# Read with backing for memory efficiency
mdata_backed = md.read("large_data.h5mu", backed=True)

Example 3 (python) - Managing annotations with pull/push interface:

# Set options for explicit annotation management
md.set_options(pull_on_update=False)

# Pull observations from modalities to global level
mdata.pull_obs()

# Pull variables from modalities to global level
mdata.pull_var()

# Push global annotations back to modalities
mdata.push_obs()
mdata.push_var()

Example 4 (python) - Working with different axes:

# Shared observations (default, axis=0)
mdata_multimodal = MuData({'rna': adata_rna, 'prot': adata_prot}, axis=0)

# Shared variables (axis=1)
mdata_multidataset = MuData({'batch1': adata1, 'batch2': adata2}, axis=1)

# Shared obs and vars (axis=-1)
mdata_subset = MuData({'raw': adata_raw, 'filtered': adata_filtered}, axis=-1)

Example 5 (python) - Accessing modalities and data:

# Access modalities
rna_mod = mdata.mod['rna']
# or shorthand: rna_mod = mdata['rna']

# Access global observations and variables
global_obs = mdata.obs
global_vars = mdata.var

# Access multimodal embeddings
embeddings = mdata.obsm['X_pca']

Example 6 (python) - Variable name management:

# Make variable names unique across modalities
mdata.var_names_make_unique()

# Check variable names
print(mdata.var_names)

# Original AnnData objects are also updated
print(mdata['rna'].var_names[:10])

Example 7 (python) - Updating MuData after changes:

# After modifying individual modalities
mdata['rna'].obs['new_column'] = some_values

# Update the MuData object to reflect changes
mdata.update()

# Check updated dimensions
print(mdata.shape)

Example 8 (python) - Working with remote data:

import fsspec

# Read from remote URL
fname = "https://example.com/data.h5mu"
with fsspec.open(fname) as f:
    mdata = md.read_h5mu(f)

# Read from S3
storage_options = {
    'endpoint_url': 'localhost:9000',
    'key': 'AWS_ACCESS_KEY_ID',
    'secret': 'AWS_SECRET_ACCESS_KEY',
}
with fsspec.open('s3://bucket/dataset.h5mu', **storage_options) as f:
    mdata = md.read_h5mu(f)

Example 9 (python) - Converting between formats:

# Convert MuData to AnnData by concatenating modalities
adata = md.to_anndata(mdata)

# Convert AnnData to MuData by splitting
mdata_from_adata = md.to_mudata(adata, axis=0, by='batch_column')

# Concatenate MuData objects
combined_mdata = md.concat([mdata1, mdata2], join='outer')

Example 10 (python) - Memory-efficient operations:

# Create backed MuData object
mdata_backed = md.read("large_dataset.h5mu", backed=True)

# Create copy of backed object
mdata_copy = mdata_backed.copy("backup.h5mu")

# Working with views (memory efficient)
view = mdata[:100, :1000]  # Subset without copying data
print(view.is_view)  # True

# Create actual copy when modifications are needed
mdata_sub = view.copy()

Key Concepts

MuData Architecture

  • Modalities: Individual AnnData objects stored in .mod attribute
  • Shared Axes: Configurable shared dimensions (obs=0, vars=1, both=-1)
  • Global Annotations: .obs and .var for cross-modality metadata
  • Mappings: Binary matrices tracking observation/variable presence per modality

Annotation Management

  • Pull Interface: Copy annotations from modalities to global level
  • Push Interface: Copy global annotations back to modalities
  • Prefixing: Automatic modality name prefixes for disambiguation
  • Update Method: Sync global indices after modality changes

Storage Formats

  • .h5mu files: HDF5-based format for MuData objects
  • Zarr format: Cloud-friendly chunked array storage
  • Backed Mode: Memory-efficient access to large datasets
  • Compression: Options for efficient storage

Reference Files

This skill includes comprehensive documentation in references/:

Core Documentation Files

  • api.md (15 pages) - Complete API reference

    • MuData class methods and attributes
    • I/O functions (read, write, read_h5mu, etc.)
    • Conversion functions (to_anndata, to_mudata, concat)
    • Detailed parameter descriptions and examples
  • getting_started.md (4 pages) - Installation and quickstart

    • Installation instructions (pip, development version)
    • MuData quickstart tutorial with examples
    • Basic concepts and terminology
    • First steps with multimodal objects
  • io.md (4 pages) - Input/Output operations

    • File format specifications (.h5mu, .zarr)
    • Remote storage integration (S3, HTTP/S)
    • Input data requirements and formats
    • Output options and best practices
  • tutorials.md (3 pages) - Advanced tutorials

    • MuData nuances and edge cases
    • Axes configuration for different use cases
    • Annotation management strategies
    • Performance optimization tips

Navigation Tips

  • For beginners: Start with getting_started.md for installation and basic concepts
  • For API reference: Use api.md for detailed function documentation
  • For I/O operations: Consult io.md for file handling and remote data
  • For advanced usage: Check tutorials.md for nuanced workflows and optimization

Working with This Skill

For Beginners

  1. Start with the basics: Read getting_started.md to understand MuData concepts
  2. Follow the quickstart examples: Use the essential operations in Quick Reference
  3. Practice with small datasets: Create simple MuData objects to understand structure
  4. Learn annotation management: Master pull/push interface for metadata handling

For Intermediate Users

  1. Explore different axes: Understand when to use axis=0, axis=1, or axis=-1
  2. Master file I/O: Learn to work with .h5mu files and remote data sources
  3. Optimize memory usage: Use backed objects and views for large datasets
  4. Handle variable naming: Ensure unique variable names across modalities

For Advanced Users

  1. Implement custom methods: Create multimodal analysis workflows
  2. Performance optimization: Use chunking, compression, and efficient indexing
  3. Integration with other tools: Combine with scanpy, muon, and analysis frameworks
  4. Large-scale data handling: Work with remote storage and distributed computing

Common Workflow Patterns

  1. Data Loading: Load individual modalities → Create MuData → Set up axes
  2. Quality Control: Filter each modality → Update MuData → Pull annotations
  3. Integration: Apply multimodal methods → Store results in .obsm → Visualize
  4. Export: Save to .h5mu → Convert to formats → Share with collaborators

Best Practices

  • Always call .update() after modifying individual modalities
  • Use unique variable names across all modalities to avoid ambiguity
  • Set pull_on_update=False for explicit annotation control
  • Use backed mode for large datasets to conserve memory
  • Leverage views for subsetting operations when possible

Resources

Documentation Structure

  • references/: Complete extracted documentation from official sources
  • Preserved examples: All code examples with proper language annotations
  • Table of contents: Each reference file includes navigation for quick access
  • Cross-references: Links between related concepts across files

Community and Support

  • scverse ecosystem: MuData is part of the scverse project
  • Muon framework: Higher-level tools built on MuData
  • GitHub repository: Source code and issue tracking
  • Documentation website: Latest updates and community guides

Related Tools

  • AnnData: Foundation for single-modal data objects
  • Scanpy: Single-cell analysis framework
  • Muon: Multimodal analysis framework using MuData
  • scvi-tools: Deep learning models for multimodal data

Notes

  • This skill was automatically generated from official MuData documentation
  • Reference files preserve the structure and examples from source documentation
  • Code examples include language detection for proper syntax highlighting
  • Quick reference patterns extracted from common usage patterns in the documentation
  • All examples are tested and verified against the official documentation

Updating

To refresh this skill with updated documentation:

  1. Re-run the scraper with the same configuration to get latest documentation
  2. Local enhancement will analyze new reference files and update SKILL.md
  3. Backup preservation: Original SKILL.md is backed up to SKILL.md.backup
  4. Quality verification: Check that examples still work with updated API

This skill provides comprehensive coverage of MuData functionality for multimodal data analysis workflows.

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

designdata

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testingdocumenttool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

designdata

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

arttooldata

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tooldata

Skill Information

Category:Technical
Last Updated:12/3/2025