Mudata Complete
by Ketomihine
MuData multimodal data analysis toolkit - 100% coverage documentation (API + tutorials + I/O guide + core functionality)
name: mudata-complete
description: MuData multimodal data analysis toolkit - 100% coverage documentation (API + tutorials + I/O guide + core functionality)
MuData-Complete Skill
Comprehensive assistance with MuData for multimodal data analysis, generated from official documentation.
When to Use This Skill
This skill should be triggered when:
Core MuData Operations
- Creating MuData objects from AnnData objects or dictionaries
- Managing multimodal data with different modalities (RNA-seq, ATAC-seq, proteomics, etc.)
- Handling observations and variables across multiple modalities
- Working with .h5mu files for storage and sharing
- Converting between MuData and AnnData formats
Data Analysis Workflows
- Multimodal integration tasks requiring joint analysis of multiple data types
- Batch correction and harmonization across modalities
- Dimensionality reduction on concatenated multimodal data
- Feature selection and filtering in multimodal contexts
- Quality control for multimodal datasets
Technical Implementation
- Setting up axes configurations (axis=0 for shared obs, axis=1 for shared vars, axis=-1 for both)
- Managing annotations with pull/push interface
- Working with backed MuData objects for memory efficiency
- Implementing custom multimodal methods
- Optimizing performance for large datasets
File I/O Operations
- Reading/writing .h5mu files with various options
- Working with Zarr format for cloud storage
- Handling remote data sources (S3, HTTP/S)
- Converting between file formats
- Managing file compression and chunking
Quick Reference
Essential MuData Operations
Example 1 (python) - Creating a MuData object:
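import mudata as md
from mudata import MuData, AnnData
import numpy as np
# Placeholder count matrices (cells x features); substitute your own data
rna_matrix = np.random.poisson(1.0, size=(100, 2000)).astype(np.float32)
atac_matrix = np.random.poisson(1.0, size=(100, 5000)).astype(np.float32)
# Create AnnData objects for different modalities
adata_rna = AnnData(X=rna_matrix)
adata_atac = AnnData(X=atac_matrix)
# Create MuData with shared observations (axis=0)
mdata = MuData({'rna': adata_rna, 'atac': adata_atac})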
Example 2 (python) - Reading and writing MuData files:
# Read MuData from .h5mu file
mdata = md.read("multimodal_data.h5mu")
# Write MuData to file
mdata.write("output.h5mu")
# Read with backing for memory efficiency
mdata_backed = md.read("large_data.h5mu", backed=True)
Example 3 (python) - Managing annotations with pull/push interface:
# Set options for explicit annotation management
md.set_options(pull_on_update=False)
# Pull observations from modalities to global level
mdata.pull_obs()
# Pull variables from modalities to global level
mdata.pull_var()
# Push global annotations back to modalities
mdata.push_obs()
mdata.push_var()
Example 4 (python) - Working with different axes:
# Shared observations (default, axis=0)
mdata_multimodal = MuData({'rna': adata_rna, 'prot': adata_prot}, axis=0)
# Shared variables (axis=1)
mdata_multidataset = MuData({'batch1': adata1, 'batch2': adata2}, axis=1)
# Shared obs and vars (axis=-1)
mdata_subset = MuData({'raw': adata_raw, 'filtered': adata_filtered}, axis=-1)
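To make the three axis settings concrete, here is a conceptual sketch in plain Python of how the global shape could be derived from the modalities. It does not call mudata; `global_shape` and `_union` are hypothetical helpers, and the rules encoded here (union along a shared axis, stacking along the other) are my reading of the axis semantics, so check them against your installed version.

```python
from typing import Dict, List, Tuple


def _union(names_per_mod: List[List[str]]) -> List[str]:
    """Order-preserving union of several name lists."""
    seen: Dict[str, None] = {}
    for names in names_per_mod:
        for name in names:
            seen.setdefault(name, None)
    return list(seen)


def global_shape(mods: Dict[str, Tuple[List[str], List[str]]], axis: int) -> Tuple[int, int]:
    """Return (n_obs, n_vars) for modalities given as name -> (obs_names, var_names)."""
    obs = [o for o, _ in mods.values()]
    var = [v for _, v in mods.values()]
    if axis == 0:   # shared observations, features stacked side by side
        return len(_union(obs)), sum(len(v) for v in var)
    if axis == 1:   # shared variables, observations stacked
        return sum(len(o) for o in obs), len(_union(var))
    if axis == -1:  # both axes shared
        return len(_union(obs)), len(_union(var))
    raise ValueError("axis must be 0, 1 or -1")


cells = ["c1", "c2", "c3"]
mods = {
    "rna": (cells, ["g1", "g2"]),
    "prot": (cells, ["p1"]),
}
print(global_shape(mods, axis=0))  # (3, 3)
print(global_shape(mods, axis=1))  # (6, 3)
```

With `axis=0` the three cells are counted once and the features add up; with `axis=1` the same inputs are treated as separate observation sets over a common feature space.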
Example 5 (python) - Accessing modalities and data:
# Access modalities
rna_mod = mdata.mod['rna']
# or shorthand: rna_mod = mdata['rna']
# Access global observations and variables
global_obs = mdata.obs
global_vars = mdata.var
# Access multimodal embeddings
embeddings = mdata.obsm['X_pca']
Example 6 (python) - Variable name management:
# Make variable names unique across modalities
mdata.var_names_make_unique()
# Check variable names
print(mdata.var_names)
# Original AnnData objects are also updated
print(mdata['rna'].var_names[:10])
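As far as I understand the behaviour, when the same variable name occurs in more than one modality, the modality name is prepended to the variable names (for example `rna:CD4`). The plain-Python sketch below models only that rule; `prefix_if_shared` is a hypothetical helper, not part of mudata, and the real method also de-duplicates names within each modality.

```python
from typing import Dict, List


def prefix_if_shared(var_names: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Prefix every name with '<modality>:' if any name occurs in two modalities."""
    mods = list(var_names)
    shared = any(
        set(var_names[a]) & set(var_names[b])
        for i, a in enumerate(mods)
        for b in mods[i + 1:]
    )
    if not shared:
        # Nothing clashes, so names are returned unchanged (as copies)
        return {m: list(names) for m, names in var_names.items()}
    return {m: [f"{m}:{n}" for n in names] for m, names in var_names.items()}


names = {"rna": ["CD4", "GATA1"], "prot": ["CD4", "CD8"]}
renamed = prefix_if_shared(names)
print(renamed["rna"])   # ['rna:CD4', 'rna:GATA1']
print(renamed["prot"])  # ['prot:CD4', 'prot:CD8']
```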
Example 7 (python) - Updating MuData after changes:
# After modifying individual modalities
mdata['rna'].obs['new_column'] = some_values
# Update the MuData object to reflect changes
mdata.update()
# Check updated dimensions
print(mdata.shape)
Example 8 (python) - Working with remote data:
import fsspec
# Read from remote URL
fname = "https://example.com/data.h5mu"
with fsspec.open(fname) as f:
    mdata = md.read_h5mu(f)
# Read from S3
storage_options = {
    'endpoint_url': 'localhost:9000',
    'key': 'AWS_ACCESS_KEY_ID',
    'secret': 'AWS_SECRET_ACCESS_KEY',
}
with fsspec.open('s3://bucket/dataset.h5mu', **storage_options) as f:
    mdata = md.read_h5mu(f)
Example 9 (python) - Converting between formats:
# Convert MuData to AnnData by concatenating modalities
adata = md.to_anndata(mdata)
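# Convert AnnData to MuData by splitting on an annotation column
# (with axis=0 the `by` key is looked up in adata.var; with axis=1, in adata.obs)
mdata_from_adata = md.to_mudata(adata, axis=1, by='batch_column')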
# Concatenate MuData objects
combined_mdata = md.concat([mdata1, mdata2], join='outer')
Example 10 (python) - Memory-efficient operations:
# Create backed MuData object
mdata_backed = md.read("large_dataset.h5mu", backed=True)
# Create copy of backed object
mdata_copy = mdata_backed.copy("backup.h5mu")
# Working with views (memory efficient)
view = mdata[:100, :1000] # Subset without copying data
print(view.is_view) # True
# Create actual copy when modifications are needed
mdata_sub = view.copy()
Key Concepts
MuData Architecture
- Modalities: Individual AnnData objects stored in the .mod attribute
- Shared Axes: Configurable shared dimensions (obs=0, vars=1, both=-1)
- Global Annotations: .obs and .var for cross-modality metadata
- Mappings: Binary matrices tracking observation/variable presence per modality
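The mapping idea can be illustrated without the library. This is a minimal sketch, assuming shared observations: it builds a global observation index as the union of the per-modality names and one boolean presence mask per modality. `presence_masks` is a hypothetical helper written for this illustration, not mudata's internal code.

```python
from typing import Dict, List, Tuple


def presence_masks(obs_names: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, List[bool]]]:
    """Return the global index and, per modality, which global entries it contains."""
    global_index: List[str] = []
    seen = set()
    for names in obs_names.values():
        for name in names:
            if name not in seen:
                seen.add(name)
                global_index.append(name)

    masks: Dict[str, List[bool]] = {}
    for mod, names in obs_names.items():
        present = set(names)
        masks[mod] = [name in present for name in global_index]
    return global_index, masks


index, masks = presence_masks({
    "rna": ["c1", "c2", "c3"],
    "atac": ["c2", "c3", "c4"],
})
print(index)          # ['c1', 'c2', 'c3', 'c4']
print(masks["rna"])   # [True, True, True, False]
print(masks["atac"])  # [False, True, True, True]
```

Each mask answers "is this global observation present in this modality?", which is what lets cells measured in only one assay coexist in the same container.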
Annotation Management
- Pull Interface: Copy annotations from modalities to global level
- Push Interface: Copy global annotations back to modalities
- Prefixing: Automatic modality name prefixes for disambiguation
- Update Method: Sync global indices after modality changes
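The pull-with-prefix idea can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the library's implementation: `pull_columns` is a hypothetical helper, columns are modelled as dictionaries, missing entries become `None`, and the real pull methods expose further options that are not modelled here.

```python
from typing import Dict, List, Optional

# A modality's annotation table: column name -> {observation name -> value}
Table = Dict[str, Dict[str, object]]


def pull_columns(mods: Dict[str, Table], global_index: List[str]) -> Dict[str, List[Optional[object]]]:
    """Copy every modality column to the global level as '<modality>:<column>'."""
    pulled: Dict[str, List[Optional[object]]] = {}
    for mod, table in mods.items():
        for column, values in table.items():
            # Observations absent from this modality get None
            pulled[f"{mod}:{column}"] = [values.get(obs) for obs in global_index]
    return pulled


mods = {
    "rna": {"n_genes": {"c1": 200, "c2": 350}},
    "atac": {"n_peaks": {"c2": 900, "c3": 1200}},
}
obs = pull_columns(mods, ["c1", "c2", "c3"])
print(obs["rna:n_genes"])   # [200, 350, None]
print(obs["atac:n_peaks"])  # [None, 900, 1200]
```

The prefix keeps two modalities from overwriting each other when they both carry a column with the same name.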
Storage Formats
- .h5mu files: HDF5-based format for MuData objects
- Zarr format: Cloud-friendly chunked array storage
- Backed Mode: Memory-efficient access to large datasets
- Compression: Options for efficient storage
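One way to handle both storage formats in a script is to pick the reader from the file extension. The sketch below is a hypothetical convenience wrapper (`reader_name` and `load` are not part of mudata); it assumes the `read_h5mu` and `read_zarr` functions of the library and imports mudata lazily, so the name lookup can be tried without the package installed.

```python
from pathlib import PurePosixPath

# Extension -> name of the mudata reader function assumed to handle it
_READERS = {".h5mu": "read_h5mu", ".zarr": "read_zarr"}


def reader_name(path: str) -> str:
    """Return the name of the reader matching the path's extension."""
    suffix = PurePosixPath(path.rstrip("/")).suffix.lower()
    try:
        return _READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported extension: {suffix!r}") from None


def load(path: str):
    """Open a store with the matching reader (requires mudata to be installed)."""
    import mudata as md
    return getattr(md, reader_name(path))(path)


print(reader_name("pbmc.h5mu"))            # read_h5mu
print(reader_name("s3-cache/pbmc.zarr/"))  # read_zarr
```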
Reference Files
This skill includes comprehensive documentation in references/:
Core Documentation Files
- api.md (15 pages) - Complete API reference
  - MuData class methods and attributes
  - I/O functions (read, write, read_h5mu, etc.)
  - Conversion functions (to_anndata, to_mudata, concat)
  - Detailed parameter descriptions and examples
- getting_started.md (4 pages) - Installation and quickstart
  - Installation instructions (pip, development version)
  - MuData quickstart tutorial with examples
  - Basic concepts and terminology
  - First steps with multimodal objects
- io.md (4 pages) - Input/Output operations
  - File format specifications (.h5mu, .zarr)
  - Remote storage integration (S3, HTTP/S)
  - Input data requirements and formats
  - Output options and best practices
- tutorials.md (3 pages) - Advanced tutorials
  - MuData nuances and edge cases
  - Axes configuration for different use cases
  - Annotation management strategies
  - Performance optimization tips
Navigation Tips
- For beginners: Start with getting_started.md for installation and basic concepts
- For API reference: Use api.md for detailed function documentation
- For I/O operations: Consult io.md for file handling and remote data
- For advanced usage: Check tutorials.md for nuanced workflows and optimization
Working with This Skill
For Beginners
- Start with the basics: Read getting_started.md to understand MuData concepts
- Follow the quickstart examples: Use the essential operations in Quick Reference
- Practice with small datasets: Create simple MuData objects to understand structure
- Learn annotation management: Master pull/push interface for metadata handling
For Intermediate Users
- Explore different axes: Understand when to use axis=0, axis=1, or axis=-1
- Master file I/O: Learn to work with .h5mu files and remote data sources
- Optimize memory usage: Use backed objects and views for large datasets
- Handle variable naming: Ensure unique variable names across modalities
For Advanced Users
- Implement custom methods: Create multimodal analysis workflows
- Performance optimization: Use chunking, compression, and efficient indexing
- Integration with other tools: Combine with scanpy, muon, and analysis frameworks
- Large-scale data handling: Work with remote storage and distributed computing
Common Workflow Patterns
- Data Loading: Load individual modalities → Create MuData → Set up axes
- Quality Control: Filter each modality → Update MuData → Pull annotations
- Integration: Apply multimodal methods → Store results in .obsm → Visualize
- Export: Save to .h5mu → Convert to formats → Share with collaborators
Best Practices
- Always call .update() after modifying individual modalities
- Use unique variable names across all modalities to avoid ambiguity
- Set pull_on_update=False for explicit annotation control
- Use backed mode for large datasets to conserve memory
- Leverage views for subsetting operations when possible
Resources
Documentation Structure
- references/: Complete extracted documentation from official sources
- Preserved examples: All code examples with proper language annotations
- Table of contents: Each reference file includes navigation for quick access
- Cross-references: Links between related concepts across files
Community and Support
- scverse ecosystem: MuData is part of the scverse project
- Muon framework: Higher-level tools built on MuData
- GitHub repository: Source code and issue tracking
- Documentation website: Latest updates and community guides
Related Tools
- AnnData: Foundation for single-modal data objects
- Scanpy: Single-cell analysis framework
- Muon: Multimodal analysis framework using MuData
- scvi-tools: Deep learning models for multimodal data
Notes
- This skill was automatically generated from official MuData documentation
- Reference files preserve the structure and examples from source documentation
- Code examples include language detection for proper syntax highlighting
- Quick reference patterns extracted from common usage patterns in the documentation
- All examples are tested and verified against the official documentation
Updating
To refresh this skill with updated documentation:
- Re-run the scraper with the same configuration to get latest documentation
- Local enhancement will analyze new reference files and update SKILL.md
- Backup preservation: Original SKILL.md is backed up to SKILL.md.backup
- Quality verification: Check that examples still work with updated API
This skill provides comprehensive coverage of MuData functionality for multimodal data analysis workflows.
