Scvitools
by Ketomihine
Comprehensive skill for scvi-tools - Deep probabilistic models for single-cell omics analysis. Use for scVI, scANVI, totalVI, MultiVI models, single-cell RNA-seq integration, batch correction, differential expression, and multimodal data analysis.
name: scvitools
description: Comprehensive skill for scvi-tools - Deep probabilistic models for single-cell omics analysis. Use for scVI, scANVI, totalVI, MultiVI models, single-cell RNA-seq integration, batch correction, differential expression, and multimodal data analysis.
Scvitools Skill
Comprehensive assistance with scvi-tools development and single-cell omics analysis using deep probabilistic models.
When to Use This Skill
This skill should be triggered when:
Core scvi-tools Tasks:
- Working with scvi-tools models (scVI, scANVI, totalVI, MultiVI, etc.)
- Setting up AnnData objects for scvi-tools analysis
- Performing batch correction and data integration
- Running differential expression analysis
- Analyzing single-cell RNA-seq, ATAC-seq, or multimodal data
- Implementing custom model classes or modules
Analysis and Visualization:
- Getting latent representations and embeddings
- Creating UMAP/tSNE visualizations from scvi-tools outputs
- Interpreting model results and biological insights
- Working with spatial transcriptomics data
Development and Advanced Tasks:
- Building custom scvi-tools models
- Hyperparameter tuning with scvi.autotune
- Model evaluation and benchmarking
- Integration with Scanpy workflows
- Debugging scvi-tools code and installation issues
Data Processing:
- Preprocessing single-cell data for scvi-tools
- Setting up data registration with setup_anndata
- Handling batch effects and covariates
- Working with count data and normalization
Quick Reference
Core Model Setup and Training
Basic scVI Model Setup
import scvi
import scanpy as sc
# Setup AnnData for scVI
scvi.model.SCVI.setup_anndata(
    adata,
    batch_key="batch",
    labels_key="cell_type"
)
# Create and train model
model = scvi.model.SCVI(adata)
model.train()
# Get latent representation
latent = model.get_latent_representation()
adata.obsm["X_scVI"] = latent
Differential Expression Analysis
# 1-vs-1 DE test
de_results = model.differential_expression(
    groupby="cell_type",
    group1="T-cell",
    group2="B-cell"
)
# 1-vs-all DE test (no group1 given, so every group is compared against the rest)
de_results_all = model.differential_expression(
    groupby="cell_type",
    mode="change"
)
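The table returned by a DE test usually needs filtering before it is useful. The sketch below is one way to do that with pandas, assuming the result carries the `proba_de` and `lfc_mean` columns that `mode="change"` reports. The helper name `filter_de_results`, both thresholds, and the toy values are illustrative assumptions, not part of scvi-tools.

```python
import pandas as pd


def filter_de_results(de, min_proba_de=0.95, min_abs_lfc=1.0):
    """Keep genes passing both thresholds, sorted by absolute effect size.

    Hypothetical helper: the thresholds are illustrative defaults.
    """
    keep = (de["proba_de"] >= min_proba_de) & (de["lfc_mean"].abs() >= min_abs_lfc)
    hits = de.loc[keep]
    order = hits["lfc_mean"].abs().sort_values(ascending=False).index
    return hits.loc[order]


# Toy stand-in for a differential_expression() result (values are made up).
toy = pd.DataFrame(
    {"proba_de": [0.99, 0.97, 0.40, 0.98], "lfc_mean": [2.5, -3.1, 4.0, 0.2]},
    index=["CD3E", "MS4A1", "ACTB", "GAPDH"],
)
top = filter_de_results(toy)
print(top)
```

On real output the same call would be `filter_de_results(de_results)`. Filtering on both probability and effect size avoids reporting genes that are confidently but only marginally different.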
Data Integration and Batch Correction
Multimodal Data with totalVI
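import scvi
from scvi.model import TOTALVI
# Setup for RNA + protein data (protein counts live in adata.obsm)
TOTALVI.setup_anndata(
    adata,
    batch_key="batch",
    protein_expression_obsm_key="protein_expression"
)
model = TOTALVI(adata)
model.train()
# Get normalized RNA and protein (returned together as a pair)
rna_norm, protein_norm = model.get_normalized_expression()
# Probability that each protein measurement is foreground rather than background
protein_foreground = model.get_protein_foreground_probability()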
Spatial Transcriptomics with GIMVI
from scvi.external import GIMVI
# Setup spatial and seq data
spatial_adata = ... # spatial data
seq_adata = ... # single-cell seq data
# Register each dataset separately before building the model
GIMVI.setup_anndata(seq_adata)
GIMVI.setup_anndata(spatial_adata)
model = GIMVI(seq_adata, spatial_adata)
model.train()
# Get latent representations for both modalities
# (returned as a list in constructor order: seq first, then spatial)
seq_latent, spatial_latent = model.get_latent_representation()
Advanced Model Customization
Custom Model Class
from scvi import REGISTRY_KEYS
from scvi.data import AnnDataManager
from scvi.data.fields import CategoricalObsField, LayerField
from scvi.model.base import BaseModelClass, UnsupervisedTrainingMixin
from scvi.module import VAE

class CustomModel(UnsupervisedTrainingMixin, BaseModelClass):
    def __init__(self, adata, n_latent=30):
        super().__init__(adata)
        self.module = VAE(
            n_input=self.summary_stats["n_vars"],
            n_batch=self.summary_stats["n_batch"],
            n_latent=n_latent,
        )
        self._model_summary_string = f"CustomModel with n_latent: {n_latent}"
        self.init_params_ = self._get_init_params(locals())

    @classmethod
    def setup_anndata(cls, adata, batch_key=None, layer=None, **kwargs):
        # Record the call arguments so the same setup can be replayed on new data
        setup_method_args = cls._get_setup_method_args(**locals())
        anndata_fields = [
            LayerField(REGISTRY_KEYS.X_KEY, layer, is_count_data=True),
            CategoricalObsField(REGISTRY_KEYS.BATCH_KEY, batch_key),
            # Dummy labels field (key None), as SCVI.setup_anndata registers when labels_key is None
            CategoricalObsField(REGISTRY_KEYS.LABELS_KEY, None),
        ]
        adata_manager = AnnDataManager(fields=anndata_fields, setup_method_args=setup_method_args)
        adata_manager.register_fields(adata, **kwargs)
        cls.register_manager(adata_manager)
Hyperparameter Tuning
Automated Hyperparameter Search
import scvi
from ray import tune
from scvi import autotune
# The data must already be registered with the model class
scvi.model.SCVI.setup_anndata(adata, batch_key="batch")
# Define search space
search_space = {
    "model_params": {
        "n_hidden": tune.choice([64, 128, 256]),
        "n_layers": tune.choice([1, 2, 3])
    },
    "train_params": {
        "max_epochs": 100,
        "plan_kwargs": {"lr": tune.loguniform(1e-4, 1e-2)}
    }
}
# Run tuning
results = autotune.run_autotune(
    scvi.model.SCVI,
    data=adata,
    mode="min",
    metrics="validation_loss",
    search_space=search_space,
    num_samples=5,
    resources={"cpu": 10, "gpu": 1}
)
ATAC-seq Analysis
scBasset for scATAC-seq
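from scvi.external import SCBASSET
# scBasset trains on a regions x cells matrix, so transpose the usual cells x regions AnnData
# (assumes region sequences were already encoded, e.g. with scvi.data.add_dna_sequence)
bdata = adata.transpose()
# Setup ATAC data
SCBASSET.setup_anndata(bdata, dna_code_key="dna_code", batch_key="batch")
# Create and train model
model = SCBASSET(bdata)
model.train()
# Get latent representation (one row per cell)
latent = model.get_latent_representation()
adata.obsm["X_scBasset"] = latent
# Score TF activity for one transcription factor ("PAX5" is only an example name)
tf_activities = model.get_tf_activity(tf="PAX5", motif_dir="motif_library_path")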
Spatial Data Analysis
ResolVI for Spatial Transcriptomics
from scvi.external import RESOLVI
# Setup spatial data
RESOLVI.setup_anndata(
    adata,
    batch_key="slice",
    labels_key="cell_type"
)
# Train model
model = RESOLVI(adata)
model.train()
# Get corrected counts
corrected_counts = model.get_corrected_counts()
# Differential abundance in spatial neighborhoods
da_results = model.differential_abundance(
    groupby="cell_type",
    group1="neuron_layer1",
    group2="neuron_layer2"
)
Model Evaluation and Visualization
Model Quality Assessment
# Get ELBO (reconstruction quality)
elbo_score = model.get_elbo()
# Get reconstruction error
recon_error = model.get_reconstruction_error()
# Visualize latent space
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.pl.umap(adata, color=["cell_type", "batch"])
Installation and Setup
Installation with GPU Support
# Basic CPU installation
pip install scvi-tools
# GPU support for Linux (quotes keep shells such as zsh from expanding the brackets)
pip install "scvi-tools[cuda]"
# Apple Silicon (MPS) support
pip install "scvi-tools[metal]"
# Full installation with all dependencies
pip install "scvi-tools[all,tutorials,jax]"
Environment Setup
import scvi
import torch
# Set seed for reproducibility
scvi.settings.seed = 0
# Configure GPU settings
torch.set_float32_matmul_precision("high")
# Check version
print(f"scvi-tools version: {scvi.__version__}")
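When the same script has to run on machines with and without a GPU, it can help to resolve the device once, up front. The following is a minimal sketch, assuming PyTorch may or may not be installed. `pick_accelerator` is a hypothetical helper, and passing its result to `model.train(accelerator=...)` is an assumed usage rather than something the text above prescribes.

```python
def pick_accelerator():
    """Return "gpu", "mps", or "cpu" depending on what PyTorch can see.

    Hypothetical helper; falls back to "cpu" when PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "gpu"
    # torch.backends.mps is absent on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


accelerator = pick_accelerator()
print(f"training would run on: {accelerator}")
# Assumed usage: model.train(accelerator=accelerator)
```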
Key Concepts
Core Models:
- scVI: Single-cell Variational Inference for batch correction and integration
- scANVI: Semi-supervised scVI for cell type annotation
- totalVI: Total Variational Inference for joint RNA + protein (CITE-seq) data
- MultiVI: Multimodal Variational Inference for paired + unpaired data
- GIMVI: Generative Integrative Modeling for spatial + single-cell data
Data Structures:
- AnnData: Core data structure for single-cell data
- AnnDataManager: scvi-tools data registry and validation system
- setup_anndata(): Required preprocessing step to register data with scvi-tools
Model Architecture:
- BaseModelClass: Abstract base for all scvi-tools models
- BaseModuleClass: Abstract base for neural network modules
- VAEMixin: Provides VAE-specific methods (get_latent_representation, etc.)
Reference Files
references/api_reference.md
Comprehensive API Documentation - Complete reference for all classes and methods:
- Developer API for custom model building
- Data registration utilities (AnnDataManager, AnnDataFields)
- Model base classes and mixins
- Module building blocks (encoders, decoders)
- Training plans and utilities
references/getting_started.md
Installation and Tutorials - Entry points for learning scvi-tools:
- Complete installation guide (CPU, GPU, dependencies)
- Introduction to scvi-tools workflow
- gimVI tutorial for spatial transcriptomics
- Basic setup and data preparation examples
references/tutorials.md
In-depth Tutorial Collection - 60+ pages of detailed tutorials:
- Topic modeling with Amortized LDA
- scBasset for scATAC-seq analysis
- ResolVI for spatial transcriptomics correction
- SHAP and IntegratedGradients for model interpretability
- Advanced use cases and specialized analyses
references/user_guide.md
Comprehensive User Guide - Detailed workflow documentation:
- Complete scvi-tools workflow overview
- Data loading and preprocessing best practices
- Model creation, training, and saving
- Integration with Scanpy for downstream analysis
- Visualization and interpretation techniques
Working with This Skill
For Beginners
Start Here:
- Read the installation guide in references/getting_started.md
- Follow the basic scvi-tools tutorial for data setup
- Practice with the Quick Reference examples above
- Focus on basic scVI workflows first
Recommended Learning Path:
- Install scvi-tools and verify setup
- Load a sample dataset and run setup_anndata()
- Create and train a basic scVI model
- Extract latent representations and create visualizations
- Perform simple differential expression analysis
For Intermediate Users
Expand Your Skills:
- Explore multimodal models (totalVI, MultiVI)
- Learn hyperparameter tuning with scvi.autotune
- Practice batch correction with complex datasets
- Implement custom model classes
- Work with spatial transcriptomics data
Common Tasks:
- Setting up covariates and batch effects
- Choosing appropriate model parameters
- Evaluating model quality and convergence
- Integrating with existing Scanpy workflows
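Evaluating convergence usually comes down to asking whether the loss is still improving. Below is a small sketch of one such check, assuming the per-epoch losses have been pulled into a plain list (for instance from a column of `model.history`; the exact key depends on the model and training settings). The function name, window size, and tolerance are illustrative assumptions.

```python
def has_plateaued(losses, window=10, rel_tol=1e-3):
    """True when the mean of the last `window` losses improved by less than
    `rel_tol` (relative) over the mean of the `window` losses before them."""
    if len(losses) < 2 * window:
        return False  # not enough epochs to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous == 0:
        return True  # nothing left to improve on
    improvement = (previous - recent) / abs(previous)
    return improvement < rel_tol


still_improving = [100.0 - i for i in range(40)]
flat_tail = [100.0 - i for i in range(20)] + [80.0] * 20

print(has_plateaued(still_improving))
print(has_plateaued(flat_tail))
```

Comparing window means rather than single epochs keeps the check from reacting to epoch-to-epoch noise. A loss that plateaus early may also point to a learning rate that is too low.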
For Advanced Users
Advanced Features:
- Build custom model architectures
- Implement new modules and training plans
- Use Pyro-based models for Bayesian analysis
- Develop specialized analysis pipelines
- Contribute to scvi-tools development
Expert Resources:
- Developer API documentation in references/api_reference.md
- Advanced tutorials for specialized applications
- Model architecture and extension guides
Common Workflows
Standard scVI Analysis
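# 1. Data preparation
sc.pp.filter_genes(adata, min_counts=3)
# Keep the raw counts in a layer: scVI models counts, not normalized values
adata.layers["counts"] = adata.X.copy()
# Normalized, log-transformed values in adata.X are for Scanpy steps only
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
# 2. Setup with scvi-tools, pointing at the raw-count layer
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
# 3. Model training
model = scvi.model.SCVI(adata)
model.train()
# 4. Downstream analysis
adata.obsm["X_scVI"] = model.get_latent_representation()
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)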
Quality Control and Troubleshooting
Model Training Issues:
- Check data preprocessing (use raw counts, not normalized)
- Verify setup_anndata() was called correctly
- Monitor training loss and convergence
- Adjust learning rate and architectural parameters
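The first check in the list above (raw counts, not normalized values) is easy to automate. The sketch below is a simple heuristic: raw counts are non-negative and integer-valued. `looks_like_raw_counts` is a hypothetical helper that takes a dense array; for a SciPy sparse matrix, pass its `.data` array instead.

```python
import numpy as np


def looks_like_raw_counts(x):
    """Heuristic check: every value is non-negative and integer-valued."""
    values = np.asarray(x, dtype=float).ravel()
    if values.size == 0:
        return False
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))


counts = np.array([[0, 3, 1], [2, 0, 7]])
logged = np.log1p(counts)  # what adata.X holds after sc.pp.log1p

print(looks_like_raw_counts(counts))
print(looks_like_raw_counts(logged))
```

Running this on the matrix or layer passed to `setup_anndata()` catches the common mistake of training on log-normalized data.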
Data Integration Problems:
- Ensure proper batch key registration
- Check for sufficient shared features across batches
- Consider using scANVI for semi-supervised integration
- Validate integration quality with biological markers
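Whether batches share enough features can be checked before any model is trained. The following sketch works on plain gene-name lists, one per batch, and reports the genes present in every batch together with the fraction of each batch's genes that survive the intersection. The helper name and the toy gene lists are illustrative assumptions.

```python
def shared_features(genes_by_batch):
    """Return (sorted genes found in every batch, fraction retained per batch)."""
    gene_sets = {batch: set(genes) for batch, genes in genes_by_batch.items()}
    common = set.intersection(*gene_sets.values()) if gene_sets else set()
    retained = {
        batch: (len(common) / len(genes) if genes else 0.0)
        for batch, genes in gene_sets.items()
    }
    return sorted(common), retained


# Toy gene lists; in practice these could come from each dataset's var_names.
toy = {
    "batch_a": ["CD3E", "MS4A1", "NKG7", "LYZ"],
    "batch_b": ["CD3E", "MS4A1", "LYZ"],
    "batch_c": ["CD3E", "LYZ", "GNLY", "MS4A1"],
}
common, retained = shared_features(toy)
print(common)
print(retained)
```

A low retained fraction for one batch suggests that its gene identifiers differ (for example symbols versus Ensembl IDs) or that it was filtered more aggressively than the others.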
Resources
Installation and Environment
- Virtual environment setup recommended
- GPU support available for CUDA and Apple Silicon
- Optional dependencies for specialized features
Community and Support
- Official scvi-tools documentation
- GitHub repository for issues and contributions
- Community forums and discussion boards
- Tutorial notebooks and examples
Performance Optimization
- GPU acceleration for model training
- Memory-efficient data loading
- Distributed training options
- Hyperparameter tuning best practices
Notes
- This skill covers scvi-tools v1.3+ features
- Always use raw count data as input to scvi-tools models
- setup_anndata() must be called before model initialization
- Models train faster with GPU acceleration when available
- Integration with Scanpy provides seamless downstream analysis workflows
