Bulk RNA-seq Batch Correction with ComBat

by Starlitnightly

Use omicverse's pyComBat wrapper to remove batch effects from merged bulk RNA-seq or microarray cohorts, export corrected matrices, and benchmark pre/post correction visualisations.

name: bulk-rna-seq-batch-correction-with-combat
title: Bulk RNA-seq batch correction with ComBat
description: Use omicverse's pyComBat wrapper to remove batch effects from merged bulk RNA-seq or microarray cohorts, export corrected matrices, and benchmark pre/post correction visualisations.

Bulk RNA-seq batch correction with ComBat

Overview

Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them before downstream analysis. It follows t_bulk_combat.ipynb, which demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.

Instructions

  1. Import core libraries
    • Load omicverse as ov, anndata, pandas as pd, and matplotlib.pyplot as plt.
    • Call ov.ov_plot_set() (aliased ov.plot_set() in some releases) to align figures with omicverse styling.
  2. Load each batch separately
    • Read the prepared pickled matrices (or user-provided expression tables) with pd.read_pickle(...)/pd.read_csv(...).
    • Transpose each gene × sample table to sample × gene when wrapping it in an anndata.AnnData object (for example anndata.AnnData(df.T)), so that adata.obs stores sample metadata and adata.var stores genes.
    • Assign a batch column for every cohort (adata.obs['batch'] = '1', '2', ...). Encourage descriptive labels when available.
  3. Concatenate on shared genes
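    • Use anndata.concat([adata1, adata2, adata3], merge='same'). The default join='inner' retains the intersection of genes across batches, while merge='same' keeps only the gene annotations that are identical in every batch.
    • Confirm the combined adata reports the expected sample count for each batch (for example with adata.obs['batch'].value_counts()); if not, prompt users to re-check inputs.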
  4. Run ComBat batch correction
    • Execute ov.bulk.batch_correction(adata, batch_key='batch').
    • Explain that corrected values are stored in adata.layers['batch_correction'] while the original counts remain in adata.X.
  5. Export corrected and raw matrices
    • Obtain DataFrames via adata.to_df().T (raw) and adata.to_df(layer='batch_correction').T (corrected).
    • Encourage saving both tables (.to_csv(...)) plus the harmonised AnnData (adata.write_h5ad('adata_batch.h5ad', compression='gzip')).
  6. Benchmark the correction
    • For per-sample variance checks, draw before/after boxplots and recolour boxes using the ov.utils.red_color, blue_color, and green_color palettes to match batches.
    • Copy raw counts to a named layer with adata.layers['raw'] = adata.X.copy() before PCA.
    • Run ov.pp.pca(adata, layer='raw', n_pcs=50) and ov.pp.pca(adata, layer='batch_correction', n_pcs=50).
    • Visualise embeddings with ov.utils.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small') and repeat for the corrected layer to verify mixing.
  7. Troubleshooting tips
    • Mismatched gene identifiers cause dropped features; remind users to harmonise feature names (e.g., gene symbols) before concatenation.
    • pyComBat expects log-scale intensities or similarly distributed counts; recommend log-transforming strongly skewed matrices.
    • If the batch_correction layer is missing, ensure that batch_key matches the column name in adata.obs.
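
The first two troubleshooting tips can be checked before any AnnData object is built. The following is a minimal pandas/NumPy sketch, not part of omicverse: the helper names shared_genes and looks_unlogged, the threshold of 50, and the toy values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def shared_genes(tables):
    """Return the sorted gene IDs present in every gene x sample table."""
    common = set(tables[0].index)
    for table in tables[1:]:
        common &= set(table.index)
    return sorted(common)


def looks_unlogged(table, max_threshold=50.0):
    """Heuristic only (an assumption, not a pyComBat rule): a maximum far
    above ~50 suggests raw intensities or counts rather than log2 values."""
    return float(np.nanmax(table.to_numpy())) > max_threshold


# Toy gene x sample tables standing in for two cohorts (made-up values).
batches = {
    "batch1": pd.DataFrame(
        {"s1": [8.1, 7.4, 9.0], "s2": [8.3, 7.1, 9.2]},
        index=["TP53", "BRCA1", "MYC"],
    ),
    "batch2": pd.DataFrame(
        {"s3": [300.0, 150.0, 900.0], "s4": [280.0, 170.0, 950.0]},
        index=["TP53", "BRCA1", "EGFR"],
    ),
}

# Genes that would survive an inner join, and those each batch would lose.
genes = shared_genes(list(batches.values()))
dropped = {name: sorted(set(t.index) - set(genes)) for name, t in batches.items()}

# Flag tables that look like they still need a log transform.
needs_log = {name: looks_unlogged(t) for name, t in batches.items()}

# Log-transform only the flagged table; the +1 avoids log2(0).
batch2_log = np.log2(batches["batch2"] + 1) if needs_log["batch2"] else batches["batch2"]

print("shared genes:", genes)
print("dropped per batch:", dropped)
print("needs log transform:", needs_log)
```

A long dropped list for one batch usually points to a different identifier type (for example probe IDs versus gene symbols) rather than genuinely missing genes.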

Examples

  • "Combine three GEO ovarian cohorts, run ComBat, and export both the raw and corrected CSV matrices."
  • "Plot PCA embeddings before and after batch correction to confirm that batches 1–3 overlap."
  • "Save the harmonised AnnData file so I can reload it later for downstream DEG analysis."
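
The prompts above could be handled roughly as follows. This is a minimal sketch that strings together the calls named in the Instructions; it has not been checked against a specific omicverse release, and signatures may differ between versions. The function names run_combat_workflow and compare_embeddings, the out_prefix parameter, and the output file names are hypothetical, and the 'batch_correction|original|X_pca' basis string is assumed by analogy with the raw one.

```python
def run_combat_workflow(tables, out_prefix="combat"):
    """Merge gene x sample tables, run ComBat, and export raw/corrected CSVs.

    `tables` maps a batch label to a gene x sample pandas DataFrame.
    """
    # Imported lazily so the function can be defined without omicverse installed.
    import anndata
    import omicverse as ov

    adatas = []
    for label, table in tables.items():
        adata = anndata.AnnData(table.T)  # transpose: samples become rows (obs)
        adata.obs["batch"] = str(label)
        adatas.append(adata)

    # The default join='inner' keeps only genes shared by all batches.
    merged = anndata.concat(adatas, merge="same")

    # Call taken from step 4; corrected values are expected in a layer.
    ov.bulk.batch_correction(merged, batch_key="batch")

    # Export gene x sample tables plus the harmonised AnnData (step 5).
    merged.to_df().T.to_csv(f"{out_prefix}_raw.csv")
    merged.to_df(layer="batch_correction").T.to_csv(f"{out_prefix}_corrected.csv")
    merged.write_h5ad(f"{out_prefix}_adata.h5ad", compression="gzip")
    return merged


def compare_embeddings(merged):
    """Plot PCA embeddings before and after correction (step 6)."""
    import omicverse as ov

    merged.layers["raw"] = merged.X.copy()
    ov.pp.pca(merged, layer="raw", n_pcs=50)
    ov.pp.pca(merged, layer="batch_correction", n_pcs=50)
    ov.utils.embedding(merged, basis="raw|original|X_pca",
                       color="batch", frameon="small")
    # Assumed basis name for the corrected layer, mirroring the raw one.
    ov.utils.embedding(merged, basis="batch_correction|original|X_pca",
                       color="batch", frameon="small")
```

With three gene × sample DataFrames df1, df2, and df3 already loaded, a call such as run_combat_workflow({'1': df1, '2': df2, '3': df3}) followed by compare_embeddings(...) on the returned object would cover the three prompts.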

Skill Information

Category: Skill
Last Updated: 10/27/2025