Bulk Rna Seq Differential Expression With Omicverse
by Starlitnightly
Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse.
name: bulk-rna-seq-differential-expression-with-omicverse
title: Bulk RNA-seq differential expression with omicverse
description: Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse.
Bulk RNA-seq differential expression with omicverse
Overview
Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in t_deg.ipynb. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.
Instructions
- Set up the session
  - Import `omicverse as ov`, `scanpy as sc`, and `matplotlib.pyplot as plt`.
  - Call `ov.plot_set()` so downstream plots adopt omicverse styling.
- Prepare ID mapping assets
  - When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via `ov.utils.download_geneid_annotation_pair()` and store them under `genesets/`.
  - Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
- Load the raw counts
  - Read tab-delimited featureCounts output with `ov.pd.read_csv(..., sep='\t', header=1, index_col=0)`.
  - Strip trailing `.bam` segments from column names using list comprehension so sample IDs are clean.
- Map gene identifiers
  - Run `ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')` to replace `gene_id` entries with gene symbols.
- Initialise the DEG object
  - Create `dds = ov.bulk.pyDEG(mapped_counts)`.
  - Handle duplicate gene symbols with `dds.drop_duplicates_index()` to keep the highest expressed version.
- Normalise and estimate size factors
  - Execute `dds.normalize()` to calculate DESeq2 size factors, which correct for differences in library size (sequencing depth) between samples.
- Run differential testing
  - Collect treatment and control replicate labels into lists.
  - Call `dds.deg_analysis(treatment_groups, control_groups, method='ttest')` for the default Welch t-test.
  - Offer optional alternatives: `method='edgepy'` for edgeR-like tests and `method='limma'` for limma-style modelling.
- Filter and threshold results
  - Note that lowly expressed genes are retained by default; filter using `dds.result.loc[dds.result['log2(BaseMean)'] > 1]` when needed.
  - Set dynamic fold-change and significance cutoffs via `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)` (`fc_threshold=-1` auto-selects based on log2FC distribution).
- Visualise differential expression
  - Produce volcano plots with `dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...)` to highlight key genes.
  - Generate per-gene boxplots using `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)`; adjust y-axis tick labels if required.
- Perform pathway enrichment (optional)
  - Download curated pathway libraries through `ov.utils.download_pathway_database()`.
  - Load genesets with `ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...)`.
  - Build the DEG gene list from `dds.result.loc[dds.result['sig'] != 'normal'].index`.
  - Run enrichment with `ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)`. Encourage users without internet access to provide a `background` gene list.
  - Visualise single-library results via `ov.bulk.geneset_plot(...)` and combine multiple ontologies using `ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)`.
- Document outputs
  - Suggest exporting `dds.result` and enrichment tables to CSV for downstream reporting.
  - Encourage users to save figures generated by matplotlib (`plt.savefig(...)`) when running outside notebooks.
- Troubleshooting tips
  - Ensure sample labels in `treatment_groups`/`control_groups` exactly match column names post-cleanup.
  - Verify required packages (`omicverse`, `pyComplexHeatmap`, `gseapy`) are installed for enrichment visualisations.
  - Remind users that internet access is required the first time they download gene mappings or pathway databases.
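The core steps above (session setup, loading, ID mapping, normalisation, testing, thresholding) can be chained roughly as follows. This is a minimal sketch, not the notebook's code: `clean_sample_names` and `run_deg` are hypothetical helper names, the omicverse calls are used with the arguments quoted in the steps, and stripping a directory prefix from column names is an added assumption about how featureCounts labelled the samples.

```python
def clean_sample_names(columns):
    """Return sample IDs with any directory prefix and a trailing '.bam' removed."""
    cleaned = []
    for name in columns:
        # Assumption: featureCounts headers may carry a '/'-separated path.
        base = str(name).split("/")[-1]
        if base.endswith(".bam"):
            base = base[: -len(".bam")]
        cleaned.append(base)
    return cleaned


def run_deg(counts_path, pair_path, treatment_groups, control_groups, method="ttest"):
    """Hypothetical wrapper chaining the calls listed in the instructions."""
    # Imported lazily so clean_sample_names can be used without omicverse installed.
    import omicverse as ov

    ov.plot_set()

    # Assumption: the table holds only the gene ID column plus one column per sample.
    counts = ov.pd.read_csv(counts_path, sep="\t", header=1, index_col=0)
    counts.columns = clean_sample_names(counts.columns)

    # Replace gene IDs with symbols, then build the DEG object.
    mapped = ov.bulk.Matrix_ID_mapping(counts, pair_path)
    dds = ov.bulk.pyDEG(mapped)
    dds.drop_duplicates_index()
    dds.normalize()

    # Results are read back from dds.result, as in the filtering step.
    dds.deg_analysis(treatment_groups, control_groups, method=method)
    dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)
    return dds
```

A call might look like `run_deg('sample/counts.txt', 'genesets/pair_<GENOME>.tsv', treatment_groups, control_groups)`, with the two group lists holding the cleaned sample IDs. Passing `method='edgepy'` or `method='limma'` switches to the alternatives mentioned in the testing step.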
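To make the normalisation step less opaque when explaining it to a user, the idea behind DESeq2 size factors (the median-of-ratios method) can be illustrated in plain Python. This sketch shows the general technique only; it is not omicverse's implementation, and `size_factors` is a hypothetical name.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue
        # Log of the gene's geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-gene reference.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: the second sample was sequenced twice as deeply as the first.
toy = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
factors = size_factors(toy)
normalised = {g: [c / f for c, f in zip(row, factors)] for g, row in toy.items()}
```

Dividing each sample's counts by its factor puts the two toy samples on the same scale, which is the library-size correction that `dds.normalize()` is described as providing.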
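The visualisation, enrichment, and export steps could be gathered into one reporting helper. The sketch below is an assumption-laden outline: `report_deg`, its parameters, and the output file names are hypothetical; it assumes the volcano plot is drawn on the current matplotlib figure and that `ov.bulk.geneset_enrichment` returns a table with a `to_csv` method. The accepted spellings for the two `organism` arguments should be checked against the omicverse documentation.

```python
def report_deg(dds, geneset_path, prepare_organism, enrich_organism, out_prefix="deg"):
    """Hypothetical helper: volcano plot, enrichment, and CSV export."""
    # Imported lazily so this definition loads without the plotting stack.
    import matplotlib.pyplot as plt
    import omicverse as ov

    # Volcano plot labelling the top 8 genes, saved for use outside notebooks.
    dds.plot_volcano(title="DEG analysis", figsize=(4, 4), plot_genes_num=8)
    plt.savefig(f"{out_prefix}_volcano.png", dpi=300, bbox_inches="tight")

    # Genes flagged as significant by foldchange_set.
    deg_genes = dds.result.loc[dds.result["sig"] != "normal"].index.tolist()

    # Enrichment against a previously downloaded pathway library.
    pathways = ov.utils.geneset_prepare(geneset_path, organism=prepare_organism)
    enr = ov.bulk.geneset_enrichment(
        gene_list=deg_genes,
        pathways_dict=pathways,
        pvalue_type="auto",
        organism=enrich_organism,
    )

    # Export both tables for downstream reporting.
    dds.result.to_csv(f"{out_prefix}_result.csv")
    enr.to_csv(f"{out_prefix}_enrichment.csv")  # assumes a DataFrame-like result
    return enr
```

The returned table can then be passed to `ov.bulk.geneset_plot(...)` with whatever styling arguments the user prefers.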
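For the first troubleshooting tip, a small pre-flight check can report mismatched labels before `dds.deg_analysis` is called. This is a sketch with a hypothetical helper name, `check_groups`, and illustrative sample labels.

```python
def check_groups(columns, treatment_groups, control_groups):
    """Return (missing, overlap) for the requested sample labels.

    missing: labels not found among the cleaned column names.
    overlap: labels listed in both the treatment and control groups.
    """
    available = set(columns)
    requested = list(treatment_groups) + list(control_groups)
    missing = [label for label in requested if label not in available]
    overlap = sorted(set(treatment_groups) & set(control_groups))
    return missing, overlap


# Illustrative labels; '1-2' is a typo for '1--2' and is reported as missing.
cols = ["4-3", "4-4", "1--1", "1--2"]
missing, overlap = check_groups(cols, ["4-3", "4-4"], ["1--1", "1-2"])
```

If either list is non-empty, ask the user to correct the group labels before running the test.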
Examples
- "I have a featureCounts matrix for mouse tumour samples—normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
- "Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
- "Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."
References
- Detailed walkthrough notebook: `t_deg.ipynb`
- Sample count matrix for testing: `sample/counts.txt`
- Quick copy/paste commands: `reference.md`
