Bio Metagenomics Visualization
by GPTomics
Visualize metagenomic profiles using R (phyloseq, microbiome) and Python (matplotlib, seaborn). Create stacked bar plots, heatmaps, PCA plots, and diversity analyses. Use when creating publication-quality figures from MetaPhlAn, Bracken, or other taxonomic profiling output.
name: bio-metagenomics-visualization
description: Visualize metagenomic profiles using R (phyloseq, microbiome) and Python (matplotlib, seaborn). Create stacked bar plots, heatmaps, PCA plots, and diversity analyses. Use when creating publication-quality figures from MetaPhlAn, Bracken, or other taxonomic profiling output.
tool_type: mixed
primary_tool: phyloseq
Metagenome Visualization
Python - Stacked Bar Plot
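import pandas as pd
import matplotlib.pyplot as plt
# Load the merged MetaPhlAn table (rows = clades, columns = samples)
abundance = pd.read_csv('merged_abundance.txt', sep='\t', index_col=0)
# Keep species-level rows only; strain-level rows (t__) also contain 's__', so exclude them
is_species = abundance.index.str.contains('s__', regex=False) & ~abundance.index.str.contains('t__', regex=False)
abundance = abundance[is_species]
abundance.index = abundance.index.str.split('|').str[-1].str.replace('s__', '', regex=False)
# Keep the top N species and collapse everything else into 'Other'
top_n = 10
top_species = abundance.sum(axis=1).nlargest(top_n).index
abundance_top = abundance.loc[top_species].copy()  # copy so adding a row does not touch a view
abundance_top.loc['Other'] = abundance.drop(index=top_species).sum()
# Transpose so each sample becomes one stacked bar
abundance_top.T.plot(kind='bar', stacked=True, figsize=(12, 6), colormap='tab20')
plt.xlabel('Sample')
plt.ylabel('Relative Abundance (%)')
plt.title('Species Composition')
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.savefig('stacked_bar.png', dpi=300)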
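The species filtering and top-N collapse used for the stacked bar plot can be wrapped in small reusable functions and checked on toy data before plotting. The following is a minimal sketch; the function names `species_table` and `top_n_with_other` and the toy profile are hypothetical, and the exclusion of `t__` rows assumes a MetaPhlAn version that emits strain-level rows.

```python
import pandas as pd


def species_table(profile: pd.DataFrame) -> pd.DataFrame:
    """Keep species-level rows of a clade x sample table and shorten their names."""
    names = profile.index.to_series()
    # Strain-level rows (t__) also contain 's__', so they are excluded explicitly
    is_species = names.str.contains('s__', regex=False) & ~names.str.contains('t__', regex=False)
    species = profile[is_species.values].copy()
    species.index = species.index.str.split('|').str[-1].str.replace('s__', '', regex=False)
    return species


def top_n_with_other(species: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Keep the n most abundant species and sum the remainder into an 'Other' row."""
    top = species.sum(axis=1).nlargest(n).index
    collapsed = species.loc[top].copy()
    collapsed.loc['Other'] = species.drop(index=top).sum(axis=0)
    return collapsed


# Hypothetical toy profile in MetaPhlAn-style clade naming
toy = pd.DataFrame(
    {'S1': [100.0, 90.0, 60.0, 30.0, 10.0], 'S2': [100.0, 70.0, 30.0, 40.0, 30.0]},
    index=[
        'k__Bacteria',
        'k__Bacteria|g__A',
        'k__Bacteria|g__A|s__A_one',
        'k__Bacteria|g__A|s__A_two',
        'k__Bacteria|g__B|s__B_one',
    ],
)
collapsed = top_n_with_other(species_table(toy), n=2)
print(collapsed)
```

Because 'Other' absorbs everything outside the top N, each sample column of the collapsed table keeps the same total as the species-level input, which is a quick sanity check before plotting.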
Python - Heatmap
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the merged MetaPhlAn table (rows = clades, columns = samples)
abundance = pd.read_csv('merged_abundance.txt', sep='\t', index_col=0)
# Keep species-level rows only; strain-level rows (t__) also contain 's__', so exclude them
is_species = abundance.index.str.contains('s__', regex=False) & ~abundance.index.str.contains('t__', regex=False)
abundance = abundance[is_species]
abundance.index = abundance.index.str.split('|').str[-1].str.replace('s__', '', regex=False)
# Plot the 20 most abundant species across all samples
top_species = abundance.sum(axis=1).nlargest(20).index
abundance_top = abundance.loc[top_species]
plt.figure(figsize=(12, 10))
sns.heatmap(abundance_top, cmap='YlOrRd', annot=False, cbar_kws={'label': 'Abundance (%)'})
plt.xlabel('Sample')
plt.ylabel('Species')
plt.title('Species Abundance Heatmap')
plt.tight_layout()
plt.savefig('heatmap.png', dpi=300)
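On a linear color scale, a few dominant species can wash out the rest of a heatmap. One common option is to log-transform the table with a small pseudocount before passing it to `sns.heatmap`. This is a minimal sketch; the function name `log10_abundance`, the pseudocount of 0.01, and the toy values are assumptions, not part of the original workflow.

```python
import numpy as np
import pandas as pd


def log10_abundance(abundance: pd.DataFrame, pseudocount: float = 0.01) -> pd.DataFrame:
    """Return log10(abundance + pseudocount); the pseudocount keeps zeros finite."""
    return np.log10(abundance + pseudocount)


# Hypothetical relative abundances (%) for three species in two samples
toy = pd.DataFrame(
    {'S1': [99.99, 0.0], 'S2': [0.99, 9.99]},
    index=['Species_a', 'Species_b'],
)
logged = log10_abundance(toy)
print(logged.round(3))
```

If the transformed table is plotted, the color bar label should say so, for example `cbar_kws={'label': 'log10(abundance + 0.01)'}`.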
Python - PCA
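import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
abundance = pd.read_csv('merged_abundance.txt', sep='\t', index_col=0)
# Use species-level rows only so nested taxonomic levels are not counted repeatedly
is_species = abundance.index.str.contains('s__', regex=False) & ~abundance.index.str.contains('t__', regex=False)
abundance = abundance[is_species].T  # rows = samples, columns = species
# Standardize each species, then project onto the first two principal components
scaler = StandardScaler()
abundance_scaled = scaler.fit_transform(abundance)
pca = PCA(n_components=2)
pca_result = pca.fit_transform(abundance_scaled)
plt.figure(figsize=(8, 6))
plt.scatter(pca_result[:, 0], pca_result[:, 1])
# Label each point with its sample name
for i, sample in enumerate(abundance.index):
    plt.annotate(sample, (pca_result[i, 0], pca_result[i, 1]))
plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}%)')
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}%)')
plt.title('PCA of Sample Composition')
plt.tight_layout()
plt.savefig('pca.png', dpi=300)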
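For a Python counterpart to a Bray-Curtis ordination, principal coordinates analysis (PCoA) can be sketched with NumPy and SciPy. The version below uses classical multidimensional scaling (Gower double centering followed by an eigendecomposition). The function name `pcoa` and the toy samples are hypothetical, and negative eigenvalues, which Bray-Curtis distances can produce, are simply clipped to zero here rather than corrected.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pcoa(dist_matrix, n_axes=2):
    """Classical PCoA of a square distance matrix; returns (coordinates, explained fraction)."""
    d = np.asarray(dist_matrix, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Gower double centering of the squared distances
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0, None)  # assumption: negative eigenvalues are ignored
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = positive[:n_axes] / positive.sum()
    return coords, explained


# Hypothetical samples x species table: two samples per condition
samples = np.array([
    [50.0, 30.0, 20.0, 0.0],
    [45.0, 35.0, 20.0, 0.0],
    [5.0, 10.0, 25.0, 60.0],
    [0.0, 15.0, 25.0, 60.0],
])
bray = squareform(pdist(samples, metric='braycurtis'))
coords, explained = pcoa(bray)
print(coords.round(3))
print(explained.round(3))
```

The returned coordinates can be passed to `plt.scatter` in the same way as the PCA result, with the explained fractions used for the axis labels. Note that the fractions are relative to the positive eigenvalues only.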
R - phyloseq Setup
library(phyloseq)
library(ggplot2)
library(vegan)
# From MetaPhlAn merged table
abundance <- read.table('merged_abundance.txt', sep = '\t', header = TRUE, row.names = 1)
# Filter to species level (strain-level t__ rows also contain 's__', so exclude them)
is_species <- grepl('s__', rownames(abundance)) & !grepl('t__', rownames(abundance))
species <- abundance[is_species, ]
rownames(species) <- sapply(strsplit(rownames(species), '\\|'), tail, 1)
rownames(species) <- gsub('s__', '', rownames(species))
# Create phyloseq object
otu <- otu_table(as.matrix(species), taxa_are_rows = TRUE)
# Sample metadata (create or load); Group needs one entry per sample column
# Named 'metadata' so it does not shadow the phyloseq function sample_data()
metadata <- data.frame(
  Sample = colnames(species),
  Group = c('Control', 'Control', 'Treatment', 'Treatment'),
  row.names = colnames(species)
)
samp <- sample_data(metadata)
ps <- phyloseq(otu, samp)
R - Stacked Bar Plot
library(phyloseq)
library(ggplot2)
# Top taxa
top_taxa <- names(sort(taxa_sums(ps), decreasing = TRUE))[1:10]
ps_top <- prune_taxa(top_taxa, ps)
# Stacked bar
# ps has no tax_table, so there is no 'Species' column to fill by;
# the species names are the taxa names, which plot_bar exposes as 'OTU'
plot_bar(ps_top, fill = 'OTU') +
  geom_bar(stat = 'identity', position = 'stack') +
  theme_minimal() +
  labs(x = 'Sample', y = 'Relative Abundance (%)', fill = 'Species') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
R - Ordination (PCoA)
library(phyloseq)
library(ggplot2)
# Bray-Curtis distance
ord <- ordinate(ps, method = 'PCoA', distance = 'bray')
# Plot ordination
# stat_ellipse() needs several samples per group; with very few it is skipped with a message
plot_ordination(ps, ord, color = 'Group') +
  geom_point(size = 4) +
  stat_ellipse() +
  theme_minimal() +
  labs(title = 'PCoA of Sample Composition')
R - Alpha Diversity
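library(phyloseq)
library(ggplot2)
# Calculate diversity metrics
# 'Observed' is left out here: estimate_richness() computes it with vegan::estimateR(),
# which expects integer counts and can fail on relative abundances such as MetaPhlAn output
alpha_div <- estimate_richness(ps, measures = c('Shannon', 'Simpson'))
# Add metadata (rows are in the same order as the samples in ps)
alpha_div$Group <- sample_data(ps)$Group
# Plot
ggplot(alpha_div, aes(x = Group, y = Shannon, fill = Group)) +
  geom_boxplot() +
  geom_jitter(width = 0.1) +
  theme_minimal() +
  labs(title = 'Alpha Diversity by Group', y = 'Shannon Index')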
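For Python workflows, the same alpha diversity metrics can be computed directly from the abundance table. This is a minimal sketch assuming a species x samples DataFrame; the function name `alpha_diversity` and the toy data are hypothetical. Shannon uses the natural logarithm and Simpson is reported as 1 minus the sum of squared proportions, matching the definitions used by vegan's `diversity()`.

```python
import numpy as np
import pandas as pd


def alpha_diversity(abundance: pd.DataFrame) -> pd.DataFrame:
    """Per-sample Shannon, Simpson, and observed richness from a species x samples table."""
    props = abundance / abundance.sum(axis=0)  # proportions within each sample
    # Replace zeros with 1 before the log so that 0 * log(0) contributes 0
    log_props = np.log(props.where(props > 0, 1.0))
    shannon = -(props * log_props).sum(axis=0)
    simpson = 1.0 - (props ** 2).sum(axis=0)
    observed = (abundance > 0).sum(axis=0)
    return pd.DataFrame({'Shannon': shannon, 'Simpson': simpson, 'Observed': observed})


# Hypothetical table: one perfectly even sample and one with a single species
toy = pd.DataFrame(
    {'even': [25.0, 25.0, 25.0, 25.0], 'single': [100.0, 0.0, 0.0, 0.0]},
    index=['sp1', 'sp2', 'sp3', 'sp4'],
)
alpha = alpha_diversity(toy)
print(alpha.round(3))
```

The even sample reaches the maximum Shannon value for four species, ln(4), while the single-species sample scores zero on both indices. Counting species with abundance above zero works on relative abundances, although the result still depends on the profiler's detection threshold.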
R - Beta Diversity (PERMANOVA)
library(phyloseq)
library(vegan)
# Get abundance matrix with samples as rows
abundance_matrix <- as(otu_table(ps), 'matrix')
if (taxa_are_rows(ps)) abundance_matrix <- t(abundance_matrix)
# Calculate Bray-Curtis distance
dist_bc <- vegdist(abundance_matrix, method = 'bray')
# PERMANOVA: test whether community composition differs between groups
metadata <- as(sample_data(ps), 'data.frame')
permanova <- adonis2(dist_bc ~ Group, data = metadata, permutations = 999)
permanova
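To make the PERMANOVA test less of a black box, the one-factor case can be sketched in Python: the pseudo-F statistic compares between-group and within-group sums of squared distances, and the p-value comes from recomputing it after shuffling the group labels. This is an illustrative sketch, not vegan's implementation; the function names `pseudo_f` and `permanova` and the toy samples are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pseudo_f(dist, labels):
    """One-factor PERMANOVA pseudo-F from a square distance matrix and group labels."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    sq = np.asarray(dist, dtype=float) ** 2
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        members = np.flatnonzero(labels == g)
        sub = sq[np.ix_(members, members)]
        ss_within += sub[np.triu_indices(len(members), k=1)].sum() / len(members)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for a one-factor design."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = pseudo_f(dist, labels)
    hits = sum(pseudo_f(dist, rng.permutation(labels)) >= observed for _ in range(permutations))
    # The observed labeling counts as one permutation, hence the +1 terms
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical samples x species table: four controls, four treated
samples = np.array([
    [50.0, 30.0, 20.0, 0.0], [45.0, 35.0, 20.0, 0.0],
    [55.0, 25.0, 20.0, 0.0], [48.0, 32.0, 20.0, 0.0],
    [5.0, 10.0, 25.0, 60.0], [0.0, 15.0, 25.0, 60.0],
    [10.0, 5.0, 25.0, 60.0], [3.0, 12.0, 25.0, 60.0],
])
groups = ['Control'] * 4 + ['Treatment'] * 4
bray = squareform(pdist(samples, metric='braycurtis'))
f_stat, p_value = permanova(bray, groups)
print(f_stat, p_value)
```

With only a handful of samples the number of distinct label arrangements is small, which limits how low the permutation p-value can go regardless of how well the groups separate.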
Krona Chart
# From Kraken2 report (column 5 = NCBI taxonomy ID, column 3 = reads assigned directly to that taxon)
ktImportTaxonomy -t 5 -m 3 kraken_report.txt -o krona_chart.html
# From MetaPhlAn
metaphlan2krona.py -p profile.txt -k krona_profile.txt
ktImportText krona_profile.txt -o krona_metaphlan.html
Key Packages
Python
| Package | Purpose |
|---|---|
| matplotlib | General plotting |
| seaborn | Statistical visualizations |
| scikit-learn | PCA, clustering |
| scipy | Statistical tests |
R
| Package | Purpose |
|---|---|
| phyloseq | Microbiome data handling |
| vegan | Community ecology |
| ggplot2 | Visualization |
| microbiome | Additional analyses |
Related Skills
- kraken-classification - Generate input data
- metaphlan-profiling - Generate input data
- abundance-estimation - Process Kraken output
