Seuratclusterstats
by pwwang
Generates comprehensive cluster statistics and visualizations for Seurat objects, including dimension reduction plots, gene expression visualizations, cluster quality metrics, and clustree diagrams. This process is essential for exploring and validating clustering results.
SeuratClusterStats Process Configuration
Purpose
Generates comprehensive cluster statistics and visualizations for Seurat objects, including dimension reduction plots, gene expression visualizations, cluster quality metrics, and clustree diagrams. This process is essential for exploring and validating clustering results.
When to Use
- After: SeuratClustering or SeuratSubClustering processes
- Use cases:
  - Cluster quality assessment and validation
  - Visualizing cluster characteristics across dimensions
  - Comparing marker gene expression between clusters
  - Assessing cluster stability via clustree plots
  - Exploring metadata relationships with clusters
- Always enabled in immunopipe TCR and non-TCR workflows (order = -1, runs early)
Configuration Structure
Process Enablement
[SeuratClusterStats]
cache = true
Input Specification
[SeuratClusterStats.in]
srtobj = ["SeuratClustering"]
Note: srtobj accepts the output name from SeuratClustering or SeuratSubClustering.
Environment Variables
Global Settings
[SeuratClusterStats.envs]
# Mutate metadata before plotting
mutaters = {}
# Cache feature plots (time-consuming)
cache = "/tmp"
Clustree Plots
Visualize clustering resolution relationships.
[SeuratClusterStats.envs.clustrees_defaults]
prefix = true # Auto-detect clustering columns
devpars = {res = 100, width = 800, height = 500}
more_formats = []
save_code = false
Clustree cases:
[SeuratClusterStats.envs.clustrees."Custom Clustree"]
prefix = "seurat_clusters"
devpars = {height = 600}
Cluster Statistics (stats)
Cell count and fraction plots across clusters.
[SeuratClusterStats.envs.stats_defaults]
subset = ""
devpars = {res = 100, height = 600, width = 800}
descr = ""
more_formats = []
save_code = false
save_data = false
Plot types for stats (via scplotter::CellStatPlot):
- bar: Bar chart
- circos: Circos plot (chord diagram)
- pie: Single pie chart
- ring/donut: Ring/donut chart
- trend: Trend plot
- area: Area plot
- sankey/alluvial: Sankey/alluvial diagram
- heatmap: Heatmap
- radar: Radar plot
- spider: Spider plot
- violin: Violin plot
- box: Box plot
Default cases:
[SeuratClusterStats.envs.stats]
"Number of cells in each cluster (Bar Chart)" = {plot_type = "bar", x_text_angle = 90}
"Number of cells in each cluster by Sample (Bar Chart)" = {plot_type = "bar", group_by = "Sample", x_text_angle = 90}
Custom stat example:
[SeuratClusterStats.envs.stats."Cells by Diagnosis"]
plot_type = "bar"
group_by = "Diagnosis"
frac = "group" # Options: "none", "group", "ident", "cluster", "all"
x_text_angle = 90
swap = true
position = "stack"
Gene Count Visualization (ngenes)
Number of genes detected per cell.
[SeuratClusterStats.envs.ngenes_defaults]
more_formats = []
subset = ""
devpars = {res = 100, height = 800, width = 1000}
Default case:
[SeuratClusterStats.envs.ngenes]
"Number of genes detected in each cluster" = {}
Feature Visualization (features)
Gene expression and metadata column plots.
[SeuratClusterStats.envs.features_defaults]
# Feature specification (multiple formats)
features = ["CD3D", "CD4", "CD8A"] # OR
# features = "file://path/to/genes.txt" # OR
# features = 10 # Top N variable features
# Cluster ordering
order_by = "desc(mean(Expression, na.rm = TRUE))" # OR
# order_by = ["c1", "c2", "c3"] # Literal order
subset = ""
devpars = {res = 100}
descr = ""
more_formats = []
save_code = false
save_data = false
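The commented alternatives above imply that a features value can take several shapes: a list of names, a file:// path, an integer N, or (as in the heatmap example in this document) a table of named groups. The sketch below only classifies a value by shape. It is not the pipeline's own code: `describe_features` is a hypothetical helper, and treating a bare string without the file:// prefix as a single feature name is an assumption.

```python
def describe_features(spec):
    """Classify a `features` value by its shape (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(spec, bool):
        raise TypeError("features must be a list, string, table, or integer")
    if isinstance(spec, int):
        # An integer N stands for the top N variable features.
        return ("top_variable", spec)
    if isinstance(spec, str):
        if spec.startswith("file://"):
            # Strip the scheme and keep the path exactly as written.
            return ("file", spec[len("file://"):])
        # Assumption: any other string is one feature name.
        return ("names", [spec])
    if isinstance(spec, dict):
        # Named groups, e.g. {"T cell markers": ["CD3D", "CD4"]}.
        return ("grouped", {group: list(names) for group, names in spec.items()})
    return ("names", list(spec))


print(describe_features(["CD3D", "CD4", "CD8A"]))
print(describe_features("file://path/to/genes.txt"))
print(describe_features(10))
```

A check like this can be useful before launching a long run, since a mistyped features value otherwise only surfaces at plotting time.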
Feature plot types (via scplotter::FeatureStatPlot):
- violin: Violin plot
- box: Box plot
- bar: Bar plot
- ridge: Ridge plot
- dim: Dimension reduction plot
- cor: Correlation plot
- heatmap: Heatmap
- dot: Dot plot (heatmap shortcut)
Common feature parameters:
- plot_type: Type of visualization
- ident: Identity column (e.g., "seurat_clusters", "Diagnosis")
- group_by: Group cells by metadata column
- split_by: Split into multiple plots
- facet_by: Facet plots by metadata
- add_box: Add box plot overlay (violin/ridge)
- add_point: Add jittered points
- add_bg: Add background reference
- stack: Stack multiple features
- flip: Flip plot orientation
- comparisons: Add statistical comparisons
Dimension Reduction Plots (dimplots)
UMAP/tSNE/PCA visualizations.
[SeuratClusterStats.envs.dimplots_defaults]
group_by = null
split_by = null
subset = ""
devpars = {res = 100}
reduction = "dim" # Options: "dim", "auto", "umap", "tsne", "pca"
Reduction options:
- dim: Auto-detect in the order UMAP → tSNE → PCA (uses sub_umap for subclusters)
- auto: Same as dim
- umap: Force UMAP
- tsne: Force tSNE
- pca: Force PCA
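The fallback order described in the reduction options can be sketched as follows. This is an illustration of the stated behaviour only, not the process's implementation: `resolve_reduction` and its arguments are hypothetical names, and the subcluster-specific handling is left out.

```python
def resolve_reduction(requested, available, fallback_order=("umap", "tsne", "pca")):
    """Pick the reduction to plot (hypothetical helper, not the pipeline's code)."""
    if requested in ("dim", "auto"):
        # Walk the documented preference order and take the first one present.
        for name in fallback_order:
            if name in available:
                return name
        raise ValueError("no UMAP, tSNE, or PCA reduction available")
    # A forced reduction must exist in the object.
    if requested not in available:
        raise ValueError(f"reduction {requested!r} not found")
    return requested


print(resolve_reduction("dim", ["pca", "tsne"]))   # tsne
print(resolve_reduction("umap", ["pca", "umap"]))  # umap
```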
Common dimplot parameters:
- label: Add cluster labels
- label_size: Label font size
- label_repel: Repel overlapping labels
- add_mark: Add cluster boundaries (options: hull, ellipse, rect, circle)
- mark_alpha: Mark transparency
- mark_linetype: Mark line type
- hex: Use hexagonal binning
- hex_bins: Number of hex bins
- stat_by: Add statistics by metadata
- stat_plot_type: pie, ring, bar, line
- stat_plot_size: Size of stat plot
- facet_by: Facet by metadata
- highlight: Highlight specific cells
Default cases:
[SeuratClusterStats.envs.dimplots]
"Dimensional reduction plot" = {label = true}
"VDJ Presence" = {group_by = "VDJ_Presence"} # Only if TCR data present
External References
Plotthis Plot Types
Dimension Reduction:
- DimPlot: UMAP/tSNE/PCA visualization
  - dims: Dimensions to plot (default: 1:2)
  - pt_size: Point size
  - alpha: Point transparency
  - label: Add cluster labels
  - highlight: Highlight cells
  - add_density: Add density layer
  - hex: Hexagonal binning
Statistical Plots:
- ViolinPlot: Distribution with density
  - add_box: Add box overlay
  - add_point: Add points
  - add_trend: Add trend line
  - flip: Horizontal orientation
- BoxPlot: Box and whisker plots
  - add_jitter: Add jittered points
  - add_violin: Add violin overlay
- BarPlot: Bar charts
  - position: "stack", "dodge", "fill"
  - x_text_angle: X-axis text rotation
  - swap: Swap x and fill aesthetics
- RidgePlot: Ridge (joy) plots
  - flip: Horizontal orientation
Heatmaps:
- Heatmap: Gene expression heatmaps
  - cell_type: "tile", "dot", "violin", "boxplot", "bar", "pie"
  - cluster_rows: Cluster rows
  - cluster_columns: Cluster columns
  - rows_split_by: Split rows by metadata
  - columns_split_by: Split columns by metadata
  - flip: Transpose heatmap
  - palette: Color palette (e.g., "viridis", "YlOrRd", "Spectral")
  - column_annotation: Add column annotations (list of column names)
  - column_annotation_type: Annotation types (simple, violin, pie, ring, bar)
  - dot_size: Function for dot size (e.g., function(x) sum(x > 0) / length(x))
  - dot_size_name: Legend name for dot size
  - add_reticle: Add grid lines
  - add_bg: Add background
Advanced Visualizations:
- CircosPlot: Chord/circos diagram
- SankeyPlot: Sankey/alluvial diagram
  - links_alpha: Link transparency
  - group_by: Node columns (list for multiple nodes)
Device Parameters
Common to all plot types:
devpars = {
res = 100, # Resolution in DPI
width = 800, # Width in pixels
height = 600 # Height in pixels
}
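Since width and height are given in pixels and res in DPI, the physical size of a plot follows from simple division. The sketch below shows that arithmetic; `device_size_inches` is a hypothetical helper, and it assumes all three keys are present in the table.

```python
def device_size_inches(devpars):
    """Return (width, height) in inches for a devpars table (hypothetical helper)."""
    res = devpars["res"]
    # Pixels divided by dots-per-inch gives inches.
    return devpars["width"] / res, devpars["height"] / res


print(device_size_inches({"res": 100, "width": 800, "height": 600}))    # (8.0, 6.0)
print(device_size_inches({"res": 200, "width": 1600, "height": 1200}))  # (8.0, 6.0)
```

Under this reading, doubling res together with width and height keeps the same physical size while quadrupling the pixel count.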
Configuration Examples
Minimal Configuration
[SeuratClusterStats]
cache = true
[SeuratClusterStats.in]
srtobj = ["SeuratClustering"]
Standard QC Plots
[SeuratClusterStats.envs.stats."Number of cells per cluster"]
plot_type = "bar"
x_text_angle = 90
[SeuratClusterStats.envs.stats."Cells by Sample"]
plot_type = "bar"
group_by = "Sample"
x_text_angle = 90
Gene Expression Visualization
[SeuratClusterStats.envs.features_defaults]
features = ["CD3D", "CD4", "CD8A", "MS4A1", "CD14", "LYZ", "FCGR3A", "NCAM1", "KLRD1"]
[SeuratClusterStats.envs.features."T cell markers (violin)"]
plot_type = "violin"
ident = "seurat_clusters"
add_box = true
[SeuratClusterStats.envs.features."T cell markers (ridge)"]
plot_type = "ridge"
ident = "seurat_clusters"
flip = true
[SeuratClusterStats.envs.features."Marker on UMAP"]
plot_type = "dim"
features = ["CD4"]
highlight = "seurat_clusters == 'c1'"
Heatmap with Annotations
[SeuratClusterStats.envs.features."Marker heatmap"]
plot_type = "heatmap"
ident = "Diagnosis"
columns_split_by = "seurat_clusters"
name = "Expression"
cell_type = "dot"
dot_size = "nanmean"
dot_size_name = "Percent Expressed"
column_annotation = ["percent.mt", "VDJ_Presence"]
# Quote keys that contain dots so TOML does not read them as nested tables
column_annotation_type = {"percent.mt" = "violin", VDJ_Presence = "pie"}
# A key may appear only once per table, so devpars is set a single time
devpars = {width = 1400, height = 900}
# Feature groups as a sub-table (inline tables must stay on one line)
[SeuratClusterStats.envs.features."Marker heatmap".features]
"T cell markers" = ["CD3D", "CD4", "CD8A"]
"B cell markers" = ["MS4A1"]
"Monocyte markers" = ["CD14", "LYZ", "FCGR3A"]
"NK cell markers" = ["NCAM1", "KLRD1"]
Advanced Dimplot
[SeuratClusterStats.envs.dimplots."UMAP with labels"]
label = true
[SeuratClusterStats.envs.dimplots."UMAP with marks"]
add_mark = true
mark_linetype = 2
[SeuratClusterStats.envs.dimplots."UMAP by Diagnosis"]
facet_by = "Diagnosis"
highlight = true
theme = "theme_blank"
[SeuratClusterStats.envs.dimplots."UMAP with hex bins"]
hex = true
hex_bins = 50
[SeuratClusterStats.envs.dimplots."UMAP with stat"]
stat_by = "Diagnosis"
stat_plot_type = "ring"
stat_plot_size = 0.15
Common Patterns
Pattern 1: Basic UMAP Visualization
[SeuratClusterStats.envs.dimplots."Basic UMAP"]
label = true
reduction = "umap"
Pattern 2: QC Metrics per Cluster
[SeuratClusterStats.envs.ngenes."Genes per cluster"]
plot_type = "violin"
add_box = true
add_point = true
[SeuratClusterStats.envs.stats."QC stats"]
plot_type = "bar"
group_by = "percent.mt_bin"
x_text_angle = 90
Pattern 3: Custom Feature Plots
# From file
[SeuratClusterStats.envs.features_defaults]
features = "file://path/to/custom_markers.txt"
[SeuratClusterStats.envs.features."Custom markers"]
plot_type = "violin"
ident = "seurat_clusters"
comparisons = true
sig_label = "p.signif"
Pattern 4: Cluster Comparison Sankey
[SeuratClusterStats.envs.stats."Cluster flow by condition"]
plot_type = "sankey"
group_by = ["seurat_clusters", "Diagnosis"]
links_alpha = 0.6
devpars = {width = 800}
Pattern 5: Subclustering Visualization
[SeuratClusterStats.envs.dimplots."Subcluster UMAP"]
group_by = "sub_clusters"
reduction = "umap" # Uses sub_umap_<ident> automatically
label = true
Dependencies
- Upstream: SeuratClustering, SeuratSubClustering (via CombinedInput)
- Downstream: None (terminal visualization process)
- Data: Seurat object with cluster assignments and optional subclustering
Validation Rules
- Feature names: Must match gene symbols or metadata columns in Seurat object
- Reduction names: Must exist in Seurat object (umap, tsne, pca, or sub_umap_)
- Plot types: Must be valid plotthis plot types
- Metadata columns: Must exist in the @meta.data slot
- Device parameters: Positive integers required for width/height/res
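Two of the rules above (plot types and device parameters) can be checked before a run without touching the Seurat object. The sketch below is a minimal pre-flight check under stated assumptions: `check_envs` is a hypothetical helper, it expects the envs table already parsed into a Python dict (for example with the standard-library tomllib), and the allowed plot types are copied from the lists earlier in this document rather than queried from plotthis.

```python
STATS_PLOT_TYPES = {
    "bar", "circos", "pie", "ring", "donut", "trend", "area",
    "sankey", "alluvial", "heatmap", "radar", "spider", "violin", "box",
}
FEATURE_PLOT_TYPES = {"violin", "box", "bar", "ridge", "dim", "cor", "heatmap", "dot"}


def is_positive_int(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def check_envs(envs):
    """Return a list of human-readable problems found in an envs dict."""
    problems = []
    sections = (("stats", STATS_PLOT_TYPES), ("features", FEATURE_PLOT_TYPES))
    for section, allowed in sections:
        for case_name, case in envs.get(section, {}).items():
            plot_type = case.get("plot_type")
            if plot_type is not None and plot_type not in allowed:
                problems.append(f"{section}.{case_name}: unknown plot_type {plot_type!r}")
            for key, value in case.get("devpars", {}).items():
                if key in ("res", "width", "height") and not is_positive_int(value):
                    problems.append(
                        f"{section}.{case_name}: devpars.{key} must be a positive integer"
                    )
    return problems


envs = {
    "stats": {
        "Cells by Sample": {"plot_type": "bar", "devpars": {"width": 800}},
        "Broken case": {"plot_type": "scatter", "devpars": {"height": -1}},
    }
}
for problem in check_envs(envs):
    print(problem)
```

Feature names, reduction names, and metadata columns still have to be verified against the object itself, so this covers only the rules that depend on the configuration alone.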
Troubleshooting
Plot generation errors
- "Feature not found": Check that gene symbols match the case used in the object (human: UPPERCASE, mouse: TitleCase)
- "Reduction not found": Verify the reduction name with Reductions(srtobj)
- Empty plots: Check whether the subset expression filters out all cells
- Slow rendering: Use cache = true for feature plots, reduce hex_bins, or downsample
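For the "Feature not found" case, a case-insensitive lookup against the names present in the object often reveals the intended symbol. The sketch below assumes the available feature names have already been exported to a plain list; `suggest_feature_names` is a hypothetical helper and not part of the pipeline.

```python
def suggest_feature_names(requested, available):
    """Map each missing feature to a case-insensitive match, or None if there is none."""
    by_lower = {}
    for name in available:
        # Keep the first spelling seen for each lower-cased name.
        by_lower.setdefault(name.lower(), name)
    known = set(available)
    suggestions = {}
    for feature in requested:
        if feature not in known:
            suggestions[feature] = by_lower.get(feature.lower())
    return suggestions


# Human-style symbols requested against mouse-style names in the object.
print(suggest_feature_names(["CD3D", "Cd4", "FOO"], ["Cd3d", "Cd4", "Cd8a"]))
```

A feature mapped to None has no match under any casing, which usually points to a wrong symbol or a gene filtered out upstream.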
Visual quality issues
- Overcrowded labels: Use label_repel = true or reduce the number of clusters
- Poor color contrast: Set a custom palette parameter
- Incorrect orientation: Use flip = true to transpose the plot
- Missing annotations: Verify that the column_annotation columns exist in metadata
Missing subcluster UMAP
- If subclustering exists but sub_umap_<ident> is not found, the process uses the standard UMAP
- To force subcluster visualization: Run RunUMAP() at the subcluster level or specify reduction = "umap"
Large dataset performance
- Enable hex = true for dimplots with >10,000 cells
- Use the downsample parameter in feature plots
- Set cache = true to avoid re-rendering expensive plots
Output Structure
<srtobj_stem>.cluster_stats/
├── clustrees/ # Clustree plots (png + pdf)
├── stats/ # Cell count/statistics plots
├── ngenes/ # Gene count plots
├── features/ # Gene expression visualizations
└── dimplots/ # Dimension reduction plots
Each subdirectory contains plots for each configured case in the process environment.
