Seuratclusterstats

by pwwang

Generates comprehensive cluster statistics and visualizations for Seurat objects, including dimension reduction plots, gene expression visualizations, cluster quality metrics, and clustree diagrams. This process is essential for exploring and validating clustering results.

name: seuratclusterstats
description: Generates comprehensive cluster statistics and visualizations for Seurat objects, including dimension reduction plots, gene expression visualizations, cluster quality metrics, and clustree diagrams. This process is essential for exploring and validating clustering results.

SeuratClusterStats Process Configuration

Purpose

Generates comprehensive cluster statistics and visualizations for Seurat objects, including dimension reduction plots, gene expression visualizations, cluster quality metrics, and clustree diagrams. This process is essential for exploring and validating clustering results.

When to Use

  • After: SeuratClustering or SeuratSubClustering processes
  • Use cases:
    • Cluster quality assessment and validation
    • Visualizing cluster characteristics across dimensions
    • Comparing marker gene expression between clusters
    • Assessing cluster stability via clustree plots
    • Exploring metadata relationships with clusters
  • Always enabled in immunopipe TCR and non-TCR workflows (order = -1, runs early)

Configuration Structure

Process Enablement

[SeuratClusterStats]
cache = true

Input Specification

[SeuratClusterStats.in]
srtobj = ["SeuratClustering"]

Note: srtobj accepts the output name from SeuratClustering or SeuratSubClustering.
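
As a small illustration of the note above, the same input can point at the subclustering process instead. This is a sketch that assumes SeuratSubClustering is enabled in the pipeline; it is not taken from the original configuration.

```toml
[SeuratClusterStats.in]
# Assumption: SeuratSubClustering is enabled upstream in this pipeline
srtobj = ["SeuratSubClustering"]
```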

Environment Variables

Global Settings

[SeuratClusterStats.envs]
# Mutate metadata before plotting
mutaters = {}

# Cache feature plots (time-consuming)
cache = "/tmp"

Clustree Plots

Visualize clustering resolution relationships.

[SeuratClusterStats.envs.clustrees_defaults]
prefix = true  # Auto-detect clustering columns
devpars = {res = 100, width = 800, height = 500}
more_formats = []
save_code = false

Clustree cases:

[SeuratClusterStats.envs.clustrees."Custom Clustree"]
prefix = "seurat_clusters"
devpars = {height = 600}

Cluster Statistics (stats)

Cell count and fraction plots across clusters.

[SeuratClusterStats.envs.stats_defaults]
subset = ""
devpars = {res = 100, height = 600, width = 800}
descr = ""
more_formats = []
save_code = false
save_data = false

Plot types for stats (via scplotter::CellStatPlot):

  • bar - Bar chart
  • circos - Circos plot (chord diagram)
  • pie - Single pie chart
  • ring/donut - Ring/donut chart
  • trend - Trend plot
  • area - Area plot
  • sankey/alluvial - Sankey/alluvial diagram
  • heatmap - Heatmap
  • radar - Radar plot
  • spider - Spider plot
  • violin - Violin plot
  • box - Box plot
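
The plot types above are selected per case through plot_type. A minimal sketch of a non-bar case follows, assuming a Sample metadata column exists. The case name is illustrative, and only parameters already shown in this section are used.

```toml
[SeuratClusterStats.envs.stats."Cluster composition by Sample (Ring)"]
plot_type = "ring"      # one of the plot types listed above
group_by = "Sample"     # assumed metadata column
```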

Default cases:

[SeuratClusterStats.envs.stats]
"Number of cells in each cluster (Bar Chart)" = {plot_type = "bar", x_text_angle = 90}
"Number of cells in each cluster by Sample (Bar Chart)" = {plot_type = "bar", group_by = "Sample", x_text_angle = 90}

Custom stat example:

[SeuratClusterStats.envs.stats."Cells by Diagnosis"]
plot_type = "bar"
group_by = "Diagnosis"
frac = "group"  # Options: "none", "group", "ident", "cluster", "all"
x_text_angle = 90
swap = true
position = "stack"

Gene Count Visualization (ngenes)

Number of genes detected per cell.

[SeuratClusterStats.envs.ngenes_defaults]
more_formats = []
subset = ""
devpars = {res = 100, height = 800, width = 1000}

Default case:

[SeuratClusterStats.envs.ngenes]
"Number of genes detected in each cluster" = {}

Feature Visualization (features)

Gene expression and metadata column plots.

[SeuratClusterStats.envs.features_defaults]
# Feature specification (multiple formats)
features = ["CD3D", "CD4", "CD8A"]  # OR
# features = "file://path/to/genes.txt"  # OR
# features = 10  # Top N variant features

# Cluster ordering
order_by = "desc(mean(Expression, na.rm = TRUE))"  # OR
# order_by = ["c1", "c2", "c3"]  # Literal order

subset = ""
devpars = {res = 100}
descr = ""
more_formats = []
save_code = false
save_data = false

Feature plot types (via scplotter::FeatureStatPlot):

  • violin - Violin plot
  • box - Box plot
  • bar - Bar plot
  • ridge - Ridge plot
  • dim - Dimension reduction plot
  • cor - Correlation plot
  • heatmap - Heatmap
  • dot - Dot plot (heatmap shortcut)

Common feature parameters:

  • plot_type - Type of visualization
  • ident - Identity column (e.g., "seurat_clusters", "Diagnosis")
  • group_by - Group cells by metadata column
  • split_by - Split into multiple plots
  • facet_by - Facet plots by metadata
  • add_box - Add box plot overlay (violin/ridge)
  • add_point - Add jittered points
  • add_bg - Add background reference
  • stack - Stack multiple features
  • flip - Flip plot orientation
  • comparisons - Add statistical comparisons

Dimension Reduction Plots (dimplots)

UMAP/tSNE/PCA visualizations.

[SeuratClusterStats.envs.dimplots_defaults]
# group_by and split_by are unset by default (standard TOML has no null value, so omit them)
subset = ""
devpars = {res = 100}
reduction = "dim"  # Options: "dim", "auto", "umap", "tsne", "pca"

Reduction options:

  • dim - Auto-detect: UMAP → tSNE → PCA (uses sub_umap for subclusters)
  • auto - Same as dim
  • umap - Force UMAP
  • tsne - Force tSNE
  • pca - Force PCA

Common dimplot parameters:

  • label - Add cluster labels
  • label_size - Label font size
  • label_repel - Repel overlapping labels
  • add_mark - Add cluster boundaries (options: hull, ellipse, rect, circle)
  • mark_alpha - Mark transparency
  • mark_linetype - Mark line type
  • hex - Use hexagonal binning
  • hex_bins - Number of hex bins
  • stat_by - Add statistics by metadata
  • stat_plot_type - pie, ring, bar, line
  • stat_plot_size - Size of stat plot
  • facet_by - Facet by metadata
  • highlight - Highlight specific cells

Default cases:

[SeuratClusterStats.envs.dimplots]
"Dimensional reduction plot" = {label = true}
"VDJ Presence" = {group_by = "VDJ_Presence"}  # Only if TCR data present

External References

Plotthis Plot Types

Full reference

Dimension Reduction:

  • DimPlot: UMAP/tSNE/PCA visualization
    • dims - Dimensions to plot (default: 1:2)
    • pt_size - Point size
    • alpha - Point transparency
    • label - Add cluster labels
    • highlight - Highlight cells
    • add_density - Add density layer
    • hex - Hexagonal binning

Statistical Plots:

  • ViolinPlot: Distribution with density

    • add_box - Add box overlay
    • add_point - Add points
    • add_trend - Add trend line
    • flip - Horizontal orientation
  • BoxPlot: Box and whisker plots

    • add_jitter - Add jittered points
    • add_violin - Add violin overlay
  • BarPlot: Bar charts

    • position - "stack", "dodge", "fill"
    • x_text_angle - X-axis text rotation
    • swap - Swap x and fill aesthetics
  • RidgePlot: Ridge (joy) plots

    • flip - Horizontal orientation

Heatmaps:

  • Heatmap: Gene expression heatmaps
    • cell_type - "tile", "dot", "violin", "boxplot", "bar", "pie"
    • cluster_rows - Cluster rows
    • cluster_columns - Cluster columns
    • rows_split_by - Split rows by metadata
    • columns_split_by - Split columns by metadata
    • flip - Transpose heatmap
    • palette - Color palette (e.g., "viridis", "YlOrRd", "Spectral")
    • column_annotation - Add column annotations (list of column names)
    • column_annotation_type - Annotation types (simple, violin, pie, ring, bar)
    • dot_size - Function for dot size (e.g., function(x) sum(x > 0) / length(x))
    • dot_size_name - Legend name for dot size
    • add_reticle - Add grid lines
    • add_bg - Add background

Advanced Visualizations:

  • CircosPlot: Chord/circos diagram
  • SankeyPlot: Sankey/alluvial diagram
    • links_alpha - Link transparency
    • group_by - Node columns (list for multiple nodes)

Device Parameters

Common to all plot types:

devpars = {res = 100, width = 800, height = 600}  # res: resolution in DPI; width and height: pixels
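
Because width and height are in pixels and res is in DPI, the nominal physical size is pixels divided by res (800 / 100 = 8 in wide for the values above). The sketch below assumes devpars are passed to an R graphics device such as png(), where text scales with res. Under that assumption, doubling all three values gives a sharper image with the same layout. The case name is illustrative.

```toml
[SeuratClusterStats.envs.dimplots."High-resolution UMAP"]
label = true
# 1600 px / 200 DPI = 8 in wide, 1200 px / 200 DPI = 6 in tall
# (same nominal size as res = 100, width = 800, height = 600)
devpars = {res = 200, width = 1600, height = 1200}
```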

Configuration Examples

Minimal Configuration

[SeuratClusterStats]
cache = true

[SeuratClusterStats.in]
srtobj = ["SeuratClustering"]

Standard QC Plots

[SeuratClusterStats.envs.stats."Number of cells per cluster"]
plot_type = "bar"
x_text_angle = 90

[SeuratClusterStats.envs.stats."Cells by Sample"]
plot_type = "bar"
group_by = "Sample"
x_text_angle = 90

Gene Expression Visualization

[SeuratClusterStats.envs.features_defaults]
features = ["CD3D", "CD4", "CD8A", "MS4A1", "CD14", "LYZ", "FCGR3A", "NCAM1", "KLRD1"]

[SeuratClusterStats.envs.features."T cell markers (violin)"]
plot_type = "violin"
ident = "seurat_clusters"
add_box = true

[SeuratClusterStats.envs.features."T cell markers (ridge)"]
plot_type = "ridge"
ident = "seurat_clusters"
flip = true

[SeuratClusterStats.envs.features."Marker on UMAP"]
plot_type = "dim"
feature = "CD4"
highlight = "seurat_clusters == 'c1'"

Heatmap with Annotations

[SeuratClusterStats.envs.features."Marker heatmap"]
plot_type = "heatmap"
ident = "Diagnosis"
columns_split_by = "seurat_clusters"
name = "Expression"
cell_type = "dot"
dot_size = "nanmean"
dot_size_name = "Percent Expressed"
column_annotation = ["percent.mt", "VDJ_Presence"]
# Keys containing a dot must be quoted in TOML
column_annotation_type = {"percent.mt" = "violin", VDJ_Presence = "pie"}
# devpars may be defined only once per case
devpars = {width = 1400, height = 900}

# Named feature groups as a sub-table (inline tables cannot span lines)
[SeuratClusterStats.envs.features."Marker heatmap".features]
"T cell markers" = ["CD3D", "CD4", "CD8A"]
"B cell markers" = ["MS4A1"]
"Monocyte markers" = ["CD14", "LYZ", "FCGR3A"]
"NK cell markers" = ["NCAM1", "KLRD1"]

Advanced Dimplot

[SeuratClusterStats.envs.dimplots."UMAP with labels"]
label = true

[SeuratClusterStats.envs.dimplots."UMAP with marks"]
add_mark = true
mark_linetype = 2

[SeuratClusterStats.envs.dimplots."UMAP by Diagnosis"]
facet_by = "Diagnosis"
highlight = true
theme = "theme_blank"

[SeuratClusterStats.envs.dimplots."UMAP with hex bins"]
hex = true
hex_bins = 50

[SeuratClusterStats.envs.dimplots."UMAP with stat"]
stat_by = "Diagnosis"
stat_plot_type = "ring"
stat_plot_size = 0.15

Common Patterns

Pattern 1: Basic UMAP Visualization

[SeuratClusterStats.envs.dimplots."Basic UMAP"]
label = true
reduction = "umap"

Pattern 2: QC Metrics per Cluster

[SeuratClusterStats.envs.ngenes."Genes per cluster"]
plot_type = "violin"
add_box = true
add_point = true

[SeuratClusterStats.envs.stats."QC stats"]
plot_type = "bar"
group_by = "percent.mt_bin"  # assumes this binned column already exists in the metadata (e.g., created via mutaters)
x_text_angle = 90
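
Pattern 2 groups by a percent.mt_bin column, which a standard Seurat object does not carry. One way to create it is through mutaters, sketched below under the assumption that mutaters map new column names to R expressions evaluated on the metadata in the style of dplyr::mutate(). The break points are illustrative only.

```toml
[SeuratClusterStats.envs.mutaters]
# Assumption: the value is an R expression evaluated against the cell metadata.
# The key is quoted because it contains a dot.
"percent.mt_bin" = "cut(percent.mt, breaks = c(0, 5, 10, 20, 100), include.lowest = TRUE)"
```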

Pattern 3: Custom Feature Plots

# From file
[SeuratClusterStats.envs.features_defaults]
features = "file://path/to/custom_markers.txt"

[SeuratClusterStats.envs.features."Custom markers"]
plot_type = "violin"
ident = "seurat_clusters"
comparisons = true
sig_label = "p.signif"

Pattern 4: Cluster Comparison Sankey

[SeuratClusterStats.envs.stats."Cluster flow by condition"]
plot_type = "sankey"
group_by = ["seurat_clusters", "Diagnosis"]
links_alpha = 0.6
devpars = {width = 800}

Pattern 5: Subclustering Visualization

[SeuratClusterStats.envs.dimplots."Subcluster UMAP"]
group_by = "sub_clusters"
reduction = "umap"  # Uses sub_umap_<ident> automatically
label = true

Dependencies

  • Upstream: SeuratClustering, SeuratSubClustering (via CombinedInput)
  • Downstream: None (terminal visualization process)
  • Data: Seurat object with cluster assignments and optional subclustering

Validation Rules

  • Feature names: Must match gene symbols or metadata columns in Seurat object
  • Reduction names: Must exist in Seurat object (umap, tsne, pca, or sub_umap_<ident>)
  • Plot types: Must be valid plotthis plot types
  • Metadata columns: Must exist in @meta.data slot
  • Device parameters: Positive integers required for width/height/res

Troubleshooting

Plot generation errors

  • "Feature not found": Gene symbols are case-sensitive; check that they match the object (human: UPPERCASE, mouse: TitleCase)
  • "Reduction not found": Verify the reduction name against Reductions(srtobj)
  • Empty plots: Check if subset expression filters out all cells
  • Slow rendering: Use cache = true for feature plots, reduce hex_bins or downsample

Visual quality issues

  • Overcrowded labels: Use label_repel = true or reduce number of clusters
  • Poor color contrast: Set custom palette parameter
  • Incorrect orientation: Use flip = true to transpose plot
  • Missing annotations: Verify column_annotation columns exist in metadata

Missing subcluster UMAP

  • If subclustering exists but sub_umap_<ident> is not found, the process falls back to the standard UMAP
  • To force subcluster visualization: run RunUMAP() at the subcluster level or specify reduction = "umap"

Large dataset performance

  • Enable hex = true for dimplots with >10,000 cells
  • Use downsample parameter in feature plots
  • Set cache = true to avoid re-rendering expensive plots

Output Structure

<srtobj_stem>.cluster_stats/
├── clustrees/          # Clustree plots (png + pdf)
├── stats/              # Cell count/statistics plots
├── ngenes/             # Gene count plots
├── features/           # Gene expression visualizations
└── dimplots/           # Dimension reduction plots

Each subdirectory contains plots for each configured case in the process environment.

Skill Information

Category: Skill
Last Updated: 1/20/2026