Langfuse Score Analytics

by mberto10

This skill should be used when the user asks to "analyze scores", "show score trends", "detect score regressions", "compare scores across releases", "get score statistics", or needs to understand score distributions and quality metrics over time.

name: langfuse-score-analytics
description: This skill should be used when the user asks to "analyze scores", "show score trends", "detect score regressions", "compare scores across releases", "get score statistics", or needs to understand score distributions and quality metrics over time.

Analyze score trends, detect regressions, and understand score distributions across your Langfuse project.

When to Use

  • Analyzing score statistics (mean, min, max, percentiles)
  • Tracking score trends over time
  • Comparing scores across releases, environments, or trace names
  • Detecting quality regressions between time periods
  • Understanding score value distributions

Operations

Score Summary

Get aggregate statistics for a score:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  summary --score-name "accuracy" --days 30

Returns: count, mean, min, max, p50, p95
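To make the returned fields concrete, here is a minimal sketch of how the same six statistics could be computed locally from a list of raw score values. It is not the helper's implementation: the function name `summarize` is hypothetical, and the percentile method (linear interpolation between closest ranks) is an assumption, so results may differ slightly from what the helper reports.

```python
import statistics

def summarize(values):
    """Return count, mean, min, max, p50 and p95 for a list of numeric scores."""
    if not values:
        raise ValueError("no score values to summarize")
    ordered = sorted(values)

    def percentile(p):
        # Assumed method: linear interpolation between the two closest ranks.
        rank = (len(ordered) - 1) * p / 100
        lower = int(rank)
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p50": percentile(50),
        "p95": percentile(95),
    }

stats = summarize([0.2, 0.4, 0.6, 0.8, 1.0])
```

Comparing p50 with the mean is a quick way to see whether a few extreme values are pulling the average.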

Score Trend

Show score values over time with configurable granularity:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  trend --score-name "accuracy" --days 14 --granularity day

Granularity options: hour, day, week, month
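The four granularities can be pictured as truncating each score's timestamp to the start of its bucket and averaging within the bucket. The sketch below illustrates that idea only; `bucket_start` and `trend` are hypothetical names, starting weeks on Monday is an assumption, and the helper may bucket differently (for example in another time zone).

```python
from collections import defaultdict
from datetime import datetime, timedelta

def bucket_start(ts, granularity):
    """Truncate a datetime to the start of its hour, day, week (Monday) or month."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "day":
        return day
    if granularity == "week":
        # Assumption: weeks start on Monday.
        return day - timedelta(days=day.weekday())
    if granularity == "month":
        return day.replace(day=1)
    raise ValueError(f"unknown granularity: {granularity}")

def trend(points, granularity):
    """Average (timestamp, value) pairs per bucket, ordered by bucket start."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[bucket_start(ts, granularity)].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

points = [
    (datetime(2025, 1, 6, 9, 30), 0.8),
    (datetime(2025, 1, 6, 17, 0), 0.6),
    (datetime(2025, 1, 7, 12, 0), 0.9),
]
daily = trend(points, "day")
```

Coarser granularities smooth out noise but can hide short dips, so it can help to start with day and drop to hour only for the period of interest.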

Compare by Dimension

Compare scores across different dimensions:

# Compare across releases
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  compare --score-name "accuracy" --dimension release --days 7

# Compare across environments
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  compare --score-name "accuracy" --dimension environment --days 7

# Compare across trace names
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  compare --score-name "accuracy" --dimension name --days 7
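Conceptually, a dimension comparison is a group-by followed by a per-group aggregate. The following sketch shows that shape on in-memory rows; the row layout, the `compare_by` name, and the "(unset)" label for rows without the dimension are all assumptions made for illustration, not the helper's actual behavior.

```python
from collections import defaultdict

def compare_by(rows, dimension):
    """Group score rows by a dimension and return count and mean per group."""
    groups = defaultdict(list)
    for row in rows:
        # Rows that lack the dimension are grouped under "(unset)".
        groups[row.get(dimension) or "(unset)"].append(row["value"])
    return {
        key: {"count": len(vals), "mean": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }

rows = [
    {"release": "v1.0", "value": 0.70},
    {"release": "v1.0", "value": 0.80},
    {"release": "v1.1", "value": 0.90},
    {"value": 0.50},
]
by_release = compare_by(rows, "release")
```

Keeping the count next to the mean matters here: a release with only a handful of scores can look better or worse than it really is.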

Regression Detection

Compare scores between two time periods to detect regressions:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  regression \
  --score-name "accuracy" \
  --baseline-days 14 \
  --current-days 7

Compares the most recent period (current, set with --current-days) against the period preceding it (baseline, set with --baseline-days).
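One simple way to turn two periods into a regression verdict is to compare their means and flag a drop beyond a tolerance. The sketch below assumes that higher scores are better and uses a hypothetical 5% threshold; the function name and both assumptions are illustrative, and the helper's own criterion may differ.

```python
def detect_regression(baseline, current, threshold_pct=5.0):
    """Flag a regression when the current mean drops more than threshold_pct below baseline."""
    if not baseline or not current:
        raise ValueError("both periods need at least one score")
    base_mean = sum(baseline) / len(baseline)
    cur_mean = sum(current) / len(current)
    # Relative change in percent; left undefined when the baseline mean is zero.
    change_pct = None if base_mean == 0 else (cur_mean - base_mean) / abs(base_mean) * 100
    return {
        "baseline_mean": base_mean,
        "current_mean": cur_mean,
        "change_pct": change_pct,
        # Assumption: higher scores are better, so only drops count as regressions.
        "regressed": change_pct is not None and change_pct < -threshold_pct,
    }

result = detect_regression(baseline=[0.8, 0.8, 0.8, 0.8], current=[0.7, 0.7])
```

For scores where lower is better, such as a latency-style score, the comparison would need to be inverted.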

Score Distribution

Show the distribution of score values:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  distribution --score-name "accuracy" --days 30 --bins 10
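The --bins option suggests a histogram over the observed value range. As a rough illustration, the sketch below counts values in equal-width bins between the minimum and maximum; the `histogram` name and the equal-width binning scheme are assumptions, and the helper may choose bin edges differently.

```python
def histogram(values, bins=10):
    """Count values in equal-width bins spanning [min, max]."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: a single bin holds everything.
        return [(lo, hi, len(values))]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value falls into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

hist = histogram([0.0, 0.1, 0.5, 0.9, 1.0], bins=2)
```

Each tuple is (bin start, bin end, count), which is enough to spot bimodal scores that a single mean would hide.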

List Available Scores

See all score names in your project:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  list-scores --days 30

Examples

Example 1: Weekly Quality Report

# Get summary of key scores
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  summary --score-name "quality" --days 7

# Check for regressions
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  regression --score-name "quality" --baseline-days 14 --current-days 7

Example 2: Release Comparison

# Compare accuracy across releases
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  compare --score-name "accuracy" --dimension release --days 30

Example 3: Trend Analysis

# Daily trend for the past 2 weeks
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  trend --score-name "helpfulness" --days 14 --granularity day

# Hourly trend for the past day
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
  trend --score-name "latency" --days 1 --granularity hour

Required Environment Variables

LANGFUSE_PUBLIC_KEY=pk-...    # Required
LANGFUSE_SECRET_KEY=sk-...    # Required
LANGFUSE_HOST=https://cloud.langfuse.com  # Optional
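Before running any of the commands, it can be useful to confirm that the variables are visible to Python. This is a minimal sketch of such a check; `load_langfuse_env` is a hypothetical function, and falling back to the Langfuse Cloud URL when LANGFUSE_HOST is unset is an assumption based on the value shown above.

```python
import os

REQUIRED = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
DEFAULT_HOST = "https://cloud.langfuse.com"

def load_langfuse_env(env=None):
    """Return the Langfuse settings, raising if a required variable is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "public_key": env["LANGFUSE_PUBLIC_KEY"],
        "secret_key": env["LANGFUSE_SECRET_KEY"],
        # Assumption: the optional host falls back to the Langfuse Cloud URL.
        "host": env.get("LANGFUSE_HOST") or DEFAULT_HOST,
    }
```

Calling `load_langfuse_env()` with no argument reads the real process environment; passing a dict makes the check easy to test.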

Troubleshooting

No data returned:

  • Verify scores exist with the given name using list-scores
  • Check that the time range contains data
  • Confirm environment variables are set correctly

Unexpected values:

  • Scores are aggregated server-side; outliers can affect means
  • Use distribution to understand value spread
  • Consider filtering by dimension for more specific analysis

Regression not detected:

  • Ensure baseline and current periods don't overlap
  • Check that both periods have sufficient data
  • Consider statistical significance of the change
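For the last point, one common way to judge whether a difference between two periods is more than noise is Welch's t-test. The sketch below computes only the t statistic and the approximate degrees of freedom with the standard library; it is an illustrative technique swapped in here, not something the helper is known to do, and `welch_t` is a hypothetical name.

```python
import math
import statistics

def welch_t(baseline, current):
    """Return Welch's t statistic and approximate degrees of freedom for two samples."""
    n1, n2 = len(baseline), len(current)
    if n1 < 2 or n2 < 2:
        raise ValueError("each period needs at least two scores")
    v1, v2 = statistics.variance(baseline), statistics.variance(current)
    se2 = v1 / n1 + v2 / n2
    if se2 == 0:
        raise ValueError("both samples have zero variance")
    # Negative t means the current mean is below the baseline mean.
    t = (statistics.fmean(current) - statistics.fmean(baseline)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([0.8, 0.9, 0.85, 0.95], [0.6, 0.7, 0.65, 0.75])
```

As a rough rule of thumb, an absolute t value well above 2 with a reasonable number of samples suggests the change is unlikely to be noise; a proper p-value would need a t-distribution, for example from SciPy.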

Skill Information

Category: Skill
Last Updated: 12/19/2025