Langfuse Score Analytics
by mberto10
This skill should be used when the user asks to "analyze scores", "show score trends", "detect score regressions", "compare scores across releases", "get score statistics", or needs to understand score distributions and quality metrics over time.
name: langfuse-score-analytics
description: This skill should be used when the user asks to "analyze scores", "show score trends", "detect score regressions", "compare scores across releases", "get score statistics", or needs to understand score distributions and quality metrics over time.
Analyze score trends, detect regressions, and understand score distributions across your Langfuse project.
When to Use
- Analyzing score statistics (mean, min, max, percentiles)
- Tracking score trends over time
- Comparing scores across releases, environments, or trace names
- Detecting quality regressions between time periods
- Understanding score value distributions
Operations
Score Summary
Get aggregate statistics for a score:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
summary --score-name "accuracy" --days 30
Returns: count, mean, min, max, p50, p95
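To make the returned fields concrete, here is a minimal local sketch that computes the same six statistics from a plain list of values. It is not the helper's implementation: the function names are hypothetical, and the nearest-rank percentile rule is an assumption, so server-side results may differ slightly.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Nearest-rank percentile (q in 0..100) of an already sorted, non-empty list."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(values):
    """Compute count, mean, min, max, p50, and p95 for a list of score values."""
    if not values:
        raise ValueError("no score values to summarize")
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
    }


if __name__ == "__main__":
    # Hypothetical score values, for illustration only.
    print(summarize([0.5, 0.75, 1.0, 0.25, 1.0]))
```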
Score Trend
Show score values over time with configurable granularity:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
trend --score-name "accuracy" --days 14 --granularity day
Granularity options: hour, day, week, month
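The four granularities can be pictured as truncating each score timestamp to the start of its bucket and averaging per bucket. The sketch below illustrates that idea locally; the function names are hypothetical, and treating Monday as the start of a week is an assumption rather than documented helper behavior.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def bucket_start(ts, granularity):
    """Truncate a datetime to the start of its hour, day, week (Monday), or month."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "day":
        return day
    if granularity == "week":
        return day - timedelta(days=day.weekday())
    if granularity == "month":
        return day.replace(day=1)
    raise ValueError(f"unknown granularity: {granularity}")


def trend(points, granularity):
    """Average (timestamp, value) pairs per bucket, returned in time order."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[bucket_start(ts, granularity)].append(value)
    return [(start, fmean(vals)) for start, vals in sorted(buckets.items())]


if __name__ == "__main__":
    # Hypothetical data points, for illustration only.
    sample = [
        (datetime(2024, 1, 1, 9), 0.5),
        (datetime(2024, 1, 1, 17), 1.0),
        (datetime(2024, 1, 2, 8), 0.25),
    ]
    for start, mean in trend(sample, "day"):
        print(start.date(), mean)
```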
Compare by Dimension
Compare scores across different dimensions:
# Compare across releases
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
compare --score-name "accuracy" --dimension release --days 7
# Compare across environments
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
compare --score-name "accuracy" --dimension environment --days 7
# Compare across trace names
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
compare --score-name "accuracy" --dimension name --days 7
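Conceptually, a dimension comparison is a group-by over score records followed by a per-group aggregate. A rough local sketch follows; the record shape (a dict with a "value" key plus metadata fields) and the function name are assumptions made for illustration, not the helper's actual data model.

```python
from collections import defaultdict
from statistics import fmean


def compare_by(records, dimension):
    """Group score records by a metadata field; return (key, count, mean) rows, best mean first."""
    groups = defaultdict(list)
    for record in records:
        # Records missing the field are grouped under "(unknown)".
        groups[record.get(dimension, "(unknown)")].append(record["value"])
    rows = [(key, len(vals), fmean(vals)) for key, vals in groups.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    sample = [
        {"release": "v1", "value": 0.5},
        {"release": "v2", "value": 1.0},
        {"release": "v1", "value": 0.75},
    ]
    for key, count, mean in compare_by(sample, "release"):
        print(key, count, mean)
```

Reporting the count next to the mean helps spot groups whose average rests on only a handful of scores.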
Regression Detection
Compare scores between two time periods to detect regressions:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
regression \
--score-name "accuracy" \
--baseline-days 14 \
--current-days 7
Compares the most recent period (--current-days) against the period before it (--baseline-days). The two periods can differ in length, as in the command above.
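One simple way to think about the comparison is a relative change in the mean between the two periods. The sketch below assumes that higher scores are better and uses an arbitrary 5% threshold; both are illustrative assumptions, and the helper may apply a different rule.

```python
from statistics import fmean


def detect_regression(baseline, current, threshold=0.05):
    """Flag a regression when the current mean falls more than `threshold` (relative) below baseline.

    Assumes higher scores are better; threshold=0.05 is an arbitrary illustrative default.
    """
    if not baseline or not current:
        raise ValueError("both periods need at least one score")
    base_mean = fmean(baseline)
    cur_mean = fmean(current)
    delta = cur_mean - base_mean
    # Relative change is undefined when the baseline mean is zero.
    relative = delta / abs(base_mean) if base_mean != 0 else None
    regressed = relative is not None and relative < -threshold
    return {
        "baseline_mean": base_mean,
        "current_mean": cur_mean,
        "delta": delta,
        "relative_change": relative,
        "regressed": regressed,
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(detect_regression([1.0, 1.0], [0.5, 0.5]))
```

For scores where lower is better (a latency score, for example), the sign of the comparison would need to be flipped.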
Score Distribution
Show the distribution of score values:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
distribution --score-name "accuracy" --days 30 --bins 10
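As an illustration of what --bins controls, the sketch below counts values into equal-width buckets between the observed minimum and maximum. Equal-width binning and the function name are assumptions; the helper's exact binning rule is not documented here.

```python
def histogram(values, bins=10):
    """Count values into `bins` equal-width buckets spanning [min, max]."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: a single bucket holds everything.
        return [(lo, hi, len(values))]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bucket rather than a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for start, end, count in histogram([0.0, 0.25, 0.5, 0.75, 1.0], bins=4):
        print(f"[{start:.2f}, {end:.2f}) {'#' * count}")
```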
List Available Scores
See all score names in your project:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
list-scores --days 30
Examples
Example 1: Weekly Quality Report
# Get summary of key scores
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
summary --score-name "quality" --days 7
# Check for regressions
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
regression --score-name "quality" --baseline-days 14 --current-days 7
Example 2: Release Comparison
# Compare accuracy across releases
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
compare --score-name "accuracy" --dimension release --days 30
Example 3: Trend Analysis
# Daily trend for the past 2 weeks
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
trend --score-name "helpfulness" --days 14 --granularity day
# Hourly trend for the past day
python3 ${CLAUDE_PLUGIN_ROOT}/skills/score-analytics/helpers/score_analyzer.py \
trend --score-name "latency" --days 1 --granularity hour
Required Environment Variables
LANGFUSE_PUBLIC_KEY=pk-... # Required
LANGFUSE_SECRET_KEY=sk-... # Required
LANGFUSE_HOST=https://cloud.langfuse.com # Optional
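A quick pre-flight check for these variables might look like the sketch below. The function name and the returned dict shape are hypothetical, and falling back to https://cloud.langfuse.com when LANGFUSE_HOST is unset is an assumption based on the value shown above.

```python
import os

REQUIRED = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
DEFAULT_HOST = "https://cloud.langfuse.com"


def load_langfuse_config(env=None):
    """Read the Langfuse variables; raise if a required one is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "public_key": env["LANGFUSE_PUBLIC_KEY"],
        "secret_key": env["LANGFUSE_SECRET_KEY"],
        # Assumption: the URL shown above is the fallback when the variable is unset.
        "host": env.get("LANGFUSE_HOST") or DEFAULT_HOST,
    }


if __name__ == "__main__":
    try:
        print("host:", load_langfuse_config()["host"])
    except RuntimeError as exc:
        print(exc)
```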
Troubleshooting
No data returned:
- Verify scores exist with the given name using list-scores
- Check that the time range contains data
- Confirm environment variables are set correctly
Unexpected values:
- Scores are aggregated server-side; outliers can affect means
- Use distribution to understand value spread
- Consider filtering by dimension for more specific analysis
Regression not detected:
- Ensure baseline and current periods don't overlap
- Check that both periods have sufficient data
- Consider statistical significance of the change
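For the last point, one standard-library way to gauge significance is a permutation test on the difference of means. This is a generic statistical sketch applied to raw score values you have exported yourself, not something the helper is known to do; the function name, round count, and seed are illustrative choices.

```python
import random
from statistics import fmean


def permutation_p_value(baseline, current, rounds=2000, seed=0):
    """Two-sided permutation test on the difference of means between two samples."""
    observed = abs(fmean(current) - fmean(baseline))
    pooled = list(baseline) + list(current)
    n = len(baseline)
    rng = random.Random(seed)
    hits = 0
    for _ in range(rounds):
        # Reassign scores to the two periods at random and recompute the gap.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[n:]) - fmean(pooled[:n]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(permutation_p_value([1.0] * 10, [0.0] * 10))
```

A small p-value suggests the gap between periods is unlikely to be noise; with only a few scores per period, even a large gap may not reach significance.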
