Langfuse Dataset Management
by mberto10
This skill should be used when the user asks to "create dataset", "add trace to dataset", "curate regression tests", "build test set from traces", "list datasets", "show dataset items", or needs to manage Langfuse datasets for experiment validation and regression testing.
Skill name: `langfuse-dataset-management`
Create and manage regression test datasets from production traces for validation and testing.
When to Use
- Curating failing traces into regression datasets
- Building golden test sets from high-quality examples
- Adding specific traces to existing datasets
- Listing available datasets and their items
- Preparing data for validation testing
Naming Convention
Recommended format: `{project}_{purpose}` or `{workflow}_{purpose}`
Examples:
- `checkout_regressions` - Failing traces for checkout flow
- `api_v2_golden_set` - High-quality verified outputs
- `auth_edge_cases` - Edge cases for authentication workflow
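The convention can be checked mechanically. This is an illustrative helper only (it is not part of `dataset_manager.py`); it validates that a name is lowercase segments joined by underscores, with at least two segments:

```python
import re

# {project}_{purpose}: lowercase alphanumeric segments joined by
# underscores, at least two segments (illustrative, not enforced by the tool).
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z][a-z0-9]*)+$")

def is_valid_dataset_name(name: str) -> bool:
    """Return True if `name` follows the recommended convention."""
    return bool(NAME_PATTERN.match(name))

for name in ["checkout_regressions", "api_v2_golden_set", "BadName"]:
    print(name, is_valid_dataset_name(name))
```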
Operations
Create Dataset
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  create \
  --name "checkout_regressions" \
  --description "Failing traces for checkout flow issues" \
  --metadata '{"project": "checkout", "purpose": "regression"}'
```
Add Single Trace
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  add-trace \
  --dataset "checkout_regressions" \
  --trace-id abc123def456 \
  --expected-score 9.0
```
Add with Custom Expected Output
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  add-trace \
  --dataset "checkout_regressions" \
  --trace-id abc123def456 \
  --expected-output '{"min_score": 9.0, "required_fields": ["summary", "recommendations"]}'
```
Add Multiple Traces (Batch)
```bash
# Create a file with trace IDs (one per line)
echo "trace_id_1
trace_id_2
trace_id_3" > failing_traces.txt

python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  add-batch \
  --dataset "checkout_regressions" \
  --trace-file failing_traces.txt \
  --expected-score 9.0
```
List All Datasets
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py list
```
Get Dataset Items
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  get \
  --name "checkout_regressions"
```
Python SDK Note
When using the Langfuse Python SDK directly (rather than the CLI helper), use the correct method for adding items:
```python
from langfuse import Langfuse

lf = Langfuse()

# Correct: use lf.create_dataset_item()
lf.create_dataset_item(
    dataset_name="checkout_regressions",
    input={"query": "example input"},
    expected_output={"min_score": 9.0},
    metadata={"source_trace_id": "abc123"},
)

# Incorrect: dataset.create_item() does not exist in the SDK
# dataset = lf.get_dataset("checkout_regressions")
# dataset.create_item(...)  # ← This will fail!
```
Key difference: the SDK method is `lf.create_dataset_item()` with `dataset_name` passed as a parameter, not `dataset.create_item()` called on a dataset object.
Dataset Item Structure
When adding a trace to a dataset, the tool extracts:
Input (from trace): The trace's input data merged with its metadata. All fields from the original trace are preserved.
Expected Output (from arguments):

```json
{
  "min_score": 9.0
}
```

Or custom expectations:

```json
{
  "min_score": 8.5,
  "required_fields": ["summary", "recommendations"]
}
```

Metadata (automatic):

```json
{
  "source_trace_id": "abc123",
  "added_date": "2025-12-19",
  "original_score": 6.2
}
```
Common Workflows
Workflow 1: Create Regression Dataset from Failing Traces
1. Find failing traces (using the data-retrieval skill):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/data-retrieval/helpers/trace_retriever.py \
  --last 20 --max-score 7.0 --mode minimal
```

2. Create the dataset:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  create \
  --name "checkout_regressions" \
  --description "Failing traces for checkout fixes"
```

3. Extract the trace IDs from the step 1 output and save them to a file (one per line).

4. Add the traces to the dataset:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  add-batch \
  --dataset "checkout_regressions" \
  --trace-file failing_ids.txt \
  --expected-score 9.0
```
Workflow 2: Build Golden Test Set
1. Find high-quality traces:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/data-retrieval/helpers/trace_retriever.py \
  --last 10 --min-score 9.0 --mode minimal
```

2. Create the golden set dataset:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  create \
  --name "api_golden_set" \
  --description "Verified high-quality outputs for baseline"
```

3. Add the traces:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  add-batch \
  --dataset "api_golden_set" \
  --trace-file golden_ids.txt \
  --expected-score 9.0
```
Workflow 3: Add Specific Failing Trace
When you identify a specific failure during investigation:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/dataset-management/helpers/dataset_manager.py \
  add-trace \
  --dataset "checkout_regressions" \
  --trace-id problematic_trace_id_here \
  --expected-score 9.0 \
  --failure-reason "Payment processing timeout"
```
Required Environment Variables
Same as the data-retrieval skill:

```bash
LANGFUSE_PUBLIC_KEY=pk-...                  # Required
LANGFUSE_SECRET_KEY=sk-...                  # Required
LANGFUSE_HOST=https://cloud.langfuse.com    # Optional
```
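A quick preflight check for the required variables can save a confusing failure later. This snippet is illustrative (the helper name `check_langfuse_env` is not part of the tool):

```python
import os

def check_langfuse_env(env: dict) -> list:
    """Return the names of required Langfuse variables missing from env."""
    required = ["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"]
    return [name for name in required if not env.get(name)]

missing = check_langfuse_env(dict(os.environ))
if missing:
    print("Missing required environment variables:", ", ".join(missing))
else:
    print("Langfuse credentials found")
```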
Troubleshooting
Dataset already exists:
- Use a different name or delete the existing dataset from Langfuse UI
Trace not found:
- Verify trace ID is correct
- Check that trace is within the retention period
Rate limiting:
- When adding many traces, the tool may hit API rate limits
- Consider adding traces in smaller batches
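If rate limits persist, a client-side retry with exponential backoff is the usual remedy. The sketch below is generic and hypothetical: `add_item` stands in for whatever call adds a single trace, and `RuntimeError` stands in for a rate-limit (HTTP 429) error; neither is an option or API of `dataset_manager.py`:

```python
import time

def add_with_backoff(add_item, trace_ids, max_retries=5, base_delay=1.0):
    """Add traces one by one, retrying each on rate-limit errors.

    Generic exponential-backoff sketch: waits base_delay, then 2x, 4x, ...
    between retries, and re-raises after max_retries failed attempts.
    """
    for trace_id in trace_ids:
        for attempt in range(max_retries):
            try:
                add_item(trace_id)
                break  # success; move to the next trace
            except RuntimeError:  # stand-in for an HTTP 429 response
                if attempt == max_retries - 1:
                    raise
                time.sleep(base_delay * (2 ** attempt))
```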