Nixtla Forecast Validator

by intent-solutions-io



name: nixtla-forecast-validator
description: >
  Validates time series forecast quality metrics by comparing current
  performance against historical benchmarks. Detects degradation in MASE and
  sMAPE metrics. Activates when user mentions "validate forecast", "check
  forecast quality", or "assess forecast metrics".
allowed-tools: "Read,Write,Bash,Glob,Grep"
version: "1.0.0"

Nixtla Forecast Validator

Validates time series forecast quality metrics and detects performance degradation using statistical measures. Compares current forecast accuracy against historical benchmarks to identify significant deviations in MASE and sMAPE metrics.

Overview

This skill analyzes forecast quality by comparing current performance metrics against historical baselines. It detects significant increases in error metrics (MASE and sMAPE) that may indicate model degradation, data quality issues, or changing patterns in the time series. The skill generates comprehensive reports, alerts, and visualizations to help users identify and address forecast quality problems quickly.

The skill activates automatically when Claude detects a forecast validation need, or when explicitly requested with phrases such as "validate forecast quality", "check model performance", or "assess forecast accuracy".

Prerequisites

Tools: Read, Write, Bash, Glob, Grep

Environment: No API keys required (operates on CSV metrics files)

Python Packages:

pip install pandas matplotlib

Required CSV Format: CSV files must contain columns: model, MASE, sMAPE

Instructions

Step 1: Prepare metrics data

Ensure you have two CSV files containing forecast metrics:

  • Historical metrics CSV (baseline performance)
  • Current metrics CSV (recent performance to validate)

Each CSV must have columns: model, MASE, sMAPE

Example format:

model,MASE,sMAPE
model_A,1.2,0.15
model_B,0.8,0.10
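Before running the validator, it can help to confirm that a metrics file has the required header. The following is a minimal sketch using only the standard library; the helper name `missing_columns` is hypothetical and is not part of the skill's script.

```python
import csv
import io

# Column names required by the skill (case-sensitive).
REQUIRED_COLUMNS = ("model", "MASE", "sMAPE")

def missing_columns(csv_text: str) -> list[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list when the file has no rows at all
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]

sample = "model,MASE,sMAPE\nmodel_A,1.2,0.15\nmodel_B,0.8,0.10\n"
print(missing_columns(sample))                        # []
print(missing_columns("model,mase,sMAPE\nm,1,0.1"))   # ['MASE']
```

To check a file on disk, read it first, for example `missing_columns(open("historical_metrics.csv").read())`.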

Step 2: Set validation thresholds

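Configure the acceptable deviation thresholds for the MASE and sMAPE metrics. Each threshold is the largest relative increase over the historical value that is tolerated before an alert is raised. Both default to 0.2 (a 20% increase) and can be adjusted to suit business requirements and model characteristics.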

Recommended thresholds:

  • Conservative: 0.1 (10% increase triggers alert)
  • Standard: 0.2 (20% increase triggers alert)
  • Lenient: 0.3 (30% increase triggers alert)
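To illustrate how the three presets differ, the sketch below applies each one to the same metric change. The helper `exceeds_threshold` is hypothetical, and the strict `>` comparison is an assumption about how the script treats a value that lands exactly on the threshold.

```python
# Hypothetical helper; the strict ">" comparison is an assumption.
def exceeds_threshold(historical: float, current: float, threshold: float) -> bool:
    """True when the relative increase from historical to current is above threshold."""
    # Note: a historical value of 0 would raise ZeroDivisionError here.
    relative_increase = (current - historical) / historical
    return relative_increase > threshold

PRESETS = {"conservative": 0.1, "standard": 0.2, "lenient": 0.3}

# An error metric that moves from 2.0 to 2.5 is a 25% increase.
for name, threshold in PRESETS.items():
    print(name, exceeds_threshold(2.0, 2.5, threshold))
# conservative True
# standard True
# lenient False
```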

Step 3: Execute validation

Run the validation script to compare current metrics against historical benchmarks:

python {baseDir}/scripts/validate_forecast.py \
  --historical historical_metrics.csv \
  --current current_metrics.csv \
  --mase_threshold 0.2 \
  --smape_threshold 0.2

The script:

  1. Loads historical and current metrics from CSV files
  2. Calculates percentage increase for each metric per model
  3. Compares increases against configured thresholds
  4. Generates validation report, comparison CSV, alert log, and visualization
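The first three steps can be sketched as follows. This is not the code of `validate_forecast.py`: the function names are hypothetical, the sketch uses the standard library instead of pandas so it runs without dependencies, and skipping models that are missing from the current file is an assumption. The sample values are the ones used in the examples in this document.

```python
import csv
import io

def load_metrics(csv_text):
    """Map each model name to its MASE and sMAPE as floats."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {row["model"]: {"MASE": float(row["MASE"]), "sMAPE": float(row["sMAPE"])}
            for row in rows}

def compare(historical, current, thresholds):
    """Return (model, metric, relative_increase, degraded) for every shared model."""
    records = []
    for model, hist in historical.items():
        if model not in current:
            continue  # assumption: models absent from the current file are skipped
        for metric, threshold in thresholds.items():
            # Rounded so floating-point noise cannot tip a value that sits
            # exactly on the threshold.
            increase = round((current[model][metric] - hist[metric]) / hist[metric], 6)
            records.append((model, metric, increase, increase > threshold))
    return records

historical = load_metrics("model,MASE,sMAPE\nmodel_A,1.2,0.15\nmodel_B,0.8,0.10\n")
current = load_metrics("model,MASE,sMAPE\nmodel_A,1.8,0.18\nmodel_B,0.85,0.11\n")

records = compare(historical, current, {"MASE": 0.2, "sMAPE": 0.2})
alerts = [(model, metric) for model, metric, _, degraded in records if degraded]
print(alerts)  # [('model_A', 'MASE')]
```

Keeping one record per model and metric makes it straightforward to write both a comparison table and an alert list from the same data.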

Step 4: Review validation outputs

Analyze the generated outputs to identify forecast quality issues:

  • Read validation_report.txt for summary of findings
  • Check alert.log for models requiring immediate attention
  • Review metrics_comparison.csv for detailed metric changes
  • Examine metrics_visualization.png for visual comparison

If degradation is detected, investigate potential causes such as data quality changes, concept drift, or model staleness.

Output

The validation process generates four output files:

  1. validation_report.txt: Summary report indicating which models show significant degradation and overall validation status
  2. metrics_comparison.csv: Side-by-side comparison of historical vs current metrics for all models
  3. alert.log: Alert messages for models exceeding degradation thresholds
  4. metrics_visualization.png: Bar chart visualization comparing historical and current MASE and sMAPE values

Error Handling

Common errors and solutions:

  1. Missing required metrics column (MASE or sMAPE)

    • Ensure input CSV files contain columns named exactly MASE and sMAPE (case-sensitive)
    • Verify column headers match expected format
  2. Invalid threshold value

    • Provide positive numerical values for --mase_threshold and --smape_threshold
    • Thresholds represent percentage increase (0.2 = 20%)
  3. Historical data unavailable

    • Verify path to historical metrics CSV file is correct
    • Ensure file exists and is readable
    • Check file format matches required CSV structure
  4. File not found error

    • Verify both --historical and --current file paths are correct
    • Use absolute paths if relative paths fail
    • Check file permissions
  5. Empty DataFrame error

    • Ensure CSV files are not empty
    • Verify CSV files contain data rows beyond the header
    • Check for proper CSV formatting (commas as delimiters)
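The invalid-threshold case can be caught at argument-parsing time. The sketch below reuses the flag names from the command in Step 3, but the `positive_float` helper, the `required=True` settings, and the error messages are assumptions, not the script's actual behavior.

```python
import argparse
import math

def positive_float(text):
    """argparse type: accept only finite numbers greater than zero."""
    try:
        value = float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {text!r}")
    if not math.isfinite(value) or value <= 0:
        raise argparse.ArgumentTypeError(f"threshold must be positive, got {text!r}")
    return value

def build_parser():
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(description="Sketch of threshold validation")
    parser.add_argument("--historical", required=True)
    parser.add_argument("--current", required=True)
    parser.add_argument("--mase_threshold", type=positive_float, default=0.2)
    parser.add_argument("--smape_threshold", type=positive_float, default=0.2)
    return parser

args = build_parser().parse_args([
    "--historical", "historical_metrics.csv",
    "--current", "current_metrics.csv",
    "--mase_threshold", "0.3",
])
print(args.mase_threshold, args.smape_threshold)  # 0.3 0.2
```

With this approach, a value such as `--mase_threshold -0.1` makes argparse print a usage error and exit before any file is read.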

Examples

Example 1: Significant MASE degradation detected

Input (historical_metrics.csv):

model,MASE,sMAPE
model_A,1.2,0.15

Input (current_metrics.csv):

model,MASE,sMAPE
model_A,1.8,0.18

Command:

python scripts/validate_forecast.py --historical historical_metrics.csv --current current_metrics.csv

Output (validation_report.txt):

WARNING: Significant increase in MASE detected for model model_A.

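Interpretation: model_A shows a 50% increase in MASE (from 1.2 to 1.8), exceeding the default 20% threshold. This indicates forecast quality degradation that requires investigation. sMAPE rises by 20% (from 0.15 to 0.18), which is at the threshold rather than above it; the report shown flags only MASE.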

Example 2: Stable performance, no alerts

Input (historical_metrics.csv):

model,MASE,sMAPE
model_B,0.8,0.10

Input (current_metrics.csv):

model,MASE,sMAPE
model_B,0.85,0.11

Command:

python scripts/validate_forecast.py --historical historical_metrics.csv --current current_metrics.csv

Output (validation_report.txt):

Forecast validation passed. No significant degradation detected.

Interpretation: model_B shows only a 6.25% increase in MASE (from 0.8 to 0.85) and a 10% increase in sMAPE (from 0.10 to 0.11), both below the 20% threshold. Performance is stable.
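The percentages quoted in Examples 1 and 2 can be reproduced with exact decimal arithmetic. This is a small verification sketch; `pct_increase` is a hypothetical helper, and `Decimal` is used here only to avoid binary floating-point rounding near the 20% boundary.

```python
from decimal import Decimal

def pct_increase(historical: str, current: str) -> Decimal:
    """Exact relative increase in percent, computed from decimal strings."""
    h, c = Decimal(historical), Decimal(current)
    return (c - h) / h * 100

checks = {
    "Example 1 MASE": pct_increase("1.2", "1.8"),
    "Example 1 sMAPE": pct_increase("0.15", "0.18"),
    "Example 2 MASE": pct_increase("0.8", "0.85"),
    "Example 2 sMAPE": pct_increase("0.10", "0.11"),
}
for label, value in checks.items():
    print(f"{label}: {value:.2f}%")
# Example 1 MASE: 50.00%
# Example 1 sMAPE: 20.00%
# Example 2 MASE: 6.25%
# Example 2 sMAPE: 10.00%
```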

Example 3: Multiple models with custom thresholds

Command:

python scripts/validate_forecast.py \
  --historical multi_model_historical.csv \
  --current multi_model_current.csv \
  --mase_threshold 0.3 \
  --smape_threshold 0.25

This run uses more lenient thresholds (30% for MASE, 25% for sMAPE), which suit volatile forecasts or experimental models.

Resources

Script: {baseDir}/scripts/validate_forecast.py

Metrics: MASE (Mean Absolute Scaled Error), sMAPE (symmetric Mean Absolute Percentage Error)

Related skills: nixtla-timegpt-lab, nixtla-experiment-architect, nixtla-schema-mapper

Skill Information

Category: Skill
Version: 1.0.0
Allowed Tools: Read, Write, Bash, Glob, Grep
Last Updated: 12/10/2025