Rangebar Eval Metrics

by terrylica


Range bar evaluation metrics for quant trading. TRIGGERS - range bar metrics, Sharpe ratio, WFO metrics, PSR DSR MinTRL.


---
name: rangebar-eval-metrics
description: Range bar evaluation metrics for quant trading. TRIGGERS - range bar metrics, Sharpe ratio, WFO metrics, PSR DSR MinTRL.
allowed-tools: Read, Grep, Glob, Bash
---

Range Bar Evaluation Metrics

A machine-readable reference plus computation scripts for state-of-the-art metrics used to evaluate range bar (price-based sampling) data.

When to Use This Skill

Use this skill when:

  • Evaluating ML model performance on range bar data
  • Computing Sharpe ratios with non-IID bar sequences
  • Running Walk-Forward Optimization metric analysis
  • Calculating PSR, DSR, or MinTRL statistical tests
  • Generating evaluation reports from fold results

Quick Start

# Compute metrics from predictions + actuals
python scripts/compute_metrics.py --predictions preds.npy --actuals actuals.npy --timestamps ts.npy

# Generate full evaluation report
python scripts/generate_report.py --results folds.jsonl --output report.md

Metric Tiers

| Tier | Purpose | Metrics | Compute |
| --- | --- | --- | --- |
| Primary (5) | Research decisions | weekly_sharpe, hit_rate, cumulative_pnl, n_bars, positive_sharpe_rate | Per-fold + aggregate |
| Secondary/Risk (5) | Additional context | max_drawdown, bar_sharpe, return_per_bar, profit_factor, cv_fold_returns | Per-fold |
| ML Quality (3) | Prediction health | ic, prediction_autocorr, is_collapsed | Per-fold |
| Diagnostic (5) | Final validation | psr, dsr, autocorr_lag1, effective_n, binomial_pvalue | Aggregate only |
| Extended Risk (5) | Deep risk analysis | var_95, cvar_95, omega_ratio, sortino_ratio, ulcer_index | Per-fold (optional) |

Why Range Bars Need Special Treatment

Range bars violate standard IID assumptions:

  1. Variable duration: Bars form based on price movement, not time
  2. Autocorrelation: High-volatility periods cluster bars → temporal correlation
  3. Non-constant information: More bars during volatility = more information per day

Canonical solution: Daily aggregation via _group_by_day() before Sharpe calculation.

References

Core Reference Files

| Topic | Reference File |
| --- | --- |
| Sharpe Ratio Calculations | sharpe-formulas.md |
| Risk Metrics (VaR, Omega, Ulcer) | risk-metrics.md |
| ML Prediction Quality (IC, Autocorr) | ml-prediction-quality.md |
| Crypto Market Considerations | crypto-markets.md |
| Temporal Aggregation Rules | temporal-aggregation.md |
| JSON Schema for Metrics | metrics-schema.md |
| Anti-Patterns (Transaction Costs) | anti-patterns.md |
| SOTA 2025-2026 (SHAP, BOCPD, etc.) | sota-2025-2026.md |
| Worked Examples (BTC, EUR/USD) | worked-examples.md |
| Structured Logging (NDJSON) | structured-logging.md |

Related Skills

| Skill | Relationship |
| --- | --- |
| adaptive-wfo-epoch | Uses weekly_sharpe, psr, dsr for WFE calculation |

Dependencies

pip install -r requirements.txt
# Or (quotes keep the shell from treating >= as redirection):
# pip install "numpy>=1.24" "pandas>=2.0" "scipy>=1.10"

Key Formulas

Daily-Aggregated Sharpe (Primary Metric)

import numpy as np

def weekly_sharpe(pnl: np.ndarray, timestamps: np.ndarray) -> float:
    """Sharpe with daily aggregation for range bars."""
    daily_pnl = _group_by_day(pnl, timestamps)  # Sum PnL per calendar day
    if len(daily_pnl) < 2 or np.std(daily_pnl) == 0:
        return 0.0
    daily_sharpe = np.mean(daily_pnl) / np.std(daily_pnl)
    # For crypto (7-day week): sqrt(7). For equities: sqrt(5)
    return daily_sharpe * np.sqrt(7)  # Crypto default

Information Coefficient (Prediction Quality)

import numpy as np
from scipy.stats import spearmanr

def information_coefficient(predictions: np.ndarray, actuals: np.ndarray) -> float:
    """Spearman rank IC - captures rank-order (monotonic) alignment."""
    ic, _ = spearmanr(predictions, actuals)
    return ic  # Range: [-1, 1]. >0.02 acceptable, >0.05 good, >0.10 excellent

Probabilistic Sharpe Ratio (Statistical Validation)

from scipy.stats import norm

def psr(sharpe: float, se: float, benchmark: float = 0.0) -> float:
    """P(true Sharpe > benchmark)."""
    return norm.cdf((sharpe - benchmark) / se)
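The `se` input can be estimated from the per-period returns. A sketch of the Mertens (2002) standard error cited below, which adjusts for skew and kurtosis (function name and shape are illustrative):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def sharpe_se(returns: np.ndarray, sharpe: float) -> float:
    """Mertens (2002) standard error of the per-period Sharpe estimate (sketch)."""
    n = len(returns)
    g3 = skew(returns)                     # sample skewness
    g4 = kurtosis(returns, fisher=False)   # raw kurtosis (3 for a normal)
    var = (1 - g3 * sharpe + (g4 - 1) / 4 * sharpe**2) / (n - 1)
    return float(np.sqrt(var))
```

Under normality (g3=0, g4=3) this reduces to the familiar Lo (2002) IID form sqrt((1 + SR²/2)/(n-1)).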

Annualization Factors

| Market | Daily → Weekly | Daily → Annual | Rationale |
| --- | --- | --- | --- |
| Crypto (24/7) | sqrt(7) = 2.65 | sqrt(365) = 19.1 | 7 trading days/week |
| Equity | sqrt(5) = 2.24 | sqrt(252) = 15.9 | 5 trading days/week |

NEVER use sqrt(252) for crypto markets.

CRITICAL: Session Filter Changes Annualization

| View | Filter | Weekly Factor | Rationale |
| --- | --- | --- | --- |
| Session-filtered (London-NY) | Weekdays 08:00-16:00 | sqrt(5) | Trades like equities |
| All-bars (unfiltered) | None | sqrt(7) | Full 24/7 crypto |

Using sqrt(7) for session-filtered data overstates Sharpe by ~18%!
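The ~18% figure is just the ratio of the two weekly factors:

```python
import math

# Relative Sharpe inflation from applying the 24/7 factor to session-filtered data
inflation = math.sqrt(7) / math.sqrt(5) - 1
print(f"{inflation:.1%}")  # 18.3%
```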

See crypto-markets.md for detailed rationale.

Dual-View Metrics

For comprehensive analysis, compute metrics with BOTH views:

  1. Session-filtered (London 08:00 to NY 16:00): Primary strategy evaluation
  2. All-bars: Regime detection, data quality diagnostics
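A sketch of the session filter, assuming Unix-second timestamps and approximating London 08:00 to NY 16:00 as 08:00-21:00 UTC (the UTC window and DST handling are assumptions, not the skill's exact definition; see crypto-markets.md):

```python
import numpy as np
import pandas as pd

def session_mask(timestamps: np.ndarray) -> np.ndarray:
    """Boolean mask for weekday London-open to NY-close bars (illustrative)."""
    ts = pd.to_datetime(timestamps, unit="s", utc=True)
    is_weekday = ts.dayofweek < 5                  # Mon-Fri only
    in_window = (ts.hour >= 8) & (ts.hour < 21)    # 08:00-21:00 UTC, approximate
    return is_weekday & in_window
```

Apply the mask to pnl/timestamps for the session-filtered view, then annualize with sqrt(5); leave the arrays unmasked (and use sqrt(7)) for the all-bars view.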

Academic References

| Concept | Citation |
| --- | --- |
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) |
| Sharpe SE with Non-Normality | Mertens (2002) |
| Statistics of Sharpe Ratios | Lo (2002) |
| Omega Ratio | Keating & Shadwick (2002) |
| Ulcer Index | Peter Martin (1987) |

Decision Framework

Go Criteria (Research)

go_criteria:
  - positive_sharpe_rate > 0.55
  - mean_weekly_sharpe > 0
  - cv_fold_returns < 1.5
  - mean_hit_rate > 0.50
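The gate above can be expressed directly in code; a sketch over an aggregate-metrics dict (the key names mirror the criteria list, but the dict shape is an assumption):

```python
def passes_go_criteria(m: dict) -> bool:
    """Research go/no-go gate matching the go_criteria list (sketch)."""
    return (
        m["positive_sharpe_rate"] > 0.55
        and m["mean_weekly_sharpe"] > 0
        and m["cv_fold_returns"] < 1.5
        and m["mean_hit_rate"] > 0.50
    )
```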

Publication Criteria

publication_criteria:
  - binomial_pvalue < 0.05
  - psr > 0.85
  - dsr > 0.50 # If n_trials > 1
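A plausible implementation of the binomial_pvalue check, assuming it asks whether the fraction of profitable folds beats a fair coin (the exact null is an assumption):

```python
from scipy.stats import binomtest

def binomial_pvalue(n_positive_folds: int, n_folds: int) -> float:
    """One-sided binomial test: P(>= n_positive successes | p = 0.5) (sketch)."""
    return binomtest(n_positive_folds, n_folds, p=0.5, alternative="greater").pvalue
```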

Scripts

| Script | Purpose |
| --- | --- |
| scripts/compute_metrics.py | Compute all metrics from predictions/actuals |
| scripts/generate_report.py | Generate Markdown report from fold results |
| scripts/validate_schema.py | Validate metrics JSON against schema |

Remediations (2026-01-19 Multi-Agent Audit)

The following fixes were applied based on a 12-subagent adversarial audit:

| Issue | Root Cause | Fix | Source |
| --- | --- | --- | --- |
| weekly_sharpe=0 | Constant predictions | Model collapse detection + architecture fix | model-expert |
| IC=None | Zero variance predictions | Return 1.0 for constant (semantically correct) | model-expert |
| prediction_autocorr=NaN | Division by zero | Guard for std < 1e-10, return 1.0 | model-expert |
| Ulcer Index divide-by-zero | Peak equity = 0 | Guard with np.where(peak > 1e-10, ...) | risk-analyst |
| Omega/Profit Factor unreliable | Too few samples | min_days parameter (default: 5) | robustness-analyst |
| BiLSTM mean collapse | Architecture too small | hidden_size: 16→48, dropout: 0.5→0.3 | model-expert |
| profit_factor=1.0 (n_bars=0) | Early return wrong value | Return NaN when no data to compute ratio | risk-analyst |

Model Collapse Detection

import logging
import numpy as np

logger = logging.getLogger(__name__)

# ALWAYS check for model collapse after prediction
pred_std = np.std(predictions)
if pred_std < 1e-6:
    logger.warning(
        f"Constant predictions detected (std={pred_std:.2e}). "
        "Model collapsed to mean - check architecture."
    )

Recommended BiLSTM Architecture

# BEFORE (causes collapse on range bars)
HIDDEN_SIZE = 16
DROPOUT = 0.5

# AFTER (prevents collapse)
HIDDEN_SIZE = 48  # Triple capacity
DROPOUT = 0.3     # Less aggressive regularization

See reference docs for complete implementation details.


Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| weekly_sharpe is 0 | Constant predictions | Check for model collapse, increase hidden_size |
| IC returns None | Zero variance in predictions | Model collapsed - check architecture |
| prediction_autocorr is NaN | Division by zero | Guard for std < 1e-10 in autocorr calculation |
| Ulcer Index divide error | Peak equity is zero | Add guard: np.where(peak > 1e-10, ...) |
| profit_factor = 1.0 | No bars processed | Return NaN when n_bars is 0 |
| Sharpe inflated 18% | Wrong annualization for data | Use sqrt(5) for session-filtered, sqrt(7) for 24/7 |
| PSR/DSR not computed | Missing scipy | Install: pip install scipy |
| Timestamps not parsed | Wrong format | Ensure Unix timestamps, not datetime strings |


Skill Information

Category: Skill
Allowed Tools: Read, Grep, Glob, Bash
Last Updated: 2026-01-31