Rangebar Eval Metrics
by terrylica
---
name: rangebar-eval-metrics
description: Range bar evaluation metrics for quant trading. TRIGGERS - range bar metrics, Sharpe ratio, WFO metrics, PSR DSR MinTRL.
allowed-tools: Read, Grep, Glob, Bash
---
# Range Bar Evaluation Metrics
Machine-readable reference + computation scripts for state-of-the-art metrics evaluating range bar (price-based sampling) data.
## When to Use This Skill
Use this skill when:
- Evaluating ML model performance on range bar data
- Computing Sharpe ratios with non-IID bar sequences
- Running Walk-Forward Optimization metric analysis
- Calculating PSR, DSR, or MinTRL statistical tests
- Generating evaluation reports from fold results
## Quick Start

```bash
# Compute metrics from predictions + actuals
python scripts/compute_metrics.py --predictions preds.npy --actuals actuals.npy --timestamps ts.npy

# Generate full evaluation report
python scripts/generate_report.py --results folds.jsonl --output report.md
```
## Metric Tiers
| Tier | Purpose | Metrics | Compute |
|---|---|---|---|
| Primary (5) | Research decisions | weekly_sharpe, hit_rate, cumulative_pnl, n_bars, positive_sharpe_rate | Per-fold + aggregate |
| Secondary/Risk (5) | Additional context | max_drawdown, bar_sharpe, return_per_bar, profit_factor, cv_fold_returns | Per-fold |
| ML Quality (3) | Prediction health | ic, prediction_autocorr, is_collapsed | Per-fold |
| Diagnostic (5) | Final validation | psr, dsr, autocorr_lag1, effective_n, binomial_pvalue | Aggregate only |
| Extended Risk (5) | Deep risk analysis | var_95, cvar_95, omega_ratio, sortino_ratio, ulcer_index | Per-fold (optional) |
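The two aggregate-level primary metrics follow directly from per-fold results. A minimal sketch, assuming `positive_sharpe_rate` means the fraction of folds with a positive `weekly_sharpe` (the function name is illustrative, not repo API):

```python
import numpy as np

def aggregate_fold_sharpes(fold_sharpes: list) -> dict:
    """Aggregate per-fold weekly Sharpe values into the two
    aggregate metrics referenced in the go criteria."""
    arr = np.asarray(fold_sharpes, dtype=float)
    return {
        "mean_weekly_sharpe": float(np.mean(arr)),
        "positive_sharpe_rate": float(np.mean(arr > 0)),  # fraction of folds > 0
    }
```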
## Why Range Bars Need Special Treatment
Range bars violate standard IID assumptions:
- Variable duration: Bars form based on price movement, not time
- Autocorrelation: High-volatility periods cluster bars → temporal correlation
- Non-constant information: More bars during volatility = more information per day
**Canonical solution:** daily aggregation via `_group_by_day()` before Sharpe calculation.
## References

### Core Reference Files
| Topic | Reference File |
|---|---|
| Sharpe Ratio Calculations | sharpe-formulas.md |
| Risk Metrics (VaR, Omega, Ulcer) | risk-metrics.md |
| ML Prediction Quality (IC, Autocorr) | ml-prediction-quality.md |
| Crypto Market Considerations | crypto-markets.md |
| Temporal Aggregation Rules | temporal-aggregation.md |
| JSON Schema for Metrics | metrics-schema.md |
| Anti-Patterns (Transaction Costs) | anti-patterns.md |
| SOTA 2025-2026 (SHAP, BOCPD, etc.) | sota-2025-2026.md |
| Worked Examples (BTC, EUR/USD) | worked-examples.md |
| Structured Logging (NDJSON) | structured-logging.md |
## Related Skills
| Skill | Relationship |
|---|---|
| adaptive-wfo-epoch | Uses weekly_sharpe, psr, dsr for WFE calculation |
## Dependencies

```bash
pip install -r requirements.txt
# Or (quote the pins so the shell does not treat >= as a redirect):
pip install "numpy>=1.24" "pandas>=2.0" "scipy>=1.10"
```
## Key Formulas

### Daily-Aggregated Sharpe (Primary Metric)
```python
import numpy as np

def weekly_sharpe(pnl: np.ndarray, timestamps: np.ndarray) -> float:
    """Sharpe with daily aggregation for range bars."""
    daily_pnl = _group_by_day(pnl, timestamps)  # Sum PnL per calendar day
    if len(daily_pnl) < 2 or np.std(daily_pnl) == 0:
        return 0.0
    daily_sharpe = np.mean(daily_pnl) / np.std(daily_pnl)
    # For crypto (7-day week): sqrt(7). For equities: sqrt(5)
    return daily_sharpe * np.sqrt(7)  # Crypto default
```
### Information Coefficient (Prediction Quality)
```python
import numpy as np
from scipy.stats import spearmanr

def information_coefficient(predictions: np.ndarray, actuals: np.ndarray) -> float:
    """Spearman rank IC - captures monotonic (rank) alignment, not magnitude."""
    ic, _ = spearmanr(predictions, actuals)
    return ic  # Range: [-1, 1]. >0.02 acceptable, >0.05 good, >0.10 excellent
```
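With constant predictions, `spearmanr` returns NaN. The remediations section below opts to return 1.0 for that degenerate case; a guarded wrapper sketch (`safe_ic` is a hypothetical name, not repo API):

```python
import numpy as np
from scipy.stats import spearmanr

def safe_ic(predictions: np.ndarray, actuals: np.ndarray) -> float:
    """IC with the zero-variance guard from the remediations table applied."""
    if np.std(predictions) < 1e-10 or np.std(actuals) < 1e-10:
        return 1.0  # sentinel the audit chose for constant (collapsed) inputs
    ic, _ = spearmanr(predictions, actuals)
    return float(ic)
```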
### Probabilistic Sharpe Ratio (Statistical Validation)
```python
from scipy.stats import norm

def psr(sharpe: float, se: float, benchmark: float = 0.0) -> float:
    """P(true Sharpe > benchmark)."""
    return norm.cdf((sharpe - benchmark) / se)
```
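The `se` argument must be estimated. A common choice is the asymptotic standard error under IID-normal returns from Lo (2002); Mertens (2002), cited below, adds skew/kurtosis corrections. A sketch (`sharpe_se` is a hypothetical helper; `psr` is repeated so the block is self-contained):

```python
import numpy as np
from scipy.stats import norm

def psr(sharpe: float, se: float, benchmark: float = 0.0) -> float:
    """P(true Sharpe > benchmark)."""
    return norm.cdf((sharpe - benchmark) / se)

def sharpe_se(sharpe: float, n: int) -> float:
    """Approximate SE of an estimated Sharpe under IID-normal returns
    (Lo 2002): sqrt((1 + 0.5*SR^2) / n)."""
    return float(np.sqrt((1.0 + 0.5 * sharpe**2) / n))

# e.g. daily Sharpe 0.15 over 120 daily observations:
# psr(0.15, sharpe_se(0.15, 120))
```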
## Annualization Factors
| Market | Daily → Weekly | Daily → Annual | Rationale |
|---|---|---|---|
| Crypto (24/7) | sqrt(7) ≈ 2.65 | sqrt(365) ≈ 19.1 | 7 trading days/week |
| Equity | sqrt(5) ≈ 2.24 | sqrt(252) ≈ 15.9 | 5 trading days/week |
NEVER use sqrt(252) for crypto markets.
### CRITICAL: Session Filter Changes Annualization
| View | Filter | days_per_week | Rationale |
|---|---|---|---|
| Session-filtered (London-NY) | Weekdays 08:00-16:00 | sqrt(5) | Trading like equities |
| All-bars (unfiltered) | None | sqrt(7) | Full 24/7 crypto |
Using sqrt(7) for session-filtered data overstates Sharpe by ~18%!
See crypto-markets.md for detailed rationale.
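A small helper makes the rule above hard to get wrong; the function name and arguments are illustrative, not repo API:

```python
import math

def weekly_factor(market: str, session_filtered: bool) -> float:
    """sqrt(days_per_week) per the annualization table (illustrative helper)."""
    if market == "crypto" and not session_filtered:
        return math.sqrt(7)  # full 24/7 sampling
    return math.sqrt(5)      # equities, or session-filtered crypto

# Mixing these up inflates Sharpe by sqrt(7/5) ~= 1.18, i.e. the ~18% above
```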
## Dual-View Metrics
For comprehensive analysis, compute metrics with BOTH views:
- Session-filtered (London 08:00 to NY 16:00): Primary strategy evaluation
- All-bars: Regime detection, data quality diagnostics
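A sketch of a session-filter mask, assuming Unix-second timestamps and treating the weekday 08:00-16:00 bounds from the table above as UTC; real London/NY session logic would need timezone and DST handling:

```python
import numpy as np

def session_mask(timestamps: np.ndarray) -> np.ndarray:
    """Boolean mask for the session-filtered view: weekdays, 08:00-16:00 UTC."""
    ts = timestamps.astype(np.int64)
    hour = (ts % 86400) // 3600
    # Unix epoch (1970-01-01) was a Thursday -> weekday index 3 (0=Mon .. 6=Sun)
    weekday = ((ts // 86400) + 3) % 7
    return (weekday < 5) & (hour >= 8) & (hour < 16)
```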
## Academic References
| Concept | Citation |
|---|---|
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) |
| Sharpe SE with Non-Normality | Mertens (2002) |
| Statistics of Sharpe Ratios | Lo (2002) |
| Omega Ratio | Keating & Shadwick (2002) |
| Ulcer Index | Peter Martin (1987) |
## Decision Framework

### Go Criteria (Research)
```yaml
go_criteria:
  - positive_sharpe_rate > 0.55
  - mean_weekly_sharpe > 0
  - cv_fold_returns < 1.5
  - mean_hit_rate > 0.50
```
### Publication Criteria
```yaml
publication_criteria:
  - binomial_pvalue < 0.05
  - psr > 0.85
  - dsr > 0.50  # If n_trials > 1
```
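The go criteria translate directly into a gate function. The key names mirror the YAML above; the function itself is an illustrative sketch, not repo API:

```python
def passes_go_criteria(m: dict) -> bool:
    """True when all four research go criteria hold."""
    return (
        m["positive_sharpe_rate"] > 0.55
        and m["mean_weekly_sharpe"] > 0
        and m["cv_fold_returns"] < 1.5
        and m["mean_hit_rate"] > 0.50
    )
```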
## Scripts

| Script | Purpose |
|---|---|
| `scripts/compute_metrics.py` | Compute all metrics from predictions/actuals |
| `scripts/generate_report.py` | Generate Markdown report from fold results |
| `scripts/validate_schema.py` | Validate metrics JSON against schema |
## Remediations (2026-01-19 Multi-Agent Audit)

The following fixes were applied based on a 12-subagent adversarial audit:

| Issue | Root Cause | Fix | Source |
|---|---|---|---|
| `weekly_sharpe=0` | Constant predictions | Model collapse detection + architecture fix | model-expert |
| `IC=None` | Zero variance predictions | Return 1.0 for constant (semantically correct) | model-expert |
| `prediction_autocorr=NaN` | Division by zero | Guard for std < 1e-10, return 1.0 | model-expert |
| Ulcer Index divide-by-zero | Peak equity = 0 | Guard with `np.where(peak > 1e-10, ...)` | risk-analyst |
| Omega/Profit Factor unreliable | Too few samples | `min_days` parameter (default: 5) | robustness-analyst |
| BiLSTM mean collapse | Architecture too small | hidden_size: 16→48, dropout: 0.5→0.3 | model-expert |
| `profit_factor=1.0` (n_bars=0) | Early return wrong value | Return NaN when no data to compute ratio | risk-analyst |
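As an example of the last fix, a `profit_factor` that returns NaN instead of a misleading 1.0 when there is no data. This is a sketch of the corrected behavior, not the repo's exact code:

```python
import numpy as np

def profit_factor(pnl: np.ndarray) -> float:
    """Gross profit / gross loss, with the audit fix: NaN when empty."""
    if len(pnl) == 0:
        return float("nan")  # no data -> no ratio, not a neutral 1.0
    gains = pnl[pnl > 0].sum()
    losses = -pnl[pnl < 0].sum()
    if losses == 0:
        return float("inf") if gains > 0 else float("nan")
    return float(gains / losses)
```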
### Model Collapse Detection

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)

# ALWAYS check for model collapse after prediction
pred_std = np.std(predictions)
if pred_std < 1e-6:
    logger.warning(
        f"Constant predictions detected (std={pred_std:.2e}). "
        "Model collapsed to mean - check architecture."
    )
```
### Recommended BiLSTM Architecture

```python
# BEFORE (causes collapse on range bars)
HIDDEN_SIZE = 16
DROPOUT = 0.5

# AFTER (prevents collapse)
HIDDEN_SIZE = 48  # Triple capacity
DROPOUT = 0.3     # Less aggressive regularization
```
See reference docs for complete implementation details.
## Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| weekly_sharpe is 0 | Constant predictions | Check for model collapse, increase hidden_size |
| IC returns None | Zero variance in predictions | Model collapsed - check architecture |
| prediction_autocorr is NaN | Division by zero | Guard for std < 1e-10 in autocorr calculation |
| Ulcer Index divide error | Peak equity is zero | Add guard: np.where(peak > 1e-10, ...) |
| profit_factor = 1.0 | No bars processed | Return NaN when n_bars is 0 |
| Sharpe inflated 18% | Wrong annualization for data | Use sqrt(5) for session-filtered, sqrt(7) for 24/7 |
| PSR/DSR not computed | Missing scipy | Install: pip install scipy |
| Timestamps not parsed | Wrong format | Ensure Unix timestamps, not datetime strings |
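For the last row, a conversion sketch using pandas (assumes `pandas>=2.0` per the dependency pin; the helper name is illustrative):

```python
import numpy as np
import pandas as pd

def to_unix_seconds(ts) -> np.ndarray:
    """Convert datetime strings (or datetime64 values) to Unix-second
    timestamps, the format the scripts expect."""
    dt = pd.to_datetime(pd.Series(ts), utc=True)  # tz-aware datetime64[ns]
    return (dt.astype("int64") // 10**9).to_numpy()  # nanoseconds -> seconds
```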
