Time_Series_Anomaly_Detection
by benchflow-ai
Detect anomalies in time series data using Prophet Framework (Meta), which frames the seasonality, trend holiday effect and other needed regressors into its model, to identify unusual surges or slumps in trends. This is a general methodology analyst can use for understanding what changes of their tracking metrics are manifesting anomalies pattern.
Skill Details
Repository Files
2 files in this skill directory
name: time_series_anomaly_detection description: Detect anomalies in time series data using Prophet Framework (Meta), which frames the seasonality, trend holiday effect and other needed regressors into its model, to identify unusual surges or slumps in trends. This is a general methodology analyst can use for understanding what changes of their tracking metrics are manifesting anomalies pattern.
Time Series Anomaly Detection Framework
Comprehensive time series anomaly detection framework leverages Facebook Prophet's time-sereis trend prediction (counterfactural) in the target window and compare it with the actual data. An anomaly index that measures how much deviate the actual number is from the expected trend (counterfactural) for each of the categories or groups within that period. The anomaly index ranking can be used to identify outerstanding groups/categories that manifest unexpected surge and slump with quantitative index ranged -100 to 100.
Core Components
TimeSeriesAnomalyDetector
Main class for anomaly detection with configurable parameters.
Constructor Parameters
min_training_days(int): Minimum days of the available training data required per category (default: 180). Groups have less than this number of days data would be skippedconfidence_interval(float): Prophet confidence interval width (default: 0.68 = ±1 std) as the starting point to measure the unexpected deviationchangepoint_prior_scale(float): Flexibility of trend changes (default: 0.05) However, if dataset is not in a big size, use 0.1seasonality_prior_scale(float): Strength of seasonality (default: 10.0)
Anomaly Index Calculation (This is CRITICAL and is the key to identify unsual pattern)
- Within confidence interval: Index = 0
- Above upper bound: Index = (actual - upper) / std
- Below lower bound: Index = (actual - lower) / std
- Scaled Index: 100 × tanh(raw_index) → maps to [-100, 100] so after this transformation
- Index=1 → ~76 points
- index=2 → ~96 points
- index=3 → ~99.5 points (approaches ±100)
Usage
from anomaly_detection import TimeSeriesAnomalyDetector
# Initialize detector
detector = TimeSeriesAnomalyDetector(
min_training_days=180, # depends on the dataset span (usually 50%+ of the dataset time span could be good threshold)
confidence_interval=0.68 # 1 std for a good intepretability visually and statistically
)
# Run anomaly detection
results = detector.detect_anomalies(
df=transaction_df,
date_col='<temporal column>',
category_col='<grouping column>',
value_col='<metric to analyze for anomalies>',
cutoff_date='<training and prediction split date, format YYYY-MM-DD>',
prediction_end='end of prediction window',
agg_func='sum', # Aggregation: 'sum', 'mean', 'count', or None (e.g if data is not transactional data and value_col value is ready for the modeling)
)
# Access results
anomaly_summary = results['anomaly_summary'] # All categories with anomaly indices
# Get top 10 postive anomaly groups
top_surge = results['anomaly_summary'].head(10)['<grouping column>'].tolist()
# Get top 10 negative anomaly groups (descendingly)
top_slump = results['anomaly_summary'].tail(10)['<grouping column>'].tolist()
Input Format
Required DataFrame columns:
- Date column (e.g.,
Order Date,Transaction Date,Purchase Date) - temporal dimension - Category column (e.g.,
Category,Region) - grouping dimension | Granularity that the method will be applied in loops - Value column (e.g.,
Price,Spend,Revenue,Quantity) - metric to detect anomalies in (This is very important as this, directly or indirectly, is the y of the modeling) - Optional: Additional columns (e.g.,
User ID)
Parameters:
cutoff_date: Divides training data (before) from prediction period (after)prediction_end: End of anomaly detection windowagg_func: How to aggregate values per category per day'sum': Sum all values of the target column for each category-date (e.g user-level purchase data)'mean': Average values'count': Count recordsNone: No aggregation (e.g non transactional data)
Output
detect_anomalies() returns a dictionary with 3 keys:
{
'anomaly_summary': pd.DataFrame, # Summary metrics per category
'predictions': Dict[str, pd.DataFrame], # Prophet forecasts per category
'models': Dict[str, Prophet] # Trained models per category
}
Output anomaly_summary Preview
| Column | Type | Description |
|---|---|---|
{category_col} |
str | Category name (column name matches your input) |
Anomaly_Index |
float | Scaled anomaly score in [-100, 100] |
days_with_data |
int | Days with actual data in prediction window |
avg_daily_anomaly |
float | Average of daily scaled anomalies |
max_daily_anomaly |
float | Maximum daily scaled anomaly |
min_daily_anomaly |
float | Minimum daily scaled anomaly |
total_actual |
float | Sum of actual values in prediction window |
total_predicted |
float | Sum of predicted (yhat) values |
Sorted by Anomaly_Index descending
predictions Dictionary
Key = category name, Value = DataFrame with columns:
ds: Dateyhat: Prophet predictionyhat_lower: Lower confidence boundyhat_upper: Upper confidence bound
models Dictionary
Key = category name, Value = Fitted Prophet model object
Key Features
- Pure anomaly detection without business logic (no hardcoded top N)
- Automatic daily aggregation by grouping dimension
- Prophet model training with seasonality (daily, weekly, yearly)
- Counterfactual prediction for intervention analysis
- Scaled anomaly index [-100, 100] for interpretability
Best Practices
- Ensure sufficient training data (180+ days recommended)
- Use
cutoff_dateto define pre/post intervention periods - Use
agg_func=Noneif data already at daily category level - Apply your own filtering logic on
anomaly_summary(top N, thresholds, etc.) - Chain with DID analysis for causal inference on selected categories
Related Skills
Xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Analyzing Financial Statements
This skill calculates key financial ratios and metrics from financial statement data for investment analysis
Data Storytelling
Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.
Kpi Dashboard Design
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.
Dbt Transformation Patterns
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Sql Optimization Patterns
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
Anndata
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
Xlsx
Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.
