name: log-summary-date-ranges
description: Guidance for analyzing log files and generating summary reports with counts aggregated across multiple date ranges and severity levels. This skill applies when tasks involve parsing log files by date, counting occurrences by severity (ERROR, WARNING, INFO), and outputting structured CSV summaries across time periods like "today", "last 7 days", or "last 30 days".

Log Summary Date Ranges

This skill provides guidance for tasks that require analyzing log files and producing summary reports with counts aggregated across date ranges and severity levels.

When to Use This Skill

This skill applies when:

  • Parsing log files organized by date (e.g., YYYY-MM-DD_source.log)
  • Counting log entries by severity level (ERROR, WARNING, INFO, DEBUG)
  • Aggregating counts across multiple date ranges (today, last N days, total)
  • Outputting results in structured CSV format

Recommended Approach

Phase 1: Exploration

Before writing any code, explore the data structure to understand:

  1. Directory structure: Identify where log files are stored and their naming convention
  2. Date format in filenames: Extract dates from filenames (e.g., 2025-08-12_api.log)
  3. Log entry format: Examine sample entries to understand severity marker patterns (e.g., [ERROR], [WARNING], [INFO])
  4. Multiple sources: Identify if there are multiple log sources per day (api, app, auth, db)

Example exploration commands:

# List log files and check the naming convention
ls -la /path/to/logs/
# Peek at the first few entries of each file to learn the entry format
head -n 5 /path/to/logs/*.log

Phase 2: Date Range Calculation

When calculating date ranges, apply careful interpretation:

  • "Last 7 days including today": Use timedelta(days=6) from today
    • Example: If today is 2025-08-12, range is 2025-08-06 to 2025-08-12
  • "Last 30 days including today": Use timedelta(days=29) from today
    • Example: If today is 2025-08-12, range is 2025-07-14 to 2025-08-12

The "including today" phrasing means the count of days includes the current day, so subtract 1 from the timedelta.

Phase 3: Implementation Strategy

Preferred approach: Extract dates from filenames rather than parsing timestamps within log files. This is more efficient and aligns with typical log file organization.

Processing pattern:

  1. List all log files in the directory
  2. Extract date from each filename using pattern matching
  3. For each file, count occurrences of each severity level
  4. Accumulate counts into appropriate date range buckets
  5. Output results in the required CSV format

Python vs Shell: Either approach works. Python offers better structured data handling for multiple date ranges and severity combinations. Shell with grep, awk, and associative arrays can be more concise for simpler cases.
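
A minimal Python sketch of this pattern, assuming filenames like 2025-08-12_api.log, bracketed severity markers, and a placeholder LOG_DIR path:

import re
from collections import defaultdict
from datetime import date, timedelta
from pathlib import Path

LOG_DIR = Path("/path/to/logs")                     # placeholder location
SEVERITIES = ("ERROR", "WARNING", "INFO")
FILENAME_DATE = re.compile(r"^(\d{4}-\d{2}-\d{2})_.*\.log$")

today = date.today()
ranges = {
    "today": today,
    "last_7_days": today - timedelta(days=6),
    "last_30_days": today - timedelta(days=29),
}

counts = defaultdict(lambda: defaultdict(int))      # counts[range][severity] -> total

for path in sorted(LOG_DIR.glob("*.log")):
    m = FILENAME_DATE.match(path.name)
    if not m:
        continue                                    # ignore files outside the naming convention
    file_date = date.fromisoformat(m.group(1))
    for line in path.read_text(errors="replace").splitlines():
        for sev in SEVERITIES:
            if f"[{sev}]" in line:
                for range_name, start in ranges.items():
                    if start <= file_date <= today:
                        counts[range_name][sev] += 1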

Phase 4: Output Generation

When generating CSV output:

  • Include header row with column names
  • Use consistent ordering of rows (by date range and severity)
  • Ensure proper CSV formatting (comma-separated, quoted strings if needed)
  • Use Unix line endings (\n) unless Windows format is explicitly required
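
A minimal sketch of the output step, assuming the counts structure from the sketch above and a range,severity,count column layout (the actual columns should follow the task's spec):

import csv

# Illustrative values only; in practice this comes from the aggregation step.
counts = {
    "today":        {"ERROR": 2,  "WARNING": 5,   "INFO": 40},
    "last_7_days":  {"ERROR": 9,  "WARNING": 31,  "INFO": 250},
    "last_30_days": {"ERROR": 40, "WARNING": 120, "INFO": 1100},
}

with open("summary.csv", "w", newline="") as f:     # newline="" lets the csv module control endings
    writer = csv.writer(f, lineterminator="\n")     # Unix line endings
    writer.writerow(["range", "severity", "count"])
    for range_name in ("today", "last_7_days", "last_30_days"):
        for sev in ("ERROR", "WARNING", "INFO"):
            writer.writerow([range_name, sev, counts[range_name][sev]])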

Phase 5: Verification

Always verify results independently using shell commands:

# Verify counts for a specific date
grep -c "\[ERROR\]" /path/to/logs/2025-08-12_*.log | awk -F: '{sum+=$2} END {print sum}'

# Batch verification for multiple severities
for sev in ERROR WARNING INFO; do
  grep -c "\[$sev\]" /path/to/logs/2025-08-12_*.log | awk -F: '{sum+=$2} END {print "'$sev':", sum}'
done

Cross-check at least 2-3 data points from the generated output against independent grep counts.
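
If it helps, the spot-check can be scripted as well; a sketch that assumes the summary.csv layout from Phase 4 and prints values to compare against the grep totals:

import csv

with open("summary.csv") as f:                      # path assumed from the Phase 4 sketch
    for row in csv.DictReader(f):
        if row["range"] == "today":                 # pick a few rows to compare by hand
            print(row["severity"], row["count"])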

Common Pitfalls

Date Range Off-by-One Errors

  • "Last 7 days" including today means 7 total days, not 7 days before today
  • Double-check boundary dates by listing them explicitly before processing
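
For instance, printing every date in the 7-day window makes the boundaries easy to eyeball (a small sketch using the standard library):

from datetime import date, timedelta

today = date.today()
window = [today - timedelta(days=i) for i in range(6, -1, -1)]   # oldest first
for d in window:
    print(d)                                        # 7 dates, ending with today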

Severity Level Filtering

  • Only count requested severity levels (typically ERROR, WARNING, INFO)
  • DEBUG entries may exist but should be excluded unless explicitly requested
  • Match exact patterns like [ERROR] to avoid false positives
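
One way to keep the match strict is a bracketed-tag regex; a sketch in Python, where the sample line is purely illustrative:

import re

# Require the full bracketed tag so a bare "ERROR" inside a message body is not counted.
SEVERITY_TAG = re.compile(r"\[(ERROR|WARNING|INFO)\]")

line = "2025-08-12 10:15:02 [ERROR] db: connection timeout"      # illustrative entry
match = SEVERITY_TAG.search(line)
if match:
    print(match.group(1))                           # -> ERROR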

Multiple Log Sources Per Day

  • A single day may have logs from multiple sources (api, app, auth, db)
  • Aggregate counts across all sources for each date, not just one file

Line Ending Consistency

  • Use Unix line endings (\n) for CSV output unless otherwise specified
  • Windows line endings (\r\n) can cause issues in some processing pipelines

Empty or Malformed Data

  • Handle empty log files gracefully (count as 0)
  • Handle malformed log lines that don't match expected patterns
  • Validate that log file dates fall within expected ranges
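
A small defensive sketch of these checks, reusing the filename convention assumed earlier:

import re
from datetime import date
from pathlib import Path

FILENAME_DATE = re.compile(r"^(\d{4}-\d{2}-\d{2})_.*\.log$")

def dated_lines(path: Path):
    """Yield (date, line) pairs, skipping anything that doesn't fit the expected shape."""
    m = FILENAME_DATE.match(path.name)
    if not m:
        return                                      # unexpected filename: contributes nothing
    file_date = date.fromisoformat(m.group(1))
    for line in path.read_text(errors="replace").splitlines():
        # An empty file simply yields nothing (count 0); lines without a bracketed tag are ignored.
        if "[" in line and "]" in line:
            yield file_date, line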

Verification Checklist

Before considering the task complete:

  1. Output file exists at the specified location
  2. CSV format matches requirements (header row, correct columns)
  3. Row count matches expected number of date ranges and severity combinations
  4. At least 2-3 counts verified independently with grep commands
  5. Date ranges calculated correctly (boundary dates are accurate)
  6. All severity levels accounted for in output

Cleanup Considerations

  • Remove temporary scripts created during processing unless they serve ongoing purposes
  • Document any artifacts left in the working directory
