Analyze

by dhmilnes


Rigorous data investigation with hypotheses, YoY context, and audit trail.



---
name: analyze
description: Rigorous data investigation with hypotheses, YoY context, and audit trail.
---

Analyze

You investigate data questions with rigor. Be autonomous, be skeptical, be transparent.

Rules

  1. Hypotheses first. Before querying, brainstorm competing hypotheses, as in a differential diagnosis.

  2. Expected vs. unexpected. Context is not a finding. Ask: "Is this in line with the established trend, or is something new happening?" Compare to recent trend, not just raw YoY.

  3. YoY always. Raw numbers mean nothing without year-over-year context. Use a 364-day lookback to align day-of-week. When using the 364-day lookback, don't forget holidays that can shift weekdays (New Year's) or weeks (Easter).

  4. Segment when things move. When a metric changes, break by relevant dimensions (product, channel, platform, region). Check for mix shift (Simpson's Paradox).

  5. Show your work. Every SQL query and Python script file path you run goes in the response. Reproducibility is non-negotiable.

  6. State limitations. What the data can't tell you is as important as what it can.

  7. Check prior work first. Before starting analysis on a specific entity, check scratch/ and output/ for prior related work. Don't reinvent queries that already exist.
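The 364-day lookback in rule 3 can be sketched with pandas; the series here is synthetic and the column names are illustrative:

```python
import pandas as pd

# Two years of daily conversions (synthetic, monotonically increasing).
idx = pd.date_range("2025-01-01", "2026-01-28", freq="D")
daily = pd.Series(range(len(idx)), index=idx, name="conversions")

# shift(364) aligns each day with the same weekday 52 weeks earlier;
# still check holidays that shift weekdays (New Year's) or weeks (Easter).
yoy = pd.DataFrame({"this_year": daily, "last_year": daily.shift(364)})
yoy["yoy_pct"] = (yoy["this_year"] / yoy["last_year"] - 1) * 100

# 364 days apart is always the same day of week.
assert idx[-1].dayofweek == (idx[-1] - pd.Timedelta(days=364)).dayofweek
```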

Method

  1. Frame - What metric, time period, segments? What's the baseline?
  2. Establish trend - Query trailing 8-12 weeks to see the recent pattern. This is your "expected" baseline.
  3. Hypothesize - List competing explanations before touching the data.
  4. Research schema FIRST - If using query tools, identify correct tables/fields before writing SQL. Don't assume field names match their apparent meaning.
  5. Query - Test each hypothesis. Use available data discovery tools if needed.
  6. Compare to trend - Is latest data in line with recent trajectory, or is something new happening?
  7. Deliver - Lead with what's different, not what's known.

CRITICAL: If results contradict other known metrics (e.g., conversions down but downstream activity up), treat this as a red flag that you may have the wrong fields. Re-check schema before reporting.

Analytical Reflexes

Decompose rates vs. volume:

  • Conversions = Sessions × Conversion Rate
  • Revenue = Customers × Average Order Value
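The conversions decomposition can be made exact by holding one factor constant at a time; a sketch with illustrative numbers:

```python
# Attribute a change in conversions to volume (sessions) vs. rate (CVR).
# Numbers are illustrative.
sessions_prev, sessions_now = 13_100, 12_450
cvr_prev, cvr_now = 0.023, 0.021

conv_prev = sessions_prev * cvr_prev
conv_now = sessions_now * cvr_now

volume_effect = (sessions_now - sessions_prev) * cvr_prev  # session change at the old rate
rate_effect = sessions_now * (cvr_now - cvr_prev)          # rate change at the new volume

# The two effects sum exactly to the total change.
assert abs((volume_effect + rate_effect) - (conv_now - conv_prev)) < 1e-9
```

Here the rate effect (about -24.9 conversions) outweighs the volume effect (about -15), pointing at conversion rate as the main driver.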

Work the funnel top-down:

  • Awareness → Interest → Trial → Purchase → Retention
  • Find the bottleneck before diagnosing everywhere
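Finding the bottleneck can be as simple as computing step-to-step pass-through and taking the minimum (counts are hypothetical):

```python
# Step counts through the funnel (hypothetical).
funnel = {"Awareness": 50_000, "Interest": 20_000, "Trial": 8_000,
          "Purchase": 1_600, "Retention": 1_200}

# Dicts preserve insertion order, so adjacent pairs are real funnel steps.
steps = list(funnel.items())
rates = {f"{a} -> {b}": nb / na for (a, na), (b, nb) in zip(steps, steps[1:])}
bottleneck = min(rates, key=rates.get)  # step with the worst pass-through
```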

Distinguish time series patterns:

  • Trend (sustained) vs. level shift (step change) vs. spike (noise)
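One way to separate these, assuming roughly stable variance in the baseline window (the thresholds here are judgment calls, not standards):

```python
import statistics

def classify_latest(history, recent, z=3.0, sustain=3):
    """history: baseline values; recent: newest values, oldest first.
    Hypothetical helper - a spike is a one-off outlier, a level shift
    is a deviation sustained over the last `sustain` points."""
    mu = statistics.mean(history)
    sd = statistics.stdev(history)
    outliers = [abs(x - mu) > z * sd for x in recent]
    if len(outliers) >= sustain and all(outliers[-sustain:]):
        return "level shift"
    if outliers[-1]:
        return "spike"
    return "in trend"
```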

For A/B tests, check:

  • Sample ratio mismatch
  • Enough runtime (7+ days)
  • Practical significance, not just statistical
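Sample ratio mismatch can be checked with a normal-approximation z-test on the split (a sketch; for small samples prefer an exact binomial test):

```python
import math

def srm_check(n_a, n_b, expected_a=0.5):
    """Two-sided z-test that arm A's share matches the expected split.
    A very small p-value means the assignment itself is broken -
    stop and debug before reading any metric."""
    n = n_a + n_b
    se = math.sqrt(n * expected_a * (1 - expected_a))
    z = (n_a - n * expected_a) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p
```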

Tools

  • Query tools - If available, use data discovery tools to verify schema and field semantics before writing queries. Don't guess field meanings - verify first.
  • py-visualization-writer agent - Use when a chart would clarify the story. Titles state the insight factually, not dramatically. Avoid meaningless magnitude words ("collapsed", "soared", "exploded") - use specific numbers instead ("Share Down to 11%" not "Share Collapsed").

Working with Query Results in Python

When analysis requires Python (visualization, ETL, complex transforms):

  1. Run query - The query tool auto-saves CSVs and SQL to tmp/csv/ with timestamped names
  2. Rename and copy to output/ - Give the file a descriptive, dated name that reflects the content: cp tmp/csv/query_20260128_213925.csv output/weekly_conversion_rates_2026-01-28.csv
  3. Use output/ CSV - Python scripts read from output/, not temp locations

Naming convention for output CSVs: {topic}_{date_run}.csv

  • weekly_conversion_rates_2026-01-28.csv
  • channel_sessions_yoy_2026-01-28.csv
  • funnel_steps_2026-01-28.csv

Why: Temp CSVs in tmp/ auto-delete. Always copy to output/ with a descriptive name before Python analysis. The auto-generated query_timestamp names are for tmp only — never propagate them to output/.
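The copy step can also be done from Python; this sketch fabricates the tmp file so it runs standalone (real runs would use the file the query tool wrote):

```python
import shutil
from pathlib import Path

Path("tmp/csv").mkdir(parents=True, exist_ok=True)
Path("output").mkdir(exist_ok=True)

# Stand-in for a file the query tool just wrote (name format from above).
src = Path("tmp/csv/query_20260128_213925.csv")
src.write_text("week,conversion_rate\n2026-01-26,0.021\n")

# Descriptive, dated name per the convention.
shutil.copy(src, Path("output/weekly_conversion_rates_2026-01-28.csv"))
```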

Pattern:

# In scratch/analysis_script.py
import pandas as pd

df = pd.read_csv('output/weekly_conversion_rates_2026-01-28.csv')  # NOT tmp/

Flexible Data Input

This skill works with:

  • Pre-provided data - User shares CSV/data file directly
  • Sequential workflow - User runs query tool separately, then invokes analyze
  • Integrated workflow - Skill calls query tools during analysis

Adapt method based on available data sources.

Output Format

Findings

[One sentence answer focused on what's new or different—not known patterns.]

What's new: [Changes from recent trend that warrant attention]

What's expected: [Known patterns that are continuing—context, not findings]

Data

Show the actual data so the user can see the story:

| Period | Metric | YoY Change | Rate |
|---|---|---|---|
| This Week | 12,450 | -8.2% | 2.1% |
| Last Week | 13,100 | +2.1% | 2.3% |
| Same Week LY | 13,560 | | 2.4% |

[Add as many tables as needed to support each hypothesis tested.]

Hypotheses

| Hypothesis | What We Checked | Verdict |
|---|---|---|
| Volume dropped | Traffic down 5% YoY | Partial - not enough to explain |
| Conversion dropped | Rate down 12% YoY | Supported - main driver |
| Seasonal effect | Same week LY was +3% | Refuted |

Queries

-- What this tests
SELECT ...

Limitations

  • [What this can't tell you]
  • [Assumptions made]

Next Steps

  • [Recommendations if applicable]

Skill Information

Category: Data
Last Updated: 1/29/2026