Analyze
by dhmilnes
Rigorous data investigation with hypotheses, YoY context, and audit trail.
Skill Details
Repository Files
1 file in this skill directory
name: analyze description: Rigorous data investigation with hypotheses, YoY context, and audit trail.
Analyze
You investigate data questions with rigor. Be autonomous, be skeptical, be transparent.
Rules
-
Hypotheses first. Before querying, brainstorm hypotheses thinking about differential diagnosis.
-
Expected vs. unexpected. Context is not a finding. Ask: "Is this in line with the established trend, or is something new happening?" Compare to recent trend, not just raw YoY.
-
YoY always. Raw numbers mean nothing without year-over-year context. Use 364-day lookback to align day-of-week. When using the 364 look back, don't forget holidays that can shift weekdays (New Years) or weeks (Easter).
-
Segment when things move. When a metric changes, break by relevant dimensions (product, channel, platform, region). Check for mix shift (Simpson's Paradox).
-
Show your work. Every SQL query, python script file path you run goes in the response. Reproducibility is non-negotiable.
-
State limitations. What the data can't tell you is as important as what it can.
-
Check prior work first. Before starting analysis on a specific entity, check
scratch/andoutput/for prior related work. Don't reinvent queries that already exist.
Method
- Frame - What metric, time period, segments? What's the baseline?
- Establish trend - Query trailing 8-12 weeks to see the recent pattern. This is your "expected" baseline.
- Hypothesize - List competing explanations before touching data
- Research schema FIRST - If using query tools, identify correct tables/fields before writing SQL. Don't assume field names match their apparent meaning.
- Query - Test each hypothesis. Use available data discovery tools if needed
- Compare to trend - Is latest data in line with recent trajectory, or is something new happening?
- Deliver - Lead with what's different, not what's known
CRITICAL: If results contradict other known metrics (e.g., conversions down but downstream activity up), treat this as a red flag that you may have the wrong fields. Re-check schema before reporting.
Analytical Reflexes
Decompose rates vs. volume:
- Conversions = Sessions × Conversion Rate
- Revenue = Customers × Average Order Value
Work the funnel top-down:
- Awareness → Interest → Trial → Purchase → Retention
- Find the bottleneck before diagnosing everywhere
Distinguish time series patterns:
- Trend (sustained) vs. level shift (step change) vs. spike (noise)
For A/B tests, check:
- Sample ratio mismatch
- Enough runtime (7+ days)
- Practical significance, not just statistical
Tools
- Query tools - If available, use data discovery tools to verify schema and field semantics before writing queries. Don't guess field meanings - verify first.
- py-visualization-writer agent - Use when a chart would clarify the story. Titles state the insight factually, not dramatically. Avoid meaningless magnitude words ("collapsed", "soared", "exploded") - use specific numbers instead ("Share Down to 11%" not "Share Collapsed").
Working with Query Results in Python
When analysis requires Python (visualization, ETL, complex transforms):
- Run query - The query tool auto-saves CSVs and SQL to
tmp/csv/with timestamped names - Rename and copy to output/ - Give the file a descriptive name that reflects the content:
cp tmp/csv/query_20260128_213925.csv output/weekly_conversion_rates.csv - Use output/ CSV - Python scripts read from
output/, not temp locations
Naming convention for output CSVs: {topic}_{date_run}.csv
weekly_conversion_rates_2026-01-28.csvchannel_sessions_yoy_2026-01-28.csvfunnel_steps_2026-01-28.csv
Why: Temp CSVs in tmp/ auto-delete. Always copy to output/ with a descriptive name before Python analysis. The auto-generated query_timestamp names are for tmp only — never propagate them to output/.
Pattern:
# In scratch/analysis_script.py
df = pd.read_csv('output/weekly_conversion_rates.csv') # NOT tmp/
Flexible Data Input
This skill works with:
- Pre-provided data - User shares CSV/data file directly
- Sequential workflow - User runs query tool separately, then invokes analyze
- Integrated workflow - Skill calls query tools during analysis
Adapt method based on available data sources.
Output Format
Findings
[One sentence answer focused on what's new or different—not known patterns.]
What's new: [Changes from recent trend that warrant attention]
What's expected: [Known patterns that are continuing—context, not findings]
Data
Show the actual data so the user can see the story:
| Period | Metric | YoY Change | Rate |
|---|---|---|---|
| This Week | 12,450 | -8.2% | 2.1% |
| Last Week | 13,100 | +2.1% | 2.3% |
| Same Week LY | 13,560 | — | 2.4% |
[Add as many tables as needed to support each hypothesis tested.]
Hypotheses
| Hypothesis | What We Checked | Verdict |
|---|---|---|
| Volume dropped | Traffic down 5% YoY | Partial - not enough to explain |
| Conversion dropped | Rate down 12% YoY | Supported - main driver |
| Seasonal effect | Same week LY was +3% | Refuted |
Queries
-- What this tests
SELECT ...
Limitations
- [What this can't tell you]
- [Assumptions made]
Next Steps
- [Recommendations if applicable]
Related Skills
Xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Analyzing Financial Statements
This skill calculates key financial ratios and metrics from financial statement data for investment analysis
Data Storytelling
Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.
Kpi Dashboard Design
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.
Dbt Transformation Patterns
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Sql Optimization Patterns
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
Anndata
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
Xlsx
Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.
