Analyze

by fernando-fernandez3


Data analysis with end-state first protocol. Clarify decisions before diving into data. Python/pandas focused.



---
name: analyze
description: Data analysis with end-state first protocol. Clarify decisions before diving into data. Python/pandas focused.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob, AskUserQuestion
---

Data Analysis Skill

End-state first data analysis. Know what decision you're informing before touching data.

Usage

/analyze <description>

Philosophy

END-STATE FIRST

Before any query or code, answer:

  1. What decision will this inform?
  2. What action will the user take based on the results?
  3. What format is needed? (report, dashboard, alert)
  4. Who is the audience?

Workflow

┌─────────────────────────────────────┐
│ Step 1: End-State Clarification     │
│ (Decision, action, format, audience)│
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 2: Data Source Identification  │
│ (Tables, schemas, access)           │
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 3: Schema Analysis             │
│ (Check registry, document if new)   │
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 4: Analysis Plan               │
│ (Steps, calculations, validations)  │
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 5: Implementation              │
│ (Python/pandas, iterative)          │
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 6: Results & Insights          │
│ (Formatted for audience)            │
└─────────────────────────────────────┘

Step 1: End-State Clarification

Ask these questions FIRST:

## End-State Clarification

1. **Decision:** What decision will this analysis inform?
   [Answer required]

2. **Action:** What will you do differently based on the results?
   [Answer required]

3. **Format:** How should results be presented?
   - [ ] Summary report (markdown)
   - [ ] Data table (CSV/Excel)
   - [ ] Visualization (chart type)
   - [ ] Dashboard (Tableau input)
   - [ ] Automated alert/threshold

4. **Audience:** Who will consume this?
   - [ ] Technical (show methodology)
   - [ ] Executive (high-level insights)
   - [ ] Operational (actionable items)

Step 2: Data Source Identification

Identify and document data sources:

## Data Sources

### Primary Data
- **Source:** [database/file/API]
- **Access:** [how to connect]
- **Freshness:** [real-time/daily/weekly]

### Secondary Data
- **Source:** [if needed]
- **Join key:** [how to link]

Step 3: Schema Analysis

Check schema registry first:

import json
from pathlib import Path

# Check if the schema is already documented in the registry
registry_path = Path("~/.claude/schemas/registry.json").expanduser()
if registry_path.exists():
    registry = json.loads(registry_path.read_text())
    schema = next((s for s in registry if s["name"] == "schema_name"), None)

If schema not found, document it:

## Schema: [table_name]

| Column | Type | Description | Sample Values |
|--------|------|-------------|---------------|
| col1 | string | [desc] | "val1", "val2" |
| col2 | int | [desc] | 1, 2, 3 |

**Primary Key:** [column(s)]
**Row Count:** ~[estimate]
**Gotchas:** [nulls, encoding issues, etc.]

Propose adding to registry:

Propose adding this schema to ~/.claude/schemas/[domain]/[name].json?
[Y/N]
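Drafting the schema table by hand is tedious for wide tables; a small helper can generate it from a loaded DataFrame. This is a sketch — the helper name is illustrative, and the Description column still needs to be filled in manually:

```python
import pandas as pd

def schema_markdown(df: pd.DataFrame, n_samples: int = 2) -> str:
    """Draft a markdown schema table (column, dtype, sample values) from a DataFrame."""
    lines = [
        "| Column | Type | Description | Sample Values |",
        "|--------|------|-------------|---------------|",
    ]
    for col in df.columns:
        # Show a few distinct non-null values as samples
        samples = ", ".join(repr(v) for v in df[col].dropna().unique()[:n_samples])
        lines.append(f"| {col} | {df[col].dtype} | [desc] | {samples} |")
    return "\n".join(lines)

# Toy data matching the template above
df = pd.DataFrame({"col1": ["val1", "val2"], "col2": [1, 2]})
print(schema_markdown(df))
```

Primary key, row count, and gotchas still require human judgment; only the column inventory is mechanical.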

Step 4: Analysis Plan

Write the plan before code:

## Analysis Plan

### Objective
[Restate the goal in analysis terms]

### Steps
1. Load and validate data
   - Expected rows: [X]
   - Expected columns: [list]
   - Validation: [null checks, type checks]

2. Data cleaning
   - Handle nulls: [strategy]
   - Filter conditions: [list]
   - Transformations: [list]

3. Calculations
   - Metric A: [formula]
   - Metric B: [formula]

4. Aggregations
   - Group by: [columns]
   - Aggregations: [sum, avg, count, etc.]

5. Output
   - Format: [as specified in Step 1]
   - Destination: [file, stdout, etc.]

Step 5: Implementation

Python/pandas focused. Iterate and validate.

Code Structure

import pandas as pd
from pathlib import Path

def load_data(source: str) -> pd.DataFrame:
    """Load and validate source data."""
    df = pd.read_csv(source)  # or appropriate loader

    # Validation
    assert len(df) > 0, "No data loaded"
    expected_cols = ["col1", "col2", "col3"]
    missing = set(expected_cols) - set(df.columns)
    assert not missing, f"Missing columns: {missing}"

    return df

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply cleaning transformations."""
    return (df
        .dropna(subset=["required_col"])
        .query("status == 'active'")
        .assign(date=lambda x: pd.to_datetime(x["date_str"]))
    )

def analyze(df: pd.DataFrame) -> pd.DataFrame:
    """Core analysis logic."""
    return (df
        .groupby("category")
        .agg(
            total=("value", "sum"),
            count=("id", "count"),
            avg=("value", "mean")
        )
        .sort_values("total", ascending=False)
    )

def main():
    raw = load_data("data/source.csv")
    clean = clean_data(raw)
    results = analyze(clean)

    # Output based on format requirement
    print(results.to_markdown())
    Path("output").mkdir(exist_ok=True)  # to_csv fails if the directory is missing
    results.to_csv("output/results.csv")  # keep the index: it holds the groupby keys

if __name__ == "__main__":
    main()

Iterative Development

1. Load small sample, verify shape
2. Apply one transformation, verify
3. Check intermediate results
4. Full pipeline on sample
5. Full pipeline on full data
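The sampling steps above can be sketched with `nrows` on the loader. The CSV here is an in-memory stand-in (via `io.StringIO`) so the sketch runs without a file on disk; swap in the real source from Step 2:

```python
import io
import pandas as pd

# Stand-in for a large source file; the real path comes from Step 2.
csv_data = io.StringIO(
    "id,required_col,value\n"
    "1,a,10\n"
    "2,,20\n"
    "3,b,30\n"
)

# 1. Load a small sample first and verify the shape.
sample = pd.read_csv(csv_data, nrows=1000)  # nrows caps the read on big files
print(sample.shape)

# 2. Apply one transformation, then verify what it changed.
step = sample.dropna(subset=["required_col"])
print(f"dropped {len(sample) - len(step)} rows with null required_col")

# 3. Inspect intermediate results before chaining further steps.
print(step.head())
```

Only after each step checks out on the sample should the full pipeline run on full data.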

Step 6: Results & Insights

Format for audience specified in Step 1.

Technical Report

## Analysis: [Title]

### Methodology
[Steps taken, assumptions]

### Data Quality Notes
[Issues found, how handled]

### Results
[Tables, charts]

### Statistical Validity
[Confidence, limitations]

Executive Summary

## [Title]

**Key Finding:** [One sentence]

**Recommendation:** [Action to take]

**Supporting Data:**
- Metric 1: [value]
- Metric 2: [value]

**Next Steps:**
1. [Action]

Tableau Prep

If the user needs to port results to Tableau:

Output: analysis_output.csv

Columns for Tableau:
- dimension_cols: [list]
- measure_cols: [list]
- date_cols: [list]

Suggested chart types:
- [metric] → bar chart by [dimension]
- [trend] → line chart over [date]
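A sketch of the export step, with illustrative column names: Tableau infers field roles from column types, so cast date columns explicitly and keep dimensions as strings before writing the CSV.

```python
import pandas as pd

# Hypothetical analysis output; column names are illustrative.
results = pd.DataFrame({
    "category": ["A", "B"],                 # dimension
    "week": ["2024-01-01", "2024-01-08"],   # date
    "total": [120.0, 95.5],                 # measure
})

# Explicit datetime dtype so Tableau recognizes the date field.
results["week"] = pd.to_datetime(results["week"])
results.to_csv("analysis_output.csv", index=False)
```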

Schema Registry

Location: ~/.claude/schemas/

Structure:

schemas/
├── registry.json      # Index of all schemas
└── [domain]/
    └── [schema].json

Schema format:

{
  "name": "schema_name",
  "domain": "domain_name",
  "source": "database.table",
  "last_updated": "2024-01-15",
  "tables": {
    "table_name": {
      "columns": {
        "col1": {
          "type": "string",
          "description": "Description",
          "nullable": false
        }
      },
      "primaryKey": ["col1"],
      "rowEstimate": 1000000,
      "sampleQueries": ["SELECT * FROM table LIMIT 10"]
    }
  }
}
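Registering a documented schema can be scripted. A minimal sketch, assuming the registry index is a JSON list of `{name, domain, path}` entries matching the on-disk layout shown above:

```python
import json
from pathlib import Path

def register_schema(schema: dict, base: Path) -> None:
    """Write <base>/<domain>/<name>.json and index it in <base>/registry.json."""
    schema_dir = base / schema["domain"]
    schema_dir.mkdir(parents=True, exist_ok=True)
    (schema_dir / f"{schema['name']}.json").write_text(json.dumps(schema, indent=2))

    registry_path = base / "registry.json"
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else []
    # Replace any existing entry with the same name, then append the new one.
    registry = [s for s in registry if s["name"] != schema["name"]]
    registry.append({
        "name": schema["name"],
        "domain": schema["domain"],
        "path": f"{schema['domain']}/{schema['name']}.json",
    })
    registry_path.write_text(json.dumps(registry, indent=2))
```

Called with `Path("~/.claude/schemas").expanduser()` as `base`, this is idempotent: re-registering a schema updates its entry rather than duplicating it.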

Expert Mode Integration

Auto-injected expertise for:

  • Tesla/autodiag work: Diagnostic data, alert patterns, vehicle telemetry
  • Generic data: Statistical best practices, pandas optimization

Example

/analyze "What's the trend in autodiag detection rate this quarter?"

Step 1 - End-State:
- Decision: Prioritize improvements to low-detection rules
- Action: File tickets for bottom 10 performing rules
- Format: Report with trend chart
- Audience: Technical (show methodology)

Step 2 - Data Source:
- autodiag_runs table (BigQuery)
- alerts_triggered table

Step 3 - Schema:
- Check ~/.claude/schemas/autodiag/runs.json
- Document if not present

Step 4 - Plan:
- Load 90 days of data
- Calculate detection_rate = alerts_detected / total_runs
- Group by week, rule_name
- Trend line over time

Step 5 - Implementation:
[Python code]

Step 6 - Results:
## Detection Rate Trend Q4 2024
[Chart]

Key Finding: Detection rate dropped 12% in December
Top 5 declining rules: [list]
Recommendation: Investigate rule changes in Dec release
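The detection-rate calculation planned in Step 4 can be sketched in pandas. The column names (`run_date`, `rule_name`, `alert_detected`) are assumptions standing in for the real autodiag_runs schema:

```python
import pandas as pd

# Hypothetical run-level data; the real source is the autodiag_runs table.
runs = pd.DataFrame({
    "run_date": pd.to_datetime(["2024-10-07", "2024-10-07", "2024-12-02", "2024-12-02"]),
    "rule_name": ["rule_a", "rule_a", "rule_a", "rule_a"],
    "alert_detected": [1, 1, 1, 0],
})

# detection_rate = alerts_detected / total_runs, grouped by week and rule.
weekly = (runs
    .assign(week=runs["run_date"].dt.to_period("W").dt.start_time)
    .groupby(["week", "rule_name"])
    .agg(total_runs=("alert_detected", "size"),
         alerts_detected=("alert_detected", "sum"))
    .assign(detection_rate=lambda d: d["alerts_detected"] / d["total_runs"])
    .reset_index()
)
print(weekly)
```

The weekly `detection_rate` column then feeds the trend chart directly.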

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modifying existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas


Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.


Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis


Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.


Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.


Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.


Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.


Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.



Skill Information

Category: Data
Allowed Tools: Read, Write, Edit, Bash, Grep, Glob, AskUserQuestion
Last Updated: 1/25/2026