Analyze

by fernando-fernandez3


Data analysis with end-state first protocol. Clarify decisions before diving into data. Python/pandas focused.



---
name: analyze
description: Data analysis with end-state first protocol. Clarify decisions before diving into data. Python/pandas focused.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob, AskUserQuestion
---

Data Analysis Skill

End-state first data analysis. Know what decision you're informing before touching data.

Usage

/analyze <description>

Philosophy

END-STATE FIRST

Before any query or code, answer:

  1. What decision will this inform?
  2. What action will the user take based on the results?
  3. What format is needed? (report, dashboard, alert)
  4. Who is the audience?

Workflow

┌─────────────────────────────────────┐
│ Step 1: End-State Clarification     │
│ (Decision, action, format, audience)│
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 2: Data Source Identification  │
│ (Tables, schemas, access)           │
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 3: Schema Analysis             │
│ (Check registry, document if new)   │
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 4: Analysis Plan               │
│ (Steps, calculations, validations)  │
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 5: Implementation              │
│ (Python/pandas, iterative)          │
└─────────────────┬───────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 6: Results & Insights          │
│ (Formatted for audience)            │
└─────────────────────────────────────┘

Step 1: End-State Clarification

Ask these questions FIRST:

## End-State Clarification

1. **Decision:** What decision will this analysis inform?
   [Answer required]

2. **Action:** What will you do differently based on the results?
   [Answer required]

3. **Format:** How should results be presented?
   - [ ] Summary report (markdown)
   - [ ] Data table (CSV/Excel)
   - [ ] Visualization (chart type)
   - [ ] Dashboard (Tableau input)
   - [ ] Automated alert/threshold

4. **Audience:** Who will consume this?
   - [ ] Technical (show methodology)
   - [ ] Executive (high-level insights)
   - [ ] Operational (actionable items)

Step 2: Data Source Identification

Identify and document data sources:

## Data Sources

### Primary Data
- **Source:** [database/file/API]
- **Access:** [how to connect]
- **Freshness:** [real-time/daily/weekly]

### Secondary Data
- **Source:** [if needed]
- **Join key:** [how to link]

Step 3: Schema Analysis

Check schema registry first:

import json
from pathlib import Path

# Check if the schema is already documented in the registry
registry_path = Path("~/.claude/schemas/registry.json").expanduser()
if registry_path.exists():
    registry = json.loads(registry_path.read_text())
    schema = next((s for s in registry if s["name"] == "schema_name"), None)

If schema not found, document it:

## Schema: [table_name]

| Column | Type | Description | Sample Values |
|--------|------|-------------|---------------|
| col1 | string | [desc] | "val1", "val2" |
| col2 | int | [desc] | 1, 2, 3 |

**Primary Key:** [column(s)]
**Row Count:** ~[estimate]
**Gotchas:** [nulls, encoding issues, etc.]

Propose adding to registry:

Propose adding this schema to ~/.claude/schemas/[domain]/[name].json?
[Y/N]
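Drafting the schema table by hand is tedious for wide tables; a small helper can generate it from a loaded DataFrame. This is a sketch — the helper name is illustrative, and the Description column still needs to be filled in manually:

```python
import pandas as pd

def schema_markdown(df: pd.DataFrame, n_samples: int = 2) -> str:
    """Draft a markdown schema table (column, dtype, sample values) from a DataFrame."""
    lines = [
        "| Column | Type | Description | Sample Values |",
        "|--------|------|-------------|---------------|",
    ]
    for col in df.columns:
        # Show a few distinct non-null values as samples
        samples = ", ".join(repr(v) for v in df[col].dropna().unique()[:n_samples])
        lines.append(f"| {col} | {df[col].dtype} | [desc] | {samples} |")
    return "\n".join(lines)

# Toy data matching the template above
df = pd.DataFrame({"col1": ["val1", "val2"], "col2": [1, 2]})
print(schema_markdown(df))
```

Primary key, row count, and gotchas still require human judgment; only the column inventory is mechanical.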

Step 4: Analysis Plan

Write the plan before code:

## Analysis Plan

### Objective
[Restate the goal in analysis terms]

### Steps
1. Load and validate data
   - Expected rows: [X]
   - Expected columns: [list]
   - Validation: [null checks, type checks]

2. Data cleaning
   - Handle nulls: [strategy]
   - Filter conditions: [list]
   - Transformations: [list]

3. Calculations
   - Metric A: [formula]
   - Metric B: [formula]

4. Aggregations
   - Group by: [columns]
   - Aggregations: [sum, avg, count, etc.]

5. Output
   - Format: [as specified in Step 1]
   - Destination: [file, stdout, etc.]

Step 5: Implementation

Python/pandas focused. Iterate and validate.

Code Structure

import pandas as pd
from pathlib import Path

def load_data(source: str) -> pd.DataFrame:
    """Load and validate source data."""
    df = pd.read_csv(source)  # or appropriate loader

    # Validation
    assert len(df) > 0, "No data loaded"
    expected_cols = ["col1", "col2", "col3"]
    missing = set(expected_cols) - set(df.columns)
    assert not missing, f"Missing columns: {missing}"

    return df

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply cleaning transformations."""
    return (df
        .dropna(subset=["required_col"])
        .query("status == 'active'")
        .assign(date=lambda x: pd.to_datetime(x["date_str"]))
    )

def analyze(df: pd.DataFrame) -> pd.DataFrame:
    """Core analysis logic."""
    return (df
        .groupby("category")
        .agg(
            total=("value", "sum"),
            count=("id", "count"),
            avg=("value", "mean")
        )
        .sort_values("total", ascending=False)
    )

def main():
    raw = load_data("data/source.csv")
    clean = clean_data(raw)
    results = analyze(clean)

    # Output based on format requirement
    print(results.to_markdown())
    Path("output").mkdir(exist_ok=True)  # to_csv fails if the directory is missing
    results.to_csv("output/results.csv")  # keep the index: it holds the groupby keys

if __name__ == "__main__":
    main()

Iterative Development

1. Load small sample, verify shape
2. Apply one transformation, verify
3. Check intermediate results
4. Full pipeline on sample
5. Full pipeline on full data
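The sampling steps above can be sketched with `nrows` on the loader. The CSV here is an in-memory stand-in (via `io.StringIO`) so the sketch runs without a file on disk; swap in the real source from Step 2:

```python
import io
import pandas as pd

# Stand-in for a large source file; the real path comes from Step 2.
csv_data = io.StringIO(
    "id,required_col,value\n"
    "1,a,10\n"
    "2,,20\n"
    "3,b,30\n"
)

# 1. Load a small sample first and verify the shape.
sample = pd.read_csv(csv_data, nrows=1000)  # nrows caps the read on big files
print(sample.shape)

# 2. Apply one transformation, then verify what it changed.
step = sample.dropna(subset=["required_col"])
print(f"dropped {len(sample) - len(step)} rows with null required_col")

# 3. Inspect intermediate results before chaining further steps.
print(step.head())
```

Only after each step checks out on the sample should the full pipeline run on full data.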

Step 6: Results & Insights

Format for audience specified in Step 1.

Technical Report

## Analysis: [Title]

### Methodology
[Steps taken, assumptions]

### Data Quality Notes
[Issues found, how handled]

### Results
[Tables, charts]

### Statistical Validity
[Confidence, limitations]

Executive Summary

## [Title]

**Key Finding:** [One sentence]

**Recommendation:** [Action to take]

**Supporting Data:**
- Metric 1: [value]
- Metric 2: [value]

**Next Steps:**
1. [Action]

Tableau Prep

If the user needs to port results to Tableau:

Output: analysis_output.csv

Columns for Tableau:
- dimension_cols: [list]
- measure_cols: [list]
- date_cols: [list]

Suggested chart types:
- [metric] → bar chart by [dimension]
- [trend] → line chart over [date]
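A sketch of the export step, with illustrative column names: Tableau infers field roles from column types, so cast date columns explicitly and keep dimensions as strings before writing the CSV.

```python
import pandas as pd

# Hypothetical analysis output; column names are illustrative.
results = pd.DataFrame({
    "category": ["A", "B"],                 # dimension
    "week": ["2024-01-01", "2024-01-08"],   # date
    "total": [120.0, 95.5],                 # measure
})

# Explicit datetime dtype so Tableau recognizes the date field.
results["week"] = pd.to_datetime(results["week"])
results.to_csv("analysis_output.csv", index=False)
```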

Schema Registry

Location: ~/.claude/schemas/

Structure:

schemas/
├── registry.json      # Index of all schemas
└── [domain]/
    └── [schema].json

Schema format:

{
  "name": "schema_name",
  "domain": "domain_name",
  "source": "database.table",
  "last_updated": "2024-01-15",
  "tables": {
    "table_name": {
      "columns": {
        "col1": {
          "type": "string",
          "description": "Description",
          "nullable": false
        }
      },
      "primaryKey": ["col1"],
      "rowEstimate": 1000000,
      "sampleQueries": ["SELECT * FROM table LIMIT 10"]
    }
  }
}
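Registering a documented schema can be scripted. A minimal sketch, assuming the registry index is a JSON list of `{name, domain, path}` entries matching the on-disk layout shown above:

```python
import json
from pathlib import Path

def register_schema(schema: dict, base: Path) -> None:
    """Write <base>/<domain>/<name>.json and index it in <base>/registry.json."""
    schema_dir = base / schema["domain"]
    schema_dir.mkdir(parents=True, exist_ok=True)
    (schema_dir / f"{schema['name']}.json").write_text(json.dumps(schema, indent=2))

    registry_path = base / "registry.json"
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else []
    # Replace any existing entry with the same name, then append the new one.
    registry = [s for s in registry if s["name"] != schema["name"]]
    registry.append({
        "name": schema["name"],
        "domain": schema["domain"],
        "path": f"{schema['domain']}/{schema['name']}.json",
    })
    registry_path.write_text(json.dumps(registry, indent=2))
```

Called with `Path("~/.claude/schemas").expanduser()` as `base`, this is idempotent: re-registering a schema updates its entry rather than duplicating it.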

Expert Mode Integration

Auto-injected expertise for:

  • Tesla/autodiag work: Diagnostic data, alert patterns, vehicle telemetry
  • Generic data: Statistical best practices, pandas optimization

Example

/analyze "What's the trend in autodiag detection rate this quarter?"

Step 1 - End-State:
- Decision: Prioritize improvements to low-detection rules
- Action: File tickets for bottom 10 performing rules
- Format: Report with trend chart
- Audience: Technical (show methodology)

Step 2 - Data Source:
- autodiag_runs table (BigQuery)
- alerts_triggered table

Step 3 - Schema:
- Check ~/.claude/schemas/autodiag/runs.json
- Document if not present

Step 4 - Plan:
- Load 90 days of data
- Calculate detection_rate = alerts_detected / total_runs
- Group by week, rule_name
- Trend line over time

Step 5 - Implementation:
[Python code]

Step 6 - Results:
## Detection Rate Trend Q4 2024
[Chart]

Key Finding: Detection rate dropped 12% in December
Top 5 declining rules: [list]
Recommendation: Investigate rule changes in Dec release
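The detection-rate calculation planned in Step 4 can be sketched in pandas. The column names (`run_date`, `rule_name`, `alert_detected`) are assumptions standing in for the real autodiag_runs schema:

```python
import pandas as pd

# Hypothetical run-level data; the real source is the autodiag_runs table.
runs = pd.DataFrame({
    "run_date": pd.to_datetime(["2024-10-07", "2024-10-07", "2024-12-02", "2024-12-02"]),
    "rule_name": ["rule_a", "rule_a", "rule_a", "rule_a"],
    "alert_detected": [1, 1, 1, 0],
})

# detection_rate = alerts_detected / total_runs, grouped by week and rule.
weekly = (runs
    .assign(week=runs["run_date"].dt.to_period("W").dt.start_time)
    .groupby(["week", "rule_name"])
    .agg(total_runs=("alert_detected", "size"),
         alerts_detected=("alert_detected", "sum"))
    .assign(detection_rate=lambda d: d["alerts_detected"] / d["total_runs"])
    .reset_index()
)
print(weekly)
```

The weekly `detection_rate` column then feeds the trend chart directly.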

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modifying existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas


Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.


Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis


Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.


Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.


Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.


Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.


Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.



Skill Information

Category: Data
Allowed Tools: Read, Write, Edit, Bash, Grep, Glob, AskUserQuestion
Last Updated: 1/25/2026