Analyze
by fernando-fernandez3
Data analysis with end-state first protocol. Clarify decisions before diving into data. Python/pandas focused.
---
name: analyze
description: Data analysis with end-state first protocol. Clarify decisions before diving into data. Python/pandas focused.
allowed-tools: Read, Write, Edit, Bash, Grep, Glob, AskUserQuestion
---
Data Analysis Skill
End-state first data analysis. Know what decision you're informing before touching data.
Usage
/analyze <description>
Philosophy
END-STATE FIRST
Before any query or code, answer:
- What decision will this inform?
- What action will user take based on results?
- What format is needed? (report, dashboard, alert)
- Who is the audience?
Workflow
┌─────────────────────────────────────┐
│ Step 1: End-State Clarification │
│ (Decision, action, format, audience)│
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 2: Data Source Identification │
│ (Tables, schemas, access) │
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 3: Schema Analysis │
│ (Check registry, document if new) │
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 4: Analysis Plan │
│ (Steps, calculations, validations) │
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 5: Implementation │
│ (Python/pandas, iterative) │
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 6: Results & Insights │
│ (Formatted for audience) │
└─────────────────────────────────────┘
Step 1: End-State Clarification
Ask these questions FIRST:
## End-State Clarification
1. **Decision:** What decision will this analysis inform?
[Answer required]
2. **Action:** What will you do differently based on the results?
[Answer required]
3. **Format:** How should results be presented?
- [ ] Summary report (markdown)
- [ ] Data table (CSV/Excel)
- [ ] Visualization (chart type)
- [ ] Dashboard (Tableau input)
- [ ] Automated alert/threshold
4. **Audience:** Who will consume this?
- [ ] Technical (show methodology)
- [ ] Executive (high-level insights)
- [ ] Operational (actionable items)
Step 2: Data Source Identification
Identify and document data sources:
## Data Sources
### Primary Data
- **Source:** [database/file/API]
- **Access:** [how to connect]
- **Freshness:** [real-time/daily/weekly]
### Secondary Data
- **Source:** [if needed]
- **Join key:** [how to link]
Step 3: Schema Analysis
Check the schema registry first:

```python
import json
from pathlib import Path

# Look up the schema in the registry before re-documenting it
registry_path = Path("~/.claude/schemas/registry.json").expanduser()
if registry_path.exists():
    registry = json.loads(registry_path.read_text())
    schema = next((s for s in registry if s["name"] == "schema_name"), None)
```
If the schema is not found, document it:
## Schema: [table_name]
| Column | Type | Description | Sample Values |
|--------|------|-------------|---------------|
| col1 | string | [desc] | "val1", "val2" |
| col2 | int | [desc] | 1, 2, 3 |
**Primary Key:** [column(s)]
**Row Count:** ~[estimate]
**Gotchas:** [nulls, encoding issues, etc.]
Propose adding to registry:
Propose adding this schema to ~/.claude/schemas/[domain]/[name].json?
[Y/N]
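One way to bootstrap the schema table above is to derive it from a loaded sample rather than typing it by hand. A minimal sketch (the frame contents here are hypothetical; descriptions still need to be filled in manually):

```python
import pandas as pd

# Hypothetical frame standing in for freshly loaded data
df = pd.DataFrame({"col1": ["val1", "val2"], "col2": [1, 2]})

# Derive one schema-table row per column from dtypes and sample values
rows = [
    {"Column": col,
     "Type": str(df[col].dtype),
     "Sample Values": ", ".join(map(str, df[col].head(2)))}
    for col in df.columns
]

print("| Column | Type | Description | Sample Values |")
print("|--------|------|-------------|---------------|")
for r in rows:
    print(f"| {r['Column']} | {r['Type']} | [desc] | {r['Sample Values']} |")
```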
Step 4: Analysis Plan
Write the plan before code:
## Analysis Plan
### Objective
[Restate the goal in analysis terms]
### Steps
1. Load and validate data
- Expected rows: [X]
- Expected columns: [list]
- Validation: [null checks, type checks]
2. Data cleaning
- Handle nulls: [strategy]
- Filter conditions: [list]
- Transformations: [list]
3. Calculations
- Metric A: [formula]
- Metric B: [formula]
4. Aggregations
- Group by: [columns]
- Aggregations: [sum, avg, count, etc.]
5. Output
- Format: [as specified in Step 1]
- Destination: [file, stdout, etc.]
Step 5: Implementation
Python/pandas focused. Iterate and validate.
Code Structure
```python
import pandas as pd
from pathlib import Path


def load_data(source: str) -> pd.DataFrame:
    """Load and validate source data."""
    df = pd.read_csv(source)  # or appropriate loader

    # Validation
    assert len(df) > 0, "No data loaded"
    expected_cols = ["col1", "col2", "col3"]
    missing = set(expected_cols) - set(df.columns)
    assert not missing, f"Missing columns: {missing}"
    return df


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply cleaning transformations."""
    return (df
        .dropna(subset=["required_col"])
        .query("status == 'active'")
        .assign(date=lambda x: pd.to_datetime(x["date_str"]))
    )


def analyze(df: pd.DataFrame) -> pd.DataFrame:
    """Core analysis logic."""
    return (df
        .groupby("category")
        .agg(
            total=("value", "sum"),
            count=("id", "count"),
            avg=("value", "mean"),
        )
        .sort_values("total", ascending=False)
    )


def main():
    raw = load_data("data/source.csv")
    clean = clean_data(raw)
    results = analyze(clean)

    # Output based on format requirement
    print(results.to_markdown())
    Path("output").mkdir(exist_ok=True)
    # Reset the index so the "category" group key survives index=False
    results.reset_index().to_csv("output/results.csv", index=False)


if __name__ == "__main__":
    main()
```
Iterative Development
1. Load small sample, verify shape
2. Apply one transformation, verify
3. Check intermediate results
4. Full pipeline on sample
5. Full pipeline on full data
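The first three steps can be sketched with pandas' `nrows` option; the data here is inlined as a stand-in for a large source file:

```python
import io
import pandas as pd

# Stand-in for a large source file (contents hypothetical)
csv = io.StringIO(
    "id,category,value,status\n"
    "1,a,10,active\n"
    "2,b,5,inactive\n"
    "3,a,7,active\n"
)

# 1. Load a small sample and verify shape
sample = pd.read_csv(csv, nrows=2)
assert sample.shape == (2, 4)

# 2. Apply one transformation and verify it did what you expected
active = sample.query("status == 'active'")
assert (active["status"] == "active").all()

# 3. Check an intermediate result before building on it
print(active.groupby("category")["value"].sum())
```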
Step 6: Results & Insights
Format results for the audience specified in Step 1.
Technical Report
## Analysis: [Title]
### Methodology
[Steps taken, assumptions]
### Data Quality Notes
[Issues found, how handled]
### Results
[Tables, charts]
### Statistical Validity
[Confidence, limitations]
Executive Summary
## [Title]
**Key Finding:** [One sentence]
**Recommendation:** [Action to take]
**Supporting Data:**
- Metric 1: [value]
- Metric 2: [value]
**Next Steps:**
1. [Action]
Tableau Prep
If the user needs to port results to Tableau:
Output: analysis_output.csv
Columns for Tableau:
- dimension_cols: [list]
- measure_cols: [list]
- date_cols: [list]
Suggested chart types:
- [metric] → bar chart by [dimension]
- [trend] → line chart over [date]
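When porting, a wide-to-long `melt` often produces the most Tableau-friendly CSV. A sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical wide results: one row per dimension value, one column per measure
wide = pd.DataFrame({
    "region": ["east", "west"],
    "revenue": [100, 80],
    "orders": [10, 8],
})

# Long format: one (dimension, measure, value) row per cell, which Tableau
# can pivot and filter more flexibly than fixed wide columns
long = wide.melt(id_vars=["region"], var_name="measure", value_name="value")
long.to_csv("analysis_output.csv", index=False)
```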
Schema Registry
Location: ~/.claude/schemas/
Structure:
schemas/
├── registry.json        # Index of all schemas
└── [domain]/
    └── [schema].json
Schema format:
```json
{
  "name": "schema_name",
  "domain": "domain_name",
  "source": "database.table",
  "last_updated": "2024-01-15",
  "tables": {
    "table_name": {
      "columns": {
        "col1": {
          "type": "string",
          "description": "Description",
          "nullable": false
        }
      },
      "primaryKey": ["col1"],
      "rowEstimate": 1000000,
      "sampleQueries": ["SELECT * FROM table LIMIT 10"]
    }
  }
}
```
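Registering a new schema under this layout might look like the following sketch. The `orders`/`sales` names are placeholders, and the registry is assumed to be a JSON list of entries with a `name` key:

```python
import json
from pathlib import Path

# Hypothetical schema entry following the format above
schema = {"name": "orders", "domain": "sales", "source": "warehouse.orders"}

base = Path("~/.claude/schemas").expanduser()
domain_dir = base / schema["domain"]
domain_dir.mkdir(parents=True, exist_ok=True)

# Write the per-schema file
(domain_dir / f"{schema['name']}.json").write_text(json.dumps(schema, indent=2))

# Add an index entry to registry.json, creating the index if missing
registry_path = base / "registry.json"
registry = json.loads(registry_path.read_text()) if registry_path.exists() else []
if not any(s["name"] == schema["name"] for s in registry):
    registry.append({"name": schema["name"],
                     "path": f"{schema['domain']}/{schema['name']}.json"})
registry_path.write_text(json.dumps(registry, indent=2))
```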
Expert Mode Integration
Auto-injected expertise for:
- Tesla/autodiag work: Diagnostic data, alert patterns, vehicle telemetry
- Generic data: Statistical best practices, pandas optimization
Example
/analyze "What's the trend in autodiag detection rate this quarter?"
Step 1 - End-State:
- Decision: Prioritize improvements to low-detection rules
- Action: File tickets for bottom 10 performing rules
- Format: Report with trend chart
- Audience: Technical (show methodology)
Step 2 - Data Source:
- autodiag_runs table (BigQuery)
- alerts_triggered table
Step 3 - Schema:
- Check ~/.claude/schemas/autodiag/runs.json
- Document if not present
Step 4 - Plan:
- Load 90 days of data
- Calculate detection_rate = alerts_detected / total_runs
- Group by week, rule_name
- Trend line over time
Step 5 - Implementation:
[Python code]
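Step 4's plan translates roughly to the following sketch; the column names (`total_runs`, `alerts_detected`) are assumptions based on the plan, and the data is a hypothetical extract joining the two tables:

```python
import pandas as pd

# Hypothetical extract joining autodiag_runs with alerts_triggered
runs = pd.DataFrame({
    "run_date": pd.to_datetime(["2024-10-07", "2024-10-08", "2024-10-14"]),
    "rule_name": ["rule_a", "rule_a", "rule_b"],
    "total_runs": [100, 120, 90],
    "alerts_detected": [40, 42, 9],
})

# detection_rate = alerts_detected / total_runs, grouped by week and rule
weekly = (runs
    .assign(week=runs["run_date"].dt.to_period("W").dt.start_time)
    .groupby(["week", "rule_name"], as_index=False)
    .agg(total_runs=("total_runs", "sum"),
         alerts_detected=("alerts_detected", "sum"))
    .assign(detection_rate=lambda d: d["alerts_detected"] / d["total_runs"])
)
print(weekly)
```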
Step 6 - Results:
## Detection Rate Trend Q4 2024
[Chart]
Key Finding: Detection rate dropped 12% in December
Top 5 declining rules: [list]
Recommendation: Investigate rule changes in Dec release
