Data Analyst

by borghei


Expert data analysis covering SQL, visualization, statistical analysis, business intelligence, and data storytelling.


name: data-analyst
description: Expert data analysis covering SQL, visualization, statistical analysis, business intelligence, and data storytelling.
version: 1.0.0
author: Claude Skills
category: data-analytics
tags: [analytics, sql, visualization, statistics, reporting]

Data Analyst

Expert-level data analysis for business insights.

Core Competencies

  • SQL and database querying
  • Data visualization
  • Statistical analysis
  • Business intelligence
  • Data storytelling
  • Dashboard development
  • Reporting automation
  • Stakeholder communication

SQL Mastery

Query Patterns

Aggregation:

SELECT
    date_trunc('month', created_at) as month,
    COUNT(*) as total_orders,
    COUNT(DISTINCT customer_id) as unique_customers,
    SUM(amount) as total_revenue,
    AVG(amount) as avg_order_value
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY 1
ORDER BY 1;

Window Functions:

SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as order_number,
    LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as previous_order
FROM orders;
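The same window computations translate directly to pandas group operations; a sketch with illustrative data (column names mirror the SQL above):

```python
import pandas as pd

# Illustrative orders data.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-03-01", "2024-01-20", "2024-02-15"]
    ),
    "amount": [100.0, 150.0, 50.0, 200.0, 75.0],
})

# Sorting first reproduces the ORDER BY inside each window.
orders = orders.sort_values(["customer_id", "order_date"])
grouped = orders.groupby("customer_id")

orders["running_total"] = grouped["amount"].cumsum()    # SUM(...) OVER (...)
orders["order_number"] = grouped.cumcount() + 1         # ROW_NUMBER()
orders["previous_order"] = grouped["amount"].shift(1)   # LAG(amount)
```

`shift(1)` leaves the first order per customer as NaN, matching LAG's NULL for the first row in each partition.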

CTEs for Clarity:

WITH monthly_metrics AS (
    SELECT
        date_trunc('month', created_at) as month,
        SUM(amount) as revenue
    FROM orders
    GROUP BY 1
),
growth_calc AS (
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_revenue
    FROM monthly_metrics
)
SELECT
    month,
    revenue,
    ROUND((revenue - prev_revenue) * 100.0 / NULLIF(prev_revenue, 0), 1) as growth_pct
FROM growth_calc;

Cohort Analysis:

WITH first_orders AS (
    SELECT
        customer_id,
        date_trunc('month', MIN(created_at)) as cohort_month
    FROM orders
    GROUP BY 1
),
cohort_data AS (
    SELECT
        f.cohort_month,
        date_trunc('month', o.created_at) as order_month,
        COUNT(DISTINCT o.customer_id) as customers
    FROM orders o
    JOIN first_orders f ON o.customer_id = f.customer_id
    GROUP BY 1, 2
)
SELECT
    cohort_month,
    order_month,
    (EXTRACT(YEAR FROM AGE(order_month, cohort_month)) * 12
     + EXTRACT(MONTH FROM AGE(order_month, cohort_month))) as months_since_cohort,
    customers
FROM cohort_data
ORDER BY 1, 2;
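Downstream, the cohort output is typically pivoted into a retention matrix; a pandas sketch assuming the query result has been loaded into a DataFrame (values illustrative):

```python
import pandas as pd

# Illustrative result of the cohort query above.
cohort_data = pd.DataFrame({
    "cohort_month": ["2024-01", "2024-01", "2024-01", "2024-02", "2024-02"],
    "months_since_cohort": [0, 1, 2, 0, 1],
    "customers": [100, 60, 45, 120, 80],
})

# Rows = cohorts, columns = months since first order.
matrix = cohort_data.pivot(
    index="cohort_month", columns="months_since_cohort", values="customers"
)

# Normalize by cohort size (month 0) to get retention rates.
retention = matrix.div(matrix[0], axis=0)
```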

Query Optimization

Use EXPLAIN:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;

Best Practices:

  • Use indexes on filtered columns
  • Avoid SELECT * in production
  • Use LIMIT for exploratory queries
  • Filter early, aggregate late
  • Use appropriate data types

Data Visualization

Chart Selection Guide

| Data Type       | Best Chart   | Alternative  |
|-----------------|--------------|--------------|
| Trend over time | Line chart   | Area chart   |
| Part of whole   | Pie/Donut    | Stacked bar  |
| Comparison      | Bar chart    | Column chart |
| Distribution    | Histogram    | Box plot     |
| Correlation     | Scatter plot | Heatmap      |
| Geographic      | Map          | Choropleth   |

Visualization Best Practices

Do:

  • Start Y-axis at zero (for bars)
  • Use consistent colors
  • Label axes clearly
  • Include context (benchmarks, targets)
  • Order categories meaningfully

Don't:

  • Use 3D charts
  • Use more than 5-7 colors
  • Truncate axes misleadingly
  • Clutter with gridlines
  • Use pie charts for many categories
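Several of the "Do" rules above can be sketched in matplotlib (zero-based axis, one consistent color, labeled axes, meaningful ordering, a benchmark for context); all data and the target value are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted report generation
import matplotlib.pyplot as plt

segments = {"Enterprise": 420, "SMB": 310, "Consumer": 180}  # illustrative

# Order categories meaningfully: largest to smallest.
items = sorted(segments.items(), key=lambda kv: kv[1], reverse=True)
labels, values = zip(*items)

fig, ax = plt.subplots()
ax.bar(labels, values, color="#4C72B0")  # one consistent color
ax.set_ylim(bottom=0)                    # bars start at zero
ax.set_xlabel("Customer segment")
ax.set_ylabel("Revenue ($K)")
ax.set_title("Revenue by segment vs. target")
ax.axhline(400, linestyle="--", color="gray", label="Target")  # context
ax.legend()
fig.savefig("revenue_by_segment.png", dpi=150)
```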

Dashboard Layout

┌─────────────────────────────────────────────────────────────┐
│  EXECUTIVE SUMMARY                                          │
│  [KPI 1: $X]  [KPI 2: X%]  [KPI 3: X]  [KPI 4: X%]         │
├─────────────────────────────────────────────────────────────┤
│  TRENDS                          │  BREAKDOWN               │
│  [Line Chart - Primary Metric]   │  [Bar Chart - Segments]  │
│                                  │                          │
├──────────────────────────────────┼──────────────────────────┤
│  COMPARISON                      │  DETAIL TABLE            │
│  [Bar Chart - vs Target/LY]      │  [Top N with metrics]    │
│                                  │                          │
└──────────────────────────────────┴──────────────────────────┘

Statistical Analysis

Descriptive Statistics

import pandas as pd
import numpy as np

def describe_data(df, column):
    stats = {
        'count': df[column].count(),
        'mean': df[column].mean(),
        'median': df[column].median(),
        'std': df[column].std(),
        'min': df[column].min(),
        'max': df[column].max(),
        'q25': df[column].quantile(0.25),
        'q75': df[column].quantile(0.75),
        'skewness': df[column].skew(),
        'kurtosis': df[column].kurtosis()
    }
    return stats
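The quartiles above feed directly into IQR-based outlier flagging (Tukey's conventional 1.5×IQR fence); a sketch with illustrative data:

```python
import pandas as pd

def flag_outliers(df, column, k=1.5):
    """Flag rows outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (df[column] < lower) | (df[column] > upper)

df = pd.DataFrame({"amount": [10, 12, 11, 13, 12, 11, 300]})
outliers = flag_outliers(df, "amount")  # only the 300 row is flagged
```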

Hypothesis Testing

import numpy as np
from scipy import stats

# T-test: Compare two groups
def compare_groups(group_a, group_b, alpha=0.05):
    stat, p_value = stats.ttest_ind(group_a, group_b)

    result = {
        't_statistic': stat,
        'p_value': p_value,
        'significant': p_value < alpha,
        # Cohen's d with averaged variances (assumes similar group sizes)
        'effect_size': (group_a.mean() - group_b.mean()) / np.sqrt(
            (group_a.std()**2 + group_b.std()**2) / 2
        )
    }
    return result

# Chi-square: Test independence
def test_independence(observed, alpha=0.05):
    stat, p_value, dof, expected = stats.chi2_contingency(observed)

    return {
        'chi2': stat,
        'p_value': p_value,
        'degrees_of_freedom': dof,
        'significant': p_value < alpha
    }
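A quick end-to-end sketch with seeded synthetic data; the group sizes, means, and the 2×2 conversion table are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two illustrative groups: treatment shifted upward by 5 units.
control = rng.normal(loc=100, scale=15, size=200)
treatment = rng.normal(loc=105, scale=15, size=200)

t_stat, p_value = stats.ttest_ind(control, treatment)

# Independence test on an illustrative 2x2 table: converted / not, by variant.
observed = np.array([[120, 880], [150, 850]])
chi2, chi_p, dof, expected = stats.chi2_contingency(observed)
```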

Regression Analysis

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

def simple_regression(X, y):
    # X: 1-D NumPy array of predictor values; y: matching 1-D target
    model = LinearRegression()
    model.fit(X.reshape(-1, 1), y)

    predictions = model.predict(X.reshape(-1, 1))

    return {
        'coefficient': model.coef_[0],
        'intercept': model.intercept_,
        'r_squared': r2_score(y, predictions),
        'mae': mean_absolute_error(y, predictions)
    }
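When scikit-learn is unavailable, the same quantities can be sanity-checked with NumPy alone; a sketch using np.polyfit, verified on perfectly linear illustrative data:

```python
import numpy as np

def simple_regression_np(x, y):
    """Ordinary least squares for one predictor, using only NumPy."""
    slope, intercept = np.polyfit(x, y, deg=1)
    predictions = slope * x + intercept
    ss_res = np.sum((y - predictions) ** 2)   # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
    return {
        "coefficient": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_res / ss_tot,
        "mae": np.mean(np.abs(y - predictions)),
    }

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0  # perfectly linear illustrative data
result = simple_regression_np(x, y)
```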

Business Analysis

Analysis Framework

# Analysis: [Topic]

## Business Question
[What are we trying to answer?]

## Hypothesis
[What do we expect to find?]

## Data Sources
- [Source 1]: [Description]
- [Source 2]: [Description]

## Methodology
1. [Step 1]
2. [Step 2]
3. [Step 3]

## Findings

### Finding 1: [Title]
[Description with supporting data]

### Finding 2: [Title]
[Description with supporting data]

## Recommendations
1. [Recommendation]: [Expected impact]
2. [Recommendation]: [Expected impact]

## Limitations
- [Limitation 1]
- [Limitation 2]

## Next Steps
- [Action item]

Key Business Metrics

Acquisition:

  • Customer Acquisition Cost (CAC)
  • Cost per Lead (CPL)
  • Conversion Rate

Engagement:

  • Daily/Monthly Active Users
  • Session Duration
  • Feature Adoption

Retention:

  • Churn Rate
  • Retention Rate
  • Net Revenue Retention

Revenue:

  • Monthly Recurring Revenue (MRR)
  • Average Revenue Per User (ARPU)
  • Lifetime Value (LTV)
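These metrics combine into standard unit-economics health checks; a sketch using the common simple-form formulas (e.g. LTV ≈ ARPU × gross margin ÷ monthly churn), with all input numbers illustrative:

```python
def unit_economics(arpu, gross_margin, monthly_churn, cac):
    """Simple-form SaaS unit economics; all inputs are illustrative."""
    ltv = arpu * gross_margin / monthly_churn  # expected margin per customer
    return {
        "ltv": ltv,
        "ltv_to_cac": ltv / cac,                          # healthy is often cited as > 3
        "cac_payback_months": cac / (arpu * gross_margin),
    }

metrics = unit_economics(arpu=50.0, gross_margin=0.8, monthly_churn=0.02, cac=600.0)
```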

Data Storytelling

Presentation Structure

1. CONTEXT
   - Why does this matter?
   - What question are we answering?

2. KEY FINDING
   - Lead with the insight
   - Make it memorable

3. EVIDENCE
   - Show the data
   - Use effective visuals

4. IMPLICATIONS
   - What does this mean?
   - So what?

5. RECOMMENDATIONS
   - What should we do?
   - Clear next steps

Insight Template

## [Headline: Action-oriented finding]

**What:** [One sentence description of the finding]

**So What:** [Why this matters to the business]

**Now What:** [Recommended action]

**Evidence:**
[Chart or data supporting the finding]

**Confidence:** [High/Medium/Low]

Reference Materials

  • references/sql_patterns.md - Advanced SQL queries
  • references/visualization.md - Chart selection guide
  • references/statistics.md - Statistical methods
  • references/storytelling.md - Presentation best practices

Scripts

# Data profiler
python scripts/data_profiler.py --table orders --output profile.html

# SQL query analyzer
python scripts/query_analyzer.py --query query.sql --explain

# Dashboard generator
python scripts/dashboard_gen.py --config dashboard.yaml

# Report automation
python scripts/report_gen.py --template monthly --output report.pdf

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modifying existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

data, cli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

design, data

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testing, document, tool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

design, data

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

art, tool, data

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tool, data

Skill Information

Category: Data
Version: 1.0.0
Last Updated: 1/13/2026