Data Analyst

by borghei


Expert data analysis covering SQL, visualization, statistical analysis, business intelligence, and data storytelling.


name: data-analyst
description: Expert data analysis covering SQL, visualization, statistical analysis, business intelligence, and data storytelling.
version: 1.0.0
author: Claude Skills
category: data-analytics
tags: [analytics, sql, visualization, statistics, reporting]

Data Analyst

Expert-level data analysis for business insights.

Core Competencies

  • SQL and database querying
  • Data visualization
  • Statistical analysis
  • Business intelligence
  • Data storytelling
  • Dashboard development
  • Reporting automation
  • Stakeholder communication

SQL Mastery

Query Patterns

Aggregation:

SELECT
    date_trunc('month', created_at) as month,
    COUNT(*) as total_orders,
    COUNT(DISTINCT customer_id) as unique_customers,
    SUM(amount) as total_revenue,
    AVG(amount) as avg_order_value
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY 1
ORDER BY 1;

Window Functions:

SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as order_number,
    LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as previous_order
FROM orders;
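The same window computations translate directly to pandas group operations; a sketch with illustrative data (column names mirror the SQL above):

```python
import pandas as pd

# Illustrative orders data.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-03-01", "2024-01-20", "2024-02-15"]
    ),
    "amount": [100.0, 150.0, 50.0, 200.0, 75.0],
})

# Sorting first reproduces the ORDER BY inside each window.
orders = orders.sort_values(["customer_id", "order_date"])
grouped = orders.groupby("customer_id")

orders["running_total"] = grouped["amount"].cumsum()    # SUM(...) OVER (...)
orders["order_number"] = grouped.cumcount() + 1         # ROW_NUMBER()
orders["previous_order"] = grouped["amount"].shift(1)   # LAG(amount)
```

`shift(1)` leaves the first order per customer as NaN, matching LAG's NULL for the first row in each partition.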

CTEs for Clarity:

WITH monthly_metrics AS (
    SELECT
        date_trunc('month', created_at) as month,
        SUM(amount) as revenue
    FROM orders
    GROUP BY 1
),
growth_calc AS (
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_revenue
    FROM monthly_metrics
)
SELECT
    month,
    revenue,
    ROUND((revenue - prev_revenue) * 100.0 / NULLIF(prev_revenue, 0), 1) as growth_pct
FROM growth_calc;

Cohort Analysis:

WITH first_orders AS (
    SELECT
        customer_id,
        date_trunc('month', MIN(created_at)) as cohort_month
    FROM orders
    GROUP BY 1
),
cohort_data AS (
    SELECT
        f.cohort_month,
        date_trunc('month', o.created_at) as order_month,
        COUNT(DISTINCT o.customer_id) as customers
    FROM orders o
    JOIN first_orders f ON o.customer_id = f.customer_id
    GROUP BY 1, 2
)
SELECT
    cohort_month,
    order_month,
    (EXTRACT(YEAR FROM AGE(order_month, cohort_month)) * 12
     + EXTRACT(MONTH FROM AGE(order_month, cohort_month))) as months_since_cohort,
    customers
FROM cohort_data
ORDER BY 1, 2;
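Downstream, the cohort output is typically pivoted into a retention matrix; a pandas sketch assuming the query result has been loaded into a DataFrame (values illustrative):

```python
import pandas as pd

# Illustrative result of the cohort query above.
cohort_data = pd.DataFrame({
    "cohort_month": ["2024-01", "2024-01", "2024-01", "2024-02", "2024-02"],
    "months_since_cohort": [0, 1, 2, 0, 1],
    "customers": [100, 60, 45, 120, 80],
})

# Rows = cohorts, columns = months since first order.
matrix = cohort_data.pivot(
    index="cohort_month", columns="months_since_cohort", values="customers"
)

# Normalize by cohort size (month 0) to get retention rates.
retention = matrix.div(matrix[0], axis=0)
```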

Query Optimization

Use EXPLAIN:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;

Best Practices:

  • Use indexes on filtered columns
  • Avoid SELECT * in production
  • Use LIMIT for exploratory queries
  • Filter early, aggregate late
  • Use appropriate data types

Data Visualization

Chart Selection Guide

| Data Type       | Best Chart   | Alternative  |
|-----------------|--------------|--------------|
| Trend over time | Line chart   | Area chart   |
| Part of whole   | Pie/Donut    | Stacked bar  |
| Comparison      | Bar chart    | Column chart |
| Distribution    | Histogram    | Box plot     |
| Correlation     | Scatter plot | Heatmap      |
| Geographic      | Map          | Choropleth   |

Visualization Best Practices

Do:

  • Start Y-axis at zero (for bars)
  • Use consistent colors
  • Label axes clearly
  • Include context (benchmarks, targets)
  • Order categories meaningfully

Don't:

  • Use 3D charts
  • Use more than 5-7 colors
  • Truncate axes misleadingly
  • Clutter with gridlines
  • Use pie charts for many categories
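Several of the "Do" rules above can be sketched in matplotlib (zero-based axis, one consistent color, labeled axes, meaningful ordering, a benchmark for context); all data and the target value are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted report generation
import matplotlib.pyplot as plt

segments = {"Enterprise": 420, "SMB": 310, "Consumer": 180}  # illustrative

# Order categories meaningfully: largest to smallest.
items = sorted(segments.items(), key=lambda kv: kv[1], reverse=True)
labels, values = zip(*items)

fig, ax = plt.subplots()
ax.bar(labels, values, color="#4C72B0")  # one consistent color
ax.set_ylim(bottom=0)                    # bars start at zero
ax.set_xlabel("Customer segment")
ax.set_ylabel("Revenue ($K)")
ax.set_title("Revenue by segment vs. target")
ax.axhline(400, linestyle="--", color="gray", label="Target")  # context
ax.legend()
fig.savefig("revenue_by_segment.png", dpi=150)
```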

Dashboard Layout

┌─────────────────────────────────────────────────────────────┐
│  EXECUTIVE SUMMARY                                          │
│  [KPI 1: $X]  [KPI 2: X%]  [KPI 3: X]  [KPI 4: X%]         │
├─────────────────────────────────────────────────────────────┤
│  TRENDS                          │  BREAKDOWN               │
│  [Line Chart - Primary Metric]   │  [Bar Chart - Segments]  │
│                                  │                          │
├──────────────────────────────────┼──────────────────────────┤
│  COMPARISON                      │  DETAIL TABLE            │
│  [Bar Chart - vs Target/LY]      │  [Top N with metrics]    │
│                                  │                          │
└──────────────────────────────────┴──────────────────────────┘

Statistical Analysis

Descriptive Statistics

import pandas as pd
import numpy as np

def describe_data(df, column):
    stats = {
        'count': df[column].count(),
        'mean': df[column].mean(),
        'median': df[column].median(),
        'std': df[column].std(),
        'min': df[column].min(),
        'max': df[column].max(),
        'q25': df[column].quantile(0.25),
        'q75': df[column].quantile(0.75),
        'skewness': df[column].skew(),
        'kurtosis': df[column].kurtosis()
    }
    return stats
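The quartiles above feed directly into IQR-based outlier flagging (Tukey's conventional 1.5×IQR fence); a sketch with illustrative data:

```python
import pandas as pd

def flag_outliers(df, column, k=1.5):
    """Flag rows outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (df[column] < lower) | (df[column] > upper)

df = pd.DataFrame({"amount": [10, 12, 11, 13, 12, 11, 300]})
outliers = flag_outliers(df, "amount")  # only the 300 row is flagged
```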

Hypothesis Testing

import numpy as np
from scipy import stats

# T-test: Compare two groups
def compare_groups(group_a, group_b, alpha=0.05):
    stat, p_value = stats.ttest_ind(group_a, group_b)

    result = {
        't_statistic': stat,
        'p_value': p_value,
        'significant': p_value < alpha,
        # Cohen's d with averaged variances (assumes similar group sizes)
        'effect_size': (group_a.mean() - group_b.mean()) / np.sqrt(
            (group_a.std()**2 + group_b.std()**2) / 2
        )
    }
    return result

# Chi-square: Test independence
def test_independence(observed, alpha=0.05):
    stat, p_value, dof, expected = stats.chi2_contingency(observed)

    return {
        'chi2': stat,
        'p_value': p_value,
        'degrees_of_freedom': dof,
        'significant': p_value < alpha
    }
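A quick end-to-end sketch with seeded synthetic data; the group sizes, means, and the 2×2 conversion table are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two illustrative groups: treatment shifted upward by 5 units.
control = rng.normal(loc=100, scale=15, size=200)
treatment = rng.normal(loc=105, scale=15, size=200)

t_stat, p_value = stats.ttest_ind(control, treatment)

# Independence test on an illustrative 2x2 table: converted / not, by variant.
observed = np.array([[120, 880], [150, 850]])
chi2, chi_p, dof, expected = stats.chi2_contingency(observed)
```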

Regression Analysis

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

def simple_regression(X, y):
    # X: 1-D NumPy array of predictor values; y: matching 1-D target
    model = LinearRegression()
    model.fit(X.reshape(-1, 1), y)

    predictions = model.predict(X.reshape(-1, 1))

    return {
        'coefficient': model.coef_[0],
        'intercept': model.intercept_,
        'r_squared': r2_score(y, predictions),
        'mae': mean_absolute_error(y, predictions)
    }
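When scikit-learn is unavailable, the same quantities can be sanity-checked with NumPy alone; a sketch using np.polyfit, verified on perfectly linear illustrative data:

```python
import numpy as np

def simple_regression_np(x, y):
    """Ordinary least squares for one predictor, using only NumPy."""
    slope, intercept = np.polyfit(x, y, deg=1)
    predictions = slope * x + intercept
    ss_res = np.sum((y - predictions) ** 2)   # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
    return {
        "coefficient": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_res / ss_tot,
        "mae": np.mean(np.abs(y - predictions)),
    }

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0  # perfectly linear illustrative data
result = simple_regression_np(x, y)
```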

Business Analysis

Analysis Framework

# Analysis: [Topic]

## Business Question
[What are we trying to answer?]

## Hypothesis
[What do we expect to find?]

## Data Sources
- [Source 1]: [Description]
- [Source 2]: [Description]

## Methodology
1. [Step 1]
2. [Step 2]
3. [Step 3]

## Findings

### Finding 1: [Title]
[Description with supporting data]

### Finding 2: [Title]
[Description with supporting data]

## Recommendations
1. [Recommendation]: [Expected impact]
2. [Recommendation]: [Expected impact]

## Limitations
- [Limitation 1]
- [Limitation 2]

## Next Steps
- [Action item]

Key Business Metrics

Acquisition:

  • Customer Acquisition Cost (CAC)
  • Cost per Lead (CPL)
  • Conversion Rate

Engagement:

  • Daily/Monthly Active Users
  • Session Duration
  • Feature Adoption

Retention:

  • Churn Rate
  • Retention Rate
  • Net Revenue Retention

Revenue:

  • Monthly Recurring Revenue (MRR)
  • Average Revenue Per User (ARPU)
  • Lifetime Value (LTV)
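These metrics combine into standard unit-economics health checks; a sketch using the common simple-form formulas (e.g. LTV ≈ ARPU × gross margin ÷ monthly churn), with all input numbers illustrative:

```python
def unit_economics(arpu, gross_margin, monthly_churn, cac):
    """Simple-form SaaS unit economics; all inputs are illustrative."""
    ltv = arpu * gross_margin / monthly_churn  # expected margin per customer
    return {
        "ltv": ltv,
        "ltv_to_cac": ltv / cac,                          # healthy is often cited as > 3
        "cac_payback_months": cac / (arpu * gross_margin),
    }

metrics = unit_economics(arpu=50.0, gross_margin=0.8, monthly_churn=0.02, cac=600.0)
```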

Data Storytelling

Presentation Structure

1. CONTEXT
   - Why does this matter?
   - What question are we answering?

2. KEY FINDING
   - Lead with the insight
   - Make it memorable

3. EVIDENCE
   - Show the data
   - Use effective visuals

4. IMPLICATIONS
   - What does this mean?
   - So what?

5. RECOMMENDATIONS
   - What should we do?
   - Clear next steps

Insight Template

## [Headline: Action-oriented finding]

**What:** [One sentence description of the finding]

**So What:** [Why this matters to the business]

**Now What:** [Recommended action]

**Evidence:**
[Chart or data supporting the finding]

**Confidence:** [High/Medium/Low]

Reference Materials

  • references/sql_patterns.md - Advanced SQL queries
  • references/visualization.md - Chart selection guide
  • references/statistics.md - Statistical methods
  • references/storytelling.md - Presentation best practices

Scripts

# Data profiler
python scripts/data_profiler.py --table orders --output profile.html

# SQL query analyzer
python scripts/query_analyzer.py --query query.sql --explain

# Dashboard generator
python scripts/dashboard_gen.py --config dashboard.yaml

# Report automation
python scripts/report_gen.py --template monthly --output report.pdf

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modifying existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

data, cli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

design, data

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testing, document, tool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

design, data

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

art, tool, data

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tool, data

Skill Information

Category: Data
Version: 1.0.0
Last Updated: 1/13/2026