Data Analyst
by borghei
Expert data analysis covering SQL, visualization, statistical analysis, business intelligence, and data storytelling.
Skill Details
Repository Files
1 file in this skill directory
name: data-analyst description: Expert data analysis covering SQL, visualization, statistical analysis, business intelligence, and data storytelling. version: 1.0.0 author: Claude Skills category: data-analytics tags: [analytics, sql, visualization, statistics, reporting]
Data Analyst
Expert-level data analysis for business insights.
Core Competencies
- SQL and database querying
- Data visualization
- Statistical analysis
- Business intelligence
- Data storytelling
- Dashboard development
- Reporting automation
- Stakeholder communication
SQL Mastery
Query Patterns
Aggregation:
SELECT
date_trunc('month', created_at) as month,
COUNT(*) as total_orders,
COUNT(DISTINCT customer_id) as unique_customers,
SUM(amount) as total_revenue,
AVG(amount) as avg_order_value
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY 1
ORDER BY 1;
Window Functions:
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as order_number,
LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as previous_order
FROM orders;
CTEs for Clarity:
WITH monthly_metrics AS (
SELECT
date_trunc('month', created_at) as month,
SUM(amount) as revenue
FROM orders
GROUP BY 1
),
growth_calc AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) as prev_revenue
FROM monthly_metrics
)
SELECT
month,
revenue,
ROUND((revenue - prev_revenue) / prev_revenue * 100, 1) as growth_pct
FROM growth_calc;
Cohort Analysis:
WITH first_orders AS (
SELECT
customer_id,
date_trunc('month', MIN(created_at)) as cohort_month
FROM orders
GROUP BY 1
),
cohort_data AS (
SELECT
f.cohort_month,
date_trunc('month', o.created_at) as order_month,
COUNT(DISTINCT o.customer_id) as customers
FROM orders o
JOIN first_orders f ON o.customer_id = f.customer_id
GROUP BY 1, 2
)
SELECT
cohort_month,
order_month,
EXTRACT(MONTH FROM AGE(order_month, cohort_month)) as months_since_cohort,
customers
FROM cohort_data
ORDER BY 1, 2;
Query Optimization
Use EXPLAIN:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
Best Practices:
- Use indexes on filtered columns
- Avoid SELECT * in production
- Use LIMIT for exploratory queries
- Filter early, aggregate late
- Use appropriate data types
Data Visualization
Chart Selection Guide
| Data Type | Best Chart | Alternative |
|---|---|---|
| Trend over time | Line chart | Area chart |
| Part of whole | Pie/Donut | Stacked bar |
| Comparison | Bar chart | Column chart |
| Distribution | Histogram | Box plot |
| Correlation | Scatter plot | Heatmap |
| Geographic | Map | Choropleth |
Visualization Best Practices
Do:
- Start Y-axis at zero (for bars)
- Use consistent colors
- Label axes clearly
- Include context (benchmarks, targets)
- Order categories meaningfully
Don't:
- Use 3D charts
- Use more than 5-7 colors
- Truncate axes misleadingly
- Clutter with gridlines
- Use pie charts for many categories
Dashboard Layout
┌─────────────────────────────────────────────────────────────┐
│ EXECUTIVE SUMMARY │
│ [KPI 1: $X] [KPI 2: X%] [KPI 3: X] [KPI 4: X%] │
├─────────────────────────────────────────────────────────────┤
│ TRENDS │ BREAKDOWN │
│ [Line Chart - Primary Metric] │ [Bar Chart - Segments] │
│ │ │
├──────────────────────────────────┼──────────────────────────┤
│ COMPARISON │ DETAIL TABLE │
│ [Bar Chart - vs Target/LY] │ [Top N with metrics] │
│ │ │
└──────────────────────────────────┴──────────────────────────┘
Statistical Analysis
Descriptive Statistics
import pandas as pd
import numpy as np
def describe_data(df, column):
stats = {
'count': df[column].count(),
'mean': df[column].mean(),
'median': df[column].median(),
'std': df[column].std(),
'min': df[column].min(),
'max': df[column].max(),
'q25': df[column].quantile(0.25),
'q75': df[column].quantile(0.75),
'skewness': df[column].skew(),
'kurtosis': df[column].kurtosis()
}
return stats
Hypothesis Testing
from scipy import stats
# T-test: Compare two groups
def compare_groups(group_a, group_b, alpha=0.05):
stat, p_value = stats.ttest_ind(group_a, group_b)
result = {
't_statistic': stat,
'p_value': p_value,
'significant': p_value < alpha,
'effect_size': (group_a.mean() - group_b.mean()) / np.sqrt(
(group_a.std()**2 + group_b.std()**2) / 2
)
}
return result
# Chi-square: Test independence
def test_independence(observed, alpha=0.05):
stat, p_value, dof, expected = stats.chi2_contingency(observed)
return {
'chi2': stat,
'p_value': p_value,
'degrees_of_freedom': dof,
'significant': p_value < alpha
}
Regression Analysis
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error
def simple_regression(X, y):
model = LinearRegression()
model.fit(X.reshape(-1, 1), y)
predictions = model.predict(X.reshape(-1, 1))
return {
'coefficient': model.coef_[0],
'intercept': model.intercept_,
'r_squared': r2_score(y, predictions),
'mae': mean_absolute_error(y, predictions)
}
Business Analysis
Analysis Framework
# Analysis: [Topic]
## Business Question
[What are we trying to answer?]
## Hypothesis
[What do we expect to find?]
## Data Sources
- [Source 1]: [Description]
- [Source 2]: [Description]
## Methodology
1. [Step 1]
2. [Step 2]
3. [Step 3]
## Findings
### Finding 1: [Title]
[Description with supporting data]
### Finding 2: [Title]
[Description with supporting data]
## Recommendations
1. [Recommendation]: [Expected impact]
2. [Recommendation]: [Expected impact]
## Limitations
- [Limitation 1]
- [Limitation 2]
## Next Steps
- [Action item]
Key Business Metrics
Acquisition:
- Customer Acquisition Cost (CAC)
- Cost per Lead (CPL)
- Conversion Rate
Engagement:
- Daily/Monthly Active Users
- Session Duration
- Feature Adoption
Retention:
- Churn Rate
- Retention Rate
- Net Revenue Retention
Revenue:
- Monthly Recurring Revenue (MRR)
- Average Revenue Per User (ARPU)
- Lifetime Value (LTV)
Data Storytelling
Presentation Structure
1. CONTEXT
- Why does this matter?
- What question are we answering?
2. KEY FINDING
- Lead with the insight
- Make it memorable
3. EVIDENCE
- Show the data
- Use effective visuals
4. IMPLICATIONS
- What does this mean?
- So what?
5. RECOMMENDATIONS
- What should we do?
- Clear next steps
Insight Template
## [Headline: Action-oriented finding]
**What:** [One sentence description of the finding]
**So What:** [Why this matters to the business]
**Now What:** [Recommended action]
**Evidence:**
[Chart or data supporting the finding]
**Confidence:** [High/Medium/Low]
Reference Materials
references/sql_patterns.md- Advanced SQL queriesreferences/visualization.md- Chart selection guidereferences/statistics.md- Statistical methodsreferences/storytelling.md- Presentation best practices
Scripts
# Data profiler
python scripts/data_profiler.py --table orders --output profile.html
# SQL query analyzer
python scripts/query_analyzer.py --query query.sql --explain
# Dashboard generator
python scripts/dashboard_gen.py --config dashboard.yaml
# Report automation
python scripts/report_gen.py --template monthly --output report.pdf
Related Skills
Xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Analyzing Financial Statements
This skill calculates key financial ratios and metrics from financial statement data for investment analysis
Data Storytelling
Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.
Kpi Dashboard Design
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.
Dbt Transformation Patterns
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Sql Optimization Patterns
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
Anndata
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
Xlsx
Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.
