Data Science
by htlin222
Data analysis, SQL queries, BigQuery operations, and data insights. Use for data analysis tasks and queries.
Skill Details
Repository Files
1 file in this skill directory
name: data-science description: Data analysis, SQL queries, BigQuery operations, and data insights. Use for data analysis tasks and queries.
Data Science
Data analysis, SQL, and insights generation.
When to Use
- Writing SQL queries
- Data analysis and exploration
- Creating visualizations
- Statistical analysis
- ETL and data pipelines
SQL Patterns
Common Queries
-- Aggregation with window functions
SELECT
user_id,
order_date,
amount,
SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) as running_total,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) as recency_rank
FROM orders;
-- CTEs for readability
WITH monthly_stats AS (
SELECT
DATE_TRUNC('month', created_at) as month,
COUNT(*) as total_orders,
SUM(amount) as revenue
FROM orders
GROUP BY 1
),
growth AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) as prev_revenue,
(revenue - LAG(revenue) OVER (ORDER BY month)) / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) as growth_rate
FROM monthly_stats
)
SELECT * FROM growth;
BigQuery Specifics
-- Partitioned table query
SELECT *
FROM `project.dataset.events`
WHERE DATE(_PARTITIONTIME) BETWEEN '2024-01-01' AND '2024-01-31';
-- UNNEST for arrays
SELECT
user_id,
item
FROM `project.dataset.orders`,
UNNEST(items) as item;
-- Approximate counts for large data
SELECT APPROX_COUNT_DISTINCT(user_id) as unique_users
FROM `project.dataset.events`;
Python Analysis
import pandas as pd
import numpy as np
# Load and explore
df = pd.read_csv('data.csv')
df.info()
df.describe()
# Clean and transform
df['date'] = pd.to_datetime(df['date'])
df = df.dropna(subset=['required_field'])
df['category'] = df['category'].fillna('Unknown')
# Aggregate
summary = df.groupby('category').agg({
'value': ['mean', 'sum', 'count'],
'date': ['min', 'max']
}).round(2)
# Visualize
import matplotlib.pyplot as plt
df.groupby('date')['value'].sum().plot(figsize=(12, 6))
plt.title('Daily Values')
plt.savefig('chart.png', dpi=150, bbox_inches='tight')
Statistical Analysis
from scipy import stats
# Hypothesis testing
t_stat, p_value = stats.ttest_ind(group_a, group_b)
# Correlation
correlation = df['x'].corr(df['y'])
# Regression
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X, y)
print(f"R² = {model.score(X, y):.3f}")
Output Format
## Analysis Summary
**Question:** [What we're trying to answer]
**Data Source:** [Tables/files used]
**Date Range:** [Time period]
### Key Findings
1. [Finding with supporting metric]
2. [Finding with supporting metric]
### Visualization
[Chart description or embedded image]
### Recommendations
- [Actionable insight]
Examples
Input: "Analyze user retention" Action: Query cohort data, calculate retention rates, visualize trends
Input: "Find top customers" Action: Write SQL for RFM analysis, segment users, summarize findings
Related Skills
Xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Analyzing Financial Statements
This skill calculates key financial ratios and metrics from financial statement data for investment analysis
Data Storytelling
Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.
Kpi Dashboard Design
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.
Dbt Transformation Patterns
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Sql Optimization Patterns
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
Anndata
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
Xlsx
Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.
