Root Cause Investigation
by nimrodfisher
Systematic investigation of metric changes and anomalies. Use when a metric unexpectedly changes, investigating business metric drops, explaining performance variations, or drilling into aggregated metric drivers.
name: root-cause-investigation
description: Systematic investigation of metric changes and anomalies. Use when a metric unexpectedly changes, investigating business metric drops, explaining performance variations, or drilling into aggregated metric drivers.
Root Cause Investigation
Quick Start
Systematically investigate why a metric changed, using structured drill-downs, statistical analysis, and hypothesis testing to identify root causes and provide actionable explanations.
Context Requirements
Before investigating, I need:
- The Metric: What changed and by how much
- Time Context: When the change occurred
- Historical Data: Metric values over time for comparison
- Drill-Down Dimensions: Ways to slice the metric (geography, product, channel, etc.)
- Known Events (optional): Product changes, campaigns, external factors
Context Gathering
For The Metric:
"Tell me about the metric change you're investigating:
What metric changed?
- Example: 'Daily Active Users', 'Revenue', 'Conversion Rate', 'Page Load Time'
What happened?
- Dropped from X to Y
- Increased unexpectedly
- Suddenly became volatile
- Plateaued after growing
When did you notice it?
- Specific date or date range
- Gradual change or sudden jump?
How big is the change?
- Absolute: +/- 1,000 users
- Percentage: -15%
- Is this normal variance or unusual?"
For Historical Data:
"To understand if this change is unusual, I need historical context:
Option 1 - Provide Data:
date | metric_value
2024-12-01 | 10,000
2024-12-02 | 10,200
...
2024-12-15 | 8,500  ← drop happened
Option 2 - Database Query:
SELECT date, SUM(metric) as metric_value
FROM metrics_table
WHERE date >= '2024-11-01'
GROUP BY date
ORDER BY date
Option 3 - Dashboard Export: Export your dashboard data to CSV
How much history should I analyze? Recommend at least 2-3x the investigation period."
For Drill-Down Dimensions:
"To find what's driving the change, which dimensions can we analyze?
Common Dimensions:
- Geographic: Country, region, city
- Product: SKU, category, brand
- Customer: Segment, cohort, plan tier
- Channel: Organic, paid, email, direct
- Platform: Mobile, desktop, iOS, Android
- Time: Day of week, hour, season
Which dimensions are available in your data and most likely to reveal insights?"
For Known Events:
"Were there any changes around the time of the metric change?
Product Changes:
- New feature launches
- UI/UX changes
- Bug fixes or incidents
- Algorithm updates
Marketing:
- Campaign starts/stops
- Budget changes
- New channels
External:
- Seasonality
- Competitor actions
- Market events
- Press coverage
Any of these apply?"
For Baseline Expectations:
"What's 'normal' for this metric?
- Historical average
- Expected growth rate
- Acceptable variance range
- Seasonal patterns
This helps determine if the change is truly anomalous."
Workflow
Step 1: Validate the Change
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from datetime import datetime, timedelta
# Load historical data
df = pd.read_csv('metric_data.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')

print("Data Loaded:")
print(f"  Date Range: {df['date'].min()} to {df['date'].max()}")
print(f"  Total Days: {len(df)}")

# Define investigation period
investigation_date = '2024-12-15'
baseline_days = 30  # Compare to previous 30 days

investigation_value = df[df['date'] == investigation_date]['metric_value'].values[0]
baseline = df[df['date'] < investigation_date].tail(baseline_days)
baseline_mean = baseline['metric_value'].mean()
baseline_std = baseline['metric_value'].std()

# Calculate change
absolute_change = investigation_value - baseline_mean
percent_change = (absolute_change / baseline_mean) * 100
z_score = (investigation_value - baseline_mean) / baseline_std

print("\nChange Detection:")
print(f"  Investigation Date: {investigation_date}")
print(f"  Current Value: {investigation_value:,.0f}")
print(f"  Baseline (30d avg): {baseline_mean:,.0f}")
print(f"  Change: {absolute_change:+,.0f} ({percent_change:+.1f}%)")
print(f"  Z-Score: {z_score:.2f}")

if abs(z_score) > 2:
    print(f"  ⚠️ SIGNIFICANT: Change is {abs(z_score):.1f} std deviations from normal")
else:
    print("  ℹ️ MINOR: Change is within normal variance")
Checkpoint: "Confirmed the change is real and significant. Ready to investigate root cause?"
Step 2: Visualize the Trend
def plot_metric_trend(df, investigation_date, metric_col='metric_value'):
    """Visualize metric over time with investigation date marked"""
    fig, ax = plt.subplots(figsize=(14, 6))

    # Plot full trend
    ax.plot(df['date'], df[metric_col], marker='o', linewidth=2, markersize=4)

    # Mark investigation date
    inv_date = pd.to_datetime(investigation_date)
    ax.axvline(inv_date, color='red', linestyle='--', linewidth=2, label='Investigation Date')

    # Add baseline mean line
    baseline = df[df['date'] < inv_date].tail(30)
    ax.axhline(baseline[metric_col].mean(), color='blue', linestyle=':',
               alpha=0.5, label='30-Day Baseline')

    # Highlight unusual period
    unusual = df[df['date'] >= inv_date]
    ax.scatter(unusual['date'], unusual[metric_col], color='red', s=100, zorder=5)

    ax.set_xlabel('Date')
    ax.set_ylabel('Metric Value')
    ax.set_title(f'Metric Trend Analysis - Investigation Date: {investigation_date}')
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig('metric_trend.png', dpi=300, bbox_inches='tight')
    plt.show()

plot_metric_trend(df, investigation_date)
Step 3: Systematic Drill-Down
def drill_down_analysis(detailed_df, metric_col, dimension, investigation_date):
    """Compare metric by dimension before vs. after the investigation date"""
    # Split data into before and after
    before = detailed_df[detailed_df['date'] < investigation_date]
    after = detailed_df[detailed_df['date'] >= investigation_date]

    # Aggregate by dimension
    before_agg = before.groupby(dimension)[metric_col].mean()
    after_agg = after.groupby(dimension)[metric_col].mean()

    # Calculate change
    comparison = pd.DataFrame({
        'before': before_agg,
        'after': after_agg
    })
    comparison['change'] = comparison['after'] - comparison['before']
    comparison['pct_change'] = (comparison['change'] / comparison['before']) * 100
    comparison['contribution'] = comparison['change'].abs()

    # Sort by contribution to find biggest drivers
    comparison = comparison.sort_values('contribution', ascending=False)
    return comparison

# Load detailed data (with dimensions)
detailed_df = pd.read_csv('detailed_metric_data.csv')
detailed_df['date'] = pd.to_datetime(detailed_df['date'])

# Drill down by each dimension
dimensions = ['country', 'product_category', 'channel', 'device_type']

print("\nSystematic Drill-Down Analysis:")
for dim in dimensions:
    if dim not in detailed_df.columns:
        continue
    print(f"\n{'='*60}")
    print(f"Dimension: {dim.upper()}")
    print('='*60)
    analysis = drill_down_analysis(detailed_df, 'metric_value', dim, investigation_date)

    # Show top contributors to change
    top_5 = analysis.head(5)
    print("\nTop Contributors to Change:")
    for idx, row in top_5.iterrows():
        print(f"\n  {idx}:")
        print(f"    Before: {row['before']:,.0f}")
        print(f"    After:  {row['after']:,.0f}")
        print(f"    Change: {row['change']:+,.0f} ({row['pct_change']:+.1f}%)")
Step 4: Hypothesis Testing
def test_hypotheses(detailed_df, investigation_date):
    """Test common hypotheses for metric changes"""
    before = detailed_df[detailed_df['date'] < investigation_date]
    after = detailed_df[detailed_df['date'] >= investigation_date]
    hypotheses = []

    # Hypothesis 1: Change concentrated in a specific segment (mix shift)
    for dim in ['country', 'channel', 'device_type']:
        if dim not in detailed_df.columns:
            continue
        before_dist = before.groupby(dim)['metric_value'].sum() / before['metric_value'].sum()
        after_dist = after.groupby(dim)['metric_value'].sum() / after['metric_value'].sum()

        # Check if any segment's share of the total shifted significantly
        max_shift = (after_dist - before_dist).abs().max()
        if max_shift > 0.05:  # >5 percentage-point shift in share
            shifted_segment = (after_dist - before_dist).abs().idxmax()
            hypotheses.append({
                'hypothesis': f'Mix shift in {dim}',
                'evidence': f"{shifted_segment} share shifted by {max_shift*100:.1f} pp",
                'strength': 'Strong' if max_shift > 0.10 else 'Moderate'
            })

    # Hypothesis 2: Volume drop (fewer total events)
    before_count = len(before)
    after_count = len(after)
    count_change = (after_count - before_count) / before_count
    if abs(count_change) > 0.10:
        hypotheses.append({
            'hypothesis': 'Volume change',
            'evidence': f'Event count {count_change:+.1%}',
            'strength': 'Strong' if abs(count_change) > 0.20 else 'Moderate'
        })

    # Hypothesis 3: Quality drop (per-unit metric changed)
    if 'user_count' in before.columns:
        before_per_user = before['metric_value'].sum() / before['user_count'].sum()
        after_per_user = after['metric_value'].sum() / after['user_count'].sum()
        per_user_change = (after_per_user - before_per_user) / before_per_user
        if abs(per_user_change) > 0.05:
            hypotheses.append({
                'hypothesis': 'Per-user metric changed',
                'evidence': f'Metric per user {per_user_change:+.1%}',
                'strength': 'Strong' if abs(per_user_change) > 0.15 else 'Moderate'
            })

    return hypotheses

hypotheses = test_hypotheses(detailed_df, investigation_date)

print("\nHypothesis Testing Results:")
for i, hyp in enumerate(hypotheses, 1):
    print(f"\n  {i}. {hyp['hypothesis']} [{hyp['strength']}]")
    print(f"     Evidence: {hyp['evidence']}")
Step 5: Correlation Analysis
def find_correlations(df, metric_col, investigation_date):
    """Find what else changed around the same time"""
    # Look for other metrics that changed
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    correlations = []

    for col in numeric_cols:
        if col == metric_col:
            continue
        # Calculate change in this metric
        before = df[df['date'] < investigation_date][col].mean()
        after = df[df['date'] >= investigation_date][col].mean()
        change = (after - before) / before if before != 0 else 0

        if abs(change) > 0.05:  # Changed by more than 5%
            # Calculate correlation with target metric
            corr = df[metric_col].corr(df[col])
            correlations.append({
                'metric': col,
                'change': change,
                'correlation': corr
            })

    # Sort by absolute correlation
    correlations = sorted(correlations, key=lambda x: abs(x['correlation']), reverse=True)
    return correlations[:5]  # Top 5

correlated_metrics = find_correlations(detailed_df, 'metric_value', investigation_date)

print("\nCorrelated Metrics That Also Changed:")
for metric in correlated_metrics:
    print(f"\n  {metric['metric']}:")
    print(f"    Changed: {metric['change']:+.1%}")
    print(f"    Correlation: {metric['correlation']:.2f}")
Step 6: Identify Root Cause
def synthesize_root_cause(drill_down_results, hypotheses, correlations, absolute_change):
    """Synthesize findings into a root cause explanation.
    `absolute_change` is the total change computed in Step 1,
    passed in explicitly rather than read from a global."""
    print(f"\n{'='*60}")
    print("ROOT CAUSE ANALYSIS")
    print('='*60)

    # Find the dimension with the biggest contribution
    biggest_contributor = None
    biggest_impact = 0
    for dim, results in drill_down_results.items():
        top_impact = results['contribution'].iloc[0]
        if top_impact > biggest_impact:
            biggest_impact = top_impact
            biggest_contributor = (dim, results.index[0], results.iloc[0])

    if biggest_contributor:
        dim, segment, data = biggest_contributor
        print("\n  PRIMARY DRIVER:")
        print(f"    Dimension: {dim}")
        print(f"    Segment: {segment}")
        print(f"    Impact: {data['change']:+,.0f} ({data['pct_change']:+.1f}%)")
        print(f"    This explains {biggest_impact / abs(absolute_change) * 100:.0f}% of total change")

    # Summarize hypotheses
    strong_hypotheses = [h for h in hypotheses if h['strength'] == 'Strong']
    if strong_hypotheses:
        print("\n  SUPPORTING EVIDENCE:")
        for hyp in strong_hypotheses:
            print(f"    • {hyp['hypothesis']}: {hyp['evidence']}")

    # Highlight correlations
    if correlations:
        print("\n  RELATED CHANGES:")
        for metric in correlations[:3]:
            print(f"    • {metric['metric']}: {metric['change']:+.1%}")

    return {
        'primary_driver': biggest_contributor,
        'supporting_hypotheses': strong_hypotheses,
        'correlations': correlations
    }

# Collect drill-down results
drill_down_results = {}
for dim in dimensions:
    if dim in detailed_df.columns:
        drill_down_results[dim] = drill_down_analysis(
            detailed_df, 'metric_value', dim, investigation_date
        )

root_cause = synthesize_root_cause(drill_down_results, hypotheses, correlated_metrics, absolute_change)
Step 7: Generate Investigation Report
def generate_investigation_report(
    metric_name,
    investigation_date,
    baseline_mean,
    investigation_value,
    absolute_change,
    percent_change,
    root_cause,
    known_events=None
):
    """Create a comprehensive investigation report"""
    report = []
    report.append("=" * 70)
    report.append(f"ROOT CAUSE INVESTIGATION: {metric_name}")
    report.append("=" * 70)
    report.append(f"\nDate: {investigation_date}")
    report.append("Analyst: [Your Name]")
    report.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}")

    report.append(f"\n{'='*70}")
    report.append("WHAT HAPPENED")
    report.append("=" * 70)
    report.append(f"\n{metric_name} changed significantly on {investigation_date}:")
    report.append(f"  • Previous 30-day average: {baseline_mean:,.0f}")
    report.append(f"  • Value on {investigation_date}: {investigation_value:,.0f}")
    report.append(f"  • Change: {absolute_change:+,.0f} ({percent_change:+.1f}%)")

    if root_cause['primary_driver']:
        dim, segment, data = root_cause['primary_driver']
        report.append(f"\n{'='*70}")
        report.append("ROOT CAUSE")
        report.append("=" * 70)
        report.append(f"\nPrimary Driver: {segment} ({dim})")
        report.append(f"  • Before: {data['before']:,.0f}")
        report.append(f"  • After: {data['after']:,.0f}")
        report.append(f"  • Impact: {data['change']:+,.0f} ({data['pct_change']:+.1f}%)")

    if root_cause['supporting_hypotheses']:
        report.append("\nSupporting Evidence:")
        for hyp in root_cause['supporting_hypotheses']:
            report.append(f"  • {hyp['hypothesis']}: {hyp['evidence']}")

    if known_events:
        report.append(f"\n{'='*70}")
        report.append("KNOWN EVENTS")
        report.append("=" * 70)
        for event in known_events:
            report.append(f"  • {event['date']}: {event['description']}")

    report.append(f"\n{'='*70}")
    report.append("RECOMMENDATIONS")
    report.append("=" * 70)
    report.append("\n1. [Specific action based on root cause]")
    report.append("2. [Monitoring recommendation]")
    report.append("3. [Prevention strategy]")

    report.append(f"\n{'='*70}")
    report.append("APPENDIX")
    report.append("=" * 70)
    report.append("\nFiles Generated:")
    report.append("  • metric_trend.png - Visual timeline")
    report.append("  • drill_down_analysis.csv - Detailed breakdowns")
    report.append("  • investigation_data.csv - Raw data used")

    return "\n".join(report)

report_text = generate_investigation_report(
    metric_name="Daily Active Users",
    investigation_date=investigation_date,
    baseline_mean=baseline_mean,
    investigation_value=investigation_value,
    absolute_change=absolute_change,
    percent_change=percent_change,
    root_cause=root_cause,
    known_events=[
        {'date': '2024-12-14', 'description': 'New app version released'},
        {'date': '2024-12-15', 'description': 'Bug reported in Android app'}
    ]
)

print(report_text)

# Save report
with open('root_cause_investigation_report.txt', 'w') as f:
    f.write(report_text)
Context Validation
Before proceeding, verify:
- Have sufficient historical data (at least 2x investigation period)
- Metric definition is consistent across time periods
- Drill-down dimensions are available and reliable
- Understand business context and recent changes
- Can access granular data if needed for deep dives
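The checklist above can be sketched as a quick pre-flight function before any drill-down (a minimal sketch; `validate_context` and its check names are illustrative, assuming a daily `date` column as in the workflow code):

```python
import pandas as pd

def validate_context(df, investigation_date, investigation_days=7):
    """Pre-flight checks before investigating (illustrative helper)."""
    df = df.sort_values('date')
    inv_date = pd.to_datetime(investigation_date)
    history_days = (inv_date - df['date'].min()).days
    return {
        # At least 2x the investigation window of history
        'sufficient_history': history_days >= 2 * investigation_days,
        # No gaps in the daily series (missing days distort baselines)
        'no_missing_dates': len(df) == (df['date'].max() - df['date'].min()).days + 1,
        # The investigation date actually appears in the data
        'date_in_data': bool((df['date'] == inv_date).any()),
    }
```

Any failed check is worth resolving (or at least noting as a limitation) before trusting the baseline comparison.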
Output Template
─────────────────────────────────────────
ROOT CAUSE INVESTIGATION
Metric: Daily Active Users (DAU)
─────────────────────────────────────────
Investigation Date: December 15, 2024
Change Detected: -15% drop (25,000 → 21,250)

─────────────────────────────────────────
ROOT CAUSE IDENTIFIED
─────────────────────────────────────────
PRIMARY DRIVER: Android Mobile App (Platform)
Before: 12,000 users
After: 8,000 users
Impact: -4,000 users (-33%)
This explains 107% of total drop (other segments up slightly)

─────────────────────────────────────────
DRILL-DOWN ANALYSIS
─────────────────────────────────────────
By Platform:
✅ Desktop: +500 (+5%)
✅ iOS: +200 (+3%)
🔴 Android: -4,000 (-33%) ← Problem
✅ Web Mobile: +100 (+2%)

By Geography:
✅ US: +100
✅ UK: +50
✅ All regions stable (no geographic concentration)

By User Cohort:
✅ New users: Stable
🔴 Returning users: -4,200 (driven by Android)

─────────────────────────────────────────
HYPOTHESIS TESTING
─────────────────────────────────────────
✅ CONFIRMED: Platform-specific issue
   Evidence: Only Android affected
✅ CONFIRMED: Affects existing users
   Evidence: New user acquisition stable
❌ REJECTED: Marketing/Channel issue
   Evidence: All channels show Android drop
❌ REJECTED: Geographic issue
   Evidence: Uniform across all regions

─────────────────────────────────────────
CORRELATED CHANGES
─────────────────────────────────────────
• Android crash rate: +450% (correlation: -0.85)
• App store rating: -0.5 stars (correlation: -0.62)
• Support tickets: +120% (correlation: -0.71)

─────────────────────────────────────────
TIMELINE CORRELATION
─────────────────────────────────────────
Dec 14, 10:00 PM: Android app v3.2.0 released
Dec 15, 12:00 AM: First crash reports
Dec 15, 8:00 AM: DAU drop visible
Dec 15, 10:00 AM: Bug ticket filed (#4523)

─────────────────────────────────────────
ROOT CAUSE CONCLUSION
─────────────────────────────────────────
The 15% DAU drop was caused by Android app version 3.2.0
crashing on launch, affecting existing users. The bug prevents
the app from loading, causing an immediate 33% drop in Android
DAU. Other platforms unaffected.

Confidence: 95%

─────────────────────────────────────────
RECOMMENDATIONS
─────────────────────────────────────────
IMMEDIATE (Today):
1. Roll back Android app to v3.1.9
2. Halt v3.2.0 rollout
3. Notify users via email about temporary fix

SHORT-TERM (This Week):
4. Root-cause the bug in v3.2.0 code
5. Add crash detection to pre-release testing
6. Implement gradual rollout (10% → 50% → 100%)

LONG-TERM (This Month):
7. Set up automated alerting for platform-specific drops
8. Improve QA process for mobile releases
9. Create rollback playbook

EXPECTED RECOVERY:
- Rollback deployed: +3,000 users within 24h
- Full recovery: 3-5 days as users update
- Estimated user loss: 5% churn from bad experience

─────────────────────────────────────────
SUPPORTING FILES
─────────────────────────────────────────
✓ metric_trend.png (timeline visualization)
✓ drill_down_analysis.csv (full breakdown)
✓ hypothesis_testing.csv (test results)
✓ crash_logs.txt (Android error logs)
Common Scenarios
Scenario 1: "Conversion rate suddenly dropped"
→ Validate the drop is real vs. noise → Drill down by funnel step to find which step broke → Analyze by traffic source, device, geography → Check for recent product/site changes → Identify the technical issue or user experience problem
Scenario 2: "Revenue is down but we don't know why"
→ Decompose revenue = volume × price × mix → Determine if it's fewer customers, lower spending, or different product mix → Drill down by customer segment and product category → Compare to seasonal patterns → Identify the specific segment driving the decline
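The decomposition step in Scenario 2 can be sketched as a simple two-factor split (volume vs. price); the `orders`/`aov` field names are illustrative, and the mix term is folded into the price effect for brevity:

```python
def decompose_revenue_change(before, after):
    """Attribute a revenue change to volume vs. price effects, using
    revenue = orders x average order value (AOV). `before` and `after`
    are dicts with 'orders' and 'aov' keys (illustrative field names)."""
    rev_before = before['orders'] * before['aov']
    rev_after = after['orders'] * after['aov']
    # Volume effect: change in order count, valued at the old AOV
    volume_effect = (after['orders'] - before['orders']) * before['aov']
    # Price effect: change in AOV, applied to the new order count
    price_effect = (after['aov'] - before['aov']) * after['orders']
    return {
        'total_change': rev_after - rev_before,
        'volume_effect': volume_effect,
        'price_effect': price_effect,
    }
```

This particular split is exact: the two effects always sum to the total change, so nothing is left unattributed. For example, `decompose_revenue_change({'orders': 1000, 'aov': 50.0}, {'orders': 800, 'aov': 55.0})` attributes a -6,000 revenue change to -10,000 from volume and +4,000 from price.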
Scenario 3: "Metric improved but team doesn't trust it"
→ Validate the improvement is real (not a data bug) → Determine what actually changed → Connect to known initiatives or changes → Rule out technical explanations (logging bug, etc.) → Provide an evidence-based explanation
Scenario 4: "Weekly metric is volatile, hard to understand"
→ Separate signal from noise with statistics → Identify if volatility is new or normal → Find patterns (day-of-week effects, etc.) → Recommend smoothing or a better baseline → Set up proper thresholds for alerting
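One way to sketch the day-of-week separation in Scenario 4 (a rough approach, assuming a daily series with a datetime `date` column as elsewhere in this skill; the function name is illustrative):

```python
import pandas as pd

def day_of_week_profile(df, metric_col='metric_value'):
    """Estimate day-of-week effects to separate weekly seasonality
    from genuine volatility."""
    out = df.copy()
    out['dow'] = out['date'].dt.day_name()
    overall_mean = out[metric_col].mean()
    # Ratio of each weekday's average to the overall average:
    # values far from 1.0 indicate a strong day-of-week pattern
    profile = out.groupby('dow')[metric_col].mean() / overall_mean
    # Deseasonalized series: divide out the weekday effect
    out['deseasonalized'] = out[metric_col] / out['dow'].map(profile)
    return profile, out
```

If the deseasonalized series is much smoother than the raw one, the "volatility" is mostly a weekly pattern, and alert thresholds should be set per weekday or on the deseasonalized values.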
Scenario 5: "Metric plateaued after growing steadily"
→ Identify when the growth rate changed → Check if it's saturation (hitting a ceiling) → Analyze whether new user growth slowed or retention dropped → Compare to market benchmarks → Provide strategic recommendations
Handling Missing Context
User reports "metric dropped" without specifics: "Let me help investigate. I need:
- What's the specific metric?
- What time period are you comparing?
- Do you have historical data I can analyze?
- What dimensions can we drill down by?"
User doesn't know what dimensions to analyze: "Let me examine the data to see what's available. Common patterns:
- Geographic (country, region)
- Platform (mobile/desktop, iOS/Android)
- Source (organic, paid, channels)
- Customer (new vs returning, segments)
Which of these are in your data?"
User suspects a cause but isn't sure: "Great hypothesis! Let's test it systematically:
- I'll compare the suspected segment to others
- Check if timing aligns
- Look for counter-evidence
- Quantify the impact
What's your hypothesis?"
Incomplete historical data: "I can work with what you have. If data is limited, I'll:
- Use shorter comparison periods
- Focus on largest changes
- Note limitations in conclusions
- Recommend data to collect going forward"
Advanced Options
After basic investigation, offer:
Statistical Significance Testing: "Want me to calculate if this change is statistically significant vs random variance?"
Decomposition Analysis: "I can break down the metric mathematically (e.g., revenue = users × avg order value × frequency)"
Synthetic Control: "I can create a 'what if nothing changed' baseline using historical patterns"
Anomaly Detection: "I can build a model to automatically detect future anomalies in this metric"
Impact Quantification: "Want me to estimate the business impact in $ or user terms?"
Prevention Playbook: "I can create monitoring alerts and investigation runbooks to catch this faster next time"
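As a sketch of the significance-testing offer above, a Welch's t-test on per-day metric values before vs. after the change is one reasonable screen (daily metrics are often autocorrelated, so treat the p-value as rough evidence rather than proof; the function name is illustrative):

```python
from scipy import stats

def significance_test(before_values, after_values, alpha=0.05):
    """Welch's t-test comparing daily metric values before vs. after
    the change. Assumes roughly independent daily values, which real
    metrics often violate due to autocorrelation."""
    t_stat, p_value = stats.ttest_ind(before_values, after_values, equal_var=False)
    return {
        't_stat': float(t_stat),
        'p_value': float(p_value),
        'significant': p_value < alpha,
    }
```

A significant result says the post-change mean differs from the baseline beyond day-to-day variance; it does not by itself identify the cause, which is what the drill-down steps are for.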
Related Skills
Attack Tree Construction
Build comprehensive attack trees to visualize threat paths. Use when mapping attack scenarios, identifying defense gaps, or communicating security risks to stakeholders.
Grafana Dashboards
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
Matplotlib
Foundational plotting library. Create line plots, scatter, bar, histograms, heatmaps, 3D, subplots, export PNG/PDF/SVG, for scientific visualization and publication figures.
Scientific Visualization
Create publication figures with matplotlib/seaborn/plotly. Multi-panel layouts, error bars, significance markers, colorblind-safe, export PDF/EPS/TIFF, for journal-ready scientific plots.
Seaborn
Statistical visualization. Scatter, box, violin, heatmaps, pair plots, regression, correlation matrices, KDE, faceted plots, for exploratory analysis and publication figures.
Shap
Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model
Pydeseq2
Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts, Wald tests, FDR correction, volcano/MA plots, for RNA-seq analysis.
Query Writing
For writing and executing SQL queries - from simple single-table queries to complex multi-table JOINs and aggregations
