---
name: Experiment Design
description: Comprehensive guide to A/B testing, multivariate testing, statistical significance, and experiment analysis for data-driven product decisions
---
Experiment Design
Types of Experiments
1. A/B Test (Two Variants)
What: Compare two versions (A vs B)
Example:
- Control (A): Blue "Buy Now" button
- Treatment (B): Green "Buy Now" button
When to Use:
- Testing single change
- Clear hypothesis
- Binary decision (ship or don't ship)
Pros:
- Simple to implement
- Easy to analyze
- Clear winner
Cons:
- Only tests one change
- Can't test interactions
2. Multivariate Test (Multiple Changes)
What: Test multiple changes simultaneously
Example:
- Variable 1: Button color (Blue, Green, Red)
- Variable 2: Button text ("Buy Now", "Add to Cart", "Get Started")
- Variants: 3 × 3 = 9 combinations
When to Use:
- Testing multiple elements
- Want to find best combination
- Have enough traffic
Pros:
- Test interactions between variables
- Find optimal combination
Cons:
- Requires much more traffic
- Complex analysis
- Longer test duration
3. Sequential Testing
What: Continuously monitor and stop early if clear winner
Example:
- Start A/B test
- Check results daily
- Stop when statistical significance reached (could be day 3 or day 14)
When to Use:
- Want to ship winners fast
- High traffic
- Using tools that support it (Statsig, GrowthBook)
Pros:
- Faster results
- Less opportunity cost
Cons:
- Requires special statistical methods
- Peeking with traditional fixed-horizon statistics inflates false positives, so standard p-values don't apply
4. Holdout Groups (Long-Term Effects)
What: Keep small % of users on old experience permanently
Example:
- 95% of users: New feature
- 5% of users: Old experience (holdout)
When to Use:
- Measure long-term effects
- Detect delayed negative impacts
- Validate cumulative changes
Pros:
- Detects long-term issues
- Measures true impact
Cons:
- Some users get worse experience
- Requires ongoing monitoring
When to Experiment
✅ Experiment When:
1. Significant Features (High Impact)
- Major redesign
- New pricing model
- Core flow changes
2. Uncertain Outcomes
- Don't know if it will work
- Conflicting opinions
- No clear data
3. Multiple Solution Options
- Two different approaches
- Want to pick the best
4. Optimization Opportunities
- Incremental improvements
- Conversion optimization
- Engagement optimization
❌ Don't Experiment When:
1. Obvious Bugs/Fixes
- Broken functionality
- Security issues
- Legal compliance
2. Very Low Traffic
- Can't reach statistical significance
- Would take months
3. Trivial Changes
- Copy typo fix
- Minor styling adjustment
4. Ethical Issues
- Manipulative dark patterns
- Harmful to users
Experiment Design Process
Step 1: Define Hypothesis
Template:
"If we [change], then [metric] will [improve by X%], because [reasoning]."
Example:
"If we change the CTA button from blue to green, then click-through rate will increase by 10%, because green is more attention-grabbing."
Step 2: Choose Metrics
Primary Metric: What you're optimizing
- Example: Click-through rate
Secondary Metrics: Other important outcomes
- Example: Conversion rate, revenue per user
Counter Metrics: Watch for negatives
- Example: Bounce rate, time on page
Step 3: Determine Sample Size
Inputs:
- Baseline conversion rate: 5%
- Expected improvement: 10% relative lift (5% → 5.5%)
- Significance level: 0.05 (95% confidence)
- Power: 0.80 (80% chance of detecting effect)
Output:
- Sample size needed: ~31,000 users per variant
Tools:
- Evan Miller's calculator: https://www.evanmiller.org/ab-testing/sample-size.html
- Optimizely sample size calculator
Step 4: Set Test Duration
Factors:
- Sample size needed
- Daily traffic
- Weekly patterns (run at least 1-2 weeks)
- Business cycles
Example:
- Sample size: 31,000 per variant (62,000 total)
- Daily traffic: 5,000
- Duration: 62,000 / 5,000 = 12.4 days → Run for 2 weeks
Step 5: Design Variants
Control (A): Current experience
Treatment (B): New experience
Best Practices:
- Change only one thing (for A/B test)
- Make change meaningful (not trivial)
- Ensure variants are distinct
Step 6: Launch Test
Checklist:
- Hypothesis documented
- Metrics instrumented
- Sample size calculated
- Randomization working
- QA tested both variants
- Monitoring dashboard ready
Step 7: Analyze Results
Check:
- Statistical significance (p < 0.05)
- Practical significance (is improvement meaningful?)
- Secondary metrics (any red flags?)
- Segment analysis (works for everyone?)
Step 8: Decide (Ship, Iterate, Kill)
Ship if:
- Positive, significant, no red flags
Iterate if:
- Mixed results, some segments good
Kill if:
- Negative, not significant, opportunity cost too high
Choosing Metrics
Primary Metric (What We're Optimizing)
Characteristics:
- Directly tied to hypothesis
- Sensitive to change
- Measurable in test duration
Examples:
- Click-through rate (CTR)
- Conversion rate
- Sign-up completion rate
- Time to first action
Bad Primary Metrics:
- Revenue (too noisy, delayed)
- Retention (takes too long to measure)
- NPS (survey-based, low sample)
Secondary Metrics (Guardrails, Side Effects)
Purpose: Ensure we're not breaking other things
Examples:
- Revenue per user
- Engagement (sessions per user)
- Feature adoption
- Customer satisfaction
Counter Metrics (Watch for Negatives)
Purpose: Detect unintended negative consequences
Examples:
- Bounce rate (users leaving immediately)
- Error rate (technical issues)
- Support tickets (confusion)
- Churn rate (users leaving)
Example: Checkout Flow Test
Hypothesis:
"If we reduce checkout from 5 steps to 3 steps, conversion will increase by 15%."
Metrics:
- Primary: Checkout conversion rate
- Secondary: Average order value, time to complete checkout
- Counter: Cart abandonment rate, error rate, support tickets
Statistical Significance
P-Value < 0.05 (95% Confidence)
What it Means:
- If there were truly no difference, a result at least this extreme would occur less than 5% of the time
- By convention, that's strong enough evidence to treat the effect as real
Example:
- Control: 5.0% conversion
- Treatment: 5.5% conversion
- P-value: 0.03 ✅ (< 0.05, statistically significant)
Interpretation:
"This result would be very unlikely if the treatment made no difference, so we conclude the treatment is better than control."
Statistical Power (80%+)
What it Means:
- 80% chance of detecting an effect if it exists
- Reduces false negatives
Example:
- Power: 80%
- Means: 20% chance of missing a real effect
Minimum Detectable Effect (MDE)
What it Means:
- Smallest effect size you can reliably detect
- Depends on sample size
Example:
- Baseline: 5% conversion
- Sample size: ~31,000 per variant
- MDE: 0.5% absolute (10% relative)
- Can detect: 5.0% → 5.5% or larger
Trade-off:
- Larger sample size → Smaller MDE (detect smaller effects)
- Smaller sample size → Larger MDE (only detect big effects)
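To make the trade-off concrete, here is a small sketch (not from the original guide) that approximates the absolute MDE for a given per-variant sample size, using the same z-values as the formula in the next section:

```python
import math

def mde_absolute(p: float, n: int, z_alpha: float = 1.96, z_beta: float = 0.84) -> float:
    """Approximate absolute MDE for a two-proportion test at baseline rate p,
    with n users per variant (95% confidence, 80% power)."""
    return (z_alpha + z_beta) * math.sqrt(2 * p * (1 - p) / n)

print(f"{mde_absolute(0.05, 31_000):.3%}")  # ~0.5 percentage points
print(f"{mde_absolute(0.05, 10_000):.3%}")  # ~0.9 percentage points — smaller n, larger MDE
```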
Sample Size Calculation
Formula (Simplified)
n = (Z_α/2 + Z_β)² × (p₁(1-p₁) + p₂(1-p₂)) / (p₁ - p₂)²
Where:
- n = sample size per variant
- Z_α/2 = 1.96 (for 95% confidence)
- Z_β = 0.84 (for 80% power)
- p₁ = baseline conversion rate
- p₂ = expected conversion rate
Example Calculation
Inputs:
- Baseline conversion rate (p₁): 5% = 0.05
- Expected improvement: 10% relative lift
- New conversion rate (p₂): 5.5% = 0.055
- Significance level (α): 0.05
- Power (1-β): 0.80
Calculation:
n = (1.96 + 0.84)² × (0.05×0.95 + 0.055×0.945) / (0.05 - 0.055)²
n = 7.84 × (0.0475 + 0.052) / 0.000025
n = 7.84 × 0.0995 / 0.000025
n ≈ 31,200 per variant
Total sample size: 62,400 users
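For completeness, the same formula as a small Python function (a sketch; the online calculators below handle refinements such as continuity corrections):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant sample size for a two-sided, two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(sample_size_per_variant(0.05, 0.055))  # ≈ 31,000 per variant
```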
Using Online Calculators
Evan Miller's Calculator:
- Go to https://www.evanmiller.org/ab-testing/sample-size.html
- Enter baseline conversion rate: 5%
- Enter minimum detectable effect: 10% (relative)
- Get sample size: ~31,000 per variant
Optimizely Calculator:
- Go to Optimizely sample size calculator
- Enter baseline: 5%
- Enter minimum detectable effect: 0.5% (absolute)
- Get sample size: ~31,000 per variant
Test Duration
Minimum Duration: 1-2 Weeks
Why:
- Capture weekly patterns (weekday vs weekend)
- Avoid day-of-week bias
- Account for user behavior cycles
Example:
- Don't run Monday-Wednesday only
- Run at least Monday-Sunday (1 full week)
Full Business Cycles
Examples:
- E-commerce: Include payday (1st and 15th of month)
- B2B SaaS: Include full week (avoid Friday-only)
- Seasonal: Avoid holidays (unless testing holiday-specific)
Enough Data for Significance
Formula:
Duration = Sample Size Needed / Daily Traffic
Example:
- Sample size: 62,000 total
- Daily traffic: 5,000
- Duration: 62,000 / 5,000 = 12.4 days
- Run for: 2 weeks (14 days)
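As a sketch, the rounding logic might look like:

```python
import math

sample_size_total = 62_000
daily_traffic = 5_000

raw_days = sample_size_total / daily_traffic        # 12.4 days
# Round up to whole weeks to capture weekday/weekend patterns
weeks = max(1, math.ceil(raw_days / 7))
print(f"Run for {weeks} weeks ({weeks * 7} days)")  # 2 weeks (14 days)
```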
Not Too Long (Opportunity Cost)
Trade-off:
- Longer test = More confidence
- Longer test = Delayed learnings, slower iteration
Guideline:
- Most tests: 1-4 weeks
- High-traffic sites: 1-2 weeks
- Low-traffic sites: 2-4 weeks
- Don't run > 1 month (diminishing returns)
Experiment Variants
Control (Current Experience)
What: The existing experience
Example:
- Current checkout flow (5 steps)
- Current button color (blue)
- Current pricing page
Purpose: Baseline for comparison
Treatment (New Experience)
What: The proposed change
Example:
- New checkout flow (3 steps)
- New button color (green)
- New pricing page
Purpose: Test hypothesis
Multiple Treatments (If Testing Different Approaches)
Example:
- Control: 5-step checkout
- Treatment A: 3-step checkout (combine steps)
- Treatment B: 1-page checkout (all on one page)
Traffic Split:
- Control: 33%
- Treatment A: 33%
- Treatment B: 34%
Analysis:
- Compare each treatment to control
- Compare treatments to each other
Randomization
User-Level Randomization (Consistent Experience)
What: Each user always sees same variant
How:
```js
// hashUserId is a hypothetical helper that maps a user ID to a stable integer
const variant = hashUserId(userId) % 2 === 0 ? 'control' : 'treatment';
```
When to Use:
- Logged-in users
- Want consistent experience
- Testing flows (multi-step)
Pros:
- Consistent experience
- No confusion
Cons:
- Requires user ID
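A language-agnostic version of the same idea, sketched in Python (the hash helper above is hypothetical; salting with the experiment name is an assumption that keeps assignments from correlating across experiments):

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministic bucketing: the same user always gets the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest[:8], 16) % len(variants)]

# Stable across calls: a given user always lands in the same bucket
assert assign_variant("user-42", "new-checkout") == assign_variant("user-42", "new-checkout")
```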
Session-Level (For Anonymous Users)
What: Each session sees same variant (but different sessions can differ)
How:
```js
// hashSessionId is a hypothetical helper that maps a session ID to a stable integer
const variant = hashSessionId(sessionId) % 2 === 0 ? 'control' : 'treatment';
```
When to Use:
- Anonymous users
- Single-page tests
Pros:
- Works for anonymous users
Cons:
- Same user can see different variants across sessions
Stratified Sampling (For Segments)
What: Ensure even distribution across segments
Example:
- Segment 1: Free users (50% control, 50% treatment)
- Segment 2: Paid users (50% control, 50% treatment)
Why:
- Avoid imbalanced segments
- Enable segment analysis
Common Pitfalls
1. Peeking (Stopping Test Early When "Winning")
Problem:
Day 3: Treatment is winning! (p = 0.04) → Ship it!
Day 7: Treatment is losing... (p = 0.12) → Oops.
Why It's Bad:
- Increases false positive rate
- P-value fluctuates during test
Solution:
- Decide sample size upfront
- Don't look until test completes
- Or use sequential testing (proper method)
2. Sample Ratio Mismatch (Uneven Splits)
Problem:
Expected: 50% control, 50% treatment
Actual: 48% control, 52% treatment
Why It's Bad:
- Indicates randomization bug
- Results may be invalid
Solution:
- Check sample ratio before analyzing
- Investigate if mismatch > 1%
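A common way to automate this check is a chi-squared goodness-of-fit test on the observed counts (a sketch; the p < 0.001 alarm threshold is a widely used convention, not a rule from this guide):

```python
from scipy.stats import chisquare

def srm_pvalue(control_n: int, treatment_n: int, split=(0.5, 0.5)) -> float:
    """How surprising are these counts under the planned traffic split?"""
    total = control_n + treatment_n
    expected = [total * split[0], total * split[1]]
    _, p = chisquare([control_n, treatment_n], f_exp=expected)
    return p

if srm_pvalue(48_000, 52_000) < 0.001:
    print("Likely sample ratio mismatch — investigate before analyzing results")
```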
3. Novelty Effect (Users Trying New Thing)
Problem:
Week 1: Treatment is winning! (+20%)
Week 4: Treatment is same as control (0%)
Why It's Bad:
- Users try new thing out of curiosity
- Effect fades over time
Solution:
- Run test longer (2-4 weeks)
- Use holdout group for long-term measurement
- Segment by new vs returning users
4. Seasonality (Testing During Holidays)
Problem:
Test during Black Friday: +50% conversion
Test during normal week: +5% conversion
Why It's Bad:
- Holiday behavior is different
- Results don't generalize
Solution:
- Avoid testing during holidays
- Or run test across multiple weeks (include holiday + normal)
Sequential Testing
What is Sequential Testing?
Traditional A/B Test:
- Decide sample size upfront
- Run until sample size reached
- Analyze once at end
Sequential Testing:
- Monitor continuously
- Stop early if clear winner
- Adjust significance threshold
How It Works
Algorithm:
- Use adjusted significance threshold (not 0.05)
- Account for multiple looks
- Stop when threshold crossed
Example (Simplified):
Day 1: p = 0.10 → Continue
Day 3: p = 0.03 → Continue
Day 5: p = 0.001 → Stop! (clear winner)
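A crude but valid sketch of "adjusting the threshold": split the overall alpha across a fixed number of pre-planned looks (Bonferroni). Real platforms use more efficient alpha-spending functions, so treat this only as an illustration:

```python
ALPHA = 0.05
PLANNED_LOOKS = 5

# Bonferroni correction: conservative, but keeps the overall
# false-positive rate at or below ALPHA across all interim looks
per_look_threshold = ALPHA / PLANNED_LOOKS  # 0.01

def should_stop(p_value: float) -> bool:
    return p_value < per_look_threshold
```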
Tools That Support Sequential Testing
- Statsig: Built-in sequential testing
- GrowthBook: Bayesian statistics
- Optimizely: Stats Engine (sequential)
Benefits
- Faster results (stop early if clear winner)
- Less opportunity cost
- Detect large effects quickly
Drawbacks
- Requires special tools
- Can't use traditional p-value
- More complex
Holdout Groups
What is a Holdout Group?
Definition: Small % of users kept on old experience permanently
Example:
- 95% of users: New feature
- 5% of users: Old experience (holdout)
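A minimal sketch of carving out a permanent holdout at assignment time (the 5% figure mirrors the example above; the hashing scheme is an assumption):

```python
import hashlib

def assign_with_holdout(user_id: str, holdout_pct: int = 5) -> str:
    """Keep a stable holdout_pct% of users on the old experience."""
    bucket = int(hashlib.sha256(f"holdout:{user_id}".encode()).hexdigest()[:8], 16) % 100
    return "holdout" if bucket < holdout_pct else "new_feature"
```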
Why Use Holdout Groups?
Measure Long-Term Effects:
- A/B test shows +10% conversion in 2 weeks
- Holdout shows +5% conversion after 6 months
- Learning: Effect diminishes over time
Detect Delayed Negative Impacts:
- A/B test shows +15% signups
- Holdout shows +10% churn after 3 months
- Learning: Feature attracts wrong users
How Long to Keep Holdout?
Guideline:
- 1-3 months for most features
- 6-12 months for major changes
- Permanent for critical features
When to Remove Holdout?
Remove if:
- No long-term differences detected
- Opportunity cost too high (5% of users on worse experience)
- Feature is critical (everyone should have it)
Experiment Analysis
Step 1: Compare Primary Metric
Example:
- Control: 5.0% conversion
- Treatment: 5.5% conversion
- Lift: +10% relative
- P-value: 0.03 ✅
Decision: Treatment is statistically significantly better.
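Under the hood, this comparison is a two-proportion z-test. A self-contained sketch (it should match statsmodels' proportions_ztest; the counts in the usage line are hypothetical):

```python
import math
from statistics import NormalDist

def two_proportion_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Returns (absolute lift, two-sided p-value, 95% CI on the lift)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the difference
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (p_b - p_a - 1.96 * se, p_b - p_a + 1.96 * se)
    return p_b - p_a, p_value, ci

lift, p, ci = two_proportion_test(1_000, 20_000, 1_100, 20_000)  # 5.0% vs 5.5%, p ≈ 0.025
```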
Step 2: Check Secondary Metrics
Example:
- Revenue per user: $10.50 (control) vs $11.20 (treatment) ✅
- Time to checkout: 3.2 min (control) vs 2.8 min (treatment) ✅
Decision: Secondary metrics also improved.
Step 3: Check Counter Metrics
Example:
- Bounce rate: 30% (control) vs 32% (treatment) ⚠️
- Error rate: 0.5% (control) vs 0.5% (treatment) ✅
Decision: Slight increase in bounce rate, investigate.
Step 4: Segment Analysis
Did it work for everyone?
| Segment | Control | Treatment | Lift |
|---|---|---|---|
| Mobile | 4.5% | 5.2% | +15% ✅ |
| Desktop | 5.5% | 5.8% | +5% ✅ |
| Free users | 3.0% | 3.6% | +20% ✅ |
| Paid users | 7.0% | 7.1% | +1% ⚠️ |
Learning: Works great for mobile and free users, minimal impact on paid users.
Step 5: Statistical Significance
Check:
- P-value < 0.05 ✅
- Confidence interval doesn't include 0 ✅
Example:
- Lift: +10%
- 95% CI: [+5%, +15%]
- Interpretation: We're 95% confident the true lift is between 5% and 15%.
Step 6: Practical Significance
Is the improvement meaningful?
Example:
- Statistically significant: Yes (p = 0.04)
- Lift: +0.1% relative (5.000% → 5.005%)
- Decision: Not practically significant (too small to matter)
Guideline:
- Small lift but high volume → Ship (e.g., +0.1 percentage points on 1M users = 1,000 more conversions)
- Large lift but low volume → Maybe ship (e.g., +50% relative on a flow with only 100 baseline conversions = 50 more conversions)
Decision Framework
Ship If:
✅ Positive: Treatment is better than control
✅ Significant: P-value < 0.05
✅ No Red Flags: Secondary and counter metrics look good
✅ Works for Key Segments: At least works for the majority
Example:
- Conversion: +10% (p = 0.03) ✅
- Revenue: +8% (p = 0.05) ✅
- Bounce rate: No change ✅
- Works for mobile and desktop ✅
- Decision: Ship!
Iterate If:
⚠️ Mixed Results: Some metrics up, some down
⚠️ Works for Some Segments Only: E.g., only mobile, not desktop
⚠️ Close to Significance: p = 0.06 (just missed)
Example:
- Conversion: +10% (p = 0.03) ✅
- Revenue: -5% (p = 0.08) ⚠️
- Decision: Iterate. Conversion is up but revenue is down. Investigate why.
Kill If:
❌ Negative: Treatment is worse than control
❌ Not Significant: P-value > 0.05
❌ Opportunity Cost Too High: Could be working on better ideas
Example:
- Conversion: +2% (p = 0.15) ❌
- Took 4 weeks to test
- Decision: Kill. Not significant, move on to next idea.
Tools
Feature Flags
LaunchDarkly:
- Feature flag management
- Gradual rollouts
- Kill switches
Split.io:
- Feature flags + experimentation
- Real-time metrics
Unleash:
- Open-source feature flags
- Self-hosted option
Experimentation Platforms
Optimizely:
- Full-stack experimentation
- Visual editor for web
- Stats Engine (sequential testing)
VWO (Visual Website Optimizer):
- A/B testing for web
- Heatmaps, session recordings
- Visual editor
GrowthBook:
- Open-source experimentation
- Bayesian statistics
- Feature flags
Statsig:
- Modern experimentation platform
- Sequential testing
- Free tier
Analytics
Amplitude:
- Product analytics
- Funnel analysis
- Cohort analysis
Mixpanel:
- Event-based analytics
- A/B test analysis
- Retention analysis
PostHog:
- Open-source product analytics
- Feature flags
- Session replay
A/B Testing for Engineers
1. Feature Flag Implementation
Node.js (LaunchDarkly):
```js
const express = require('express');
const LaunchDarkly = require('launchdarkly-node-server-sdk');

const app = express();
const client = LaunchDarkly.init(process.env.LAUNCHDARKLY_SDK_KEY);

app.get('/checkout', async (req, res) => {
  // Wait until the SDK has fetched flag data (no-op after the first call)
  await client.waitForInitialization();

  // User context used for targeting and bucketing
  const user = {
    key: req.user.id,
    email: req.user.email,
    custom: {
      plan: req.user.plan
    }
  };

  // Third argument is the default served if the flag can't be evaluated
  const showNewCheckout = await client.variation('new-checkout-flow', user, false);

  if (showNewCheckout) {
    res.render('checkout-new');
  } else {
    res.render('checkout-old');
  }
});
```
Python (Statsig):
```python
import os

from flask import Flask, render_template
from flask_login import current_user
from statsig import statsig
from statsig.statsig_user import StatsigUser

app = Flask(__name__)
statsig.initialize(os.environ['STATSIG_SERVER_KEY'])

@app.route('/checkout')
def checkout():
    # Statsig's server SDK expects a StatsigUser object rather than a plain dict
    user = StatsigUser(
        user_id=current_user.id,
        email=current_user.email,
        custom={'plan': current_user.plan},
    )

    show_new_checkout = statsig.check_gate(user, 'new_checkout_flow')

    if show_new_checkout:
        return render_template('checkout_new.html')
    return render_template('checkout_old.html')
```
2. Metric Instrumentation
Segment (Event Tracking):
```js
const Analytics = require('analytics-node');
const analytics = new Analytics(process.env.SEGMENT_WRITE_KEY);

// Track checkout started
analytics.track({
  userId: user.id,
  event: 'Checkout Started',
  properties: {
    variant: showNewCheckout ? 'treatment' : 'control',
    cart_value: cart.total,
    items_count: cart.items.length
  }
});

// Track checkout completed
analytics.track({
  userId: user.id,
  event: 'Checkout Completed',
  properties: {
    variant: showNewCheckout ? 'treatment' : 'control',
    order_id: order.id,
    revenue: order.total
  }
});
```
3. Data Pipeline
Architecture:
```
Application
  ↓ (events)
Segment
  ↓ (forwards to)
  ├── Amplitude (analytics)
  ├── Mixpanel (analytics)
  ├── Data Warehouse (BigQuery, Snowflake)
  └── Statsig (experimentation)
```
4. Results Dashboard
Grafana Dashboard:
```json
{
  "dashboard": {
    "title": "A/B Test: New Checkout Flow",
    "panels": [
      {
        "title": "Conversion Rate by Variant",
        "targets": [
          {
            "expr": "sum(checkout_completed{variant='control'}) / sum(checkout_started{variant='control'})",
            "legendFormat": "Control"
          },
          {
            "expr": "sum(checkout_completed{variant='treatment'}) / sum(checkout_started{variant='treatment'})",
            "legendFormat": "Treatment"
          }
        ]
      },
      {
        "title": "Sample Size",
        "targets": [
          {
            "expr": "sum(checkout_started{variant='control'})",
            "legendFormat": "Control"
          },
          {
            "expr": "sum(checkout_started{variant='treatment'})",
            "legendFormat": "Treatment"
          }
        ]
      }
    ]
  }
}
```
Real Experiment Examples
Example 1: Button Color Test (Classic)
Hypothesis:
"If we change the CTA button from blue to orange, click-through rate will increase by 10%, because orange is more attention-grabbing."
Test:
- Control: Blue button
- Treatment: Orange button
- Sample size: 10,000 per variant
- Duration: 1 week
Results:
- Control: 5.2% CTR
- Treatment: 5.7% CTR
- Lift: +9.6%
- P-value: 0.04 ✅
Decision: Ship orange button.
Example 2: Checkout Flow Optimization
Hypothesis:
"If we reduce checkout from 5 steps to 3 steps, conversion will increase by 15%, because users abandon due to flow length."
Test:
- Control: 5-step checkout
- Treatment: 3-step checkout (combined steps)
- Sample size: 50,000 per variant
- Duration: 2 weeks
Results:
- Control: 8.5% conversion
- Treatment: 9.8% conversion
- Lift: +15.3%
- P-value: 0.001 ✅
Secondary Metrics:
- Time to checkout: 4.2 min → 3.1 min ✅
- Error rate: 2.1% → 1.8% ✅
Decision: Ship 3-step checkout.
Example 3: Pricing Page Variants
Hypothesis:
"If we show annual pricing first (instead of monthly), annual plan adoption will increase by 25%, because anchoring effect."
Test:
- Control: Monthly pricing shown first
- Treatment: Annual pricing shown first
- Sample size: 20,000 per variant
- Duration: 3 weeks
Results:
- Control: 12% annual adoption
- Treatment: 18% annual adoption
- Lift: +50%
- P-value: 0.001 ✅
Counter Metrics:
- Overall conversion: 10.5% → 10.2% ⚠️ (slight drop)
Decision: Ship, but monitor overall conversion.
Example 4: Onboarding Flow
Hypothesis:
"If we add an interactive tutorial in onboarding, activation rate will increase by 30%, because users don't know how to get started."
Test:
- Control: No tutorial
- Treatment: Interactive tutorial (5 steps)
- Sample size: 15,000 per variant
- Duration: 2 weeks
Results:
- Control: 25% activation rate
- Treatment: 28% activation rate
- Lift: +12%
- P-value: 0.08 ❌ (not significant)
Segment Analysis:
- New users: +20% (p = 0.03) ✅
- Returning users: +2% (p = 0.5) ❌
Decision: Iterate. Show tutorial only to new users.
Advanced: Bayesian A/B Testing
Traditional (Frequentist) A/B Testing
Approach:
- Null hypothesis: No difference between A and B
- P-value: Probability of seeing a result at least this extreme if the null is true
- Reject null if p < 0.05
Interpretation:
"Data like this would be rare if A and B truly performed the same, so we reject the null hypothesis."
Bayesian A/B Testing
Approach:
- Prior belief: What we believe before test
- Likelihood: Data from test
- Posterior belief: Updated belief after test
Interpretation:
"There's a 95% probability that B is better than A."
Benefits of Bayesian
1. Easier to Interpret:
- "95% probability B is better" (intuitive)
- vs "p = 0.03" (confusing)
2. Can Stop Early:
- No peeking problem
- Stop when confident enough
3. Incorporates Prior Knowledge:
- Use historical data
- More accurate with small samples
Tools That Use Bayesian
- GrowthBook: Bayesian by default
- VWO: Bayesian engine option
- Google Optimize: Bayesian (deprecated)
Example
Test:
- Control: 5.0% conversion (250 of 5,000 users)
- Treatment: 5.5% conversion (275 of 5,000 users)
Frequentist:
- P-value: 0.26 (not significant)
- Decision: Can't conclude
Bayesian:
- Probability B > A: 87%
- Expected lift: +10%
- Decision: Likely better, but not confident enough (need 95%)
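A minimal Beta-Binomial sketch of that computation, using uniform Beta(1, 1) priors (an assumption) and Monte Carlo draws from the posteriors:

```python
import numpy as np

rng = np.random.default_rng(0)
draws = 200_000

# Beta(1, 1) priors updated with the observed counts above
control = rng.beta(1 + 250, 1 + 4_750, size=draws)    # 250 / 5,000 = 5.0%
treatment = rng.beta(1 + 275, 1 + 4_725, size=draws)  # 275 / 5,000 = 5.5%

print(f"P(B > A) ≈ {(treatment > control).mean():.0%}")                     # ~87%
print(f"Expected relative lift ≈ {(treatment / control - 1).mean():+.0%}")  # ~+10%
```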
Summary
Quick Reference
Experiment Types:
- A/B test: Two variants
- Multivariate: Multiple changes
- Sequential: Stop early
- Holdout: Long-term measurement
When to Experiment:
- Significant features
- Uncertain outcomes
- Multiple options
- Optimization
Process:
- Define hypothesis
- Choose metrics
- Calculate sample size
- Set duration
- Design variants
- Launch
- Analyze
- Decide
Metrics:
- Primary: What we're optimizing
- Secondary: Guardrails
- Counter: Watch for negatives
Statistical Significance:
- P-value < 0.05
- Power > 80%
- Minimum detectable effect
Common Pitfalls:
- Peeking
- Sample ratio mismatch
- Novelty effect
- Seasonality
Decision Framework:
- Ship: Positive, significant, no red flags
- Iterate: Mixed results
- Kill: Negative, not significant
Tools:
- Feature flags: LaunchDarkly, Split.io
- Experimentation: Optimizely, Statsig, GrowthBook
- Analytics: Amplitude, Mixpanel, PostHog
