Experiment Analyzer
by StreamPilotOrg
---
name: experiment-analyzer
description: Analyze completed growth experiment results, validate hypotheses, generate insights, and suggest follow-up experiments. Use when experiments are completed, when the user asks about results or learnings, or when discussing what to do next based on experiment outcomes.
allowed-tools: [Read, Write, Grep, Glob]
---
Experiment Analyzer Skill
Analyze completed growth experiments, extract insights, and drive continuous learning.
When to Activate
This skill should activate when:
- User marks experiment as "completed"
- User asks "what did we learn?"
- User mentions "results", "outcomes", or "analysis"
- User asks "what should we do next?"
- User wants to compare multiple experiments
- User asks about experiment success rates
Analysis Framework
1. Result Classification
Win (Positive + Significant)
- Result is better than baseline
- Statistical significance ≥ 95%
- Change is practically meaningful (≥2% per the decision rule in Step 2; ≥5% is a common bar for a clear win)
Loss (Negative + Significant)
- Result is worse than baseline
- Statistical significance ≥ 95%
- Change is meaningful
Inconclusive
- Statistical significance < 95%
- Not enough data to make a decision
- Sample size may be insufficient
Neutral
- Minimal change (< ±2%)
- No meaningful impact either way
- May indicate the hypothesis was off
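The classification above presumes a significance figure is already attached to the results. When only raw counts are available for a conversion-style metric, one common way to derive it is a two-proportion z-test. A minimal sketch, assuming scipy is available; the counts shown are purely illustrative:

```python
from math import sqrt

from scipy.stats import norm

def significance_pct(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided two-proportion z-test, returned as a confidence percentage.

    conv_a / n_a: baseline conversions and sample size
    conv_b / n_b: variant conversions and sample size
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return (1 - 2 * norm.sf(abs(z))) * 100                 # e.g. 94.6 means 94.6%

# 480/10,000 baseline vs 540/10,000 variant → ~94.6%, i.e. inconclusive here
print(round(significance_pct(480, 10_000, 540, 10_000), 1))
```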
2. Hypothesis Validation
Compare original hypothesis to results:
Hypothesis Components:
- Proposed change → Was it implemented as planned?
- Target audience → Did we reach the right users?
- Expected outcome → Did we hit the target?
- Rationale → Was our reasoning correct?
Validation Questions:
- Did we achieve the expected outcome? (Yes/No/Partially)
- Was the underlying assumption correct?
- What surprised us?
- What would we do differently?
3. ICE Score Retrospective
Compare predicted vs actual:
Impact Score Validation:
- Predicted Impact: [original score]
- Actual Impact: [calculate based on results]
- Delta: [difference]
- Learning: Was our impact prediction accurate?
Confidence Score Validation:
- Predicted Confidence: [original score]
- Outcome: [win/loss/inconclusive]
- Learning: Was our confidence justified?
Ease Score Validation:
- Predicted Ease: [original score]
- Actual Time: [if tracked]
- Learning: Was implementation as easy as expected?
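The retrospective comparison can be mechanized. A sketch under stated assumptions: ICE components are 1-10 scores, the "one impact point per 2% of change" mapping is an illustrative convention rather than a standard, and the high-confidence threshold of 7 is likewise an assumption:

```python
def ice_retrospective(predicted: dict, change_pct: float, outcome: str) -> dict:
    """Compare predicted ICE components against what actually happened.

    predicted:   {"impact": 1-10, "confidence": 1-10, "ease": 1-10}
    change_pct:  observed change in the primary metric, in percent
    outcome:     "win" | "loss" | "inconclusive" | "neutral"
    Ease needs tracked implementation time, so it is left to manual review here.
    """
    # Illustrative convention: one impact point per 2% of observed change, clamped to 1-10.
    actual_impact = max(1, min(10, round(abs(change_pct) / 2)))
    impact_delta = actual_impact - predicted["impact"]
    high_confidence = predicted["confidence"] >= 7         # threshold is an assumption
    return {
        "actual_impact": actual_impact,
        "impact_delta": impact_delta,
        "impact_verdict": ("good" if abs(impact_delta) <= 1
                           else "overestimated" if impact_delta < 0
                           else "underestimated"),
        "confidence_verdict": ("justified" if high_confidence == (outcome == "win")
                               else "overconfident" if high_confidence
                               else "underconfident"),
    }
```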
4. Insight Generation
Key Questions:
- What worked? Specific elements that drove success
- What didn't work? Elements that failed or harmed metrics
- What was surprising? Unexpected findings
- What patterns emerge? Connections to other experiments
- What new questions arise? Areas to investigate further
Secondary Metrics:
- Review all secondary metrics tracked
- Look for unintended positive effects
- Watch for negative side effects
- Consider holistic impact
5. Follow-up Experiment Suggestions
Based on the outcome, suggest 2-3 follow-up experiments:
For Wins:
- Scale: Roll out to 100% of users
- Amplify: Make the winning element more prominent
- Extend: Apply pattern to related areas
- Optimize: Test variations to improve further
For Losses:
- Pivot: Try alternative approach to same problem
- Investigate: Run research to understand why
- Revert: Document and move on
- Learn: Apply learnings to future experiments
For Inconclusive:
- Re-run: Increase sample size or duration
- Simplify: Test smaller version to isolate variable
- Segment: Test with specific user segments
- Refine: Adjust hypothesis based on early signals
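If the skill encodes this mapping as data, the right playbook can be surfaced automatically from the result classification. A minimal sketch, with strategy names taken directly from the lists above:

```python
FOLLOW_UP_STRATEGIES = {
    "win": ["scale", "amplify", "extend", "optimize"],
    "loss": ["pivot", "investigate", "revert", "learn"],
    "inconclusive": ["re-run", "simplify", "segment", "refine"],
}

def suggest_strategies(outcome: str) -> list[str]:
    """Return the follow-up playbook for a result classification."""
    return FOLLOW_UP_STRATEGIES.get(outcome, [])  # neutral results get no default playbook
```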
Analysis Process
Step 1: Load and Validate
1. Read experiment JSON from completed/archived folder
2. Verify results data exists:
- Primary metric
- Baseline value
- Result value
- Statistical significance
- Sample size
- Duration
3. Check if hypothesis is documented
4. Review ICE scores
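A sketch of the load-and-validate step; the field names are assumptions about the experiment schema and should be matched to the actual JSON layout:

```python
import json
from pathlib import Path

# Assumed result fields — align these with the real experiment schema.
REQUIRED_RESULT_FIELDS = [
    "primary_metric", "baseline_value", "result_value",
    "significance", "sample_size", "duration_days",
]

def load_experiment(path: str) -> dict:
    """Read an experiment JSON file and verify it is ready for analysis."""
    exp = json.loads(Path(path).read_text())
    missing = [f for f in REQUIRED_RESULT_FIELDS if f not in exp.get("results", {})]
    if missing:
        raise ValueError(f"Missing result fields: {missing}")
    if not exp.get("hypothesis"):
        raise ValueError("Hypothesis is not documented; analysis would be incomplete")
    return exp
```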
Step 2: Calculate Key Metrics
Change Percentage = ((Result - Baseline) / Baseline) × 100
Result Classification (apply the rules in order; see the sketch below):
1. IF significance < 95% → Inconclusive
2. ELSE IF abs(change%) < 2% → Neutral
3. ELSE IF change% > 0 → Win
4. ELSE → Loss
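The ordered rule translates directly into code. A minimal sketch; significance is expressed as a percentage, matching the document's convention:

```python
def classify(baseline: float, result: float, significance: float) -> tuple[float, str]:
    """Apply the ordered decision rule above. significance is a percentage, e.g. 96.5."""
    change_pct = (result - baseline) / baseline * 100
    if significance < 95:
        label = "inconclusive"
    elif abs(change_pct) < 2:
        label = "neutral"
    elif change_pct > 0:
        label = "win"
    else:
        label = "loss"
    return change_pct, label

print(classify(0.048, 0.054, 96.2))   # → (12.5, 'win')
```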
Step 3: Generate Insights
1. Classify result (Win/Loss/Inconclusive/Neutral)
2. Validate hypothesis against results
3. Review ICE score predictions
4. Extract key learnings
5. Identify surprising findings
6. Check secondary metrics
7. Look for patterns across related experiments
Step 4: Create Follow-up Ideas
1. Based on result type, brainstorm 2-3 follow-ups
2. For each follow-up:
- Draft hypothesis
- Explain rationale (reference current learnings)
- Suggest category
- Provide preliminary ICE estimate
3. Prioritize follow-ups by potential impact
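One possible shape for a follow-up record. The multiplicative ICE total (Impact × Confidence × Ease on 1-10 scales, max 1000) is an assumption, but it is consistent with the >500 / 300-500 / <300 buckets used in the portfolio analysis below; all field values here are hypothetical:

```python
def follow_up_idea(title: str, category: str, hypothesis: str, rationale: str,
                   impact: int, confidence: int, ease: int) -> dict:
    """Assemble a follow-up suggestion with a preliminary ICE estimate."""
    return {
        "title": title,
        "category": category,            # e.g. "activation"
        "hypothesis": hypothesis,        # drafted from the hypothesis template
        "rationale": rationale,          # should reference the current learnings
        "ice": {"impact": impact, "confidence": confidence, "ease": ease,
                "total": impact * confidence * ease},
    }

ideas = [
    follow_up_idea("Amplify winning CTA", "activation",
                   "Promoting the CTA to the primary slot will lift signups further",
                   "The win showed CTA visibility drives conversion",
                   impact=7, confidence=6, ease=8),
]
ideas.sort(key=lambda i: i["ice"]["impact"], reverse=True)   # prioritize by potential impact
```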
Step 5: Generate Report
1. Create markdown analysis report
2. Include:
- Summary (result classification, key numbers)
- Hypothesis validation
- ICE score retrospective
- Key insights (bulleted list)
- Secondary metrics review
- Recommendations
- Follow-up experiment ideas
3. Save to experiments/archive/[id]_analysis.md
4. Update experiment JSON with learnings
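A sketch of the persistence step; the archive layout ([id].json sitting next to the analysis file) is an assumption about the repository structure:

```python
import json
from pathlib import Path

def save_analysis(exp_id: str, report_md: str, learnings: list[str],
                  archive_dir: str = "experiments/archive") -> None:
    """Write the markdown report and fold learnings back into the experiment JSON."""
    archive = Path(archive_dir)
    (archive / f"{exp_id}_analysis.md").write_text(report_md)

    exp_path = archive / f"{exp_id}.json"   # assumed location of the archived experiment
    exp = json.loads(exp_path.read_text())
    exp["learnings"] = learnings
    exp_path.write_text(json.dumps(exp, indent=2))
```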
Analysis Output Template
# Experiment Analysis: [Title]
**Date:** [Analysis date]
**Experiment ID:** [id]
**Status:** [Win/Loss/Inconclusive/Neutral] ✓/✗/?/○
## Summary
- **Primary Metric:** [metric name]
- **Baseline:** [baseline value]
- **Result:** [result value]
- **Change:** [+/-X%]
- **Statistical Significance:** [XX%]
- **Sample Size:** [count]
- **Duration:** [days]
## Hypothesis Validation
### Original Hypothesis
[Full hypothesis statement]
### Validation
- **Expected Outcome:** [what we expected]
- **Actual Outcome:** [what happened]
- **Hypothesis Validated:** [Yes/No/Partially]
**Analysis:**
[Explanation of whether and why hypothesis was validated]
## ICE Score Retrospective
| Component | Predicted | Actual/Assessment | Accuracy |
|-----------|-----------|------------------|----------|
| Impact | [score] | [calculate from results] | [good/overestimated/underestimated] |
| Confidence | [score] | [based on outcome] | [justified/overconfident/underconfident] |
| Ease | [score] | [based on actual effort] | [accurate/harder/easier] |
**Learnings for Future Scoring:**
- [What we learned about predicting impact]
- [What we learned about confidence]
- [What we learned about ease]
## Key Insights
1. **[Primary insight]** - [Explanation with data]
2. **[Secondary insight]** - [Explanation]
3. **[Surprising finding]** - [What we didn't expect]
## Secondary Metrics
| Metric | Change | Interpretation |
|--------|--------|----------------|
| [metric 1] | [+/-X%] | [Good/Bad/Neutral] |
| [metric 2] | [+/-X%] | [Good/Bad/Neutral] |
**Side Effects:**
- Positive: [Any unexpected positive impacts]
- Negative: [Any unexpected negative impacts]
## Recommendations
### Immediate Actions
- [ ] [Action item 1]
- [ ] [Action item 2]
### Strategic Implications
[Broader implications for product/growth strategy]
## Follow-up Experiment Ideas
### 1. [Experiment Title]
**Category:** [category]
**Hypothesis:**
[Full hypothesis following template]
**Rationale:**
[Why this follow-up based on current learnings]
**Preliminary ICE:**
- Impact: [score] - [reasoning]
- Confidence: [score] - [reasoning]
- Ease: [score] - [reasoning]
- **Total: [score]**
---
### 2. [Experiment Title]
[Repeat format]
---
### 3. [Experiment Title]
[Repeat format]
## Related Experiments
[List any related experiments and their outcomes for pattern recognition]
## Notes
[Any additional context, edge cases, or considerations]
Cross-Experiment Analysis
When user asks to analyze multiple experiments:
Metrics to Calculate:
- Success Rate: % of wins out of completed experiments
- Category Performance: Which funnel stages have best win rate?
- ICE Score Accuracy: How well do high-ICE experiments perform?
- Average Impact: What's the typical metric improvement?
- Cycle Time: Average days from backlog → completed
Pattern Recognition:
- Which types of experiments succeed most?
- Which audience segments respond best?
- Which testing methods are most reliable?
- What confidence levels actually predict success?
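The core portfolio numbers fall out of a single pass over the experiment records. A sketch assuming each record carries status, outcome, category, and change_pct fields (names are illustrative); cycle time would need backlog/completed timestamps and is omitted:

```python
from collections import defaultdict
from statistics import mean

def portfolio_metrics(experiments: list[dict]) -> dict:
    """Aggregate completed experiments into portfolio-level numbers."""
    done = [e for e in experiments if e["status"] == "completed"]
    if not done:
        return {"total": len(experiments), "completed": 0}

    by_category = defaultdict(list)
    for e in done:
        by_category[e["category"]].append(e)

    return {
        "total": len(experiments),
        "completed": len(done),
        "win_rate": 100 * sum(e["outcome"] == "win" for e in done) / len(done),
        "avg_change_pct": mean(e["change_pct"] for e in done),
        "by_category": {
            cat: {
                "count": len(es),
                "win_rate": 100 * sum(e["outcome"] == "win" for e in es) / len(es),
                "avg_change_pct": mean(e["change_pct"] for e in es),
            }
            for cat, es in by_category.items()
        },
    }
```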
Portfolio View:
# Experiment Portfolio Analysis
## Overview
- Total Experiments: [count]
- Completed: [count]
- Win Rate: [X%]
- Average Change: [+X%]
## By Category
| Category | Experiments | Win Rate | Avg Impact |
|----------|-------------|----------|------------|
| Acquisition | [count] | [X%] | [+X%] |
| Activation | [count] | [X%] | [+X%] |
| Retention | [count] | [X%] | [+X%] |
| Revenue | [count] | [X%] | [+X%] |
| Referral | [count] | [X%] | [+X%] |
## ICE Score Performance
- Experiments with ICE > 500: [X% win rate]
- Experiments with ICE 300-500: [X% win rate]
- Experiments with ICE < 300: [X% win rate]
**Learning:** [Are high ICE scores actually better predictors?]
## Top Performers
1. [Experiment] - [+X%] change
2. [Experiment] - [+X%] change
3. [Experiment] - [+X%] change
## Key Patterns
- [Pattern 1 discovered across experiments]
- [Pattern 2]
- [Pattern 3]
## Recommendations
[Strategic recommendations based on portfolio analysis]
Integration Points
- Automatically trigger when /experiment-update sets an experiment's status to "completed"
- Work with the ICE scorer skill to validate predictions
- Inform the hypothesis generator skill with learnings
- Feed into the metrics calculator skill for portfolio analysis
Continuous Improvement
After each analysis:
- Store learnings in a knowledge base
- Update ICE scoring calibration
- Refine hypothesis templates
- Build pattern library
- Improve follow-up suggestions
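For the knowledge base, an append-only JSONL file keeps learnings easy to scan or grep with the skill's allowed tools. A minimal sketch; the path is hypothetical:

```python
import json
import time
from pathlib import Path

KB_PATH = Path("experiments/learnings.jsonl")   # hypothetical knowledge-base location

def store_learning(exp_id: str, learning: str, tags: list[str]) -> None:
    """Append one learning as a JSON line so future analyses can scan or grep it."""
    record = {"experiment_id": exp_id, "learning": learning,
              "tags": tags, "recorded_at": time.strftime("%Y-%m-%d")}
    KB_PATH.parent.mkdir(parents=True, exist_ok=True)
    with KB_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")
```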
