Ice Scorer
by StreamPilotOrg
Automatically score growth experiments using the ICE framework (Impact × Confidence × Ease). Use when the user creates a new experiment, mentions scoring or prioritization, or when analyzing experiment backlogs. Helps prioritize experiments by evaluating Impact (1-10), Confidence (1-10), and Ease (1-10).
Skill Details
Repository Files
1 file in this skill directory
name: ice-scorer description: Automatically score growth experiments using the ICE framework (Impact × Confidence × Ease). Use when the user creates a new experiment, mentions scoring or prioritization, or when analyzing experiment backlogs. Helps prioritize experiments by evaluating Impact (1-10), Confidence (1-10), and Ease (1-10). allowed-tools: [Read, Write]
ICE Scorer Skill
Automatically score growth experiments using the ICE (Impact, Confidence, Ease) prioritization framework.
When to Activate
This skill should activate when:
- User creates a new experiment without providing ICE scores
- User mentions "score", "prioritize", or "ICE"
- User asks "which experiment should I run first?"
- User wants to evaluate experiment backlog
- User compares multiple experiments
ICE Framework Scoring Guidelines
Impact (1-10): How much will this move the key metric?
Score 8-10: High Impact
- Affects North Star metric directly
- Expected change ≥15%
- Targets large user segment
- Critical business metric
Score 4-7: Medium Impact
- Affects important but secondary metrics
- Expected change 5-15%
- Targets meaningful user segment
- Supports key business goals
Score 1-3: Low Impact
- Affects minor or vanity metrics
- Expected change <5%
- Targets small user segment
- Nice-to-have improvement
Confidence (1-10): How certain are we this will work?
Score 8-10: High Confidence
- Strong quantitative data supporting hypothesis
- User research validates the problem
- Similar experiments succeeded elsewhere
- Multiple sources of evidence
- Detailed rationale (>100 characters)
Score 4-7: Medium Confidence
- Some supporting data or research
- Analogous experiments showed promise
- Logical reasoning with limited evidence
- Moderate rationale (50-100 characters)
Score 1-3: Low Confidence
- Speculative or gut feeling
- No supporting data
- Untested assumption
- Minimal rationale (<50 characters)
Ease (1-10): How easy is this to implement?
Score 8-10: High Ease
- < 1 day of work
- No engineering required, or minimal changes
- No external dependencies
- Can be done with existing tools
Score 4-7: Medium Ease
- 1-2 days of work
- Some engineering work required
- May need design support
- Uses existing infrastructure
Score 1-3: Low Ease
-
2 days of work
- Significant engineering effort
- Requires design and multiple teams
- Needs external resources or new tools
Scoring Process
When scoring an experiment:
-
Read the experiment file from the experiments folder
-
Analyze the hypothesis components:
- Proposed change
- Target audience
- Expected outcome (look for specific percentages)
- Rationale (check length and evidence quality)
-
Evaluate Impact:
- Is this a North Star metric or secondary metric?
- What's the expected percentage change?
- How many users will this affect?
- Consider the experiment category (acquisition, activation, etc.)
-
Evaluate Confidence:
- How much evidence supports the hypothesis?
- Is there user research or data mentioned?
- How detailed is the rationale?
- Are there comparable experiments?
-
Evaluate Ease:
- Estimate implementation time
- Does it need engineering? Design? External resources?
- How complex is the proposed change?
- Look for keywords: "redesign" (low ease), "copy change" (high ease)
-
Calculate total ICE score: Impact × Confidence × Ease
-
Interpret the score:
- 700+: Critical Priority - implement immediately
- 500-699: High Priority - strong candidate
- 300-499: Medium Priority - good experiment
- 150-299: Low Priority
- <150: Very Low Priority - deprioritize
-
Update the experiment JSON with ICE scores
-
Move to pipeline if score ≥ 300
Scoring Examples
Example 1: Onboarding Progress Indicators
Experiment: Add progress indicators to 5-step onboarding flow
Analysis:
- Impact: 7 - Activation is important, expected 15% increase
- Confidence: 6 - User research supports it, but not tested yet
- Ease: 9 - Simple UI element, <1 day of work
- Total: 378 - Medium-High Priority
Reasoning:
- Impact: Activation is a key metric but not the only North Star
- Confidence: User research provides evidence but no previous tests
- Ease: Adding progress bar is straightforward UI work
Example 2: Social Proof on Pricing Page
Experiment: Add customer logos and testimonials to pricing page
Analysis:
- Impact: 7 - Affects acquisition and conversion
- Confidence: 8 - Strong industry evidence for B2B social proof
- Ease: 9 - Design change only, no engineering
- Total: 504 - High Priority
Reasoning:
- Impact: Pricing page is high-traffic, affects key conversion
- Confidence: Multiple case studies show 10-15% improvement
- Ease: Simple asset placement, quick implementation
Example 3: Complete Platform Redesign
Experiment: Redesign entire user interface
Analysis:
- Impact: 9 - Could affect all metrics significantly
- Confidence: 4 - No data supporting specific improvements
- Ease: 2 - Months of work, multiple teams
- Total: 72 - Very Low Priority
Reasoning:
- Impact: Broad changes could have major impact
- Confidence: Too vague, no specific hypothesis about what will improve
- Ease: Massive undertaking, not a growth "experiment"
Keywords to Watch
Low Ease indicators:
- redesign, rebuild, refactor, overhaul, migration, infrastructure
High Ease indicators:
- copy change, button, color, image, text, email, simple
High Confidence indicators:
- "data shows", "research indicates", "we tested", "similar experiment"
High Impact indicators:
- North Star, conversion, activation, retention, revenue
- Specific percentages (e.g., "15% increase")
- Large user segments
Output Format
When providing ICE scores, explain your reasoning:
ICE Score Analysis for: [Experiment Title]
Impact: [score]/10
Reasoning: [Why this score based on metric importance, expected change, audience size]
Confidence: [score]/10
Reasoning: [Why this score based on evidence, data, research quality]
Ease: [score]/10
Reasoning: [Why this score based on time, resources, complexity]
Total ICE Score: [Impact × Confidence × Ease] = [total]
Priority: [Critical/High/Medium/Low/Very Low]
Recommendation: [What to do with this experiment]
[If score >= 300:]
✓ Moving to pipeline based on strong ICE score
Integration with Commands
This skill works automatically when:
/experiment-createcompletes - offer to score immediately/hypothesis-generatecreates ideas - suggest preliminary scores- User asks about prioritization
Continuous Learning
After experiments complete:
- Compare predicted Impact vs actual results
- Adjust scoring calibration based on outcomes
- Learn patterns for better Confidence scoring
- Refine Ease estimates based on actual time taken
Related Skills
Attack Tree Construction
Build comprehensive attack trees to visualize threat paths. Use when mapping attack scenarios, identifying defense gaps, or communicating security risks to stakeholders.
Grafana Dashboards
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
Matplotlib
Foundational plotting library. Create line plots, scatter, bar, histograms, heatmaps, 3D, subplots, export PNG/PDF/SVG, for scientific visualization and publication figures.
Scientific Visualization
Create publication figures with matplotlib/seaborn/plotly. Multi-panel layouts, error bars, significance markers, colorblind-safe, export PDF/EPS/TIFF, for journal-ready scientific plots.
Seaborn
Statistical visualization. Scatter, box, violin, heatmaps, pair plots, regression, correlation matrices, KDE, faceted plots, for exploratory analysis and publication figures.
Shap
Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model
Pydeseq2
Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts, Wald tests, FDR correction, volcano/MA plots, for RNA-seq analysis.
Query Writing
For writing and executing SQL queries - from simple single-table queries to complex multi-table JOINs and aggregations
Pydeseq2
Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts, Wald tests, FDR correction, volcano/MA plots, for RNA-seq analysis.
Scientific Visualization
Meta-skill for publication-ready figures. Use when creating journal submission figures requiring multi-panel layouts, significance annotations, error bars, colorblind-safe palettes, and specific journal formatting (Nature, Science, Cell). Orchestrates matplotlib/seaborn/plotly with publication styles. For quick exploration use seaborn or plotly directly.
