ICE Scorer

by StreamPilotOrg

Automatically score growth experiments using the ICE framework (Impact × Confidence × Ease). Use when the user creates a new experiment, mentions scoring or prioritization, or when analyzing experiment backlogs. Helps prioritize experiments by evaluating Impact (1-10), Confidence (1-10), and Ease (1-10).

---
name: ice-scorer
description: Automatically score growth experiments using the ICE framework (Impact × Confidence × Ease). Use when the user creates a new experiment, mentions scoring or prioritization, or when analyzing experiment backlogs. Helps prioritize experiments by evaluating Impact (1-10), Confidence (1-10), and Ease (1-10).
allowed-tools: [Read, Write]
---

ICE Scorer Skill

Automatically score growth experiments using the ICE (Impact, Confidence, Ease) prioritization framework.

When to Activate

This skill should activate when:

  • User creates a new experiment without providing ICE scores
  • User mentions "score", "prioritize", or "ICE"
  • User asks "which experiment should I run first?"
  • User wants to evaluate experiment backlog
  • User compares multiple experiments

ICE Framework Scoring Guidelines

Impact (1-10): How much will this move the key metric?

Score 8-10: High Impact

  • Affects North Star metric directly
  • Expected change ≥15%
  • Targets large user segment
  • Critical business metric

Score 4-7: Medium Impact

  • Affects important but secondary metrics
  • Expected change 5-15%
  • Targets meaningful user segment
  • Supports key business goals

Score 1-3: Low Impact

  • Affects minor or vanity metrics
  • Expected change <5%
  • Targets small user segment
  • Nice-to-have improvement

Confidence (1-10): How certain are we this will work?

Score 8-10: High Confidence

  • Strong quantitative data supporting hypothesis
  • User research validates the problem
  • Similar experiments succeeded elsewhere
  • Multiple sources of evidence
  • Detailed rationale (>100 characters)

Score 4-7: Medium Confidence

  • Some supporting data or research
  • Analogous experiments showed promise
  • Logical reasoning with limited evidence
  • Moderate rationale (50-100 characters)

Score 1-3: Low Confidence

  • Speculative or gut feeling
  • No supporting data
  • Untested assumption
  • Minimal rationale (<50 characters)

Ease (1-10): How easy is this to implement?

Score 8-10: High Ease

  • < 1 day of work
  • No engineering required, or minimal changes
  • No external dependencies
  • Can be done with existing tools

Score 4-7: Medium Ease

  • 1-2 days of work
  • Some engineering work required
  • May need design support
  • Uses existing infrastructure

Score 1-3: Low Ease

  • >2 days of work
  • Significant engineering effort
  • Requires design and multiple teams
  • Needs external resources or new tools
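
The rationale-length and time-estimate bands above lend themselves to simple heuristics. A minimal Python sketch with hypothetical helper names (the skill applies these bands by judgment; nothing here is part of its defined behavior):

```python
def confidence_hint(rationale: str) -> str:
    """Suggest a Confidence band from rationale length, per the rubric above."""
    n = len(rationale)
    if n > 100:
        return "high (8-10)"      # detailed rationale
    if n >= 50:
        return "medium (4-7)"     # moderate rationale
    return "low (1-3)"            # minimal rationale


def ease_hint(estimated_days: float) -> str:
    """Suggest an Ease band from an implementation-time estimate, per the rubric above."""
    if estimated_days < 1:
        return "high (8-10)"      # <1 day of work
    if estimated_days <= 2:
        return "medium (4-7)"     # 1-2 days of work
    return "low (1-3)"            # >2 days of work
```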

Scoring Process

When scoring an experiment:

  1. Read the experiment file from the experiments folder

  2. Analyze the hypothesis components:

    • Proposed change
    • Target audience
    • Expected outcome (look for specific percentages)
    • Rationale (check length and evidence quality)
  3. Evaluate Impact:

    • Is this a North Star metric or secondary metric?
    • What's the expected percentage change?
    • How many users will this affect?
    • Consider the experiment category (acquisition, activation, etc.)
  4. Evaluate Confidence:

    • How much evidence supports the hypothesis?
    • Is there user research or data mentioned?
    • How detailed is the rationale?
    • Are there comparable experiments?
  5. Evaluate Ease:

    • Estimate implementation time
    • Does it need engineering? Design? External resources?
    • How complex is the proposed change?
    • Look for keywords: "redesign" (low ease), "copy change" (high ease)
  6. Calculate total ICE score: Impact × Confidence × Ease

  7. Interpret the score:

    • 700+: Critical Priority - implement immediately
    • 500-699: High Priority - strong candidate
    • 300-499: Medium Priority - good experiment
    • 150-299: Low Priority
    • <150: Very Low Priority - deprioritize
  8. Update the experiment JSON with ICE scores

  9. Move to pipeline if score ≥ 300
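
A minimal sketch of steps 6-9, assuming the experiment file is JSON (the actual schema is not specified here, so the `ice` field name and the function signature are illustrative):

```python
import json

# (minimum score, label) pairs matching the interpretation table above
PRIORITY_TIERS = [
    (700, "Critical Priority"),
    (500, "High Priority"),
    (300, "Medium Priority"),
    (150, "Low Priority"),
    (0, "Very Low Priority"),
]


def score_experiment(path: str, impact: int, confidence: int, ease: int) -> dict:
    total = impact * confidence * ease                                            # step 6
    priority = next(label for floor, label in PRIORITY_TIERS if total >= floor)   # step 7

    with open(path) as f:                                                         # step 8
        experiment = json.load(f)
    experiment["ice"] = {
        "impact": impact,
        "confidence": confidence,
        "ease": ease,
        "total": total,
        "priority": priority,
        "move_to_pipeline": total >= 300,                                         # step 9
    }
    with open(path, "w") as f:
        json.dump(experiment, f, indent=2)
    return experiment["ice"]
```

With the scores from Example 1 below (7, 6, 9), this yields a total of 378 and a Medium Priority label.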

Scoring Examples

Example 1: Onboarding Progress Indicators

Experiment: Add progress indicators to 5-step onboarding flow

Analysis:

  • Impact: 7 - Activation is important, expected 15% increase
  • Confidence: 6 - User research supports it, but not tested yet
  • Ease: 9 - Simple UI element, <1 day of work
  • Total: 378 - Medium Priority

Reasoning:

  • Impact: Activation is a key metric but not the only North Star
  • Confidence: User research provides evidence but no previous tests
  • Ease: Adding progress bar is straightforward UI work

Example 2: Social Proof on Pricing Page

Experiment: Add customer logos and testimonials to pricing page

Analysis:

  • Impact: 7 - Affects acquisition and conversion
  • Confidence: 8 - Strong industry evidence for B2B social proof
  • Ease: 9 - Design change only, no engineering
  • Total: 504 - High Priority

Reasoning:

  • Impact: Pricing page is high-traffic, affects key conversion
  • Confidence: Multiple case studies show 10-15% improvement
  • Ease: Simple asset placement, quick implementation

Example 3: Complete Platform Redesign

Experiment: Redesign entire user interface

Analysis:

  • Impact: 9 - Could affect all metrics significantly
  • Confidence: 4 - No data supporting specific improvements
  • Ease: 2 - Months of work, multiple teams
  • Total: 72 - Very Low Priority

Reasoning:

  • Impact: Broad changes could have major impact
  • Confidence: Too vague, no specific hypothesis about what will improve
  • Ease: Massive undertaking, not a growth "experiment"

Keywords to Watch

Low Ease indicators:

  • redesign, rebuild, refactor, overhaul, migration, infrastructure

High Ease indicators:

  • copy change, button, color, image, text, email, simple

High Confidence indicators:

  • "data shows", "research indicates", "we tested", "similar experiment"

High Impact indicators:

  • North Star, conversion, activation, retention, revenue
  • Specific percentages (e.g., "15% increase")
  • Large user segments
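
These cues could be surfaced mechanically before manual scoring. A sketch using naive substring matching (the keyword lists mirror the indicators above; the matching itself is an illustrative assumption, not part of the skill):

```python
LOW_EASE = ["redesign", "rebuild", "refactor", "overhaul", "migration", "infrastructure"]
HIGH_EASE = ["copy change", "button", "color", "image", "text", "email", "simple"]
HIGH_CONFIDENCE = ["data shows", "research indicates", "we tested", "similar experiment"]
HIGH_IMPACT = ["north star", "conversion", "activation", "retention", "revenue"]


def keyword_flags(hypothesis: str) -> dict:
    """Report which indicator lists the hypothesis text matches."""
    text = hypothesis.lower()
    return {
        "low_ease": [k for k in LOW_EASE if k in text],
        "high_ease": [k for k in HIGH_EASE if k in text],
        "high_confidence": [k for k in HIGH_CONFIDENCE if k in text],
        "high_impact": [k for k in HIGH_IMPACT if k in text],
    }
```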

Output Format

When providing ICE scores, explain your reasoning:

ICE Score Analysis for: [Experiment Title]

Impact: [score]/10
Reasoning: [Why this score based on metric importance, expected change, audience size]

Confidence: [score]/10
Reasoning: [Why this score based on evidence, data, research quality]

Ease: [score]/10
Reasoning: [Why this score based on time, resources, complexity]

Total ICE Score: [Impact × Confidence × Ease] = [total]

Priority: [Critical/High/Medium/Low/Very Low]
Recommendation: [What to do with this experiment]

[If score >= 300:]
✓ Moving to pipeline based on strong ICE score

Integration with Commands

This skill works automatically when:

  • /experiment-create completes - offer to score immediately
  • /hypothesis-generate creates ideas - suggest preliminary scores
  • User asks about prioritization

Continuous Learning

After experiments complete:

  • Compare predicted Impact vs actual results
  • Adjust scoring calibration based on outcomes
  • Learn patterns for better Confidence scoring
  • Refine Ease estimates based on actual time taken
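
One way the predicted-versus-actual Impact comparison could look, assuming completed experiments record a predicted Impact score and an observed percentage lift (field names are hypothetical):

```python
def impact_calibration_gap(completed: list[dict]) -> float:
    """Average gap between predicted Impact and the score implied by actual results.

    Each dict is assumed to carry 'predicted_impact' (1-10) and 'actual_lift_pct'.
    The lift-to-score mapping reuses the Impact bands above: >=15% ~ 9, 5-15% ~ 5.5, <5% ~ 2.
    A positive result suggests systematic over-prediction of Impact.
    """
    def implied_impact(lift_pct: float) -> float:
        if lift_pct >= 15:
            return 9.0
        if lift_pct >= 5:
            return 5.5
        return 2.0

    gaps = [e["predicted_impact"] - implied_impact(e["actual_lift_pct"]) for e in completed]
    return sum(gaps) / len(gaps) if gaps else 0.0
```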

Skill Information

Category: Skill
Allowed Tools: [Read, Write]
Last Updated: 11/25/2025