Confidence Levels
by jagreehal
Force honest confidence assessment. Express confidence as percentage, explain gaps, validate assumptions before presenting conclusions.
name: confidence-levels
description: "Force honest confidence assessment. Express confidence as percentage, explain gaps, validate assumptions before presenting conclusions."
version: 1.0.0
Confidence Levels
Express confidence as a percentage, not vague certainty.
Core Principle
A thorough analysis that looks certain but isn't can mislead users into wrong decisions. Conflating explanation quality with evidence quality causes harm.
Critical Rules
| Rule | Enforcement |
|---|---|
| Express confidence as % | Not "probably" - use "70% confident" |
| Explain gaps below 95% | Mandatory "Why not 100%?" |
| Validate before presenting | If you can gather evidence, do it |
| Show your math | Evidence adds confidence, gaps subtract |
Confidence Scale
| Range | Icon | Meaning |
|---|---|---|
| 0-30% | 🔴 | Speculation - needs significant validation |
| 31-60% | 🟡 | Plausible - evidence exists but gaps remain |
| 61-85% | 🟠 | Likely - strong evidence, minor gaps |
| 86-94% | 🟢 | High confidence - validated, minor uncertainty |
| 95-100% | 💯 | Confirmed - fully validated |
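The scale above reads naturally as a range lookup. The following is a minimal sketch, assuming confidence is an integer percentage; the function name `confidence_band` and the short labels are hypothetical and not part of the skill itself.

```python
def confidence_band(percent: int) -> tuple[str, str]:
    """Return (icon, label) for an integer confidence percentage.

    Thresholds mirror the Confidence Scale table; the function itself
    is an illustrative assumption, not part of the skill.
    """
    if not 0 <= percent <= 100:
        raise ValueError(f"confidence must be 0-100, got {percent}")
    if percent <= 30:
        return "🔴", "Speculation"
    if percent <= 60:
        return "🟡", "Plausible"
    if percent <= 85:
        return "🟠", "Likely"
    if percent <= 94:
        return "🟢", "High confidence"
    return "💯", "Confirmed"
```

For example, `confidence_band(40)` falls in the 31-60% row and returns the 🟡 "Plausible" pair.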
Calibration Guide
| Level | Meaning |
|---|---|
| 20% | One possibility among several |
| 40% | Evidence points this direction, key assumptions unverified |
| 60% | Evidence supports this, alternatives not ruled out |
| 80% | Strong evidence, assumptions verified, alternatives less likely |
| 95% | Validated with direct evidence, alternatives ruled out |
| 100% | Mathematical/logical certainty only |
Pre-Conclusion Checkpoint
Before claiming ANY conclusion, complete this:
1. Evidence Inventory
- What hard evidence supports this?
- Direct evidence (code/logs that prove it)?
- What's the strongest piece of evidence?
2. Falsifiability Check
- What would INVALIDATE this theory?
- Have I looked for that data?
- If no: WHY NOT?
3. Assumption Audit
- What am I assuming WITHOUT verification?
- Mark each: [VERIFIED] or [ASSUMED]
4. Alternative Possibilities
- What else could explain these symptoms?
- Why is my conclusion more likely?
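One way to make the checkpoint hard to skip is to record it as data and list whatever is still open. This is a rough sketch under assumed names (`Assumption`, `Checkpoint`, `open_items` are all hypothetical); only the four checklist steps and the [VERIFIED]/[ASSUMED] markers come from the checkpoint above.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    text: str
    verified: bool = False

    def tag(self) -> str:
        # Markers from step 3 (Assumption Audit)
        return "[VERIFIED]" if self.verified else "[ASSUMED]"


@dataclass
class Checkpoint:
    evidence: list[str] = field(default_factory=list)            # step 1
    falsifiers_checked: bool = False                             # step 2
    assumptions: list[Assumption] = field(default_factory=list)  # step 3
    alternatives: list[str] = field(default_factory=list)        # step 4

    def open_items(self) -> list[str]:
        """List what still stands between the analysis and a conclusion."""
        items = []
        if not self.evidence:
            items.append("no hard evidence recorded")
        if not self.falsifiers_checked:
            items.append("falsifiability check not done")
        items += [f"{a.tag()} {a.text}" for a in self.assumptions if not a.verified]
        if not self.alternatives:
            items.append("no alternative explanations considered")
        return items
```

An empty `open_items()` result does not prove the conclusion; it only shows that each step of the checkpoint was at least attempted.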
Confidence Scoring
Start at 50% (neutral) and adjust:
| Factor | Adjustment |
|---|---|
| Direct evidence (code/logs proving it) | +15-25% |
| Verified assumptions (checked, not assumed) | +10-15% |
| Alternatives ruled out with evidence | +10-15% |
| Falsifiability check completed | +5-10% |
| Unverified critical assumptions | -15-25% |
| Plausible alternatives not ruled out | -10-15% |
| No falsifiability check | -10% |
| Circumstantial evidence only | -10% |
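The arithmetic behind the table can be sketched as a running sum from the 50% baseline. The `Adjustment` type, the `score` function, and the sample investigation below are hypothetical; clamping the result to 0-100 is an assumption the table does not state.

```python
from dataclasses import dataclass

BASELINE = 50  # "Start at 50% (neutral)"


@dataclass(frozen=True)
class Adjustment:
    delta: int   # signed percentage points, e.g. +20 or -15
    reason: str


def score(adjustments: list[Adjustment]) -> int:
    """Sum the adjustments onto the baseline, clamped to 0-100 (assumed)."""
    total = BASELINE + sum(a.delta for a in adjustments)
    return max(0, min(100, total))


# Hypothetical investigation, with deltas picked from the ranges in the table
adjustments = [
    Adjustment(+20, "Direct evidence: log line shows the failing call"),
    Adjustment(+10, "Verified assumption: config value checked"),
    Adjustment(-15, "Plausible alternative not ruled out"),
]
print(score(adjustments))  # 50 + 20 + 10 - 15 = 65
```

Keeping each adjustment paired with its reason means the same list can later be printed as the "Evidence" and "Why not 100%" lines of a response.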
Response Format
[Icon] [X%] Confidence: [One sentence finding]
Evidence:
- [+X%] [Evidence 1]
- [+X%] [Evidence 2]
Why not 100%:
- [-X%] [Gap 1]
To increase confidence: [What's needed]
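The template above can be filled in mechanically once the adjustments are known. A minimal sketch, assuming evidence and gaps arrive as `(points, text)` pairs with positive point values and that `percent` is already within 0-100; `render_assessment` and `BANDS` are hypothetical names.

```python
# Upper bound of each band from the Confidence Scale, paired with its icon
BANDS = [(30, "🔴"), (60, "🟡"), (85, "🟠"), (94, "🟢"), (100, "💯")]


def render_assessment(
    percent: int,
    finding: str,
    evidence: list[tuple[int, str]],
    gaps: list[tuple[int, str]],
    next_step: str,
) -> str:
    """Format one assessment following the Response Format template."""
    icon = next(i for upper, i in BANDS if percent <= upper)
    lines = [f"{icon} {percent}% Confidence: {finding}", "Evidence:"]
    lines += [f"- [+{pts}%] {text}" for pts, text in evidence]
    lines.append("Why not 100%:")
    lines += [f"- [-{pts}%] {text}" for pts, text in gaps]
    lines.append(f"To increase confidence: {next_step}")
    return "\n".join(lines)
```

Calling it with `percent=65` and one evidence item and one gap produces a first line starting with `🟠 65% Confidence:` followed by the three labelled sections.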
Examples
WRONG - Premature Certainty
Claude: "I have complete clarity on the root cause. The issue
is definitely the database connection pool being exhausted.
Here's my detailed analysis..."
[200 lines of report]
User: [provides data that contradicts analysis]
Claude: "This changes everything..."
Problem: Claimed certainty with unverified assumptions.
CORRECT - Honest Assessment
🔴 25% Confidence: Database connection pool may be exhausted
Evidence:
- [+15%] Error pattern matches connection timeout
- [+10%] Load spike occurred around failure time
Why not 100%:
- [-25%] Haven't verified actual pool metrics
- [-15%] Haven't ruled out: network issues, DNS problems
- [-10%] This is mostly pattern matching, not direct evidence
To increase confidence: I need the connection pool metrics
from the time of failure. Can you provide those?
WRONG - Confidence in Explanation Quality
Claude: "Based on my thorough analysis of the code flow,
I'm certain the bug is in the authentication middleware."
Problem: Thoroughness of presentation ≠ correctness. Detailed reasoning without verified evidence is speculation.
CORRECT - Evidence-Based Confidence
🟠 70% Confidence: Bug is in authentication middleware
Evidence:
- [+20%] Error message explicitly mentions "token expired"
- [+15%] Code trace shows validation at auth/middleware.ts:45
- [+10%] Token timestamp is from 2 hours ago
Why not 100%:
- [-10%] Haven't verified the clock sync between servers
- [-15%] Could be a different middleware in the chain
To increase confidence: Let me check the server time sync
and trace the full middleware chain.
Self-Validation Rule
Before going back to the user with questions you could answer yourself, ask:
Can I gather more evidence myself?
├─ Search codebase for confirming/denying data?
├─ Fetch a file that validates an assumption?
├─ Check actual state vs assumed state?
└─ Run a test to verify?
If YES → DO IT. Then reassess confidence.
If NO → Present with honest confidence + what you need.
Critical: If confidence is below 80% and you CAN gather more evidence → DO IT.
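The rule amounts to a loop: run every check you can perform yourself, reassess after each, and only then present. The sketch below assumes each available check is a callable returning a signed adjustment in percentage points; `Check` and `self_validate` are hypothetical names, and the 0-100 clamp is an assumption.

```python
from typing import Callable

# A self-service check (search the codebase, fetch a file, run a test)
# that returns a signed adjustment in percentage points.
Check = Callable[[], int]


def self_validate(confidence: int, checks: list[Check]) -> int:
    """Run every available check before presenting, reassessing after each."""
    for check in checks:
        confidence = max(0, min(100, confidence + check()))
    return confidence


# Hypothetical usage: one check confirms an assumption, one finds a gap
final = self_validate(50, [lambda: 20, lambda: -10])
print(final)  # 60
```

With an empty `checks` list nothing can be gathered, which corresponds to the "If NO" branch: present the current confidence and state what is needed.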
Trigger Words
Auto-invoke this skill when about to claim:
- "root cause is", "the problem is"
- "complete clarity", "definitely", "certainly"
- "clearly the issue", "obviously"
- Any conclusive claim during investigation
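The quoted phrases can be caught with a plain text scan over a draft response. This is an illustrative sketch, not how the skill is actually invoked; `TRIGGERS` and `find_triggers` are hypothetical names, and the last item in the list ("any conclusive claim") cannot be matched by phrase and still needs judgment.

```python
import re

# Phrases quoted in the Trigger Words list
TRIGGERS = [
    "root cause is",
    "the problem is",
    "complete clarity",
    "definitely",
    "certainly",
    "clearly the issue",
    "obviously",
]

# Word boundaries keep "indefinitely" from matching "definitely"
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in TRIGGERS) + r")\b",
    re.IGNORECASE,
)


def find_triggers(draft: str) -> list[str]:
    """Return trigger phrases found in a draft, lowercased, in order."""
    return [m.group(0).lower() for m in _PATTERN.finditer(draft)]


print(find_triggers("The root cause is definitely the pool."))
# ['root cause is', 'definitely']
```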
Integration
| Skill | Relationship |
|---|---|
| critical-peer | Challenge conclusions lacking evidence |
| research-first | Gather evidence before concluding |
| debugging-methodology | Evidence-based investigation |
Anti-Patterns
| Anti-Pattern | Violation |
|---|---|
| "Complete clarity" | Claimed certainty without validation |
| "Definitely the issue" | Unqualified conclusion |
| Building detailed reports | Thoroughness ≠ correctness |
| "It's probably X" | Missing confidence % and gaps |
| Skipping falsifiability | Haven't asked "what would prove me wrong?" |
Quick Reference
- Did I express confidence as a percentage?
- Did I explain what's stopping 100%?
- Did I show evidence for the % claimed?
- Could I gather more evidence myself?
- Did I check for falsifying evidence?
