Shap
by eyadsibai
Use when "SHAP", "Shapley values", "feature importance", "model explainability", or asking about "explain predictions", "interpretable ML", "feature attribution", "waterfall plot", "beeswarm plot", "model debugging
Skill Details
Repository Files
1 file in this skill directory
name: shap
description: Use when "SHAP", "Shapley values", "feature importance", "model explainability", or asking about "explain predictions", "interpretable ML", "feature attribution", "waterfall plot", "beeswarm plot", "model debugging"
version: 1.0.0
SHAP Model Explainability
Explain ML predictions using Shapley values - feature importance and attribution.
When to Use
- Explain why a model made specific predictions
- Calculate feature importance with attribution
- Debug model behavior and validate predictions
- Create interpretability plots (waterfall, beeswarm, bar)
- Analyze model fairness and bias
Quick Start
```python
import shap
import xgboost as xgb

# Train model
model = xgb.XGBClassifier().fit(X_train, y_train)

# Create explainer
explainer = shap.TreeExplainer(model)

# Compute SHAP values (returns a shap.Explanation object)
shap_values = explainer(X_test)

# Visualize global importance
shap.plots.beeswarm(shap_values)
```
Choose Explainer
```python
# Tree-based models (XGBoost, LightGBM, random forests) - fast and exact
explainer = shap.TreeExplainer(model)

# Deep learning (TensorFlow, PyTorch)
explainer = shap.DeepExplainer(model, background_data)

# Linear models
explainer = shap.LinearExplainer(model, X_train)

# Any model (slower but universal)
explainer = shap.KernelExplainer(model.predict, X_train[:100])

# Auto-select the best available explainer
explainer = shap.Explainer(model)
```
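KernelExplainer evaluates the model on many perturbed copies of each row, so its cost grows with the size of the background set. A minimal sketch of keeping that in check with the library's `shap.kmeans` summarizer, assuming the fitted classifier `model` and the training data `X_train`/`X_test` from above are pandas DataFrames:

```python
import shap

# Summarize the background with 50 weighted k-means centroids instead of
# passing hundreds of raw rows; this bounds KernelExplainer's runtime.
background = shap.kmeans(X_train, 50)

# For classifiers, explain probabilities rather than hard labels.
explainer = shap.KernelExplainer(model.predict_proba, background)

# Legacy API: returns one attribution array per output class
# (the exact return format varies across shap versions).
values = explainer.shap_values(X_test.iloc[:20])
```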
Compute SHAP Values
```python
# Compute for the test set
shap_values = explainer(X_test)

# Access components of the Explanation object
shap_values.values       # SHAP values (per-feature attributions)
shap_values.base_values  # expected model output (baseline)
shap_values.data         # original feature values
```
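One common snag worth flagging: for multiclass models the attributions gain a class dimension, and most plot functions want a 2-D slice. A hedged sketch, assuming `shap_values` came from a multiclass classifier:

```python
# Shape is (n_samples, n_features) for regression or binary margins,
# or (n_samples, n_features, n_classes) for multiclass models.
print(shap_values.shape)

# Slice out one class's attributions before plotting, e.g. class 1:
if len(shap_values.shape) == 3:
    shap.plots.beeswarm(shap_values[:, :, 1])
```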
Visualizations
Global Feature Importance
```python
# Beeswarm - shows importance and the distribution of effects
shap.plots.beeswarm(shap_values)

# Bar - clean mean(|SHAP|) summary
shap.plots.bar(shap_values)
```
Individual Predictions
```python
# Waterfall - breakdown of a single prediction
shap.plots.waterfall(shap_values[0])

# Force - additive visualization
shap.plots.force(shap_values[0])
```
Feature Relationships
```python
# Scatter - feature value vs. SHAP value
shap.plots.scatter(shap_values[:, "feature_name"])

# With interaction coloring
shap.plots.scatter(shap_values[:, "Age"], color=shap_values[:, "Income"])
```
Heatmap (Multiple Samples)
```python
shap.plots.heatmap(shap_values[:100])
```
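These plots render through matplotlib (force is the exception; it uses JavaScript by default), so they can be saved to disk by suppressing the interactive display with `show=False`. A minimal sketch; the filename is illustrative:

```python
import matplotlib.pyplot as plt

# Draw the plot without displaying it, then save via matplotlib.
shap.plots.beeswarm(shap_values, show=False)
plt.tight_layout()
plt.savefig("shap_beeswarm.png", dpi=150)
plt.close()
```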
Common Patterns
Complete Analysis
```python
import shap

# 1. Create explainer and compute SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)

# 2. Global importance
shap.plots.beeswarm(shap_values)

# 3. Top feature relationships
shap.plots.scatter(shap_values[:, "top_feature"])

# 4. Individual explanation
shap.plots.waterfall(shap_values[0])
```
Compare Groups
```python
# Compare feature importance across groups (boolean masks as numpy arrays)
group_a = (X_test["category"] == "A").values
group_b = (X_test["category"] == "B").values

# Reduce each cohort to mean(|SHAP|) per feature, then plot side by side
shap.plots.bar({
    "Group A": shap_values[group_a].abs.mean(0),
    "Group B": shap_values[group_b].abs.mean(0),
})
```
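If there is no natural grouping column, the `Explanation.cohorts` method can split the samples automatically (it fits a shallow decision tree on the SHAP values). A short sketch using the documented pattern:

```python
# Auto-split into 2 cohorts, then compare mean(|SHAP|) per feature.
shap.plots.bar(shap_values.cohorts(2).abs.mean(0))
```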
Debug Errors
```python
import numpy as np

# Find misclassified samples
errors = model.predict(X_test) != y_test
error_idx = np.where(errors)[0]

# Explain why they failed
for idx in error_idx[:5]:
    shap.plots.waterfall(shap_values[idx])
```
Interpret Values
- Positive SHAP → Feature pushes prediction higher
- Negative SHAP → Feature pushes prediction lower
- Magnitude → Strength of impact
- Sum of SHAP values = Prediction - Baseline
For example:

```
Baseline:    0.30
Age:        +0.15
Income:     +0.10
Education:  -0.05
Prediction:  0.30 + 0.15 + 0.10 - 0.05 = 0.50
```
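That additivity is checkable in code and makes a useful sanity test that you are explaining the output you think you are. A hedged sketch, assuming the binary XGBoost classifier from Quick Start (TreeExplainer explains the raw log-odds margin there, not the probability):

```python
import numpy as np

# TreeExplainer explains XGBoost's raw margin (log-odds), so compare
# against output_margin=True rather than predict_proba.
margin = model.predict(X_test, output_margin=True)

# baseline + sum of per-feature attributions should recover the margin
reconstructed = shap_values.base_values + shap_values.values.sum(axis=1)
assert np.allclose(reconstructed, margin, atol=1e-4)
```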
Best Practices
- Use TreeExplainer for tree models (fast, exact)
- Use 100-1000 background samples for KernelExplainer
- Start global (beeswarm) then go local (waterfall)
- Check the model output type (probability vs. log-odds); see the sketch after this list
- Validate with domain knowledge
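On the output-type bullet: TreeExplainer can explain probabilities directly if it is given background data and `model_output="probability"`. A sketch, hedged because this interventional mode is slower than the default; the 200-row background sample is an arbitrary choice and assumes `X_train` is a pandas DataFrame:

```python
# Explain probabilities instead of raw log-odds. Requires background
# data and runs slower than the default path-dependent algorithm.
explainer = shap.TreeExplainer(
    model,
    data=X_train.sample(200, random_state=0),  # background sample (assumed DataFrame)
    model_output="probability",
)
shap_values = explainer(X_test)
```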
vs Alternatives
| Tool | Best For |
|---|---|
| SHAP | Theoretically grounded, all model types |
| LIME | Quick local explanations |
| Built-in feature importance | Quick global sanity checks (tree gain/split counts) |
Resources
- Docs: https://shap.readthedocs.io/
- Paper: Lundberg & Lee (2017) "A Unified Approach to Interpreting Model Predictions"
