Heterogeneity Analysis
by matheus-rech
Assess and interpret between-study heterogeneity in meta-analysis using I², Q statistic, tau², and prediction intervals. Use when users need to evaluate consistency across studies, understand sources of variation, or decide if pooling is appropriate.
name: heterogeneity-analysis
description: Assess and interpret between-study heterogeneity in meta-analysis using I², Q statistic, tau², and prediction intervals. Use when users need to evaluate consistency across studies, understand sources of variation, or decide if pooling is appropriate.
license: Apache-2.0
compatibility: Requires R with metafor package
metadata:
  author: meta-agent
  version: "1.0.0"
  category: statistics
  domain: evidence-synthesis
  difficulty: intermediate
  estimated-time: "12 minutes"
  prerequisites: meta-analysis-fundamentals
This skill teaches assessment and interpretation of between-study heterogeneity, a critical component of meta-analysis quality.
Overview
Heterogeneity refers to variation in true effects across studies beyond what we would expect from sampling error alone. High heterogeneity raises the question of whether a single pooled estimate is meaningful.
When to Use This Skill
Activate this skill when users:
- Ask about I², tau², or Q statistic
- Want to know if studies are "too different to combine"
- See conflicting results in their forest plot
- Ask about "inconsistency" or "variability"
- Need to interpret heterogeneity statistics
Key Heterogeneity Measures
1. Cochran's Q Statistic
What it is: A test of the null hypothesis that all studies share a common true effect.
Interpretation:
- Significant Q (p < 0.10) → Evidence of heterogeneity
- Non-significant Q → Does NOT prove homogeneity (low power)
Limitation: Low power with few studies; with many studies, even trivial heterogeneity can reach significance.
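To make the definition concrete, here is a minimal base R sketch of how Q can be computed by hand from effect sizes and their sampling variances. The data values are hypothetical and chosen only for illustration; in practice metafor reports the same quantity as `res$QE`.

```r
# Hypothetical log odds ratios and sampling variances (illustrative only)
yi <- c(-0.60, -0.10, -0.45, 0.20, -0.80)
vi <- c(0.04, 0.05, 0.03, 0.06, 0.05)

wi <- 1 / vi                          # inverse-variance weights
mu_fixed <- sum(wi * yi) / sum(wi)    # common-effect pooled estimate
Q <- sum(wi * (yi - mu_fixed)^2)      # Cochran's Q: weighted squared deviations
df <- length(yi) - 1                  # degrees of freedom: k - 1
p_value <- pchisq(Q, df = df, lower.tail = FALSE)

cat(sprintf("Q = %.2f, df = %d, p = %.4f\n", Q, df, p_value))
```

Under the null hypothesis Q follows a chi-square distribution with k - 1 degrees of freedom, which is where the p-value comes from.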
2. I² (I-squared)
What it is: Percentage of the total variability in effect estimates that is due to heterogeneity rather than sampling error (chance).
Interpretation Guidelines (Cochrane):
| I² Value | Interpretation |
|---|---|
| 0-40% | Might not be important |
| 30-60% | May represent moderate heterogeneity |
| 50-90% | May represent substantial heterogeneity |
| 75-100% | Considerable heterogeneity |
Key Teaching Points:
- I² is a relative measure (a proportion of total variability), not an absolute measure of how much true effects differ
- Overlapping ranges are intentional; context matters
- Always consider clinical and methodological diversity
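The point that I² is relative can be illustrated with a small base R sketch. It uses the common Q-based formula I² = max(0, (Q - df) / Q) and hypothetical effect sizes; the helper name `i_squared` is an assumption made for this example, not a metafor function. The same set of estimates yields very different I² values depending on how precise the studies are.

```r
# Q-based I² (as a percentage); helper name is hypothetical
i_squared <- function(yi, vi) {
  wi <- 1 / vi
  mu <- sum(wi * yi) / sum(wi)
  Q <- sum(wi * (yi - mu)^2)
  df <- length(yi) - 1
  if (Q <= df) return(0)        # truncate at zero when Q does not exceed df
  (Q - df) / Q * 100
}

# Same hypothetical estimates, two different levels of study precision
yi <- c(-0.60, -0.10, -0.45, 0.20, -0.80)
i2_small_studies <- i_squared(yi, vi = rep(0.20, 5))  # imprecise studies
i2_large_studies <- i_squared(yi, vi = rep(0.02, 5))  # precise studies

i2_small_studies  # 0
i2_large_studies  # 87.5
```

The spread of the estimates is identical in both cases; only the within-study variance changes. This is why I² should be read alongside tau² and the prediction interval.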
Socratic Questions:
- "If I² is 75%, what does that tell us about the studies?"
- "Can we still do a meta-analysis with high I²?"
- "What might cause studies to have different true effects?"
3. Tau² (Tau-squared)
What it is: Estimated variance of true effects across studies.
Interpretation:
- Tau² = 0 → No heterogeneity (all studies estimate same effect)
- Larger tau² → Greater spread of true effects
- Tau (the square root of tau²) is on the same scale as the effect size
Advantage: Absolute measure, unlike I² which is relative.
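As a rough illustration of what tau² estimates, the sketch below implements the DerSimonian-Laird moment estimator in base R. Note that this is a simpler, swapped-in estimator chosen because it has a closed form; the metafor examples in this skill use REML, which will generally give a somewhat different value. The helper name `tau2_dl` and the data are hypothetical.

```r
# DerSimonian-Laird estimate of tau² (helper name is hypothetical)
tau2_dl <- function(yi, vi) {
  wi <- 1 / vi
  mu <- sum(wi * yi) / sum(wi)
  Q <- sum(wi * (yi - mu)^2)
  df <- length(yi) - 1
  C <- sum(wi) - sum(wi^2) / sum(wi)   # scaling constant
  max(0, (Q - df) / C)                 # truncate negative estimates at zero
}

# Hypothetical log odds ratios with equal sampling variances
yi <- c(-0.60, -0.10, -0.45, 0.20, -0.80)
vi <- rep(0.02, 5)

tau2 <- tau2_dl(yi, vi)
tau <- sqrt(tau2)   # on the same scale as yi (here: log odds ratio)

tau2  # 0.14
```

With equal sampling variances the estimate reduces to the sample variance of the estimates (0.16) minus the within-study variance (0.02), which makes the "variance of true effects" reading easy to see.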
4. Prediction Interval
What it is: Range where we expect the true effect of a NEW study to fall.
Why it matters:
- Wider than the confidence interval whenever tau² > 0
- Shows practical implications of heterogeneity
- Critical for clinical decision-making
Example:
Pooled effect: OR = 0.70, 95% CI [0.55, 0.89]
Prediction interval: [0.35, 1.40]
Interpretation: While the average effect favors treatment,
the true effect in a new study could range from strongly
beneficial (0.35) to harmful (1.40).
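A minimal sketch of how such an interval can be computed for odds ratios: work on the log scale, then back-transform. The inputs below (standard error, tau², number of studies) are hypothetical and are not meant to reproduce the numbers in the example above. The helper `prediction_interval` is an assumed name, and it uses the t-distribution with k - 2 degrees of freedom.

```r
# Prediction interval on the analysis scale (helper name is hypothetical)
prediction_interval <- function(mu, se, tau2, k, level = 0.95) {
  stopifnot(k >= 3)                                # needs k - 2 > 0 degrees of freedom
  crit <- qt(1 - (1 - level) / 2, df = k - 2)
  half <- crit * sqrt(tau2 + se^2)                 # combines heterogeneity and estimation error
  c(lower = mu - half, upper = mu + half)
}

# Hypothetical inputs on the log odds ratio scale
mu <- log(0.70)
se <- 0.12
tau2 <- 0.09
k <- 12

pi_log <- prediction_interval(mu, se, tau2, k)
ci_log <- mu + c(-1, 1) * qnorm(0.975) * se        # Wald confidence interval

round(exp(ci_log), 2)   # confidence interval as odds ratios
round(exp(pi_log), 2)   # prediction interval as odds ratios
```

The confidence interval reflects only uncertainty about the average effect, while the prediction interval also includes tau², which is why it is wider.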
R Code for Heterogeneity Assessment
Basic Heterogeneity Statistics
library(metafor)
# Fit random-effects model
# (dat is assumed to hold effect sizes in yi and standard errors in sei)
res <- rma(yi = yi, sei = sei, data = dat, method = "REML")
# View heterogeneity statistics
print(res)
# Look for: tau², I², H², Q, p-value
# Extract specific values
res$tau2 # tau-squared
res$I2 # I-squared (as a percentage, 0-100)
res$QE # Q statistic
res$QEp # p-value for Q test
Confidence Intervals for I²
# Get confidence intervals for tau² and I²
confint(res)
# Output includes (illustrative values):
#         estimate   ci.lb   ci.ub
# tau^2     0.0234  0.0012  0.1456
# I^2(%)   62.4000 12.3000 89.2000
Prediction Interval
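# Calculate prediction interval (see pi.lb and pi.ub in the output)
predict(res)
# Or manually, using a t-distribution with k - 2 degrees of freedom.
# Note: with the default test = "z", predict(res) uses a normal quantile,
# so the manual bounds below can be slightly wider.
mu <- coef(res) # pooled estimate as a plain numeric value
pi_half <- qt(0.975, df = res$k - 2) * sqrt(res$tau2 + res$se^2)
pi_lower <- mu - pi_half
pi_upper <- mu + pi_half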
Visualizing Heterogeneity
# Forest plot with prediction interval
forest(res,
       slab = dat$study,
       addpred = TRUE, # Adds prediction interval
       header = TRUE)
# Baujat plot (shows which studies contribute most to heterogeneity and to the pooled result)
baujat(res)
# GOSH plot (sensitivity to study inclusion; can be slow with many studies)
gosh_res <- gosh(res)
plot(gosh_res)
Teaching Framework
Step 1: Report the Statistics
"Let's look at your heterogeneity results:
- Q = 24.5, p = 0.003 (significant)
- I² = 67% [42%, 82%]
- Tau² = 0.08"
Step 2: Interpret in Context
"This suggests substantial heterogeneity. About 67% of the variation we see is due to real differences between studies, not just chance."
Step 3: Discuss Implications
"With this level of heterogeneity, we should:
- Still report the pooled effect, but with caution
- Explore sources of heterogeneity
- Consider subgroup or meta-regression analysis
- Report the prediction interval"
Step 4: Investigate Sources
"Let's think about what might cause these differences:
- Different populations (age, severity)?
- Different interventions (dose, duration)?
- Different outcome measures?
- Different study designs?"
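One simple way to check whether a candidate source explains the differences is to split Q into within-subgroup and between-subgroup parts. The base R sketch below does this under a common-effect model with hypothetical data and a hypothetical `group` variable (dose level); it is an illustration of the idea rather than a full analysis. In practice a mixed-effects model in metafor, e.g. `rma(yi, vi, mods = ~ group, data = dat)`, is the more usual route.

```r
# Cochran's Q for a set of studies (helper name is hypothetical)
q_stat <- function(yi, vi) {
  wi <- 1 / vi
  mu <- sum(wi * yi) / sum(wi)
  sum(wi * (yi - mu)^2)
}

# Hypothetical effect sizes, variances, and a candidate moderator
yi <- c(-0.70, -0.55, -0.62, -0.10, 0.05, -0.15)
vi <- c(0.04, 0.05, 0.03, 0.05, 0.04, 0.06)
group <- c("high dose", "high dose", "high dose",
           "low dose", "low dose", "low dose")

Q_total <- q_stat(yi, vi)
# Sum of the Q statistics computed separately inside each subgroup
Q_within <- sum(tapply(seq_along(yi), group,
                       function(idx) q_stat(yi[idx], vi[idx])))
Q_between <- Q_total - Q_within            # heterogeneity explained by the grouping
df_between <- length(unique(group)) - 1
p_between <- pchisq(Q_between, df = df_between, lower.tail = FALSE)

cat(sprintf("Q_total = %.2f, Q_within = %.2f, Q_between = %.2f (p = %.4f)\n",
            Q_total, Q_within, Q_between, p_between))
```

A large Q_between relative to its degrees of freedom suggests the subgroups differ in their pooled effects. Subgroup findings like this are observational and are best treated as hypothesis-generating.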
Decision Framework
I² Assessment
│
├── I² < 40%
│   └── Heterogeneity likely unimportant
│       → Proceed with pooled estimate
│
├── I² 40-75%
│   └── Moderate to substantial heterogeneity
│       → Report pooled estimate
│       → Explore sources (subgroups)
│       → Report prediction interval
│
└── I² > 75%
    └── Considerable heterogeneity
        → Question if pooling is meaningful
        → Mandatory exploration of sources
        → Consider narrative synthesis
        → Always report prediction interval
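The decision tree above can be sketched as a small base R helper that maps an I² value to the suggested actions. The function name `heterogeneity_actions` is hypothetical, and the 40% and 75% cut-offs are the rule-of-thumb thresholds from the tree, not hard limits.

```r
# Map an I² percentage to the actions in the decision tree
# (function name is hypothetical; cut-offs are rules of thumb)
heterogeneity_actions <- function(i2) {
  stopifnot(is.numeric(i2), length(i2) == 1, i2 >= 0, i2 <= 100)
  if (i2 < 40) {
    c("Proceed with pooled estimate")
  } else if (i2 <= 75) {
    c("Report pooled estimate",
      "Explore sources (subgroups)",
      "Report prediction interval")
  } else {
    c("Question if pooling is meaningful",
      "Mandatory exploration of sources",
      "Consider narrative synthesis",
      "Always report prediction interval")
  }
}

# Example: the I² value used in the teaching framework
actions <- heterogeneity_actions(67)
cat(paste0("- ", actions), sep = "\n")
```

A helper like this is only a starting point for discussion; the confidence interval for I², the size of tau, and the clinical context should all inform the final judgement.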
Common Misconceptions
- "High I² means we can't do meta-analysis"
  - Reality: High I² means we need to investigate and interpret carefully
  - Pooling may still be appropriate with proper caveats
- "Non-significant Q means no heterogeneity"
  - Reality: Q test has low power with few studies
  - Always report I² and tau² alongside Q
- "I² tells us about clinical importance"
  - Reality: I² is statistical, not clinical
  - A small I² can hide clinically important variation
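The low power of the Q test with few studies can be shown with a short calculation. The sketch below assumes a simplified setting in which every study has the same sampling variance `v` and true effects are normally distributed with variance `tau2`; under those assumptions Q divided by (1 + tau2 / v) follows a chi-square distribution with k - 1 degrees of freedom, so power has a closed form. The function name and input values are hypothetical.

```r
# Power of the Q test under equal sampling variances (simplifying assumption)
q_test_power <- function(k, tau2, v, alpha = 0.10) {
  crit <- qchisq(1 - alpha, df = k - 1)     # rejection threshold for Q
  # Under heterogeneity, Q is a scaled chi-square: (1 + tau2 / v) * chisq(k - 1)
  pchisq(crit / (1 + tau2 / v), df = k - 1, lower.tail = FALSE)
}

k_values <- c(5, 10, 20, 40)
power_by_k <- sapply(k_values, q_test_power, tau2 = 0.05, v = 0.10)

data.frame(k = k_values, power = round(power_by_k, 2))
```

With the same amount of true heterogeneity, power rises steadily with the number of studies, so a non-significant Q from a handful of studies says little about whether heterogeneity is present.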
Assessment Questions
- Basic: "What does I² = 50% mean?"
  - Correct: About half the observed variation is due to true differences between studies
- Intermediate: "Q test is non-significant but I² = 45%. How do you interpret this?"
  - Correct: Q test may be underpowered; moderate heterogeneity may still exist
- Advanced: "Pooled OR = 0.6 [0.4, 0.9] but prediction interval is [0.3, 1.2]. What's the clinical implication?"
  - Correct: While average effect is beneficial, a new setting might see no effect or even harm
Related Skills
- meta-analysis-fundamentals - Understanding pooled effects
- forest-plot-creation - Visualizing heterogeneity
- publication-bias-detection - Another source of concern
Adaptation Guidelines
Glass (the teaching agent) MUST adapt this content to the learner:
- Language Detection: Detect the user's language from their messages and respond naturally in that language
- Cultural Context: Adapt examples to local healthcare systems and research contexts when relevant
- Technical Terms: Maintain standard English terms (e.g., "forest plot", "effect size", "I²") but explain them in the user's language
- Level Adaptation: Adjust complexity based on user's demonstrated knowledge level
- Socratic Method: Ask guiding questions in the detected language to promote deep understanding
- Local Examples: When possible, reference studies or guidelines familiar to the user's region
Example Adaptations:
- 🇧🇷 Portuguese: Use Brazilian health system examples (SUS, ANVISA guidelines)
- 🇪🇸 Spanish: Reference PAHO/OPS guidelines for Latin America
- 🇨🇳 Chinese: Include examples from Chinese medical literature
