R Econometrics
by meleantonio
Run IV, DiD, and RDD analyses in R with proper diagnostics
name: r-econometrics
description: Run IV, DiD, and RDD analyses in R with proper diagnostics
workflow_stage: analysis
compatibility:
- claude-code
- cursor
- codex
- gemini-cli
author: Awesome Econ AI Community
version: 1.0.0
tags:
- R
- econometrics
- causal-inference
- fixest
- regression
R Econometrics
Purpose
This skill helps economists run rigorous econometric analyses in R, including Instrumental Variables (IV), Difference-in-Differences (DiD), and Regression Discontinuity Design (RDD). It generates publication-ready code with proper diagnostics and robust standard errors.
When to Use
- Running causal inference analyses
- Estimating treatment effects with panel data
- Creating publication-ready regression tables
- Implementing modern econometric methods (two-way fixed effects, event studies)
Instructions
Step 1: Understand the Research Design
Before generating code, ask the user:
- What is your identification strategy? (IV, DiD, RDD, or simple regression)
- What is the unit of observation? (individual, firm, country-year, etc.)
- What fixed effects do you need? (entity, time, two-way)
- How should standard errors be clustered?
Step 2: Generate Analysis Code
Based on the research design, generate R code that:
- Uses the fixest package, which is modern, fast, and feature-rich for panel data
- Includes proper diagnostics:
  - For IV: First-stage F-statistics, weak instrument tests
  - For DiD: Parallel trends visualization, event study plots
  - For RDD: Bandwidth selection, density tests
- Uses robust/clustered standard errors appropriate for the data structure
- Creates publication-ready output using modelsummary or etable
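For the IV diagnostics listed above, a minimal sketch with fixest might look like the following. The data are simulated and every variable name (y, x, z, w, u) is hypothetical; the true effect of x on y is set to 1.5 so the estimate can be sanity-checked.

```r
library(fixest)

# Simulated data (hypothetical): u is an unobserved confounder,
# z is a valid instrument, w is an exogenous control.
set.seed(42)
n <- 5000
u <- rnorm(n)
z <- rnorm(n)
w <- rnorm(n)
x <- 0.8 * z + 0.5 * w + u + rnorm(n)      # endogenous regressor
y <- 1.5 * x + 0.3 * w + 2 * u + rnorm(n)  # true effect of x is 1.5
df_iv <- data.frame(y, x, z, w)

# 2SLS syntax: outcome ~ exogenous controls | endogenous ~ instruments
iv_model <- feols(y ~ w | x ~ z, data = df_iv, vcov = "hetero")

# First-stage F-test, first-stage Wald test, and Wu-Hausman endogeneity test
fitstat(iv_model, ~ ivf1 + ivwald1 + wh)

# Inspect the first-stage regression directly
summary(iv_model, stage = 1)

# Second-stage coefficient on the instrumented regressor
coef(iv_model)["fit_x"]
```

With fixed effects, the same formula takes a middle part, for example y ~ w | unit + year | x ~ z, where unit and year are placeholder names.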
Step 3: Structure the Output
Always include:
# 1. Setup and packages
# 2. Data loading and preparation
# 3. Descriptive statistics
# 4. Main specification
# 5. Robustness checks
# 6. Visualization
# 7. Export results
Step 4: Add Documentation
Include comments explaining:
- Why each specification choice was made
- Interpretation of key coefficients
- Limitations and assumptions
Example Prompts
- "Run a DiD analysis with state and year fixed effects, clustering at the state level"
- "Estimate the effect of X on Y using Z as an instrument"
- "Create an event study plot showing treatment effects by year"
- "Run a sharp RDD with optimal bandwidth selection"
Example Output
# ============================================
# Difference-in-Differences Analysis
# ============================================
# Setup
library(tidyverse)
library(fixest)
library(modelsummary)
# Load data
df <- read_csv("data.csv")
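# Prepare treatment variables
# treatment_year and treatment_states must already be defined
# (e.g., a common adoption year and a vector of treated state names)
df <- df %>%
  mutate(
    post = as.integer(year >= treatment_year),          # 0/1 indicator
    treated = as.integer(state %in% treatment_states),  # 0/1 indicator
    treat_post = treated * post
  )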
# ----------------------------------------
# Main DiD Specification
# ----------------------------------------
# Two-way fixed effects
did_model <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state
)
# View results
summary(did_model)
# ----------------------------------------
# Event Study
# ----------------------------------------
# Create relative time variable
df <- df %>%
  mutate(rel_time = year - treatment_year)
# Event study regression
event_study <- feols(
  outcome ~ i(rel_time, treated, ref = -1) | state + year,
  data = df,
  cluster = ~state
)
# Plot coefficients
iplot(event_study,
      main = "Event Study: Effect on Outcome",
      xlab = "Years Relative to Treatment")
# ----------------------------------------
# Robustness: Alternative Specifications
# ----------------------------------------
# Different clustering
did_robust <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state + year  # Two-way clustering
)
# ----------------------------------------
# Export Results
# ----------------------------------------
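# Create the output folder if needed, then write the LaTeX table
dir.create("results", showWarnings = FALSE)
modelsummary(
  list("Main" = did_model, "Two-way Cluster" = did_robust),
  stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),
  output = "results/did_table.tex"
)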
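The example above covers DiD only. For a sharp RDD with data-driven bandwidth selection and a density test, one possible sketch uses the rdrobust and rddensity packages, which are not in the Requirements list below and are assumed to be installed separately. The data are simulated, the variable names are hypothetical, and the true jump at the cutoff is set to 2.

```r
library(rdrobust)
library(rddensity)

# Simulated sharp RDD (hypothetical data): treatment switches on at running >= 0
set.seed(2024)
n <- 2000
running <- runif(n, -1, 1)
treat <- as.integer(running >= 0)
y <- 1 + 0.5 * running + 2 * treat + rnorm(n, sd = 0.5)

# Local-linear estimate with the package's default MSE-optimal bandwidth
rd_fit <- rdrobust(y = y, x = running, c = 0)
summary(rd_fit)

# Bandwidths selected on each side of the cutoff
rd_fit$bws

# Manipulation (density) test of the running variable at the cutoff
density_test <- rddensity(X = running, c = 0)
summary(density_test)
```

Reporting the estimate under a few alternative bandwidths (via the h argument of rdrobust) is a common robustness check.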
Requirements
Software
- R 4.0+
Packages
- fixest - Fast fixed effects estimation
- modelsummary - Publication-ready tables
- tidyverse - Data manipulation
- ggplot2 - Visualization
Install with:
install.packages(c("fixest", "modelsummary", "tidyverse"))
Best Practices
- Always cluster standard errors at the level of treatment assignment
- Run pre-trend tests for DiD designs
- Report first-stage F-statistics for IV (the conventional rule of thumb is F > 10; treat it as a minimum, not as proof that the instruments are strong)
- Use feols over lm for panel data (faster and more features)
- Document all specification choices in your code comments
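As one way to run the pre-trend test recommended above, the pre-treatment event-study coefficients can be tested jointly with fixest's wald(). This is a sketch on simulated data: the panel, the adoption year 2005, and all variable names are hypothetical, and the data are built with no pre-trends.

```r
library(fixest)

# Simulated panel (hypothetical): 40 states, 2000-2010, states 1-20 treated from 2005
set.seed(123)
panel <- expand.grid(state = 1:40, year = 2000:2010)
panel$treated  <- as.integer(panel$state <= 20)
panel$rel_time <- panel$year - 2005
state_fe <- rnorm(40)[panel$state]
year_fe  <- rnorm(11)[panel$year - 1999]
panel$outcome <- state_fe + year_fe +
  2 * panel$treated * (panel$rel_time >= 0) + rnorm(nrow(panel))

# Event study with the period before treatment as the reference
es <- feols(
  outcome ~ i(rel_time, treated, ref = -1) | state + year,
  data = panel,
  cluster = ~state
)

# Joint Wald test that all pre-treatment coefficients are zero;
# the regular expression keeps only coefficients with negative relative time
pre_test <- wald(es, keep = "rel_time::-")
pre_test
```

A large p-value is consistent with parallel pre-trends but does not prove them, so the event-study plot is still worth inspecting.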
Common Pitfalls
- ❌ Not clustering standard errors at the right level
- ❌ Ignoring weak instruments in IV estimation
- ❌ Using TWFE with staggered treatment timing (use the did package or fixest's sunab() instead)
- ❌ Not reporting robustness checks
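For the staggered-timing pitfall, a short sketch of the sunab() alternative follows. It uses base_stagg, the simulated staggered-adoption dataset that ships with fixest, so the column names (y, x1, id, year, year_treated) come from that dataset and not from your own data.

```r
library(fixest)

# Example staggered-adoption data bundled with fixest
data(base_stagg)

# Sun and Abraham interaction-weighted estimator:
# sunab(cohort, period) replaces the usual relative-time dummies
res_sunab <- feols(
  y ~ x1 + sunab(year_treated, year) | id + year,
  data = base_stagg,
  cluster = ~id
)

# Event-study plot of the aggregated relative-period effects
iplot(res_sunab)

# Aggregate to a single average treatment effect on the treated
att <- summary(res_sunab, agg = "att")
att
```

In your own data, the cohort variable should hold each unit's first treatment period, with never-treated units given a value outside the sample range.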
References
- fixest documentation
- Cunningham (2021) Causal Inference: The Mixtape
- Angrist & Pischke (2009) Mostly Harmless Econometrics
Changelog
v1.0.0
- Initial release with IV, DiD, RDD support
