Introduction to Causal Econometrics with Observational Data

Author

Xiang Ao, HBS

Published

July 4, 2026

Preface

Most applied questions are causal. Did a program raise earnings? Did a drug reduce mortality? Did a policy change behavior? With experimental data, a difference in means can sometimes answer the question. With observational data, it almost never can. Causal econometrics is the business of saying what would have to be true for a number we can compute to mean the causal object we want.

Prerequisites and target audience

This guide assumes graduate-level (or advanced undergraduate) familiarity with probability and statistics, including expectations, conditional expectations, and basic asymptotic theory (consistency, asymptotic normality). It also assumes regression analysis at the level of an introductory econometrics course (OLS, instrumental variables, panel data fixed effects). It does not assume measure-theoretic probability. Familiarity with directed acyclic graphs (DAGs) is helpful but not required; the graph-theoretic chapters build up the needed concepts from scratch.

On the computational side, the code examples are in R and assume basic fluency with the language, including the tidyverse (dplyr, tidyr, ggplot2) for data manipulation and plotting. See the R Packages Used in This Book appendix for the full list of packages and an installation script.

This book is a working guide in R. It is for applied researchers who already know regression and probability, and who want to see the methods carried out end to end. Each chapter gives the identifying assumptions and then runs code on a real or simulated data set. The book does not try to teach every package. It shows when a package is useful and how the econometric object maps to the code.

The book is organized in the order in which an applied project usually runs. Part I covers identification. Part II covers estimation. Part III covers research designs: difference-in-differences, instrumental variables, regression discontinuity, and shift-share IV. The remaining parts cover longitudinal settings, survival outcomes, mediation, and causal discovery. The appendix lists the R packages used in the examples.

A companion volume, Causal Econometrics with Julia, covers the same ground in Julia. Chapters are cross-linked where the two books diverge, usually because a particular estimator is easier or faster in one language than in the other. A messier working notebook, Topics on Econometrics and Causal Inference, holds the rougher posts that fed into both books. Source files and datasets are available in the book's source repository, and its README lists the required R packages (package versions are not pinned with a lockfile). The "Edit this page" link at the foot of each chapter goes directly to the corresponding .qmd file. Corrections and suggestions are welcome through the issues tracker.

How to cite

Ao, Xiang. Introduction to Causal Econometrics with Observational Data. https://xiangao.github.io/causal_econometrics_guide/.

@book{ao_causal_econometrics_guide,
  author = {Ao, Xiang},
  title  = {Introduction to Causal Econometrics with Observational Data},
  url    = {https://xiangao.github.io/causal_econometrics_guide/}
}