---
title: "Treatment effects and matching"
date: "2019-01-10"
---
## Treatment effects in observational studies
Despite the popularity of randomized experiements in economics nowadays, most situations we have observational data in economic studies. One reason is experiemnts are expensive; the other reason is that sometimes it is simply not feasible to have experiments. If we have observational data, and we'd like to draw causal conclusions, then we have a few different situations. The worse situation is that we have an endogenous treatement. Or to say, we have unobserved confounders. In that case, we need instrumental variables. However instruments are hard to find, and even harder to justify. If we assume we don't have unobserved counfounders. Or to say, we have conditional independence of the treatment variable. That is, conditional on other variables in the model, there is no unobserved confounders. If that assumption holds, then we can make causal inference.
Stata has a set of eteffects and teffects commands are designed for treatment effects. eteffects is for treatment effects when you have endogenous treatment. In that case, you'll need instruments to model treatment at first stage. Then eteffects use control function approach for the second stage of modeling treatment effects on outcome. In this blog, we are trying to understand how teffects works. That is, when we don't need an instruments, or say we assume no unobserved confounders.
### Regression adjustement (RA)
The RA method is to allow all coveriates' effects differ between treatment and control. That is, it is the same model as an outcome model with treatment interacting with all other coveriates.
#### Example:
```{r}
#| label: stata-chunk1
#| engine: 'stata'
#| engine.path: '/usr/local/bin/stata'
#| cache: true
clear
webuse bweightex
teffects ra (bweight prenatal1 mmarried mage fbaby) (mbsmoke)
reg bweight i.mbsmoke##c.(prenatal1 mmarried mage fbaby)
margins r.mbsmoke
```
We can see the teffects return the same ATE(Average Treatment Effect) as the margins command after a regression with treatment interacting with all other covariates.
To estimate ATET (or ATT, Average Treatment effected on the Treated),
```{r}
#| label: stata-chunk2
#| engine: 'stata'
#| engine.path: '/usr/local/bin/stata'
#| cache: true
clear
webuse bweightex
teffects ra (bweight prenatal1 mmarried mage fbaby) (mbsmoke), atet
reg bweight i.mbsmoke##c.(prenatal1 mmarried mage fbaby)
margins r.mbsmoke, subpop(mbsmoke)
```
It is the comparison of treatment's effect on the potential outcomes, for the treated. That's why in margins, we have subpop(mbsmoke) option.
### Inverse Probability Weighting (IPW)
IPW estimators have two steps. The first step is to estimate the treatment model, that is, treatment as a function of some covariates. Usually a logit model is used. Then the probability of treatment is estimated. In the second stage, the inverse probability is used as weights to compute the outcome difference between treatment versus control units.
These steps produce consistent estimates of the effect parameters
because the treatment is assumed to be independent of the potential
outcomes after conditioning on the covariates.
We can manually conduct the two steps, but the nice thing about using Stata's teffects is that it takes account of the noise of estimating probability in the first step when calculating standard errors in the second step.
#### example
```{r}
#| label: stata-chunk3
#| engine: 'stata'
#| engine.path: '/usr/local/bin/stata'
#| cache: true
clear
webuse cattaneo2
teffects ipw (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu, probit)
probit mbsmoke mmarried c.mage##c.mage fbaby medu
predict ps
replace ps = 1/ps if mbsmoke==1
replace ps = 1/(1-ps) if mbsmoke==0
reg bweight mbsmoke [pweight=ps]
```
In the above example, we run teffects ipw and then manually replicate the two steps. The only thing different is the standard error for the treatment effect.
### Doubly robust estimators
RA estimator estimates the outcome directly, IPW esitmates the treatment assignment. There is also a group of estimators called doubly-robust estimators. They estimate both stages, and only require one of these stages to be correctly specified.
Stata implemented the augmented-IPW (AIPW) combination proposed by Robins and Rotnitzky (1995) and the IPW-regression-adjust ment (IPWRA) combination proposed by Wooldridge (2010).
The AIPW estimator augments the IPW estimator with a correction term. The term removes the bias if the treatment model is wrong and the outcome model is correct, and the term goes to 0 if the treatment model is correct and the outcome model is wrong.
The IPWRA estimator uses IPW probability weights when performing RA. The weights do not affect the accuracy of the RA estimator if the treatment model is wrong and the outcome model is correct. The weights correct the RA estimator if the treatment model is correct and the outcome model is wrong.
I have not figured out how to do these two commands manually. So we'll just run two examples of how they are used.
```{r}
#| label: stata-chunk4
#| engine: 'stata'
#| engine.path: '/usr/local/bin/stata'
#| cache: true
clear
webuse cattaneo2
teffects ipwra (bweight mmarried mage prenatal1 fbaby) (mbsmoke mmarried mage prenatal1 fbaby)
teffects aipw (bweight mmarried mage prenatal1 fbaby) (mbsmoke mmarried mage prenatal1 fbaby)
```
## Matching
First let's emphasize that matching usually does not deal with endogenous treatment problem. Some people have the wrong impression that matching resolves the endogeneity problem, but in fact it only helps for selection on observables. If you have selection on unobservables, or unobserved confounders, matching does not help.
Matching aims to blance the distribution of covariates in the treatment and control groups. That's all it does, to make comparisons between apples, not apples to oranges. In that sense, regression does that too. We use regression to adjust for differences between treatment and control. However, regression sometimes rely on extrapolation too much, when data do not overlap between treatment and control. It's not that matching can do magic on non-overlapping data, but it can make it clear that how bad the non-overlapping problem is. Simply running regression blindly will not have the researchers realize the non-overlapping problem. Combining matching with regression is probably a better idea. That is, running regression on matched sample is recommended.
Here we introduced a few popular matching matheds implemented in Stata, namely nnmatch, psmatch, and cem.
### Propensity score matching
The ideal situation of matching is exact matching; that is, treatment and control units can match to each other with exactly the same value of all covariates. This is often not possible, as the number of covariates increase and if they are not discrete. In high dimention (many covariates to match), it is very hard or even impossible to find exact matches. Therefore, how to reduce from high dimension to low dimenstion is key. Rosenbaum and Rubin 1983 proved that propensity score provides the one-dimension representation of high-dimension of covariates. Therefore, we only need to find matches based on propensity scores.
In Stata, teffects psmatch can do estimation after matching on propensity scores. Stata's psmatch2 command has been popular for propensity score matching too. The nice thing of these commands is that it does two steps in one command: first it estimate the logit or probit model for propensity score, then match the treatment and control groups, then estimate the outcome equation on matched sample. The standard errors are correct based on all these steps. If we do this manually, standard errors will not be correct.
#### example
```{r}
#| label: stata-chunk5
#| engine: 'stata'
#| engine.path: '/usr/local/bin/stata'
#| cache: true
clear
webuse cattaneo2
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), atet
psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, logit ties
reg bweight mbsmoke [aweight=_weight]
```
In the above example, suppose we are interesting in mbsmoke's effect on bweight. If we match smoking mothers with non-smoking mothers, by age, first babe, and education, the the teffects psmatch gives us the treatment effect on the treated. We can do this in two steps. First, by psmatch2 we generate _weight, which is a number for how many times a control unit is used to match to a treatment unit. Notice that to match teffects result, we use logit, and ties option. By default, teffects psmatch includes all ties (control units that have the same propensity scores that are close enough), but psmatch2 by default only include one. Then in the second step, we run a regression with _weight as the weight. Notice the standard errors will differ. We should use the standard errors reported in teffects.
### Nearest neighbor matching
Stata has the nnmatch option for teffects. nnmatch by default uses Mahalanobis distance, it can also specify to have exact matching. The advantage is this is nonparametric.
```{r}
#| label: stata-chunk6
#| engine: 'stata'
#| engine.path: '/usr/local/bin/stata'
#| cache: true
clear
webuse cattaneo2
teffects nnmatch (bweight mmarried mage fbaby medu) (mbsmoke) , atet
teffects psmatch (bweight) (mbsmoke mmarried mage fbaby medu), atet
```
Notice the specification is different in these two models.
### Coarsened Exact Matching (CEM)
CEM is not part of teffects. But Stata does have cem package implemented.
```{r}
#| label: stata-chunk7
#| engine: 'stata'
#| engine.path: '/usr/local/bin/stata'
#| cache: true
clear
webuse cattaneo2
cem mmarried mage fbaby medu, treatment(mbsmoke)
reg bweight mbsmoke [aweight=cem_weights]
```
In the above example, we use CEM on the same covariates, then apply weights in the final regression. The disadvantage is that we have not accounted for the noise in calculating those weights.
## Modern matching in R: balance diagnostics and entropy weighting
The 2019 vintage of this chapter used Stata's `teffects` throughout. R's
`MatchIt` and `WeightIt` packages have since become the standard for matching
workflows, and `cobalt` provides publication-quality balance tables and plots
that are now expected in applied papers.
### Balance checking with cobalt
After any matching or weighting step, always inspect covariate balance.
`cobalt`'s `bal.tab()` and `love.plot()` are the go-to tools:
```{r}
#| eval: false
#| echo: true
library(MatchIt)
library(cobalt)
# Propensity score matching on Cattaneo birth-weight data
data("lalonde", package = "MatchIt")
m_out <- matchit(treat ~ age + educ + black + hispan + married + nodegree +
re74 + re75,
data = lalonde, method = "nearest", distance = "glm",
ratio = 1)
# Balance table: standardized mean differences before and after matching
bal.tab(m_out, stats = c("mean.diffs", "variance.ratios"), thresholds = c(m = 0.1))
# Love plot: visualize balance improvement
love.plot(m_out, stats = "mean.diffs", threshold = 0.1,
abs = TRUE, var.order = "unadjusted")
```
The love plot shows each covariate's standardized mean difference (SMD)
before matching (open circles) and after (filled circles). The vertical line
at SMD = 0.1 is the conventional threshold for "balanced." If post-matching
points cross the line, the matching is not improving balance and a different
specification is needed.
### Entropy balancing
Propensity score matching discards unmatched units and can reduce effective
sample size substantially. **Entropy balancing** (Hainmueller 2012) keeps all
units and reweights the control group so that weighted covariate moments
exactly match the treatment group — without discarding anyone.
```{r}
#| eval: false
#| echo: true
library(WeightIt)
library(cobalt)
# Entropy balancing: exact balance on means, variances, and covariances
w_out <- weightit(treat ~ age + educ + black + hispan + married + nodegree +
re74 + re75,
data = lalonde, method = "ebal", estimand = "ATT",
moments = 1) # moments=1: balance means only
# moments=2: balance means + variances
summary(w_out)
# Check balance
bal.tab(w_out, stats = "mean.diffs", thresholds = c(m = 0.1))
love.plot(w_out, threshold = 0.1, abs = TRUE)
# Weighted outcome regression
library(lmtest); library(sandwich)
m_eb <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree +
re74 + re75,
data = lalonde, weights = w_out$weights)
coeftest(m_eb, vcov = vcovHC(m_eb, type = "HC3"))["treat", ]
```
| Method | Pros | Cons |
|---|---|---|
| PSM (nearest neighbor) | Intuitive; respects overlap | Discards units; balance not guaranteed |
| CEM | Exact balance in bins | Requires discretizing continuous vars |
| Entropy balancing | Exact mean balance; keeps all units | Weights can be extreme; no variance balance by default |
| IPW (logistic) | Simple; doubly-robust with outcome model | Extreme weights if overlap is poor |
For most applications, entropy balancing or IPW with trimmed weights (via
`WeightIt`'s `trim()`) is preferred over PSM: better effective sample size,
guaranteed balance, and easy integration with doubly-robust estimators.
See also the [weighting chapter](weighting-part2.qmd) for IPW estimators and
the [OLS-ATE chapter](ols-ate.qmd) for regression-based approaches.