16 IV & Regression Discontinuity

using CairoMakie
using GraphMakie, Graphs
using Distributions, Random, LinearAlgebra
using RDRobust
using Panelest
using Vcov
using DataFrames
using StatsModels, DataFramesMeta
import StatsAPI: coeftable
using Printf
using RegressionTables

If there is an unobserved confounder, unconfoundedness fails. Even in a randomized experiment, there can be noncompliance: subjects are randomly assigned to treatment or control, but some of them do not comply with their assignment. The group that actually takes the treatment is then self-selected, and this selection can be correlated with the outcome, which creates selection bias.

16.1 Instrumental Variables

16.1.1 Uncontrolled Confounder

16.1.2 Why Does IV Work?

16.1.3 IV Under Potential Outcomes

W has two potential outcomes W(1) and W(0), as a function of Z.
Y has two potential outcomes Y(1) and Y(0), as a function of W.

Intention to Treat (ITT) is the average effect of the treatment assignment \(Z\) on the outcome, regardless of whether the subject complies with the assignment. It is the average outcome among those assigned to treatment minus the average outcome among those assigned to control, \(E[Y \mid Z=1] - E[Y \mid Z=0]\).

There are four possible groups of subjects in the case of binary treatment and binary instrument. The compliance type is defined by the latent pair of potential treatments \((W(0), W(1))\), not by an observed \((Z, W)\) cell:

complier: \(W(0)=0, W(1)=1\)
always-taker: \(W(0)=1, W(1)=1\)
never-taker: \(W(0)=0, W(1)=0\)
defier: \(W(0)=1, W(1)=0\)

A given observed \((Z, W)\) cell does not pin down the type (see the table below).

compliance groups
	W(1)=0	W(1)=1
W(0)=0	N	C
W(0)=1	D	A

These four groups (strata) respond differently to the assignment. Compliers are the only group whose treatment moves in the same direction as the assignment Z. Defiers are the only group whose treatment moves in the opposite direction. Always-takers and never-takers take the same treatment whatever Z is.

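Unfortunately, we cannot identify an individual's compliance group from the data, because we observe only one of \(W(0)\) and \(W(1)\) for each subject. Each observed \((Z, W)\) cell is a mixture of two groups \(G\):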

Z	W	G
0	0	C, N
0	1	A, D
1	0	N, D
1	1	C, A
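The mapping from latent types to observed cells can be reproduced mechanically. The following is a minimal sketch; the names `TYPES`, `observed_w`, and `cells` are illustrative and not part of the chapter's code.

```julia
# Potential treatments (W(0), W(1)) for each compliance type.
TYPES = Dict(
    :C => (0, 1),  # complier
    :A => (1, 1),  # always-taker
    :N => (0, 0),  # never-taker
    :D => (1, 0),  # defier
)

# Observed treatment for a unit of type `g` that receives instrument value `z`.
observed_w(g, z) = z == 1 ? TYPES[g][2] : TYPES[g][1]

# For each observed (Z, W) cell, collect every type consistent with it.
cells = Dict((z, w) => sort([g for g in keys(TYPES) if observed_w(g, z) == w])
             for z in 0:1, w in 0:1)

for z in 0:1, w in 0:1
    println("Z=$z  W=$w  ->  ", join(cells[(z, w)], ", "))
end
```

Every cell contains exactly two types, which is why the assumptions in the next section are needed before any group-specific effect can be identified.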

16.1.4 Assumptions

Exclusion restriction: writing the potential outcome as a function of both treatment \(w\) and instrument \(z\), \(Y(w,z)\), the instrument has no direct effect on the outcome: \(Y(w, 1) = Y(w,0)\)
Exogeneity of instrument: \(Z \perp [W(0),W(1), Y(0), Y(1)]\)
Monotonicity (no defiers): \(W(0) \leq W(1)\)

16.1.5 LATE Identification

Monotonicity rules out defiers, and under the exclusion restriction the always-takers and never-takers, whose treatment does not respond to Z, contribute nothing to the effect of Z on Y. The only effect we can identify is therefore the treatment effect on the compliers. This is called the Local Average Treatment Effect (LATE).

\[ \small \begin{aligned} & E[Y(Z=1) - Y(Z=0) | G = C] \\ &= \frac{E[Y(Z=1) - Y(Z=0)]}{P(W(1)=1, W(0)=0)} \\ &= \frac{E[Y|Z=1] - E[Y|Z=0]}{1-P(W=0|Z=1)-P(W=1|Z=0)} \text{ (rule out groups A and N) } \\ &= \frac{E[Y|Z=1] - E[Y|Z=0]}{P(W=1|Z=1)-P(W=1|Z=0)} \\ &= \frac{E[Y|Z=1] - E[Y|Z=0]}{E[W|Z=1]-E[W|Z=0]} \end{aligned} \]
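The first equality skips a step. As a sketch that uses only the three assumptions above, decompose the intention-to-treat effect by compliance group:

\[ \begin{aligned} E[Y(Z=1) - Y(Z=0)] &= \sum_{g \in \{C, A, N, D\}} P(G=g)\, E[Y(Z=1) - Y(Z=0) \mid G=g] \\ &= P(G=C)\, E[Y(Z=1) - Y(Z=0) \mid G=C]. \end{aligned} \]

Always-takers and never-takers have \(W(1) = W(0)\), so by the exclusion restriction their outcome does not change with \(Z\) and their terms are zero. Monotonicity sets \(P(G=D) = 0\). Dividing by \(P(G=C) = P(W(1)=1, W(0)=0)\) gives the first line of the derivation, and exogeneity of \(Z\) then lets \(E[Y(Z=z)]\) be replaced by \(E[Y \mid Z=z]\).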

16.1.6 Parametric Models

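With a binary instrument, this ratio is the Wald estimator: the reduced-form OLS coefficient of \(Y\) on \(Z\) divided by the first-stage OLS coefficient of \(W\) on \(Z\). The \(Var(Z)\) terms in the two coefficients cancel, which leaves a ratio of covariances.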

\[ \begin{aligned} \tau_{LATE} &= \frac{E[Y|Z=1] - E[Y|Z=0]}{E[W|Z=1]-E[W|Z=0]} \\ &= \frac{Cov(Y,Z)}{Cov(W,Z)} \end{aligned} \]
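The equality of the two expressions, and the fact that they recover the complier effect rather than an average over everyone, can be checked on simulated data. This is a sketch under assumed values: the compliance shares (20% always-takers, 50% compliers) and effect sizes (2 for compliers, 5 for the rest) are hypothetical.

```julia
using Random, Statistics

Random.seed!(1)
n = 200_000

# Hypothetical compliance shares: 20% always-takers, 50% compliers, 30% never-takers.
u = rand(n)
always   = u .< 0.2
complier = (u .>= 0.2) .& (u .< 0.7)

z = rand(n) .< 0.5                  # randomly assigned binary instrument
w = always .| (complier .& z)       # no defiers, so monotonicity holds

# Effect is 2 for compliers and 5 for everyone else
# (among non-compliers only always-takers are ever treated).
effect = ifelse.(complier, 2.0, 5.0)
y = randn(n) .+ effect .* w         # Z affects Y only through W (exclusion)

zf = Float64.(z)
wf = Float64.(w)

wald_means = (mean(y[z]) - mean(y[.!z])) / (mean(wf[z]) - mean(wf[.!z]))
wald_cov   = cov(y, zf) / cov(wf, zf)

println("Wald (difference in means) = ", round(wald_means, digits = 4))
println("Wald (covariance ratio)    = ", round(wald_cov, digits = 4))
```

The two numbers agree up to floating-point error, and both sit near the complier effect of 2 even though always-takers have a much larger effect.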

Why do covariates help? It is worth separating two distinct assumptions. Conditioning on covariates \(X\) can make the as-if-random (conditional independence) assumption more credible: the instrument may be unconfounded only within levels of \(X\). The exclusion restriction – no direct effect of the instrument on the outcome – is a separate, substantive assumption that covariates do not generally repair. Adding \(X\) removes a direct \(Z \to Y\) path only if that path runs through \(X\), or if exclusion is conditional by design. So covariates help with conditional instrument independence, not with exclusion itself.

16.1.7 Example

Random.seed!(66)
nobs = 10000

# Here we have three variables w, m, z.
# m is the omitted variable, w and m are correlated.
# z is the instrument, which is correlated with w, but not m.
# u is independent of everything else.
covarMat = [
    1.0   0.36  0.64;
    0.36  1.0   0.0;
    0.64  0.0   1.0
]

mvn = MvNormal(zeros(3), covarMat)
mat = rand(mvn, nobs)'

data = DataFrame(w = mat[:, 1], m = mat[:, 2], z = mat[:, 3])
data.u = rand(Normal(0, 1), nobs)

# DGP
data.y = data.w .+ data.m .+ data.u

lm_biased = feols(data, @formula(y ~ w))
lm_full = feols(data, @formula(y ~ w + m))

# Manual 2SLS explicitly (educational)
# Stage 1: Predict endogenous variable w using instrument z
stage1 = feols(data, @formula(w ~ z))
data.w_hat = data.w .- stage1.residuals # Obtain fitted values

# Stage 2: Regress outcome y on predicted w_hat
tsls_manual = feols(data, @formula(y ~ w_hat));

# Biased OLS
println("── Biased OLS (w only) ──────────────────────")
show_regression_html(lm_biased)

# Full OLS is good
println("\n── Full OLS (w + m) ─────────────────────────")
show_regression_html(lm_full)

# Manual 2SLS is good
# Note: the coefficients from manual 2SLS are correct, but standard
# errors are slightly incorrect because they use residuals from w_hat,
# not the true w.
println("\n── Manual 2SLS ──────────────────────────────")
show_regression_html(tsls_manual)

── Biased OLS (w only) ──────────────────────

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	-0.012	0.010	-1.164	0.244
w	1.357	0.010	135.823	< 2e-16 ***
--- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


── Full OLS (w + m) ─────────────────────────

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	-0.009	0.010	-0.915	0.360
w	0.998	0.011	93.176	< 2e-16 ***
m	0.998	0.011	93.439	< 2e-16 ***
--- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


── Manual 2SLS ──────────────────────────────

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	-0.006	0.010	-0.577	0.564
w_hat	1.000	0.016	64.426	< 2e-16 ***
--- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
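The code comment about second-stage standard errors can be made concrete. Below is a minimal sketch in plain linear algebra, assuming homoskedastic errors and a re-simulated DGP with the same correlation structure as the example (it does not reuse `feols` or the chapter's `data`). The naive variance uses residuals built from the fitted ŵ, and the corrected one uses residuals built from the observed w.

```julia
using Random, Statistics, LinearAlgebra

Random.seed!(2)
n = 10_000

# Same correlations as the example: Cor(w,m) = 0.36, Cor(w,z) = 0.64, Cor(m,z) = 0.
z = randn(n)
m = randn(n)
w = 0.64 .* z .+ 0.36 .* m .+ sqrt(1 - 0.64^2 - 0.36^2) .* randn(n)
y = w .+ m .+ randn(n)              # m is omitted from the regressions below

X    = hcat(ones(n), w)             # second-stage regressors with the observed w
Zm   = hcat(ones(n), z)             # instruments
Xhat = Zm * (Zm \ X)                # first-stage fitted values
β    = Xhat \ y                     # 2SLS coefficients

dof        = n - size(X, 2)
σ2_naive   = sum(abs2, y .- Xhat * β) / dof   # residuals built from ŵ
σ2_correct = sum(abs2, y .- X * β) / dof      # residuals built from observed w
XtX_inv    = inv(Xhat' * Xhat)

se_naive   = sqrt(σ2_naive * XtX_inv[2, 2])
se_correct = sqrt(σ2_correct * XtX_inv[2, 2])

println("β̂_w = ", round(β[2], digits = 4))
println("naive SE = ", round(se_naive, digits = 4),
        "   corrected SE = ", round(se_correct, digits = 4))
```

The point estimate is the same either way; only the residual variance that scales the covariance matrix changes. Dedicated IV routines apply this correction automatically.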

16.1.8 Control Function Approach

The manual 2SLS above implicitly implements the control function (CF) approach. Understanding why makes the extension to nonlinear models (next chapter) transparent.

Algebraic motivation. Because w is endogenous, decompose the structural error:

\[\varepsilon = \rho v + \eta, \qquad v \equiv w - E[w \mid z], \qquad E(z\eta)=0,\; E(zv)=0.\]

If we knew \(v\), controlling for it directly would eliminate the confounding channel. We don’t know \(v\), but the first-stage residual \(\hat{v} = w - \hat{w}\) is a consistent estimate. Substituting into the structural equation:

\[y = \alpha + \beta w + \rho\,\hat{v} + \text{error}.\]

OLS on this augmented second stage gives a consistent \(\hat\beta\) — identical to 2SLS by the Frisch–Waugh–Lovell (FWL) theorem. The coefficient \(\hat\rho\) on \(\hat{v}\) doubles as a Durbin–Wu–Hausman (DWH) endogeneity test: under \(H_0\) (w is exogenous), \(\rho = 0\).

import StatsBase: coef, stderror, coefnames

# Control function: add first-stage residual v̂ to the second stage OLS
data.v_hat = stage1.residuals

lm_cf = feols(data, @formula(y ~ w + v_hat))

# Compare point estimates from manual 2SLS and CF — should be identical by FWL
coef_tsls = coef(tsls_manual)[findfirst(==("w_hat"), coefnames(tsls_manual))]
coef_cf   = coef(lm_cf)[findfirst(==("w"),     coefnames(lm_cf))]

@printf("Manual 2SLS   β̂_w = %.8f\n", coef_tsls)
@printf("Control funct β̂_w = %.8f\n", coef_cf)

# DWH test: t-statistic on v̂
rho_hat = coef(lm_cf)[findfirst(==("v_hat"), coefnames(lm_cf))]
rho_se  = stderror(lm_cf)[findfirst(==("v_hat"), coefnames(lm_cf))]
@printf("\nDWH endogeneity test (ρ̂ on v̂): coef = %.4f  se = %.4f  t = %.3f\n",
        rho_hat, rho_se, rho_hat / rho_se)

Manual 2SLS   β̂_w = 0.99987086
Control funct β̂_w = 0.99987086

DWH endogeneity test (ρ̂ on v̂): coef = 0.6100  se = 0.0203  t = 30.077

The two \(\hat\beta_w\) values agree to machine precision, confirming FWL. The large \(|t|\) on \(\hat{v}\) rejects \(H_0 : \rho = 0\), confirming that w is endogenous.

Note

Why FWL fails in nonlinear models. FWL relies on the residual-maker matrix being idempotent and the second stage being linear — both fail once we replace OLS with Poisson or Probit. In a nonlinear second stage, \(\hat{v}\) must enter inside the link function (e.g. \(\exp(\mathbf{x}\boldsymbol\beta + \rho\hat{v})\)), which requires a structural assumption beyond instrument exogeneity. This is the subject of the next chapter.

16.2 When 2SLS with Covariates Is Actually LATE

The LATE identification above is clean because we have no covariates. The Wald estimator gives us LATE, full stop. But in practice we usually add covariates to defend conditional exogeneity:

\[ Y = \beta W + \gamma^\top X + U, \qquad W = \pi Z + \delta^\top X + V. \]

Does 2SLS on this still give us a LATE? Angrist and Pischke (2009) say it gives a weighted average of covariate-specific LATEs. Blandhol et al. (2025) show that this is generally not true.

16.2.1 Why the Linear Specification Hides an Assumption

By Frisch-Waugh-Lovell, the IV estimand is

\[ \beta_{iv} = \frac{E[Y \tilde Z]}{E[W \tilde Z]}, \qquad \tilde Z = Z - L[Z \mid X], \]

where \(L[Z \mid X]\) is the linear projection of \(Z\) on \(X\). So \(\tilde Z\) is the part of \(Z\) that a linear regression on \(X\) cannot explain.
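The FWL representation can be checked numerically. The following sketch uses a hypothetical linear DGP (all coefficients are made up for illustration) and compares just-identified 2SLS with the ratio of sample moments built from the residualized instrument.

```julia
using Random, LinearAlgebra

Random.seed!(3)
n = 5_000

x = randn(n)
z = 0.5 .* x .+ randn(n)                    # instrument, correlated with the covariate
v = randn(n)
w = 0.8 .* z .+ 0.3 .* x .+ v               # treatment
y = 1.5 .* w .+ x .+ 0.7 .* v .+ randn(n)   # v enters the outcome error, so w is endogenous

Xmat = hcat(ones(n), x)                     # exogenous covariates incl. constant

# Just-identified 2SLS: instruments [z X], regressors [w X].
Zmat = hcat(z, Xmat)
Wmat = hcat(w, Xmat)
β_2sls = ((Zmat' * Wmat) \ (Zmat' * y))[1]

# FWL route: residualize z on X, then take the ratio of sample moments.
z_tilde = z .- Xmat * (Xmat \ z)
β_fwl   = dot(z_tilde, y) / dot(z_tilde, w)

println("2SLS coefficient on w   = ", β_2sls)
println("sum(z̃ y) / sum(z̃ w)     = ", β_fwl)
```

The two agree to floating-point precision, so everything the 2SLS coefficient uses from \(Z\) is contained in the linear-projection residual \(\tilde Z\).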

What if the true \(E[Z \mid X]\) is not linear in \(X\)? Then \(L[Z \mid X] \neq E[Z \mid X]\), and the residual instrument \(\tilde Z\) still picks up some of the conditional mean. The IV estimand then mixes treatment effects across compliance groups. Blandhol et al. (2025) prove that \(\beta_{iv}\) is a non-negatively weighted average of conditional LATEs only if the rich covariates condition holds:

\[ L[Z \mid X] = E[Z \mid X]. \]

This is a parametric assumption about \(E[Z \mid X]\). It is implicit in the linear-in-\(X\) specification, and it is rarely defended. When it fails, \(\beta_{iv}\) picks up treatment effects for always-takers as well as compliers, and some always-taker terms enter with negative weight.

Warning

“2SLS with covariates estimates LATE” is true only if \(X\) enters saturated (one dummy per cell), or the true \(E[Z \mid X]\) happens to be linear in \(X\).
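To see why saturation is enough, consider a discrete covariate. In the sketch below (the cell probabilities are hypothetical), a regression of \(Z\) on one dummy per cell reproduces the cell means exactly, so \(L[Z \mid X] = E[Z \mid X]\) in the sample, while a regression on \(X\) itself does not.

```julia
using Random, Statistics, LinearAlgebra

Random.seed!(4)
n = 3_000

x  = rand(1:3, n)                     # discrete covariate with three cells
pz = [0.2, 0.7, 0.4]                  # hypothetical P(Z = 1 | X = k), not linear in k
z  = Float64.(rand(n) .< pz[x])

# Saturated design: one dummy per cell (no separate constant).
D_sat   = hcat([Float64.(x .== k) for k in 1:3]...)
fit_sat = D_sat * (D_sat \ z)

# Linear-in-X design: constant plus X itself.
D_lin   = hcat(ones(n), Float64.(x))
fit_lin = D_lin * (D_lin \ z)

cell_mean = [mean(z[x .== k]) for k in 1:3]   # sample analogue of E[Z | X = k]

gap_sat = maximum(abs.(fit_sat .- cell_mean[x]))
gap_lin = maximum(abs.(fit_lin .- cell_mean[x]))

println("max |fit - cell mean|, saturated: ", gap_sat)
println("max |fit - cell mean|, linear:    ", gap_lin)
```

With continuous covariates there are no cells to saturate, which is where flexible bases or machine-learning estimates of \(E[Z \mid X]\) come in.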

16.2.2 A Simulation

We build a DGP where \(E[Z \mid X]\) is wildly nonlinear in \(X\), so the linear-in-\(X\) specification has to fail. Then we compare four estimators against the true LATE.

using MLJ
using DecisionTree
using Random
using Statistics
using LinearAlgebra

Random.seed!(13)
n = 8000
plogis(z) = 1 / (1 + exp(-z))

# Single covariate on [-1, 1]
X = 2 .* rand(n) .- 1

# True conditional mean of Z is wildly nonlinear in X.
# A linear projection L[Z|X] cannot reproduce this.
pZ_true = plogis.(0.3 .+ 1.0 .* X .+ 2.0 .* sin.(2.5 .* π .* X))
Z       = Int.(rand(n) .< pZ_true)

# Potential treatment states with monotonicity (T1 >= T0).
# X enters the propensity, so X confounds W <-> Y.
U  = rand(n)
p0 = plogis.(-1.6 .+ 1.0 .* X)
p1 = min.(plogis.(-0.4 .+ 1.0 .* X) .+ 0.2, 0.95)
T0 = Int.(U .< p0)
T1 = Int.(U .< p1)
W  = ifelse.(Z .== 1, T1, T0)

G  = [t1 == 1 && t0 == 1 ? :AT :
      t1 == 0 && t0 == 0 ? :NT : :CP
      for (t1, t0) in zip(T1, T0)]

# Heterogeneous treatment effect — varies with X
τ  = 0.5 .+ 1.5 .* X
Y0 = 1.0 .+ 2.0 .* X .+ 0.8 .* X .^ 2 .+ randn(n)
Y1 = Y0 .+ τ
Y  = ifelse.(W .== 1, Y1, Y0)

true_LATE = mean(τ[G .== :CP])
@printf("Compliance shares  AT=%.2f  CP=%.2f  NT=%.2f\n",
        mean(G .== :AT), mean(G .== :CP), mean(G .== :NT))
@printf("True unconditional LATE = %.3f\n", true_LATE)

Compliance shares  AT=0.19  CP=0.42  NT=0.39
True unconditional LATE = 0.617

Now we compare four estimators. Julia has no equivalent of R’s DoubleML package, so we implement DDML PLIV manually with 5-fold cross-fitting and random-forest nuisance estimates. This is the same pattern the nonparametric chapter uses for DDML PLR.

# Helper: TSLS via the matrix expression (X is the exogenous matrix incl. constant).
function tsls_matrix(Y, W, Z, Xmat)
    Wmat = hcat(W, Xmat)
    Zmat = hcat(Z, Xmat)
    # Project the regressors on the instruments without forming the n×n projection matrix.
    What = Zmat * ((Zmat' * Zmat) \ (Zmat' * Wmat))
    β    = (What' * Wmat) \ (What' * Y)
    return β[1]
end

# (a) Wald — biased: Z is not unconditionally exogenous because X confounds.
wald = (mean(Y[Z .== 1]) - mean(Y[Z .== 0])) /
       (mean(W[Z .== 1]) - mean(W[Z .== 0]))

# (b) TSLS with X entered linearly — the textbook spec.
b_lin  = tsls_matrix(Float64.(Y), Float64.(W), Float64.(Z),
                     hcat(ones(n), X))

# (c) TSLS with a rich polynomial basis — closer to saturation.
b_poly = tsls_matrix(Float64.(Y), Float64.(W), Float64.(Z),
                     hcat([X .^ k for k in 0:7]...))

# (d) DDML PLIV with random-forest nuisance estimates and 5-fold cross-fitting.
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree verbosity=0
learner = RandomForestRegressor(n_trees=500, max_depth=5)

nsplits = 5
s       = shuffle(1:n)
folds   = [s[1 + floor(Int, (k-1)*n/nsplits) : floor(Int, k*n/nsplits)]
           for k in 1:nsplits]

mY = zeros(n); mW = zeros(n); mZ = zeros(n)
Xtab = DataFrame(X = X)

for fold_idx in 1:nsplits
    test  = folds[fold_idx]
    train = setdiff(1:n, test)
    for (out, dest) in ((Y, mY), (W, mW), (Z, mZ))
        mach = machine(learner, Xtab[train, :], Float64.(out[train]))
        MLJ.fit!(mach, verbosity = 0)
        dest[test] = MLJ.predict(mach, Xtab[test, :])
    end
end

rY = Y .- mY; rW = W .- mW; rZ = Z .- mZ
b_pliv = sum(rZ .* rY) / sum(rZ .* rW)

results = DataFrame(
    Estimator = ["True LATE",
                 "Wald (no covariates)",
                 "TSLS, X linear",
                 "TSLS, poly(X, 7)",
                 "DDML PLIV (manual, RF)"],
    Estimate  = [true_LATE, wald, b_lin, b_poly, b_pliv]
)
results

5×2 DataFrame

Row	Estimator	Estimate
	String	Float64
1	True LATE	0.616857
2	Wald (no covariates)	1.91029
3	TSLS, X linear	0.551637
4	TSLS, poly(X, 7)	0.585279
5	DDML PLIV (manual, RF)	0.581937

The linear-in-\(X\) 2SLS (0.552) does not recover the true LATE (0.617). The polynomial 2SLS (0.585) is closer, and DDML PLIV (0.582) is close to the polynomial estimate. The Wald estimator (1.910) is far off because \(Z\) is not unconditionally exogenous, and Wald has no way to use \(X\).

What are the polynomial 2SLS and DDML PLIV estimating, if not LATE? Both target \(\beta_{rich}\), a weighted average of conditional LATEs with weights proportional to \(\text{Var}(Z \mid X) \cdot P(\text{complier} \mid X)\). It is non-negatively weighted, so it is weakly causal. But it is not the unconditional LATE we usually want.
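The difference between the two estimands comes down to the weights on the conditional LATEs. As a rough sketch, the population functions from the simulated DGP can be evaluated on a fine grid (standing in for \(X \sim U[-1, 1]\)) and averaged under each weighting. The names `w_late`, `w_rich`, `late_grid`, and `rich_grid` are illustrative.

```julia
plogis(t) = 1 / (1 + exp(-t))

# Population objects from the simulated DGP, on a fine grid for X ~ U[-1, 1].
xs = range(-1, 1; length = 20_001)
pZ = plogis.(0.3 .+ 1.0 .* xs .+ 2.0 .* sin.(2.5 .* π .* xs))   # E[Z | X]
p0 = plogis.(-1.6 .+ 1.0 .* xs)                                 # P(W(0) = 1 | X)
p1 = min.(plogis.(-0.4 .+ 1.0 .* xs) .+ 0.2, 0.95)              # P(W(1) = 1 | X)
τ  = 0.5 .+ 1.5 .* xs                                           # conditional LATE

p_complier = p1 .- p0
w_late = p_complier                        # weights behind the unconditional LATE
w_rich = pZ .* (1 .- pZ) .* p_complier     # Var(Z | X) * P(complier | X)

late_grid = sum(w_late .* τ) / sum(w_late)
rich_grid = sum(w_rich .* τ) / sum(w_rich)

println("Unconditional LATE (grid) = ", round(late_grid, digits = 3))
println("β_rich (grid)             = ", round(rich_grid, digits = 3))
```

Both sets of weights are non-negative, but \(\beta_{rich}\) shifts weight toward values of \(X\) where the instrument is closest to a coin flip, so the two averages need not coincide.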

16.2.3 Diagnostic: RESET on Z ~ X

Rich covariates says \(E[Z \mid X]\) is linear in \(X\). That is exactly the null of Ramsey’s RESET test (Ramsey 1969). Julia has no direct equivalent of R’s lmtest::resettest(), but the test is just an F-test of joint significance of higher powers of \(X\). With a single covariate, powers of \(X\) span the same space as the powers of the fitted values that RESET uses.

using GLM

df_reset       = DataFrame(Z = Float64.(Z), X = X, X2 = X .^ 2, X3 = X .^ 3)
m_restricted   = lm(@formula(Z ~ X),                df_reset)
m_unrestricted = lm(@formula(Z ~ X + X2 + X3),      df_reset)
rss_r          = sum(residuals(m_restricted) .^ 2)
rss_u          = sum(residuals(m_unrestricted) .^ 2)
q              = 2          # restrictions
k_u            = 4          # parameters in unrestricted
F_stat         = ((rss_r - rss_u) / q) / (rss_u / (n - k_u))
pval           = 1 - cdf(FDist(q, n - k_u), F_stat)
@printf("RESET-style F on Z ~ X (joint test of X^2, X^3): F = %.2f, p = %.4g\n",
        F_stat, pval)

RESET-style F on Z ~ X (joint test of X^2, X^3): F = 131.57, p = 0

In our DGP it rejects strongly. The same test on real data is cheap, and it tells us whether the LATE interpretation is defensible before we report a 2SLS coefficient.

16.2.4 What to Do Instead

Blandhol et al. (2025) give four steps for empirical work.

Reconsider the covariates. If \(Z\) is unconditionally exogenous, drop them. A kitchen-sink set of controls makes rich covariates less likely.
Run RESET on Z ~ X. If it does not reject, the linear-IV-as-LATE interpretation is defensible.
If RESET rejects, report DDML PLIV alongside 2SLS. Julia has no first-class PLIV package, so the manual cross-fitting above is the recommended route. In R the package is DoubleML; in Stata it is ddml. DDML PLIV targets \(\beta_{rich}\), a non-negatively weighted average of conditional LATEs.
For a binary instrument, also estimate the unconditional LATE. With instrument propensity score weighting (Słoczyński 2024),

\[ \hat\beta_{late} = \frac{\sum_i Y_i [Z_i/\hat p(X_i) - (1-Z_i)/(1-\hat p(X_i))]}{\sum_i W_i [Z_i/\hat p(X_i) - (1-Z_i)/(1-\hat p(X_i))]}, \]

where \(\hat p(X) = P(Z = 1 \mid X)\) is estimated nonparametrically. Stata has kappalate (Słoczyński, Uysal, and Wooldridge). In Julia or R it is straightforward to build once \(\hat p(X)\) is in hand.
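As a minimal sketch of the weighting formula above, the code below simulates a hypothetical DGP with a binary instrument whose propensity depends on \(X\), estimates \(\hat p(X)\) by within-bin means (one simple nonparametric choice, not the only one), and compares the weighted estimator with the unadjusted Wald ratio. All DGP parameters and the bin count are assumptions made for illustration.

```julia
using Random, Statistics

Random.seed!(5)
n = 200_000

x = rand(n)
z = rand(n) .< (0.2 .+ 0.6 .* x)            # instrument propensity depends on X

# Hypothetical compliance types: 20% always-takers, complier share 0.3 + 0.4x.
u = rand(n)
always   = u .< 0.2
complier = (u .>= 0.2) .& (u .< 0.5 .+ 0.4 .* x)
w = always .| (complier .& z)

τ = 1 .+ 2 .* x                             # heterogeneous treatment effect
y = 3 .* x .+ randn(n) .+ τ .* w            # X also shifts Y, so Wald is confounded

# Nonparametric p̂(X): share of Z = 1 within equal-width bins of X.
nbins = 20
bin   = clamp.(floor.(Int, x .* nbins) .+ 1, 1, nbins)
p_bin = [mean(z[bin .== b]) for b in 1:nbins]
p_hat = p_bin[bin]

# Weights Z/p̂ - (1 - Z)/(1 - p̂), applied to Y and to W.
κ = z ./ p_hat .- (1 .- z) ./ (1 .- p_hat)
late_ipw = sum(y .* κ) / sum(w .* κ)

wald = (mean(y[z]) - mean(y[.!z])) / (mean(w[z]) - mean(w[.!z]))
true_late = mean(τ[complier])

println("True LATE = ", round(true_late, digits = 3))
println("IPW LATE  = ", round(late_ipw, digits = 3))
println("Wald      = ", round(wald, digits = 3))
```

In applied work the binning step would be replaced by whatever estimator of \(P(Z = 1 \mid X)\) suits the covariates, ideally with cross-fitting as in the DDML code above.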

16.2.5 A Note on the FRDD Example

The fuzzy-RDD example at the end of this chapter is 2SLS with race, state of birth, and quarter-of-birth fixed effects entered as covariates. The 2SLS estimate and the local RDRobust estimate differ. That section attributes the gap to “sensitivity to model selection”. The rich-covariates story is at least as plausible an explanation, and the RESET test is the right first thing to run.

16.2.6 Bottom Line

“2SLS with covariates estimates LATE” needs the rich covariates condition. That condition is a parametric assumption on \(E[Z \mid X]\) that researchers rarely defend, and a simple RESET test often rejects it. The honest alternatives are DDML PLIV for \(\beta_{rich}\) and IPSW (or DDML) for the unconditional LATE. Even when rich covariates holds, \(\beta_{rich}\) can be quite different from the unconditional LATE we usually care about.

16.3 Regression Discontinuity Design

Regression discontinuity design (RDD) is a quasi-experimental design for estimating the causal effect of an intervention when assignment to it is determined by whether a subject’s value on an observed covariate exceeds a threshold. The idea is that, close to the threshold, assignment is as good as random, so we can estimate the causal effect by comparing the outcomes of subjects who are just above and just below the threshold.

16.3.1 Sharp RDD

We have a continuous variable \(X\), called the running variable, which determines the binary treatment \(W\). The treatment is assigned according to a threshold \(c\). The outcome variable \(Y\) is a function of \(X\) and \(W\).

\[ W=\mathbb{1}(X>c)\]

Lee (2008) studies the incumbency advantage in elections. His identification strategy is based on the discontinuity generated by the rule that the party with the larger vote share wins. The running variable \(X_i\) is the difference in vote share between the Democratic and Republican parties in one election, with the threshold \(c = 0\). The outcome variable \(Y_i\) is the vote share in the next election. The code below applies the same design to Senate election data.

using CSV
senate = CSV.read("data/rdrobust_senate.csv", DataFrame)

# Calculate RD plot data
plot_data = rdplot(senate.vote, senate.margin, c=0.0)

fig = Figure()
ax = Axis(fig[1, 1], xlabel="Margin", ylabel="Vote")
scatter!(ax, plot_data.vars_bins.rdplot_mean_x, plot_data.vars_bins.rdplot_mean_y, 
         color=:gray, markersize=6)

# Plot polynomial fit for left side
poly_l = filter(r -> r.rdplot_x < 0.0, plot_data.vars_poly)
lines!(ax, poly_l.rdplot_x, poly_l.rdplot_y, color=:blue, linewidth=2)

# Plot polynomial fit for right side
poly_r = filter(r -> r.rdplot_x >= 0.0, plot_data.vars_poly)
lines!(ax, poly_r.rdplot_x, poly_r.rdplot_y, color=:blue, linewidth=2)

vlines!(ax, [0.0], color=:black, linestyle=:dash)
fig

16.3.2 Identification of SRDD

\[ \tau_{RD} = \lim_{x \to c^+} E[Y|X=x] - \lim_{x \to c^-} E[Y|X=x] \]

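In words, the treatment effect is the limit of the average outcome approaching the threshold from the right minus the limit approaching from the left. If the average potential outcomes are continuous in \(X\) at \(c\), this jump is the average treatment effect for subjects at the threshold, as if treatment were assigned randomly there.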
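The two one-sided limits can be approximated by fitting a separate line on each side of the cutoff within a small window and comparing the two intercepts at \(c\). The sketch below does this by hand with a uniform kernel and a hand-picked bandwidth `h = 0.5`; both are assumptions for illustration, whereas rdrobust chooses the bandwidth from the data and applies a bias correction. The helper `boundary_intercept` is hypothetical.

```julia
using Random, Statistics

# Intercept at the cutoff c of a least-squares line fit to the points selected by `keep`.
function boundary_intercept(x, y, c, keep)
    A = hcat(ones(count(keep)), x[keep] .- c)
    return (A \ y[keep])[1]
end

Random.seed!(6)
n    = 20_000
gpa  = 4 .* rand(n)
c, h = 3.0, 0.5                                      # cutoff and hand-picked bandwidth
y = 10 .+ 1.5 .* gpa .+ 2 .* (gpa .> c) .+ randn(n)  # true jump at the cutoff is 2

# Right limit uses treated units just above c; left limit uses untreated units just below.
right = boundary_intercept(gpa, y, c, (gpa .> c) .& (gpa .<= c + h))
left  = boundary_intercept(gpa, y, c, (gpa .<= c) .& (gpa .>= c - h))
τ_hat = right - left

println("Local linear RD estimate = ", round(τ_hat, digits = 3))
```

Centering the running variable at \(c\) is what makes each intercept an estimate of the one-sided limit.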

16.3.3 Estimation of SRDD

GPA = rand(Uniform(0, 4), 1000)
# Treatment is W = 1(GPA > 3), so the +2 jump must sit on the right
# (above-cutoff) side to match the right-minus-left estimand above.
future_success = 10 .+ 1.5 .* GPA .+ 2 .* (GPA .> 3) .+ rand(Normal(0, 1), 1000)

plot_data_gpa = rdplot(future_success, GPA, c=3.0)

fig2 = Figure()
ax2 = Axis(fig2[1, 1], xlabel="GPA", ylabel="Future Success")
scatter!(ax2, plot_data_gpa.vars_bins.rdplot_mean_x, plot_data_gpa.vars_bins.rdplot_mean_y, 
         color=:gray, markersize=6)

poly_gpa_l = filter(r -> r.rdplot_x < 3.0, plot_data_gpa.vars_poly)
lines!(ax2, poly_gpa_l.rdplot_x, poly_gpa_l.rdplot_y, color=:blue, linewidth=2)

poly_gpa_r = filter(r -> r.rdplot_x >= 3.0, plot_data_gpa.vars_poly)
lines!(ax2, poly_gpa_r.rdplot_x, poly_gpa_r.rdplot_y, color=:blue, linewidth=2)

vlines!(ax2, [3.0], color=:black, linestyle=:dash)
fig2

# Estimate the sharp RDD model
rdd_gpa = rdrobust(future_success, GPA, c=3.0)
@printf "Tau (conventional)   = %.4g\n" rdd_gpa.Estimate.tau_us[1]
@printf "P-value (robust)     = %.4f\n" rdd_gpa.pv.PValue[3]

Tau (conventional)   = 1.61
P-value (robust)     = 0.0003

16.3.4 Nonparametric Estimation

rdd_senate = rdrobust(senate.vote, senate.margin, c=0.0)
@printf "Tau (conventional)   = %.4g\n" rdd_senate.Estimate.tau_us[1]
@printf "P-value (robust)     = %.4f\n" rdd_senate.pv.PValue[3]

Tau (conventional)   = 7.414
P-value (robust)     = 0.0000

16.3.5 Fuzzy RDD

In fuzzy RDD, the treatment is assigned according to a threshold \(c\), but there is noncompliance. This is similar to the IV case.

\[ \tau_{FRD} = \frac{\lim_{x \to c^+} E[Y|X=x] - \lim_{x \to c^-} E[Y|X=x]}{\lim_{x \to c^+} E[W|X=x] - \lim_{x \to c^-} E[W|X=x]} \]

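For example, suppose food stamp eligibility is given to all households below a certain income, but not all eligible households take up the food stamps. Income then does not solely determine treatment: crossing the cutoff changes the probability of treatment but does not completely determine it.

FRDD is IV: the indicator for crossing the cutoff is the instrument for the treatment \(W\), and \(\tau_{FRD}\) is the Wald ratio of the jump in \(Y\) to the jump in \(W\) at \(c\).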
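The ratio form can be sketched directly: estimate the jump in the outcome and the jump in treatment take-up at the cutoff with the same local linear fit, then divide. The DGP below is hypothetical (take-up rises from 0.2 to 0.7 at the cutoff, with a constant effect of 2), and the uniform kernel and bandwidth are hand-picked; `jump_at_cutoff` is an illustrative helper, not an RDRobust function.

```julia
using Random, Statistics

# Jump at cutoff c in a local linear fit of v on x (uniform kernel, bandwidth h).
function jump_at_cutoff(x, v, c, h)
    side(keep) = (hcat(ones(count(keep)), x[keep] .- c) \ v[keep])[1]
    return side((x .> c) .& (x .<= c + h)) - side((x .<= c) .& (x .>= c - h))
end

Random.seed!(7)
n    = 40_000
x    = 2 .* rand(n) .- 1                     # running variable on [-1, 1]
c, h = 0.0, 0.5

above = x .> c                               # crossing the cutoff is the instrument
w = rand(n) .< (0.2 .+ 0.5 .* above)         # take-up jumps from 0.2 to 0.7
y = 1 .+ 0.5 .* x .+ 2 .* w .+ randn(n)      # constant treatment effect of 2

jump_y = jump_at_cutoff(x, y, c, h)              # reduced form
jump_w = jump_at_cutoff(x, Float64.(w), c, h)    # first stage
τ_frd  = jump_y / jump_w

println("Jump in Y = ", round(jump_y, digits = 3))
println("Jump in W = ", round(jump_w, digits = 3))
println("Fuzzy RD  = ", round(τ_frd, digits = 3))
```

When the jump in take-up is small, the denominator is close to zero and the ratio becomes unstable, which is the RDD version of a weak instrument.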

16.3.6 FRDD Example

The main question in Fetter (2013) is how much of the increase in the U.S. home ownership rate at midcentury was due to mortgage subsidies given out by the government.

The running variable is quarter of birth (qob), centered on the birth quarter that determines eligibility for a mortgage subsidy through Korean War service (qob_minus_kw). Which side of this cutoff you fall on shifts the probability of being a veteran of either the Korean War or World War II (vet_wwko).

using CSV

# Load the real mortgages dataset (Fetter 2013) from causaldata R package
vet = CSV.read("data/mortgages.csv", DataFrame)

# Create an "above-cutoff" variable as the instrument
vet.above = vet.qob_minus_kw .> 0

# Impose a bandwidth of 12 quarters on either side
vet = vet[abs.(vet.qob_minus_kw) .< 12, :]

# Manual 2SLS explicitly (educational)
# We have Fixed Effects for state (bpl) and quarter of birth (qob).
# Endogenous variable: vet_wwko and qob_minus_kw * vet_wwko
# Instruments: above and qob_minus_kw * above
vet.vet_inter = vet.qob_minus_kw .* vet.vet_wwko
vet.above_inter = vet.qob_minus_kw .* vet.above

# Stage 1a: Predict vet_wwko
stage1a = feols(vet, @formula(vet_wwko ~ nonwhite + qob_minus_kw + above + above_inter + fe(bpl) + fe(qob)))
vet.vet_wwko_hat = vet.vet_wwko .- stage1a.residuals

# Stage 1b: Predict vet_inter
stage1b = feols(vet, @formula(vet_inter ~ nonwhite + qob_minus_kw + above + above_inter + fe(bpl) + fe(qob)))
vet.vet_inter_hat = vet.vet_inter .- stage1b.residuals

# Stage 2: Regress home_ownership on fitted endogenous variables
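# Note: Vcov.robust() gives heteroskedasticity-robust SEs. As with the manual 2SLS
# above, they are computed from the fitted values, so treat them as approximate.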
m_iv = feols(vet, @formula(home_ownership ~ nonwhite + qob_minus_kw + vet_wwko_hat + vet_inter_hat + fe(bpl) + fe(qob)), vcov=Vcov.robust())
show_regression_html(m_iv)

	Estimate	Std. Error	t value	Pr(>|t|)
nonwhite	-0.190	0.007	-27.951	< 2e-16 ***
qob_minus_kw	-0.007	0.002	-4.070	4.707e-05 ***
vet_wwko_hat	0.170	0.046	3.733	1.894e-04 ***
vet_inter_hat	-0.003	0.003	-1.092	0.275
--- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Fuzzy RDD using rdrobust (local polynomial, MSE-optimal bandwidth)
# Note: rdrobust uses a data-driven bandwidth — no global FE needed
m_rdd = rdrobust(vet.home_ownership,
                 vet.qob_minus_kw,
                 fuzzy = vet.vet_wwko,
                 c = 0.0)
@printf "Manual 2SLS     tau = %.4g\n" m_iv.beta[3]
@printf "RDRobust (FRDD) tau = %.4g\n" m_rdd.Estimate.tau_us[1]

Manual 2SLS     tau = 0.1702
RDRobust (FRDD) tau = 1.879

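These two results are very different, which illustrates one of RDD’s problems: it can be sensitive to model selection. The 2SLS estimate comes from a linear model fit within a fixed window of 12 quarters on either side of the cutoff, with covariates and fixed effects. The rdrobust estimate comes from a local regression without those controls, run on the same restricted sample, and it can be sensitive to bandwidth selection. If home_ownership is a 0/1 indicator, an effect of 1.879 lies outside the feasible range, so it is better read as a sign of an unstable local estimate than as a credible effect size.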