# IV in Poisson with Fixed Effects
```{r}
#| include: false
library(fixest)
library(MASS)
library(tidyverse)
library(parallel)
library(gmm)
```
Poisson regression handles count and non-negative outcomes by specifying the conditional mean as an exponential function. Because it is a quasi-MLE, it requires only the mean to be correct — no distributional assumption on the count. When an explanatory variable is endogenous, we need an IV strategy adapted to the nonlinear mean. This chapter covers two cases: a continuous endogenous variable (where the control function works exactly) and a binary endogenous variable (where three CF/IV approaches are available with different trade-offs).
## The Control Function Framework
### Linear second stage — CF equals 2SLS
In the previous chapter we saw that adding the first-stage residual $\hat{v}_2$ to an OLS second stage recovers the same coefficient as 2SLS. The algebra is:
1. **First stage**: $y_2 = \mathbf{z}\boldsymbol{\pi} + v_2$, estimated by OLS. Residual $\hat{v}_2 = y_2 - \hat{y}_2$ satisfies $E(\mathbf{z}'\hat{v}_2) = 0$.
2. **Structural error decomposition**: write $\varepsilon_1 = \rho v_2 + \eta_1$, where $E(\mathbf{z}'\eta_1) = 0$ and $E(v_2\,\eta_1) = 0$ by construction.
3. **Augmented second stage**: substitute to get $y_1 = \mathbf{z}_1\boldsymbol{\delta}_1 + \alpha_1 y_2 + \rho\,\hat{v}_2 + \eta_1.$
OLS on this equation is the **control function estimator**. The Frisch–Waugh–Lovell (FWL) theorem guarantees it is numerically identical to 2SLS — no extra assumption beyond $E(\mathbf{z}'\varepsilon_1) = 0$ is required.
### Nonlinear second stage — CF diverges from 2SLS
When the conditional mean is $\exp(\mathbf{x}\boldsymbol{\beta})$ (Poisson), a direct IV/GMM approach solves the moment conditions $E[\mathbf{z}'(y_1 - \exp(\mathbf{x}\boldsymbol{\beta}))] = 0$ without modifying the exponential mean. The **control function approach** instead imposes the stronger *conditional mean assumption* (Wooldridge 2002, 2014):
$$\boxed{E[y_1 \mid \mathbf{z}_1,\, y_2,\, v_2] = \exp\!\bigl(\mathbf{z}_1\boldsymbol{\delta}_1 + \alpha_1 y_2 + \rho v_2\bigr).}$$
This says the unobservable enters the conditional mean **linearly inside** $\exp(\cdot)$. Under this assumption, substituting $\hat{v}_2$ for $v_2$ and running Poisson QMLE gives a consistent estimate of $\alpha_1$.
Three practical consequences:
| Property | Linear (OLS/2SLS) | Nonlinear (Poisson CF) |
|:---|:---|:---|
| Extra assumption beyond IV exogeneity | None — FWL applies | CF mean assumption required |
| CF vs. 2SLS/GMM numerically | **Identical** | **Different** — FWL fails |
| SE from second stage alone | Correct | **Understated** — bootstrap needed |
| Test of endogeneity | $t$-stat on $\hat{v}_2$ (DWH) | Same — $t$-stat on $\hat{v}_2$ |
::: {.callout-note}
**Intuition for the CF assumption.** The decomposition $\varepsilon_1 = \rho v_2 + \eta_1$ is exactly the same as in the linear case. What changes is where the decomposition gets applied: in the linear model it enters additively outside any transformation; in Poisson it must enter inside $\exp(\cdot)$ to be absorbed by the quasi-MLE score. If the true $\rho \neq 0$, omitting $\hat{v}_2$ from the exponential mean leaves a multiplicative endogeneity term $e^{\rho v_2}$ correlated with $y_2$, biasing $\hat\alpha_1$ upward or downward depending on the sign of $\rho$.
:::
---
## Part 1 — Continuous endogenous variable
### Data generating process
We simulate a panel where `time` (continuous, e.g. minutes on site) is endogenous: it is correlated with an unobserved factor `e` that also affects `visits`. The excluded instrument is `phone` (binary, e.g. phone-access indicator). Fixed effects are `ad` (20 groups) and `female`.
```{r}
#| echo: true
set.seed(42)
n_ad <- 20; obs_per <- 250; n <- n_ad * obs_per
fe_ad <- rnorm(n_ad) * 0.5
fe_grp <- rep(1:n_ad, each = obs_per)
female <- as.integer(runif(n) < 0.5)
phone <- as.integer(runif(n) < 0.4)
frfam <- runif(n)
e <- rnorm(n)
time <- 1.5 * phone + 0.5 * frfam + fe_ad[fe_grp] + e
true_alpha <- 0.8
lambda <- exp(0.5 + true_alpha * time + 0.4 * frfam +
fe_ad[fe_grp] + 0.3 * female + 0.5 * e)
visits <- rpois(n, lambda)
df1 <- data.frame(
visits = visits, time = time, phone = phone,
frfam = frfam, female = female, ad = fe_grp
)
cat(sprintf("n = %d groups = %d visits mean = %.2f\n", n, n_ad, mean(visits)))
```
### Naive Poisson — endogeneity bias
Ignoring the endogeneity of `time` gives a biased estimate of $\alpha_1$:
```{r}
#| echo: true
naive1 <- fepois(visits ~ time + frfam | ad + female, data = df1)
etable(naive1, title = "Naive Poisson — time is endogenous")
```
### Control function approach
The two-step CF procedure:
1. **First stage** — project `time` on `phone`, `frfam`, and fixed effects via `feols`. The residual $\hat{v}_2 = \text{time} - \hat{\text{time}}$ satisfies $E(\mathbf{z}'\hat{v}_2)=0$ by construction. A linear first stage is valid even when $y_2$ is continuous (Part 2 shows binary).
2. **Second stage** — Poisson QMLE with $\hat{v}_2$ added as a regressor inside the exponential mean. The CF assumption $E[y_1 \mid \mathbf{z}_1, y_2, v_2] = \exp(\mathbf{z}_1\boldsymbol{\delta}_1 + \alpha_1 y_2 + \rho v_2)$ holds exactly here because the DGP specifies $0.5\,e$ linearly inside $\exp(\cdot)$.
Note that unlike 2SLS in a linear model, $\hat{v}_2$ does *not* replace $y_2$ — both enter together.
```{r}
#| echo: true
# Step 1: linear projection
fs1 <- feols(time ~ phone + frfam | ad + female, data = df1)
df1$v2hat <- residuals(fs1)
# Step 2: Poisson CF
m1_cf <- fepois(visits ~ time + v2hat + frfam | ad + female, data = df1)
etable(m1_cf, title = "CF: linear projection first stage")
cat(sprintf("time coefficient: %.4f (true %.1f)\n", coef(m1_cf)["time"], true_alpha))
```
The coefficient on `time` recovers the true $\alpha_1 = 0.8$. The `v2hat` coefficient tests exogeneity — its $t$-statistic is the Hausman-type test.
### Bootstrap standard errors
The second-stage standard errors are incorrect because first-stage estimation error propagates. A cluster bootstrap (by `ad` group) corrects this. `parLapply` with a PSOCK cluster parallelises the 500 replications safely inside Quarto's rendering environment:
```{r}
#| echo: true
bootstrap_cf_continuous <- function(df, B = 500) {
clusters <- unique(df$ad)
K <- length(clusters)
boot_one <- function(b) {
library(fixest)
set.seed(b)
drawn <- sample(clusters, K, replace = TRUE)
df_b <- do.call(rbind, lapply(seq_along(drawn), function(i) {
sub <- df[df$ad == drawn[i], ]
sub$ad_boot <- i
sub
}))
fs <- feols(time ~ phone + frfam | ad_boot + female, data = df_b)
df_b$v2hat <- residuals(fs)
ss <- fepois(visits ~ time + v2hat + frfam | ad_boot + female, data = df_b)
coef(ss)["time"]
}
n_cores <- max(1L, detectCores() - 1L)
cl <- makePSOCKcluster(n_cores)
clusterExport(cl, c("df", "clusters", "K"), envir = environment())
results <- parLapply(cl, seq_len(B), boot_one)
stopCluster(cl)
unlist(results)
}
boot1 <- bootstrap_cf_continuous(df1, 500)
coef1 <- coef(m1_cf)["time"]
naive_se <- se(m1_cf)["time"]
cat(sprintf("time coefficient: %.4f (true %.1f)\n", coef1, true_alpha))
cat(sprintf("Naive 2nd-stage SE: %.4f\n", naive_se))
cat(sprintf("Bootstrap SE: %.4f\n", sd(boot1)))
cat(sprintf("Bootstrap 95%% CI: [%.4f, %.4f]\n",
quantile(boot1, 0.025), quantile(boot1, 0.975)))
```
The bootstrapped SE is larger than the naive second-stage SE, confirming that ignoring first-stage uncertainty understates uncertainty.
---
## Part 2 — Binary endogenous variable
When $y_2 \in \{0,1\}$, the same linear projection first stage is still valid, but two additional approaches exploiting the binary structure become available.
### Data generating process
We use a clean probit DGP for the binary endogenous variable with a fresh structural error `e2` independent of Part 1. The endogeneity coefficient `ρ = 0.2` is kept small so the CF assumption holds approximately well and estimates are close to the true value.
```{r}
#| echo: true
set.seed(123)
e2 <- rnorm(n)
# Binary endogenous variable: clean probit latent index
time_hi <- as.integer(1.5 * phone + 0.4 * frfam + e2 >= 0)
# Poisson outcome — true coefficient 0.8, endogeneity via ρ*e2 (ρ=0.2)
rho2 <- 0.2
lambda2 <- exp(0.5 + true_alpha * time_hi + 0.4 * frfam +
fe_ad[fe_grp] + 0.3 * female + rho2 * e2)
visits2 <- rpois(n, lambda2)
df2 <- data.frame(
visits = visits2, time_hi = time_hi, phone = phone,
frfam = frfam, female = female, ad = fe_grp, e2 = e2
)
cat(sprintf("time_hi mean: %.3f\n", mean(time_hi)))
cat(sprintf("Cor(time_hi, e2): %.3f (endogeneity)\n", cor(time_hi, e2)))
cat(sprintf("visits mean: %.2f\n", mean(visits2)))
```
### Three approaches: theory recap
| | First stage | What enters 2nd stage | 1st-stage assumption | 2nd-stage assumption |
|---|---|---|---|---|
| **A** | `feols` (OLS) | Residuals $\hat{v}_2$ | Exogeneity of $\mathbf{z}$ only | CF mean $\exp(\cdots + \rho v_2)$ correct |
| **B** | `feglm` (probit) | Gen. residuals $\hat{r}_2$ | Probit correctly specified | CF mean $\exp(\cdots + \rho r_2)$ correct |
| **C** | Any model | Fitted values $\hat{p}_2$ | Exogeneity of $\mathbf{z}$ only | Structural mean $\exp(\cdots)$ correct |
All three require the second-stage Poisson conditional mean to be correctly specified. A and C share the same (weaker) first-stage assumption; B additionally requires correct probit specification.
### Approach A — Linear projection CF
```{r}
#| echo: true
fs_a <- feols(time_hi ~ phone + frfam | ad + female, data = df2)
df2$v2hat <- residuals(fs_a)
m_a <- fepois(visits ~ time_hi + v2hat + frfam | ad + female, data = df2)
etable(m_a, title = "Approach A: OLS residual as control function")
```
### Approach B — Probit generalized residuals
```{r}
#| echo: true
fs_b <- feglm(time_hi ~ phone + frfam | ad + female,
data = df2, family = binomial(probit))
# Generalized residual: r̂ = y·φ(η)/Φ(η) − (1−y)·φ(η)/(1−Φ(η))
eta <- predict(fs_b, type = "link")
df2$gr2 <- ifelse(df2$time_hi == 1,
dnorm(eta) / pnorm(eta),
-dnorm(eta) / (1 - pnorm(eta)))
m_b <- fepois(visits ~ time_hi + gr2 + frfam | ad + female, data = df2)
etable(m_b, title = "Approach B: probit generalized residual")
```
### Approach C — Fitted values as instrument (IV/GMM Poisson)
Any function of the exogenous $\mathbf{z}$ is a valid instrument. Probit fitted probabilities are used here as the instrument for `time_hi`. We implement IV/GMM Poisson via `gmm::gmm()`, which solves the moment conditions $E[\mathbf{z}'(y_1 - \exp(\mathbf{x}\boldsymbol{\beta}))] = 0$ directly:
```{r}
#| echo: true
# Probit without FE for fitted probabilities (instrument)
fs_c <- glm(time_hi ~ phone + frfam, family = binomial(probit), data = df2)
df2$phat <- fitted(fs_c)
# Moment conditions: Z'(y - exp(X*beta)) = 0
# X = [time_hi, frfam, 1], Z = [phat, frfam, 1]
gmm_moments <- function(beta, x) {
y <- x[, "visits"]
th <- x[, "time_hi"]
fr <- x[, "frfam"]
ph <- x[, "phat"]
eta <- pmin(pmax(beta[1] * th + beta[2] * fr + beta[3], -30), 30)
resid <- y - exp(eta)
cbind(ph * resid, fr * resid, resid)
}
gmm_data <- as.matrix(df2[, c("visits", "time_hi", "frfam", "phat")])
m_c <- gmm(gmm_moments, x = gmm_data,
t0 = c(0, 0, log(mean(df2$visits))))
coef_c <- coef(m_c)
se_c <- sqrt(diag(vcov(m_c)))
cat(sprintf("\nApproach C — IV/GMM Poisson (no FE, for illustration)\n"))
cat(sprintf(" time_hi: coef = %7.4f SE = %7.4f z = %6.3f\n",
coef_c[1], se_c[1], coef_c[1] / se_c[1]))
cat(sprintf(" frfam: coef = %7.4f SE = %7.4f z = %6.3f\n",
coef_c[2], se_c[2], coef_c[2] / se_c[2]))
```
::: {.callout-note}
**Approach C with FE** requires demeaning the regressor and instrument matrices via within-group centering before solving the Poisson moment conditions. Because Poisson is nonlinear this is not straightforward. Approach A via `feols` + `fepois` is the practical choice when many fixed effects are present.
:::
### A vs C — not equivalent in Poisson
In a linear model, using $\hat{y}_2$ as an instrument (2SLS) is numerically identical to adding $\hat{v}_2 = y_2 - \hat{y}_2$ as a regressor — the Frisch-Waugh-Lovell theorem. **In Poisson this equivalence breaks**, because Approach A puts $\hat{v}_2$ inside $\exp(\cdot)$ while Approach C uses $\hat{y}_2$ in the moment conditions without modifying the exponential mean.
```{r}
#| echo: true
coef_a <- coef(m_a)["time_hi"]
coef_b <- coef(m_b)["time_hi"]
cat(sprintf("Approach A (CF, OLS residual, with FE): α̂₁ = %.4f\n", coef_a))
cat(sprintf("Approach B (CF, probit GR, with FE): α̂₁ = %.4f\n", coef_b))
cat(sprintf("Approach C (IV/GMM Poisson, no FE): α̂₁ = %.4f\n", coef_c[1]))
cat(sprintf("True α₁ = %.1f\n", true_alpha))
```
The estimates differ despite using the same exclusion restriction, confirming that 2SLS ≠ CF for Poisson.
### Bootstrap for Approach A (binary EEV)
```{r}
#| echo: true
bootstrap_cf_binary <- function(df, B = 500) {
clusters <- unique(df$ad)
K <- length(clusters)
boot_one <- function(b) {
library(fixest)
set.seed(b)
drawn <- sample(clusters, K, replace = TRUE)
df_b <- do.call(rbind, lapply(seq_along(drawn), function(i) {
sub <- df[df$ad == drawn[i], ]
sub$ad_boot <- i
sub
}))
fs <- feols(time_hi ~ phone + frfam | ad_boot + female, data = df_b)
df_b$v2hat <- residuals(fs)
ss <- fepois(visits ~ time_hi + v2hat + frfam | ad_boot + female, data = df_b)
coef(ss)["time_hi"]
}
n_cores <- max(1L, detectCores() - 1L)
cl <- makePSOCKcluster(n_cores)
clusterExport(cl, c("df", "clusters", "K"), envir = environment())
results <- parLapply(cl, seq_len(B), boot_one)
stopCluster(cl)
unlist(results)
}
boot2 <- bootstrap_cf_binary(df2, 500)
naive_se2 <- se(m_a)["time_hi"]
cat(sprintf("Approach A bootstrap (binary EEV, 500 reps, cluster by ad)\n"))
cat(sprintf(" Point estimate: %.4f (true %.1f)\n", coef_a, true_alpha))
cat(sprintf(" Naive 2nd-stage SE: %.4f\n", naive_se2))
cat(sprintf(" Bootstrap SE: %.4f\n", sd(boot2)))
cat(sprintf(" Bootstrap 95%% CI: [%.4f, %.4f]\n",
quantile(boot2, 0.025), quantile(boot2, 0.975)))
```
### Comparison across all methods
```{r}
#| echo: true
naive_p2 <- fepois(visits ~ time_hi + frfam | ad + female, data = df2)
coef_naive2 <- coef(naive_p2)["time_hi"]
methods <- c(
"True value",
"Naive Poisson (binary y₂)",
"A: Linear CF (with FE)",
"B: Probit GR CF (with FE)",
"C: IV/GMM Poisson (no FE)"
)
ests <- c(true_alpha, coef_naive2, coef_a, coef_b, coef_c[1])
cat("\n── Binary EEV: coefficient on time_hi ─────────────────────\n")
for (i in seq_along(methods)) {
cat(sprintf(" %-38s α̂₁ = %6.4f\n", methods[i], ests[i]))
}
cat("\nNote: Approaches A and B (with FE) recover the true value closely;\n")
cat("small upward bias from ρ=0.2 and the binary EEV is expected.\n")
cat("Approach C (no FE) diverges further because omitting fixed effects\n")
cat("introduces additional bias on top of endogeneity. With FE absorbed,\n")
cat("Approach A is the practical choice for count models with many groups.\n")
```