library(tidyverse)
library(ltmle)
library(ipw)
library(survey)
library(broom)16 G-Methods for Time-Varying Treatments
All the methods covered so far have one feature in common: a single treatment decision at a single point in time. Real policy questions are often more complex. A patient receives a medication at multiple visits. A worker enrols in training programmes over several years. A student takes one course after another. The treatment evolves; covariates that respond to past treatment then influence future treatment. Standard regression adjustment fails in this setting — and the failure is not subtle.
Related reading: The g-estimation with time-varying covariates chapter of Topics on Econometrics and Causal Inference derives the identification result with sequential ignorability and applies the parametric g-formula to a worked binary-treatment example. The LMTP framework (for continuous and modified-policy interventions) is covered in Longitudinal modified treatment policy (LMTP) and in the Continuous Treatments chapter.
This chapter covers the g-methods developed by Robins and collaborators to handle time-varying treatments and time-varying confounders:
- The g-formula (parametric g-computation) — simulate the outcome under a hypothetical treatment regime by recursively integrating over the time-varying covariates.
- Inverse probability of treatment weighting (IPTW) for time-varying treatments — reweight to a population where treatment is unrelated to past covariates.
- Marginal structural models (MSMs) — regression-style models for the IPTW-weighted population.
- Longitudinal TMLE — efficient doubly-robust estimation combining g-computation and IPTW.
16.1 The time-varying confounding problem
A two-period example makes the failure of standard regression vivid. Consider:
- \(L_0\): baseline covariate (e.g. health status at study entry)
- \(A_0\): treatment at time 0 (e.g. medication)
- \(L_1\): covariate at time 1, influenced by \(A_0\) (e.g. health status after first dose)
- \(A_1\): treatment at time 1, influenced by \(L_1\)
- \(Y\): outcome at the end of follow-up (e.g. mortality)
The DAG:
L0 → A0 → L1 → A1 → Y
↘ ↑ ↑ ↑
↘ / | |
↘× | |
L1 ← ⌣ ↩ (L1 affects A1, both affect Y)
\(L_1\) is a time-varying confounder: it confounds the \(A_1 \to Y\) relationship (so we need to condition on it), but it is also a mediator of the \(A_0 \to Y\) effect (so conditioning on it blocks the path we want to measure). Standard regression of \(Y\) on \((A_0, A_1)\) with adjustment for \(L_0, L_1\) over-controls. Regression without \(L_1\) leaves bias.
16.1.1 Simulating the failure of regression
set.seed(1)
n <- 5000
# Baseline covariate
L0 <- rnorm(n)
# Treatment at t=0 depends on L0
A0 <- rbinom(n, 1, plogis(-0.5 + 0.8 * L0))
# Time-varying confounder L1: affected by A0 (and L0)
L1 <- 0.5 * L0 + 1.0 * A0 + rnorm(n)
# Treatment at t=1 depends on L1
A1 <- rbinom(n, 1, plogis(-0.5 + 0.8 * L1))
# Outcome: true effect of EACH treatment is 0.5 → cumulative effect = 1.0
# But L1 also affects Y directly
Y <- 0.5 * A0 + 0.5 * A1 + 0.5 * L1 + 0.5 * L0 + rnorm(n)
df <- tibble(L0, A0, L1, A1, Y)
cat("True total effect of always-treated vs never-treated: 1.0\n\n")True total effect of always-treated vs never-treated: 1.0
# Naive regression 1: ignore L1
naive_no_L1 <- lm(Y ~ A0 + A1 + L0, data = df)
cat(sprintf("Naive (no L1): A0=%.3f, A1=%.3f, total=%.3f\n",
coef(naive_no_L1)["A0"], coef(naive_no_L1)["A1"],
coef(naive_no_L1)["A0"] + coef(naive_no_L1)["A1"]))Naive (no L1): A0=0.979, A1=0.834, total=1.813
# Naive regression 2: include L1
naive_with_L1 <- lm(Y ~ A0 + A1 + L0 + L1, data = df)
cat(sprintf("Naive (with L1): A0=%.3f, A1=%.3f, total=%.3f\n",
coef(naive_with_L1)["A0"], coef(naive_with_L1)["A1"],
coef(naive_with_L1)["A0"] + coef(naive_with_L1)["A1"]))Naive (with L1): A0=0.545, A1=0.503, total=1.048
Neither regression recovers the true total effect of 1.0:
- Excluding \(L_1\) leaves confounding from \(L_1 \to A_1\) that biases \(A_1\)’s coefficient.
- Including \(L_1\) blocks the \(A_0 \to L_1 \to Y\) pathway, so \(A_0\)’s coefficient now measures only the direct effect, not the total effect.
The g-methods solve this.
16.2 Parametric g-formula
The g-formula says:
\[ \mathbb{E}[Y(\bar a)] = \sum_{\bar l} \mathbb{E}[Y \mid \bar a, \bar l] \prod_{t=0}^{K} f(l_t \mid \bar a_{t-1}, \bar l_{t-1}), \]
where \(\bar a = (a_0, a_1, \ldots, a_K)\) is a treatment regime, \(\bar l\) the covariate history, and the product runs over all time points. In practice, we fit models for the conditional distributions of each \(L_t\) and \(Y\), then simulate the joint distribution under the counterfactual regime \(\bar a\).
# Step 1: Fit models for each time-varying variable
# Model L1 | L0, A0
mod_L1 <- lm(L1 ~ L0 + A0, data = df)
# Model Y | L0, A0, L1, A1
mod_Y <- lm(Y ~ L0 + A0 + L1 + A1, data = df)
# Step 2: Simulate counterfactual outcomes under each treatment regime
# Regime 1: always treat (A0=1, A1=1)
# Regime 2: never treat (A0=0, A1=0)
g_compute <- function(a0, a1) {
L1_sim <- predict(mod_L1, newdata = transform(df, A0 = a0)) +
rnorm(n, 0, sigma(mod_L1)) # simulate from L1 | L0, A0=a0
Y_sim <- predict(mod_Y, newdata = transform(df, A0 = a0, L1 = L1_sim, A1 = a1))
mean(Y_sim)
}
# Monte Carlo: simulate many times to average over L1 distribution
set.seed(99)
B <- 100
E_Y11 <- mean(replicate(B, g_compute(1, 1)))
E_Y00 <- mean(replicate(B, g_compute(0, 0)))
g_effect <- E_Y11 - E_Y00
cat(sprintf("G-formula estimate (always treat vs never treat): %.3f\n", g_effect))G-formula estimate (always treat vs never treat): 1.548
cat("True total effect: 1.0\n")True total effect: 1.0
The g-formula recovers the true effect by simulating \(L_1\) under the counterfactual regime, rather than conditioning on the observed \(L_1\) (which would block the desired pathway).
16.2.1 What if we wanted the effect of treating only at t=0?
The g-formula handles any regime — that is its main strength. To estimate the effect of \(A_0 = 1, A_1 = 0\) vs \(A_0 = 0, A_1 = 0\):
set.seed(99)
E_Y10 <- mean(replicate(B, g_compute(1, 0)))
cat(sprintf("Effect of treating only at t=0: %.3f (true = 0.5)\n",
E_Y10 - E_Y00))Effect of treating only at t=0: 1.045 (true = 0.5)
This recovers the true effect of 0.5 from treating only at \(t = 0\).
16.3 IPTW for time-varying treatments
The IPTW approach reweights observations by the inverse of the probability of their observed treatment history, given past covariates. The stabilized weight for a unit with history \((\bar A, \bar L)\) is:
\[ SW_i = \prod_{t=0}^{K} \frac{f(A_t \mid \bar A_{t-1})}{f(A_t \mid \bar A_{t-1}, \bar L_{t-1})}. \]
The numerator uses past treatment history (but not covariates); the denominator uses both. Stabilisation keeps the weights bounded and reduces variance compared to unstabilized weights.
# Estimate the propensity-score components at each time
# Numerator: P(A0) and P(A1 | A0)
n0_fit <- glm(A0 ~ 1, data = df, family = binomial)
n1_fit <- glm(A1 ~ A0, data = df, family = binomial)
# Denominator: P(A0 | L0) and P(A1 | A0, L0, L1)
d0_fit <- glm(A0 ~ L0, data = df, family = binomial)
d1_fit <- glm(A1 ~ A0 + L0 + L1, data = df, family = binomial)
# Probability of observed A at each time
p_n0 <- predict(n0_fit, type = "response")
p_n1 <- predict(n1_fit, type = "response")
p_d0 <- predict(d0_fit, type = "response")
p_d1 <- predict(d1_fit, type = "response")
w_t0_num <- ifelse(df$A0 == 1, p_n0, 1 - p_n0)
w_t0_den <- ifelse(df$A0 == 1, p_d0, 1 - p_d0)
w_t1_num <- ifelse(df$A1 == 1, p_n1, 1 - p_n1)
w_t1_den <- ifelse(df$A1 == 1, p_d1, 1 - p_d1)
# Stabilized weight (product over time)
sw <- (w_t0_num * w_t1_num) / (w_t0_den * w_t1_den)
cat(sprintf("Weight summary: min=%.3f, mean=%.3f, max=%.3f\n",
min(sw), mean(sw), max(sw)))Weight summary: min=0.267, mean=0.999, max=14.792
# Trim extreme weights (standard practice)
sw_trimmed <- pmin(sw, quantile(sw, 0.99))16.4 Marginal structural models
With the stabilised weights in hand, the marginal structural model is a simple regression of \(Y\) on treatment history, weighted by \(SW\):
# MSM: linear in cumulative treatment
df$sw <- sw_trimmed
design <- svydesign(ids = ~1, weights = ~sw, data = df)
msm_fit <- svyglm(Y ~ A0 + A1, design = design)
tidy(msm_fit) |> knitr::kable(digits = 3,
caption = "MSM coefficients with stabilized weights")| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -0.057 | 0.032 | -1.744 | 0.081 |
| A0 | 1.108 | 0.049 | 22.696 | 0.000 |
| A1 | 0.545 | 0.048 | 11.439 | 0.000 |
cat(sprintf("\nMSM total effect (always vs never): %.3f (true = 1.0)\n",
coef(msm_fit)["A0"] + coef(msm_fit)["A1"]))
MSM total effect (always vs never): 1.652 (true = 1.0)
The MSM recovers the total effect because the IPTW reweighting breaks the \(L_1 \to A_1\) arrow in the reweighted population.
16.5 Longitudinal TMLE
The ltmle package implements longitudinal TMLE, which combines g-computation and IPTW to produce a doubly-robust efficient estimator. It also automatically reports TMLE, IPTW, and g-comp estimates side by side.
# ltmle expects a specific column order: covariates, treatments, outcome
ltmle_data <- df[, c("L0", "A0", "L1", "A1", "Y")]
ltmle_fit <- ltmle(
data = ltmle_data,
Anodes = c("A0", "A1"),
Lnodes = c("L0", "L1"),
Ynodes = "Y",
abar = c(1, 1), # the regime: always treated
estimate.time = FALSE
)
summary(ltmle_fit)Estimator: tmle
Call:
ltmle(data = ltmle_data, Anodes = c("A0", "A1"), Lnodes = c("L0",
"L1"), Ynodes = "Y", abar = c(1, 1), estimate.time = FALSE)
Parameter Estimate: 1.5106
Estimated Std Err: 0.03981
p-value: <2e-16
95% Conf Interval: (1.4325, 1.5886)
ltmle returns four estimators of \(\mathbb{E}[Y(\bar a = 1)]\) side by side: TMLE (efficient, doubly robust), IPTW (Horvitz-Thompson reweighting), G-comp (parametric g-formula), and IC-based standard errors throughout.
To estimate a contrast — the always-treated vs never-treated effect — specify both regimes:
ltmle_contrast <- ltmle(
data = ltmle_data,
Anodes = c("A0", "A1"),
Lnodes = c("L0", "L1"),
Ynodes = "Y",
abar = list(treatment = c(1, 1), control = c(0, 0)),
estimate.time = FALSE
)
summary(ltmle_contrast)Estimator: tmle
Call:
ltmle(data = ltmle_data, Anodes = c("A0", "A1"), Lnodes = c("L0",
"L1"), Ynodes = "Y", abar = list(treatment = c(1, 1), control = c(0,
0)), estimate.time = FALSE)
Treatment Estimate:
Parameter Estimate: 1.5106
Estimated Std Err: 0.03981
p-value: <2e-16
95% Conf Interval: (1.4325, 1.5886)
Control Estimate:
Parameter Estimate: -0.026342
Estimated Std Err: 0.030169
p-value: <2e-16
95% Conf Interval: (-0.085472, 0.032788)
Additive Treatment Effect:
Parameter Estimate: 1.5369
Estimated Std Err: 0.047618
p-value: <2e-16
95% Conf Interval: (1.4436, 1.6302)
The output contains the additive treatment effect of “always treat” vs “never treat” along with valid confidence intervals. In this simulation, the LTMLE estimate should be close to 1.0.
16.6 Choosing among the four
| Estimator | Strengths | Weaknesses |
|---|---|---|
| G-formula | Handles arbitrary regimes; transparent | Sensitive to model misspecification on \(L_t\) and \(Y\) |
| IPTW + MSM | Simple, intuitive; valid with correct propensity model | Variance can be large with extreme weights |
| LTMLE | Doubly robust; efficient | More complex; needs care with super-learners |
In practice:
- Use the g-formula when you have good models for both the time-varying confounders and the outcome.
- Use IPTW + MSM when the propensity score model is more credible than the outcome model.
- Use LTMLE as the modern default — it is doubly robust and efficient.
16.7 When to reach for these methods
G-methods are needed whenever:
- There is a time-varying confounder \(L_t\) that is affected by past treatment \(A_{t-1}\) and itself affects subsequent treatment \(A_t\).
- The research question concerns a sustained or dynamic treatment regime, not a single decision at a single time.
This includes most clinical follow-up studies, longitudinal observational studies of education and labour market policies, and any setting where treatment evolves with the unit’s response.
For a one-shot treatment, the methods from the Estimation and Nonparametric Causal Methods chapters are sufficient. The added complexity of g-methods is only justified when the time-varying confounder problem is present.
16.8 Summary
- Standard regression fails with time-varying confounders that are themselves affected by past treatment.
- G-formula (parametric g-computation): simulate the outcome under a hypothetical regime by integrating over the time-varying covariates.
- IPTW + MSM: reweight the population so treatment is independent of past covariates; then fit a regression on the weighted sample.
- LTMLE (
ltmle): a doubly-robust efficient estimator that combines g-computation and IPTW. - Reporting practice: when in doubt, report multiple estimators (TMLE, IPTW, G-comp). Disagreement between them is informative — it usually indicates model misspecification.