19 G-Methods for Time-Varying Treatments

using DataFrames
using Distributions
using Random
using Statistics
using LinearAlgebra
using GLM
using Printf

All the methods covered so far have one feature in common: a single treatment decision at a single point in time. Real policy questions are often more complex. A patient receives a medication at multiple visits. A worker enrols in training programmes over several years. A student takes one course after another. The treatment evolves; covariates that respond to past treatment then influence future treatment. Standard regression adjustment fails in this setting — and the failure is not subtle.

Related reading: The g-estimation with time-varying covariates chapter of Topics on Econometrics and Causal Inference derives the identification result and applies the parametric g-formula to a worked binary-treatment example. The LMTP framework for continuous and modified-policy interventions is covered in Longitudinal modified treatment policy (LMTP) and in the Continuous Treatments chapter.

This chapter covers the g-methods developed by Robins and collaborators to handle time-varying treatments and time-varying confounders:

The g-formula (parametric g-computation) — simulate the outcome under a hypothetical treatment regime by recursively integrating over the time-varying covariates.
Inverse probability of treatment weighting (IPTW) for time-varying treatments — reweight to a population where treatment is unrelated to past covariates.
Marginal structural models (MSMs) — regression-style models for the IPTW-weighted population.

For doubly-robust longitudinal TMLE, R’s ltmle package is currently the de facto standard; there is no Julia equivalent. The methods covered here — g-formula and MSM via IPTW — are the workhorses and translate cleanly to native Julia using GLM.jl.

19.1 The time-varying confounding problem

A two-period example makes the failure of standard regression vivid:

\(L_0\): baseline covariate (e.g. health status at study entry)
\(A_0\): treatment at time 0 (e.g. medication)
\(L_1\): covariate at time 1, influenced by \(A_0\) (e.g. health status after first dose)
\(A_1\): treatment at time 1, depends on \(L_1\)
\(Y\): outcome at the end of follow-up

\(L_1\) is a time-varying confounder: it confounds \(A_1 \to Y\) (so we need to condition on it), but it is also a mediator of \(A_0 \to Y\) (so conditioning on it blocks the path we want to measure). Standard regression with \(L_0, L_1\) as covariates over-controls; regression without \(L_1\) leaves bias.

19.1.1 Simulating the failure of standard regression

Random.seed!(1)
n = 5000

L0 = randn(n)
A0 = Float64.(rand(n) .< @. 1 / (1 + exp(-(-0.5 + 0.8 * L0))))
L1 = @. 0.5 * L0 + 1.0 * A0 + randn()
A1 = Float64.(rand(n) .< @. 1 / (1 + exp(-(-0.5 + 0.8 * L1))))
Y  = @. 0.5 * A0 + 0.5 * A1 + 0.5 * L1 + 0.5 * L0 + randn()

df = DataFrame(L0 = L0, A0 = A0, L1 = L1, A1 = A1, Y = Y)

@printf("True total effect of always-treated vs never-treated: 1.000\n\n")

naive_no_L1   = lm(@formula(Y ~ A0 + A1 + L0),       df)
naive_with_L1 = lm(@formula(Y ~ A0 + A1 + L0 + L1), df)

@printf("Naive (no L1):   A0=%.3f, A1=%.3f, total=%.3f\n",
        coef(naive_no_L1)[2], coef(naive_no_L1)[3],
        coef(naive_no_L1)[2] + coef(naive_no_L1)[3])
@printf("Naive (with L1): A0=%.3f, A1=%.3f, total=%.3f\n",
        coef(naive_with_L1)[2], coef(naive_with_L1)[3],
        coef(naive_with_L1)[2] + coef(naive_with_L1)[3])

Neither regression recovers the true total effect of 1.0:

Excluding \(L_1\) leaves \(L_1 \to A_1\) confounding that biases \(A_1\).
Including \(L_1\) blocks the \(A_0 \to L_1 \to Y\) pathway, so the coefficient on \(A_0\) now measures only the direct effect, not the total effect.

The g-methods solve this.

19.2 Parametric g-formula

The g-formula says:

\[ \mathbb{E}[Y(\bar a)] = \sum_{\bar l} \mathbb{E}[Y \mid \bar a, \bar l] \prod_{t=0}^{K} f(l_t \mid \bar a_{t-1}, \bar l_{t-1}), \]

where \(\bar a = (a_0, a_1, \ldots, a_K)\) is a treatment regime and \(\bar l\) the covariate history. The product is over all time points. In practice we fit models for each conditional distribution, then simulate the joint distribution under the counterfactual regime \(\bar a\).

# Step 1: Fit nuisance models
mod_L1 = lm(@formula(L1 ~ L0 + A0), df)
mod_Y  = lm(@formula(Y  ~ L0 + A0 + L1 + A1), df)

# Step 2: Simulate counterfactual outcomes under a regime (a0, a1).
# The L1 distribution under A0 = a0 differs from the observed L1.
function g_compute(a0::Real, a1::Real; n_mc::Int = 100)
    sigma_L1 = sqrt(sum(abs2.(residuals(mod_L1))) / dof_residual(mod_L1))
    Y_sim = zeros(n)
    for _ in 1:n_mc
        df_a0 = DataFrame(L0 = L0, A0 = fill(a0, n))
        L1_sim = predict(mod_L1, df_a0) .+ sigma_L1 .* randn(n)
        df_sim = DataFrame(L0 = L0, A0 = fill(a0, n),
                           L1 = L1_sim, A1 = fill(a1, n))
        Y_sim .+= predict(mod_Y, df_sim)
    end
    Y_sim ./= n_mc
    return mean(Y_sim)
end

Random.seed!(99)
E_Y11 = g_compute(1.0, 1.0)
E_Y00 = g_compute(0.0, 0.0)
@printf("G-formula estimate (always vs never): %.3f  (true = 1.000)\n",
        E_Y11 - E_Y00)

The g-formula recovers the true effect because it simulates \(L_1\) under the counterfactual treatment regime, rather than conditioning on the observed \(L_1\) (which would block the desired pathway).

The g-formula handles any regime — that is its main strength. To estimate the effect of treating only at \(t = 0\):

Random.seed!(99)
E_Y10 = g_compute(1.0, 0.0)
@printf("Effect of treating only at t=0: %.3f  (true = 0.5)\n",
        E_Y10 - E_Y00)

19.3 IPTW for time-varying treatments

The IPTW approach reweights observations by the inverse of the probability of their observed treatment history given past covariates. The stabilised weight for unit \(i\) is:

\[ SW_i = \prod_{t=0}^{K} \frac{f(A_t \mid \bar A_{t-1})}{f(A_t \mid \bar A_{t-1}, \bar L_{t-1})}. \]

The numerator uses past treatment alone; the denominator adds covariates. Stabilisation keeps the weights bounded.

# Numerator: P(A0) and P(A1 | A0)
n0 = glm(@formula(A0 ~ 1),  df, Binomial(), LogitLink())
n1 = glm(@formula(A1 ~ A0), df, Binomial(), LogitLink())

# Denominator: P(A0 | L0) and P(A1 | A0, L0, L1)
d0 = glm(@formula(A0 ~ L0),           df, Binomial(), LogitLink())
d1 = glm(@formula(A1 ~ A0 + L0 + L1), df, Binomial(), LogitLink())

p_n0 = predict(n0); p_n1 = predict(n1)
p_d0 = predict(d0); p_d1 = predict(d1)

# Probability of the OBSERVED treatment at each time
w0_num = ifelse.(df.A0 .== 1, p_n0, 1 .- p_n0)
w0_den = ifelse.(df.A0 .== 1, p_d0, 1 .- p_d0)
w1_num = ifelse.(df.A1 .== 1, p_n1, 1 .- p_n1)
w1_den = ifelse.(df.A1 .== 1, p_d1, 1 .- p_d1)

sw = (w0_num .* w1_num) ./ (w0_den .* w1_den)
@printf("Weight summary: min=%.3f, mean=%.3f, max=%.3f\n",
        minimum(sw), mean(sw), maximum(sw))

# Trim extreme weights at 99th percentile
sw_trim = min.(sw, quantile(sw, 0.99))

19.4 Marginal structural models

With stabilised weights in hand, the marginal structural model is a weighted regression of \(Y\) on treatment history:

df.sw = sw_trim
msm_fit = lm(@formula(Y ~ A0 + A1), df, wts = df.sw)

println(coeftable(msm_fit))
@printf("\nMSM total effect (always vs never): %.3f  (true = 1.000)\n",
        coef(msm_fit)[2] + coef(msm_fit)[3])

The MSM recovers the total effect because the IPTW reweighting effectively breaks the \(L_1 \to A_1\) arrow in the reweighted population — making treatment independent of past covariates.

19.5 Hernán-style g-estimation: when to use which

Estimator	Strengths	Weaknesses
G-formula	Handles arbitrary regimes; transparent	Sensitive to model misspecification on \(L_t\) and \(Y\)
IPTW + MSM	Simple, intuitive; valid with correct propensity model	Variance large with extreme weights

In practice, reporting both is the standard. Agreement is reassuring; disagreement suggests model misspecification in at least one of them.

For doubly-robust efficient estimation, longitudinal TMLE combines both. The current state-of-the-art implementation is R’s ltmle package; a Julia port would be a valuable addition to the ecosystem but does not yet exist. The companion R chapter walks through the ltmle workflow.

19.6 A robustness check: g-formula with covariates only

Even when researchers don’t want to fully implement IPTW + MSM, the g-formula offers a useful robustness check on standard regression. If the standard regression and the g-formula disagree, the gap is a diagnostic for time-varying confounding bias.

Random.seed!(123)
g_total = E_Y11 - E_Y00
msm_total = coef(msm_fit)[2] + coef(msm_fit)[3]

@printf("%-30s %.3f\n", "True total effect:", 1.0)
@printf("%-30s %.3f  (wrong)\n", "Standard regression (no L1):",
        coef(naive_no_L1)[2] + coef(naive_no_L1)[3])
@printf("%-30s %.3f  (wrong)\n", "Standard regression (with L1):",
        coef(naive_with_L1)[2] + coef(naive_with_L1)[3])
@printf("%-30s %.3f  (correct)\n", "G-formula:", g_total)
@printf("%-30s %.3f  (correct)\n", "MSM (IPTW):", msm_total)

The standard regressions are wrong in opposite directions; the g-methods both recover the truth. When standard and g-formula estimates diverge, the divergence is the bias.

19.7 When to reach for these methods

G-methods are needed whenever:

There is a time-varying confounder \(L_t\) that is affected by past treatment \(A_{t-1}\) and itself affects subsequent treatment \(A_t\).
The research question concerns a sustained or dynamic treatment regime, not a single decision at a single time.

This includes most clinical follow-up studies, longitudinal observational studies of education and labour market policies, and any setting where treatment evolves with the unit’s response.

For a one-shot treatment, the methods from the Estimation and Nonparametric Causal Methods chapters are sufficient. The added complexity of g-methods is only justified when the time-varying confounder problem is present.

19.8 Summary

Standard regression fails with time-varying confounders that are themselves affected by past treatment.
G-formula (parametric g-computation): simulate the outcome under a hypothetical regime by integrating over the time-varying covariates. ~30 lines of native Julia.
IPTW + MSM: reweight the population so treatment is independent of past covariates; then fit a regression on the weighted sample. Built with GLM.jl.
Longitudinal TMLE: doubly-robust efficient estimator combining g-computation and IPTW. Currently best implemented in R’s ltmle package; a native Julia version is a valuable open ecosystem project.
Reporting practice: report both g-formula and MSM. Agreement is reassuring; disagreement is informative — it usually points to model misspecification.