10  Continuous and Multivalued Treatments

using DataFrames
using Distributions
using GLM
using Statistics
using Random
using LinearAlgebra
using Printf
using CairoMakie

The earlier chapters treated \(D\) as binary. Real treatments are often continuous (dose, dollars, hours) or multivalued (one of several treatment arms). For these we estimate the entire dose-response curve

\[ \mu(d) = \mathbb{E}[Y(d)], \quad d \in \mathcal{D}. \]

This chapter implements two workhorse approaches inline in Julia: the generalized propensity score (Hirano-Imbens 2004) and a doubly-robust extension. For modern modified-treatment-policy machinery (LMTP, Diaz- Hejazi 2020), R’s lmtp package is the workhorse — covered in the companion R chapter and in the longer LMTP blog chapter.

10.1 Setup

Random.seed!(42)
n  = 2000
X1 = randn(n)
X2 = randn(n)
# Continuous treatment, depends on X1, X2
D  = @. 2.0 + 0.5 * X1 + 0.5 * X2 + randn()
# Dose-response: nonlinear in D (hump-shaped)
true_mu(d) = 1.0 + 2.0 * d - 0.15 * d^2
Y  = @. true_mu(D) + 0.3 * X1 - 0.3 * X2 + 0.5 * randn()

df = DataFrame(Y = Y, D = D, X1 = X1, X2 = X2)
@printf("Treatment range: [%.2f, %.2f]\n", minimum(D), maximum(D))
@printf("True μ(d=2) = %.2f, μ(d=4) = %.2f, μ(d=6) = %.2f\n",
        true_mu(2), true_mu(4), true_mu(6))

10.2 Naive regression vs adjusted regression

A linear fit on \(D\) alone is doubly biased (confounding plus model misspecification):

df.D2 = df.D .^ 2
linear_naive = lm(@formula(Y ~ D),                       df)
linear_adj   = lm(@formula(Y ~ D + X1 + X2),             df)
poly_adj     = lm(@formula(Y ~ D + D2 + X1 + X2),        df)

ds = 0.0:0.5:10.0

predict_mu(fit) = Float64.(predict(fit, DataFrame(D = collect(ds),
                                                   D2 = collect(ds) .^ 2,
                                                   X1 = 0, X2 = 0)))
fits = (
    linear_naive = predict_mu(linear_naive),
    linear_adj   = predict_mu(linear_adj),
    poly_adj     = predict_mu(poly_adj),
)
truth = true_mu.(ds)

fig = Figure(size = (700, 400))
ax = Axis(fig[1, 1], xlabel = "Dose D", ylabel = "E[Y(d)]",
          title = "Naive fits vs true dose-response")
lines!(ax, ds, truth, color = :black, linestyle = :dash,
       linewidth = 2, label = "Truth")
lines!(ax, ds, fits.linear_naive, label = "Linear naive")
lines!(ax, ds, fits.linear_adj,   label = "Linear adj")
lines!(ax, ds, fits.poly_adj,     label = "Quadratic adj")
axislegend(ax, position = :rt, framevisible = false)
fig

Even the covariate-adjusted quadratic fit will leave bias if the true dose-response is not exactly quadratic.

10.3 Generalized Propensity Score (Hirano-Imbens 2004)

The generalized propensity score is the conditional density of the treatment given covariates:

\[ r(d, x) = f_{D \mid X}(d \mid x). \]

Under unconfoundedness, conditioning on \(r(D, X)\) alone identifies the dose-response curve. We estimate \(r\) assuming \(D \mid X \sim \mathcal{N}(\beta_0 + \beta'X, \sigma^2)\):

gps_model = lm(@formula(D ~ X1 + X2), df)
d_pred    = predict(gps_model)
sigma_e   = sqrt(sum(abs2.(residuals(gps_model))) / dof_residual(gps_model))

df.gps = @. pdf(Normal(d_pred, sigma_e), df.D)
@printf("GPS summary: min=%.4f, mean=%.4f, max=%.4f\n",
        minimum(df.gps), mean(df.gps), maximum(df.gps))

The Hirano-Imbens estimator regresses \(Y\) on flexible functions of \((D, \hat r)\) and then averages across the GPS distribution at each \(d\):

# Stage 2: regress Y on flexible function of D and the GPS
df.gps2 = df.gps .^ 2
hi_fit = lm(@formula(Y ~ D + D2 + gps + gps2 + D & gps), df)

# Dose-response at each d: average over the empirical GPS distribution at that d
function estimate_dose(d)
    gps_at_d = @. pdf(Normal(d_pred, sigma_e), d)
    nd = DataFrame(D = fill(d, n), D2 = fill(d^2, n),
                   gps = gps_at_d, gps2 = gps_at_d .^ 2)
    mean(predict(hi_fit, nd))
end

hi_dose = [estimate_dose(d) for d in ds]

fig2 = Figure(size = (700, 400))
ax2 = Axis(fig2[1, 1], xlabel = "Dose D", ylabel = "E[Y(d)]",
           title = "Hirano-Imbens GPS dose-response")
lines!(ax2, ds, truth, color = :black, linestyle = :dash,
       linewidth = 2, label = "Truth")
lines!(ax2, ds, fits.poly_adj, label = "Quadratic regression")
lines!(ax2, ds, hi_dose, label = "Hirano-Imbens GPS")
axislegend(ax2, position = :rt, framevisible = false)
fig2

The GPS estimator handles the curvature and does not require the researcher to specify the functional form of the dose-response.

10.4 Doubly-robust dose-response (Kennedy 2017)

The DR estimator combines an outcome model \(\hat\mu(d, x)\) with the GPS \(\hat r(d \mid x)\):

\[ \hat\mu_{DR}(d) = \frac{1}{n} \sum_i \left[ \hat\mu(d, X_i) + K_h(D_i - d) \frac{Y_i - \hat\mu(D_i, X_i)}{\hat r(D_i \mid X_i)} \right] \]

where \(K_h\) is a Gaussian kernel of bandwidth \(h\). The result is doubly robust and asymptotically normal.

# Outcome model: μ̂(D, X)
mu_fit = lm(@formula(Y ~ D + D2 + X1 + X2 + D & X1 + D & X2), df)

# Predict μ̂(D_i, X_i) for the observed treatments
mu_at_obs = predict(mu_fit)

# DR dose-response at a grid
function dr_estimate(d; h = 0.5)
    nd = DataFrame(D = fill(d, n), D2 = fill(d^2, n),
                   X1 = df.X1, X2 = df.X2)
    mu_at_d = predict(mu_fit, nd)
    kernel  = @. pdf(Normal(0, h), df.D - d)
    weight  = @. kernel / df.gps
    weight ./= mean(weight)
    correction = @. weight * (df.Y - mu_at_obs)
    mean(mu_at_d) + mean(correction)
end

dr_dose = [dr_estimate(d) for d in ds]

fig3 = Figure(size = (700, 400))
ax3 = Axis(fig3[1, 1], xlabel = "Dose D", ylabel = "E[Y(d)]",
           title = "Doubly-robust dose-response")
lines!(ax3, ds, truth, color = :black, linestyle = :dash,
       linewidth = 2, label = "Truth")
lines!(ax3, ds, hi_dose, label = "Hirano-Imbens GPS")
lines!(ax3, ds, dr_dose, label = "DR (Kennedy 2017)")
axislegend(ax3, position = :rt, framevisible = false)
fig3

The DR estimator is consistent if either \(\hat\mu\) or \(\hat r\) is correctly specified, and efficient when both are. The kernel bandwidth \(h\) controls the bias-variance trade-off.

10.5 Multivalued treatments

For a discrete multivalued treatment \(D \in \{0, 1, \ldots, K\}\), the generalised propensity score becomes a multinomial probability:

\[ \hat r(d, x) = \widehat{P}(D = d \mid X = x). \]

A multinomial regression supplies \(\hat r\); the dose-response curve becomes the average treatment effect at each level:

# Simulate a 4-level treatment
Random.seed!(99)
n_mv = 2000
X_mv = randn(n_mv, 2)
util  = hcat(zeros(n_mv),
             @.(0.2 + 0.5 * X_mv[:, 1]),
             @.(0.3 + 0.4 * X_mv[:, 2]),
             @.(0.5 + 0.3 * X_mv[:, 1] + 0.3 * X_mv[:, 2]))
# Softmax to get probabilities
exp_util = exp.(util)
prob_mv  = exp_util ./ sum(exp_util, dims = 2)
D_mv  = [rand(Distributions.Categorical(prob_mv[i, :])) - 1 for i in 1:n_mv]
Y_mv  = @. 0.5 * D_mv + 0.3 * X_mv[:, 1] - 0.2 * X_mv[:, 2] + randn()

# Group means by treatment level
mv_df = DataFrame(Y = Y_mv, D = D_mv, X1 = X_mv[:, 1], X2 = X_mv[:, 2])
by_d = combine(groupby(mv_df, :D), :Y => mean => :y_mean, :Y => length => :n)
@printf("%-6s %12s %12s\n", "D", "Mean(Y)", "N")
for r in eachrow(by_d)
    @printf("%-6d %12.3f %12d\n", r.D, r.y_mean, r.n)
end

In practice, regression-adjusted contrasts between adjacent levels (D = 1 vs D = 0, D = 2 vs D = 1, etc.) are the natural reporting unit for multivalued treatments.

10.6 When to reach for these methods

These methods are needed whenever:

  1. The treatment is continuous or multivalued.
  2. The research question concerns the shape of the dose-response curve, not just a single binary contrast.

For binary treatments, the standard Estimation and Heterogeneous Effects methods are sufficient.

For time-varying continuous treatments and modified-policy interventions that respect positivity, the LMTP framework (Diaz-Hejazi 2020) is the modern standard. Its R implementation (lmtp package) is mature; a Julia version is a valuable open ecosystem project.

10.7 Summary

  • Continuous and multivalued treatments require estimating a dose-response curve \(\mu(d)\), not a single scalar.
  • Generalized propensity score (Hirano-Imbens 2004) extends propensity-score methods to continuous \(D\); implementable in a few lines of Julia.
  • Doubly-robust dose-response (Kennedy 2017) adds an outcome model to the GPS for robustness and efficiency; also implementable inline.
  • For modified treatment policies and time-varying continuous treatments, use R’s lmtp package — see the companion blog chapter.