using DataFrames
using Distributions
using GLM
using Statistics
using Random
using LinearAlgebra
using Printf
using CairoMakie
using CSV20 Survival Analysis with Causal Inference
When the outcome is time-to-event — death, hospital re-admission, employment termination, machine failure — standard causal inference methods need to be adapted. Some observations are censored: we know the event has not occurred by the end of follow-up, but not when it will. Regression on the event time would discard or mishandle censored observations and produce biased estimates.
This chapter implements the workhorses of survival-causal inference in Julia: Kaplan-Meier estimation, inverse probability weighting for the adjusted survival curve, and the restricted mean survival time (RMST) contrast. For Cox proportional-hazards modelling, R’s survival package is more mature; for doubly-robust survival estimators, R’s riskRegression::ate is the workhorse. The companion R chapter walks through these in detail.
20.1 The censoring problem
We observe \(\min(T_i, C_i)\) and \(\delta_i = \mathbb{1}\{T_i \le C_i\}\), where \(T_i\) is the event time and \(C_i\) is the censoring time. The fundamental quantity of interest is the survival function
\[ S(t) = P(T > t) \]
— the probability of surviving past time \(t\). Causal contrasts on survival outcomes are typically expressed as:
- Hazard ratio (HR) — easy to estimate but hard to interpret causally (Hernán 2010).
- Restricted mean survival difference (RMST) — \(\int_0^\tau [S_1(t) - S_0(t)] dt\) — interpretable as “years gained” up to time \(\tau\).
- Survival probability difference at t — direct probability statement, requires picking a follow-up time.
20.2 Simulated survival data
df = CSV.read("data/survival_sim.csv", DataFrame)
n = nrow(df)
@printf("n = %d, events = %d (%.1f%%), censored = %d\n",
n, Int(sum(df.event)), 100 * mean(df.event), Int(sum(1 .- df.event)))20.3 Kaplan-Meier estimator
The Kaplan-Meier (KM) estimator computes \(\hat S(t)\) nonparametrically. Implement it inline:
"""
kaplan_meier(time, event)
Compute Kaplan-Meier survival estimates at the unique event times.
Returns (times, survival).
"""
function kaplan_meier(time, event)
df_sort = sort(DataFrame(t = time, d = event), :t)
event_times = unique(df_sort.t[df_sort.d .== 1])
survival = Float64[]
s = 1.0
for t in event_times
n_at_risk = sum(df_sort.t .>= t)
n_events = sum((df_sort.t .== t) .& (df_sort.d .== 1))
s *= 1 - n_events / n_at_risk
push!(survival, s)
end
return event_times, survival
end
t_trt, s_trt = kaplan_meier(df.time[df.trt .== 1], df.event[df.trt .== 1])
t_ctl, s_ctl = kaplan_meier(df.time[df.trt .== 0], df.event[df.trt .== 0])
fig = Figure(size = (700, 400))
ax = Axis(fig[1, 1], xlabel = "Time (years)", ylabel = "S(t)",
title = "Unadjusted Kaplan-Meier survival curves")
stairs!(ax, t_ctl, s_ctl, color = :steelblue, linewidth = 2,
label = "Control")
stairs!(ax, t_trt, s_trt, color = :firebrick, linewidth = 2,
label = "Treatment")
axislegend(ax, position = :rb, framevisible = false)
figThe unadjusted KM curves show the crude survival gap — it includes both the causal effect and any baseline imbalance.
20.4 IPW-adjusted survival
To account for confounding, weight observations by the inverse propensity score before computing KM:
ps_fit = glm(@formula(trt ~ age + sex), df, Binomial(), LogitLink())
df.ps = predict(ps_fit)
df.ipw = ifelse.(df.trt .== 1, 1 ./ df.ps, 1 ./ (1 .- df.ps))
df.ipw .= min.(df.ipw, quantile(df.ipw, 0.99))
"""
weighted_kaplan_meier(time, event, weights)
KM estimate with sample weights — used for IPW-adjusted survival.
"""
function weighted_kaplan_meier(time, event, weights)
df_sort = sort(DataFrame(t = time, d = event, w = weights), :t)
event_times = unique(df_sort.t[df_sort.d .== 1])
survival = Float64[]
s = 1.0
for t in event_times
w_at_risk = sum(df_sort.w[df_sort.t .>= t])
w_events = sum(df_sort.w[(df_sort.t .== t) .& (df_sort.d .== 1)])
s *= 1 - w_events / w_at_risk
push!(survival, s)
end
return event_times, survival
end
t_trt_w, s_trt_w = weighted_kaplan_meier(df.time[df.trt .== 1],
df.event[df.trt .== 1],
df.ipw[df.trt .== 1])
t_ctl_w, s_ctl_w = weighted_kaplan_meier(df.time[df.trt .== 0],
df.event[df.trt .== 0],
df.ipw[df.trt .== 0])
fig2 = Figure(size = (700, 400))
ax2 = Axis(fig2[1, 1], xlabel = "Time (years)", ylabel = "S(t)",
title = "IPW-adjusted Kaplan-Meier survival curves")
stairs!(ax2, t_ctl_w, s_ctl_w, color = :steelblue, linewidth = 2,
label = "Control (IPW-adj)")
stairs!(ax2, t_trt_w, s_trt_w, color = :firebrick, linewidth = 2,
label = "Treatment (IPW-adj)")
axislegend(ax2, position = :rb, framevisible = false)
fig2The IPW-adjusted curves show the survival gap that the causal effect would produce, removing age and sex imbalance.
20.5 Restricted Mean Survival Time
The RMST up to time \(\tau\) is
\[ \text{RMST}(\tau) = E[\min(T, \tau)] = \int_0^\tau S(t) \, dt. \]
The contrast \(\Delta_{\text{RMST}}(\tau) = \text{RMST}_1(\tau) - \text{RMST}_0(\tau)\) is interpretable as “years gained up to \(\tau\)” — a clean causal contrast that does not require the proportional hazards assumption.
Compute it by trapezoidal integration of the KM curve:
function rmst(times, survival, tau)
# Add time 0 and the truncation point, with appropriate S values
t_vec = [0.0; times; tau]
s_vec = [1.0; survival; survival[end]]
# Restrict to times ≤ tau
keep = t_vec .<= tau
t_use = t_vec[keep]
s_use = s_vec[keep]
# Add the tau endpoint
push!(t_use, tau)
push!(s_use, s_use[end])
# Trapezoidal area: width × left-height (since KM is step function)
sum(diff(t_use) .* s_use[1:end-1])
end
τ = 5.0
rmst_trt_unadj = rmst(t_trt, s_trt, τ)
rmst_ctl_unadj = rmst(t_ctl, s_ctl, τ)
rmst_trt_adj = rmst(t_trt_w, s_trt_w, τ)
rmst_ctl_adj = rmst(t_ctl_w, s_ctl_w, τ)
@printf("Unadjusted RMST(τ=5):\n")
@printf(" Treatment: %.3f years Control: %.3f years Diff: %.3f\n",
rmst_trt_unadj, rmst_ctl_unadj, rmst_trt_unadj - rmst_ctl_unadj)
@printf("IPW-adjusted RMST(τ=5):\n")
@printf(" Treatment: %.3f years Control: %.3f years Diff: %.3f\n",
rmst_trt_adj, rmst_ctl_adj, rmst_trt_adj - rmst_ctl_adj)The IPW-adjusted RMST difference is the causal “years gained up to 5 years” — directly interpretable for clinical or policy reporting.
20.6 Cox proportional hazards (regression adjustment)
Fitting a Cox model in Julia requires Survival.jl, which has a more limited API than R’s survival::coxph. The standard approach in Julia for survival-regression is either to:
- Use
Survival.jlfor basic Cox PH (limited adjustment capabilities). - Use a parametric Weibull/exponential model via
DistributionsandOptimdirectly. - Call out to R for serious survival modelling.
For applied work where the Cox model with covariate adjustment is the target, R remains the practical choice. See the companion R chapter for the full Cox + IPCW + doubly-robust workflow.
20.7 Doubly-robust survival estimation
R’s riskRegression::ate provides a doubly-robust estimator that combines a Cox outcome model with a propensity score and a censoring model. The Julia ecosystem currently lacks an equivalent — implementing a doubly-robust survival estimator would be a substantial undertaking (requires careful handling of censoring weights at each event time).
For now, the practical Julia workflow for survival-causal inference is:
- Description: unadjusted KM curves (this chapter).
- Adjusted survival: IPW-weighted KM (this chapter).
- Causal contrasts: RMST differences from the IPW-adjusted KM (this chapter).
- Cox HR with adjustment + doubly-robust estimators: call out to R via
RCall.jlor use the R companion chapter.
20.8 When to use these methods
These methods are needed whenever:
- The outcome is time-to-event with censoring.
- The treatment effect on survival probabilities or years-of-life is the causal question.
For non-censored outcomes, the standard Estimation and Heterogeneous Effects methods are sufficient.
20.9 Summary
- Survival outcomes require methods that handle censoring.
- Kaplan-Meier estimates \(S(t)\) nonparametrically; can be implemented inline in Julia in a few lines.
- IPW-adjusted KM: weight observations by inverse propensity score before computing the survival curve.
- RMST differences (years gained up to \(\tau\)): interpretable causal contrasts that don’t require proportional hazards.
- For Cox PH regression with adjustment and doubly-robust survival, use R’s mature
survival+riskRegressionecosystem — see the companion R chapter.