17  Survival Analysis with Causal Inference

library(tidyverse)
library(survival)
library(survminer)
library(riskRegression)

When the outcome is time-to-event — death, hospital re-admission, employment termination, machine failure — standard causal inference methods need to be adapted. Some observations are censored: we know the event has not yet occurred at the end of follow-up, but we do not know when it will. Standard regression on the event time would discard or mishandle these observations and produce biased estimates.

This chapter covers the workhorses of survival-causal inference:

  1. The Cox proportional hazards model with treatment adjustment — the classical regression approach to causal effects on time-to-event outcomes.
  2. Inverse probability of censoring weighting (IPCW) — extends IPTW to handle informative dropout.
  3. Restricted mean survival time (RMST) differences — a more interpretable, assumption-light alternative to hazard ratios.
  4. Doubly-robust survival estimators — combine the outcome model and the censoring weights for efficiency.

17.1 The challenge of censored outcomes

Censoring means we observe \(\min(T_i, C_i)\) and \(\delta_i = \mathbb{1}\{T_i \le C_i\}\), where \(T_i\) is the event time and \(C_i\) is the censoring time. The fundamental quantity of interest is the survival function

\[ S(t) = P(T > t) \]

— the probability of surviving past time \(t\). Treatment effects in survival analysis are typically expressed as:

  • Hazard ratio (HR): \(HR = h_1(t) / h_0(t)\) (the ratio of hazards under treatment vs control). Easy to estimate but hard to interpret causally (Hernán 2010).
  • Restricted mean survival difference (RMST): \(\int_0^\tau [S_1(t) - S_0(t)] dt\) — the difference in mean survival up to time \(\tau\). Causally interpretable as a “years gained” quantity.
  • Survival probability difference: \(S_1(t) - S_0(t)\) at a specified time. Reports a direct probability difference.

17.2 Setup: simulated survival data

df <- read_csv("data/survival_sim.csv", show_col_types = FALSE)
n <- nrow(df)
cat(sprintf("n = %d, events = %d (%.1f%%), censored = %d\n",
            n, sum(df$event), 100 * mean(df$event), sum(1 - df$event)))
n = 1500, events = 1500 (100.0%), censored = 0

17.3 Kaplan-Meier survival curves

The starting point of any survival analysis is the Kaplan-Meier (KM) estimator — a nonparametric estimate of \(S(t)\):

km_fit <- survfit(Surv(time, event) ~ trt, data = df)
ggsurvplot(km_fit, data = df,
           pval = TRUE,
           risk.table = TRUE,
           xlab = "Time (years)",
           legend.labs = c("Control", "Treatment"))

Kaplan-Meier survival curves by treatment arm. Treatment shifts the survival curve upward, but the gap reflects both the causal effect AND any baseline imbalance in age/sex.

The KM curve shows the unadjusted survival difference — useful for description but not for causal inference, because the treatment was assigned based on age.

17.4 Cox proportional hazards with adjustment

The standard approach is the Cox proportional hazards regression with covariate adjustment:

cox_fit <- coxph(Surv(time, event) ~ trt + age + sex, data = df)
tidy(cox_fit, exp = TRUE, conf.int = TRUE) |>
  knitr::kable(digits = 3,
               caption = "Cox PH regression coefficients (exponentiated = hazard ratios)")
Cox PH regression coefficients (exponentiated = hazard ratios)
term estimate std.error statistic p.value conf.low conf.high
trt 0.741 0.058 -5.199 0 0.662 0.830
age 1.029 0.003 10.315 0 1.024 1.035
sex 1.412 0.053 6.565 0 1.274 1.565

The estimated hazard ratio for treatment should be close to the true \(\exp(-0.4) \approx 0.67\).

17.4.1 The interpretation problem with hazard ratios

The Cox HR is not a causal contrast in the usual potential-outcomes sense — it averages over the distribution of survivors at each time, which itself depends on the treatment effect (Hernán 2010). HR estimates are useful for ranking effects but should be reported alongside more interpretable quantities such as RMST differences.

17.5 Restricted Mean Survival Time (RMST)

The restricted mean survival time is the expected survival time up to a cutoff \(\tau\):

\[ \text{RMST}(\tau) = E[\min(T, \tau)] = \int_0^\tau S(t)\, dt. \]

The RMST difference between treatment arms is a clean causal contrast when treatment is randomised (or when the conditioning set is right):

\[ \Delta_{\text{RMST}}(\tau) = \text{RMST}_1(\tau) - \text{RMST}_0(\tau) = \int_0^\tau [S_1(t) - S_0(t)]\, dt. \]

Crucially, it does not require the proportional hazards assumption.

# Use survRM2 via riskRegression / manually computing from KM
tau <- 5  # restrict to 5 years

# Manual RMST: integrate the KM curve up to tau
rmst_arm <- function(time, event, tau) {
  fit <- survfit(Surv(time, event) ~ 1)
  s    <- summary(fit, times = c(0, sort(unique(time[event == 1])), tau),
                  extend = TRUE)
  surv_at_t <- s$surv
  t_at_t    <- s$time
  # Trapezoidal integration of S(t) from 0 to tau
  use      <- t_at_t <= tau
  t_vals   <- c(t_at_t[use], tau)
  s_vals   <- c(surv_at_t[use], surv_at_t[max(which(use))])
  sum(diff(t_vals) * head(s_vals, -1))
}

rmst_trt  <- rmst_arm(df$time[df$trt == 1], df$event[df$trt == 1], tau)
rmst_ctrl <- rmst_arm(df$time[df$trt == 0], df$event[df$trt == 0], tau)

cat(sprintf("RMST(τ=5) under treatment: %.3f years\n", rmst_trt))
RMST(τ=5) under treatment: 0.289 years
cat(sprintf("RMST(τ=5) under control:   %.3f years\n", rmst_ctrl))
RMST(τ=5) under control:   0.260 years
cat(sprintf("Difference:                %.3f years\n", rmst_trt - rmst_ctrl))
Difference:                0.028 years

The RMST difference says “treated patients gain on average X.X years of survival up to 5 years” — a directly interpretable quantity for policy or clinical reporting.

17.5.1 Adjusted RMST via IPW

For an adjusted RMST that handles confounding, weight the KM curve by the inverse propensity score:

# Propensity score
ps_fit <- glm(trt ~ age + sex, data = df, family = binomial)
df$ps  <- predict(ps_fit, type = "response")
df$ipw <- ifelse(df$trt == 1, 1 / df$ps, 1 / (1 - df$ps))
df$ipw <- pmin(df$ipw, quantile(df$ipw, 0.99))  # trim

km_ipw <- survfit(Surv(time, event) ~ trt, data = df, weights = ipw)
print(km_ipw, rmean = tau)
Call: survfit(formula = Surv(time, event) ~ trt, data = df, weights = ipw)

      records    n events rmean* se(rmean) median 0.95LCL 0.95UCL
trt=0    1037 1501   1501  0.253   0.00474  0.210   0.197   0.225
trt=1     463 1482   1482  0.310   0.00562  0.261   0.244   0.295
    * restricted mean with upper limit =  5 

The IPW-adjusted RMST removes the baseline imbalance.

17.6 Inverse Probability of Censoring Weighting (IPCW)

When censoring is informative — e.g. sicker patients drop out earlier — treating censoring as random produces bias. The fix is to weight each unit by the inverse of their probability of remaining uncensored, conditional on covariates.

# Model the censoring hazard
# Use 1 - event as the "censoring" indicator
cens_fit <- coxph(Surv(time, 1 - event) ~ trt + age + sex, data = df)

# Predict survival of censoring at each event time for each unit
df_sub <- df |>
  arrange(time)
times_at_risk <- df_sub$time
# G(t | X): probability of being uncensored at time t given X
# Use survfit on cens_fit to get S_C(t | X)
sc_fit <- survfit(cens_fit, newdata = df_sub)
sc_at_t <- diag(summary(sc_fit, times = df_sub$time, extend = TRUE)$surv)

# IPCW weight: 1 / P(C > T_i | X_i) for event times, 0 for censoring times
df_sub$ipcw <- ifelse(df_sub$event == 1, 1 / pmax(sc_at_t, 0.01), 0)
summary(df_sub$ipcw[df_sub$event == 1])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       1       1       1       1       1 

The IPCW weights upweight observations that survived “improbably long without being censored” — replacing the dropouts they implicitly represent.

17.7 Doubly-robust survival estimation

The riskRegression package implements doubly-robust estimators that combine the outcome model (Cox) with the propensity score and the censoring model. The result is robust to misspecification of either:

# Average treatment effect on survival probability at t = 5
# ATE(t) = P(T(1) > t) - P(T(0) > t)
df$trt_f <- factor(df$trt)
cox_for_ate <- coxph(Surv(time, event) ~ trt_f + age + sex, data = df,
                     x = TRUE, y = TRUE)
ps_for_ate  <- glm(trt_f ~ age + sex, data = df, family = binomial, x = TRUE)

ate_surv <- ate(
  event       = cox_for_ate,
  treatment   = ps_for_ate,
  data        = df,
  times       = 5,
  estimator   = c("Gformula", "IPTW", "AIPTW"),
  cause       = 1,
  se          = TRUE,
  B           = 0,
  verbose     = FALSE
)

summary(ate_surv)
     Average treatment effect 

 - Treatment            : trt_f (2 levels: "0" "1")
 - Event                : event (cause: 1)
 - Time  [min;max]      : time [0.00239;1.31]
 - at risk/time         : 5
   number in treatment  0 0
   number in treatment  1 0

 Estimation procedure 
 - Estimators : G-formula, Inverse probability of treatment weighting, Augmented and double robust estimator
 - Uncertainty: Gaussian approximation 
                where the variance is estimated via the influence function 

 Testing procedure
 - Null hypothesis     : given two treatments (A,B) and a specific timepoint, equal risks 
 - Confidence level    : 0.95

 Results: 
 - Difference in standardized risk (B-A) between time zero and 'time' 
                reported on the scale [-1;1] (difference between two probabilities)
 (difference in average risks when treating all subjects with the experimental treatment (B),
                                vs. treating all subjects with the reference treatment (A))

 time trt_f=A risk(trt_f=A) trt_f=B risk(trt_f=B) difference      ci p.value
    5       0            NA       1            NA         NA [NA;NA]      NA

 difference      : estimated difference in standardized risks 
 ci              : pointwise confidence intervals 
 p.value         : (unadjusted) p-value 

The output reports three estimators side-by-side:

  • Gformula — parametric g-computation using the Cox model only.
  • IPTW — inverse probability of treatment weighting.
  • AIPTW — augmented IPTW (doubly robust); the recommended default.

Their agreement is reassuring; disagreement suggests model misspecification.

17.8 Hazard ratio vs RMST: which to report?

Modern survival-causal reporting guidance (Royston-Parmar 2013, Hernán 2010) increasingly favours RMST over HR for several reasons:

Quantity Pro Con
Hazard ratio Compact summary; familiar Requires PH; no causal interpretation; conditional on survivors
RMST difference Directly interpretable as “years gained”; no PH assumption Depends on cutoff \(\tau\); less familiar
Survival probability difference at t Direct probability statement Depends on choice of \(t\)

In practice, report at least:

  1. The KM curves with risk tables (description).
  2. The Cox HR with adjustment (familiar comparison to literature).
  3. The RMST difference at a clinically meaningful \(\tau\) (interpretable causal contrast).
  4. The DR ATE on survival probability at one or two follow-up times (the doubly-robust modern default).

17.9 Connections

  • The G-Methods chapter covers IPTW with time-varying treatments — directly applicable when treatment evolves during follow-up. The IPCW machinery here is the censoring analogue.
  • The Heterogeneous Effects chapter ML methods extend to survival via causal survival forests (grf::causal_survival_forest).
  • The Bayesian Causal Inference chapter extends to survival via Bayesian additive regression trees with a Weibull or piecewise-exponential likelihood (BART::wbart with the appropriate configuration).

17.10 Summary

  • Survival outcomes require methods that handle censoring (some units’ event times are unobserved).
  • Cox proportional hazards is the workhorse but the HR is not a causal contrast in the usual sense.
  • RMST differences are interpretable and avoid the PH assumption — prefer for causal reporting.
  • IPCW extends IPTW to handle informative dropout.
  • Doubly-robust survival (riskRegression::ate) combines outcome models, treatment weights, and censoring weights for efficient and robust estimation.
  • Reporting: combine KM curves (description), adjusted Cox (familiar), RMST difference (interpretable), and DR ATE on survival probability (modern default).