using DataFrames
using GLM
using Statistics
using Random
using LinearAlgebra
using Printf
using CairoMakie7 Matching Estimators
Matching is the oldest and most-taught approach to observational causal inference. For each treated unit, find a “twin” control unit with similar covariates, then compare outcomes within matched pairs. With enough good matches, the comparison approximates a randomised experiment.
The mature implementation in R (MatchIt, cobalt, WeightIt) has no direct Julia counterpart. This chapter implements the core ideas inline — propensity score matching, balance diagnostics, weighted estimation, entropy balancing — using GLM.jl, Optim.jl, and base Julia. For optimal matching, full matching, coarsened exact matching, and additional weighting variants (CBPS, energy balancing) with mature tooling, see the R companion chapter.
Related reading: The Matching and Weighting Part 1 and Treatment effects and matching chapters of Topics on Econometrics and Causal Inference walk through matching with
MatchItand Stata’steffectscommands respectively.
7.1 Assumptions
Matching identifies a causal effect under three assumptions:
- SUTVA — no interference, no hidden treatment versions.
- Ignorability — conditional on \(X\), \(D \perp (Y(0), Y(1))\).
- Overlap — every \(x\) has positive probability of both treatments.
7.2 A simulated example
Random.seed!(42)
n = 2000
X1 = randn(n)
X2 = randn(n)
# Treatment depends on covariates (confounding)
ps_true = @. 1 / (1 + exp(-(-0.5 + 0.6 * X1 + 0.4 * X2)))
D = Float64.(rand(n) .< ps_true)
# Outcome: true treatment effect = 1, plus confounding through X1, X2
Y = @. 1.0 * D + 0.8 * X1 + 0.6 * X2 + 0.5 * randn()
df = DataFrame(Y = Y, D = D, X1 = X1, X2 = X2)
@printf("Naive difference in means: %.3f\n", mean(Y[D .== 1]) - mean(Y[D .== 0]))
@printf("True ATE: 1.000\n")7.3 Estimating the propensity score
ps_fit = glm(@formula(D ~ X1 + X2), df, Binomial(), LogitLink())
df.ps = predict(ps_fit)
@printf("PS range: [%.3f, %.3f]\n", minimum(df.ps), maximum(df.ps))7.4 Nearest-neighbour 1:1 matching
"""
nearest_neighbor_match(ps_treated, ps_control)
Return a vector of indices into the control group, matching each treated
unit to its nearest-neighbour control on PS. Sampling is without replacement.
"""
function nearest_neighbor_match(ps_treated, ps_control)
n_t = length(ps_treated)
avail = trues(length(ps_control))
matches = zeros(Int, n_t)
for (i, p) in enumerate(ps_treated)
candidates = findall(avail)
if isempty(candidates)
break
end
diffs = abs.(ps_control[candidates] .- p)
best = candidates[argmin(diffs)]
matches[i] = best
avail[best] = false
end
matches
end
idx_T = findall(df.D .== 1)
idx_C = findall(df.D .== 0)
matched_C = nearest_neighbor_match(df.ps[idx_T], df.ps[idx_C])
matched_idx_C = idx_C[matched_C[matched_C .> 0]]
matched_idx_T = idx_T[1:length(matched_idx_C)]
@printf("Matched %d treated to %d controls\n",
length(matched_idx_T), length(matched_idx_C))7.5 Balance diagnostics
The standardised mean difference (SMD) before and after matching shows whether the matched groups are balanced:
function smd(x_t, x_c)
(mean(x_t) - mean(x_c)) / sqrt((var(x_t) + var(x_c)) / 2)
end
@printf("%-15s %12s %12s\n", "Covariate", "SMD before", "SMD after")
for v in [:X1, :X2]
smd_before = smd(df[!, v][df.D .== 1], df[!, v][df.D .== 0])
smd_after = smd(df[!, v][matched_idx_T], df[!, v][matched_idx_C])
@printf("%-15s %12.3f %12.3f\n", String(v), smd_before, smd_after)
endStandardised mean differences of magnitude < 0.1 after matching indicate adequate balance. Larger differences suggest the matching is insufficient and a more flexible method (e.g. Mahalanobis distance, or matching with replacement) is needed.
7.6 Estimating the ATT on matched data
After matching, the ATT is the simple difference in means:
att = mean(df.Y[matched_idx_T]) - mean(df.Y[matched_idx_C])
@printf("Matched ATT: %.3f (true ATE = 1.000)\n", att)
# Standard error: paired-difference SE (treats matched pairs as paired)
diffs = df.Y[matched_idx_T] .- df.Y[matched_idx_C]
se = std(diffs) / sqrt(length(diffs))
@printf("Paired-diff SE: %.3f\n", se)
@printf("95%% CI: [%.3f, %.3f]\n", att - 1.96 * se, att + 1.96 * se)The matched estimate is close to the truth, and the paired-difference standard error is the right inferential approach for matched pairs.
7.7 Matching with replacement
When the treated group is larger or the propensity distributions diverge, allowing each control to be matched more than once improves balance at the cost of effective sample size:
function nn_match_replace(ps_treated, ps_control)
[argmin(abs.(ps_control .- p)) for p in ps_treated]
end
matched_repl = nn_match_replace(df.ps[idx_T], df.ps[idx_C])
@printf("Average reuse: %.2f (1.0 = no replacement)\n",
length(matched_repl) / length(unique(matched_repl)))
att_repl = mean(df.Y[idx_T]) - mean(df.Y[idx_C[matched_repl]])
@printf("Matched-with-replacement ATT: %.3f\n", att_repl)7.8 Matching vs weighting
A close cousin of matching is inverse propensity weighting (IPW): weight treated units by \(1 / \hat e(X)\) and control units by \(1 / (1 - \hat e(X))\), then take the weighted difference in means:
w = ifelse.(df.D .== 1, 1 ./ df.ps, 1 ./ (1 .- df.ps))
w_trim = min.(w, quantile(w, 0.99))
att_ipw = sum(df.Y[df.D .== 1] .* w_trim[df.D .== 1]) / sum(w_trim[df.D .== 1]) -
sum(df.Y[df.D .== 0] .* w_trim[df.D .== 0]) / sum(w_trim[df.D .== 0])
@printf("IPW ATT (≈ ATE): %.3f\n", att_ipw)
@printf("Matched ATT (NN): %.3f\n", att)
@printf("Matched ATT (replace): %.3f\n", att_repl)
@printf("True ATE: 1.000\n")IPW and matching often agree closely. When they disagree, the divergence usually reflects either a misspecified propensity score (check with balance diagnostics) or an overlap problem.
7.9 Entropy balancing
Beyond IPW, the modern toolkit of weighting estimators includes entropy balancing (Hainmueller 2012), which yields weights with exact mean balance on every covariate by solving a small convex optimisation problem — no propensity-score model required. R’s WeightIt package implements this and several related methods (CBPS, energy balancing, SuperLearner-based weights); the Julia ecosystem currently lacks an equivalent package, but the core math is short enough to implement inline.
The entropy-balancing problem for the ATT: find weights \(w_i\) on the control units that solve
\[ \min_{w_i \geq 0} \sum_i w_i \log w_i \quad \text{subject to} \quad \sum_i w_i x_{ik} = \bar x_{k}^{\text{treated}} \;\; \forall k, \quad \sum_i w_i = N_{\text{treated}}. \]
Lagrangian duality reduces this to a low-dimensional convex problem in the Lagrange multipliers \(\lambda\) (one per balance constraint). The dual is:
\[ \min_{\lambda} \;\; \log\left(\sum_i \exp(-x_i' \lambda)\right) + \lambda' \bar x^{\text{treated}}. \]
The dual problem is small (one Lagrange multiplier per covariate) and strictly convex, so it can be solved with a few iterations of Newton’s method:
"""
entropy_balance(X_control, X_treated; tol=1e-8, maxiter=50)
Compute entropy-balancing weights on the control units so that the
weighted control mean of every covariate equals the treated mean.
Returns a length-`size(X_control, 1)` vector of normalised weights that
sum to 1.
"""
function entropy_balance(X_control::AbstractMatrix, X_treated::AbstractMatrix;
tol::Float64 = 1e-8, maxiter::Int = 50)
target = vec(mean(X_treated, dims=1)) # length p target means
Xc = X_control # n_c × p
p = size(Xc, 2)
λ = zeros(p)
for _ in 1:maxiter
z = -Xc * λ
m = maximum(z)
w = exp.(z .- m)
w ./= sum(w) # normalised weights
mean_w = vec(w' * Xc) # weighted control mean
grad = target .- mean_w # negative gradient
if maximum(abs.(grad)) < tol
break
end
# Hessian: Σ w_i (x_i - mean_w)(x_i - mean_w)' (with tiny ridge for stability)
Xc_centered = Xc .- reshape(mean_w, 1, :)
H = Matrix((Xc_centered .* w)' * Xc_centered) + 1e-6 .* Matrix{Float64}(I, p, p)
# Newton step: λ ← λ + H⁻¹ grad
step = H \ grad
λ .+= step
end
z = -Xc * λ
m = maximum(z)
w = exp.(z .- m)
w ./ sum(w)
end
# Apply to the simulated example from earlier sections
Xc_mat = Matrix(df[df.D .== 0, [:X1, :X2]])
Xt_mat = Matrix(df[df.D .== 1, [:X1, :X2]])
w_ebal = entropy_balance(Xc_mat, Xt_mat)
@printf("Sum of weights: %.6f (should be 1.0)\n", sum(w_ebal))
@printf("Max weight: %.4f, min weight: %.6f\n",
maximum(w_ebal), minimum(w_ebal))
# Verify the mean-balance constraints
control_means_unweighted = mean(Xc_mat, dims=1)
control_means_balanced = w_ebal' * Xc_mat
treated_means = mean(Xt_mat, dims=1)
@printf("\n%-15s %12s %12s %12s\n",
"Covariate", "Treated", "Control (raw)", "Control (ebal)")
for (j, name) in enumerate([:X1, :X2])
@printf("%-15s %12.4f %12.4f %12.4f\n",
String(name),
treated_means[j], control_means_unweighted[j],
control_means_balanced[j])
endNotice that the entropy-balanced control means exactly equal the treated means — this is guaranteed by the construction. The cost is that the weights can be quite variable when the covariate distributions differ sharply (poor overlap).
The ATT estimate uses these weights on the control outcomes:
Yt = df.Y[df.D .== 1]
Yc = df.Y[df.D .== 0]
# ATT = mean(Y_treated) - weighted mean(Y_control)
att_ebal = mean(Yt) - dot(w_ebal, Yc)
@printf("Entropy-balanced ATT: %.3f (true ATE = 1.000)\n", att_ebal)Entropy balancing is particularly effective when balance on covariate means is the analyst’s primary diagnostic. For balance on full covariate distributions, R’s WeightIt package also implements energy balancing (Huling & Mak 2024); no Julia port currently exists.
7.10 When to use which method
| Method | When to use | Limitation |
|---|---|---|
| NN matching (no replacement) | Simple, intuitive; ATT estimand | Drops unmatched units |
| NN matching (with replacement) | Few treated, many controls | Wasteful of controls |
| IPW | Good propensity model | Sensitive to extreme weights |
| Entropy balancing | Want exact mean balance; small covariate set | Balances means only, not distributions |
| Doubly robust (AIPW) | Combine matching/IPW with regression | More complex |
For applied work in Julia, IPW + AIPW is usually the more practical default than 1:1 matching, because Julia’s regression and weighting infrastructure (GLM, Panelest) is mature whereas matching algorithms are not. For matched analyses, the R ecosystem (MatchIt, cobalt) is the workhorse — the companion R chapter walks through full matching, CEM, and balance diagnostics in detail.
7.11 Summary
- Matching restricts comparisons to similar treated–control pairs, approximating a randomised experiment.
- Propensity score nearest-neighbour matching is the simplest form and implementable in a few lines of Julia.
- Balance diagnostics (standardised mean differences) verify the matching’s success.
- Estimation: paired-difference SE on the matched data; regression adjustment on the matched sample (an even better default).
- For richer matching (optimal, full, CEM) and mature balance tooling, use R’s
MatchIt/cobaltecosystem.