Getting Started with NPCausal.jl

NPCausal.jl is a modern, high-performance Julia implementation of the nonparametric causal inference methods developed by Edward Kennedy and available in his npcausal R package.

By default, the nuisance functions (outcome regressions and propensity scores) are estimated with MLJ.jl using the fast gradient-boosted tree models from EvoTrees.jl. The cross-fitting loop runs on native Julia threads, so the nuisance models for different folds are fit in parallel instead of one after another. Nuisance estimation is usually the most expensive step, so this can substantially reduce run time on multi-core machines.
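
Below is a minimal sketch in plain Julia (standard library only) of a parallel cross-fitting loop. It is not NPCausal.jl's internal code. The helpers make_folds and crossfit_predict are hypothetical, and ordinary least squares stands in for the EvoTrees.jl models. The property that matters is that each observation's prediction comes from a model that never saw that observation.

```julia
using Random
using Base.Threads

# Hypothetical helper (not part of NPCausal.jl): randomly assign each of n
# observations to one of k folds of roughly equal size.
function make_folds(n::Int, k::Int; rng = Random.default_rng())
    ids = repeat(1:k, cld(n, k))[1:n]
    return shuffle(rng, ids)
end

# Hypothetical cross-fitting loop: for each fold f, fit a nuisance model on the
# observations outside f and predict only for the observations inside f.
# Folds are processed in parallel with @threads.
function crossfit_predict(y::Vector{Float64}, X::Matrix{Float64}, k::Int;
                          rng = Random.default_rng())
    n = length(y)
    folds = make_folds(n, k; rng = rng)
    Xd = hcat(ones(n), X)               # design matrix with an intercept
    preds = Vector{Float64}(undef, n)
    @threads for f in 1:k
        train = folds .!= f
        test = folds .== f
        beta = Xd[train, :] \ y[train]  # least squares as a stand-in nuisance model
        # each fold writes to a disjoint set of indices, so no locking is needed
        preds[test] = Xd[test, :] * beta
    end
    return preds
end

# Usage on simulated data
Random.seed!(1)
n = 200
X = randn(n, 2)
y = X[:, 1] .+ 0.5 .* X[:, 2] .+ 0.1 .* randn(n)
yhat = crossfit_predict(y, X, 2)
```

Because each fold's model is fit independently, the loop body parallelizes without synchronization. The number of folds caps how much parallel work this loop can expose.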

Installation

using Pkg
Pkg.add(url="https://github.com/yourusername/NPCausal.jl")

1. Average Treatment Effect (ATE)

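The ate() function provides doubly robust estimates of the mean potential outcome E{Y(a)} for each treatment level, along with the pairwise contrasts between levels. The treatment may be binary or categorical with more than two levels, as in the example below.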
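
For reference, the standard augmented inverse-probability-weighted (AIPW) form of a doubly robust estimator for E{Y(a)} is shown below. We assume, without having checked the package source, that ate() follows this form. Here mu_a(x) estimates E(Y | X = x, A = a), pi_a(x) estimates P(A = a | X = x), both are fit out-of-fold, and Z = (X, A, Y). The estimator remains consistent if either nuisance estimate is consistent, which is what "doubly robust" refers to.

```latex
\begin{aligned}
\hat{\varphi}_a(Z_i) &= \hat{\mu}_a(X_i) + \frac{\mathbb{1}(A_i = a)}{\hat{\pi}_a(X_i)} \{ Y_i - \hat{\mu}_a(X_i) \}, \\
\hat{\psi}_a &= \frac{1}{n} \sum_{i=1}^{n} \hat{\varphi}_a(Z_i), \qquad
\widehat{\mathrm{SE}}(\hat{\psi}_a) = \sqrt{ \widehat{\mathrm{Var}}\{ \hat{\varphi}_a(Z) \} / n }, \\
\widehat{\mathrm{SE}}(\hat{\psi}_a - \hat{\psi}_b) &= \sqrt{ \widehat{\mathrm{Var}}\{ \hat{\varphi}_a(Z) - \hat{\varphi}_b(Z) \} / n }.
\end{aligned}
```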

using NPCausal
using DataFrames
using Random

# Generate dummy data
Random.seed!(42)
n = 1000
X = DataFrame(x1 = randn(n), x2 = randn(n))
a = rand([0, 1, 2], n) # Categorical treatment
y = X.x1 .+ X.x2 .* (a .== 1) .+ 2 .* (a .== 2) .+ randn(n)

# Estimate ATE using 2-fold cross-fitting for a fast docs example
results = ate(y, a, X; nsplits=2)

# View the estimated mean potential outcome E{Y(a)} for each treatment level
println(results.means)

# View pairwise contrasts, e.g. E{Y(1)} - E{Y(0)}
println(results.contrasts)

[ Info: Starting cross-fitting across 2 folds with 1 threads...
3×5 DataFrame
 Row │ parameter  Estimate   StdError  CI_Lower   CI_Upper
     │ String     Float64    Float64   Float64    Float64
─────┼─────────────────────────────────────────────────────
   1 │ E{Y(0)}    -0.125862  0.128521  -0.377762  0.126038
   2 │ E{Y(1)}    -0.240627  0.174557  -0.582758  0.101504
   3 │ E{Y(2)}     2.18618   0.134748   1.92207   2.45029
3×5 DataFrame
 Row │ parameter          Estimate   StdError  CI_Lower   CI_Upper
     │ String             Float64    Float64   Float64    Float64
─────┼─────────────────────────────────────────────────────────────
   1 │ E{Y(1)} - E{Y(0)}  -0.114765  0.212921  -0.532091  0.302561
   2 │ E{Y(2)} - E{Y(0)}   2.31204   0.180571   1.95812   2.66596
   3 │ E{Y(2)} - E{Y(1)}   2.42681   0.218566   1.99842   2.8552
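
The printed numbers are internally consistent, which helps when reading them. The short Julia snippet below uses only values copied from the tables above. It checks that each confidence interval matches Estimate ± 1.96 × StdError (we assume a 95% Wald interval) and that each contrast estimate is a difference of two means. It also shows that the reported contrast standard error (0.212921) is not the root sum of squares of the two individual standard errors. Presumably the package accounts for the correlation between means estimated on the same sample.

```julia
# Numbers copied from the results.means and results.contrasts tables above
est_y0, se_y0 = -0.125862, 0.128521   # E{Y(0)}
est_y1, se_y1 = -0.240627, 0.174557   # E{Y(1)}
est_y2 = 2.18618                      # E{Y(2)}

# 95% Wald interval with the normal critical value rounded to 1.96 (assumption)
z = 1.96
ci_y0 = (est_y0 - z * se_y0, est_y0 + z * se_y0)

# Contrast estimates are differences of the mean estimates
c10 = est_y1 - est_y0   # E{Y(1)} - E{Y(0)}
c20 = est_y2 - est_y0   # E{Y(2)} - E{Y(0)}
c21 = est_y2 - est_y1   # E{Y(2)} - E{Y(1)}

# Naive standard error that ignores the correlation between the two means
naive_se_10 = sqrt(se_y0^2 + se_y1^2)

println("CI for E{Y(0)}: ", ci_y0)
println("contrasts: ", (c10, c20, c21))
println("naive SE for E{Y(1)} - E{Y(0)}: ", naive_se_10)
```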

2. Average Treatment Effect on the Treated (ATT)

For a binary treatment, use the att() function to estimate the average treatment effect on the treated, E{Y(1) - Y(0) | A = 1}. This is the effect in the subpopulation that actually received treatment.
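
The ATT splits into two pieces, which correspond to the rows of the att() output. The second piece is the counterfactual mean among the treated. A standard doubly robust expression for it is given below, with pi(x) = P(A = 1 | X = x) and mu_0(x) = E(Y | X = x, A = 0). This sketches the usual identification, assuming no unmeasured confounding and pi(x) < 1. We have not verified that att() uses exactly this form.

```latex
\begin{aligned}
\psi_{\mathrm{ATT}} &= E\{ Y(1) - Y(0) \mid A = 1 \} = E(Y \mid A = 1) - E\{ Y(0) \mid A = 1 \}, \\
E\{ Y(0) \mid A = 1 \} &= \frac{1}{P(A = 1)} \, E\left[ A \, \mu_0(X) + (1 - A) \, \frac{\pi(X)}{1 - \pi(X)} \{ Y - \mu_0(X) \} \right].
\end{aligned}
```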

using NPCausal
using DataFrames
using Random

# Generate dummy data
Random.seed!(42)
n = 800
X = DataFrame(x1 = randn(n), x2 = randn(n))
# Binary treatment (0 or 1)
a = rand([0, 1], n)
y = X.x1 .+ 3 .* X.x2 .* a .+ randn(n)

# Estimate ATT using 2-fold cross-fitting
results = att(y, a, X; nsplits=2)

# View E(Y|A=1), E{Y(0)|A=1}, and their difference E{Y-Y(0)|A=1}, the ATT
println(results.res)

3×6 DataFrame
 Row │ parameter      Estimate    StdError  CI_Lower   CI_Upper  P_Value
     │ String         Float64     Float64   Float64    Float64   Float64
─────┼────────────────────────────────────────────────────────────────────
   1 │ E(Y|A=1)        0.0283145  0.161009  -0.287263  0.343892  0.860407
   2 │ E{Y(0)|A=1}     0.487443   0.562077  -0.614228  1.58911   0.385823
   3 │ E{Y-Y(0)|A=1}  -0.459129   0.581395  -1.59866   0.680406  0.429702
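
As with the ATE output, the ATT rows can be checked by hand. The snippet below assumes the p-values come from a two-sided normal (Wald) test and the intervals from Estimate ± 1.96 × StdError. Both assumptions reproduce the printed values to the displayed precision. It uses Distributions.jl, assumed to be installed, for the normal tail probability.

```julia
using Distributions  # assumption: Distributions.jl is installed

# Values copied from the att() table above
mean_treated = 0.0283145    # E(Y|A=1)
mean_y0_treated = 0.487443  # E{Y(0)|A=1}
att_se = 0.581395           # StdError of E{Y-Y(0)|A=1}

# The ATT estimate is the difference of the first two rows
att_est = mean_treated - mean_y0_treated

# 95% Wald interval and two-sided normal p-value (assumed conventions)
ci = (att_est - 1.96 * att_se, att_est + 1.96 * att_se)
z = att_est / att_se
p = 2 * ccdf(Normal(), abs(z))

println("ATT = $att_est, 95% CI = $ci, p = $p")
```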

Performance Note

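For best performance, start Julia with multiple threads (e.g., julia -t auto). NPCausal.jl automatically distributes the cross-fitting folds across the available threads, so the nuisance models for different folds are estimated in parallel. With a single thread, as in the example output above ("with 1 threads"), the folds are processed one at a time.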
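
You can confirm how many threads a session has with Base.Threads, which ships with Julia itself. The warning message below is illustrative and is not something NPCausal.jl prints.

```julia
using Base.Threads

# Threads available to this session; set at startup with `julia -t auto`,
# `julia -t 4`, or the JULIA_NUM_THREADS environment variable.
nt = Threads.nthreads()
println("Julia is running with $nt thread(s)")

if nt == 1
    @warn "Only one thread available; restart with `julia -t auto` to fit folds in parallel."
end
```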