22 Causal Discovery: Observed Variables

using CausalInference
using RecursiveCausalDiscovery
using CausalGraphs
using Graphs
using CairoMakie
using Statistics
using LinearAlgebra
using Random
using Printf
using DataFrames
using CSV
using Distributions: Normal, quantile

Earlier chapters assumed the graph was known, or at least that we could draw a defensible DAG from subject knowledge. Causal discovery asks what can be learned about the graph from observational data.

22.1 The Problem

Given \(n\) i.i.d. observations of \(p\) variables \((X_1, \ldots, X_p)\), we want to recover the directed acyclic graph (DAG) \(G\) that generated the data.

There is an important limit: observational data alone cannot distinguish between DAGs that imply the same conditional independence (CI) structure. The set of observationally equivalent DAGs is a Markov equivalence class (MEC). The data can identify this class, represented by a completed partially directed acyclic graph (CPDAG), not necessarily the one true DAG.

A CPDAG has the same skeleton (undirected edges) as all DAGs in its MEC. Some edges can be oriented (they have the same direction in every member of the MEC); others remain undirected because both directions are consistent with the CI structure.

22.1.1 Why some edges cannot be oriented

Three DAGs are observationally equivalent — they produce the same set of conditional independences — and therefore indistinguishable from data:

\[X \to Y \to Z \qquad X \leftarrow Y \leftarrow Z \qquad X \leftarrow Y \to Z\]

All three imply \(X \perp\!\!\!\perp Z \mid Y\) and no other CI. Their CPDAG shows \(X - Y - Z\) with no arrowheads.

The structure \(X \to Y \leftarrow Z\) (a collider at \(Y\)) is not equivalent: here \(X \perp\!\!\!\perp Z\) but \(X \not\!\perp\!\!\!\perp Z \mid Y\). Because it has a unique CI signature, the \(\to Y \leftarrow\) orientation is identifiable from data. Algorithms orient these colliders; non-collider edges often remain undirected.
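The different CI signatures of a chain and a collider can be checked numerically. The following is a minimal sketch, assuming unit-variance Gaussian noise and an arbitrary coefficient of 0.8; the helper `partial_cor` and all variable names are illustrative and not part of any package used in this chapter.

```julia
using Random, Statistics

# Partial correlation of x and z given y (standard three-variable formula).
function partial_cor(x, z, y)
    rxz, rxy, rzy = cor(x, z), cor(x, y), cor(z, y)
    (rxz - rxy * rzy) / sqrt((1 - rxy^2) * (1 - rzy^2))
end

rng = MersenneTwister(1)
n = 50_000

# Chain: X -> Y -> Z
xc = randn(rng, n)
yc = 0.8 .* xc .+ randn(rng, n)
zc = 0.8 .* yc .+ randn(rng, n)

# Collider: X -> Y <- Z
xk = randn(rng, n)
zk = randn(rng, n)
yk = 0.8 .* xk .+ 0.8 .* zk .+ randn(rng, n)

chain_marginal    = cor(xc, zc)              # clearly non-zero
chain_partial     = partial_cor(xc, zc, yc)  # close to zero
collider_marginal = cor(xk, zk)              # close to zero
collider_partial  = partial_cor(xk, zk, yk)  # clearly non-zero (negative)

println("chain:    cor(X,Z) = ", round(chain_marginal, digits = 3),
        "   cor(X,Z | Y) = ", round(chain_partial, digits = 3))
println("collider: cor(X,Z) = ", round(collider_marginal, digits = 3),
        "   cor(X,Z | Y) = ", round(collider_partial, digits = 3))
```

In the chain, conditioning on \(Y\) removes the dependence between \(X\) and \(Z\); in the collider, conditioning on \(Y\) creates it. That reversal is the signature the algorithms exploit.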

22.1.2 What a CPDAG tells you

A directed edge \(X \to Y\) in the CPDAG means: in every DAG consistent with the data, \(X\) is a direct cause of \(Y\). An undirected edge \(X - Y\) means some consistent DAGs have \(X \to Y\) and others have \(X \leftarrow Y\), so data alone cannot decide.

For causal estimation: once you have a CPDAG, you can use background knowledge (temporal ordering, experimental context) to orient the remaining undirected edges, and then apply identification strategies from earlier chapters.

22.2 Simulation

We generate a random Gaussian linear DAG using an Erdős–Rényi random graph with 8 nodes and edge probability 0.35. Each node \(X_i\) is a linear function of its parents plus independent Gaussian noise:

\[X_i = \sum_{j \in \text{pa}(i)} \beta_{ji} X_j + \varepsilon_i, \quad \varepsilon_i \sim N(0,1)\]

where coefficients \(\beta_{ji}\) are drawn uniformly from \([-1, -0.25] \cup [0.25, 1]\).
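To make the data-generating process concrete, here is a hand-rolled sketch of the same recipe. It is not the implementation behind `gen_er_dag_adj_mat` or `gen_gaussian_data`; the helpers `simulate_linear_sem` and `draw_coef` are hypothetical, and the sketch assumes the variables are already in topological order, so the coefficient matrix is strictly upper triangular.

```julia
using Random, Statistics

# Hypothetical helper, not the library routine used in this chapter.
# B[j, i] is the coefficient on the edge j -> i. B must be strictly upper
# triangular, so that 1, 2, ..., p is a topological order.
function simulate_linear_sem(rng, B::AbstractMatrix, n::Integer)
    p = size(B, 1)
    X = zeros(n, p)
    for i in 1:p
        # Parents of i are the columns j < i with a non-zero B[j, i].
        X[:, i] = X[:, 1:i-1] * B[1:i-1, i] .+ randn(rng, n)
    end
    X
end

# Draw a coefficient uniformly from [-1, -0.25] ∪ [0.25, 1].
draw_coef(rng) = rand(rng, (-1, 1)) * (0.25 + 0.75 * rand(rng))

rng = MersenneTwister(7)
p = 4
B = zeros(p, p)
for j in 1:p-1, i in j+1:p
    # Erdős–Rényi: include each possible edge j -> i with probability 0.35.
    if rand(rng) < 0.35
        B[j, i] = draw_coef(rng)
    end
end

X = simulate_linear_sem(rng, B, 1_000)
println("data size: ", size(X), "   edges: ", count(!iszero, B))
```

Filling the columns in topological order guarantees that every parent has been simulated before its children.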

Random.seed!(2025)

n_vars    = 8
n_samples = 2000
edge_prob = 0.35
labels    = ["X$i" for i in 1:n_vars]

adj_mat   = gen_er_dag_adj_mat(n_vars, edge_prob)
data_mat  = gen_gaussian_data(adj_mat, n_samples)

true_dag  = SimpleDiGraph(adj_mat)
true_skel = SimpleGraph(true_dag)

@printf "Variables: %d   Samples: %d   True edges: %d\n" n_vars n_samples ne(true_skel)

Variables: 8   Samples: 2000   True edges: 9

draw_graph(adjmat_to_admg(adj_mat, labels); direction="TB")

Figure 22.1: True DAG used for simulation.

22.3 CI Test Infrastructure

Both algorithms use a Fisher-Z conditional independence test appropriate for Gaussian data. Given the partial correlation \(\hat\rho_{XY \mid Z}\), the test statistic is

\[T = \sqrt{n - |Z| - 3} \cdot \operatorname{atanh}(\hat\rho_{XY \mid Z}) \;\sim\; N(0,1) \quad \text{under } H_0: X \perp\!\!\!\perp Y \mid Z\]
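The statistic can be computed directly from a correlation matrix. The sketch below is an illustration under stated assumptions, not the package's `fisher_z`: the function names are hypothetical, the critical value 2.5758 (two-sided \(\alpha = 0.01\)) is hard-coded to avoid extra dependencies, and the example uses the population correlation matrix of the chain \(X_1 \to X_2 \to X_3\) with coefficients 0.8 rather than sampled data.

```julia
using LinearAlgebra

# Partial correlation of variables i and j given the set S, read off the
# inverse of the correlation submatrix over (i, j, S).
function partial_correlation(C::AbstractMatrix, i::Int, j::Int, S::Vector{Int})
    idx = vcat(i, j, S)
    P = inv(C[idx, idx])
    -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
end

# Fisher-Z statistic T = sqrt(n - |S| - 3) * atanh(rho).
fisher_z_stat(C, n, i, j, S) =
    sqrt(n - length(S) - 3) * atanh(partial_correlation(C, i, j, S))

# true when independence is NOT rejected at the level implied by zcrit.
is_independent(C, n, i, j, S; zcrit = 2.5758) =
    abs(fisher_z_stat(C, n, i, j, S)) <= zcrit

# Population covariance of the chain X1 -> X2 -> X3 (coefficients 0.8,
# unit noise variance), converted to a correlation matrix.
Σ  = [1.0 0.8 0.64; 0.8 1.64 1.312; 0.64 1.312 2.0496]
sd = sqrt.(diag(Σ))
C  = Σ ./ (sd * sd')

println(is_independent(C, 2000, 1, 3, Int[]))  # marginal test
println(is_independent(C, 2000, 1, 3, [2]))    # test given X2
```

The marginal test rejects independence (prints `false`) and the test given \(X_2\) does not (prints `true`), matching the chain's CI structure.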

Choosing the significance level \(\alpha\) trades false-positive edges against false-negative edges in the skeleton:

Low \(\alpha\) (e.g., 0.001): conservative test — it fails to reject independence more easily, so it removes more edges. Fewer spurious edges (false positives) but may discard real ones (false negatives). Produces a sparser skeleton.
High \(\alpha\) (e.g., 0.05): liberal test — it rejects independence (declares dependence) more often, so it keeps more edges. Fewer false removals but may retain spurious edges. Produces a denser skeleton.

With \(n = 2000\) and \(p = 8\) we use \(\alpha = 0.01\), a common default. With smaller \(n\) or larger \(p\) the CI tests lose power and true edges get dropped, so consider raising \(\alpha\) to avoid over-pruning.
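One way to see the interaction between \(\alpha\), \(n\), and the size of the conditioning set is to invert the test statistic: the test declares dependence only when \(|\hat\rho| > \tanh\!\big(z_{1-\alpha/2} / \sqrt{n - |Z| - 3}\big)\). The sketch below tabulates that threshold. The function name is hypothetical and the normal critical values are hard-coded rather than computed.

```julia
# Smallest |partial correlation| that the Fisher-Z test flags as dependent,
# for sample size n, conditioning-set size k, and critical value zcrit.
min_detectable_rho(n, k, zcrit) = tanh(zcrit / sqrt(n - k - 3))

# Two-sided standard-normal critical values (hard-coded).
zcrits = Dict(0.001 => 3.2905, 0.01 => 2.5758, 0.05 => 1.9600)

for alpha in (0.001, 0.01, 0.05), n in (200, 2000)
    r = min_detectable_rho(n, 0, zcrits[alpha])
    println("alpha = ", alpha, "   n = ", n,
            "   threshold |rho| = ", round(r, digits = 3))
end
```

With \(n = 2000\), \(\alpha = 0.01\), and an empty conditioning set the threshold is about 0.058, so weaker partial correlations are treated as independence and the corresponding edges are removed. Smaller \(n\) or smaller \(\alpha\) raises the threshold.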

We wrap fisher_z in a counter to track exactly how many CI tests each algorithm calls:

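# Wrap the Fisher-Z test in a closure that counts how often it is called.
# `data` is not used inside; the algorithms pass the data matrix to `ci` as `d`.
function make_counted_ci(data, sig)
    count = Ref(0)                 # mutable counter captured by the closure
    function ci(x, y, S, d)
        count[] += 1               # one more CI test performed
        fisher_z(x, y, collect(Int, S), Matrix{Float64}(d), sig)
    end
    ci, count                      # the test function and its counter
end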

make_counted_ci (generic function with 1 method)

22.4 PC Algorithm

The PC algorithm (Spirtes & Glymour, 1991) starts from a complete undirected graph and removes edges whenever a conditional independence is found. It searches for separating sets \(Z\) of increasing size, conditioning on subsets of the current neighbors. After skeleton discovery, it orients colliders and then applies Meek’s rules to orient as many further edges as possible.

How it works, step by step:

Start with a complete undirected graph (every pair connected).
For each pair \((X, Y)\), test \(X \perp\!\!\!\perp Y \mid Z\) for all \(Z \subseteq \text{neighbors}(X) \setminus \{Y\}\), starting with \(|Z|=0\). Remove the edge as soon as one test fails to reject independence, and record that \(Z\) as the separating set of \(X\) and \(Y\).
Repeat with conditioning sets of increasing size until no new edges are removed.
Identify colliders: for each triple \(X - W - Y\) with \(X\) and \(Y\) non-adjacent, orient it as \(X \to W \leftarrow Y\) if and only if \(W\) is not in the separating set of \(X\) and \(Y\).
Apply Meek’s orientation rules to propagate orientations without creating new colliders or cycles.
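The skeleton and collider steps above can be sketched in a few dozen lines. This is a simplified illustration, not the implementation behind `pcalg`: it works on a correlation matrix with a hard-coded critical value, omits Meek's rules, and all function names (`subsets`, `ci_test`, `pc_skeleton`, `find_colliders`) are hypothetical.

```julia
using LinearAlgebra

# All subsets of `items` with exactly k elements.
function subsets(items::Vector{Int}, k::Int)
    k == 0 && return [Int[]]
    length(items) < k && return Vector{Int}[]
    with_first = Vector{Int}[vcat(items[1], s) for s in subsets(items[2:end], k - 1)]
    vcat(with_first, subsets(items[2:end], k))
end

# Fisher-Z decision on a correlation matrix: true if independence is not rejected.
function ci_test(C, n, i, j, S; zcrit = 2.5758)
    idx = vcat(i, j, S)
    P = inv(C[idx, idx])
    rho = -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
    abs(sqrt(n - length(S) - 3) * atanh(rho)) <= zcrit
end

# Skeleton phase: start complete, remove edges with growing conditioning sets.
function pc_skeleton(C, n)
    p = size(C, 1)
    adj = trues(p, p)
    for i in 1:p
        adj[i, i] = false
    end
    sepset = Dict{Tuple{Int,Int},Vector{Int}}()
    k = 0
    # Continue while some node still has at least k neighbours besides its partner.
    while any(sum(adj[i, :]) - 1 >= k for i in 1:p)
        for i in 1:p, j in 1:p
            (i != j && adj[i, j]) || continue
            nbrs = Int[v for v in 1:p if adj[i, v] && v != j]
            for S in subsets(nbrs, k)
                if ci_test(C, n, i, j, S)
                    adj[i, j] = adj[j, i] = false
                    sepset[(min(i, j), max(i, j))] = S
                    break
                end
            end
        end
        k += 1
    end
    adj, sepset
end

# Collider phase: i -> m <- j for unshielded triples with m outside sepset(i, j).
function find_colliders(adj, sepset)
    p = size(adj, 1)
    colliders = Tuple{Int,Int,Int}[]
    for m in 1:p, i in 1:p, j in i+1:p
        if i != m && j != m && adj[i, m] && adj[j, m] && !adj[i, j] &&
           !(m in sepset[(i, j)])
            push!(colliders, (i, m, j))
        end
    end
    colliders
end

cor_from_cov(Σ) = Σ ./ (sqrt.(diag(Σ)) * sqrt.(diag(Σ))')

# Population correlations of the collider X1 -> X2 <- X3 (coefficients 0.8).
C_collider = cor_from_cov([1.0 0.8 0.0; 0.8 2.28 0.8; 0.0 0.8 1.0])
adj, sepset = pc_skeleton(C_collider, 2000)
colliders = find_colliders(adj, sepset)
println("edges kept: 1-2 = ", adj[1, 2], ", 2-3 = ", adj[2, 3], ", 1-3 = ", adj[1, 3])
println("colliders: ", colliders)
```

On this input the edge between variables 1 and 3 is removed with an empty separating set, so the triple is oriented as a collider at variable 2. Running the same functions on a chain's correlation matrix removes the same edge but with separating set `[2]`, and no collider is reported.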

sig_level = 0.01

ci_pc, ctr_pc = make_counted_ci(data_mat, sig_level)
cpdag_pc = pcalg(n_vars, ci_pc, data_mat)
skel_pc  = SimpleGraph(cpdag_pc)

f1_skel_pc  = f1_score(true_skel, skel_pc)
f1_cpdag_pc = f1_score(true_dag,  cpdag_pc)

@printf "PC:    skeleton F1 = %.3f   CPDAG F1 = %.3f   CI tests = %d\n" f1_skel_pc f1_cpdag_pc ctr_pc[]

PC:    skeleton F1 = 0.750   CPDAG F1 = 0.500   CI tests = 205

Note

Reading the output: F1 scores

The F1 score is the harmonic mean of precision and recall, ranging from 0 (worst) to 1 (perfect).

Skeleton F1 measures whether the algorithm found the right edges regardless of direction. An edge is a true positive if it exists in both the estimated skeleton and the true one.
CPDAG F1 is stricter: it compares the two graphs’ directed-edge sets, with an undirected CPDAG edge represented as both directions. A correctly-adjacent but unoriented edge therefore contributes one true positive (the direction matching the true DAG) and one false positive (the reverse) — a partial penalty — while a wrongly oriented edge contributes a false positive and a false negative.

A high skeleton F1 with lower CPDAG F1 means the algorithm found the right adjacencies but left some edges unoriented — the typical case when the MEC contains many equivalent DAGs.
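A small worked example shows how the two scores diverge. This sketch follows the edge-set definitions given above but is not the `f1_score` routine used in the chapter; `edge_f1` and `skeleton` are hypothetical helpers, and the three-node graphs are made up for illustration.

```julia
# F1 (harmonic mean of precision and recall) between two edge sets.
function edge_f1(truth::Set, estimate::Set)
    tp = length(intersect(truth, estimate))
    tp == 0 && return 0.0
    prec = tp / length(estimate)
    rec  = tp / length(truth)
    2 * prec * rec / (prec + rec)
end

# True DAG: 1 -> 2, 2 -> 3, 1 -> 3 (directed edges as ordered pairs).
true_dag = Set([(1, 2), (2, 3), (1, 3)])

# Estimated CPDAG: 1 -> 2 oriented correctly, 2 - 3 left undirected
# (stored as both directions), 1 - 3 missing.
est_cpdag = Set([(1, 2), (2, 3), (3, 2)])

# Skeleton: drop direction by ordering each pair.
skeleton(edges) = Set(minmax(a, b) for (a, b) in edges)

skel_f1  = edge_f1(skeleton(true_dag), skeleton(est_cpdag))
cpdag_f1 = edge_f1(true_dag, est_cpdag)

println("skeleton F1 = ", round(skel_f1, digits = 3))
println("CPDAG F1    = ", round(cpdag_f1, digits = 3))
```

The skeleton score is 0.8 (two of three adjacencies found, none spurious). The CPDAG score drops to about 0.667 because the unoriented edge adds one false positive on top of the missing edge.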

Computational complexity: In the worst case, PC tests all subsets of neighbors, giving exponential growth with the maximum degree. In sparse graphs this is manageable; in dense graphs it becomes prohibitive.
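The exponential growth is easy to quantify: a node with \(d\) candidate neighbours has \(\sum_{k=0}^{d} \binom{d}{k} = 2^d\) possible conditioning sets. The short sketch below counts them, with an optional cap on the conditioning-set size; the function name is illustrative.

```julia
# Number of conditioning sets of size at most kmax drawn from d candidates.
n_conditioning_sets(d, kmax = d) = sum(binomial(d, k) for k in 0:min(d, kmax))

for d in (2, 5, 10, 20)
    println("d = ", d, "   conditioning sets = ", n_conditioning_sets(d))
end
```

The counts are 4, 32, 1024, and 1048576 per pair, which is why PC is practical only when the graph is sparse or the conditioning-set size is capped.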

22.5 RSL-D Algorithm

RSL-D (Recursive Structure Learning — Diamond-free; Mokhtarian et al., JMLR 2025) works by recursively removing variables that satisfy a removability criterion.

Two sets are central to the criterion:

Markov boundary \(\text{Mb}(X)\): the minimal set \(S\) such that \(X \perp\!\!\!\perp (\text{all other variables}) \mid S\). Conditioning on \(\text{Mb}(X)\) screens \(X\) off from everything else in the graph. In a faithful DAG with no hidden variables, \(\text{Mb}(X)\) consists of \(X\)’s parents, children, and co-parents (other parents of \(X\)’s children). Co-parents need not be adjacent to \(X\), so the Markov boundary can be strictly larger than the set of adjacent variables.
Neighbourhood \(\text{Ne}(X)\): the set of variables directly adjacent to \(X\) in the current skeleton (connected by an undirected edge at this stage of the algorithm), that is, its parents and children.

Under causal sufficiency and faithfulness, \(\text{Ne}(X) \subseteq \text{Mb}(X)\), with equality exactly when every co-parent of \(X\) is also adjacent to \(X\). Both sets change during the algorithm’s recursive steps as variables are removed one by one.

Variable \(X\) is removable if, for every pair \(Y \in \text{Mb}(X)\), \(Z \in \text{Ne}(X)\):

\[\exists\, W \subseteq \text{Mb}(X) \setminus \{Y, Z\} \text{ such that } Y \perp\!\!\!\perp Z \mid W\]

(Lemma 3 of the paper). Removing variables one by one and recording the local neighbourhood structure recovers the full CPDAG.

Intuitively, a variable is removable if deleting it leaves the conditional independences among the remaining variables unchanged, so the rest of the graph can be learned without it.

How it works, step by step:

Estimate the Markov boundary \(\text{Mb}(X)\) of each variable (the minimal set of variables that, once conditioned on, makes \(X\) independent of everything else).
Find a removable variable \(X\) — one whose neighbourhood satisfies the criterion above.
Record the local skeleton structure around \(X\), then remove \(X\) from the graph.
Repeat on the reduced variable set until all variables have been processed.
Reconstruct the full CPDAG from the recorded local structures.

The key efficiency advantage: RSL-D uses \(O(p \cdot d \cdot |\text{Mb}|^2)\) CI tests, where \(d\) is the maximum neighbourhood size and \(|\text{Mb}|\) is the largest Markov-boundary size, so the count is polynomial in all three quantities. PC can require exponentially many in the worst case as the separator-set search grows.

ci_rsl, ctr_rsl = make_counted_ci(data_mat, sig_level)
cpdag_rsl = rsld(data_mat, ci_rsl)
skel_rsl  = SimpleGraph(cpdag_rsl)

f1_skel_rsl  = f1_score(true_skel, skel_rsl)
f1_cpdag_rsl = f1_score(true_dag,  cpdag_rsl)

@printf "RSL-D: skeleton F1 = %.3f   CPDAG F1 = %.3f   CI tests = %d\n" f1_skel_rsl f1_cpdag_rsl ctr_rsl[]

RSL-D: skeleton F1 = 1.000   CPDAG F1 = 0.947   CI tests = 43

22.6 Comparison

Note

Reading the CPDAG figure

In the side-by-side graph display:

A directed edge \(X_i \to X_j\) means this orientation is invariant across all equivalent DAGs — the algorithm is confident about the direction.
A bidirected (red) edge \(X_i \leftrightarrow X_j\) in the CPDAG display represents an unoriented edge: both \(X_i \to X_j\) and \(X_i \leftarrow X_j\) are statistically equivalent. This is not a hidden common cause — it means the data cannot determine direction. (Hidden common causes are a latent-variable concept, covered in the next chapter.)
A missing edge means the algorithm found a conditional independence separating those two variables.

Compare the estimated CPDAG to the true DAG: extra edges are false positives (the algorithm wrongly retained a spurious association); missing edges are false negatives (a true relationship was incorrectly removed by a CI test).

DataFrame(
    Algorithm   = ["PC",              "RSL-D"],
    Skeleton_F1 = round.([f1_skel_pc,  f1_skel_rsl],  digits=3),
    CPDAG_F1    = round.([f1_cpdag_pc, f1_cpdag_rsl], digits=3),
    CI_Tests    = [ctr_pc[],           ctr_rsl[]],
)

Table 22.1: Algorithm comparison on the simulated 8-node DAG (n = 2000, significance 0.01)

2×4 DataFrame

Row	Algorithm	Skeleton_F1	CPDAG_F1	CI_Tests
	String	Float64	Float64	Int64
1	PC	0.75	0.5	205
2	RSL-D	1.0	0.947	43

side_by_side([
    (adjmat_to_admg(adj_mat, labels),  "True DAG"),
    (cpdag_to_admg(cpdag_pc,  labels), "PC — estimated CPDAG"),
    (cpdag_to_admg(cpdag_rsl, labels), "RSL-D — estimated CPDAG"),
]; note="Red ↔ edges in the CPDAGs are unoriented (not confounded) — both directions are statistically equivalent.")

[Figure panels: True DAG; PC, estimated CPDAG; RSL-D, estimated CPDAG]

Figure 22.2: True DAG vs. estimated CPDAGs. Blue directed edges are oriented; red double-headed edges are unoriented (both directions are CI-equivalent).

Note

Interpreting the comparison table

Skeleton F1 is the primary metric for comparing algorithms: it measures edge recovery independent of orientation, since orientation accuracy is bounded by the MEC.
CPDAG F1 rewards correctly oriented edges and penalizes leaving orientable edges undirected.
CI tests is the computational cost metric. Fewer tests means faster runtime, which matters most when conditioning sets are large, because each test inverts a submatrix of the correlation matrix.

On this dataset RSL-D exceeds PC’s accuracy (skeleton F1 1.000 vs 0.750) while using about a fifth of the CI tests (43 vs 205). A single draw proves little, so the next section repeats the comparison over many datasets; the gap in CI tests is expected to grow with \(p\) and graph density.

22.7 Monte Carlo Evaluation

A single dataset can be lucky. We average over 100 replications to get stable estimates.

What to look for in the MC results:

The mean skeleton F1 is the headline: how well does each algorithm recover the true adjacencies on average?
The distribution of CI test counts (shown in Figure 22.3) shows whether the efficiency advantage is consistent or only appears on average.
High variance in F1 across replications suggests the algorithms are sensitive to the particular random graph — denser or sparser graphs drawn from the same Erdős–Rényi model can be very different problems.

function evaluate_once(seed; n_vars=8, n_samples=2000, edge_prob=0.35, sig=0.01)
    Random.seed!(seed)
    adj  = gen_er_dag_adj_mat(n_vars, edge_prob)
    data = gen_gaussian_data(adj, n_samples)
    skel = SimpleGraph(SimpleDiGraph(adj))

    ci_pc,  ctr_pc  = make_counted_ci(data, sig)
    ci_rsl, ctr_rsl = make_counted_ci(data, sig)

    cp_pc  = pcalg(n_vars, ci_pc,  data)
    cp_rsl = rsld(data, ci_rsl)

    (
        f1_pc  = f1_score(skel, SimpleGraph(cp_pc)),
        f1_rsl = f1_score(skel, SimpleGraph(cp_rsl)),
        ci_pc  = ctr_pc[],
        ci_rsl = ctr_rsl[],
    )
end

mc = [evaluate_once(s) for s in 1:100]

@printf "\nMonte Carlo averages (100 replications, n=%d, p=%d)\n" n_samples n_vars
@printf "%-8s  Skeleton F1  CI tests\n" "Method"
@printf "%-8s  %10.3f  %8.1f\n" "PC"    mean(r.f1_pc  for r in mc) mean(r.ci_pc  for r in mc)
@printf "%-8s  %10.3f  %8.1f\n" "RSL-D" mean(r.f1_rsl for r in mc) mean(r.ci_rsl for r in mc)


Monte Carlo averages (100 replications, n=2000, p=8)
Method    Skeleton F1  CI tests
PC             0.894     237.2
RSL-D          0.946      57.2

let
    fig = Figure(size=(650, 380))
    ax  = Axis(fig[1,1];
        xlabel = "CI tests",
        ylabel = "Frequency",
        title  = "CI Test Counts: PC vs RSL-D (100 replications)")
    hist!(ax, [r.ci_pc  for r in mc]; bins=20, color=(:steelblue, 0.65), label="PC")
    hist!(ax, [r.ci_rsl for r in mc]; bins=20, color=(:tomato, 0.65),    label="RSL-D")
    axislegend(ax)
    fig
end

Figure 22.3: Distribution of CI test counts over 100 replications. RSL-D consistently uses fewer CI tests.

22.8 Real Data: Student Performance in PISA

The simulation had a luxury real research never does: a known true DAG, so we could score recovery with F1. On real data there is no ground truth, so the questions change. Instead of “how close is the estimate to the truth?” we ask: do PC and RSL-D agree; what can background knowledge orient; and is the structure stable under resampling?

We illustrate on the OECD’s PISA 2022 assessment, restricted to United States students and six variables with a clear substantive ordering.

Node	Meaning	Role
`HISEI`	Highest parental occupational status (ISEI)	Family background
`HOMEPOS`	Home possessions index	Family background
`IMMIG`	Immigration status (1 native … 3 first-gen)	Demographic
`GRADE`	Grade relative to modal grade for age	Schooling
`GENDER`	1 = female, 0 = male	Demographic
`MATH`	Mean of the 10 mathematics plausible values	Outcome

pisa   = CSV.read("data/pisa_usa2022.csv", DataFrame)
plabs  = ["HISEI","HOMEPOS","IMMIG","GRADE","GENDER","MATH"]
Xp     = Matrix{Float64}(pisa[:, plabs])
wp     = Float64.(pisa.W)
np_, pp_ = size(Xp)
@printf "Students (complete cases): %d\n" np_

Students (complete cases): 3890

Note

Three honesty caveats, kept explicit:

Plausible values. Each student’s ability is reported as 10 plausible values (posterior draws), not one score. We use their mean, understating the measurement variance; a full treatment runs discovery on each PV and pools.
Categorical-as-Gaussian. IMMIG, GRADE, GENDER are discrete; Fisher-Z assumes joint Gaussianity — a standard but imperfect approximation.
One country, one wave. Pooling would inject country/time structure that appears as spurious edges unless modelled as extra nodes.

22.8.1 Respecting the survey design

PISA is not a simple random sample — students carry final weights W_FSTUWT. The CI tests are functions of the covariance matrix and a sample size, so we supply design-consistent inputs: a weighted covariance/correlation matrix, and Kish’s effective sample size \(n_{\text{eff}} = (\sum_i w_i)^2 / \sum_i w_i^2\).

mu_p  = (Xp' * wp) ./ sum(wp)
Xc_p  = Xp .- mu_p'
Sig_p = (Xc_p' * (wp .* Xc_p)) ./ sum(wp)      # design-weighted covariance
sd_p  = sqrt.(diag(Sig_p)); Cw_p = Sig_p ./ (sd_p * sd_p')
neff  = sum(wp)^2 / sum(wp .^ 2)
@printf "Raw n = %d   Kish effective n = %.0f\n" np_ neff

Raw n = 3890   Kish effective n = 3481

let dfc = DataFrame(Variable = plabs)
    for (k, l) in enumerate(plabs); dfc[!, l] = round.(Cw_p[:, k], digits=2); end
    dfc
end

Table 22.2: Design-weighted correlation matrix (PISA 2022, USA).

6×7 DataFrame

Row	Variable	HISEI	HOMEPOS	IMMIG	GRADE	GENDER	MATH
	String	Float64	Float64	Float64	Float64	Float64	Float64
1	HISEI	1.0	0.42	-0.17	0.05	-0.02	0.33
2	HOMEPOS	0.42	1.0	-0.18	0.05	0.03	0.4
3	IMMIG	-0.17	-0.18	1.0	0.07	0.01	-0.05
4	GRADE	0.05	0.05	0.07	1.0	0.08	0.17
5	GENDER	-0.02	0.03	0.01	0.08	1.0	-0.09
6	MATH	0.33	0.4	-0.05	0.17	-0.09	1.0

22.8.2 Two algorithms on the weighted data

The CI test recomputes correlations from a data matrix, so to honour the design we build a synthetic sample of \(n_{\text{eff}}\) rows whose empirical covariance equals the design-weighted \(\Sigma_w\) exactly (a Cholesky recolour). Both PC and RSL-D then run on these design-consistent inputs unchanged.

m_eff = round(Int, neff)
Random.seed!(1)
Z  = randn(m_eff, pp_); Z .-= mean(Z, dims=1)
Z  = Z * inv(cholesky(Symmetric(cov(Z))).U)     # whiten
Xw = Z * cholesky(Symmetric(Sig_p)).U .+ mu_p'  # recolour to weighted covariance
@printf "Synthetic covariance reproduces Σ_w (max abs diff = %.1e)\n" maximum(abs.(cov(Xw) .- Sig_p))

sig_p = 0.01
ci_p(x, y, S, d) = fisher_z(x, y, collect(Int, S), Matrix{Float64}(d), sig_p)

cpdag_pcp  = pcalg(pp_, ci_p, Xw)
cpdag_rslp = rsld(Xw, ci_p)
skel_pcp   = SimpleGraph(cpdag_pcp)
skel_rslp  = SimpleGraph(cpdag_rslp)

# Count PC skeleton edges that also appear in the RSL-D skeleton.
common = count(e -> has_edge(skel_rslp, src(e), dst(e)), edges(skel_pcp))
@printf "Skeleton edges — PC: %d   RSL-D: %d   in both: %d\n" ne(skel_pcp) ne(skel_rslp) common

Synthetic covariance reproduces Σ_w (max abs diff = 9.1e-13)
Skeleton edges — PC: 9   RSL-D: 9   in both: 9

side_by_side([
    (cpdag_to_admg(cpdag_pcp,  plabs), "PC — estimated CPDAG"),
    (cpdag_to_admg(cpdag_rslp, plabs), "RSL-D — estimated CPDAG"),
]; note="Red ↔ edges are unoriented (not confounded) — both directions are statistically equivalent.")

[Figure panels: PC, estimated CPDAG; RSL-D, estimated CPDAG]

Figure 22.4: Estimated CPDAGs from PC and RSL-D on PISA 2022 (USA). Blue directed edges are oriented; red double-headed edges are unoriented (both directions CI-equivalent). No true-DAG panel — on real data we have no ground truth.

Important

Read the disagreement, not just the agreement. A purely data-driven method may orient an edge as MATH → HISEI — a child’s test score causing their parent’s occupational status. That is causally backwards, and the data cannot protect us: which direction this edge takes is not identifiable from observational data alone, so any orientation the algorithm prints reflects its (possibly mistaken) v-structure decisions under misspecification, not evidence about the true direction. This is exactly why background knowledge is not optional.

22.8.3 Orienting with background knowledge

Sex and immigration background are fixed at birth; family socioeconomic status is established long before the test; grade precedes the assessment; the score is the outcome. That gives a tier ordering

\[ \{\text{GENDER}, \text{IMMIG}\} \prec \{\text{HISEI}, \text{HOMEPOS}\} \prec \text{GRADE} \prec \text{MATH}. \]

A cross-tier edge must point from the earlier tier to the later one; the ordering cannot orient a within-tier edge. We impose this on the discovered skeleton rather than on the CPDAG, because the ordering is knowledge we hold independently of how the data oriented things.

tier_p = Dict("GENDER"=>0,"IMMIG"=>0,"HISEI"=>1,"HOMEPOS"=>1,"GRADE"=>2,"MATH"=>3)
di_e = Tuple{Symbol,Symbol}[]; bi_e = Tuple{Symbol,Symbol}[]
for e in edges(skel_pcp)
    a, b = plabs[e.src], plabs[e.dst]
    if tier_p[a] == tier_p[b]
        push!(bi_e, (Symbol(a), Symbol(b)))          # within-tier: undirected
    else
        lo, hi = tier_p[a] < tier_p[b] ? (a, b) : (b, a)
        push!(di_e, (Symbol(lo), Symbol(hi)))        # cross-tier: earlier → later
    end
end
oriented_admg = make_graph(vertices=Symbol.(plabs), di_edges=di_e, bi_edges=bi_e)

ADMG([:HISEI, :HOMEPOS, :IMMIG, :GRADE, :GENDER, :MATH], [(:IMMIG, :HISEI), (:HISEI, :MATH), (:IMMIG, :HOMEPOS), (:HOMEPOS, :MATH), (:IMMIG, :GRADE), (:GENDER, :GRADE), (:GRADE, :MATH), (:GENDER, :MATH)], [(:HISEI, :HOMEPOS)], Dict{Symbol, Vector{Symbol}}(), Dict{Symbol, Bool}(:MATH => 0, :GRADE => 0, :HISEI => 0, :HOMEPOS => 0, :GENDER => 0, :IMMIG => 0))

Warning

Where data-driven orientation disagreed with us. Left to itself, an algorithm may orient IMMIG–GRADE as GRADE → IMMIG — schooling causing immigration status, which is impossible. Such mistakes come from v-structure and propagation decisions made under misspecification; the data is not informative about these directions. Imposing the tier order overrides them. The lesson: never ship a discovered orientation you have prior reason to disbelieve.

draw_graph(oriented_admg; direction="LR")

Figure 22.5: PC skeleton oriented by the temporal/logical tiers. Cross-tier edges point earlier-to-later; HISEI–HOMEPOS stays undirected (same tier, shown red).

Every variable with an edge into MATH is now a candidate causal parent of achievement. This is the bridge from discovery to estimation (back-door adjustment, e.g. on the parents of the chosen treatment variable). The only edge the ordering leaves undirected, HISEI – HOMEPOS, links two same-tier measures of family background that temporal logic cannot separate.

22.8.4 Stability: does the skeleton survive resampling?

A single fit can be lucky. We resample students with replacement, re-estimate the weighted PC skeleton, and record how often each edge appears.

function weighted_skel(Xb, wb, labs, sig)
    mu = (Xb' * wb) ./ sum(wb); Xc = Xb .- mu'
    Σ  = (Xc' * (wb .* Xc)) ./ sum(wb)
    m  = round(Int, sum(wb)^2 / sum(wb .^ 2))
    Z  = randn(m, length(labs)); Z .-= mean(Z, dims=1)
    Z  = Z * inv(cholesky(Symmetric(cov(Z))).U)
    Xs = Z * cholesky(Symmetric(Σ)).U .+ mu'
    ci(x, y, S, d) = fisher_z(x, y, collect(Int, S), Matrix{Float64}(d), sig)
    SimpleGraph(pcalg(length(labs), ci, Xs))
end

B_pisa = 200
adj_b  = Vector{Matrix{Float64}}(undef, B_pisa)   # one slot per replicate (race-free)
Threads.@threads for b in 1:B_pisa
    rng = MersenneTwister(1000 + b)
    idx = rand(rng, 1:np_, np_)
    g   = weighted_skel(Xp[idx, :], wp[idx], plabs, sig_p)
    A   = zeros(pp_, pp_)
    for e in edges(g); A[e.src, e.dst] += 1; A[e.dst, e.src] += 1; end
    adj_b[b] = A
end
edge_freq = sum(adj_b) ./ B_pisa

6×6 Matrix{Float64}:
 0.0    1.0   1.0    0.015  0.0   1.0
 1.0    0.0   1.0    0.0    0.17  1.0
 1.0    1.0   0.0    0.895  0.02  0.0
 0.015  0.0   0.895  0.0    0.97  1.0
 0.0    0.17  0.02   0.97   0.0   0.98
 1.0    1.0   0.0    1.0    0.98  0.0

edge_names = String[]; edge_vals = Float64[]
for i in 1:pp_-1, j in i+1:pp_
    if edge_freq[i, j] > 0
        push!(edge_names, "$(plabs[i]) — $(plabs[j])")
        push!(edge_vals,  edge_freq[i, j])
    end
end
ord = sortperm(edge_vals)
let
    fig = Figure(size=(680, 380))
    ax  = Axis(fig[1,1]; xlabel="Selection frequency across bootstrap resamples",
               yticks=(1:length(ord), edge_names[ord]), title="Bootstrap edge stability (weighted PC)")
    barplot!(ax, 1:length(ord), edge_vals[ord]; direction=:x, color=(:steelblue, 0.8))
    vlines!(ax, [0.5]; linestyle=:dash, color=:gray)
    xlims!(ax, 0, 1)
    fig
end

Figure 22.6: Bootstrap edge-selection frequency (weighted PC, 200 resamples). Edges near 1.0 are robust; near 0.5 are sample-dependent.

The family-background→math edges (HISEI, HOMEPOS) appear in every bootstrap resample, as do GRADE–MATH and the edges among HISEI, HOMEPOS, and IMMIG: the SES gradient in achievement is not a sampling artefact. Stability under resampling does not by itself make an edge causal, though. On real data, causal discovery is best read as hypothesis generation: it proposes a skeleton and the orientations the data support, which you then reconcile with subject-matter knowledge before estimating effects.

22.9 Summary

Both PC and RSL-D target the Markov equivalence class. The practical difference is computational: RSL-D usually uses fewer CI tests, especially as the graph gets denser.

From CPDAG to causal estimation: Once a CPDAG is in hand, the workflow continues as follows:

Use background knowledge to orient undirected edges. Temporal ordering is the most common source: if \(X\) is determined before \(Y\), an edge between them must point \(X \to Y\). Domain expertise, experimental context, or exclusion restrictions can orient others.
Check identifiability. With a fully oriented DAG, apply the backdoor criterion, the front-door criterion, or the ID algorithm (Chapter 3) to determine whether the effect of interest is identifiable.
Estimate. Use the identified functional with the estimators from Chapter 4 (RA, IPW, AIPW).

If some edges remain undirected after exhausting background knowledge, you can either report the CPDAG and bound the estimand over the equivalence class, or collect additional interventional data to break the remaining equivalences.

Note

What cannot be recovered from observational data alone

Both algorithms identify the CPDAG, not necessarily the true DAG. Some edge directions require extra assumptions, temporal ordering, or interventional data.

The next chapter addresses a harder problem: recovering the causal skeleton when some variables are unobserved.