20 Causal Discovery: Latent Variables

In the previous chapter we assumed all causally relevant variables were observed. This assumption — called causal sufficiency — rarely holds in economics: unobserved ability confounds wage regressions, unobserved demand shocks confound price–quantity relationships, unobserved peer effects confound social network studies.

When latent confounders exist, discovery algorithms that assume causal sufficiency (the constraint-based PC and the score-based GES) can return spurious edges and wrong orientations. This chapter introduces two algorithms in pcalg that handle the latent-variable case: FCI and RFCI.

20.0.1 Why PC fails with latent variables

Suppose the true graph is $X \leftarrow U \rightarrow Y$ where $U$ is unobserved. In the observed data, $X$ and $Y$ are marginally correlated (through $U$) and no observed conditioning set separates them. PC will incorrectly draw an edge $X - Y$ and may orient it as $X \to Y$ or $X \leftarrow Y$, neither of which exists in the truth.

More subtly, conditioning on a collider that is a descendant of $U$ can open a path between $X$ and $Y$, further distorting the CI structure that PC observes. Once hidden common causes are present, the d-separation statements that hold among the observed variables may not correspond to those of any DAG over just the observed variables; representing them in general requires the richer MAG representation.
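A small simulation makes the first failure concrete. The sketch below assumes a linear Gaussian version of $X \leftarrow U \rightarrow Y$; the coefficients of 0.8, the sample size and the seed are illustrative choices, not part of this chapter's simulation.

```r
# Hypothetical linear-Gaussian version of X <- U -> Y; coefficients are arbitrary.
set.seed(1)
n <- 5000
U <- rnorm(n)                 # latent common cause
X <- 0.8 * U + rnorm(n)
Y <- 0.8 * U + rnorm(n)

# Marginal dependence: this is all PC can see when U is unobserved.
r_xy <- cor(X, Y)

# Partial correlation given U, computed from regression residuals.
# Only an analyst who observes U could run this check.
r_xy_given_u <- cor(resid(lm(X ~ U)), resid(lm(Y ~ U)))

round(c(marginal = r_xy, given_U = r_xy_given_u), 3)
```

In the population the marginal correlation is 0.64 / 1.64 ≈ 0.39, while the partial correlation given $U$ is zero. With $U$ hidden there is nothing observed to condition on, so an algorithm that assumes sufficiency keeps the $X - Y$ edge.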

20.1 Maximal Ancestral Graphs and PAGs

With latent variables, the appropriate representation is a Maximal Ancestral Graph (MAG). Over a set of observed variables $\mathbf{O}$, a MAG encodes:

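$X \to Y$: $X$ and $Y$ are adjacent and $X$ is an ancestor of $Y$ in the full DAG
$X \leftrightarrow Y$: $X$ and $Y$ are adjacent and neither is an ancestor of the other, typically because they share a hidden common cause (hidden confounder)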
No edge: $X \perp\!\!\!\perp Y \mid Z$ for some $Z \subseteq \mathbf{O} \setminus \{X, Y\}$

The MAG over observed variables is the “shadow” of the full DAG: it summarizes all causal and confounding relationships that are visible in the observed data, without requiring us to know or name the latent variables.

Just as DAGs have Markov equivalence classes represented by CPDAGs, MAGs have equivalence classes represented by Partial Ancestral Graphs (PAGs). In a PAG, a $\circ$ mark on an edge endpoint means the data cannot pin that endpoint down: it is an arrowhead in some members of the equivalence class and a tail in others.

PAG mark	Meaning	Economic interpretation
$X \to Y$	$X$ causes $Y$ in every equivalent MAG	Robust causal direction
$X \circ\!\!\to Y$	Some MAGs have $X \to Y$, others $X \leftrightarrow Y$	$Y$ does not cause $X$; cause vs. confounding unresolved
$X \leftrightarrow Y$	Hidden common cause in every equivalent MAG	Definite unmeasured confounder
$X \;\circ\!\!-\!\!\circ\; Y$	Could be $\to$, $\leftarrow$, or $\leftrightarrow$	Maximally uncertain

Reading a PAG for policy purposes:

A definite $X \to Y$ edge is the most useful finding: any intervention on $X$ propagates to $Y$ regardless of which specific MAG the data came from.
A definite $X \leftrightarrow Y$ edge is a warning: before using regression of $Y$ on $X$ to estimate a causal effect, you must address the hidden confounder — through an instrument, a proxy, or a natural experiment.
$\circ$ marks indicate what you don’t know. Additional data, temporal ordering, or experimental variation can resolve them.

20.2 FCI and RFCI: The R Toolkit

R’s pcalg package provides two algorithms for the latent-variable case:

Algorithm	Output	Cost
FCI (Spirtes, Meek & Richardson, 1995)	Full PAG (skeleton + all orientation marks)	Expensive: extra CI tests over Possible-D-SEP sets, then 10 orientation rules
RFCI (Colombo, Maathuis, Kalisch & Richardson, 2012)	Skeleton + partial orientations	Cheap: skips the most expensive CI tests

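RFCI’s key property: under the same assumptions FCI requires, every edge mark it outputs is asymptotically correct, and a missing edge still implies a conditional independence. What it gives up is informativeness: its graph can keep a few edges that FCI’s extra tests would remove, and it can leave more endpoints as $\circ$. So RFCI gives you less information but, in the large-sample limit, not wrong information: a controlled efficiency–information tradeoff.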

20.3 Simulation with Hidden Variables

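We reuse the same 8-node Gaussian linear DAG from the previous chapter and hide 2 nodes (X3 and X6), treating them as latent variables. Whether a hidden node acts as a confounder for a given pair depends on where it sits in the DAG.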

set.seed(2025)

n_vars     <- 8
n_samples  <- 2000
edge_prob  <- 0.35
latent_idx <- c(3, 6)
obs_idx    <- setdiff(1:n_vars, latent_idx)
all_labels <- paste0("X", 1:n_vars)
obs_labels <- all_labels[obs_idx]

true_dag   <- randomDAG(n_vars, prob = edge_prob, lB = 0.25, uB = 1)
nodes(true_dag) <- all_labels
data_full  <- rmvDAG(n_samples, true_dag, errDist = "normal")
colnames(data_full) <- all_labels
data_obs   <- data_full[, obs_idx]
n_obs      <- length(obs_idx)

sprintf("Total variables: %d   Hidden: %s   Observed: %d",
        n_vars, paste(all_labels[latent_idx], collapse = ", "), n_obs)

[1] "Total variables: 8   Hidden: X3, X6   Observed: 6"

20.3.1 True structure over observed variables

Two observed variables are adjacent in the MAG iff no subset of the observed variables d-separates them in the full DAG. We compute this with pcalg::dsep.

all_subsets <- function(xs) {
  out <- list(character(0))
  for (k in seq_along(xs)) out <- c(out, combn(xs, k, simplify = FALSE))
  out
}

# Ancestors of a node in a graphNEL DAG (excluding the node itself).
ancestors_of <- function(dag, target) {
  ie <- inEdges(dag)
  visited <- character(0)
  queue   <- target
  while (length(queue)) {
    cur <- queue[1]; queue <- queue[-1]
    if (cur %in% visited) next
    visited <- c(visited, cur)
    queue   <- c(queue, ie[[cur]])
  }
  setdiff(visited, target)
}

# True MAG over observed nodes as a pcalg-encoded PAG amat:
# directed edges where one observed node is an ancestor of the other in the
# full DAG, bidirected edges where neither is an ancestor of the other.
true_mag_amat <- function(dag, observed_names) {
  p <- length(observed_names)
  amat <- matrix(0L, p, p, dimnames = list(observed_names, observed_names))
  for (i in seq_len(p - 1)) for (j in seq.int(i + 1, p)) {
    u <- observed_names[i]; v <- observed_names[j]
    others <- setdiff(observed_names, c(u, v))
    sep <- any(vapply(all_subsets(others),
                      function(S) dsep(u, v, S, dag),
                      logical(1)))
    if (sep) next
    anc_u <- ancestors_of(dag, u)
    anc_v <- ancestors_of(dag, v)
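    # pcalg's PAG coding refers to the column index: amat[a, b] is the mark at
    # the b end of the edge a-b, so a -> b is amat[a, b] = 2 and amat[b, a] = 3.
    if      (u %in% anc_v) { amat[i, j] <- 2L; amat[j, i] <- 3L }   # u -> v
    else if (v %in% anc_u) { amat[i, j] <- 3L; amat[j, i] <- 2L }   # v -> u
    else                   { amat[i, j] <- 2L; amat[j, i] <- 2L }   # u <-> v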
  }
  amat
}

# Skeleton version (1 wherever an edge exists, ignoring direction) — used in
# the Monte Carlo F1 comparison below.
true_mag_skel <- function(dag, observed_names) {
  a <- true_mag_amat(dag, observed_names)
  (a != 0L) * 1L
}

amat_true_pag  <- true_mag_amat(true_dag, obs_labels)

Warning in bfs(object, node, TRUE): graph is not connected; returning bfs
applied to each connected component

amat_true_mag  <- (amat_true_pag != 0L) * 1L  # skeleton for F1 comparisons

# igraph representations used for plotting
ig_true_full <- graphnel_to_ig(true_dag)   # full 8-node DAG
ig_true_pag  <- pag_to_ig(amat_true_pag, obs_labels)   # true MAG over obs nodes

# Shared layout: compute from full DAG, subset coords for observed-node plots
full_layout  <- igraph::layout_with_sugiyama(ig_true_full)$layout
obs_layout   <- full_layout[obs_idx, ]     # positions of the observed nodes
rownames(obs_layout) <- obs_labels

sprintf("True MAG skeleton edges (over %d observed nodes): %d",
        n_obs, sum(amat_true_mag) / 2)

[1] "True MAG skeleton edges (over 6 observed nodes): 10"
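The brute-force construction above is fine for 6 observed variables but grows exponentially with their number. A quick count, assuming one dsep() call per (pair, subset) combination as in true_mag_amat, where vapply evaluates every subset before any() is applied (the helper name n_dsep_queries is introduced here for illustration):

```r
# Number of d-separation queries in the brute-force MAG construction:
# choose(p, 2) pairs, each checked against all 2^(p - 2) subsets of the rest.
n_dsep_queries <- function(p) choose(p, 2) * 2^(p - 2)

data.frame(p_observed = c(6, 10, 20),
           queries    = n_dsep_queries(c(6, 10, 20)))
```

For the 6 observed nodes used here this is 240 queries; at 20 observed nodes it is already close to 50 million, so the exhaustive ground-truth computation is a device for small simulations only.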

op <- par(mfrow = c(1, 2), mar = c(1, 1, 3, 1))
plot_ig(ig_true_full, layout = full_layout, main = "Full DAG (all 8 nodes)")
plot_ig(ig_true_pag,  layout = obs_layout,  main = "True MAG (observed nodes)")
par(op)

Figure 20.1: Full DAG (all 8 nodes) and the true MAG over the observed variables. The MAG has an edge wherever no observed set separates two nodes, so it includes both direct links and links induced by paths through hidden nodes.

20.4 FCI Algorithm

The FCI algorithm is the standard extension of PC to the latent-variable case. Like PC, it starts from a complete graph and removes edges via CI tests. After skeleton discovery, it applies a richer set of orientation rules that can produce bidirected edges $X \leftrightarrow Y$ indicating hidden common causes.

How FCI extends PC:

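Phase 1 (skeleton): Start as PC does, removing edges via CI tests with growing conditioning sets drawn from adjacent nodes. Because a separating set need not lie among a node’s neighbours when latent variables are present, FCI later re-tests the surviving edges against larger Possible-D-SEP sets (computed after a first round of collider orientation); this extra round is the expensive part of the algorithm.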
Phase 2 (initial orientation): Mark all edge endpoints as $\circ$ (uncertain).
Phase 3 (collider detection): For each unshielded triple $X \;\circ\!-\!\circ\; Z \;\circ\!-\!\circ\; Y$ with $X, Y$ non-adjacent: orient as $X \;\circ\!\!\!\to Z \leftarrow\!\!\!\circ\; Y$ if $Z$ is not in the separating set of $X$ and $Y$.
Phase 4 (rule propagation): Apply 10 orientation rules that propagate known marks without creating contradictions, including rules that can produce definite $\to$ and $\leftrightarrow$ edges.

The richer mark set ($\to$, $\leftarrow$, $\circ$, $\leftrightarrow$) allows FCI to express what is known vs. unknown — at the cost of more complex output to interpret.
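The collider step in Phase 3 can be sketched in a few lines of base R. This is an illustration of the rule as stated above, not pcalg's implementation: the function name orient_colliders, the "A|B" key format for separating sets and the toy skeleton are all assumptions made for the example. Marks follow pcalg's PAG coding, in which amat[i, j] is the mark at the j end of the edge (1 = circle, 2 = arrowhead).

```r
# Orient unshielded colliders on a skeleton, given separating sets.
# skel:   symmetric 0/1 adjacency matrix with dimnames
# sepset: named list, key "X|Y" (names sorted) -> separating set for X and Y
orient_colliders <- function(skel, sepset) {
  nodes <- rownames(skel)
  # Phase 2: every endpoint of every edge starts as a circle (1).
  amat <- matrix(as.integer(skel != 0), nrow(skel), dimnames = dimnames(skel))
  for (z in nodes) {
    nb <- nodes[skel[z, ] != 0]
    if (length(nb) < 2) next
    for (pr in combn(nb, 2, simplify = FALSE)) {
      x <- pr[1]; y <- pr[2]
      if (skel[x, y] != 0) next                  # shielded triple: skip
      key <- paste(sort(c(x, y)), collapse = "|")
      if (!(z %in% sepset[[key]])) {             # z not in sepset(x, y)
        amat[x, z] <- 2L                         # arrowhead at z on x *-> z
        amat[y, z] <- 2L                         # arrowhead at z on y *-> z
      }
    }
  }
  amat
}

# Toy skeleton A - C - B with A and B non-adjacent.
nodes <- c("A", "B", "C")
skel <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
skel["A", "C"] <- skel["C", "A"] <- 1
skel["B", "C"] <- skel["C", "B"] <- 1

collider <- orient_colliders(skel, list("A|B" = character(0)))  # A o-> C <-o B
chain    <- orient_colliders(skel, list("A|B" = "C"))           # all circles
```

When A and B were separated by the empty set, C must be a collider and receives two arrowheads; when C itself separated them, no mark is changed and the edges stay $\circ\!-\!\circ$.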

make_counted_ci <- function(suff_stat) {
  count <- 0L
  ci <- function(x, y, S, suffStat) {
    count <<- count + 1L
    gaussCItest(x, y, S, suffStat)
  }
  list(ci = ci, count = function() count)
}

sig_level <- 0.01
suff_obs  <- list(C = cor(data_obs), n = n_samples)

ci_fci   <- make_counted_ci(suff_obs)
fci_fit  <- fci(suff_obs, ci_fci$ci, labels = obs_labels,
                alpha = sig_level, verbose = FALSE)
ci_tests_fci <- ci_fci$count()

# pcalg PAG amat encoding: 0 = no edge, 1 = circle, 2 = arrowhead, 3 = tail.
# The code refers to the column index: amat[i, j] is the mark at the j end of
# the edge i-j, so amat[i, j] = 2 with amat[j, i] = 3 encodes i -> j.
pag_skeleton_amat <- function(amat) {
  out <- (amat != 0 | t(amat) != 0) * 1L
  diag(out) <- 0L
  out
}

f1_skel <- function(amat_true, amat_est) {
  diag(amat_true) <- diag(amat_est) <- 0
  tp <- sum(amat_true & amat_est) / 2
  fp <- sum(!amat_true & amat_est) / 2
  fn <- sum(amat_true & !amat_est) / 2
  if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
}

skel_fci  <- pag_skeleton_amat(fci_fit@amat)
f1_fci_v  <- f1_skel(amat_true_mag, skel_fci)
ig_fci    <- pag_to_ig(fci_fit@amat, obs_labels)
sprintf("FCI:    skeleton F1 = %.3f   CI tests = %d", f1_fci_v, ci_tests_fci)

[1] "FCI:    skeleton F1 = 0.824   CI tests = 165"
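The F1 value can be unpacked by hand. With tp correctly recovered edges, fp spurious edges and fn missed edges, f1_skel returns 2·tp / (2·tp + fp + fn). Taking the 10 true MAG edges and the 6 observed nodes reported above as given, the sketch below enumerates every feasible (tp, fp) combination and keeps those consistent with the printed 0.824.

```r
n_true     <- 10                       # true MAG skeleton edges (reported above)
n_possible <- choose(6, 2)             # 15 node pairs among 6 observed nodes
grid <- expand.grid(tp = 0:n_true, fp = 0:(n_possible - n_true))
grid$fn <- n_true - grid$tp
grid$f1 <- with(grid, ifelse(tp == 0, 0, 2 * tp / (2 * tp + fp + fn)))

# Combinations that round to the printed value 0.824
consistent <- grid[abs(grid$f1 - 0.824) < 5e-4, c("tp", "fp", "fn")]
consistent
```

Only tp = 7, fp = 0, fn = 3 matches (14/17 ≈ 0.8235). On this draw the estimated skeleton therefore contains no spurious edges but misses three true ones: the errors are all on the side of over-pruning, which is the typical finite-sample failure when weak dependencies do not clear the significance threshold.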

plot_ig(ig_fci, layout = obs_layout)

Figure 20.2: PAG estimated by FCI. Dashed edges indicate endpoints with circle (○) marks — orientations the algorithm could not determine. Solid double-headed arrows (↔︎) flag definite hidden common causes.

Reading the PAG plot

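In pcalg’s own plot() output (the igraph figures in this chapter instead draw any edge with a circle endpoint as a dashed line):

A plain arrow $X \to Y$ is a definite cause: $X$ is an ancestor of $Y$ in every equivalent MAG, though not necessarily a direct cause.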
A double-headed arrow $X \leftrightarrow Y$ is a definite hidden common cause.
A circle endpoint is the $\circ$ mark — uncertain at that endpoint.

In practice for economic research: focus first on $\leftrightarrow$ edges — these flag definite confounding and tell you where IV or proxy strategies are needed. Then examine $\to$ edges — these are causal claims that survive across all statistically equivalent structures.
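For anything beyond a handful of nodes it is easier to read the marks from the adjacency matrix than from the plot. The helper below is a sketch, not part of pcalg: the name describe_pag_edges and the toy matrix are assumptions for illustration. It relies on pcalg's PAG coding, in which amat[i, j] is the mark at the j end of the edge between i and j.

```r
# Assumes pcalg's PAG coding: amat[i, j] is the mark at the j end of edge i-j
# (0 = no edge, 1 = circle, 2 = arrowhead, 3 = tail).
describe_pag_edges <- function(amat) {
  left  <- c("o", "<", "-")   # symbol for the mark at the first node
  right <- c("o", ">", "-")   # symbol for the mark at the second node
  nodes <- rownames(amat)
  out <- character(0)
  for (i in seq_len(nrow(amat) - 1)) for (j in seq.int(i + 1, nrow(amat))) {
    if (amat[i, j] == 0 && amat[j, i] == 0) next
    out <- c(out, paste0(nodes[i], " ", left[amat[j, i]], "-",
                         right[amat[i, j]], " ", nodes[j]))
  }
  out
}

# Hand-built toy PAG (not output from fci()).
nodes <- c("A", "B", "C")
toy <- matrix(0L, 3, 3, dimnames = list(nodes, nodes))
toy["A", "B"] <- 2L; toy["B", "A"] <- 3L   # A --> B
toy["B", "C"] <- 2L; toy["C", "B"] <- 2L   # B <-> C
toy["A", "C"] <- 1L; toy["C", "A"] <- 1L   # A o-o C

edges <- describe_pag_edges(toy)
edges
```

Applied to a fitted object, describe_pag_edges(fci_fit@amat) would list every edge as text, and grepl("<->", edges, fixed = TRUE) picks out the definite-confounding edges to examine first.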

When FCI gives wrong answers: FCI’s guarantees assume that every CI decision is correct (no finite-sample error). In practice, with small $n$ or many variables, some CI decisions are wrong, and those errors propagate through the 10 orientation rules. Skeleton quality (F1) is generally more reliable than orientation quality.

20.5 RFCI Algorithm

RFCI (Really Fast Causal Inference; Colombo et al., 2012) drops FCI’s most expensive step, the second round of CI tests over Possible-D-SEP sets, and compensates with a small number of extra CI tests made just before it orients colliders and discriminating paths.

Key differences from FCI:

Skeleton phase: RFCI keeps the PC-style skeleton and skips FCI’s Possible-D-SEP tests, so its skeleton can retain a few edges that FCI would remove.
Collider check: before orienting an unshielded triple as a collider, RFCI runs additional CI tests to verify the orientation. This costs a little more in Phase 3 but is what lets RFCI safely skip the Possible-D-SEP search.
Orientation output: RFCI produces a partial PAG. Some endpoints that FCI orients can be left as $\circ$ in RFCI; conversely, every mark RFCI does orient is correct in the large-sample limit under the same assumptions.

Why this matters in practice: the skeleton is what most applied users actually need (to flag where confounding may be present); the full PAG orientation adds analytical complexity that often does not survive translation into a policy-relevant claim. RFCI is the workhorse for screening.

ci_rfci    <- make_counted_ci(suff_obs)
rfci_fit   <- rfci(suff_obs, ci_rfci$ci, labels = obs_labels,
                   alpha = sig_level, verbose = FALSE)
ci_tests_rfci <- ci_rfci$count()

skel_rfci  <- pag_skeleton_amat(rfci_fit@amat)
f1_rfci_v  <- f1_skel(amat_true_mag, skel_rfci)
ig_rfci    <- pag_to_ig(rfci_fit@amat, obs_labels)
sprintf("RFCI:   skeleton F1 = %.3f   CI tests = %d", f1_rfci_v, ci_tests_rfci)

[1] "RFCI:   skeleton F1 = 0.824   CI tests = 138"

FCI vs. RFCI output: same plot type, fewer orientations

Both fci() and rfci() return fciAlgo objects and plot with the same plot() method. The visual difference is that RFCI’s PAG typically has more circle marks, because it has been more conservative about orientation. Endpoints that do receive an arrowhead or tail mark in RFCI are correct in the large-sample limit.

If you only need the skeleton (the most common use case in large-$p$ screening problems), RFCI is the right default. If you need orientations to plan an IV strategy, run FCI as well.
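"More circle marks" can be made quantitative by tabulating the endpoint marks of each adjacency matrix. The helper count_marks below is a hypothetical convenience function, and the two matrices are hand-built toy PAGs chosen only to show the mechanics; they are not output from fci() or rfci().

```r
# Tabulate endpoint marks in a pcalg-coded PAG adjacency matrix
# (1 = circle, 2 = arrowhead, 3 = tail; amat[i, j] is the mark at the j end).
count_marks <- function(amat) {
  c(circle    = sum(amat == 1),
    arrowhead = sum(amat == 2),
    tail      = sum(amat == 3))
}

# Two hypothetical 3-node PAGs sharing the skeleton A-B, B-C.
nodes <- c("A", "B", "C")
more_oriented <- matrix(0L, 3, 3, dimnames = list(nodes, nodes))
more_oriented["A", "B"] <- 2L; more_oriented["B", "A"] <- 1L   # A o-> B
more_oriented["B", "C"] <- 2L; more_oriented["C", "B"] <- 3L   # B --> C

less_oriented <- more_oriented
less_oriented["C", "B"] <- 1L        # mark at B becomes a circle: B o-> C

rbind(more_oriented = count_marks(more_oriented),
      less_oriented = count_marks(less_oriented))
```

On the fitted objects from this chapter the same comparison would be rbind(FCI = count_marks(fci_fit@amat), RFCI = count_marks(rfci_fit@amat)).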

20.6 Comparison

kable(data.frame(
  Algorithm   = c("FCI", "RFCI"),
  Output      = c("Full PAG", "Skeleton + partial PAG"),
  Skeleton_F1 = round(c(f1_fci_v, f1_rfci_v), 3),
  CI_Tests    = c(ci_tests_fci, ci_tests_rfci)
), row.names = FALSE)

Table 20.1: Algorithm comparison on the latent-variable scenario (2 hidden nodes, n = 2000)

Algorithm	Output	Skeleton_F1	CI_Tests
FCI	Full PAG	0.824	165
RFCI	Skeleton + partial PAG	0.824	138

op <- par(mfrow = c(1, 3), mar = c(1, 1, 3, 1))
plot_ig(ig_true_pag, layout = obs_layout, main = "True MAG")
plot_ig(ig_fci,      layout = obs_layout, main = "FCI — estimated PAG")
plot_ig(ig_rfci,     layout = obs_layout, main = "RFCI — estimated PAG")
par(op)

Figure 20.3: True MAG vs. PAGs recovered by FCI and RFCI. Directed arrows are definite cause-to-effect; bidirected (↔︎) edges flag hidden common causes; circle (○) marks indicate orientations the algorithm could not resolve. RFCI typically leaves more circles than FCI because its orientation phase is deliberately more conservative.

20.7 Monte Carlo Evaluation

evaluate_latent <- function(seed, n_vars = 8, n_samples = 2000,
                            edge_prob = 0.35, n_latent = 2, sig = 0.01) {
  set.seed(seed)
  labs <- paste0("X", 1:n_vars)
  dag  <- randomDAG(n_vars, prob = edge_prob, lB = 0.25, uB = 1)
  nodes(dag) <- labs
  d    <- rmvDAG(n_samples, dag, errDist = "normal")
  colnames(d) <- labs

  lat  <- sort(sample.int(n_vars, n_latent))
  obs  <- setdiff(seq_len(n_vars), lat)
  obs_labs <- labs[obs]
  d_obs <- d[, obs]
  ss    <- list(C = cor(d_obs), n = n_samples)

  true_skel <- true_mag_skel(dag, obs_labs)

  ci_f <- make_counted_ci(ss)
  fci_r <- fci(ss, ci_f$ci, labels = obs_labs, alpha = sig, verbose = FALSE)

  ci_r <- make_counted_ci(ss)
  rfci_r <- rfci(ss, ci_r$ci, labels = obs_labs, alpha = sig, verbose = FALSE)

  c(f1_fci  = f1_skel(true_skel, pag_skeleton_amat(fci_r@amat)),
    f1_rfci = f1_skel(true_skel, pag_skeleton_amat(rfci_r@amat)),
    ci_fci  = ci_f$count(),
    ci_rfci = ci_r$count())
}

mc <- as.data.frame(t(sapply(1:100, evaluate_latent)))

mc_summary <- data.frame(
  Method      = c("FCI", "RFCI"),
  Skeleton_F1 = round(c(mean(mc$f1_fci),  mean(mc$f1_rfci)),  3),
  CI_Tests    = round(c(mean(mc$ci_fci),  mean(mc$ci_rfci)),  1)
)
kable(mc_summary,
      caption = "Monte Carlo averages (100 replications, n = 2000, p = 8, 2 latent)",
      row.names = FALSE)

Monte Carlo averages (100 replications, n = 2000, p = 8, 2 latent)
Method	Skeleton_F1	CI_Tests
FCI	0.938	150.5
RFCI	0.943	102.7
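Because both algorithms are run on the same dataset within each replication, the natural comparison is paired: average the per-replication differences rather than comparing two independent means. A minimal sketch, where paired_summary is a hypothetical helper and the two short vectors are made-up placeholders standing in for mc$ci_fci and mc$ci_rfci:

```r
# Mean paired difference and its Monte Carlo standard error.
paired_summary <- function(a, b) {
  d <- a - b
  c(mean_diff = mean(d), se = sd(d) / sqrt(length(d)))
}

# Placeholder values only (not results from this chapter).
ci_a <- c(150, 162, 141, 158, 149)
ci_b <- c(101, 110,  98, 105, 100)
res <- paired_summary(ci_a, ci_b)
res
```

With the simulation output in memory, paired_summary(mc$ci_fci, mc$ci_rfci) and paired_summary(mc$f1_fci, mc$f1_rfci) would indicate whether the gaps in the table are large relative to replication noise.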

mc_long <- data.frame(
  method = rep(c("FCI", "RFCI"), each = nrow(mc)),
  ci     = c(mc$ci_fci, mc$ci_rfci)
)
ggplot(mc_long, aes(x = ci, fill = method)) +
  geom_histogram(bins = 20, position = "identity", alpha = 0.55, color = "white") +
  scale_fill_manual(values = c(FCI = "steelblue", RFCI = "tomato")) +
  labs(x = "CI tests", y = "Frequency",
       title = "CI Test Counts: FCI vs RFCI (100 replications)") +
  theme_minimal()

Figure 20.4: CI-test count distributions across 100 replications. Because it skips the Possible-D-SEP tests, RFCI typically, though not always, runs fewer tests than FCI.

20.8 Summary

Property	PC / GES	FCI	RFCI
Latent confounders	✗ assumes none	✓ handles	✓ handles
Output	CPDAG	Full PAG	Skeleton + partial PAG
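Skeleton consistency	✓ (if sufficiency holds)	✓	✓ (may keep a few extra edges)
Orientation coverage	full	full (under faithfulness)	partial (conservative)
CI cost	PC: PC-style skeleton tests; GES: none (score-based)	PC-style tests plus Possible-D-SEP tests	PC-style tests plus a few extra checks before orienting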

FCI and RFCI address complementary needs. If you need to orient edges, use FCI. If you only need the skeleton and efficiency matters, RFCI is the standard R choice.

20.8.1 Practical decision guide

Use PC or GES when:

You are confident in causal sufficiency (all relevant variables are measured)
You want a CPDAG that you can orient further with background knowledge

Use FCI when:

You suspect hidden confounders but don’t know where
You need edge orientations and $\leftrightarrow$ marks to guide IV or proxy strategies
Sample size is large enough that CI-test errors don’t cascade badly through orientation rules

Use RFCI when:

You suspect latent confounders and have many variables ($p > 20$)
The goal is to prune the adjacency graph before applying domain knowledge or estimation
You want a fast first pass — promote candidates to FCI later

20.8.2 From discovery to estimation

Causal discovery is a first step, not a final answer. The typical research workflow is:

Discover — run FCI or RFCI to get candidate adjacencies and, if using FCI, some edge marks
Refine — apply temporal ordering, institutional knowledge, and exclusion restrictions to orient remaining edges
Identify — check whether the effect of interest is identified given the refined graph (backdoor, front-door, ID algorithm)
Estimate — use TMLE, AIPW, or the estimators in earlier chapters

Causal discovery narrows the space of possible structures; domain knowledge finishes the job.

Going further: orientation in the latent case

FCI’s PAG output distinguishes definite causes ($\to$), definite hidden common causes ($\leftrightarrow$), and uncertain cases ($\circ$). The PAG can be combined with background knowledge such as temporal ordering to further orient edges before estimation.
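Temporal ordering is the simplest such background knowledge to apply. Assuming no selection bias, a variable measured later cannot be an ancestor of one measured earlier, so on any edge between them the mark at the later variable must be an arrowhead. The sketch below applies only that one step by hand; orient_by_tier and the tier vector are hypothetical names, the toy PAG is made up, and the function neither re-applies FCI's orientation rules afterwards nor checks for marks that contradict the ordering.

```r
# Assumes pcalg's coding: amat[i, j] is the mark at the j end of edge i-j
# (1 = circle, 2 = arrowhead, 3 = tail). `tier` is a named vector of time
# periods; smaller values are measured earlier.
orient_by_tier <- function(amat, tier) {
  nodes <- rownames(amat)
  for (i in nodes) for (j in nodes) {
    if (i == j || amat[i, j] == 0) next
    # j comes after i, so j cannot cause i: a circle at j becomes an arrowhead.
    # Marks that are already arrowheads or tails are left untouched.
    if (tier[[i]] < tier[[j]] && amat[i, j] == 1L) amat[i, j] <- 2L
  }
  amat
}

# Toy PAG: schooling o-o wage, with schooling determined before wages.
nodes <- c("schooling", "wage")
pag <- matrix(c(0L, 1L, 1L, 0L), 2, 2, dimnames = list(nodes, nodes))
tier <- c(schooling = 1, wage = 2)

refined <- orient_by_tier(pag, tier)   # schooling o-> wage
refined
```

The circle at schooling remains: timing rules out wage causing schooling, but it cannot distinguish a causal effect of schooling from a hidden common cause such as ability.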

20.9 How Much Does Causal Discovery Help in Economics?

An honest assessment: recovering even the true MAG skeleton is far from recovering the full DAG. Several layers of information remain missing:

What is recovered	What is still missing
RFCI skeleton	Edge directions; direct cause vs. hidden common cause; latent nodes
FCI PAG	Unique MAG; latent nodes; many endpoints may still carry $\circ$ marks
True MAG	Latent nodes and their connections
Full DAG	Nothing — this is the goal

The core econometric question is almost always “what is the causal effect of $X$ on $Y$?” Causal discovery with latent variables does not answer this directly. A definite $X \leftrightarrow Y$ in a PAG tells you confounding exists — but not how to remove it. Without oriented edges you cannot even check whether the effect is identified, let alone estimate it.

Where these methods do add value in economics:

Falsifying structural models. If your model implies $X \perp\!\!\!\perp Z \mid W$, test it. A rejection means the model’s independence assumptions are inconsistent with the data — discovery tells you which assumptions fail before you commit to a full structural estimation.
Flagging where instruments are needed. A definite $X \leftrightarrow Y$ in the PAG is a data-driven diagnostic: OLS of $Y$ on $X$ is biased, and you need an instrument, proxy, or natural experiment. Discovery does not supply the instrument but tells you where to look for one.
High-dimensional variable selection. With many candidate controls, knowing which pairs are conditionally independent prunes the problem before applying identification strategies. This is most useful in settings with $p > 20$ variables where theory does not specify the full graph.
Hypothesis generation when theory is silent. For new economic phenomena — platform markets, fintech, peer effects in novel settings — where theory does not give a strong causal ordering, discovery algorithms generate hypotheses worth investigating with better-powered research designs.
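The first use case, testing an implied independence such as $X \perp\!\!\!\perp Z \mid W$, needs nothing more than a partial-correlation test. The sketch below is a hand-rolled Fisher z test in the spirit of the Gaussian CI test used throughout this chapter; the function name fisher_z_ci_test, the simulated variables and their coefficients are all assumptions for illustration, and the test is only appropriate for approximately linear Gaussian data.

```r
# Fisher z test of x independent of y given S, from the inverse correlation matrix.
fisher_z_ci_test <- function(data, x, y, S = character(0)) {
  vars <- c(x, y, S)
  prec <- solve(cor(data[, vars, drop = FALSE]))        # inverse correlation
  r    <- -prec[1, 2] / sqrt(prec[1, 1] * prec[2, 2])   # partial cor of x, y | S
  z    <- sqrt(nrow(data) - length(S) - 3) * atanh(r)
  c(partial_cor = r, p_value = 2 * pnorm(-abs(z)))
}

# Simulated check: the structural model claims X is independent of Z given W.
set.seed(42)
n  <- 2000
W  <- rnorm(n)
X  <- 0.7 * W + rnorm(n)
Z1 <- 0.7 * W + rnorm(n)              # claim holds: W is the only link
Z2 <- 0.7 * W + 0.5 * X + rnorm(n)    # claim fails: X affects Z2 directly
sim <- data.frame(W, X, Z1, Z2)

holds <- fisher_z_ci_test(sim, "X", "Z1", "W")
fails <- fisher_z_ci_test(sim, "X", "Z2", "W")
rbind(holds, fails)
```

In the second case the population partial correlation is 0.5 / sqrt(1.25) ≈ 0.45, so the test rejects decisively. A rejection of this kind says the model's exclusion of a direct $X$ to $Z$ link is inconsistent with the data, without saying what the correct structure is.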

The bottom line: causal discovery is a complement to the standard econometric toolkit — IV, RD, DiD, synthetic control — not a substitute. It is most useful as a diagnostic and hypothesis-generating tool early in a research project, before the hard work of finding credible identifying variation begins.

Setup code for this chapter (packages and igraph plotting helpers; hidden in the rendered output)

suppressPackageStartupMessages({
  library(pcalg)
  library(graph)
  library(igraph)
  library(ggplot2)
  library(knitr)
})

# ─── igraph helpers ──────────────────────────────────────────────────────────
graphnel_to_ig <- function(g_nel) {
  ig   <- igraph::graph_from_graphnel(g_nel)
  el   <- igraph::as_edgelist(ig)
  fwd  <- paste(el[, 1], el[, 2])
  rev_ <- paste(el[, 2], el[, 1])
  is_undir <- fwd %in% rev_
  seen <- character(0)
  keep <- logical(nrow(el))
  for (k in seq_len(nrow(el))) {
    if (!is_undir[k]) { keep[k] <- TRUE; next }
    key <- paste(sort(c(el[k, 1], el[k, 2])), collapse = "|")
    if (key %in% seen) {
      keep[k] <- FALSE
    } else {
      seen <- c(seen, key); keep[k] <- TRUE
    }
  }
  ig2 <- igraph::delete_edges(ig, which(!keep))
  igraph::E(ig2)$undirected  <- is_undir[keep]
  igraph::V(ig2)$label.color <- "black"
  ig2
}

# Build an igraph from a pcalg PAG amat (0=none,1=circle,2=arrowhead,3=tail).
# Dashed edges = at least one circle endpoint; double arrows = both arrowheads.
pag_to_ig <- function(amat, labels = rownames(amat)) {
  p <- nrow(amat)
  frm <- integer(0); too <- integer(0); amode <- integer(0); lty <- integer(0)
  for (i in seq_len(p - 1)) for (j in seq.int(i + 1, p)) {
    if (amat[i, j] == 0 && amat[j, i] == 0) next
    # pcalg codes by column: mi = amat[i, j] is the mark at j,
    # mj = amat[j, i] is the mark at i.
    mi <- amat[i, j]; mj <- amat[j, i]
    frm <- c(frm, i); too <- c(too, j)
    # igraph arrow codes: 0 = none, 1 = backward (at i), 2 = forward (at j), 3 = both
    am <- if (mi == 2 && mj == 2) 3L else if (mj == 2) 1L else if (mi == 2) 2L else 0L
    amode <- c(amode, am)
    lty   <- c(lty, if (mi == 1 || mj == 1) 2L else 1L)
  }
  g <- igraph::make_empty_graph(n = p, directed = TRUE)
  igraph::V(g)$name <- labels
  igraph::V(g)$label.color <- "black"
  if (length(frm)) {
    g <- igraph::add_edges(g, c(rbind(frm, too)))
    igraph::E(g)$amode <- amode
    igraph::E(g)$lty   <- lty
  }
  g
}

plot_ig <- function(g, main = "", layout = NULL) {
  if (is.null(layout)) {
    dir_idx <- if (!is.null(igraph::E(g)$amode)) {
      which(igraph::E(g)$amode %in% c(1L, 2L))
    } else {
      seq_len(igraph::ecount(g))
    }
    g_dir  <- if (length(dir_idx) > 0) igraph::subgraph_from_edges(g, dir_idx) else g
    layout <- igraph::layout_with_sugiyama(g_dir)$layout
  }
  n_e <- igraph::ecount(g)
  # Graphs without an amode attribute (the full DAG) get forward arrows:
  # in igraph, edge.arrow.mode 2 is forward and 1 is backward.
  am <- if (!is.null(igraph::E(g)$amode)) {
    igraph::E(g)$amode
  } else if (!is.null(igraph::E(g)$undirected)) {
    ifelse(igraph::E(g)$undirected, 0L, 2L)
  } else {
    rep(2L, n_e)
  }
  lt <- if (!is.null(igraph::E(g)$lty)) igraph::E(g)$lty else rep(1L, n_e)
  plot(g, layout = layout, main = main,
       vertex.size = 26, vertex.color = "white", vertex.frame.color = "black",
       vertex.label.cex = 0.9, vertex.label.color = "black", vertex.label.font = 2,
       edge.arrow.size = 0.45, edge.arrow.mode = am,
       edge.color = "black", edge.lty = lt, edge.curved = 0)
}
This chapter introduces two algorithms in `pcalg` that handle the latent variable case. ### Why PC fails with latent variables Suppose the true graph is $X \leftarrow U \rightarrow Y$ where $U$ is unobserved. In the observed data, $X$ and $Y$ are marginally correlated (through $U$) and no observed conditioning set separates them. PC will incorrectly draw an edge $X - Y$ and may orient it as $X \to Y$ or $X \leftarrow Y$, neither of which exists in the truth. More subtly, conditioning on a collider that is a descendant of $U$ can *open* a path between $X$ and $Y$, further distorting the CI structure that PC observes. With even a single hidden common cause, the d-separation statements in the observed data no longer correspond to those of any DAG over just the observed variables — they require the richer MAG representation. ## Maximal Ancestral Graphs and PAGs With latent variables, the appropriate representation is a **Maximal Ancestral Graph (MAG)**. Over a set of *observed* variables $\mathbf{O}$, a MAG encodes: - $X \to Y$: $X$ is an ancestor of $Y$ in the full DAG - $X \leftrightarrow Y$: $X$ and $Y$ have a common hidden ancestor (hidden confounder) - No edge: $X \perp\!\!\!\perp Y \mid Z$ for some $Z \subseteq \mathbf{O} \setminus \{X, Y\}$ The MAG over observed variables is the "shadow" of the full DAG: it summarizes all causal and confounding relationships that are visible in the observed data, without requiring us to know or name the latent variables. Just as DAGs have Markov equivalence classes represented by CPDAGs, MAGs have equivalence classes represented by **Partial Ancestral Graphs (PAGs)**. In a PAG, the mark $\circ$ on an edge endpoint means "could be arrowhead or tail in some member of the equivalence class." 
| PAG mark | Meaning | Economic interpretation | |---|---|---| | $X \to Y$ | $X$ causes $Y$ in every equivalent MAG | Robust causal direction | | $X \circ\!\!\to Y$ | Some MAGs have $X \to Y$, others $X \leftrightarrow Y$ | Direction uncertain | | $X \leftrightarrow Y$ | Hidden common cause in every equivalent MAG | Definite unmeasured confounder | | $X \;\circ\!\!-\!\!\circ\; Y$ | Could be $\to$, $\leftarrow$, or $\leftrightarrow$ | Maximally uncertain | **Reading a PAG for policy purposes:** - A definite $X \to Y$ edge is the most useful finding: any intervention on $X$ propagates to $Y$ regardless of which specific MAG the data came from. - A definite $X \leftrightarrow Y$ edge is a warning: before using regression of $Y$ on $X$ to estimate a causal effect, you must address the hidden confounder — through an instrument, a proxy, or a natural experiment. - $\circ$ marks indicate what you *don't* know. Additional data, temporal ordering, or experimental variation can resolve them. ## FCI and RFCI: The R Toolkit R's `pcalg` package provides two algorithms for the latent-variable case: | Algorithm | Output | Cost | |---|---|---| | **FCI** (Spirtes, Meek & Richardson, 1995) | Full PAG (skeleton + all orientation marks) | Expensive: 10 orientation rules, many CI tests | | **RFCI** (Colombo, Maathuis, Kalisch & Richardson, 2012) | Skeleton + partial orientations | Cheap: skips the most expensive CI tests | RFCI's key property: its **skeleton is asymptotically correct** under the same assumptions FCI requires, and its directed edges are a subset of FCI's directed edges. So RFCI gives you fewer orientations but never *wrong* ones — a controlled efficiency–information tradeoff. ## Simulation with Hidden Variables We reuse the same 8-node Gaussian linear DAG from the previous chapter and hide 2 nodes, treating them as unobserved confounders. 
```{r} set.seed(2025) n_vars <- 8 n_samples <- 2000 edge_prob <- 0.35 latent_idx <- c(3, 6) obs_idx <- setdiff(1:n_vars, latent_idx) all_labels <- paste0("X", 1:n_vars) obs_labels <- all_labels[obs_idx] true_dag <- randomDAG(n_vars, prob = edge_prob, lB = 0.25, uB = 1) nodes(true_dag) <- all_labels data_full <- rmvDAG(n_samples, true_dag, errDist = "normal") colnames(data_full) <- all_labels data_obs <- data_full[, obs_idx] n_obs <- length(obs_idx) sprintf("Total variables: %d Hidden: %s Observed: %d", n_vars, paste(all_labels[latent_idx], collapse = ", "), n_obs) ``` ### True structure over observed variables Two observed variables are adjacent in the MAG iff no subset of the *observed* variables d-separates them in the full DAG. We compute this with `pcalg::dsep`. ```{r} all_subsets <- function(xs) { out <- list(character(0)) for (k in seq_along(xs)) out <- c(out, combn(xs, k, simplify = FALSE)) out } # Ancestors of a node in a graphNEL DAG (excluding the node itself). ancestors_of <- function(dag, target) { ie <- inEdges(dag) visited <- character(0) queue <- target while (length(queue)) { cur <- queue[1]; queue <- queue[-1] if (cur %in% visited) next visited <- c(visited, cur) queue <- c(queue, ie[[cur]]) } setdiff(visited, target) } # True MAG over observed nodes as a pcalg-encoded PAG amat: # directed edges where one observed node is an ancestor of the other in the # full DAG, bidirected edges where neither is an ancestor of the other. 
true_mag_amat <- function(dag, observed_names) {
  p <- length(observed_names)
  amat <- matrix(0L, p, p, dimnames = list(observed_names, observed_names))
  for (i in seq_len(p - 1)) for (j in seq.int(i + 1, p)) {
    u <- observed_names[i]; v <- observed_names[j]
    others <- setdiff(observed_names, c(u, v))
    # u and v are adjacent in the MAG iff no observed subset d-separates them
    sep <- any(vapply(all_subsets(others),
                      function(S) dsep(u, v, S, dag), logical(1)))
    if (sep) next
    anc_u <- ancestors_of(dag, u)
    anc_v <- ancestors_of(dag, v)
    # pcalg coding: amat[a, b] is the mark at the b end (2 = arrowhead, 3 = tail)
    if (u %in% anc_v) {
      amat[i, j] <- 2L; amat[j, i] <- 3L   # u -> v
    } else if (v %in% anc_u) {
      amat[i, j] <- 3L; amat[j, i] <- 2L   # v -> u
    } else {
      amat[i, j] <- 2L; amat[j, i] <- 2L   # u <-> v
    }
  }
  amat
}

# Skeleton version (1 wherever an edge exists, ignoring direction), used in
# the Monte Carlo F1 comparison below.
true_mag_skel <- function(dag, observed_names) {
  a <- true_mag_amat(dag, observed_names)
  (a != 0L) * 1L
}

amat_true_pag <- true_mag_amat(true_dag, obs_labels)   # true MAG, PAG-style marks
amat_true_mag <- (amat_true_pag != 0L) * 1L            # skeleton for F1 comparisons

# igraph representations used for plotting
ig_true_full <- graphnel_to_ig(true_dag)               # full 8-node DAG
ig_true_pag  <- pag_to_ig(amat_true_pag, obs_labels)   # true MAG over obs nodes

# Shared layout: compute from full DAG, subset coords for observed-node plots
full_layout <- igraph::layout_with_sugiyama(ig_true_full)$layout
obs_layout  <- full_layout[obs_idx, ]   # positions of the observed nodes
rownames(obs_layout) <- obs_labels

sprintf("True MAG skeleton edges (over %d observed nodes): %d",
        n_obs, sum(amat_true_mag) / 2)
```

```{r}
#| label: fig-latent-dag
#| fig-cap: "Full DAG (all 8 nodes) and the true MAG over observed variables. The MAG links observed pairs that are connected directly or only through hidden nodes."
#| fig-width: 11
#| fig-height: 4.5
op <- par(mfrow = c(1, 2), mar = c(1, 1, 3, 1))
plot_ig(ig_true_full, layout = full_layout, main = "Full DAG (all 8 nodes)")
plot_ig(ig_true_pag, layout = obs_layout, main = "True MAG (observed nodes)")
par(op)
```

## FCI Algorithm

The **FCI algorithm** is the standard extension of PC to the latent-variable case. Like PC, it starts from a complete graph and removes edges via CI tests. After skeleton discovery, it applies a richer set of orientation rules that can produce bidirected edges $X \leftrightarrow Y$ indicating hidden common causes.

**How FCI extends PC:**

1. **Phase 1 (skeleton):** Start as in PC: remove edges via CI tests with growing conditioning sets. With latent variables, a separating set need not lie among a node's neighbours, so FCI then re-tests the remaining edges against subsets of the Possible-D-SEP sets, which can remove further edges.
2. **Phase 2 (initial orientation):** Mark all edge endpoints as $\circ$ (uncertain).
3. **Phase 3 (collider detection):** For each unshielded triple $X \;\circ\!-\!\circ\; Z \;\circ\!-\!\circ\; Y$ with $X, Y$ non-adjacent: orient as $X \;\circ\!\!\!\to Z \leftarrow\!\!\!\circ\; Y$ if $Z$ is not in the separating set of $X$ and $Y$.
4. **Phase 4 (rule propagation):** Apply 10 orientation rules that propagate known marks without creating contradictions, including rules that can produce definite $\to$ and $\leftrightarrow$ edges.

The richer mark set ($\to$, $\leftarrow$, $\circ$, $\leftrightarrow$) allows FCI to express what is known vs. unknown, at the cost of more complex output to interpret.

```{r}
# Wrap gaussCItest in a closure that counts how many CI tests are run.
# (The suff_stat argument is unused; the sufficient statistic is passed
# by the algorithm on every call.)
make_counted_ci <- function(suff_stat) {
  count <- 0L
  ci <- function(x, y, S, suffStat) {
    count <<- count + 1L
    gaussCItest(x, y, S, suffStat)
  }
  list(ci = ci, count = function() count)
}

sig_level <- 0.01
suff_obs  <- list(C = cor(data_obs), n = n_samples)

ci_fci  <- make_counted_ci(suff_obs)
fci_fit <- fci(suff_obs, ci_fci$ci, labels = obs_labels,
               alpha = sig_level, verbose = FALSE)
ci_tests_fci <- ci_fci$count()
```

```{r}
# pcalg PAG amat encoding: 0 = no edge, 1 = circle, 2 = arrowhead, 3 = tail.
# An edge between i and j is described by amat[i, j] (mark at the j end)
# and amat[j, i] (mark at the i end).
pag_skeleton_amat <- function(amat) {
  out <- (amat != 0 | t(amat) != 0) * 1L
  diag(out) <- 0L
  out
}

# F1 score on undirected adjacencies (each edge is counted once)
f1_skel <- function(amat_true, amat_est) {
  diag(amat_true) <- diag(amat_est) <- 0
  tp <- sum(amat_true & amat_est) / 2
  fp <- sum(!amat_true & amat_est) / 2
  fn <- sum(amat_true & !amat_est) / 2
  if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
}

skel_fci <- pag_skeleton_amat(fci_fit@amat)
f1_fci_v <- f1_skel(amat_true_mag, skel_fci)
ig_fci   <- pag_to_ig(fci_fit@amat, obs_labels)

sprintf("FCI: skeleton F1 = %.3f CI tests = %d", f1_fci_v, ci_tests_fci)
```

```{r}
#| label: fig-fci-pag
#| fig-cap: "PAG estimated by FCI. Dashed edges indicate endpoints with circle (○) marks, i.e. orientations the algorithm could not determine. Solid double-headed arrows (↔) flag definite hidden common causes."
#| fig-width: 5.5
#| fig-height: 4.5
plot_ig(ig_fci, layout = obs_layout)
```

::: {.callout-note}
**Reading the PAG plot**

In `pcalg`'s plot output:

- A **plain arrow** $X \to Y$ means $X$ is a definite cause (ancestor) of $Y$ in every equivalent MAG; the effect need not be direct.
- A **double-headed arrow** $X \leftrightarrow Y$ is a definite hidden common cause.
- A **circle endpoint** is the $\circ$ mark: the algorithm could not determine whether that endpoint is an arrowhead or a tail.

In practice for economic research: focus first on $\leftrightarrow$ edges, which flag definite confounding and tell you where IV or proxy strategies are needed. Then examine $\to$ edges, which are causal claims that survive across all statistically equivalent structures.
:::

**When FCI gives wrong answers:** FCI's guarantees are asymptotic: they assume the CI tests are perfectly accurate (no finite-sample error). In practice, with small $n$ or many variables, some false CI decisions propagate through the 10 orientation rules. The skeleton quality (F1) is generally more reliable than the orientation quality.
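
The edge marks described above can also be read directly off the adjacency matrix. The sketch below is a minimal illustration, not part of `pcalg`: `describe_pag_edges()` is a hypothetical helper, and it assumes `pcalg`'s documented PAG coding, in which `amat[i, j]` holds the mark at the $j$ end of the edge between $i$ and $j$. The toy matrix is hand-built to encode $W \;\circ\!\!\!\to X \leftrightarrow Y \leftarrow\!\!\!\circ\; V$.

```{r}
#| eval: false
# Hypothetical helper (not part of pcalg): turn a PAG adjacency matrix into
# readable edge strings. Assumes amat[i, j] is the mark at the j end:
# 0 = no edge, 1 = circle, 2 = arrowhead, 3 = tail.
describe_pag_edges <- function(amat, labels = colnames(amat)) {
  left_sym  <- c("o", "<", "-")   # marks 1, 2, 3 drawn at the left end
  right_sym <- c("o", ">", "-")   # marks 1, 2, 3 drawn at the right end
  out <- character(0)
  p <- nrow(amat)
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      if (amat[i, j] == 0) next
      mark_i <- amat[j, i]   # mark at the i end
      mark_j <- amat[i, j]   # mark at the j end
      out <- c(out, paste0(labels[i], " ", left_sym[mark_i], "-",
                           right_sym[mark_j], " ", labels[j]))
    }
  }
  out
}

# Toy PAG, built by hand for illustration
toy_labs <- c("W", "X", "Y", "V")
toy_pag <- matrix(0L, 4, 4, dimnames = list(toy_labs, toy_labs))
toy_pag["W", "X"] <- 2L; toy_pag["X", "W"] <- 1L   # W o-> X
toy_pag["X", "Y"] <- 2L; toy_pag["Y", "X"] <- 2L   # X <-> Y
toy_pag["Y", "V"] <- 1L; toy_pag["V", "Y"] <- 2L   # Y <-o V

toy_edges <- describe_pag_edges(toy_pag)
print(toy_edges)
# [1] "W o-> X" "X <-> Y" "Y <-o V"
```

Applied to a fitted object, the call would look like `describe_pag_edges(fci_fit@amat, obs_labels)`; passing the labels explicitly avoids relying on the matrix carrying dimnames.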
## RFCI Algorithm

**RFCI** (Really Fast Causal Inference; Colombo et al., 2012) avoids the most expensive part of FCI: the extra round of CI tests over Possible-D-SEP sets that FCI runs after the initial PC-style skeleton search. In exchange, RFCI runs a small number of *additional* CI tests during orientation so that the marks it places remain valid.

**Key differences from FCI:**

1. **Skeleton phase:** the PC-style adjacency search only. RFCI skips FCI's Possible-D-SEP re-testing, so its skeleton can retain a few edges that FCI would remove.
2. **Collider check:** before orienting an unshielded triple as a collider, RFCI runs *additional* CI tests to verify the orientation. This costs more in Phase 3 but is what lets RFCI safely *skip* the expensive Possible-D-SEP stage.
3. **Orientation output:** RFCI's PAG can be less informative than FCI's. Some marks that FCI orients may be left as $\circ$ in RFCI; conversely, every mark RFCI does orient is guaranteed correct under the same asymptotic conditions.

**Why this matters in practice:** the skeleton is what most applied users actually need (to flag where confounding may be present); the full PAG orientation adds analytical complexity that often does not survive translation into a policy-relevant claim. RFCI is the workhorse for screening.

```{r}
ci_rfci  <- make_counted_ci(suff_obs)
rfci_fit <- rfci(suff_obs, ci_rfci$ci, labels = obs_labels,
                 alpha = sig_level, verbose = FALSE)
ci_tests_rfci <- ci_rfci$count()

skel_rfci <- pag_skeleton_amat(rfci_fit@amat)
f1_rfci_v <- f1_skel(amat_true_mag, skel_rfci)
ig_rfci   <- pag_to_ig(rfci_fit@amat, obs_labels)

sprintf("RFCI: skeleton F1 = %.3f CI tests = %d", f1_rfci_v, ci_tests_rfci)
```

::: {.callout-note}
**FCI vs. RFCI output: same plot type, fewer definite marks**

Both `fci()` and `rfci()` return `fciAlgo` objects and plot with the same `plot()` method. The visual difference is that RFCI's PAG typically has **more circle marks**, because it has been more conservative about orientation. Edges that *do* receive an arrowhead or tail mark in RFCI are reliable.

If you only need the skeleton (the most common use case in large-$p$ screening problems), RFCI is the right default. If you need orientations to plan an IV strategy, follow up with FCI.
:::

## Comparison

```{r}
#| label: tbl-latent-comparison
#| tbl-cap: "Algorithm comparison on the latent-variable scenario (2 hidden nodes, n = 2000)"
kable(data.frame(
  Algorithm   = c("FCI", "RFCI"),
  Output      = c("Full PAG", "Skeleton + partial PAG"),
  Skeleton_F1 = round(c(f1_fci_v, f1_rfci_v), 3),
  CI_Tests    = c(ci_tests_fci, ci_tests_rfci)
), row.names = FALSE)
```

```{r}
#| label: fig-latent-skeletons
#| fig-cap: "True MAG vs. PAGs recovered by FCI and RFCI. Directed arrows are definite cause-to-effect; bidirected (↔) edges flag hidden common causes; circle (○) marks indicate orientations the algorithm could not resolve. RFCI typically leaves more circles than FCI because its orientation phase is deliberately more conservative."
#| fig-width: 12
#| fig-height: 4.5
op <- par(mfrow = c(1, 3), mar = c(1, 1, 3, 1))
plot_ig(ig_true_pag, layout = obs_layout, main = "True MAG")
plot_ig(ig_fci, layout = obs_layout, main = "FCI: estimated PAG")
plot_ig(ig_rfci, layout = obs_layout, main = "RFCI: estimated PAG")
par(op)
```

## Monte Carlo Evaluation

```{r}
#| cache: true
#| warning: false
#| message: false
evaluate_latent <- function(seed, n_vars = 8, n_samples = 2000,
                            edge_prob = 0.35, n_latent = 2, sig = 0.01) {
  set.seed(seed)
  labs <- paste0("X", 1:n_vars)

  # Random DAG and Gaussian data over all (observed + latent) nodes
  dag <- randomDAG(n_vars, prob = edge_prob, lB = 0.25, uB = 1)
  nodes(dag) <- labs
  d <- rmvDAG(n_samples, dag, errDist = "normal")
  colnames(d) <- labs

  # Hide n_latent randomly chosen nodes
  lat      <- sort(sample.int(n_vars, n_latent))
  obs      <- setdiff(seq_len(n_vars), lat)
  obs_labs <- labs[obs]
  d_obs    <- d[, obs]

  ss        <- list(C = cor(d_obs), n = n_samples)
  true_skel <- true_mag_skel(dag, obs_labs)

  ci_f  <- make_counted_ci(ss)
  fci_r <- fci(ss, ci_f$ci, labels = obs_labs, alpha = sig, verbose = FALSE)

  ci_r   <- make_counted_ci(ss)
  rfci_r <- rfci(ss, ci_r$ci, labels = obs_labs, alpha = sig, verbose = FALSE)

  c(f1_fci  = f1_skel(true_skel, pag_skeleton_amat(fci_r@amat)),
    f1_rfci = f1_skel(true_skel, pag_skeleton_amat(rfci_r@amat)),
    ci_fci  = ci_f$count(),
    ci_rfci = ci_r$count())
}

mc <- as.data.frame(t(sapply(1:100, evaluate_latent)))

mc_summary <- data.frame(
  Method      = c("FCI", "RFCI"),
  Skeleton_F1 = round(c(mean(mc$f1_fci), mean(mc$f1_rfci)), 3),
  CI_Tests    = round(c(mean(mc$ci_fci), mean(mc$ci_rfci)), 1)
)
kable(mc_summary,
      caption = "Monte Carlo averages (100 replications, n = 2000, p = 8, 2 latent)",
      row.names = FALSE)
```

```{r}
#| label: fig-latent-mc
#| fig-cap: "CI-test count distributions across 100 replications. RFCI, which skips FCI's Possible-D-SEP stage, typically (though not always) runs fewer tests than FCI."
#| fig-width: 7
#| fig-height: 4
#| warning: false
#| message: false
mc_long <- data.frame(
  method = rep(c("FCI", "RFCI"), each = nrow(mc)),
  ci     = c(mc$ci_fci, mc$ci_rfci)
)
ggplot(mc_long, aes(x = ci, fill = method)) +
  geom_histogram(bins = 20, position = "identity", alpha = 0.55, color = "white") +
  scale_fill_manual(values = c(FCI = "steelblue", RFCI = "tomato")) +
  labs(x = "CI tests", y = "Frequency",
       title = "CI Test Counts: FCI vs RFCI (100 replications)") +
  theme_minimal()
```

## Summary

| Property | PC / GES | FCI | RFCI |
|---|---|---|---|
| Latent confounders | ✗ assumes none | ✓ handles | ✓ handles |
| Output | CPDAG | Full PAG | Skeleton + partial PAG |
| Skeleton consistency | ✓ | ✓ | ✓ (may retain extra edges) |
| Orientation coverage | full | full (under faithfulness) | partial (conservative) |
| CI cost | PC: skeleton tests; GES: none (score-based) | skeleton + Possible-D-SEP tests | skeleton + a few extra checks |

FCI and RFCI address complementary needs. If you need to orient edges, use FCI. If you only need the skeleton and efficiency matters, RFCI is the standard choice in R.
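
Because FCI and RFCI are run on the same simulated data set in each replication, the CI-test counts are paired, and a paired summary is more informative than comparing two means. The sketch below is one way to do this; `summarise_ci_counts()` is a hypothetical helper, it assumes a data frame with columns `ci_fci` and `ci_rfci` like the `mc` object built in the Monte Carlo section, and the toy numbers are made up for illustration and are not simulation results.

```{r}
#| eval: false
# Hypothetical helper: paired summary of per-replication CI-test counts.
# Assumes one row per replication with numeric columns ci_fci and ci_rfci.
summarise_ci_counts <- function(mc) {
  d <- mc$ci_fci - mc$ci_rfci             # > 0 means RFCI ran fewer tests
  data.frame(
    mean_diff       = mean(d),
    se_diff         = sd(d) / sqrt(length(d)),   # Monte Carlo standard error
    share_rfci_less = mean(d > 0)
  )
}

# Toy numbers for illustration only; these are NOT simulation results.
toy_mc <- data.frame(ci_fci  = c(120, 95, 140, 101),
                     ci_rfci = c(100, 97, 110, 101))
toy_summary <- summarise_ci_counts(toy_mc)
print(toy_summary)
```

With the real simulation output, `summarise_ci_counts(mc)` would report how often, and by how much on average, RFCI ran fewer tests than FCI.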
### Practical decision guide

**Use PC or GES when:**

- You are confident in causal sufficiency (all relevant variables are measured)
- You want a CPDAG that you can orient further with background knowledge

**Use FCI when:**

- You suspect hidden confounders but don't know where
- You need edge orientations and $\leftrightarrow$ marks to guide IV or proxy strategies
- Sample size is large enough that CI-test errors don't cascade badly through orientation rules

**Use RFCI when:**

- You suspect latent confounders and have many variables ($p > 20$)
- The goal is to prune the adjacency graph before applying domain knowledge or estimation
- You want a fast first pass, with promising candidates promoted to FCI later

### From discovery to estimation

Causal discovery is a first step, not a final answer. The typical research workflow is:

1. **Discover:** run FCI or RFCI to get candidate adjacencies and, if using FCI, some edge marks
2. **Refine:** apply temporal ordering, institutional knowledge, and exclusion restrictions to orient remaining edges
3. **Identify:** check whether the effect of interest is identified given the refined graph (backdoor, front-door, ID algorithm)
4. **Estimate:** use TMLE, AIPW, or the estimators in earlier chapters

Causal discovery narrows the space of possible structures; domain knowledge finishes the job.

::: {.callout-note}
**Going further: orientation in the latent case**

FCI's PAG output distinguishes definite causes ($\to$), definite hidden common causes ($\leftrightarrow$), and uncertain cases ($\circ$). The PAG can be combined with background knowledge such as temporal ordering to further orient edges before estimation.
:::

## How Much Does Causal Discovery Help in Economics?

An honest assessment: recovering even the true MAG skeleton is far from recovering the full DAG. Several layers of information remain missing:

| What is recovered | What is still missing |
|---|---|
| RFCI skeleton | Edge directions; direct cause vs. hidden common cause; latent nodes |
| FCI PAG | Unique MAG; latent nodes; many edges often still carry $\circ$ marks |
| True MAG | Latent nodes and their connections |
| Full DAG | Nothing; this is the goal |

The core econometric question is almost always "what is the causal effect of $X$ on $Y$?" Causal discovery with latent variables does not answer this directly. A definite $X \leftrightarrow Y$ in a PAG tells you that hidden confounding drives the association between $X$ and $Y$ (and that neither variable causes the other), but not how to remove it. Without oriented edges you cannot even check whether the effect is identified, let alone estimate it.

**Where these methods do add value in economics:**

1. **Falsifying structural models.** If your model implies $X \perp\!\!\!\perp Z \mid W$, test it. A rejection means the model's independence assumptions are inconsistent with the data. The CI tests that discovery runs tell you *which* assumptions fail before you commit to a full structural estimation.
2. **Flagging where instruments are needed.** A definite $X \leftrightarrow Y$ in the PAG is a data-driven diagnostic: the OLS coefficient from regressing $Y$ on $X$ reflects hidden confounding rather than a causal effect. Edges such as $X \;\circ\!\!\!\to Y$, where a causal effect and confounding may coexist, are where you need an instrument, proxy, or natural experiment. Discovery does not supply the instrument but tells you where to look for one.
3. **High-dimensional variable selection.** With many candidate controls, knowing which pairs are conditionally independent prunes the problem before applying identification strategies. This is most useful in settings with $p > 20$ variables where theory does not specify the full graph.
4. **Hypothesis generation when theory is silent.** For new economic phenomena (platform markets, fintech, peer effects in novel settings) where theory does not give a strong causal ordering, discovery algorithms generate hypotheses worth investigating with better-powered research designs.

**The bottom line:** causal discovery is a complement to the standard econometric toolkit (IV, RD, DiD, synthetic control), not a substitute. It is most useful as a diagnostic and hypothesis-generating tool early in a research project, before the hard work of finding credible identifying variation begins.
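
The falsification check in item 1 above can be sketched in a few lines of base R. The example assumes jointly Gaussian data and uses a Fisher z-test on the partial correlation, the same kind of test that Gaussian CI tests in constraint-based discovery rely on. `fisher_z_ci()` is a hypothetical helper name, and both data sets are simulated purely for illustration: a chain $X \to W \to Z$, which implies $X \perp\!\!\!\perp Z \mid W$, and a variant with an added direct effect of $X$ on $Z$, which violates it.

```{r}
#| eval: false
# Hypothetical helper: Fisher z-test of x independent of z given `given`,
# assuming jointly Gaussian data.
fisher_z_ci <- function(data, x, z, given = character(0)) {
  vars <- c(x, z, given)
  prec <- solve(cor(data[, vars, drop = FALSE]))       # precision matrix
  r    <- -prec[1, 2] / sqrt(prec[1, 1] * prec[2, 2])  # partial correlation
  stat <- sqrt(nrow(data) - length(given) - 3) * atanh(r)
  c(partial_cor = r, p_value = 2 * pnorm(-abs(stat)))
}

set.seed(1)
n_demo <- 2000

# Simulated chain X -> W -> Z: the model implies X independent of Z given W
x_demo <- rnorm(n_demo)
w_demo <- 0.8 * x_demo + rnorm(n_demo)
z_demo <- 0.8 * w_demo + rnorm(n_demo)
chain_df <- data.frame(X = x_demo, W = w_demo, Z = z_demo)

# Same data plus a direct X -> Z effect: the implication no longer holds
direct_df <- transform(chain_df, Z = Z + 0.5 * X)

res_chain  <- fisher_z_ci(chain_df,  "X", "Z", given = "W")
res_direct <- fisher_z_ci(direct_df, "X", "Z", given = "W")
print(rbind(chain = res_chain, direct = res_direct))
```

In the second data set the partial correlation is far from zero and the test rejects, which is the signal that a model assuming the chain structure is inconsistent with the data. A non-rejection, by contrast, is only an absence of evidence against the model at the given sample size.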