import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.special import expit
import sympy as sp
from tqdm import tqdm
pd.set_option("display.precision", 3)
Show that (17.1) and (17.2) are equivalent:
$$f_{X,Y \mid Z}(x, y \mid z) = f_{X \mid Z}(x \mid z) f_{Y \mid Z}(y \mid z) \tag{17.1}$$
$$f(x \mid y, z) = f(x \mid z) \tag{17.2}$$
Solution:
By the definition of conditional probability:
$$ \begin{align*} &f_{X,Y \mid Z}(x,y \mid z) = f_{X \mid Z}(x \mid z) f_{Y \mid Z}(y \mid z) \\ &\iff \frac{f_{X,Y,Z}(x,y,z)}{f_Z(z)} = \frac{f_{X,Z}(x,z)}{f_Z(z)} \frac{f_{Y,Z}(y,z)}{f_Z(z)} \\ &\iff f_{X,Y,Z}(x,y,z)= \frac{f_{X,Z}(x,z)}{f_Z(z)} f_{Y,Z}(y,z) \\ &\iff \frac{f_{X,Y,Z}(x,y,z)}{f_{Y,Z}(y,z)} = \frac{f_{X,Z}(x,z)}{f_Z(z)} \\ &\iff f_{X \mid Y, Z}(x \mid y, z) = f_{X \mid Z}(x \mid z) \iff f(x \mid y,z) = f(x \mid z) \end{align*} $$Prove Theorem 17.2:
The following implications hold:
$$ \begin{align*} X \amalg Y \mid Z &\Rightarrow Y \amalg X \mid Z \tag{1}\\ X \amalg Y \mid Z \text{ and } U = h(X) &\Rightarrow U \amalg Y \mid Z \tag{2}\\ X \amalg Y \mid Z \text{ and } U = h(X) &\Rightarrow X \amalg Y \mid (Z, U) \tag{3}\\ X \amalg Y \mid Z \text{ and } X \amalg W \mid (Y, Z) &\Rightarrow X \amalg (W,Y) \mid Z \tag{4}\\ X \amalg Y \mid Z \text{ and } X \amalg Z \mid Y &\Rightarrow X \amalg (Y, Z) \tag{5} \end{align*} $$Solution:
(1)
$$ \begin{align*} f_{Y,X \mid Z}(y, x \mid z) &= f_{X,Y \mid Z}(x, y \mid z) \\ &= f_{X \mid Z}(x \mid z) f_{Y \mid Z}(y \mid z) \tag{$X \amalg Y \mid Z$}\\ &= f_{Y \mid Z}(y \mid z) f_{X \mid Z}(x \mid z). \end{align*} $$(2)
Summing over all $x$ with $h(x) = u$ (with sums replaced by integrals in the continuous case):
$$ \begin{align*} f_{U, Y \mid Z}(u, y \mid z) &= \sum_{x : h(x) = u} f_{X, Y \mid Z}(x, y \mid z) \\ &= \sum_{x : h(x) = u} f_{X \mid Z}(x \mid z) f_{Y \mid Z}(y \mid z) \tag{$X \amalg Y \mid Z$}\\ &= f_{Y \mid Z}(y \mid z) \sum_{x : h(x) = u} f_{X \mid Z}(x \mid z) \\ &= f_{U \mid Z}(u \mid z) f_{Y \mid Z}(y \mid z). \end{align*} $$(3)
Showing satisfaction of (17.2), i.e. that $f(y \mid x, z, u) = f(y \mid z, u)$ whenever $u = h(x)$:
$$ \begin{align*} f_{Y \mid X, Z, U}(y \mid x, z, u) &= f_{Y \mid X, Z}(y \mid x, z) \tag{$U = h(X)$ carries no extra information given $X$}\\ &= f_{Y \mid Z}(y \mid z) \tag{$X \amalg Y \mid Z$, Def (17.2)}\\ &= f_{Y \mid Z, U}(y \mid z, u) \tag{$U \amalg Y \mid Z$ by (1)–(2), Def (17.2)} \end{align*} $$(4)
$$ \begin{align*} f_{X, W, Y \mid Z}(x, w, y \mid z) &= f_{X, W \mid Y, Z}(x, w \mid y, z) f_{Y \mid Z}(y \mid z) \tag{condition on $y$}\\ &= \left[f_{X \mid Y, Z}(x \mid y, z) f_{W \mid Y, Z}(w \mid y, z)\right] f_{Y \mid Z}(y \mid z) \tag{$X \amalg W \mid (Y, Z)$}\\ &= f_{X, Y \mid Z}(x, y \mid z) f_{W \mid Y, Z}(w \mid y, z) \tag{remove conditioning on $y$}\\ &= \left[f_{X \mid Z}(x \mid z) f_{Y \mid Z}(y \mid z)\right] f_{W \mid Y, Z}(w \mid y, z) \tag{$X \amalg Y \mid Z$} \\ &= f_{X \mid Z}(x \mid z) f_{W, Y \mid Z}(w, y \mid z) \end{align*} $$(5)
By Definition (17.2), $X \amalg Y \mid Z$ gives $f_{X \mid Y, Z}(x \mid y, z) = f_{X \mid Z}(x \mid z)$, and $X \amalg Z \mid Y$ gives $f_{X \mid Y, Z}(x \mid y, z) = f_{X \mid Y}(x \mid y)$. Hence
$$f_{X \mid Y}(x \mid y) = f_{X \mid Z}(x \mid z)$$whenever $f_{Y,Z}(y,z) > 0$. The left side does not depend on $z$ and the right side does not depend on $y$, so both equal some function $g(x)$ of $x$ alone. Integrating (or summing) over $y$,
$$f_X(x) = \int f_{X \mid Y}(x \mid y) f_Y(y) \, dy = g(x) \int f_Y(y) \, dy = g(x),$$so $f_{X \mid Y, Z}(x \mid y, z) = g(x) = f_X(x)$. Therefore
$$f_{X, Y, Z}(x, y, z) = f_{X \mid Y, Z}(x \mid y, z) f_{Y, Z}(y, z) = f_X(x) f_{Y, Z}(y, z),$$which is $X \amalg (Y, Z)$ by (17.2).
Let $X, Y$ and $Z$ have the following joint distribution:
$Z = 0$ | $Y = 0$ | $Y = 1$ | $Z = 1$ | $Y = 0$ | $Y = 1$ |
---|---|---|---|---|---|
$X = 0$ | $.405$ | $.045$ | $X = 0$ | $.125$ | $.125$ |
$X = 1$ | $.045$ | $.005$ | $X = 1$ | $.125$ | $.125$ |
(a) Find the conditional distribution of $X$ and $Y$ given $Z = 0$ and the conditional distribution of $X$ and $Y$ given $Z = 1$.
Solution:
For each event $(x, y)$, and for $k \in \{0, 1\}$, the probability given $Z = k$ is simply $P(X = x, Y = y, Z = k) / P(Z = k)$, where the numerator is given in the table and the denominator is:
$$\mathbb{P}(Z = k) = \sum_{(x,y,z) \in \Omega : z = k} \mathbb{P}(x,y,z) = 0.5$$Thus, the conditional distribution of $X$ and $Y$ given $Z = 0$ is:
 | $Y = 0$ | $Y = 1$ |
---|---|---|
$X = 0$ | $.81$ | $.09$ |
$X = 1$ | $.09$ | $.01$ |
and the conditional distribution of $X$ and $Y$ given $Z = 1$ is:
 | $Y = 0$ | $Y = 1$ |
---|---|---|
$X = 0$ | $.25$ | $.25$ |
$X = 1$ | $.25$ | $.25$ |
(b) Show that $X \amalg Y \mid Z$.
Solution:
Observe the fourth column of the following table equals the product of the fifth and sixth columns:
$z$ | $x$ | $y$ | $f_{X, Y \mid Z}(x, y \mid z)$ | $f_{X \mid Z}(x \mid z)$ | $f_{Y \mid Z}(y \mid z)$ |
---|---|---|---|---|---|
0 | 0 | 0 | $.81$ | $.9$ | $.9$ |
0 | 0 | 1 | $.09$ | $.9$ | $.1$ |
0 | 1 | 0 | $.09$ | $.1$ | $.9$ |
0 | 1 | 1 | $.01$ | $.1$ | $.1$ |
1 | 0 | 0 | $.25$ | $.5$ | $.5$ |
1 | 0 | 1 | $.25$ | $.5$ | $.5$ |
1 | 1 | 0 | $.25$ | $.5$ | $.5$ |
1 | 1 | 1 | $.25$ | $.5$ | $.5$ |
Thus, (17.1) holds for each event.
(c) Find the marginal distribution of $X$ and $Y$.
Solution:
For each $(x,y)$, the marginal probability is given by:
$$ \mathbb{P}(X=x, Y=y) = \sum_{k} \mathbb{P}(X=x, Y=y, Z=k) $$Therefore, the marginal distribution of $X$ and $Y$ is:
 | $Y = 0$ | $Y = 1$ |
---|---|---|
$X = 0$ | $.53$ | $.17$ |
$X = 1$ | $.17$ | $.13$ |
(d) Show that $X$ and $Y$ are not marginally independent.
Solution:
It suffices to show there is an event $(x, y)$ such that $f(x)f(y) \ne f(x,y)$. Let $x = 0$ and $y = 0$. Then, $f(x) = 0.7$, $f(y) = 0.7$, but $f(x,y) = 0.53 \ne 0.49 = f(x)f(y)$.
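As a quick numerical check of parts (a)–(d), here is a short NumPy sketch; the array `joint` below simply re-enters the table values, indexed `[x, y, z]`:
# Joint distribution P(X = x, Y = y, Z = z) from the table above, indexed [x, y, z].
joint = np.array(
    [
        [[0.405, 0.125], [0.045, 0.125]],
        [[0.045, 0.125], [0.005, 0.125]],
    ]
)
# (a) Conditional distributions of (X, Y) given Z = 0 and Z = 1.
p_z = joint.sum(axis=(0, 1))
cond_given_z = joint / p_z  # broadcast over the last (z) axis
print(cond_given_z[:, :, 0])  # given Z = 0
print(cond_given_z[:, :, 1])  # given Z = 1
# (b) Check the factorization (17.1) for every (x, y, z).
f_x_given_z = joint.sum(axis=1) / p_z  # indexed [x, z]
f_y_given_z = joint.sum(axis=0) / p_z  # indexed [y, z]
product = np.einsum("xz,yz->xyz", f_x_given_z, f_y_given_z)
print(np.allclose(cond_given_z, product))  # True
# (c) Marginal distribution of (X, Y).
f_xy = joint.sum(axis=2)
print(f_xy)
# (d) X and Y are not marginally independent.
print(np.allclose(f_xy, np.outer(f_xy.sum(axis=1), f_xy.sum(axis=0))))  # False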
Consider the three DAGs in Figure 17.6 without a collider. Prove that $X \amalg Z \mid Y$.
Solution:
Visualizing the three DAGs:
from graphviz import Digraph
a = Digraph()
a.edge("X", "Y")
a.edge("Y", "Z")
a
b = Digraph()
b.edge("Z", "Y")
b.edge("Y", "X")
b
c = Digraph()
c.edge("Y", "X")
c.edge("Y", "Z")
c
Observe that in each of the three DAGs, the only path from $X$ to $Z$ passes through $Y$, and $Y$ is not a collider on that path in any of the DAGs. Conditioning on $Y$ therefore blocks the path, so in each DAG $X$ and $Z$ are d-separated given $Y$, and the result follows from Theorem 17.10.
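As an informal numerical illustration (not a proof), we can simulate the chain $X \to Y \to Z$ with arbitrarily chosen conditional probabilities and check that, within each stratum of $Y$, the distribution of $Z$ does not depend on $X$:
rng = np.random.default_rng(0)
n = 200_000
# Hypothetical CPDs for the chain X -> Y -> Z; the probabilities are arbitrary choices.
X = rng.binomial(1, 0.4, size=n)
Y = rng.binomial(1, np.where(X == 1, 0.8, 0.2))
Z = rng.binomial(1, np.where(Y == 1, 0.7, 0.1))
for y in [0, 1]:
    stratum = Y == y
    # P(Z = 1 | X = 0, Y = y) and P(Z = 1 | X = 1, Y = y) should be approximately equal.
    print(y, Z[stratum & (X == 0)].mean().round(3), Z[stratum & (X == 1)].mean().round(3))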
Consider the DAG in Figure 17.6 with a collider. Prove that $X \amalg Z$ and that $X$ and $Z$ are dependent given $Y$.
g = Digraph()
g.edge("X", "Y")
g.edge("Z", "Y")
g
Solution:
Since $Y$ is a collider and is not conditioned on, the only path between $X$ and $Z$ is blocked; hence $X$ and $Z$ are d-separated and, by Theorem 17.10, $X \amalg Z$. Conditioning on the collider $Y$, however, unblocks the path, so $X$ and $Z$ are d-connected given $Y$ and are therefore dependent given $Y$.
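Again as an informal check rather than a proof, simulating the collider $X \to Y \leftarrow Z$ (with an arbitrary mechanism for $Y$) shows $X$ and $Z$ essentially uncorrelated marginally but clearly correlated once we condition on $Y$:
rng = np.random.default_rng(1)
n = 200_000
X = rng.binomial(1, 0.5, size=n)
Z = rng.binomial(1, 0.5, size=n)
# Collider: Y depends on both X and Z (hypothetical mechanism).
Y = rng.binomial(1, expit(2 * (X + Z) - 2))
print(np.corrcoef(X, Z)[0, 1])                  # approximately 0
print(np.corrcoef(X[Y == 1], Z[Y == 1])[0, 1])  # clearly negative ("explaining away")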
Let $X \in \{0, 1\}$, $Y \in \{0, 1\}$, $Z \in \{0, 1, 2\}$. Suppose the distribution of $(X, Y, Z)$ is Markov to:
$$X \longrightarrow Y \longrightarrow Z$$Create a joint distribution $f(x, y, z)$ that is Markov to this DAG. Generate 1000 random vectors from this distribution. Estimate the distribution from the data using maximum likelihood. Compare the estimated distribution to the true distribution. Let $\theta = (\theta_{000}, \theta_{001}, \dots, \theta_{112})$ where $\theta_{rst} = \mathbb{P}(X=r, Y=s, Z=t)$. Use the bootstrap to get standard errors and 95 percent confidence intervals for these 12 parameters.
Solution:
A joint density that is Markov to this DAG must factor as:
$$f(x, y, z) = f(x) f(y \mid x) f(z \mid y)$$We define the conditional probabilities:
df = pd.DataFrame(
[
["X", None, 0, None, 0.3],
["X", None, 1, None, 0.7],
["Y", "X", 0, 0, 0.1],
["Y", "X", 1, 0, 0.9],
["Y", "X", 0, 1, 0.75],
["Y", "X", 1, 1, 0.25],
["Z", "Y", 0, 0, 0.3],
["Z", "Y", 1, 0, 0.3],
["Z", "Y", 2, 0, 0.4],
["Z", "Y", 0, 1, 0.8],
["Z", "Y", 1, 1, 0.1],
["Z", "Y", 2, 1, 0.1],
],
columns=[
"Variable",
"Conditioning Variable",
"Value of Variable",
"Value of Conditioning Variable",
"Probability",
],
)
def get_expression(row):
if row["Conditioning Variable"] is None:
return f"P({row['Variable']} = {row['Value of Variable']})"
else:
return f"P({row['Variable']} = {row['Value of Variable']} | {row['Conditioning Variable']} = {int(row['Value of Conditioning Variable'])})"
df["Expression"] = df.apply(
lambda row: get_expression(row),
axis=1,
)
df.set_index("Expression", inplace=True)
print(df["Probability"])
Expression
P(X = 0)            0.30
P(X = 1)            0.70
P(Y = 0 | X = 0)    0.10
P(Y = 1 | X = 0)    0.90
P(Y = 0 | X = 1)    0.75
P(Y = 1 | X = 1)    0.25
P(Z = 0 | Y = 0)    0.30
P(Z = 1 | Y = 0)    0.30
P(Z = 2 | Y = 0)    0.40
P(Z = 0 | Y = 1)    0.80
P(Z = 1 | Y = 1)    0.10
P(Z = 2 | Y = 1)    0.10
Name: Probability, dtype: float64
def get_prob(var=None, val=None, cond_var=None, cond_val=np.nan):
return df.query(
f"Variable == @var & `Value of Variable` == @val & (`Value of Conditioning Variable` == @cond_val | `Value of Conditioning Variable`.isna())"
)["Probability"].values[0]
Generating samples:
import numpy as np
np.random.seed(0)
N = 1000
X = np.random.choice(
[0, 1], size=N, replace=True, p=[get_prob("X", k) for k in [0, 1]]
)
Y_given_X_0 = np.random.choice(
[0, 1], size=N, replace=True, p=[get_prob("Y", k, "X", 0) for k in [0, 1]]
)
Y_given_X_1 = np.random.choice(
[0, 1], size=N, replace=True, p=[get_prob("Y", k, "X", 1) for k in [0, 1]]
)
Y = np.where(X, Y_given_X_1, Y_given_X_0)
Z_given_Y_0 = np.random.choice(
[0, 1, 2],
size=N,
replace=True,
p=[get_prob("Z", k, "Y", 0) for k in [0, 1, 2]],
)
Z_given_Y_1 = np.random.choice(
[0, 1, 2],
size=N,
replace=True,
p=[get_prob("Z", k, "Y", 1) for k in [0, 1, 2]],
)
Z = np.where(Y, Z_given_Y_1, Z_given_Y_0)
Estimating the distribution using maximum likelihood:
import pandas as pd
data = {}
for var_name, var in list(zip(["X", "Y", "Z"], [X, Y, Z])):
ks = [0, 1]
if var_name == "X":
cond_var = None
elif var_name == "Y":
cond_var = X
cond_var_name = "X"
else:
cond_var = Y
cond_var_name = "Y"
ks = [0, 1, 2]
for k in ks:
if cond_var is None:
mle = (var == k).mean()
data.update({f"P({var_name} = {k})": mle})
else:
for c in [0, 1]:
mle = (var[np.where(cond_var == c)[0]] == k).mean()
data.update(
{f"P({var_name} = {k} | {cond_var_name} = {c})": mle}
)
mle_df = pd.DataFrame.from_dict(data, orient="index")
mle_df.columns = ["MLE"]
df["MLE"] = mle_df
df[["Probability", "MLE"]]
Expression | Probability | MLE |
---|---|---|
P(X = 0) | 0.30 | 0.307 |
P(X = 1) | 0.70 | 0.693 |
P(Y = 0 | X = 0) | 0.10 | 0.101 |
P(Y = 1 | X = 0) | 0.90 | 0.899 |
P(Y = 0 | X = 1) | 0.75 | 0.746 |
P(Y = 1 | X = 1) | 0.25 | 0.254 |
P(Z = 0 | Y = 0) | 0.30 | 0.319 |
P(Z = 1 | Y = 0) | 0.30 | 0.319 |
P(Z = 2 | Y = 0) | 0.40 | 0.361 |
P(Z = 0 | Y = 1) | 0.80 | 0.816 |
P(Z = 1 | Y = 1) | 0.10 | 0.100 |
P(Z = 2 | Y = 1) | 0.10 | 0.084 |
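For a direct comparison of the joint distributions, we can also check the empirical joint frequencies against the true $\theta_{rst}$ implied by the factorization (a small sketch reusing get_prob and the simulated X, Y, Z above):
theta_true = np.zeros((2, 2, 3))
theta_emp = np.zeros((2, 2, 3))
for r in [0, 1]:
    for s in [0, 1]:
        for t in [0, 1, 2]:
            # theta_rst = P(X = r) P(Y = s | X = r) P(Z = t | Y = s)
            theta_true[r, s, t] = (
                get_prob("X", r)
                * get_prob("Y", s, "X", r)
                * get_prob("Z", t, "Y", s)
            )
            theta_emp[r, s, t] = np.mean((X == r) & (Y == s) & (Z == t))
print(np.abs(theta_true - theta_emp).max())  # maximum absolute discrepancy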
Using the bootstrap to get standard errors and 95 percent confidence intervals for these 12 parameters:
def get_mle(X, Y, Z):
result = np.zeros((2, 2, 3))
for r in [0, 1]:
for s in [0, 1]:
for t in [0, 1, 2]:
result[r, s, t] = sum((X == r) & (Y == s) & (Z == t)) / N
return result
B = 1000
theta_hat_boot = np.zeros((B, 2, 2, 3))
for b in tqdm(range(B)):
sample_indices = np.random.choice(range(N), size=N, replace=True)
X_sample = X[sample_indices]
Y_sample = Y[sample_indices]
Z_sample = Z[sample_indices]
theta_hat_boot[b] = get_mle(X_sample, Y_sample, Z_sample)
se = theta_hat_boot.std(axis=0)
theta_hat = get_mle(X, Y, Z)
theta_hat = np.ravel(theta_hat)
se = np.ravel(se)
df = pd.DataFrame(
{"Estimate": theta_hat, "LB": theta_hat - 2 * se, "UB": theta_hat + 2 * se}
)
df.index = [
    # label order must match np.ravel's C order: r slowest, t fastest
    f"theta_{r}{s}{t}" for r in [0, 1] for s in [0, 1] for t in [0, 1, 2]
]
df
Parameter | Estimate | LB | UB |
---|---|---|---|
theta_000 | 0.005 | 5.276e-04 | 0.009 |
theta_001 | 0.014 | 6.682e-03 | 0.021 |
theta_002 | 0.012 | 5.152e-03 | 0.019 |
theta_010 | 0.222 | 1.947e-01 | 0.249 |
theta_011 | 0.028 | 1.769e-02 | 0.038 |
theta_012 | 0.026 | 1.565e-02 | 0.036 |
theta_100 | 0.170 | 1.458e-01 | 0.194 |
theta_101 | 0.161 | 1.377e-01 | 0.184 |
theta_102 | 0.186 | 1.614e-01 | 0.211 |
theta_110 | 0.147 | 1.252e-01 | 0.169 |
theta_111 | 0.017 | 8.811e-03 | 0.025 |
theta_112 | 0.012 | 4.972e-03 | 0.019 |
Consider the DAG in Figure 17.12:
g = Digraph()
for k in [1, 2, 3, 4]:
g.edge("X", f"Y_{k}")
g.edge(f"Z_{k}", f"Y_{k}")
g
(a) Write down the factorization of the joint density.
Solution:
By Definition 17.3:
$$f(x,y_1,y_2,y_3,y_4, z_1, z_2, z_3, z_4) = f(x) \prod_{k=1}^4 \left[ f(z_k) f(y_k \mid x, z_k) \right].$$(b) Prove that $X \amalg Z_j$.
Solution:
Observe that the only undirected path from $X$ to $Z_j$ is via $Y_j$, and that $Y_j$ is a collider. Therefore, $X$ and $Z_j$ are d-separated. Thus the result holds by Theorem 17.10.
Let $V = (X, Y, Z)$ have the following joint distribution
$$ \begin{align*} X &\sim \text{Bernoulli}\left( \frac{1}{2} \right) \\ Y \mid X = x &\sim \text{Bernoulli} \left( \frac{e^{4x-2}}{1 + e ^ {4x - 2}}\right) \\ Z \mid X = x, Y = y &\sim \text{Bernoulli} \left( \frac{e^{2(x + y)-2}}{1 + e ^ {2(x + y) - 2}} \right). \end{align*} $$(a) Find an expression for $\mathbb{P}(Z = z \mid Y = y)$. In particular, find $\mathbb{P}(Z = 1 \mid Y = 1)$.
(b) Write a program to simulate the model. Conduct a simulation and compute $\mathbb{P}(Z = 1 \mid Y = 1)$ empirically. Plot this as a function of the simulation size $N$. It should converge to the theoretical value you computed in (a).
(c) (Refers to the material in the appendix.) Write down an expression for $\mathbb{P}(Z = 1 \mid Y := y)$. In particular, find $\mathbb{P}(Z = 1 \mid Y := 1)$.
(d) (Refers to the material in the appendix.) Modify your program to simulate the intervention "set $Y = 1$." Conduct a simulation and compute $\mathbb{P}(Z = 1 \mid Y := 1)$ empirically. Plot this as a function of the simulation size $N$. It should converge to the theoretical value you computed in (c).
Solution:
(a) Let $p_x = \text{expit}(x) = \exp(x) / (1 + \exp(x))$, and let $\alpha = p_2$ and $\beta = p_{-2}$. Then,
$$ \begin{align*} \mathbb{P}(X = x \mid Y = y) = \begin{cases} \alpha & y = x \\ \beta & y \ne x \end{cases} \end{align*} $$and
$$ \begin{align*} \mathbb{P}(Z = z \mid Y = y, X = x) = \begin{cases} \alpha & x = y = z\\ \beta & x = y, x \ne z \\ p_0 = 1/2 & y \ne x. \\ \end{cases} \end{align*} $$The probability function has the form $f(x, y, z) = f(x) f(y \mid x) f(z \mid x, y)$. Therefore,
$$ \begin{align*} \mathbb{P}(Z = z \mid Y = y) &= \frac{\mathbb{P}(Y = y, Z = z)}{\mathbb{P}(Y = y)} = \frac{f(y, z)}{f(y)} \\ &= \frac{\sum_x f(x, y, z)}{f(y)} = \frac{\sum_x f(x) f(y \mid x) f(z \mid x, y)}{f(y)} \\ &= \sum_x f(z \mid x, y) \frac{f(y \mid x) f(x)}{f(y)} = \sum_x f(z \mid x, y) \frac{f(x,y)}{f(y)} \\ &= \sum_x f(z \mid x, y) f(x \mid y) \end{align*} $$Thus,
$$ \begin{align*} \mathbb{P}(Z = z \mid Y = y) &= \sum_{x=0}^1 \mathbb{P}(Z = z \mid X = x, Y = y) \mathbb{P}(X = x \mid Y = y) \\ &= \begin{cases} \alpha^2 + \beta / 2 & z = 0, y = 0 \\ \alpha \beta + \beta / 2 & z = 1, y = 0 \\ \alpha \beta + \beta / 2 & z = 0, y = 1 \\ \alpha ^ 2 + \beta / 2 & z = 1, y = 1 \\ \end{cases} \\ &= \begin{cases} \alpha ^ 2 + \beta / 2 \approx 0.835 & z = y \\ \alpha \beta + \beta / 2 \approx 0.165 & z \ne y \\ \end{cases} \end{align*} $$So $\mathbb{P}(Z = 1 \mid Y = 1) \approx 0.835$.
(b) Conducting the simulation:
Ns = np.logspace(1, 6, 100, dtype=int)
theoretical_value = expit(2) * expit(2) + expit(0) * expit(-2)
empirical_values = {}
for N in Ns:
np.random.seed(0)
X = np.random.choice([0, 1], size=N, replace=True)
Y = np.where(np.random.uniform(size=N) < 1 / (1 + np.exp(2 - 4 * X)), 1, 0)
Z = np.where(
np.random.uniform(size=N) < 1 / (1 + np.exp(2 - 2 * (X + Y))), 1, 0
)
empirical_values[N] = Z[np.where(Y == 1)[0]].mean()
plt.plot(*zip(*sorted(empirical_values.items())), label="Empirical Value")
plt.hlines(
y=theoretical_value,
xmin=min(Ns),
xmax=max(Ns),
color="orange",
label="Theoretical Value",
)
plt.xscale("log")
plt.xlabel("N")
plt.grid()
plt.legend()
plt.title("P(Z = 1 | Y = 1)")
plt.show()
(c) We are now setting $Y = y$. This changes the joint probability to
$$f^*(x, z) = f(x) f(z \mid x, y)$$Therefore, the derivation of $\mathbb{P}(Z = z \mid Y := y)$ proceeds as in part (a), but yields:
$$ \begin{align*} \mathbb{P}(Z = z \mid Y := y) &= \sum_{x=0}^1 \mathbb{P}(Z = z \mid X = x, Y = y) P(X = x) \\ &= \begin{cases} \alpha / 2 + 1 / 4 & z = y \\ \beta / 2 + 1 / 4 & z \ne y \end{cases} \end{align*} $$Therefore,
$$ \begin{align*} \mathbb{P}(Z = 1 \mid Y := 1) &= \alpha / 2 + 1 / 4 \\ &\approx 0.690. \end{align*} $$(d) Modifying the simulation to force $Y = 1$:
Ns = np.logspace(1, 6, 100, dtype=int)
theoretical_value = expit(0) * expit(0) + expit(0) * expit(2)
empirical_values = {}
for N in Ns:
np.random.seed(0)
X = np.random.choice([0, 1], size=N, replace=True)
# Y = np.where(np.random.uniform(size=N) < 1 / (1 + np.exp(2 - 4 * X)), 1, 0)
Y = np.ones(shape=N)
Z = np.where(
np.random.uniform(size=N) < 1 / (1 + np.exp(2 - 2 * (X + Y))), 1, 0
)
empirical_values[N] = Z[np.where(Y == 1)[0]].mean()
plt.plot(*zip(*sorted(empirical_values.items())), label="Empirical Value")
plt.hlines(
y=theoretical_value,
xmin=min(Ns),
xmax=max(Ns),
color="orange",
label="Theoretical Value",
)
plt.xscale("log")
plt.xlabel("N")
plt.grid()
plt.legend()
plt.title("P(Z = 1 | Y := 1)")
plt.show()
This is a continuous, Gaussian version of the last question. Let $V = (X, Y, Z)$ have the following joint distribution
$$ \begin{align*} X &\sim \text{Normal}(0, 1) \\ Y \mid X = x &\sim \text{Normal}(\alpha x, 1) \\ Z \mid X = x, Y = y &\sim \text{Normal}(\beta y + \gamma x, 1) \end{align*} $$Here, $\alpha$, $\beta$ and $\gamma$ are fixed parameters. Economists refer to models like this as structural equation models.
(a) Find an explicit expression for $f(z \mid y)$ and $\mathbb{E}(Z \mid Y = y) = \int z f(z \mid y) \, dz$.
(b) (Refers to material in the appendix.) Find an explicit expression for $f(z \mid Y := y)$ and then find $\mathbb{E}(Z \mid Y := y) \equiv \int z f(z \mid Y := y) \, dz$. Compare to (a).
(c) Find the joint distribution of $(Y, Z)$. Find the correlation $\rho$ between $Y$ and $Z$.
(d) (Refers to material in the appendix.) Suppose that $X$ is not observed and we try to make causal conclusions from the marginal distribution of $(Y, Z)$. (Think of $X$ as unobserved confounding variables.) In particular, suppose we declare that $Y$ causes $Z$ if $\rho \ne 0$ and we declare $Y$ does not cause $Z$ if $\rho = 0$. Show that this will lead to erroneous conclusions.
(e) (Refers to material in the appendix.) Suppose we conduct a randomized experiment in which $Y$ is randomly assigned. To be concrete, suppose that
$$ \begin{align*} X & \sim \text{Normal}(0, 1) \\ Y & \sim \text{Normal}(\alpha, 1) \\ Z \mid X = x, Y = y & \sim \text{Normal}(\beta y + \gamma x, 1). \end{align*} $$Show that the method in (d) now yields correct conclusions (i.e., $\rho = 0$ if and only if $f(z \mid Y := y)$ does not depend on $y$).
Solution:
We first identify the joint distribution $f(x,y)$:
$$ \begin{align*} f(x, y) &= f(y \mid x) f(x) \\ &= \frac{1}{\sqrt{2 \pi}} \exp \left\{- \frac12 (y - \alpha x) ^ 2 \right\} \cdot \frac{1}{\sqrt{2 \pi}} \exp \left\{- \frac12 x ^ 2 \right\} \\ &= \frac{1}{2 \pi} \exp \left\{-\frac12 \left[ (y - \alpha x) ^2 + x^2 \right]\right\} \end{align*} $$Then $f(x,y,z)$:
$$ \begin{align*} f(x, y, z) &= f(z \mid x, y) f(x, y) \\ &= \frac{1}{\sqrt{2 \pi}} \exp \left\{- \frac{1}{2}\left(z - \beta y - \gamma x \right)^2 \right\} \cdot \frac{1}{2 \pi} \exp \left\{-\frac12 \left[ (y - \alpha x) ^2 + x^2 \right]\right\} \\ &= (2 \pi) ^ {-3 / 2} \exp \left\{- \frac{1}{2} \left[ \left(z - \beta y - \gamma x \right)^2 + \left[ (y - \alpha x) ^2 + x^2 \right] \right]\right\} \tag{*}\\ &= (2 \pi) ^ {-3 / 2} \left(\det \Sigma \right)^{-1/2} \exp \left\{- \frac{1}{2} (\mathbf{u} - \bm{\mu})^T \Sigma^{-1} (\mathbf{u} - \bm{\mu}) \right\} \end{align*} $$where
$$ \mathbf{u} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \qquad \bm{\mu} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \qquad \text{and} \qquad \Sigma = \begin{bmatrix} 1 & \alpha & \alpha \beta + \gamma\\\alpha & \alpha^{2} + 1 & \alpha^{2} \beta + \alpha \gamma + \beta\\\alpha \beta + \gamma & \alpha^{2} \beta + \alpha \gamma + \beta & \alpha^{2} \beta^{2} + 2 \alpha \beta \gamma + \beta^{2} + \gamma^{2} + 1 \end{bmatrix} $$We establish that $\bm{\mu}$ is the mean of the joint distribution by computing the marginal means: $\mathbb{E}[X] = 0$, $\mathbb{E}[Y] = \alpha \mathbb{E}[X] = 0$, and $\mathbb{E}[Z] = \beta \mathbb{E}[Y] + \gamma \mathbb{E}[X] = 0$.
We may show $\Sigma$ is the covariance matrix by expanding $\left(z - \beta y - \gamma x \right)^2 + (y - \alpha x) ^2 + x^2 = \sum_{i} c_{ii} u_i^2 + \sum_{i < j} c_{ij} u_i u_j$, where $u_1 = x$, $u_2 = y$, and $u_3 = z$. Defining the matrix $\Sigma^{-1}$ by
$$ \Sigma_{ij}^{-1} = \begin{cases} c_{ii} & i = j \\ c_{ij} / 2 & i \ne j, \end{cases} $$we may observe that $\left(z - \beta y - \gamma x \right)^2 + (y - \alpha x) ^2 + x^2 = (\mathbf{u} - \bm{\mu})^T \Sigma^{-1} (\mathbf{u} - \bm{\mu})$. $\Sigma$ is obtained by computing the inverse.
We may perform the algebra using sympy:
import sympy as sp
from IPython.display import display, Latex
x, y, z, alpha, beta, gamma = sp.symbols("x y z \\alpha \\beta \\gamma")
expression = (z - (beta * y + gamma * x)) ** 2 + (
(y - alpha * x) ** 2 + x**2
)
print("Original Expression:")
display(expression)
expanded = sp.collect(
sp.expand(expression), syms=[x, y, z, x * y, x * z, y * z]
)
print("Expanded Expression:")
display(expanded)
matrix = sp.Matrix(
[
[alpha**2 + gamma**2 + 1, (beta * gamma - alpha), -gamma],
[(beta * gamma - alpha), beta**2 + 1, -beta],
[-gamma, -beta, 1],
]
)
display(Latex("$\Sigma^{-1}$:"))
display(matrix)
display(Latex("$\Sigma$:"))
display(matrix.inv())
Original Expression:
Expanded Expression:
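As a numerical sanity check of $\Sigma$ (a sketch with arbitrarily chosen parameter values), the sample covariance matrix of draws simulated from the structural equations should approximate the matrix above:
# Arbitrary (hypothetical) values for alpha, beta, gamma.
a, b, c = 0.5, 1.0, -0.7
rng = np.random.default_rng(0)
n = 500_000
Xs = rng.normal(size=n)
Ys = a * Xs + rng.normal(size=n)
Zs = b * Ys + c * Xs + rng.normal(size=n)
Sigma_theory = np.array(
    [
        [1, a, a * b + c],
        [a, a**2 + 1, a**2 * b + a * c + b],
        [a * b + c, a**2 * b + a * c + b, a**2 * b**2 + 2 * a * b * c + b**2 + c**2 + 1],
    ]
)
print(np.round(np.cov(np.vstack([Xs, Ys, Zs])), 3))  # empirical covariance
print(np.round(Sigma_theory, 3))                     # theoretical covariance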
By Theorem 2.44(1), the marginal distribution of $(Y, Z)$ is $(Y, Z) \sim N(\mu_{Y, Z}, \Sigma_{Y,Z})$, where:
$$ \mu_{Y, Z} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad \text{and} \qquad \Sigma_{Y,Z} = \begin{bmatrix} \alpha^{2} + 1 & \alpha^{2} \beta + \alpha \gamma + \beta\\ \alpha^{2} \beta + \alpha \gamma + \beta & \alpha^{2} \beta^{2} + 2 \alpha \beta \gamma + \beta^{2} + \gamma^{2} + 1 \end{bmatrix}. $$Then, by Theorem 2.44(2), the conditional distribution $Z \mid Y$ is $Z \mid Y \sim N(\mu_{Z \mid Y}, \Sigma_{Z \mid Y})$, where
$$ \begin{align*} \mu_{Z \mid Y} &= \mu_Z + \Sigma_{ZY}\Sigma_{YY}^{-1}(y - \mu_Y) \\ &= 0 + \frac{\alpha ^2 \beta + \alpha \gamma + \beta}{\alpha ^ 2 + 1}(y - 0) \\ &= \frac{\alpha (\alpha \beta + \gamma) + \beta}{\alpha^2 + 1} y \end{align*} $$and
$$ \begin{align*} \Sigma_{Z \mid Y} &= \Sigma_{ZZ} - \Sigma_{ZY} \Sigma_{YY}^{-1} \Sigma_{YZ} \\ &= \alpha^{2} \beta^{2} + 2 \alpha \beta \gamma + \beta^{2} + \gamma^{2} + 1 - (\alpha^{2} \beta + \alpha \gamma + \beta)^2 / (1 + \alpha^2) \\ &= \frac{\alpha ^2 + \gamma ^ 2 + 1}{\alpha ^ 2 + 1} \end{align*} $$We can verify the last simplification with sympy:
alpha, beta, gamma = sp.symbols("\\alpha \\beta \\gamma")
expression = (
(alpha**2) * (beta**2)
+ 2 * alpha * beta * gamma
+ beta**2
+ gamma**2
+ 1
- (((alpha**2) * beta + alpha * gamma + beta) ** 2 / (1 + alpha**2))
)
sp.simplify(expression)
The PDF is then:
$$ \begin{align*} f_{Z \mid Y}(z \mid y) = \frac{1}{\sqrt{2 \pi \Sigma_{Z \mid Y}}} \exp \left\{-\frac{\left(z - \mu_{Z \mid Y}\right)^2}{2 \Sigma_{Z \mid Y}} \right\}, \end{align*} $$and the mean is
$$\mu_{Z \mid Y} = \frac{\alpha (\alpha \beta + \gamma) + \beta}{\alpha^2 + 1} y.$$(b)
In this case, after setting $Y := y$, the post-intervention joint density of $(X, Z)$ is given by
$$ \begin{align*} f^*(x, z) &= f(z \mid x, y) f(x) \\ &= \frac{1}{\sqrt{2 \pi}} \exp \left\{-\frac{1}{2}\left(z - \beta y - \gamma x \right)^2 \right\} \cdot \frac{1}{\sqrt{2 \pi}} \exp \left\{- \frac{x^2}{2} \right\} \\ &= \frac{1}{2 \pi} \exp \left\{-\frac{1}{2}\left(z - \beta y - \gamma x \right)^2 - \frac{x^2}{2} \right\} \\ &= \frac{1}{2 \pi} \left( \det \Sigma_{X,Z} \right)^{-1 / 2} \exp \left\{-\frac{1}{2}(\bm{u} - \mu_{X, Z})^T \Sigma_{X,Z}^{-1} (\bm{u} - \mu_{X, Z}) \right\} \\ \end{align*} $$where
$$ \bm{u} = \begin{bmatrix} x \\ z \end{bmatrix} \qquad \qquad \mu_{X,Z} = \begin{bmatrix} 0 \\ \beta y \end{bmatrix} \qquad \text{and} \qquad \Sigma_{X,Z} = \begin{bmatrix} 1 & \gamma \\ \gamma & \gamma^2 + 1 \end{bmatrix}. $$This is established as follows. First, $\mu_{X, Z}$ is determined by computing the marginal means: $\mathbb{E}[X] = 0$ and $\mathbb{E}[Z] = \beta y$. Then, $\Sigma_{X,Z}$ is established by equating coefficients from the expanded vectorized expression
$$ \begin{align*} (\bm{u} - \mu_{X, Z})^T \Sigma_{X,Z}^{-1} (\bm{u} - \mu_{X, Z}) &= \Sigma_{X,Z;xx}^{-1} x^2 + 2\Sigma_{X,Z;xz}^{-1} x (z - \beta y) + \Sigma_{X,Z;zz}^{-1}(z - \beta y)^2 \end{align*} $$and the expanded original expression:
$$ \begin{align*} \left(z - \beta y - \gamma x \right)^2 + x^2 &= (\gamma ^2 + 1) x^2 - 2 \gamma x(z - \beta y) + (z - \beta y)^2. \end{align*} $$This yields
$$ \begin{align*} \Sigma_{X,Z}^{-1} = \begin{bmatrix} \gamma^2 + 1 & -\gamma \\ -\gamma & 1 \end{bmatrix}, \end{align*} $$and $\Sigma_{X,Z}$ is found by taking the inverse.
Under the intervention $Y := y$, the marginal distribution of $Z$ is $N(\beta y, \gamma^2 + 1)$, so $\mathbb{E}(Z \mid Y := y) = \beta y$. This matches the result in (a) when $\alpha = 0$; that is, setting $Y := y$ is equivalent to removing the influence of $X$ on $Y$.
(c) Computed in (a), $(Y, Z)$ is $(Y, Z) \sim N(\mu_{Y, Z}, \Sigma_{Y,Z})$, where:
$$ \mu_{Y, Z} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad \text{and} \qquad \Sigma_{Y,Z} = \begin{bmatrix} \alpha^{2} + 1 & \alpha^{2} \beta + \alpha \gamma + \beta\\ \alpha^{2} \beta + \alpha \gamma + \beta & \alpha^{2} \beta^{2} + 2 \alpha \beta \gamma + \beta^{2} + \gamma^{2} + 1 \end{bmatrix}. $$From the covariance matrix, we get the correlation:
$$ \rho = \frac{\text{Cov}(Y,Z)}{\sigma_Y \sigma_Z} = \frac{\alpha^{2} \beta + \alpha \gamma + \beta}{\sqrt{(\alpha^{2} + 1)(\alpha^{2} \beta^{2} + 2 \alpha \beta \gamma + \beta^{2} + \gamma^{2} + 1)}}$$(d) Suppose $\alpha > 0$, $\beta = 0$ and $\gamma > 0$. Then $\text{Cov}(Y, Z) = \alpha \gamma > 0$, so $\rho > 0$ and the rule declares that $Y$ causes $Z$. However, since $\beta = 0$, $f(z \mid Y := y) = N(0, \gamma^2 + 1)$ does not depend on $y$: $Y$ has no influence on $Z$, so the conclusion is erroneous.
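To make (d) concrete, here is a small simulation sketch with hypothetical values $\alpha = 1$, $\beta = 0$, $\gamma = 1$: observationally $Y$ and $Z$ are clearly correlated, yet intervening on $Y$ leaves the distribution of $Z$ unchanged.
a, b, c = 1.0, 0.0, 1.0  # alpha, beta = 0 (no causal effect of Y on Z), gamma
rng = np.random.default_rng(0)
n = 500_000
Xs = rng.normal(size=n)
Ys = a * Xs + rng.normal(size=n)
Zs = b * Ys + c * Xs + rng.normal(size=n)
# Observational correlation: about alpha * gamma / sqrt((alpha^2 + 1)(gamma^2 + 1)) = 0.5 here.
print(round(np.corrcoef(Ys, Zs)[0, 1], 3))
# Intervention Y := y: overwrite Y but keep the structural equation for Z.
for y in [-2.0, 0.0, 2.0]:
    Z_do = b * y + c * Xs + rng.normal(size=n)
    print(y, round(Z_do.mean(), 3))  # mean of Z stays near 0 for every y, since beta = 0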
(e) In this case,
$$ \begin{align*} f(x, y, z) &= \frac{1}{\sqrt{2 \pi}} \exp\left\{-\frac{x^2}{2}\right\}\frac{1}{\sqrt{2 \pi}} \exp\left\{-\frac{(y - \alpha)^2}{2}\right\}\frac{1}{\sqrt{2 \pi}} \exp\left\{-\frac{(z - \beta y - \gamma x)^2}{2}\right\} \\ &= (2 \pi)^{-3/2} \exp \left\{-\frac{1}{2} \left( x^2 + (y - \alpha)^2 + (z - \beta y - \gamma x)^2\right) \right\} \end{align*} $$The marginal means are $\mathbb{E}[X] = 0$, $\mathbb{E}[Y] = \alpha$, and $\mathbb{E}[Z] = \alpha \beta$. Expanding the original and vectorized expressions:
$$ \begin{align*} (\bm{u} - \bm{\mu})^T \Sigma^{-1} (\bm{u} - \bm{\mu}) &= \Sigma_{xx}^{-1} x^2 + \Sigma_{xy}^{-1} 2x(y - \alpha) + \Sigma_{xz}^{-1} 2x(z - \alpha \beta) + \Sigma_{yy}^{-1} (y - \alpha)^2 + \Sigma_{yz}^{-1} 2(y - \alpha)(z - \alpha \beta) + \Sigma_{zz}^{-1} (z - \alpha \beta)^2 \end{align*} $$
x, y, z, alpha, beta, gamma = sp.symbols("x y z \\alpha \\beta \\gamma")
expr = (x**2) + (y - alpha) ** 2 + (z - beta * y - gamma * x) ** 2
sp.collect(
sp.expand(expr), [x, y, z, x * y, x * z, y * z, x**2, y**2, z**2]
)
and then matching coefficients of monomials $x^2, y^2, z^2, xy, xz, yz$, yields the precision matrix:
$$ \Sigma^{-1} = \begin{bmatrix} \gamma^2 + 1 & \beta \gamma & -\gamma \\ \beta \gamma & \beta^2 + 1 & -\beta \\ -\gamma & -\beta & 1\\ \end{bmatrix} $$with inverse:
$$ \Sigma = \begin{bmatrix} 1 & 0 & \gamma \\ 0 & 1 & \beta \\ \gamma & \beta & 1 + \beta ^ 2 + \gamma ^ 2\\ \end{bmatrix} $$Therefore,
$$ \begin{align*} f(x, y, z) &= (2 \pi)^{-3/2} \left( \det \Sigma \right)^{-1/2} \exp \left\{-\frac{1}{2}(\bm{u} - \mu)^T \Sigma^{-1} (\bm{u} - \mu) \right\} \\ \end{align*} $$where
$$ \bm{u} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \qquad \text{and} \qquad \mu = \begin{bmatrix} 0 \\ \alpha \\ \alpha \beta \end{bmatrix} $$By Theorem 2.44(1), the marginal $(Y,Z)$ is distributed as $N(\mu_{Y,Z}, \Sigma_{Y,Z})$, where:
$$ \mu_{Y,Z} = \begin{bmatrix} \alpha \\ \alpha \beta \end{bmatrix} \qquad \text{and} \qquad \Sigma_{Y,Z} = \begin{bmatrix} 1 & \beta \\ \beta & 1 + \beta ^ 2 + \gamma ^ 2\\ \end{bmatrix} $$From the covariance matrix, we get the correlation:
$$\rho = \frac{\text{Cov}(Y,Z)}{\sigma_Y \sigma_Z} = \frac{\beta}{\sqrt{1 + \beta^2 + \gamma^2}}.$$Observe that $\rho = 0 \iff \beta = 0$. Moreover, under the intervention, $Z \mid Y := y \sim N(\beta y, \gamma^2 + 1)$, so $f(z \mid Y := y)$ does not depend on $y$ if and only if $\beta = 0$. Hence $\rho = 0$ if and only if $f(z \mid Y := y)$ does not depend on $y$, and the method in (d) now yields correct conclusions.
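Finally, a brief simulation sketch of the randomized design in (e), again with hypothetical parameter values, confirms that the observational correlation now tracks the causal effect: it vanishes exactly when $\beta = 0$.
a, c = 0.5, -0.7  # alpha and gamma (arbitrary); compare beta = 0 versus beta = 1
rng = np.random.default_rng(0)
n = 500_000
for b in [0.0, 1.0]:
    Xs = rng.normal(size=n)
    Ys = a + rng.normal(size=n)  # Y is randomized: it no longer depends on X
    Zs = b * Ys + c * Xs + rng.normal(size=n)
    # Expected correlation: beta / sqrt(1 + beta^2 + gamma^2), i.e. 0 when beta = 0.
    print(b, round(np.corrcoef(Ys, Zs)[0, 1], 3))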