import math
from functools import lru_cache
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy.stats import chi2, norm
from scipy.optimize import fsolve
plt.style.use("default.mplstyle")
Prove Theorem 15.2
The following statements are equivalent:

1. $Y \amalg Z$.
2. $\psi = 1$.
3. $\gamma = 0$.
4. $p_{ij} = p_{i\cdot}p_{\cdot j}$ for all $i, j \in \{0, 1\}$.

Here $Y$ and $Z$ are both binary random variables, $A \amalg B$ means the random variables $A$ and $B$ are independent, $\psi$ is the odds ratio
$$\psi = \frac{p_{00}p_{11}}{p_{01}p_{10}},$$where $p_{ij} = \mathbb{P}(Z = i, Y = j)$, $\gamma = \log(\psi)$ is the log odds ratio, and dotted subscripts denote summation over the corresponding indices.
Solution:
$(1) \Rightarrow (2)$: Suppose $Y \amalg Z$. Then $p_{ij} = \mathbb{P}(Z = i, Y=j) = \mathbb{P}(Z=i)\mathbb{P}(Y=j)$. Thus,
\begin{align*} \psi &= \frac{p_{00}p_{11}}{p_{01}p_{10}} \\ &= \frac{ \mathbb{P}(Z = 0, Y=0) \mathbb{P}(Z = 1, Y=1)}{ \mathbb{P}(Z = 0, Y=1) \mathbb{P}(Z = 1, Y=0)} \\ &= \frac{ \mathbb{P}(Z = 0) \mathbb{P}(Y=0) \mathbb{P}(Z = 1) \mathbb{P}(Y = 1)}{\mathbb{P}(Z = 0)\mathbb{P}(Y = 1)\mathbb{P}(Z = 1)\mathbb{P}(Y = 0)} = 1. \end{align*}$(2) \Rightarrow (1)$:
\begin{align*} \psi &= \frac{\text{odds}(Y \mid Z)}{\text{odds}(Y \mid Z^c)} = 1 \\ &\Rightarrow \text{odds}(Y \mid Z) = \text{odds}(Y \mid Z^c) \\ &\Rightarrow \mathbb{P}(Y \mid Z) = \mathbb{P}(Y \mid Z^c) \Rightarrow Y \amalg Z. \end{align*}The second implication holds because the odds $p/(1-p)$ is a strictly increasing function of $p$. For the last implication, note that if $\mathbb{P}(Y \mid Z) = \mathbb{P}(Y \mid Z^c)$,
\begin{align*} \mathbb{P}(Y)\mathbb{P}(Z) &= [\mathbb{P}(Y \mid Z)\mathbb{P}(Z) + \mathbb{P}(Y \mid Z^c)\mathbb{P}(Z^c)]\mathbb{P}(Z) \\ &= \mathbb{P}(Y \mid Z)[\mathbb{P}(Z) + \mathbb{P}(Z^c)]\mathbb{P}(Z) \\ &= \mathbb{P}(Y \mid Z)\mathbb{P}(Z) = \mathbb{P}(Y, Z), \end{align*}and independence of the events $\{Y = 1\}$ and $\{Z = 1\}$ is equivalent to independence of the binary random variables $Y$ and $Z$. $(1) \Rightarrow (4)$: Let $i,j \in \{0, 1\}$. Then,
\begin{align*} p_{i\cdot}p_{\cdot j} &= (p_{i0} + p_{i1})(p_{0j} + p_{1j}) \\ &= p_{i0}p_{0j} + p_{i0}p_{1j} + p_{i1}p_{0j} + p_{i1}p_{1j} \\ &= \mathbb{P}(Z = i) \mathbb{P}(Y = 0)\mathbb{P}(Z = 0)\mathbb{P}(Y = j) + \mathbb{P}(Z = i)\mathbb{P}(Y = 0)\mathbb{P}(Z = 1)\mathbb{P}(Y = j) \\ &+ \mathbb{P}(Z = i)\mathbb{P}(Y = 1)\mathbb{P}(Z = 0)\mathbb{P}(Y = j) + \mathbb{P}(Z = i)\mathbb{P}(Y = 1)\mathbb{P}(Z = 1)\mathbb{P}(Y = j) \\ &= \mathbb{P}(Z = i)\mathbb{P}(Y = j)\left[\mathbb{P}(Z = 0)\mathbb{P}(Y = 0) + \mathbb{P}(Z = 0)\mathbb{P}(Y = 1) +\mathbb{P}(Z = 1)\mathbb{P}(Y = 0) + \mathbb{P}(Z = 1)\mathbb{P}(Y = 1)\right] \\ &= p_{ij}[p_{00} + p_{01} + p_{10} + p_{11}] = p_{ij}. \end{align*}$(4) \Rightarrow (1)$:
\begin{align*} \mathbb{P}(Z = i, Y = j) &= p_{ij} \\ &= p_{i\cdot}p_{\cdot j} \tag{4} \\ &= (p_{i0} + p_{i1})(p_{0j} + p_{1j}) \\ &= p_{i0}p_{0j} + p_{i0}p_{1j} + p_{i1}p_{0j} + p_{i1}p_{1j} \\ &= [p_{i0}\mathbb{P}(Z = 0 \mid Y = j) + p_{i0}\mathbb{P}(Z = 1 \mid Y = j) \\ &+ p_{i1}\mathbb{P}(Z = 0 \mid Y = j) + p_{i1}\mathbb{P}(Z = 1 \mid Y = j)] \mathbb{P}(Y = j) \\ &= [p_{i0} + p_{i1}]\mathbb{P}(Y = j) \\ &= \mathbb{P}(Z = i)\mathbb{P}(Y = j) \end{align*}$(2) \iff (3)$: Follows from $\log(1) = 0$.
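As a quick numerical sanity check (my addition, not part of the proof), a joint distribution built as an outer product of arbitrary marginals is independent by construction, so all four conditions should hold simultaneously:
# Sanity check of Theorem 15.2 on a hypothetical independent joint distribution.
# p[i, j] = P(Z = i, Y = j) is the outer product of the marginals, so Y and Z are independent by construction.
p_z = np.array([0.3, 0.7])
p_y = np.array([0.6, 0.4])
p = np.outer(p_z, p_y)
psi = (p[0, 0] * p[1, 1]) / (p[0, 1] * p[1, 0])
print(f"psi = {psi:.6f}, gamma = {np.log(psi):.6f}")  # expect psi = 1 and gamma = 0
print(np.allclose(p, np.outer(p.sum(axis=1), p.sum(axis=0))))  # condition (4)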
Prove Theorem 15.3:
Consider testing
$$H_0: Y \amalg Z \text{ versus } H_1 : Y \not\amalg Z.$$The likelihood ratio test statistic for (15.3) is
$$T = 2 \sum_{i=0}^1 \sum_{j=0}^1 X_{ij} \log \left( \frac{X_{ij}X_{\cdot\cdot}}{X_{i\cdot}X_{\cdot j}}\right).$$Under $H_0$, $T \leadsto \chi_1^2$. Thus, an approximate level $\alpha$ test is obtained by rejecting $H_0$ when $T > \chi_{1, \alpha}^2$.
Solution:
Recall the likelihood ratio statistic. We consider testing
$$H_0: \theta \in \Theta_0 \text{ versus } H_1: \theta \notin \Theta_0$$The likelihood ratio statistic is
$$T = 2 \log \left( \frac{\mathcal{L}(\hat{\theta})}{\mathcal{L}(\hat{\theta}_0)} \right) = 2 \left[\mathcal{l}(\hat{\theta}) - \mathcal{l}(\hat{\theta}_0) \right]$$where $\hat{\theta}$ is the MLE, and $\hat{\theta}_0$ is the MLE restricted to $\Theta_0$.
In our case, the unrestricted MLE is $\hat{p}_{ij} = X_{ij} / n$, and $\Theta_0$ is the set of $p$ satisfying $p_{ij} = p_{i\cdot}p_{\cdot j}$ for all $i, j$. Under $H_0$ the model is parameterized by the marginal probabilities, whose MLEs are $X_{i\cdot}/n$ and $X_{\cdot j}/n$, so the restricted MLE is $\hat{p}_{ij, 0} = \hat{p}_{i \cdot} \hat{p}_{\cdot j} = \frac{X_{i \cdot}}{n} \frac{X_{\cdot j}}{n}$.
We have
$\mathcal{L}(p) \propto \prod_{i=0}^1 \prod_{j=0}^1 p_{ij}^{X_{ij}},$
and so
$\mathcal{l}(p) = \sum_{i=0}^1 \sum_{j=0}^1 X_{ij} \log(p_{ij})$
therefore,
\begin{align*} T &= 2 \left[\sum_{i=0}^1 \sum_{j=0}^1 X_{ij} \log \left(\frac{X_{ij}}{n}\right) - \sum_{i=0}^1 \sum_{j=0}^1 X_{ij} \log\left(\frac{X_{i \cdot}}{n} \frac{X_{\cdot j}}{n}\right) \right] \\ &=2 \left[\sum_{i=0}^1 \sum_{j=0}^1 X_{ij} \log \left( \frac{X_{ij}X_{\cdot \cdot}}{X_{i \cdot}X_{\cdot j}}\right) \right], \end{align*}using $X_{\cdot \cdot} = n$ in the last step. Observe that there are three free parameters under $H_1$ (after choosing $p_{00}$, $p_{01}$, and $p_{10}$, $p_{11}$ is uniquely determined), but only two free parameters under $H_0$ (after choosing $p_{0 \cdot}$ and $p_{\cdot 0}$, independence determines all $p_{ij}$). By Theorem 10.22, under $H_0$ the statistic $T$ therefore converges in distribution to a $\chi^2$ with $3 - 2 = 1$ degree of freedom.
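The $\chi_1^2$ limit can also be checked by simulation. The following sketch is my addition (not part of the solution): it draws multinomial counts from an arbitrary independent $p$ and compares the empirical 95th percentile of $T$ with $\chi^2_{1, 0.05}$, using the convention $0 \log 0 = 0$ for empty cells.
# Monte Carlo check that T is approximately chi^2_1 under H_0 (independence).
rng = np.random.default_rng(0)
p0 = np.outer([0.4, 0.6], [0.7, 0.3])  # an independent joint distribution
n_sim, n_obs = 5000, 1000
T_sim = np.empty(n_sim)
for s in range(n_sim):
    X_sim = rng.multinomial(n_obs, p0.ravel()).reshape(2, 2)
    expected = np.outer(X_sim.sum(axis=1), X_sim.sum(axis=0)) / n_obs
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(X_sim > 0, X_sim * np.log(X_sim / expected), 0.0)
    T_sim[s] = 2 * terms.sum()
print(f"Empirical 95th percentile of T: {np.quantile(T_sim, 0.95):.3f}")
print(f"chi^2_1 95th percentile:        {chi2.ppf(0.95, df=1):.3f}")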
Prove Theorem 15.6.
The MLE's of $\psi$ and $\gamma$ are
$$\hat{\psi} = \frac{X_{00}X_{11}}{X_{01}X_{10}}, \quad \hat{\gamma} = \log \hat{\psi}.$$The asymptotic standard errors (computed using the delta method) are
$$\hat{\text{se}}(\hat{\gamma}) = \sqrt{\frac{1}{X_{00}} + \frac{1}{X_{01}} + \frac{1}{X_{10}} + \frac{1}{X_{11}}}$$$$\hat{\text{se}}(\hat{\psi}) = \hat{\psi}\hat{\text{se}}(\hat{\gamma})$$Solution:
That $\hat{\psi}$ and $\hat{\gamma}$ are the MLEs of $\psi$ and $\gamma$ follows from the equivariance of the MLE and the fact that the MLE of $p_{ij}$ is $X_{ij} / n$.
We now use the delta method to derive the standard errors. We first compute the information:
\begin{align*} \mathcal{L}(p) &\propto \prod_{i=0}^1\prod_{j=0}^1 p_{ij}^{X_{ij}} \\ \Rightarrow \mathcal{l}(p) &= \sum_{i=0}^1\sum_{j=0}^1 X_{ij} \log(p_{ij}) \\ \Rightarrow \nabla \mathcal{l}(p) &= \begin{bmatrix} X_{00} / p_{00} \\ X_{01} / p_{01} \\ X_{10} / p_{10} \\ X_{11} / p_{11} \\ \end{bmatrix} \\ \Rightarrow \nabla^2 \mathcal{l}(p) &= - \begin{bmatrix} X_{00} / p_{00}^2 & 0 & 0 & 0\\ 0 & X_{01} / p_{01}^2 & 0 & 0 \\ 0 & 0 & X_{10} / p_{10}^2 & 0 \\ 0 & 0 & 0 & X_{11} / p_{11}^2 \\ \end{bmatrix} \\ \Rightarrow I_n(p) &= - \mathbb{E}(\nabla^2 \mathcal{l}(p)) = n \begin{bmatrix} 1 / p_{00} & 0 & 0 & 0\\ 0 & 1 / p_{01} & 0 & 0 \\ 0 & 0 & 1 / p_{10} & 0 \\ 0 & 0 & 0 & 1 / p_{11} \\ \end{bmatrix}, \\ \end{align*}where the last step uses $\mathbb{E}[X_{ij}] = np_{ij}$. The inverse of the information is:
$$ J_n(p) = \frac{1}{n} \begin{bmatrix} p_{00} & 0 & 0 & 0\\ 0 & p_{01} & 0 & 0 \\ 0 & 0 & p_{10} & 0 \\ 0 & 0 & 0 & p_{11} \\ \end{bmatrix} $$whence
$$ \hat{J}_n = J_n(\hat{p}) = \frac{1}{n^2} \begin{bmatrix} X_{00} & 0 & 0 & 0\\ 0 & X_{01} & 0 & 0 \\ 0 & 0 & X_{10} & 0 \\ 0 & 0 & 0 & X_{11} \\ \end{bmatrix}. $$Let $g(p) = \gamma = \log\left(\frac{p_{00}p_{11}}{p_{01}p_{10}}\right) = \log p_{00} + \log p_{11} - \log p_{01} - \log p_{10}$. Then
\begin{align*} \nabla g(p) = \begin{bmatrix} 1 / p_{00} \\ -1 / p_{01} \\ -1 / p_{10} \\ 1 / p_{11} \end{bmatrix} \end{align*}whence
\begin{align*} \hat{\nabla}g = \nabla g(\hat{p}) = n \begin{bmatrix} 1 / X_{00} \\ -1 / X_{01} \\ -1 / X_{10} \\ 1 / X_{11} \end{bmatrix} \end{align*}Then
\begin{align*} \hat{\text{se}}(\hat{\gamma}) &= \sqrt{(\hat{\nabla}g)^T \hat{J}_n (\hat{\nabla}g)} \\ \Rightarrow \hat{\text{se}}(\hat{\gamma}) &= \sqrt{ \frac{1}{X_{00}} + \frac{1}{X_{01}} + \frac{1}{X_{10}} + \frac{1}{X_{11}}}. \end{align*}Meanwhile, if $g(p) = \psi = \frac{p_{00}p_{11}}{p_{01}p_{10}}$, then
\begin{align*} \nabla g(p) = \begin{bmatrix} \frac{p_{11}}{p_{01}p_{10}} \\ -\frac{p_{00}p_{11}}{p_{01}^2p_{10}} \\ -\frac{p_{00}p_{11}}{p_{01}p_{10}^2} \\ \frac{p_{00}}{p_{01}p_{10}} \\ \end{bmatrix} \Rightarrow \hat{\nabla}g = n \begin{bmatrix} \frac{X_{11}}{X_{01}X_{10}} \\ -\frac{X_{00}X_{11}}{X_{01}^2X_{10}} \\ -\frac{X_{00}X_{11}}{X_{01}X_{10}^2} \\ \frac{X_{00}}{X_{01}X_{10}} \\ \end{bmatrix} \end{align*}Therefore,
\begin{align*} (\hat{\nabla}g)^T \hat{J}_n \hat{\nabla}g = \hat{\psi}^2 \left(\frac{1}{X_{00}} + \frac{1}{X_{01}} + \frac{1}{X_{10}} + \frac{1}{X_{11}}\right), \end{align*}and thus
$$\hat{\text{se}}(\hat{\psi}) = \hat{\psi}\hat{\text{se}}(\hat{\gamma}).$$The New York Times (January 8, 2003, page A12) reported the following data on death sentencing and race, from a study in Maryland:
| | Death Sentence | No Death Sentence |
|---|---|---|
| Black Victim | 14 | 641 |
| White Victim | 62 | 594 |
Analyze the data using the tools from this chapter. Interpret the results. Explain why, based on this information, you can't make causal conclusions. (The authors of the study did use much more information in their full report.)
data = {"Black Victim": {"Death Sentence": 14, "No Death Sentence": 641},
"White Victim": {"Death Sentence": 62, "No Death Sentence": 594}}
data = pd.DataFrame(data).transpose()
X = data.values
Each of the following tests for association conclusively rejects the null hypothesis of independence, reporting a $p$-value very near zero.
The likelihood ratio test:
T = 0
for i in [0, 1]:
for j in [0, 1]:
T += X[i,j] * math.log((X[i, j] * sum(sum(X))) / (sum(X[i,:]) * sum(X[:, j])))
T *= 2
z = chi2.ppf(1 - 0.05, df=1)
print(f"95% quantile for chi-squared distribution with df=1: {z:3f}")
print(f"Likelihood ratio test statistic: {T:.3f}")
print(f"p-value: {1 - chi2.cdf(T, df=1):.12f}")
95% quantile for chi-squared distribution with df=1: 3.841459 Likelihood ratio test statistic: 34.534 p-value: 0.000000004190
Pearson's $\chi^2$ test:
n = sum(sum(X))
U = 0
for i in [0, 1]:
for j in [0, 1]:
E = sum(X[i, :]) * sum(X[:, j]) / n
U += ((X[i,j] - E) ** 2) / E
print(f"Pearson's chi^2 test statistic: {U:.3f}")
print(f"p-value: {1 - chi2.cdf(U, df=1):.12f}")
Pearson's chi^2 test statistic: 32.104 p-value: 0.000000014616
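As a cross-check (my addition, assuming scipy.stats.chi2_contingency is available), SciPy reproduces both statistics directly from the table; correction=False disables the Yates continuity correction so the numbers match the hand computations, and lambda_="log-likelihood" switches from Pearson's statistic to the likelihood ratio ($G$) statistic.
# Cross-check with scipy's contingency-table test (no continuity correction,
# so the statistics match the hand-rolled computations above).
from scipy.stats import chi2_contingency
U_scipy, p_pearson, dof, expected = chi2_contingency(X, correction=False)
T_scipy, p_lrt, _, _ = chi2_contingency(X, correction=False, lambda_="log-likelihood")
print(f"Pearson statistic: {U_scipy:.3f} (p-value: {p_pearson:.2e}), df = {dof}")
print(f"LRT (G) statistic: {T_scipy:.3f} (p-value: {p_lrt:.2e})")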
Wald test for $\gamma = 0$:
psi_hat = (X[0, 0] * X[1, 1]) / (X[0, 1] * X[1, 0])
gamma_hat = math.log(psi_hat)
print(f"psi_hat: {psi_hat:.3f}")
print(f"gamma_hat: {gamma_hat:.3f}")
# estimated standard error of gamma_hat
se_gamma_hat = math.sqrt((1 / X[0, 0]) + (1 / X[0, 1]) + (1 / X[1, 0]) + (1 / X[1, 1]))
print(f"se_gamma_hat: {se_gamma_hat:.3f}")
W = gamma_hat / se_gamma_hat
p_value = 2 * (1 - norm.cdf(np.abs(W)))  # two-sided Wald test
print(f"p-value: {p_value:.9f}")
z = norm.ppf(1 - 0.05 / 2)
print(f"95% C.I. for gamma: ({gamma_hat - z * se_gamma_hat:.3f}, {gamma_hat + z * se_gamma_hat:.3f})")
print(f"95% C.I. for psi: ({np.exp(gamma_hat - z * se_gamma_hat):.3f}, {np.exp(gamma_hat + z * se_gamma_hat):.3f})")
psi_hat: 0.209 gamma_hat: -1.564 se_gamma_hat: 0.301 p-value: 0.000000209 95% C.I. for gamma: (-2.155, -0.974) 95% C.I. for psi: (0.116, 0.378)
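As a sanity check on the delta-method standard error (my addition, not part of the original solution), a parametric bootstrap from Multinomial$(n, \hat{p})$ should give a similar spread for $\hat{\gamma}$:
# Parametric bootstrap check of the delta-method standard error of gamma_hat.
rng = np.random.default_rng(0)
n = X.sum()
p_hat = X / n
B = 10_000
gamma_boot = np.full(B, np.nan)
for b in range(B):
    Xb = rng.multinomial(n, p_hat.ravel()).reshape(2, 2)
    if (Xb == 0).any():  # skip the (rare) resamples with an empty cell
        continue
    gamma_boot[b] = np.log((Xb[0, 0] * Xb[1, 1]) / (Xb[0, 1] * Xb[1, 0]))
print(f"Parametric bootstrap se of gamma_hat: {np.nanstd(gamma_boot):.3f}")
print(f"Delta-method se of gamma_hat:         {se_gamma_hat:.3f}")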
We cannot make causal conclusions from these results because these tests only demonstrate evidence of association, and association does not imply causation. For example, it may be the case that a confounding variable in the study accounts for the observed association.
Analyze the data on the variables Age and Financial Status from:
http://lib.stat.cmu.edu/DASL/Datafiles/montanadat.html
Solution:
The data come from a 1992 poll of roughly 200 Montana residents regarding their financial status compared to a year prior. The respondents' ages are also included.
The information is encoded in the following way (from the file):
AGE = 1 under 35, 2 35-54, 3 55 and over
FIN = Financial status 1 worse, 2 same, 3 better than a year ago
!curl -s --insecure https://www.stat.cmu.edu/~larry/all-of-statistics/=data/montana.dat --output data/montana.dat
data = pd.read_csv(
"data/montana.dat",
skiprows=33,
sep="\s+",
names="AGE SEX INC POL AREA FIN STAT".split(),
usecols=["AGE", "FIN"],
).query("AGE != '*' & FIN != '*'")
data.head()
| | AGE | FIN |
|---|---|---|
| 0 | 3 | 2 |
| 1 | 2 | 3 |
| 2 | 1 | 2 |
| 3 | 3 | 1 |
| 4 | 3 | 2 |
Both of our tests reject the null hypothesis of independence:
X = data.groupby(["AGE", "FIN"]).size().values.reshape((3,3))
I, J = X.shape
# computing the likelihood ratio test statistic
T = 0
for i in range(I):
for j in range(J):
ratio = (X[i, j] * np.sum(X)) / (np.sum(X[i, :]) * np.sum(X[:, j]))
T += X[i, j] * math.log(ratio)
T *= 2
print(f"T: {T:.3f}")
nu = (I - 1) * (J - 1)
print(f"p-value: {1 - chi2.cdf(T, df=nu):.6f}")
T: 22.064 p-value: 0.000195
# Computing Pearson's chi^2 test statistic
n = np.sum(X)  # total number of respondents in this table
U = 0
for i in range(I):
    for j in range(J):
        E = np.sum(X[i, :]) * np.sum(X[:, j]) / n
        U += ((X[i,j] - E) ** 2) / E
print(f"U: {U:.3f}")
nu = (I - 1) * (J - 1)
print(f"p-value: {1 - chi2.cdf(U, df=nu):.6f}")
U: 1060.653 p-value: 0.000000
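As an independent cross-check (my addition, again assuming scipy.stats.chi2_contingency is available), SciPy computes the expected counts internally from the table itself, which makes it a convenient check on the hand computation:
# Cross-check: expected counts are computed internally from X.
from scipy.stats import chi2_contingency
U_scipy, p_pearson, dof, _ = chi2_contingency(X, correction=False)
T_scipy, p_lrt, _, _ = chi2_contingency(X, correction=False, lambda_="log-likelihood")
print(f"Pearson statistic: {U_scipy:.3f} (p-value: {p_pearson:.6f}), df = {dof}")
print(f"LRT (G) statistic: {T_scipy:.3f} (p-value: {p_lrt:.6f})")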
We can conclude there is an association between a respondent's age and their assessment of their financial status compared to a year prior.
Estimate the correlation between temperature and latitude using the data from
http://lib.stat.cmu.edu/DASL/Datafiles/USTemperatures.html
Use the correlation coefficient. Provide estimates, tests, and confidence intervals.
!curl -s --insecure https://www.stat.cmu.edu/~larry/all-of-statistics/=data/temp.dat --output data/temp.dat
data = pd.read_csv("data/temp.dat",
skiprows=31,
sep="\t")
data.head()
| | City | JanTemp | Lat | Long |
|---|---|---|---|---|
| 0 | Mobile, AL | 44 | 31.2 | 88.5 |
| 1 | Montgomery, AL | 38 | 32.9 | 86.8 |
| 2 | Phoenix, AZ | 35 | 33.6 | 112.5 |
| 3 | Little Rock, AR | 31 | 35.4 | 92.8 |
| 4 | Los Angeles, CA | 47 | 34.3 | 118.7 |
# computing the sample correlation using the nonparametric plug-in estimator
Y = data["JanTemp"].values
Z = data["Lat"].values
X = np.stack([Y, Z], axis=1)
Y_hat = np.mean(Y)
Z_hat = np.mean(Z)
n = len(Y)
s_Y = np.sqrt(np.power(np.linalg.norm(Y - Y_hat), 2) / (n - 1))
s_Z = np.sqrt(np.power(np.linalg.norm(Z - Z_hat), 2) / (n - 1))
rho_hat = np.sum((Y - Y_hat) * (Z - Z_hat)) / ((n-1) * s_Y * s_Z)
print(f"Sample correlation: {rho_hat:.3f}")
Sample correlation: -0.848
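As a cross-check (my addition), NumPy's built-in correlation should agree exactly, since the $(n-1)$ factors cancel in the ratio:
# Cross-check: np.corrcoef returns the same Pearson sample correlation.
print(f"np.corrcoef: {np.corrcoef(Y, Z)[0, 1]:.3f}")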
def get_rho_hat(X):
mu_hat = np.mean(X, axis=0)
s_1 = np.sqrt(np.power(np.linalg.norm(X[:, 0] - mu_hat[0]), 2) / (n - 1))
s_2 = np.sqrt(np.power(np.linalg.norm(X[:, 1] - mu_hat[1]), 2) / (n - 1))
sample_cov = np.sum((X[:, 0] - mu_hat[0]) * (X[:, 1] - mu_hat[1]))
return sample_cov / ((n - 1) * s_1 * s_2)
def get_bootstrap_ci(X):
rho_hat = get_rho_hat(X)
B = 1000
T = np.empty(B)
for i in range(B):
X_star = X[np.random.choice(X.shape[0], size=n, replace=True)]
T[i] = get_rho_hat(X_star)
se_boot = np.sqrt(np.var(T))
lb = rho_hat - z * se_boot
ub = rho_hat + z * se_boot
return se_boot, (lb, ub)
z = norm.ppf(1 - 0.05 / 2)
se_boot, ci_boot = get_bootstrap_ci(X)
print(f"Bootstrap 95% CI: ({ci_boot[0]:.3f}, {ci_boot[1]:.3f})")
def get_fishers_method_ci(X):
rho_hat = get_rho_hat(X)
theta_hat = (1 / 2) * (math.log(1 + rho_hat) - math.log(1 - rho_hat))
se_theta_hat = 1 / math.sqrt(n - 3)
a = theta_hat - z * se_theta_hat
b = theta_hat + z * se_theta_hat
lb = (math.exp(2 * a) - 1) / (math.exp(2 * a) + 1)
ub = (math.exp(2 * b) - 1) / (math.exp(2 * b) + 1)
return (lb, ub)
ci_fisher = get_fishers_method_ci(X)
print(f"Fisher's Method 95% CI: ({ci_fisher[0]:.3f}, {ci_fisher[1]:.3f})")
Bootstrap 95% CI: (-0.965, -0.732) Fisher's Method 95% CI: (-0.908, -0.753)
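For comparison (my addition, assuming scipy.stats.pearsonr is available), SciPy's pearsonr reports the classical $t$-based test of zero correlation; on recent SciPy versions (1.9+) the returned result object also exposes a Fisher-$z$ confidence interval.
# Cross-check with scipy's Pearson correlation test.
from scipy.stats import pearsonr
r, p = pearsonr(Y, Z)
print(f"r = {r:.3f}, p-value = {p:.2e}")
# On SciPy >= 1.9, the result object also provides a Fisher-z confidence interval:
# pearsonr(Y, Z).confidence_interval(confidence_level=0.95)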
# Wald test using the standard error from the bootstrap
W = rho_hat / se_boot
p_value = 2 * norm.cdf(-abs(W))  # two-sided
print("p-value: ", p_value)
p-value: 2.2416470099624988e-179
We can conclude that the two variables are correlated; since a nonzero correlation implies dependence (no normality assumption is needed for that direction), we can also conclude that they are dependent.
Test whether calcium intake and drop in blood pressure are associated, using the data loaded below.
!curl -s --insecure https://www.stat.cmu.edu/~larry/all-of-statistics/=data/calcium.dat --output data/calcium.dat
data = pd.read_csv("data/calcium.dat",
skiprows=32,
sep="\t",
usecols=["Treatment", "Decrease"])
data
| | Treatment | Decrease |
|---|---|---|
| 0 | Calcium | 7 |
| 1 | Calcium | -4 |
| 2 | Calcium | 18 |
| 3 | Calcium | 17 |
| 4 | Calcium | -3 |
| 5 | Calcium | -5 |
| 6 | Calcium | 1 |
| 7 | Calcium | 10 |
| 8 | Calcium | 11 |
| 9 | Calcium | -2 |
| 10 | Placebo | -1 |
| 11 | Placebo | 12 |
| 12 | Placebo | -1 |
| 13 | Placebo | -3 |
| 14 | Placebo | 3 |
| 15 | Placebo | -5 |
| 16 | Placebo | 5 |
| 17 | Placebo | 2 |
| 18 | Placebo | -11 |
| 19 | Placebo | -1 |
| 20 | Placebo | -3 |
Y = data['Treatment'].map({'Placebo': 1, 'Calcium': 2}).values
Z = data['Decrease'].values
n_1 = sum(Y == 1)
n_2 = sum(Y == 2)
def F_1(z):
    # empirical CDF of Decrease within the placebo group (Y == 1)
    return (1 / n_1) * np.sum((Z < z) * (Y == 1))
def F_2(z):
    # empirical CDF of Decrease within the calcium group (Y == 2)
    return (1 / n_2) * np.sum((Z < z) * (Y == 2))
zs = np.arange(-10, 20, .1)
plt.plot(zs, [F_1(z) for z in zs], label='F_1(z)')
plt.plot(zs, [F_2(z) for z in zs], label='F_2(z)')
plt.legend()
plt.title("CDFs")
plt.show()
D = max([abs(F_1(z) - F_2(z)) for z in zs])
test_statistic = math.sqrt((n_1 * n_2) / (n_1 + n_2)) * D
print(f"D: {D:.3f}")
print(f"Test statistic: {test_statistic:.3f}")
# approximating H(t) to a specified tolerance
def H(t, xtol=1e-8):
j_max = int(np.ceil(np.sqrt(- np.log(xtol / 2) / (2 * t * t) )))
jj = np.arange(1, j_max+1)
return 1 + 2 * sum((-1) ** (jj) * np.exp(-2 * (jj ** 2) * t * t))
# computing H^{-1}(x)
@lru_cache(maxsize=None)
def H_inv(x):
return fsolve(lambda t: H(t) - x, x0=1)[0]
D: 0.409 Test statistic: 0.936
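As a cross-check (my addition, assuming scipy.stats.ks_2samp is available), SciPy's two-sample Kolmogorov–Smirnov test should reproduce $D$ (it uses the usual $\leq$ convention in the empirical CDFs, which gives the same supremum here) and also reports a p-value:
# Cross-check with scipy's two-sample Kolmogorov-Smirnov test.
from scipy.stats import ks_2samp
ks = ks_2samp(Z[Y == 1], Z[Y == 2])  # placebo group vs. calcium group
print(f"D = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")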
T = np.logspace(-3, 0.25, 100)
plt.plot(T, list(map(H, T)))
plt.xlabel("t")
plt.ylabel("H(t)")
plt.show()
X = np.linspace(0.01, 0.99, 99)
plt.plot(X, list(map(H_inv, X)))
plt.xlabel("$1-a$")
plt.ylabel("$H^{-1}(1-a)$")
plt.hlines(y=test_statistic, xmin=min(X), xmax=max(X), color='orange', label="Test statistic")
plt.title("$H^{-1}(1-a)$")
plt.legend()
plt.show()
Because the test statistic does not exceed $H^{-1}(0.95)$, the size $0.05$ test does not reject $H_0$: the data do not provide significant evidence of an association between calcium intake and the drop in blood pressure.
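Equivalently (my addition), an approximate p-value can be read off from the limiting distribution coded above as $1 - H(\text{test statistic})$; it is well above $0.05$, consistent with not rejecting $H_0$.
# Approximate p-value from the limiting Kolmogorov distribution H.
p_value = 1 - H(test_statistic)
print(f"Approximate p-value: {p_value:.3f}")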