import math
import numpy as np
import pandas as pd
from numpy.random import default_rng
rng = default_rng(42)
import matplotlib as mpl
import plotly.graph_objects as go
from matplotlib import pyplot as plt
from scipy.stats import binom, chi2, norm
from tqdm.notebook import tqdm
mpl.rcParams["font.size"] = 18
Prove Theorem 10.6
Suppose the true value of $\theta$ is $\theta_* \ne \theta_0$. The power $\beta(\theta_*)$, the probability of correctly rejecting the null hypothesis, is given (approximately) by
$$1 - \Phi \left( \frac{\theta_0 - \theta_*}{\hat{se}} + z_{\alpha / 2} \right) + \Phi \left( \frac{\theta_0 - \theta_*}{\hat{se}} - z_{\alpha / 2} \right)$$Solution:
Observe $z_{\alpha / 2} \ge 0$. We have
\begin{align*} \beta(\theta_*) &= P_{\theta_*}(|W| > z_{\alpha / 2}) \\ &= P_{\theta_*}(W > z_{\alpha / 2}) + P_{\theta_*}(W < -z_{\alpha / 2}) \\ &= P_{\theta_*} \left( \frac{\hat{\theta} - \theta_0}{\hat{se}} > z_{\alpha / 2} \right) + P_{\theta_*}\left(\frac{\hat{\theta} - \theta_0}{\hat{se}} < -z_{\alpha / 2}\right) \\ &= P_{\theta_*} \left( \frac{\hat{\theta} - \theta_*}{\hat{se}} + \frac{\theta_* - \theta_0}{\hat{se}} > z_{\alpha / 2} \right) + P_{\theta_*}\left(\frac{\hat{\theta} - \theta_*}{\hat{se}} + \frac{\theta_* - \theta_0}{\hat{se}} < -z_{\alpha / 2}\right) \\ &= P_{\theta_*} \left( \frac{\hat{\theta} - \theta_*}{\hat{se}} > \frac{\theta_0 - \theta_*}{\hat{se}} + z_{\alpha / 2}\right) + P_{\theta_*}\left(\frac{\hat{\theta} - \theta_*}{\hat{se}} < \frac{\theta_0 - \theta_*}{\hat{se}} -z_{\alpha / 2} \right) \\ &\rightarrow P \left( Z > \frac{\theta_0 - \theta_*}{\hat{se}} + z_{\alpha / 2}\right) + P\left(Z < \frac{\theta_0 - \theta_*}{\hat{se}} - z_{\alpha / 2}\right) \\ &= 1 - \Phi\left( \frac{\theta_0 - \theta_*}{\hat{se}} + z_{\alpha / 2} \right) + \Phi \left( \frac{\theta_0 - \theta_*}{\hat{se}} - z_{\alpha / 2} \right) \end{align*}Prove Theorem 10.14:
If the test statistic has a continuous distribution, then under $H_0 : \theta = \theta_0$, the p-value has a Uniform (0,1) distribution. Therefore, if we reject $H_0$ when the p-value is less than $\alpha$, the probability of a Type I error is $\alpha$.
Solution:
Let $a \in (0,1)$. We have
\begin{align*} P(\text{p-value} \le a) &\approx P(2\Phi(-|w|) \le a) \tag{Theorem 10.13} \\ &= P\left(|w| \ge -\Phi ^{-1} \left(\frac{a}{2}\right)\right) \\ &= P\left(w \ge -\Phi ^{-1}\left(\frac{a}{2}\right)\right) + P\left(w \le \Phi^{-1} \left(\frac{a}{2}\right)\right) \\ &= \frac{a}{2} + \frac{a}{2} = a. \tag{$w \sim N(0,1)$} \end{align*}Since the p-value is a probability, it can only assume values in $[0,1]$. Therefore, if $a \le 0$, $P(\text{p-value} \le a) = 0$, and if $a \ge 1$, $P(\text{p-value} \le a) = 1$. Thus, the p-value has a uniform $(0,1)$ distribution.
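As a quick numerical illustration (not part of the proof), the following sketch simulates Wald-test p-values for a Normal mean under $H_0$ and checks that their distribution looks Uniform(0,1); the sample size and number of replications are arbitrary choices.
n_sim, reps, theta_0 = 50, 10000, 0.0
samples = rng.normal(loc=theta_0, scale=1.0, size=(reps, n_sim))
theta_hats = samples.mean(axis=1)
se_hats = samples.std(axis=1, ddof=1) / np.sqrt(n_sim)
w_stats = (theta_hats - theta_0) / se_hats
p_values = 2 * norm.cdf(-np.abs(w_stats))
# Under H_0, P(p-value <= a) should be close to a for every a in (0, 1).
for a in [0.05, 0.25, 0.5, 0.75]:
    print(f"P(p-value <= {a}): {np.mean(p_values <= a):.3f}")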
Prove Theorem 10.10.
The size $\alpha$ Wald test rejects $H_0 : \theta = \theta_0$ versus $H_1 : \theta \ne \theta_0$ if and only if $\theta_0 \notin C$ where
$$C = (\hat{\theta} - \hat{se} z_{\alpha / 2}, \hat{\theta} + \hat{se} z_{\alpha / 2}).$$Thus, testing the hypothesis is equivalent to checking whether the null value is in the confidence interval.
Solution:
The size $\alpha$ Wald test rejects $H_0$ exactly when $|W| \ge z_{\alpha / 2}$. We have the chain of equivalences
\begin{align*} &|W| \ge z_{\alpha / 2} \\ &\iff \left|\frac{\hat{\theta} - \theta_0}{\hat{se}} \right| \ge z_{\alpha / 2} \\ &\iff \left|\hat{\theta} - \theta_0 \right| \ge \hat{se}z_{\alpha / 2} \tag{$\hat{se} \ge 0$}\\ &\iff \theta_0 \notin C \end{align*}Prove Theorem 10.12:
Suppose that the size $\alpha$ test is of the form
$$\text{reject } H_0 \text{ if and only if } T(X^n) \ge c_{\alpha}.$$Then,
$$\text{p-value} = \sup_{\theta \in \Theta_0} P_{\theta}(T(X^n) \ge T(x^n))$$where $x^n$ is the observed value of $X^n$. If $\Theta_0 = \{\theta_0\}$ then
$$\text{p-value } = P_{\theta_0}(T(X^n) \ge T(x^n)).$$Solution:
From the definitions of p-value and $\alpha$, we have
$$\text{p-value} = \inf_{\alpha} \left\{ \sup_{\theta \in \Theta_0} P_{\theta}(T(X^n) \ge c_{\alpha}) \mid T(x^n) \ge c_{\alpha}\right\}.$$We may observe that $\sup_{\theta \in \Theta_0} P_{\theta}(T(X^n) \ge c_{\alpha})$ decreases as $c_{\alpha}$ increases, and, assuming there is an $\alpha$ for which $c_{\alpha} = T(x^n)$, the infimum is attained when $c_{\alpha} = T(x^n)$, yielding
$$\text{p-value} = \sup_{\theta \in \Theta_0} P_{\theta}(T(X^n) \ge T(x^n)).$$Let $X_1, \dots, X_n \sim \text{Uniform}(0, \theta)$ and let $Y = \max \{X_1, \dots, X_n\}$. We want to test
$$H_0 : \theta = 1/2 \text{ versus } H_1 : \theta > 1/2.$$The Wald test is not appropriate since $Y$ does not converge to a Normal. Suppose we decide to test this hypothesis by rejecting $H_0$ when $Y > c$.
(a) Find the power function.
Solution:
The power function is \begin{align*} \beta(\theta) &= P_{\theta}(Y > c) \\ &= P_{\theta}(\max \{X_1, \dots, X_n \} > c) \\ &= 1 - \prod_{i=1}^n P_{\theta}(X_i \le c) \\ &= \begin{cases} 0 & \theta \le c \\ 1 - \left(\frac{c}{\theta}\right)^n & \theta > c. \end{cases} \end{align*}
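As a quick sketch (the particular values of $c$ and $n$ below are illustrative only), the power function can be evaluated directly:
def uniform_max_power(theta, c, n):
    # Power of the test "reject H_0 when Y > c" when X_1, ..., X_n ~ Uniform(0, theta).
    return 0.0 if theta <= c else 1 - (c / theta) ** n
print([round(uniform_max_power(t, c=0.5, n=20), 3) for t in (0.4, 0.5, 0.6, 0.7)])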
(b) What choice of $c$ will make the size of the test .05?
Solution: Let $\theta_0 = .5$. Letting $c^*$ be the desired value, we have
\begin{align*} .05 &= \sup_{\theta \in \Theta_0} \beta(\theta) \\ &= \beta(\theta_0) \tag{$\Theta_0 = \{\theta_0\}$} \end{align*}meaning $c^* < \theta_0$, and
\begin{align*} .05 &= 1 - \left(\frac{c^*}{\theta_0}\right)^n \\ \Rightarrow c^* &= \theta_0 \cdot .95 ^{1/n} = 0.5 \cdot .95 ^{1/n} \end{align*}(c) In a sample of size $n=20$ with $Y=0.48$ what is the p-value? What conclusion about $H_0$ would you make?
Solution:
The p-value is the smallest test size such that the null hypothesis is rejected. In this case, the p-value is $\inf \{ \alpha : Y > c_{\alpha} \}$, where $c_{\alpha} = 0.5 \cdot (1-\alpha)^{1/n}$. Observe that as $\alpha$ decreases, $c_{\alpha}$ increases. Thus, the p-value is $\alpha^*$, where $c_{\alpha^*} = Y$, i.e.
$$0.48 = 0.5 \cdot (1-\alpha^*)^{1/20} \Rightarrow \alpha^* = 1-(0.48 / 0.5)^{20} \approx 0.558,$$meaning this test provides little or no evidence against the null hypothesis.
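A quick numerical check of parts (b) and (c), using the formulas derived above (the variable names are just for illustration):
n = 20
c_star = 0.5 * 0.95 ** (1 / n)  # size-0.05 cutoff from part (b)
p_value = 1 - (0.48 / 0.5) ** n  # p-value for Y = 0.48 from part (c)
print(f"c*: {c_star:.4f}, p-value for Y = 0.48: {p_value:.3f}")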
(d) In a sample of size $n = 20$ with $Y = 0.52$ what is the p-value? What conclusion about $H_0$ would you make?
Solution: Note that for all $\alpha$, $c_{\alpha} \le 0.5$. Therefore, we have $Y > c_{\alpha}$ for all $\alpha$. Since $\alpha \in [0,1]$, the smallest test size at which the null is rejected is the size 0 test, $Y > c_{0} = 0.5$. Thus, the p-value is 0: the observed $Y = 0.52$ is impossible under $H_0$ (which implies $Y \le 1/2$ with probability one), so we reject $H_0$.
There is a theory that people can postpone their death until after an important event. To test the theory, Phillips and King (1988) collected data on deaths around the Jewish holiday Passover. Of 1919 deaths, 922 died the week before the holiday and 997 died the week after. Think of this as a binomial and test the null hypothesis that $\theta = 1/2$. Report and interpret the p-value. Also construct a confidence interval for $\theta$.
Solution:
Let $a=997$ be the number of deaths after Passover, and $n=1919$ be the total number of deaths. The maximum likelihood estimator for $\theta$ is $\hat{\theta} = a / n$, with standard error $\hat{se} = \sqrt{\frac{\hat{\theta}(1-\hat{\theta})}{n}} \approx 0.011$, and the Wald test statistic is $W = \frac{\hat{\theta} - \theta_0}{\hat{se}}$. The p-value is $2\Phi(-|w|)$, where $w$ is the observed value of $W$.
theta_0 = 0.5 # parameter in null hypothesis
n = 1919
after = 997
theta_hat = after / n
se_hat = math.sqrt(theta_hat * (1 - theta_hat) / n)
w = (theta_hat - theta_0) / se_hat
p = 2 * norm.cdf(-np.abs(w))
print(f"p-value: {p:.3f}")
alpha = 0.05
z = norm.ppf(1 - alpha / 2)
print(
f"{100 * (1 - alpha):.0f}% Confidence Interval: ({theta_hat - z * se_hat:.3f}, {theta_hat + z * se_hat:.3f})"
)
p-value: 0.087 95% Confidence Interval: (0.497, 0.542)
Since the p-value falls between 0.05 and 0.10, we have weak evidence against $H_0$.
In 1861, 10 essays appeared in the New Orleans Daily Crescent. They were signed "Quintus Curtius Snodgrass" and some people suspected they were actually written by Mark Twain. To investigate this, we will consider the proportion of three-letter words found in an author's work. From eight Twain essays we have:
$$.225 \quad .262 \quad .217 \quad .240 \quad .230 \quad .229 \quad .235 \quad .217$$From 10 Snodgrass essays we have:
$$.209 \quad .205 \quad .196 \quad .210 \quad .202 \quad .207 \quad .224 \quad .223 \quad .220 \quad .201$$(a) Perform a Wald test for equality of the means. Use the nonparametric plug-in estimator. Report the p-value and a 95 percent confidence interval for the difference of means. What do you conclude?
Solution: (Following Example 10.8.) Let $\mu_1$ and $\mu_2$ be the mean proportions of three-letter words in Twain essays and Snodgrass essays, respectively. We test the null hypothesis that $\mu_1 = \mu_2$ by testing $H_0: \delta = 0$ versus $H_1 : \delta \ne 0$, where $\delta = \mu_1 - \mu_2$. The nonparametric plug-in estimate of $\delta$ is $\hat{\delta} = \bar{X} - \bar{Y}$ with estimated standard error
$$\hat{se} = \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$where $s_1^2$ and $s_2^2$ are sample variances.
twain = np.array([0.225, 0.262, 0.217, 0.240, 0.230, 0.229, 0.235, 0.217])
snodgrass = np.array(
[0.209, 0.205, 0.196, 0.210, 0.202, 0.207, 0.224, 0.223, 0.220, 0.201]
)
m = twain.size
n = snodgrass.size
delta_hat = np.mean(twain) - np.mean(snodgrass)
print(f"delta_hat: {delta_hat:.3f}")
s_1 = np.std(twain, ddof=1)
s_2 = np.std(snodgrass, ddof=1)
se_hat = math.sqrt((s_1**2) / m + (s_2**2) / n)
w = delta_hat / se_hat
p = 2 * norm.cdf(-np.abs(w))
print(f"p-value: {p:.4f}")
alpha = 0.05
z = norm.ppf(1 - alpha / 2)
print(
f"{100 * (1 - alpha):.0f}% Confidence Interval: ({delta_hat - z * se_hat:.3f}, {delta_hat + z * se_hat:.3f})"
)
delta_hat: 0.022 p-value: 0.0002 95% Confidence Interval: (0.010, 0.034)
The very small p-value indicates the data strongly supports the conclusion that the means are different. Snodgrass may be Twain, but according to this metric, the writing styles are different.
(b) Now use a permutation test to avoid the use of large sample methods. What is your conclusion?
def T(data):
return np.mean(data[:m]) - np.mean(data[m:])
data = np.concatenate([twain, snodgrass])
t_obs = T(data)
B = 100000 # number of permutations to sample
count = 0
for i in range(B):
permutation = rng.permutation(data)
t_test = T(permutation)
if t_test > t_obs:
count += 1
p = count / B
print(f"p-value: {p:.6f}")
p-value: 0.000440
The conclusion is the same as that for part (a).
Let $X_1, \dots, X_n \sim N(\theta, 1)$. Consider testing
$$H_0 : \theta = 0 \text{ versus } \theta = 1.$$Let the rejection region be $R = \{x^n : T(x^n) > c \}$ where $T(x^n) = n^{-1} \sum_{i=1}^n X_i$.
(a) Find $c$ so that the test has size $\alpha$.
Solution: Let $\theta_0 = 0$. Observe that, under the null hypothesis, $T(X^n) \sim N(0, \frac{1}{n})$. \begin{align*} \alpha &= P_{\theta_0}(T(X^n) > c^*) \\ &= P_{\theta_0} \left( \frac{T(X^n)}{\sqrt{1 / n}} > c^* / \sqrt{1/n} \right) \\ &= P(Z > \sqrt{n} c^*) \\ \Rightarrow c^* &= \frac{1}{\sqrt{n}} \Phi^{-1}(1-\alpha) = n^{-1/2}z_{\alpha} \end{align*}
(b) Find the power under $H_1$, that is, find $\beta(1)$.
Under the alternative hypothesis, $T(X^n) \sim N(1, \frac{1}{n})$.
\begin{align*} \beta(1) &= P_{\theta=1}\left(T(X^n) \ge c_{\alpha}\right) \\ &= P_{\theta=1}\left(\frac{T(X^n) - 1}{\frac{1}{\sqrt{n}}} \ge \frac{c_{\alpha} - 1}{\frac{1}{\sqrt{n}}}\right) \\ &= P(Z \ge \sqrt{n}(c_{\alpha} - 1)) \\ &= 1 - \Phi(z_{\alpha} - \sqrt{n}) \end{align*}(c) Show that $\beta(1) \rightarrow 1$ as $n \rightarrow \infty$.
Solution: As $n \rightarrow \infty$, $z_{\alpha} - \sqrt{n} \rightarrow -\infty$, $\Phi(z_{\alpha} - \sqrt{n}) \rightarrow 0$, and $\beta(1) = 1 - \Phi(z_{\alpha} - \sqrt{n}) \rightarrow 1$.
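A brief simulation sketch to check parts (a) and (b) numerically; the choices $n = 20$, $\alpha = 0.05$, and the number of replications are illustrative assumptions.
n, alpha, reps = 20, 0.05, 100000
c = norm.ppf(1 - alpha) / math.sqrt(n)  # cutoff from part (a)
T0 = rng.normal(loc=0, scale=1, size=(reps, n)).mean(axis=1)  # sample means under H_0
T1 = rng.normal(loc=1, scale=1, size=(reps, n)).mean(axis=1)  # sample means under H_1
print(f"empirical size: {np.mean(T0 > c):.4f} (nominal {alpha})")
print(f"empirical power: {np.mean(T1 > c):.4f} (formula {1 - norm.cdf(norm.ppf(1 - alpha) - math.sqrt(n)):.4f})")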
Let $\hat{\theta}$ be the MLE of a parameter $\theta$ and let $\hat{se} = \{nI(\hat{\theta})\}^{-1/2}$ where $I(\theta)$ is the Fisher information. Consider testing
$$H_0 : \theta = \theta_0 \text{ versus } \theta \ne \theta_0.$$Consider the Wald test with rejection region $R = \{x^n : |Z| > z_{\alpha / 2} \}$ where $Z = (\hat{\theta} - \theta_0)/\hat{se}$. Let $\theta_1 > \theta_0$ be some alternative. Show that $\beta(\theta_1) \rightarrow 1$.
Solution:
\begin{align*} \beta(\theta_1) &= P_{\theta_1} \left( |Z| > z_{\alpha / 2} \right) \\ &= P_{\theta_1} \left( Z > z_{\alpha / 2} \right) + P_{\theta_1} \left( Z < -z_{\alpha / 2} \right) \\ &= P_{\theta_1} \left( \frac{\hat{\theta} - \theta_1}{\hat{se}} + \frac{\theta_1 - \theta_0}{\hat{se}} > z_{\alpha / 2} \right) + P_{\theta_1} \left( \frac{\hat{\theta} - \theta_1}{\hat{se}} + \frac{\theta_1 - \theta_0}{\hat{se}} < -z_{\alpha / 2} \right) \\ &= P \left( Z' > z_{\alpha / 2} - \frac{\theta_1 - \theta_0}{\hat{se}}\right) + P \left(Z' < -z_{\alpha /2 } - \frac{\theta_1 - \theta_0}{\hat{se}} \right) \tag{$Z' \sim N(0,1)$} \\ &= 1 - \Phi \left(z_{\alpha / 2} - \frac{\theta_1 - \theta_0}{\hat{se}}\right) + \Phi \left(-z_{\alpha /2 } - \frac{\theta_1 - \theta_0}{\hat{se}} \right) \end{align*}Observe that $\hat{se} \rightarrow 0$ as $n \rightarrow \infty$, so $(\theta_1 - \theta_0)/\hat{se} \rightarrow \infty$ and both $\Phi$ terms tend to 0; hence $\beta(\theta_1) \rightarrow 1$.
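For intuition, here is a quick numerical sketch of the last display under an illustrative assumption: a Bernoulli-type parameter with $\hat{se} = \sqrt{\theta_1(1-\theta_1)/n}$, $\theta_0 = 0.5$, and $\theta_1 = 0.6$; the power climbs to 1 as $n$ grows.
theta_0, theta_1, alpha = 0.5, 0.6, 0.05
z = norm.ppf(1 - alpha / 2)
for n in [10, 100, 1000, 10000]:
    se = math.sqrt(theta_1 * (1 - theta_1) / n)  # illustrative standard error, shrinking with n
    power = 1 - norm.cdf(z - (theta_1 - theta_0) / se) + norm.cdf(-z - (theta_1 - theta_0) / se)
    print(f"n = {n:5d}: power = {power:.4f}")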
Here are the number of elderly Jewish and Chinese women who died just before and after the Chinese Harvest Moon Festival.
Week | Chinese | Jewish |
---|---|---|
-2 | 55 | 141 |
-1 | 33 | 145 |
1 | 70 | 139 |
2 | 49 | 161 |
Compare the two mortality patterns. (Phillips and Smith (1990)).
Solution:
We first organize the data into a `pd.DataFrame` object.
df = pd.DataFrame(
{
"Week": [-2, -1, 1, 2],
"Chinese": [55, 33, 70, 49],
"Jewish": [141, 145, 139, 161],
}
)
df.set_index("Week", inplace=True)
before = [-2, -1]
after = [1, 2]
chinese_before, jewish_before = df.loc[before].sum().values
chinese_after, jewish_after = df.loc[after].sum().values
We shall first test the hypothesis $H_0: \delta = 0$ against $H_1: \delta \neq 0$, where $\delta = p_1 - p_2$, and $p_1$ and $p_2$ are the probabilities that an elderly Chinese or Jewish woman (respectively), among those dying within two weeks of the festival, dies after the festival.
The MLE is $\hat{\delta} = \hat{p}_1 - \hat{p}_2$, with estimated standard error
$$\hat{se} = \sqrt{ \frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$$where $n_1$ and $n_2$ are the number of Chinese and Jewish women in the sample, respectively.
p_1_hat = chinese_after / (chinese_before + chinese_after)
p_2_hat = jewish_after / (jewish_before + jewish_after)
delta_hat = p_1_hat - p_2_hat
n_1 = df["Chinese"].sum()
n_2 = df["Jewish"].sum()
se_hat = math.sqrt((p_1_hat * (1 - p_1_hat) / n_1) + (p_2_hat * (1 - p_2_hat) / n_2))
w = delta_hat / se_hat
p = 2 * norm.cdf(-np.abs(w))
print(f"p-value: {p:.4f}")
alpha = 0.05
z = norm.ppf(1 - alpha / 2)
print(
f"{100 * (1 - alpha):.0f}% Confidence Interval: ({delta_hat - z * se_hat:.3f}, {delta_hat + z * se_hat:.3f})"
)
p-value: 0.1164 95% Confidence Interval: (-0.016, 0.142)
With this test, there is little or no evidence against $H_0$.
For both the Chinese and Jewish populations, we shall conduct a $\chi^2$ test to compare $H_0: p = p_0 = \left(\frac12, \frac12\right)^T$ against $H_1: p \ne p_0$, where the first and second elements of $p$ refer to the proportion of a population dying prior to and after the festival, respectively.
For the Chinese population:
mu_chinese_hat = (1 / 2) * n_1
t_pearson_chinese = (chinese_before - mu_chinese_hat) ** 2 / (mu_chinese_hat) + (
chinese_after - mu_chinese_hat
) ** 2 / (mu_chinese_hat)
critical_value = chi2.ppf(q=0.95, df=1)
p_value = 1 - chi2.cdf(t_pearson_chinese, 1)
print(f"Test Statistic: {t_pearson_chinese:.3f}")
print(
f"95th percentile for chi squared with one degree of freedom: {critical_value:.3f}"
)
print(f"p-value: {p_value:.3f}")
Test Statistic: 4.643 95th percentile for chi squared with one degree of freedom: 3.841 p-value: 0.031
For the Jewish population:
mu_jewish_hat = (1 / 2) * n_2
t_pearson_jewish = (jewish_before - mu_jewish_hat) ** 2 / (mu_jewish_hat) + (
jewish_after - mu_jewish_hat
) ** 2 / (mu_jewish_hat)
critical_value = chi2.ppf(q=0.95, df=1)
p_value = 1 - chi2.cdf(t_pearson_jewish, 1)
print(f"Test Statistic: {t_pearson_jewish:.3f}")
print(
f"95th percentile for chi squared with one degree of freedom: {critical_value:.3f}"
)
print(f"p-value: {p_value:.3f}")
Test Statistic: 0.334 95th percentile for chi squared with one degree of freedom: 3.841 p-value: 0.563
With this test, we have evidence that the probability of dying after the festival differs from the probability of dying before for Chinese women, but not for Jewish women.
We now perform a likelihood ratio test to compare mortality before and after the event for both groups:
 | Chinese | Jewish |
---|---|---|
Before | 88 | 286 |
After | 119 | 300 |
We test $H_0: \theta \in \Theta_0$ against $H_1: \theta \notin \Theta_0$, where $\theta$ is the vector of cell probabilities (Chinese before, Chinese after, Jewish before, Jewish after) and $\Theta_0$ requires the before and after probabilities to be equal within each group. The unrestricted MLE is
$$\hat{\theta} = \left( \frac{88}{793}, \frac{119}{793}, \frac{286}{793}, \frac{300}{793} \right)$$and the MLE under the null hypothesis is
$$\hat{\theta}_0 = \left( \frac{103.5}{793}, \frac{103.5}{793}, \frac{293}{793}, \frac{293}{793} \right).$$Since $\dim(\Theta) = 3$ and $\dim(\Theta_0) = 1$, Theorem 10.22 gives $\lambda \leadsto \chi^2_2$ under $H_0$.
theta_hat = np.array([88, 119, 286, 300]) / 793
theta_hat_0 = np.array([103.5, 103.5, 293, 293]) / 793
_lambda = 2 * np.sum(
[theta_hat[i] * 793 * math.log(theta_hat[i] / theta_hat_0[i]) for i in range(4)]
)
critical_value = chi2.ppf(q=0.95, df=2)  # df = dim(Theta) - dim(Theta_0) = 3 - 1 = 2
p_value = 1 - chi2.cdf(_lambda, df=2)
print(f"Likelihood Ratio Statistic: {_lambda:.3f}")
print(
f"95th percentile for chi squared with three degrees of freedom: {critical_value:.3f}"
)
print(f"p-value: {p_value:.3f}")
Likelihood Ratio Statistic: 4.995 95th percentile for chi squared with two degrees of freedom: 5.991 p-value: 0.082
With $p \approx 0.08$, this test provides at most weak evidence that the before and after mortality rates differ within either group.
We finally conduct simple binomial tests for each week:
from scipy.stats import binomtest
theta_0 = n_1 / (n_1 + n_2)
p_1 = binomtest(55, 55 + 141, theta_0, alternative="two-sided").pvalue
p_2 = binomtest(33, 33 + 145, theta_0, alternative="two-sided").pvalue
p_3 = binomtest(70, 70 + 139, theta_0, alternative="two-sided").pvalue
p_4 = binomtest(49, 49 + 161, theta_0, alternative="two-sided").pvalue
results = pd.DataFrame({"p-value": [p_1, p_2, p_3, p_4]}, index=[-2, -1, 1, 2])
results
Week | p-value |
---|---|
-2 | 0.516380 |
-1 | 0.021061 |
1 | 0.017937 |
2 | 0.388072 |
This test suggests strong evidence that, in the weeks immediately before and after the festival, the Chinese share of deaths differs from its overall proportion (lower in the week before, higher in the week after), but there is no such evidence for the other two weeks.
A randomized, double-blind experiment was conducted to assess the effectiveness of several drugs for reducing postoperative nausea. The data are as follows.
 | Number of Patients | Incidence of Nausea |
---|---|---|
Placebo | 80 | 45 |
Chlorpromazine | 75 | 26 |
Dimenhydrinate | 85 | 52 |
Pentobarbital (100 mg) | 67 | 35 |
Pentobarbital (150 mg) | 85 | 37 |
(a) Test each drug versus the placebo at the 5 per cent level. Also, report the estimated odds-ratios. Summarize your findings.
(b) Use the Bonferroni and the FDR method to adjust for multiple testing.
Solution:
df = pd.DataFrame(
data={
"Number of Patients": [80, 75, 85, 67, 85],
"Incidence of Nausea": [45, 26, 52, 35, 37],
},
index=[
"Placebo",
"Chlorpromazine",
"Dimenhydrinate",
"Pentobarbital (100 mg)",
"Pentobarbital (150 mg)",
],
)
alpha = 0.05
df['p_hat'] = df['Incidence of Nausea'] / df['Number of Patients']
p_0 = df.loc['Placebo']['p_hat']
n_0 = df.loc['Placebo']['Number of Patients']
df['se_hat'] = (df['p_hat'] * (1 - df['p_hat']) / df['Number of Patients'] + p_0 * (1 - p_0) / n_0) ** (1 / 2)
df['delta_hat'] = df['p_hat'] - p_0
df['w'] = df['delta_hat'] / df['se_hat']
df['p_value'] = 2 * norm.cdf(-abs(df['w']))
z = norm.ppf(1 - alpha / 2)
df['Reject Null'] = df['p_value'] <= alpha
df['Estimated Odds'] = df['Incidence of Nausea'] / (df['Number of Patients'] - df['Incidence of Nausea'])
df['Estimated Odds Ratio'] = df.loc['Placebo', 'Estimated Odds'] / df['Estimated Odds']  # placebo odds / treatment odds
df[['p_hat', 'p_value', 'Reject Null', 'Estimated Odds Ratio']]
 | p_hat | p_value | Reject Null | Estimated Odds Ratio |
---|---|---|---|---|
Placebo | 0.562500 | 1.000000 | False | 1.000000 |
Chlorpromazine | 0.346667 | 0.005703 | True | 2.423077 |
Dimenhydrinate | 0.611765 | 0.520232 | False | 0.815934 |
Pentobarbital (100 mg) | 0.522388 | 0.626664 | False | 1.175510 |
Pentobarbital (150 mg) | 0.435294 | 0.099639 | False | 1.667954 |
Chlorpromazine is the only treatment for which the data provide strong evidence of a difference in performance from the placebo ($p = 0.0057$). The estimated odds ratio of about 2.42 means the odds of nausea under Chlorpromazine are roughly 59% lower than under the placebo.
(b) To apply the Bonferroni Method, we reject the null hypothesis $H_{0i}$ if
$$P_i < \frac{\alpha}{m},$$where $P_i$ is the $p$-value of the $i$th test and $m$ is the number of tests. This ensures the probability of falsely rejecting any null hypothesis is at most $\alpha$.
m = 4 # four different hypothesis tests
df['Reject Null (Bonferroni)'] = df['p_value'] < alpha / m
df['Reject Null (Bonferroni)']
Placebo False Chlorpromazine True Dimenhydrinate False Pentobarbital (100 mg) False Pentobarbital (150 mg) False Name: Reject Null (Bonferroni), dtype: bool
To apply the Benjamini-Hochberg (FDR) method, we order the p-values, find the largest $i$ such that $P_{(i)} < \ell_i = i \alpha / (C_m m)$, and reject every hypothesis whose p-value is at most the threshold $T = P_{(i)}$. Because the p-values are dependent (each test reuses the placebo data), we use $C_m = \sum_{i=1}^m 1/i$.
ordered_p_values = np.sort(df['p_value'].values)
C_m = sum([(1 / i) for i in range(1, m + 1)]) # p-values are not independent, since they all
# depend on the placebo data
l = [(i * alpha) / (C_m * m) for i in range(1, m + 1)]
R = max([i for i in range(m) if ordered_p_values[i] < l[i]])
T = ordered_p_values[R]
df['Reject Null (Benjamini-Hochberg)'] = df['p_value'] <= T
df['Reject Null (Benjamini-Hochberg)']
Placebo False Chlorpromazine True Dimenhydrinate False Pentobarbital (100 mg) False Pentobarbital (150 mg) False Name: Reject Null (Benjamini-Hochberg), dtype: bool
Both approaches for controlling for multiple testing give the same result as directly evaluating the p-values.
Let $X_1, \dots, X_n \sim \text{Poisson}(\lambda)$.
(a) Let $\lambda_0 > 0$. Find the size $\alpha$ Wald test for
$$H_0 : \lambda = \lambda_0 \text{ versus } H_1 : \lambda \ne \lambda_0$$Solution: If $X \sim \text{Poisson}(\lambda)$, then $E(X) = Var(X) = \lambda$. Hence, our estimated standard error is $$\hat{\text{se}} = \sqrt{\bar{X} / n}$$ and, under $H_0$, $$\frac{\bar{X} - \lambda_0}{\hat{\text{se}}} \leadsto N(0,1).$$
Thus, the size $\alpha$ Wald test is: reject $H_0$ when $|W| > z_{\alpha / 2}$, where
$$W = \frac{\bar{X} - \lambda_0}{ \sqrt{\bar{X} / n}}.$$(b) (Computer experiment.) Let $\lambda_0 = 1, n = 20$ and $\alpha = 0.05$. Simulate $X_1, \dots, X_n \sim \text{Poisson}(\lambda_0)$ and perform the Wald test. Repeat many times and count how often you reject the null. How close is the Type I error rate to 0.05?
lambda_0 = 1
n = 20
k = 1000000
alpha = 0.05
z = norm.ppf(1 - alpha / 2)
np.random.seed(42)
X = np.random.poisson(lam=lambda_0, size=(n, k))
X_bar = np.mean(X, axis=0)
se = np.sqrt(X_bar / n)
W = (X_bar - lambda_0) / se
type_one_error_rate = np.sum(np.abs(W) > z) / k
print(f"The type 1 error rate is {type_one_error_rate:.4f} ({k} trials).")
The type 1 error rate is 0.0524 (1000000 trials).
The simulated Type I error rate (0.0524) is close to the nominal rate of 0.05, within about 5% in relative terms.
Let $X_1, \dots, X_n \sim N(\mu, \sigma ^ 2)$. Construct the likelihood ratio test for
$$H_0 : \mu = \mu_0 \text{ versus } \mu \ne \mu_0.$$Compare to the Wald test.
Solution:
Recall the likelihood function: \begin{align*} \mathcal{L}(\mu ; X_1,\dots, X_n) = \mathcal{L}(\mu) &= \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp{\left[-(X_i - \mu)^2 / (2\sigma^2)\right]} \\ &= \left(2 \pi \sigma ^ 2 \right) ^ {-(n / 2)} \exp \left[ \sum_{i = 1} ^ n -(X_i - \mu)^2 / (2\sigma^2) \right], \end{align*} which is maximized by $\mu = \bar{X}$. The maximizer in $\Theta_0 = \{ \mu_0 \}$ is trivially $\mu_0$. Thus, the likelihood statistic is:
\begin{align*} \lambda = 2 \log \left(\frac{\mathcal{L}(\bar{X})}{\mathcal{L}(\mu_0)}\right) &= \frac{1}{\sigma^2} \left[ \sum_{i = 1} ^ n (X_i - \mu_0)^2-(X_i - \bar{X})^2 \right] \\ &= \frac{1}{\sigma^2} \left[ \sum_{i = 1} ^ n X_i^2 -2\mu_0X_i + \mu_0^2 - X_i^2 + 2X_i\bar{X} - \bar{X}^2 \right] \\ &= \frac{1}{\sigma^2} \left[ \sum_{i = 1} ^ n \mu_0^2 - \bar{X}^2 -2X_i(\mu_0 - \bar{X}) \right] \\ &= \frac{1}{\sigma^2} \left[ n\mu_0^2 -n\bar{X}^2 - 2n\bar{X}\mu_0 + 2n\bar{X}^2 \right] \\ &= \frac{n}{\sigma^2} \left[\mu_0^2 -2\bar{X}\mu_0 + \bar{X}^2 \right] \\ & = \frac{(\mu_0 - \bar{X})^2}{\sigma^2 / n} \end{align*}By Theorem 10.22, under $H_0$, $\lambda \leadsto \chi^2_1$, i.e. the distribution constructed from squaring a single standard normal distribution. Thus, the size $\alpha$ likelihood ratio test: reject $H_0$ when $\lambda > \chi^2_{1, \alpha}$.
Meanwhile, the MLE for the mean is $\bar{X}$, which we know has variance $\sigma^2 / n$. Thus, the Wald test statistic is
$$W = \frac{\mu_0 - \bar{X}}{\sqrt{\sigma^2 / n}}, \qquad W^2 = \lambda,$$which converges under $H_0$ to $N(0,1)$. Rejecting when $|W| > z_{\alpha / 2}$ is the same as rejecting when $\lambda > z_{\alpha / 2}^2 = \chi^2_{1, \alpha}$, so the two tests are equivalent.
mu = math.pi
mu_0 = 3
sigma = math.e
n = 1000
k = 1000
alpha = 0.05
X = np.random.normal(loc=mu, scale=sigma, size=(n, k))
X_bar = np.mean(X, axis=0)
lam = ((mu_0 - X_bar) ** 2) / ((sigma ** 2) / n)
W = (mu_0 - X_bar) / math.sqrt(sigma ** 2 / n)
z_w = norm.ppf(1 - alpha / 2)
z_lr = chi2.ppf(1-alpha, df=1)
# Wald test and Likelihood ratio test give the same results:
assert(np.array_equal((np.abs(W) > z_w), (lam > z_lr)))
Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$. Construct the likelihood ratio test for $$H_0 : \sigma = \sigma_0 \text{ versus } H_1 : \sigma \ne \sigma_0$$ Compare to the Wald test.
Solution:
We first seek the MLE of the standard deviation of a normal distribution with a known mean. We have the log-likelihood function:
$$\mathcal{l}(\sigma) = -\frac{n \log (2 \pi \sigma^2)}{2} - \frac{1}{2\sigma^2} \sum_{i = 1} ^ n (X_i - \mu)^2$$with derivative:
$$\frac{d}{d\sigma} \mathcal{l}(\sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i = 1} ^ n (X_i - \mu)^2$$Equating the derivative with 0, we get the MLE:
$$\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i = 1} ^ n (X_i - \mu)^2}$$Computing the likelihood ratio statistic:
\begin{align*} \lambda &= 2 \cdot \log \left( \frac{\mathcal{L}(\hat{\sigma})}{\mathcal{L}(\sigma_0)} \right) \\ &= 2 \cdot \left[ \mathcal{l}(\hat{\sigma}) - \mathcal{l}(\sigma_0) \right] \\ &= n \left[\left(\frac{\hat{\sigma}}{\sigma_0}\right)^2 - 2 \cdot \log \left( \frac{\hat{\sigma}}{\sigma_0} \right) - 1 \right] \tag{$n\hat{\sigma}^2 = \sum_{i=1}^n (X_i - \mu)^2$} \\ \end{align*}The size $\alpha$ likelihood ratio test rejects the null hypothesis if $\lambda$ exceeds $\chi^2_{1, \alpha}$.
For the Wald test, we must compute the estimated standard error of the MLE, $\hat{\sigma}$:
$$ \hat{\text{se}} = \sqrt{1 / I_n(\hat{\sigma})}$$where $I_n(\sigma) = n I(\sigma)$ is the Fisher information, and
$$I(\sigma) = - \mathbb{E}_{\sigma} \left( \frac{d^2 \log f(X, \sigma)}{d \sigma^2} \right) $$where
$$ f(X, \sigma) = \frac{1}{\sqrt{2 \pi \sigma ^2}} \exp \left[ -(X - \mu) ^2 / (2 \sigma ^2) \right].$$So,
\begin{align*} \log f(X, \sigma) &= -\frac{\log(2 \pi \sigma^2)}{2} - \frac{(X-\mu)^2}{2\sigma^2}, \\ \frac{d \log f(X, \sigma)}{d\sigma} &= s(X, \sigma) = \frac{(X-\mu)^2}{\sigma ^3} - \frac{1}{\sigma}, \\ \frac{d^2 \log f(X, \sigma)}{d\sigma^2} &= s'(X, \sigma) = \frac{1}{\sigma^2} - \frac{3(X-\mu)^2}{\sigma^4}, \\ I(\sigma) &= \frac{3}{\sigma^4}\mathbb{E}((X - \mu)^2) - \frac{1}{\sigma^2} = \frac{2}{\sigma^2}, \\ I_n(\sigma) &= \frac{2n}{\sigma^2}, \\ \hat{\text{se}} &= \sqrt{\frac{\hat{\sigma}^2}{2n}} \end{align*}and the Wald statistic is therefore:
$$W = \frac{\hat{\sigma} - \sigma_0}{\hat{\sigma} / \sqrt{2n}}$$Checking the above results via simulation:
mu = 3
sigma_0 = 6
n = 1000
k = 10000
X = np.random.normal(loc=mu, scale=sigma_0, size=(n,k))
sigma_hat = np.sqrt(np.sum(np.power(X - mu, 2), axis=0) / n)
lam = n * (-2 * np.log(sigma_hat / sigma_0) + np.power(sigma_hat / sigma_0, 2) - 1)
Z = np.arange(0, 10, 0.1)
chi2_pdf = chi2.pdf(Z, df=1)
_ = plt.hist(lam, bins=50, histtype='step', density=True)
_ = plt.plot(Z, chi2_pdf)
_ = plt.ylim(0,1)
se_hat = sigma_hat / np.sqrt(2 * n)
W = (sigma_hat - sigma_0) / se_hat
Z = np.arange(-4, 4, 0.01)
norm_pdf = norm.pdf(Z, loc=0, scale=1)
_ = plt.hist(W, bins=100, histtype='step', density=True)
_ = plt.plot(Z, norm_pdf)
Let $X \sim \text{Binomial}(n, p)$. Construct the likelihood ratio test for
$$H_0 : p = p_0 \text{ versus } H_1 : p \ne p_0.$$Compare to the Wald test.
Solution:
Since $$\mathcal{L}(X, n, p) = \binom{n}{X}p^X (1-p)^{n-X}$$ the log likelihood is: $$\log \mathcal{L}(X, n, p) = \log \binom{n}{X} + X \log(p) + (n - X) \log(1-p),$$ which is maximized by $\hat{p} = X / n$. The likelihood ratio statistic is given by:
\begin{align*} \lambda &= 2 \cdot \log\left( \frac{\mathcal{L}(X, \hat{p})}{\mathcal{L}(X, p_0)} \right) \\ &= 2 \cdot \left( X \log \left(\frac{\hat{p}}{p_0} \right) + (n - X) \log \left(\frac{1-\hat{p}}{1-p_0} \right) \right). \\ \end{align*}The estimated standard error of the MLE is $\sqrt{\hat{p}(1-\hat{p}) / n }$. This can be derived by recognizing that a draw from $\text{Binomial}(n,p)$ is equivalent to $n$ draws from $\text{Bernoulli}(p)$, and applying the Fisher information method. The Wald statistic is then:
$$W = \sqrt{n} \frac{(\hat{p} - p_0)}{\sqrt{\hat{p}(1-\hat{p})}}$$Let $\theta$ be a scalar parameter and suppose we test
$$H_0 : \theta = \theta_0 \text{ versus } H_1 : \theta \ne \theta_0.$$Let $W$ be the Wald test statistic and let $\lambda$ be the likelihood ratio test statistic. Show that these tests are equivalent in the sense that
$$\frac{W^2}{\lambda} \xrightarrow{P} 1$$as $n \to \infty$. Hint: Use a Taylor expansion of the log-likelihood $\mathcal{l}(\theta)$ to show that
$$\lambda \approx \left(\sqrt{n}(\hat{\theta} - \theta_0) \right) ^ 2 \left(-\frac{1}{n} \mathcal{l}''(\hat{\theta}) \right).$$Solution:
Assuming the log-likelihood is twice differentiable at $\hat{\theta}$, it admits the Taylor expansion:
\begin{align*} \mathcal{l}(\theta_0) &= \mathcal{l}(\hat{\theta}) + \mathcal{l}'(\hat{\theta})(\theta_0 - \hat{\theta}) + \mathcal{l}''(\hat{\theta})\frac{(\theta_0 - \hat{\theta})^2}{2} + o((\theta_0 - \hat{\theta})^2) \\ &\approx \mathcal{l}(\hat{\theta}) + \mathcal{l}'(\hat{\theta})(\theta_0 - \hat{\theta}) + \mathcal{l}''(\hat{\theta})\frac{(\theta_0 - \hat{\theta})^2}{2} \tag{for $\theta_0$ near $\hat{\theta}$}\\ &= \mathcal{l}(\hat{\theta}) + \mathcal{l}''(\hat{\theta})\frac{(\theta_0 - \hat{\theta})^2}{2} \tag{$\hat{\theta}$ is an extremum}\\ \end{align*}Thus, the likelihood ratio statistic can be expressed:
\begin{align*} \lambda &= 2 \cdot \log \left( \frac{\mathcal{L}(\hat{\theta})}{\mathcal{L}(\theta_0)}\right) \\ &= 2 \cdot (\mathcal{l}(\hat{\theta}) - \mathcal{l}(\theta_0)) \\ &\approx 2 \cdot \left( -\mathcal{l}''(\hat{\theta})\frac{(\theta_0 - \hat{\theta})^2}{2} \right) \\ &= \left(\sqrt{n}(\hat{\theta} - \theta_0) \right) ^ 2 \left(-\frac{1}{n} \mathcal{l}''(\hat{\theta}) \right). \end{align*}Meanwhile, the Wald Statistic is:
\begin{align*} W &= \frac{\hat{\theta} - \theta_0}{\hat{\text{se}}} \\ &= \sqrt{I(\hat{\theta})n}(\hat{\theta} - \theta_0) \end{align*}Thus,
\begin{align*} \frac{W^2}{\lambda} &= \frac{I(\hat{\theta})n(\hat{\theta} - \theta_0) ^2}{\left(\sqrt{n}(\hat{\theta} - \theta_0) \right) ^ 2 \left(-\frac{1}{n} \mathcal{l}''(\hat{\theta}) \right)} \\ &= -\frac{I(\hat{\theta})}{\mathcal{l}''(\hat{\theta}) / n} \end{align*}Note that
$$\mathcal{l}''(\hat{\theta}) = \sum_{i=1}^n \frac{d^2}{d\theta^2} \log f(X_i, \theta) \bigg\rvert_{\theta = \hat{\theta}}$$and
$$I(\hat{\theta}) = -\mathbb{E}_{\hat{\theta}}\left(\frac{d^2}{d\theta^2} \log f(X, \theta)\right)$$Thus, by the Weak Law of Large Numbers, $-\frac{1}{n}\mathcal{l}''(\hat{\theta}) \xrightarrow{P} I(\hat{\theta})$, so $W^2 / \lambda \xrightarrow{P} 1$ by Theorem 5.5(d).
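As a quick sanity check (not part of the proof), here is a minimal simulation sketch assuming a Bernoulli($p_0$) model, an illustrative choice: under $H_0$, the ratio $W^2 / \lambda$ should approach 1 as $n$ grows.
p_0 = 1 / 3  # null value; Bernoulli model chosen purely for illustration
for n in [50, 500, 5000, 50000]:
    x = rng.binomial(1, p_0, size=n)  # data generated under H_0
    p_hat = x.mean()
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    W = (p_hat - p_0) / se_hat
    lam = 2 * np.sum(x * np.log(p_hat / p_0) + (1 - x) * np.log((1 - p_hat) / (1 - p_0)))
    print(f"n = {n:6d}: W^2 / lambda = {W ** 2 / lam:.4f}")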