import numpy as np
import pandas as pd
import math
from numpy.random import default_rng
rng = default_rng(42)
from scipy.stats import norm
import matplotlib as mpl
from matplotlib import pyplot as plt
import plotly.graph_objects as go
from tqdm.notebook import tqdm
mpl.rcParams['font.size'] = 18
Let $X_1, \dots, X_n \sim \text{Gamma}(\alpha,\beta)$. Find the method of moments estimator for $\alpha$ and $\beta$.
Solution:
The first and second moments of $X_1$ are $\alpha \beta$ and $\alpha ^ 2 \beta ^ 2 + \alpha \beta ^ 2$, respectively (in the shape-scale parameterization, where $E(X_1) = \alpha\beta$ and $V(X_1) = \alpha\beta^2$). We thus solve the system:
\begin{align*} \hat{\alpha} \hat{\beta} &= \frac1n \sum_{i=1}^n X_i \\ \hat{\alpha} ^ 2 \hat{\beta} ^ 2 + \hat{\alpha} \hat{\beta} ^ 2&= \frac1n \sum_{i=1}^n X_i^2 \\ \end{align*}yielding
\begin{align*} \hat{\alpha} &= \frac{\bar{X}^2}{\frac1n \sum_{i=1}^n X_i^2 - \bar{X}^2}\\ \hat{\beta} &= \frac{\frac1n \sum_{i=1}^n X_i^2 - \bar{X}^2}{\bar{X}}\\ \end{align*}
alpha = 54
beta = 11
n = 1000
X = rng.gamma(alpha, beta, size = n)
X_bar = np.mean(X)
X2_bar = np.mean(np.power(X, 2))
alpha_hat = X_bar ** 2 / (X2_bar - X_bar ** 2)
beta_hat = (X2_bar - X_bar ** 2) / X_bar
print(f"True parameter values: alpha: {alpha:.0f} beta: {beta:.0f}")
print(f"Method of Moments estimators (n={n}): alpha_hat: {alpha_hat:.3f}, beta_hat: {beta_hat:.3f}")
True parameter values: alpha: 54 beta: 11 Method of Moments estimators (n=1000): alpha_hat: 54.032, beta_hat: 10.932
Let $X_1, \dots, X_n \sim \text{Uniform}(a,b)$ where $a$ and $b$ are unknown parameters and $a < b$.
(a) Find the method of moments estimators for $a$ and $b$.
(b) Find the MLE $\hat{a}$ and $\hat{b}$.
(c) Let $\tau = \int x \, dF(x)$. Find the MLE of $\tau$.
(d) Let $\hat{\tau}$ be the MLE of $\tau$. Let $\tilde{\tau}$ be the nonparametric plug-in estimator of $\tau = \int x \, dF(x)$. Suppose that $a=1, b=3$ and $n=10$. Find the MSE of $\hat{\tau}$ by simulation. Find the MSE of $\tilde{\tau}$ analytically. Compare.
Solution:
(a) We have the system of equations
\begin{align*} \mu = \frac{\hat{a} + \hat{b}}{2} &= \frac1n \sum_{i=1}^n X_i\\ \mu^2 + \sigma^2 = \left(\frac{\hat{a} + \hat{b}}{2}\right)^2 + \frac{(\hat{b}-\hat{a})^2}{12} &= \frac1n \sum_{i=1}^n X_i^2 \end{align*}yielding
\begin{align*} \hat{a} &= \bar{X} - \sqrt{3 \left( \frac1n \sum_{i=1}^n X_i^2 - \bar{X}^2 \right)} \\ \hat{b} &= \bar{X} + \sqrt{3 \left( \frac1n \sum_{i=1}^n X_i^2 - \bar{X}^2 \right)} \end{align*}
a = 3
b = 7
n = 100000
X = rng.uniform(low=a, high=b, size = n)
X_bar = np.mean(X)
X2_bar = np.mean(np.power(X, 2))
a_hat = X_bar - np.sqrt(3 * (X2_bar - X_bar ** 2))
b_hat = X_bar + np.sqrt(3 * (X2_bar - X_bar ** 2))
print(f"True parameter values: a: {a:.0f} b: {b:.0f}")
print(f"Method of Moments estimators (n={n}): a_hat: {a_hat:.3f}, b_hat: {b_hat:.3f}")
True parameter values: a: 3 b: 7 Method of Moments estimators (n=100000): a_hat: 3.005, b_hat: 7.002
(b) By reasoning analogous to that in Example 9.12, the MLEs are $\hat{a} = \min\{X_i\}_{i=1}^n$ and $\hat{b} = \max\{X_i\}_{i=1}^n$. In particular, let
$$\mathcal{L}_n(a, b) = \prod_{i=1}^n f(X_i, a, b)$$be the likelihood function, where
\begin{align*} f(X_i, a, b) = \begin{cases} \frac{1}{b-a} & a \le X_i \le b \\ 0 & \text{otherwise} \end{cases} \end{align*}Observe that if $a > \hat{a}$, then there is a $j$ such that $X_j < a$, and thus $f(X_j, a, b) = 0$, meaning $\mathcal{L}_n(a,b) = 0$. A similar argument shows that if $b < \hat{b}$, $\mathcal{L}_n(a,b) = 0$. If $a \le \hat{a}$ and $b \ge \hat{b}$, the likelihood is $\mathcal{L}_n(a, b) = \left(\frac{1}{b-a}\right)^n$, which is increasing in $a$ and decreasing in $b$, and is therefore maximized at $a=\hat{a}$ and $b=\hat{b}$.
(c) Note that $\tau = \frac{a + b}{2}$, so, by equivariance of the MLE, its MLE is $\frac{\hat{a} + \hat{b}}{2}$.
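As a quick numerical check of (b) and (c), here is a sketch reusing the simulated Uniform(3, 7) sample X (and a, b, n) from the method of moments cell above:
a_mle = np.min(X)  # MLE of a: sample minimum
b_mle = np.max(X)  # MLE of b: sample maximum
tau_mle = (a_mle + b_mle) / 2  # MLE of tau by equivariance
print(f"MLEs (n={n}): a_hat: {a_mle:.3f}, b_hat: {b_mle:.3f}, tau_hat: {tau_mle:.3f}")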
(d) The nonparametric plug-in estimator for $\tau$ is $\bar{X}$, which is an unbiased estimator of the mean with variance $\sigma^2 / n$, where $\sigma^2 = (b-a)^2 / 12$. By the bias-variance decomposition, its mean squared error is thus $(b-a)^2 / (12n) = 1 / 30 \approx 0.033$. Meanwhile, we estimate the MSE of the MLE $\hat{\tau}$ by simulation:
n = 10
k = 1000000 # one million simulations
a = 1
b = 3
mu = (a + b) / 2
X = rng.uniform(low=a, high=b, size=(k,n))
a_hat = np.min(X, axis=1)
b_hat = np.max(X, axis=1)
tau_hat= (a_hat + b_hat) / 2
mse = np.mean(np.power(tau_hat - mu, 2))
print(f"MSE of MLE for tau: {mse:.3f}")
MSE of MLE for tau: 0.015
The simulated MSE of the MLE ($\approx 0.015$) is less than half the analytic MSE of the plug-in estimator ($\approx 0.033$), so here the MLE is the more efficient estimator of $\tau$.
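As a further check, the MSE of the plug-in estimator $\tilde{\tau} = \bar{X}$ can also be estimated from the same simulated samples (a sketch reusing the array X of shape (k, n) and mu from the cell above); it should land near the analytic value $1/30$:
tau_tilde = np.mean(X, axis=1)  # plug-in estimator for each simulated sample
mse_plugin = np.mean(np.power(tau_tilde - mu, 2))
print(f"MSE of plug-in estimator for tau: {mse_plugin:.3f}")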
Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$. Let $\tau$ be the .95 percentile, i.e. $P(X < \tau) = 0.95$.
(a) Find the MLE of $\tau$.
(b) Find an expression for an approximate $1 - \alpha$ confidence interval for $\tau$.
(c) Suppose the data are
3.23, -2.50, 1.88, -0.68, 4.43, 0.17,
1.03, -0.07, -0.01, 0.76, 1.76, 3.18,
0.33, -0.31, 0.30, -0.61, 1.52, 5.43,
1.54, 2.28, 0.42, 2.33, -1.03, 4.00,
0.39
Find the MLE $\hat{\tau}$. Find the standard error using the delta method. Find the standard error using the parametric bootstrap.
Solution:
(a) Note that $\tau = \mu + \Phi^{-1}(.95)\sigma = \mu + z_{0.05} \sigma$, where $\Phi(t)$ is the standard normal CDF. By equivariance of the MLE, we have $\hat{\tau} = \hat{\mu} + z_{0.05}\hat{\sigma}$, where $\hat{\mu} = \bar{X}$ and $\hat{\sigma} = S = \sqrt{n^{-1} \sum_i (X_i - \bar{X})^2}$ are the MLEs of $\mu$ and $\sigma$, respectively. All together,
$$\hat{\tau} = \bar{X} + z_{0.05}\sqrt{n^{-1} \sum_i (X_i - \bar{X})^2}.$$
(b) We use the multiparameter delta method to find the standard error. The Fisher information matrix is (see Exercise 8)
\begin{align*} I_n(\mu, \sigma) = \begin{bmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{2n}{\sigma^2} \end{bmatrix}. \end{align*}Hence,
\begin{align*} J_n = I_n^{-1}(\mu, \sigma) = \frac1n \begin{bmatrix} \sigma^2 & 0 \\ 0 & \frac{\sigma^2}{2} \end{bmatrix}. \end{align*}We have $\tau = g(\mu, \sigma) = \mu + z_{0.05} \sigma$, so
\begin{align*} \nabla g = \begin{bmatrix} 1 \\ z_{0.05} \end{bmatrix} \end{align*}and thus
\begin{align*} \hat{\textrm{se}}(\hat{\tau}) &= \sqrt{ (\hat{\nabla} g)^T \hat{J}_n (\hat{\nabla}g)} \\ &= \hat{\sigma} \sqrt{\frac{1 + z_{0.05}^2 / 2}{n}} \end{align*}and a $1-\alpha$ confidence interval is given by
\begin{align*} (\hat{\tau} - z_{\alpha / 2} \hat{\textrm{se}}(\hat{\tau}), \hat{\tau} + z_{\alpha / 2} \hat{\textrm{se}}(\hat{\tau})). \end{align*}
(c)
X = np.array([
3.23, -2.50, 1.88, -0.68, 4.43, 0.17,
1.03, -0.07, -0.01, 0.76, 1.76, 3.18,
0.33, -0.31, 0.30, -0.61, 1.52, 5.43,
1.54, 2.28, 0.42, 2.33, -1.03, 4.00,
0.39
])
# computing the MLE of tau, tau_hat = x_bar + z_{0.05} * s
def tau(X):
    x_bar = np.mean(X)
    s = np.std(X)  # np.std defaults to ddof=0, i.e. the MLE of sigma
    z = norm.ppf(0.95)
    return x_bar + z * s
tau_hat = tau(X)
print(f'MLE of tau: {tau_hat:.3f}')
# computing standard error of tau_hat
# using the delta method
n = X.size
x_bar = np.mean(X)
s = np.std(X)
z = norm.ppf(0.95)
se_delta = s * math.sqrt((1 + ((z ** 2) / 2)) / n)
print(f'Standard error (delta method): {se_delta:.3f}')
# using the parametric bootstrap
B = 1000
tau_boot = np.empty(B)
x_bar = np.mean(X)
s = np.std(X)
for i in range(B):
    X_boot = rng.normal(loc=x_bar, scale=s, size=n)
    tau_boot[i] = tau(X_boot)
se_boot = math.sqrt(np.sum(np.power(tau_boot - tau_hat, 2)) / B)
print(f'Standard error (bootstrap method): {se_boot:.3f}')
MLE of tau: 4.180 Standard error (delta method): 0.558 Standard error (bootstrap method): 0.548
Let $X_1, \dots, X_n \sim \text{Uniform}(0, \theta)$. Show that the MLE is consistent. Hint: Let $Y = \max \{ X_1, \dots, X_n \}$. For any $c$, $P(Y < c) = P(X_1 < c, X_2 < c, \dots, X_n < c) = P(X_1 < c)P(X_2 < c)\dots P(X_n < c)$.
Solution:
The MLE of $\theta$ is $\hat{\theta}_n = \max \{ X_1, \dots, X_n \}$. Observe that since $0 \le X_i \le \theta$ for all $i \in \{1, \dots, n\}$, $0 \le \hat{\theta}_n \le \theta$, and therefore $P(|\hat{\theta}_n - \theta| > \epsilon) = P(\hat{\theta}_n < \theta - \epsilon)$, and, if $\epsilon \ge \theta$, then $P(\hat{\theta}_n < \theta - \epsilon) = 0$. Let $0 < \epsilon < \theta$. We have
\begin{align*} P(\hat{\theta}_n < \theta - \epsilon) &= \prod_{i=1}^n P(X_i < \theta - \epsilon) \tag{hint} \\ &= \left( \frac{\theta - \epsilon}{\theta}\right) ^ n \rightarrow 0 \text{ as } n \rightarrow \infty \end{align*}Thus $\hat{\theta}_n \xrightarrow{P} \theta$.
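A quick simulation illustrates the consistency (a sketch, assuming $\theta = 1$ and a few increasing sample sizes):
theta = 1
for n_sim in [10, 100, 1000, 10000]:
    X_sim = rng.uniform(low=0, high=theta, size=n_sim)
    print(f"n = {n_sim:>6}: theta_hat = {np.max(X_sim):.5f}")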
Let $X_1, \dots, X_n \sim \text{Poisson}(\lambda)$. Find the method of moments estimator, the maximum likelihood estimator and the Fisher information $I(\lambda)$.
Solution:
The first moment of the Poisson distribution is $\lambda$, and the first sample moment is $\bar{X}$. Thus, the method of moments estimator is $\bar{X}$.
The likelihood function is:
\begin{align*} \mathcal{L}_n(\lambda) &= \prod_{i=1}^n e^{-\lambda} \frac{\lambda ^ {X_i}}{X_i !} \\ &= e^{-n\lambda} \prod_{i=1}^n \frac{\lambda ^ {X_i}}{X_i !} \end{align*}and, hence, the log-likelihood function is:
\begin{align*} \mathcal{l}_n(\lambda) &= \log(\mathcal{L}_n(\lambda)) \\ &= -n \lambda + \log(\lambda) \sum_{i=1}^n X_i - \sum_{i=1}^n \log(X_i !). \end{align*}Setting $\frac{d}{d \lambda} \mathcal{l}_n(\lambda) \mid_{\hat{\lambda}} = 0$, we have $\hat{\lambda} = \bar{X}$; i.e., the MLE is also $\bar{X}$.
We now compute the Fisher information. The score function is $s(x;\lambda) = \frac{\partial}{\partial \lambda} \log(f(x;\lambda))$, where $f(x;\lambda) = e^{-\lambda} \frac{\lambda^{x}}{x!}$, hence
\begin{align*} s(X; \lambda) = \frac{X}{\lambda} - 1 \end{align*}and
\begin{align*} -s'(X; \lambda) = \frac{X}{\lambda ^ 2} \end{align*}so
\begin{align*} I(\lambda) = E_{\lambda}(-s'(X;\lambda)) = \frac{1}{\lambda} \end{align*}and the Fisher information is thus $I_n(\lambda) = nI(\lambda) = \frac{n}{\lambda}$.
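As a numerical check (a sketch, assuming $\lambda = 3$ and $n = 500$), the estimate is the sample mean and its estimated standard error $\sqrt{1/I_n(\hat{\lambda})} = \sqrt{\hat{\lambda}/n}$ should be close to the true value $\sqrt{\lambda/n}$:
lam = 3
n = 500
X = rng.poisson(lam, size=n)
lam_hat = np.mean(X)  # method of moments estimate = MLE
se_hat = np.sqrt(lam_hat / n)  # estimated standard error from the Fisher information
print(f"lambda_hat: {lam_hat:.3f}, estimated se: {se_hat:.3f}, true se: {np.sqrt(lam / n):.3f}")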
Let $X_1, \dots, X_n \sim N(\theta, 1)$. Define
\begin{align*} Y_i = \begin{cases} 1 & \text{if } X_i > 0 \\ 0 & \text{if } X_i \le 0 \end{cases}. \end{align*}Let $\psi = P(Y_1 = 1)$.
(a) Find the maximum likelihood estimator $\hat{\psi}$ of $\psi$.
We have
\begin{align*} \psi &= P(Y_1 = 1) \\ &= P(X_1 > 0) \\ &= P(X_1 - \theta > -\theta) \\ &= 1 - P(Z \le -\theta) \\ &= 1 - \Phi(-\theta) \end{align*}where $Z$ is a standard normal r.v., and $\Phi$ is the standard normal CDF. The MLE of the mean of a normal distribution is the sample mean, and, thus, by equivariance of the MLE, $\hat{\psi} = 1 - \Phi(-\bar{X}) = \Phi(\bar{X})$.
(b) Find an approximate 95 percent confidence interval for $\psi$.
From Example 9.21, we know that the Fisher information is $I_n(\theta) = n$, and thus we have the standard error $\hat{se}(\hat{\theta}) = \sqrt{1 / I_n(\hat{\theta}_n)} = n^{-1/2}$. By the delta method, the standard error of $\hat{\psi}$ is $|g'(\hat{\theta})|\hat{se}(\hat{\theta}_n)$. Here, $g(\hat{\theta}) = 1 - \Phi(-\hat{\theta})$, so $|g'(\hat{\theta})| = \phi(-\hat{\theta}) = \phi(\hat{\theta})$, where $\phi$ is the standard normal PDF, and the standard error of $\hat{\psi}$ is $\phi(\bar{X})n^{-1/2}$.
A 95 percent confidence interval is then given by:
$$(\Phi(\bar{X}) - z_{0.025}\phi(\bar{X})n^{-1/2}, \Phi(\bar{X}) + z_{0.025}\phi(\bar{X})n^{-1/2})$$
theta = 0.3
n = 100
X = rng.normal(loc=theta, scale=1, size=n)
X_bar = np.mean(X)
psi_hat = norm.cdf(X_bar)
alpha = 0.05
z = norm.ppf(1 - alpha / 2)
se = norm.pdf(X_bar) * n ** (-0.5)
print(f"95% CI for psi: ({psi_hat - z * se:.3f},{psi_hat + z * se:.3f})")
95% CI for psi: (0.486,0.641)
(c) Define $\tilde{\psi} = (1 / n) \sum_i Y_i$. Show that $\tilde{\psi}$ is a consistent estimator of $\psi$.
Observe that $Y_i$ is a Bernoulli r.v. with expectation $\Phi(\theta)$, so, by the (weak) Law of Large Numbers, $\tilde\psi = \bar{Y} \xrightarrow{P} \Phi(\theta) = \psi$.
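A small simulation illustrates this (a sketch reusing $\theta = 0.3$ from the cell above): as $n$ grows, both $\tilde{\psi}$ and the MLE $\hat{\psi}$ settle near $\psi = \Phi(\theta)$.
psi_true = norm.cdf(theta)
for n_sim in [10, 100, 1000, 10000]:
    X_sim = rng.normal(loc=theta, scale=1, size=n_sim)
    psi_tilde = np.mean(X_sim > 0)      # proportion of Y_i = 1
    psi_mle = norm.cdf(np.mean(X_sim))  # the MLE from part (a)
    print(f"n = {n_sim:>6}: psi_tilde = {psi_tilde:.4f}, psi_hat = {psi_mle:.4f}, psi = {psi_true:.4f}")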
(d) Compute the asymptotic relative efficiency of $\tilde{\psi}$ to $\hat{\psi}$. Hint: Use the delta method to get the standard error of the MLE. Then compute the standard error (i.e. the standard deviation) of $\tilde{\psi}$.
We know the standard error of the MLE, $\hat{\psi}$, to be $\phi(\bar{X})n^{-1/2}$.
Meanwhile, the variance of $\tilde{\psi}$ is
\begin{align*} Var(\tilde{\psi}) &= Var(\frac{1}{n} \sum_i Y_i) \\ &= \frac{1}{n^2} \sum_i Var(Y_i) \\ &= \frac{1}{n} \Phi(\theta)(1 - \Phi(\theta)). \end{align*}Thus, the asymptotic relative efficiency is
\begin{align*} \text{ARE}(\tilde{\psi}, \hat{\psi}) = \frac{\phi(\theta)^2}{\Phi(\theta)(1 - \Phi(\theta))} \end{align*}
(e) Suppose that the data are not really normal. Show that $\hat{\psi}$ is not consistent. What, if anything, does $\hat{\psi}$ converge to?
Now, the true value of $\psi$ is $\psi = P(Y_1 = 1) = P(X_1 > 0) = 1 - F_X(0)$, where $F_X$ is the CDF of the data.
By the law of large numbers, $\bar{X} \xrightarrow{P} \mu_X$, where $\mu_X$ is the mean of the data. Then, since $\Phi$ is continuous, Theorem 5.5f gives $\hat{\psi} = \Phi(\bar{X}) \xrightarrow{P} \Phi(\mu_X)$. So $\hat{\psi}$ converges in probability to $\Phi(\mu_X)$, which is not necessarily equal to $1 - F_X(0)$.
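To see this concretely, here is a sketch with non-normal data (an assumed centered exponential, $X_i = E_i - 1$ with $E_i \sim \text{Exp}(1)$), for which $\mu_X = 0$, so $\hat{\psi} \rightarrow \Phi(0) = 0.5$, while $\psi = P(X_1 > 0) = e^{-1} \approx 0.368$:
n_sim = 100000
X_sim = rng.exponential(scale=1, size=n_sim) - 1  # non-normal data with mean 0
psi_true = np.exp(-1)               # P(X_1 > 0) = P(Exp(1) > 1)
psi_mle = norm.cdf(np.mean(X_sim))  # converges to Phi(mu_X) = Phi(0) = 0.5
psi_tilde = np.mean(X_sim > 0)      # the consistent estimator from (c)
print(f"psi: {psi_true:.3f}, psi_hat (MLE): {psi_mle:.3f}, psi_tilde: {psi_tilde:.3f}")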
(Comparing two treatments.) $n_1$ people are given treatment 1 and $n_2$ people are given treatment 2. Let $X_1$ be the number of people on treatment 1 who respond favorably to the treatment and let $X_2$ be the number of people on treatment 2 who respond favorably. Assume $X_1 \sim \text{Binomial}(n_1, p_1)$ and $X_2 \sim \text{Binomial}(n_2, p_2)$. Let $\psi = p_1 - p_2$.
(a) Find the MLE $\hat{\psi}$ for $\psi$.
We have
\begin{align*} \mathcal{L}(p_1,p_2) &= {n_1 \choose x_1} p_1 ^ {x_1} (1 - p_1) ^ {n_1 - x_1} {n_2 \choose x_2} p_2 ^ {x_2} (1 - p_2) ^ {n_2 - x_2} \\ \Rightarrow \mathcal{l}(p_1,p_2) &= \sum_{i=1}^2 \log {n_i \choose x_i} + x_i \log(p_i) + (n_i - x_i) \log(1 - p_i) \end{align*}Setting the partial derivatives to zero, we find the MLEs $\hat{p}_1 = \frac{X_1}{n_1}$ and $\hat{p}_2 = \frac{X_2}{n_2}$. By equivariance of the MLE, $\hat{\psi} = \hat{p}_1 - \hat{p}_2 = \frac{X_1}{n_1} - \frac{X_2}{n_2}$.
(b) Find the Fisher information matrix, $I(p_1, p_2)$.
The Fisher information matrix is the negative expected Hessian of the log-likelihood. We have
\begin{align*} \frac{\partial \mathcal{l}(p_1, p_2)}{\partial p_i} &= \frac{x_i}{p_i} + \frac{x_i - n_i}{1 - p_i}, \\ H_{ii} = \frac{\partial^2 \mathcal{l}(p_1, p_2)}{\partial p_i^2} &= -\frac{x_i}{p_i^2} + \frac{x_i - n_i}{(1 - p_i)^2}, \\ H_{12} = H_{21} = \frac{\partial^2 \mathcal{l}(p_1, p_2)}{\partial p_1 \partial p_2} &= 0, \end{align*}so
\begin{align*} E[H_{ii}] &= -\frac{n_i}{p_i} + \frac{n_i p_i - n_i}{(1 - p_i)^2} = - \frac{n_i}{p_i(1-p_i)} \\ E[H_{12}] &= E[H_{21}] = 0\\ \end{align*}and the Fisher Information Matrix is then
\begin{align*} I(p_1, p_2) = \begin{bmatrix} \frac{n_1}{p_1(1-p_1)} & 0 \\ 0 & \frac{n_2}{p_2(1-p_2)} \\ \end{bmatrix}. \end{align*}
(c) Use the multiparameter delta method to find the asymptotic standard error of $\hat{\psi}$.
We have the inverse of the Fisher Information Matrix:
\begin{align*} J = J(p_1, p_2) = I^{-1}(p_1, p_2) = \begin{bmatrix} \frac{p_1(1-p_1)}{n_1} & 0 \\ 0 & \frac{p_2(1-p_2)}{n_2} \\ \end{bmatrix}. \end{align*}Since $\psi = g(p_1, p_2) = p_1 - p_2$,
\begin{align*} \nabla g = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \end{align*}Thus the estimated standard error of $\hat{\psi}$ is
\begin{align*} \hat{\text{se}}(\hat{\psi}) &= \sqrt{(\hat{\nabla} g)^T \hat{J} (\hat{\nabla} g)} \\ &= \left( \frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2} \right) ^ {1 / 2} \end{align*}
(d) Suppose that $n_1 = n_2 = 200$, $X_1 = 160$ and $X_2 = 148$. Find $\hat{\psi}$. Find an approximate 90 percent confidence interval for $\psi$ using (i) the delta method and (ii) the parametric bootstrap.
n_1 = 200
n_2 = n_1
X_1 = 160
X_2 = 148
p_1_hat = X_1 / n_1
p_2_hat = X_2 / n_2
psi_hat = p_1_hat - p_2_hat
alpha = 0.10
z = norm.ppf(1 - alpha / 2)
se_delta = math.sqrt((p_1_hat * (1 - p_1_hat) / n_1) + (p_2_hat * (1 - p_2_hat) / n_2))
print(f"90% confidence interval (delta method): ({psi_hat - z * se_delta:.3f}, {psi_hat + z * se_delta:.3f})")
# parametric bootstrap method
B = 10000
psi_boot = np.empty(B)
for i in range(B):
    X_1_boot = rng.binomial(n_1, p_1_hat)
    X_2_boot = rng.binomial(n_2, p_2_hat)
    psi_boot[i] = X_1_boot / n_1 - X_2_boot / n_2  # bootstrap replication of psi_hat = p_1_hat - p_2_hat
se_boot = math.sqrt(np.var(psi_boot))
print(f"90% confidence interval (bootstrap method): ({psi_hat - z * se_boot:.3f}, {psi_hat + z * se_boot:.3f})")
90% confidence interval (delta method): (-0.009, 0.129) 90% confidence interval (bootstrap method): (-0.009, 0.129)
Find the Fisher information matrix for Example 9.29.
Solution:
Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$. The likelihood function is (ignoring some constants)
\begin{align*} \mathcal{L}_n(\mu, \sigma) = \sigma^{-n} \exp \left\{ -\frac{1}{2\sigma^2} \sum_i (X_i - \mu)^2 \right\} \end{align*}and the log-likelihood function is
\begin{align*} \mathcal{l}_n(\mu, \sigma) = -n \log(\sigma) -\frac{1}{2\sigma^2} \sum_i (X_i - \mu)^2. \end{align*}Computing the Hessian, $H$:
\begin{align*} \frac{\partial \mathcal{l}_n}{\partial \mu} &= \frac{1}{\sigma^2} \sum_i (X_i - \mu), \\ H_{11} = \frac{\partial^2 \mathcal{l}_n}{\partial \mu^2} &= -\frac{n}{\sigma^2}\\ H_{12} = \frac{\partial^2 \mathcal{l}_n}{\partial \sigma \partial \mu} &= -\frac{2}{\sigma^3} \sum_i (X_i - \mu)\\ \frac{\partial \mathcal{l}_n}{\partial \sigma} &= -\frac{n}{\sigma} +\frac{1}{\sigma^3} \sum_i (X_i - \mu)^2. \\ H_{21} = \frac{\partial^2 \mathcal{l}_n}{\partial \mu \partial \sigma} &= -\frac{2}{\sigma^3} \sum_i (X_i - \mu)\\ H_{22} = \frac{\partial^2 \mathcal{l}_n}{\partial \sigma^2} &= \frac{n}{\sigma^2} - \frac{3}{\sigma^4} \sum_i (X_i - \mu) ^ 2. \end{align*}The Fisher information matrix is the negative expected value of the Hessian:
\begin{align*} I_n(\mu, \sigma) &= \begin{bmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{2n}{\sigma^2} \end{bmatrix} \end{align*}where, for $E(H_{12})$ and $E(H_{21})$, we observe $E(X_i) = \mu$, and for $E(H_{22})$, we observe the bias-variance decomposition:
\begin{align*} E\left(\sum_i (X_i - \mu)^2\right) = n(0)^2 + n\sigma^2 \end{align*}
Let $X_1, \dots, X_n \sim \text{Normal}(\mu, 1)$. Let $\theta = e^{\mu}$ and let $\hat{\theta} = e^{\bar{X}}$ be the MLE. Create a data set (using $\mu = 5$) consisting of $n=100$ observations.
(a) Use the delta method to get $\hat{se}$ and a 95 percent confidence interval for $\theta$. Use the parametric bootstrap to get $\hat{se}$ and 95 percent confidence interval for $\theta$. Use the nonparametric bootstrap to get $\hat{se}$ and 95 percent confidence interval for $\theta$. Compare your answers.
Solution: In this case, the score function is $s(X; \mu) = X - \mu$ and $I(\mu) = 1$, so $I_n(\mu) = n$ and $\hat{se}(\hat{\mu}) = \frac{1}{\sqrt{n}}$. By the delta method with $g(\mu) = e^{\mu}$, $\hat{se} = \hat{se}(\hat{\theta}) = |g'(\hat{\mu})|\,\hat{se}(\hat{\mu}) = \frac{e^{\bar{X}}}{\sqrt{n}}$.
mu = 5
n = 100
X = rng.normal(mu, scale=1, size=n)
X_bar = np.mean(X)
theta = np.e ** 5
theta_hat = np.e ** X_bar
alpha = 0.05
z = norm.ppf(1 - alpha / 2)
print(f"True value of theta: {theta:.3f}")
print(f"MLE: {theta_hat:.3f}")
# delta method
se_delta = np.sqrt(1 / n) * np.abs(np.e ** X_bar)
print(f"{100 * (1 - alpha):.0f}% C.I. (delta method): ({theta_hat - z * se_delta:.3f}, {theta_hat + z * se_delta:.3f})")
# parametric bootstrap
B = 100000
theta_param_boot = np.empty(B)
for i in range(B):
    X_boot = rng.normal(X_bar, scale=1, size=n)
    theta_param_boot[i] = np.e ** np.mean(X_boot)
se_param_boot = np.std(theta_param_boot)
print(f"{100 * (1 - alpha):.0f}% C.I. (parametric bootstrap): ({theta_hat - z * se_param_boot:.3f}, {theta_hat + z * se_param_boot:.3f})")
# nonparametric bootstrap
theta_boot = np.empty(B)
for i in range(B):
    X_boot = rng.choice(X, size=n, replace=True)
    theta_boot[i] = np.e ** np.mean(X_boot)
se_boot = np.std(theta_boot)
print(f"{100 * (1 - alpha):.0f}% C.I. (nonparametric bootstrap): ({theta_hat - z * se_boot:.3f}, {theta_hat + z * se_boot:.3f})")
True value of theta: 148.413 MLE: 143.331 95% C.I. (delta method): (115.239, 171.424) 95% C.I. (parametric bootstrap): (115.071, 171.592) 95% C.I. (nonparametric bootstrap): (118.237, 168.425)
We observe good agreement between the delta method and the parametric bootstrap. In this run, the C.I. from the nonparametric bootstrap is somewhat narrower than those produced by the other two methods.
(b) Plot a histogram of the bootstrap replications for the parametric and nonparametric bootstraps. These are estimates of the distribution of $\hat{\theta}$. The delta method also gives an approximation to this distribution, namely, $\text{Normal}(\hat{\theta}, se^2)$. Compare these to the true sampling distribution of $\hat{\theta}$ (which you can get by simulation). Which approximation - parametric bootstrap, nonparametric bootstrap, or delta method - is closer to the true distribution?
# true sampling distribution of theta_hat = exp(X_bar), where X_bar ~ N(mu, 1/n)
def theta_cdf(x):
    return norm.cdf(math.log(x), loc=mu, scale=np.sqrt(1 / n))

bins = np.linspace(50, 200, 150)
theta_cdf_bins = list(map(theta_cdf, bins))
# approximate the density by differencing the CDF and dividing by the bin width
theta_cdf_bins_delta = np.empty(len(bins))
theta_cdf_bins_delta[0] = 0
theta_cdf_bins_delta[1:] = np.diff(theta_cdf_bins) / np.diff(bins)
fig = go.Figure()
fig.add_trace(go.Histogram(x=theta_param_boot,
opacity=0.5,
histnorm='probability density',
name='Parametric Bootstrap'))
fig.add_trace(go.Scatter(x=bins, y=theta_cdf_bins_delta, name='True Sampling Distribution'))
fig.show()
fig = go.Figure()
fig.add_trace(go.Histogram(x=theta_boot,
opacity=0.5,
histnorm='probability density',
name='Nonparametric Bootstrap'))
fig.add_trace(go.Scatter(x=bins, y=theta_cdf_bins_delta, name='True Sampling Distribution'))
fig.show()
fig = go.Figure()
fig.add_trace(go.Scatter(x=bins, y=norm.pdf(bins, loc=theta_hat, scale=se_delta), name='Delta Approximation'))
fig.add_trace(go.Scatter(x=bins, y=theta_cdf_bins_delta, name='True Sampling Distribution'))
fig.show()
No clear winner as far as I can tell.
Let $X_1, \dots, X_n \sim \text{Uniform}(0, \theta)$. The MLE is $\hat{\theta} = X_{(n)} = \max \{X_1, \dots, X_n\}$. Generate a dataset of size 50 with $\theta = 1$.
(a) Find the distribution of $\hat{\theta}$ analytically. Compare the true distribution of $\hat{\theta}$ to the histograms from the parametric and nonparametric bootstraps.
Solution: From Chapter 8, Problem #7, the CDF of $\hat{\theta}$ is
\begin{equation*} F_{\hat{\theta}}(x) = \begin{cases} 0 & x < 0 \\ \left(\frac{x}{\theta} \right)^n & 0 \le x < \theta \\ 1 & \theta \le x \end{cases} \end{equation*}which has PDF $\frac{n}{\theta^n} x^{n-1} \mathbb{1}_{(0,\theta)}(x)$
n = 50
theta = 1
# true distribution of theta_hat
def theta_hat_pdf(x):
    return (n / (theta ** n)) * (x ** (n - 1))

xx = np.linspace(0, theta, 100)
theta_hat_density = list(map(theta_hat_pdf, xx))
# get data
X = rng.uniform(low=0, high=theta, size=n)
theta_hat = max(X)
# bootstrap
B = 1000
T_boot = np.empty(B)
for i in range(B):
    X_boot = rng.choice(X, size=n, replace=True)
    T_boot[i] = max(X_boot)
# parametric bootstrap
T_param_boot = np.empty(B)
for i in range(B):
    X_param_boot = rng.uniform(0, theta_hat, size=n)
    T_param_boot[i] = max(X_param_boot)
# plot nonparametric bootstrap vs. true density
fig = go.Figure(data=go.Histogram(x=T_boot,
nbinsx=10,
opacity=0.5,
histnorm='probability density',
name='Nonparametric Bootstrap Replications'))
fig.add_trace(go.Scatter(x=xx, y=theta_hat_density, name='True Sampling Distribution'))
fig.update_xaxes(range=[0.85, 1.05])
fig.show()
# plot parametric bootstrap vs. true density
fig = go.Figure(data=go.Histogram(x=T_param_boot,
nbinsx=10,
opacity=0.5,
histnorm='probability density',
name='Parametric Bootstrap Replications'))
fig.add_trace(go.Scatter(x=xx, y=theta_hat_density, name='True Sampling Distribution'))
fig.update_xaxes(range=[0.85, 1.05])
fig.show()
(b) This is a case where the bootstrap does very poorly. In fact, we can prove that this is the case. Show that $P(\hat{\theta} = \theta) = 0$ and yet $P(\hat{\theta}^* = \hat{\theta}) \approx .632$. Hint: show that $P(\hat{\theta}^* = \hat{\theta}) = 1 - (1 - \frac{1}{n})^n$ then take the limit as $n$ gets large.
Solution: Since $\hat{\theta}$ has a continuous distribution (its CDF is given in part (a)), $P(\hat{\theta} = \theta) = 0$. For the nonparametric bootstrap, $\hat{\theta}^* = \hat{\theta}$ exactly when at least one of the $n$ bootstrap draws equals $X_{(n)}$, so $P(\hat{\theta}^* = \hat{\theta}) = 1 - \left(1 - \frac{1}{n}\right)^n \rightarrow 1 - e^{-1} \approx 0.632$ as $n \rightarrow \infty$. See also Chapter 8, Problem #7.
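A quick empirical check (a sketch reusing the nonparametric bootstrap replications T_boot and theta_hat from part (a)): the fraction of replications exactly equal to $\hat{\theta}$ should be close to $1 - (1 - 1/n)^n$.
frac_equal = np.mean(T_boot == theta_hat)
print(f"Fraction of bootstrap replications equal to theta_hat: {frac_equal:.3f}")
print(f"1 - (1 - 1/n)^n = {1 - (1 - 1 / n) ** n:.3f}")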