import numpy as np
from numpy.random import default_rng
rng = default_rng(42)
from scipy.stats import norm
import matplotlib as mpl
from matplotlib import pyplot as plt
import plotly.graph_objects as go
mpl.rcParams['font.size'] = 18
plt.rcParams['text.usetex'] = False
Suppose we play a game where we start with $c$ dollars. On each play of the game you either double or halve your money, with equal probability. What is your expected fortune after $n$ trials?
Solution:
Let $\mu_n$ be the expected amount after the $n$th round. We have $\mu_0=c$, and the recurrence relation
$$ \mu_{n+1} = \frac12 \left(\frac{\mu_n}{2} \right) + \frac12 (2 \mu_n) = \frac54 \mu_n$$meaning $\mu_n = c \left( \frac54 \right) ^ n$.
c = 3 # initial amount
n = 10000 # number of simulations
k = 10 # number of rounds
x = rng.integers(0, 1, size=(n, k), endpoint=True)
x = np.where(x==1, 1, -1)
winnings = 2.0 ** np.cumsum(x, axis=1) * c
average_winnings = np.average(winnings, axis=0)
rounds = np.arange(1, k+1, 1)
predicted_winnings = [c * (5/4) ** j for j in rounds]
plt.plot(rounds, predicted_winnings, label='Predicted c * (5/4)^n ')
plt.plot(rounds, average_winnings, label='Average')
plt.title("Winnings vs. Rounds")
plt.ylabel("Average Winnings")
plt.xlabel("Rounds")
plt.legend()
plt.grid()
plt.xticks(rounds)
plt.show()
Show that $V(X) = 0$ if and only if there is a constant $c$ such that $P(X = c) = 1$.
Solution:
$\Rightarrow$: Suppose $V(X) = 0$. Then $\int (x - \mu)^2 \, dF(x) = 0$. Since the integrand $(x - \mu)^2$ is nonnegative, the integral can vanish only if $P(X = \mu) = 1$: if $P(|X - \mu| > \epsilon) > 0$ for some $\epsilon > 0$, the integral would be at least $\epsilon^2 P(|X - \mu| > \epsilon) > 0$. Hence the statement holds with $c = \mu$.
$\Leftarrow$: Suppose $P(X = c) = 1$. Then, $E(X) = c$, and $E(X^2) = c ^ 2$, meaning $V(X) = E(X^2) - E(X)^2 = 0$.
Let $X_1, \dots, X_n \sim \text{Uniform}(0,1)$ and let $Y_n = \max \{X_1, \dots, X_n\}$. Find $E(Y_n)$.
Solution:
We have \begin{align*} P(Y_n \le y) &= P(X_1 \le y, \dots, X_n \le y) \\ &= P(X_1 \le y) \dots P(X_n \le y) \tag{independence} \\ &= P(X_1 \le y) ^ n \tag{identically distributed} \\ &= y ^ n \tag{for $0 \le y \le 1$} \\ &\Rightarrow f_{Y_n}(y) = n y ^{n-1} \text{ on } [0,1] \end{align*}
Thus,
\begin{align*} E(Y_n) &= \int_0^1 y n y ^ {n - 1} \, dy \\ &= \frac{n}{n+1} \end{align*}
n = 1000 # number of simulations
k = 10
x = rng.uniform(0, 1, size=(n, k))
y = np.zeros((n, k))
for j in range(k):
    y[:, j] = np.max(x[:, :j+1], axis=1) # running maximum over the first j+1 draws
ns = np.arange(1, k + 1)
plt.plot(ns, np.mean(y, axis=0), label='Simulated')
plt.plot(ns, [j / (j + 1) for j in ns], label='Predicted (n / (n+1))')
plt.title("E(Y_n) vs. n")
plt.ylabel("E(Y_n)")
plt.xlabel("n")
plt.legend()
plt.grid()
plt.show()
A particle starts at the origin of the real line and moves along the line in jumps of one unit. For each jump the probability is $p$ that the particle will jump one unit to the left and the probability is $1-p$ that the particle will jump one unit to the right. Let $X_n$ be the position of the particle after $n$ units. Find $E(X_n)$ and $V(X_n)$. (This is known as a random walk.)
Solution:
Letting $\mu_n = E(X_n)$, we have $\mu_0 = 0$ and the recursive relationship $\mu_{n+1} = p (\mu_n - 1) + (1 - p) (\mu_n + 1) = \mu_n + 1 - 2p$, meaning $\mu_n = (1 - 2p)n.$
Let $Y_i$ be the random variable taking the value $-1$ with probability $p$ and $1$ with probability $1-p$. By independence, $V(X_n) = V(Y_1 + \dots + Y_n) = \sum_i V(Y_i)$. We have $$E(Y_i) = p (-1) + (1-p) (1) = 1 - 2p$$ $$E(Y_i^2) = p (-1) ^ 2 + (1-p) (1)^2 = 1$$ so $V(Y_i) = E(Y_i ^ 2) - E(Y_i)^2 = 1 - (1 - 2p)^2$, and $V(X_n) = n(1 - (1 - 2p)^2)$.
n = 1000
k = 1000
p = 0.45
mu = (1 - 2 * p) * np.arange(k)
sigma2 = (1 - (1 - 2 * p) ** 2) * np.arange(k)
X = np.cumsum(rng.choice([-1, 1], size=(n, k), p=[p, 1-p]), axis=1)
plt.plot(X.T, color='grey', alpha=0.1, linewidth=0.5)
plt.plot(np.arange(k), mu, color='black')
plt.plot(np.arange(k), mu + 1.96 * np.sqrt(sigma2), color='blue')
plt.plot(np.arange(k), mu - 1.96 * np.sqrt(sigma2), color='blue')
plt.show()
A fair coin is tossed until a head is obtained. What is the expected number of tosses that will be required?
Solution:
Let $X$ be the number of tosses required. Conditioning on the outcome of the first toss, $E(X) = \frac12 \cdot 1 + \frac12(1 + E(X))$, so $E(X) = 2$.
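As a quick sanity check, here is a minimal simulation sketch reusing the `rng` defined above; `rng.geometric(0.5)` counts the number of fair-coin tosses up to and including the first head, so the sample mean should be close to 2.
n = 100000 # number of simulated games
tosses = rng.geometric(0.5, size=n) # tosses needed to see the first head
print(tosses.mean()) # should be close to E(X) = 2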
Prove Theorem 3.6 for discrete random variables:
(The Rule of the Lazy Statistician). Let $Y = r(X)$. Then
$$ E(Y) = E(r(X)) = \int r(x) \, dF_X(x)$$
Solution: Let $f_Y(y) = P(Y = y)$ be the probability mass function of $Y$ and $f_X$ that of $X$. Then
\begin{align*} E(Y) &= \sum_{y} y f_Y(y) \tag{definition of expectation}\\ &= \sum_y y P(r(X) = y) \tag{definition of $Y$}\\ &= \sum_y y \sum_{ \{ x \mid r(x) = y \} } f_X(x) \tag{discreteness of $X$}\\ &= \sum_x r(x) f_X(x) \tag{each $x$ contributes $r(x)f_X(x)$ exactly once}\\ &= \int r(x) \, dF_X(x) \end{align*}
Let $X$ be a continuous random variable with CDF $F$. Suppose that $P(X > 0) = 1$ and that $E(X)$ exists. Show that $E(X) = \int_0^\infty P(X > x) \, dx$. Hint: Consider integrating by parts. The following fact is helpful: if $E(X)$ exists then $\lim_{x\to\infty} x[1 - F(x)] = 0$.
Solution:
\begin{align*} E(X) &= \int_{-\infty}^\infty x f(x) \, dx \tag{definition}\\ &= \int_{0}^\infty x f(x) \, dx \tag{nonnegativity}\\ &= \lim_{y\to\infty} y F(y) - \int_0^y F(x) \,dx \tag{integration by parts}\\ &= \lim_{y\to\infty} \int_0^y F(y) - F(x) \, dx \tag{FTC} \\ &= \lim_{y\to\infty} \int_0^{\infty} [F(y) - F(x)]\mathbb{1}_{x \le y} \, dx \tag{rewriting}\\ &= \int_0^\infty 1 - F(x) \, dx \tag{MCT*} \\ &= \int_0^\infty P(X > x) \, dx \tag{definition of $F$} \end{align*}*observing that $g_y(x) = [F(y) - F(x)]\mathbb{1}_{x \le y}$ is a monotonically increasing sequence in $y$ that converges pointwise to $1 - F(x)$, we have the result by the Monotone Convergence Theorem.
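As an illustrative numerical check (a sketch, not part of the proof, with $\beta = 2$ chosen arbitrarily), we can compute $\int_0^\infty P(X > x)\,dx$ on a grid for an Exponential($\beta$) random variable, whose CDF is available from scipy, and compare it to the known mean $\beta$.
from scipy.stats import expon
beta = 2.0 # mean of the Exponential distribution
x = np.linspace(0, 100, 100001) # grid covering essentially all of the mass
survival = 1 - expon.cdf(x, scale=beta) # P(X > x)
print(np.trapz(survival, x)) # should be close to E(X) = beta = 2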
Prove Theorem 3.17:
Let $X_1, \dots, X_n$ be IID and let $\mu = E(X_i), \sigma^2 = V(X_i)$. Then
$$ E(\bar{X}_n) = \mu, \qquad V(\bar{X}_n) = \frac{\sigma^2}{n}, \qquad \text{and} \qquad E(S_n^2) = \sigma^2$$
Solution:
$E(\bar{X}_n) = \mu$:
\begin{align*} E(\bar{X}_n) &= E \left( \frac{1}{n} \sum_i X_i \right) \tag{definition of $\bar{X}_n$} \\ &= \frac{1}{n} \sum_i E(X_i) \tag{linearity of expectation} \\ &= \frac{1}{n} \sum_i \mu = \mu \tag{assumption} \\ \end{align*}$V(\bar{X}_n) = \frac{\sigma^2}{n}$:
\begin{align*} V(\bar{X}_n) &= V \left( \frac{1}{n} \sum_i X_i \right) \tag{definition of $\bar{X}_n$} \\ &= \sum_i \frac{1}{n^2} V(X_i) \tag{independence} \\ &= \frac{n}{n^2} \sigma^2 = \frac{\sigma^2}{n} \tag{assumption} \end{align*}$E(S_n^2) = \sigma^2$:
From the expression $V(X) = E(X^2) - E(X)^2$, we have $E(X_i ^ 2) = \mu ^ 2 + \sigma ^ 2$, and $E(\bar{X}_n^2) = \mu ^ 2 + \sigma ^ 2 / n$. Furthermore
\begin{align*} E(X_i \bar{X}_n) &= E \left( \frac{1}{n} \left[X_i ^ 2 + \sum_{j \ne i} X_i X_j \right] \right) \tag{expanding} \\ &= \frac{1}{n} E(X_i^2) + \frac{1}{n} \sum_{j \ne i} E(X_i X_j) \tag{linearity of expectation} \\ &= \frac{1}{n} E(X_i^2) + \frac{1}{n} \sum_{j \ne i} E(X_i) E(X_j) \tag{independence} \\ &= \frac{1}{n} (\mu ^ 2 + \sigma ^ 2) + \frac{n-1}{n}\mu^2 \\ &= \mu ^ 2 + \frac{\sigma ^ 2}{n} \end{align*}Therefore, \begin{align*} E(S_n^2) &= E \left( \frac{1}{n - 1} \sum_i (X_i - \bar{X}_n) ^ 2 \right) \tag{definition of $S_n^2$}\\ &= \frac{1}{n - 1} \sum_i \left[ E(X_i ^ 2) - 2E(X_i \bar{X}_n) + E(\bar{X}_n^2) \right] \tag{linearity of expectation} \\ &= \frac{1}{n - 1} \sum_i \left[ \mu ^ 2 + \sigma ^ 2 - 2(\mu ^ 2 + \frac{\sigma ^ 2}{n}) + \mu ^ 2 + \frac{\sigma ^ 2}{n} \right] \tag{above substitutions} \\ &= \sigma^2 \tag{simplification} \\ \end{align*}
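A small simulation sketch (reusing `rng` from above, with $N(0,4)$ data chosen arbitrarily) illustrates the unbiasedness of $S_n^2$: averaging the sample variance with the $n-1$ divisor over many samples should recover $\sigma^2$, while the $n$ divisor is biased low.
n = 5 # sample size
k = 100000 # number of simulated samples
sigma2 = 4.0 # true variance of the N(0, 4) data
X = rng.normal(0, np.sqrt(sigma2), size=(k, n))
print(np.var(X, axis=1, ddof=1).mean()) # unbiased: close to 4
print(np.var(X, axis=1, ddof=0).mean()) # biased: close to 4 * (n-1)/n = 3.2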
(Computer Experiment.) Let $X_1, X_2, \dots, X_n$ be $N(0, 1)$ random variables and let $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$. Plot $\bar{X}_n$ versus $n$ for $n = 1, \dots, 10,000$. Repeat for $X_1, X_2, \dots, X_n \sim $ Cauchy. Explain why there is such a difference.
n = 10000
indices = np.arange(1, n + 1)
X = rng.normal(loc=0, scale=1, size=n) # sample normal random variables
X_bar = np.cumsum(X) / indices # calculate sample means
# plot sample means versus sample size
fig, ax = plt.subplots()
ax.plot(indices, X_bar)
ax.grid()
ax.set_xlabel(r"$n$", fontsize=18)
ax.set_ylabel(r"$\bar{X}_n$", rotation=0, fontsize=18)
plt.show()
n = 10000
indices = np.arange(1, n + 1)
X = rng.standard_cauchy(size=n) # sample cauchy random variables
X_bar = np.cumsum(X) / indices # calculate sample means
# plot sample means versus sample size
fig, ax = plt.subplots()
ax.plot(indices, X_bar)
ax.grid()
ax.set_xlabel(r"$n$", fontsize=18)
ax.set_ylabel(r"$\bar{X}_n$", rotation=0, fontsize=18)
plt.show()
The former has a defined expected value, and the sample means converge to it (almost surely, by the Strong Law of Large Numbers). The latter does not have a defined expected value; sample means can jump all over the place. Intuition: the Cauchy distribution has "fatter tails", meaning large values that have a significant effect on the sample mean are commonplace.
Let $X \sim N(0,1)$ and let $Y = e ^ X$. Find $E(Y)$ and $V(Y)$.
Solution:
The moment generating function of $X$ is $\phi_X(t) = E(e^{tX}) = \exp \{t ^ 2 / 2 \}$, so $E(Y) = E(e^X) = \phi_X(1) = e^{1 / 2}$ and $E(e^{2X}) = \phi_X(2) = e^2$. Then
\begin{align*} V(Y) &= E(Y^2) - E(Y)^2 \tag{variance formula} \\ &= E(e^{2X}) - E(e^{X}) ^ 2 \tag{substitution} \\ &= e^2 - e \end{align*}
(Computer Experiment: Simulating the Stock Market.) Let $Y_1, Y_2, \dots$ be independent random variables such that $P(Y_i = 1) = P(Y_i = -1) = 1/2$. Let $X_n = \sum_{i=1}^n Y_i$. Think of $Y_i = 1$ as "the stock price increased by one dollar", $Y_i = -1$ as "the stock price decreased by one dollar", and $X_n$ as the value of the stock on day $n$.
(a) Find $E(X_n)$ and $V(X_n)$.
(b) Simulate $X_n$ and plot $X_n$ versus $n$ for $n=1,2,\dots, 10,000$. Repeat the whole simulation several times. Notice two things. First, it's easy to "see" patterns in the sequence even though it is random. Second, you will find that the four runs look very different even though they were generated the same way. How do the calculations in (a) explain the second observation?
Solution: As $X_n$ is a random walk with $p = 1/2$, we have, by Problem #4, $E(X_n) = 0$ and $V(X_n) = n$.
n = 10000 # trials per simulation
k = 4 # n_simulations
Y = rng.choice([-1,1], size=(k, n))
X = np.cumsum(Y, axis=1)
plt.plot(X.T, color='grey', alpha=0.5, linewidth=0.5)
plt.show()
The runs look very different because $V(X_n) = n$ grows without bound: the typical spread of a path at time $n$ is of order $\sqrt{n}$, so independently generated runs drift far apart even though they share the same distribution.
Prove the formulas given in the table at the beginning of Section 3.4 for the Bernoulli, Poisson, Uniform, Exponential, Gamma and Beta. Here are some hints. For the mean of the Poisson, use the fact that $e^a = \sum_{x=0}^\infty a^x / x!$. To compute the variance, first compute $E(X(X-1))$. For the mean of the Gamma, it will help to multiply and divide by $\Gamma(\alpha + 1)/\beta^{\alpha + 1}$ and use the fact that a Gamma density integrates to 1. For the Beta, multiply and divide by $\Gamma(\alpha + 1)\Gamma(\beta) / \Gamma(\alpha + \beta + 1)$.
Solution:
X = Bernoulli($p$):
\begin{align*} E(X) &= 1 \cdot p + 0 \cdot (1-p) \tag{definition of discrete expectation} \\ &= p \end{align*}\begin{align*} V(X) &= E(X^2) - E(X)^2 \tag{variance formula} \\ &= 1 ^ 2 \cdot p + 0 ^ 2 \cdot (1-p) - E(X)^2 \tag{lazy statistician} \\ &= p - p^2 \tag{$E(X) = p$} \\ &= p(1-p) \end{align*}X = Poisson($\lambda$):
\begin{align*} E(X) &= \sum_{j=0}^\infty j e^{-\lambda} \frac{\lambda^j}{j!} \tag{definition} \\ &= \sum_{j=1}^\infty j e^{-\lambda} \frac{\lambda^j}{j!} \tag{first term vanishes} \\ &= \lambda e^{-\lambda} \sum_{j=1}^\infty \frac{\lambda^{j-1}}{(j-1)!} \tag{simplification} \\ &= \lambda e^{-\lambda} \sum_{k=0}^\infty \frac{\lambda^{k}}{k!} \tag{letting $k = j-1$} \\ &= \lambda e^{-\lambda} e^{\lambda} \tag{Taylor Expansion of $e^\lambda$} \\ &= \lambda \end{align*}To compute variance, we shall use $V(X) = E(X^2) - E(X)^2$. Calculating the expected square: \begin{align*} E(X^2) &= \sum_{j=0}^\infty j^2 e^{-\lambda} \lambda^j / j! \tag{lazy statistician} \\ &= \sum_{j=1}^\infty j^2 e^{-\lambda} \lambda^j / j! \tag{first term vanishes} \\ &= \lambda e^{-\lambda} \sum_{j=1}^\infty j \lambda^{j-1} / (j-1)! \tag{simplification} \\ &= \lambda e^{-\lambda} \sum_{k=0}^\infty (k+1) \lambda^{k} / k! \tag{letting $k=j-1$} \\ &= \lambda e^{-\lambda} \left[\sum_{k=0}^\infty k \lambda^{k} / k! + \sum_{k=0}^\infty \lambda^{k} / k! \right] \tag{expanding} \\ &= \lambda e^{-\lambda} \left[E(X) + e^\lambda \right] \tag{definition of $E(X)$ and taylor expansion} \\ &= \lambda ^2 + \lambda \end{align*} Thus, $V(X) = \lambda ^2 + \lambda - \lambda ^ 2 = \lambda$.
X = Uniform($a,b$):
\begin{align*} E(X) &= \int_{-\infty}^\infty x f(x) \, dx \tag{definition of $E(X)$} \\ &= \int_a^b x \frac{1}{b-a} \, dx \tag{definition of $f(x)$} \\ &= \frac{1}{b-a} \frac{b^2 - a^2}{2} \tag{integration} \\ &= \frac{a+b}{2} \end{align*}and
\begin{align*} E(X^2) &= \int_{-\infty}^\infty x ^ 2 f(x) \, dx \tag{lazy statistician} \\ &= \int_a^b x ^2 \frac{1}{b-a} \, dx \tag{definition of $f(x)$} \\ &= \frac{1}{b-a} \frac{b^3 - a^3}{3} \tag{integration} \\ &= \frac{a^2+ab+b^2}{3} \end{align*}thus
\begin{align*} V(X) &= E(X^2) - E(X)^2 \tag{variance formula} \\ &= (a^2+ab+b^2)/3 - ((a+b)/2)^2 \tag{substitution} \\ &= (b - a)^2 / 12 \end{align*}X = Exponential($\beta$):
\begin{align*} E(X) &= \int_{-\infty}^\infty x f(x) \, dx \tag{definition of $E$} \\ &= \int_0^\infty x \frac1\beta e^{-x/\beta} \, dx \tag{definition of $f$} \\ &= \frac1\beta \left[ \lim_{L\to\infty} -x \beta e^{-x / \beta} \mid_0^L + \int_0^L \beta e^{-x / \beta} \, dx \right] \tag{integration by parts} \\ &= \frac1\beta \left[0 + \beta \lim_{L\to\infty} -\beta e ^{-x / \beta} \mid_0^L \right] \\ &= \beta \end{align*}and \begin{align*} E(X^2) &= \int_0^\infty x ^ 2 \frac1\beta e^{-x/\beta} \, dx \tag{definition of $f$} \\ &= \frac1\beta \left[ \lim_{L\to\infty} -x ^ 2 \beta e^{-x / \beta} \mid_0^L + \int_0^L 2x\beta e^{-x / \beta} \, dx \right] \tag{integration by parts} \\ &= \int_0^\infty 2x e^{-x / \beta} \, dx \\ &= 2\beta \int_0^\infty x \frac1\beta e^{-x / \beta} \, dx \\ &= 2\beta E(X) \tag{definition of $E$}\\ &= 2\beta^2 \end{align*} thus $V(X) = E(X^2) - E(X)^2 = \beta^2$.
X = Gamma($\alpha, \beta$):
\begin{align*} E(X) &= \int_0^\infty x \frac1{\beta^\alpha \Gamma(\alpha)} x ^ {\alpha - 1} e ^ {-x / \beta} \, dx \tag{definition}\\ &= \int_0^\infty x \frac1{\beta^\alpha \Gamma(\alpha)} x ^ {\alpha - 1} e ^ {-x / \beta} \frac{\Gamma(\alpha + 1)}{\beta^{\alpha + 1}}\frac{\beta^{\alpha + 1}}{\Gamma(\alpha + 1)} \, dx \tag{hint}\\ &= \frac{\Gamma(\alpha + 1)}{\Gamma(\alpha)} \frac{\beta^{\alpha +1}}{\beta^\alpha} \int_0^\infty \frac{1}{\beta^{\alpha + 1} \Gamma(\alpha + 1)}x^{\alpha}e^{-x / \beta} \, dx \\ &= \alpha \beta \int_0^\infty \frac{1}{\beta^{\delta} \Gamma(\delta)}x^{\delta-1}e^{-x / \beta} \, dx \tag{$\delta = \alpha + 1$} \\ &= \alpha \beta \tag{integrand is the Gamma pdf} \end{align*}Recall $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1}e^{-y} \, dy$. \begin{align*} E(X^2) &= \int_0^\infty x^2 \frac1{\beta^\alpha \Gamma(\alpha)} x ^ {\alpha - 1} e ^ {-x / \beta} \, dx \tag{definition} \\ &= \frac1{\beta^\alpha \Gamma(\alpha)} \int_0^\infty x ^ {\alpha + 1} e ^ {-x / \beta} \, dx \\ &= \frac{\beta^{\alpha +2}}{\beta^\alpha \Gamma(\alpha)} \int_0^\infty y ^ {\alpha + 1} e ^ {-y} \, dy \tag{$y = x / \beta$} \\ &= \frac{\beta ^ 2}{ \Gamma(\alpha)} \int_0^\infty y ^ {(\alpha + 2) - 1} e ^ {-y} \, dy \\ &= \frac{\beta ^ 2\Gamma(\alpha + 2) }{\Gamma(\alpha)} \tag{definition of $\Gamma(\alpha + 2)$}\\ &= \alpha \beta ^ 2 + \alpha ^ 2 \beta ^2 \end{align*} thus $V(X) = E(X^2) - E(X)^2 = \alpha \beta^2$.
X = Beta($\alpha, \beta$):
\begin{align*} E(X) &= \int_0^1 x \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x ^ {\alpha - 1} (1 - x) ^ {\beta - 1} \, dx \tag{definition} \\ &= \int_0^1 \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x ^ {\alpha} (1 - x) ^ {\beta - 1} \frac{\Gamma(\alpha + 1)\Gamma(\beta)}{\Gamma(\alpha + \beta + 1)} \frac{\Gamma(\alpha + \beta + 1)}{\Gamma(\alpha + 1)\Gamma(\beta)}\, dx \tag{hint} \\ &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha + \beta + 1)} \frac{\Gamma(\alpha + 1)}{\Gamma(\alpha)} \int_0^1 \frac{\Gamma(\alpha + \beta + 1)}{\Gamma(\alpha + 1) \Gamma(\beta)} x ^ {\alpha} (1 - x) ^ {\beta - 1} \, dx \\ &= \frac{\alpha}{\alpha + \beta} \int_0^1 \frac{\Gamma(\alpha' + \beta)}{\Gamma(\alpha') \Gamma(\beta)} x ^ {\alpha'-1} (1 - x) ^ {\beta - 1} \, dx \tag{$\alpha' = \alpha + 1$} \\ &= \frac{\alpha}{\alpha + \beta} \tag{integrand is a density} \end{align*}Similarly, \begin{align*} E(X^2) &= \frac{\alpha}{\alpha + \beta} \int_0^1 \frac{\Gamma(\alpha + \beta + 1)}{\Gamma(\alpha + 1) \Gamma(\beta)} x ^ {\alpha + 1} (1 - x) ^ {\beta - 1} \, dx \\ &= \frac{\alpha}{\alpha + \beta} \int_0^1 x \frac{\Gamma(\alpha' + \beta)}{\Gamma(\alpha') \Gamma(\beta)} x ^ {\alpha'-1} (1 - x) ^ {\beta - 1} \, dx \tag{$\alpha' = \alpha + 1$}\\ &= \frac{\alpha}{\alpha + \beta} E(X') \tag{$X' \sim \text{Beta}(\alpha',\beta)$} \\ &= \frac{\alpha}{\alpha + \beta} \frac{\alpha + 1}{\alpha + \beta + 1} \tag{$E(X') = \alpha' / (\alpha' + \beta)$} \end{align*} thus $V(X) = E(X^2) - E(X)^2 = \frac{\alpha(\alpha + 1)}{(\alpha + \beta)(\alpha + \beta + 1)} - \left(\frac{\alpha}{\alpha + \beta}\right)^2 = (\alpha \beta) / ((\alpha + \beta)^2 (\alpha + \beta + 1))$.
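The formulas above can be spot-checked by simulation. Here is a minimal sketch (with arbitrarily chosen parameters, reusing `rng`) comparing sample means and variances against the derived expressions; note that numpy's exponential and gamma samplers are parameterized by the same scale $\beta$ used here.
n = 1000000
p, lam, a, b, beta, alpha = 0.3, 2.0, -1.0, 2.0, 1.5, 2.5 # arbitrary parameters
samples = {
    'Bernoulli': (rng.binomial(1, p, n), p, p * (1 - p)),
    'Poisson': (rng.poisson(lam, n), lam, lam),
    'Uniform': (rng.uniform(a, b, n), (a + b) / 2, (b - a) ** 2 / 12),
    'Exponential': (rng.exponential(beta, n), beta, beta ** 2),
    'Gamma': (rng.gamma(alpha, beta, n), alpha * beta, alpha * beta ** 2),
    'Beta': (rng.beta(alpha, beta, n), alpha / (alpha + beta),
             alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))),
}
for name, (x, mean, var) in samples.items():
    print(f"{name}: mean {x.mean():.3f} vs {mean:.3f}, var {x.var():.3f} vs {var:.3f}")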
Suppose we generate a random variable $X$ in the following way. First we flip a fair coin. If the coin is heads, take $X$ to have a Unif$(0,1)$ distribution. If the coin is tails, take $X$ to have a Unif$(3,4)$ distribution.
(a) Find the mean of $X$.
(b) Find the standard deviation of $X$.
Solution:
\begin{align*} E(X) &= \frac12 E(X \mid \text{heads}) + \frac12 E(X \mid \text{tails}) \\ &= \frac12 \cdot \frac12 + \frac12 \cdot \frac72 \\ &= 2 \end{align*}\begin{align*} E(X^2) &= \frac12 E(X^2 \mid \text{heads}) + \frac12 E(X^2 \mid \text{tails}) \\ &= \frac12 \cdot \frac{0^2 + 0 \cdot 1 + 1^2}{3} + \frac12 \cdot \frac{3^2 + 3 \cdot 4 + 4^2}{3} \tag{$E(X^2) = (a^2 + ab + b^2)/3$ for $X \sim \text{Unif}(a,b)$} \\ &= 19 / 3 \\ \end{align*}Thus $V(X) = 19 / 3 - 2 ^ 2 = 7 / 3$, and the standard deviation is $\sqrt{V(X)} = \sqrt{7/3} \approx 1.53$.
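A quick simulation of the two-stage experiment (a sketch reusing `rng`) should give a sample mean near 2 and a sample standard deviation near $\sqrt{7/3} \approx 1.53$.
n = 1000000
heads = rng.integers(0, 2, size=n).astype(bool) # fair coin flips
X = np.where(heads, rng.uniform(0, 1, size=n), rng.uniform(3, 4, size=n))
print(X.mean()) # close to 2
print(X.std()) # close to sqrt(7/3) ~ 1.53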
Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be random variables and let $a_1, \dots, a_m$ and $b_1, \dots, b_n$ be constants. Show that
$$\text{Cov} \left( \sum_{i=1}^m a_i X_i , \sum_{j=1}^n b_j Y_j \right) = \sum_{i=1}^m \sum_{j = 1}^n a_i b_j \text{Cov} (X_i, Y_j) $$
Solution: Letting $\mu_i^X = E(X_i)$ and $\mu_j^Y = E(Y_j)$,
\begin{align*} \text{Cov} \left( \sum_{i=1}^m a_i X_i , \sum_{j=1}^n b_j Y_j \right) &= E \left( \left(\sum_{i=1}^m a_i X_i - \sum_{i=1}^m a_i \mu_i^X \right) \left( \sum_{j=1}^n b_j Y_j - \sum_{j=1}^n b_j \mu_j^Y \right) \right) \tag{definition} \\ &= E \left( \left( \sum_{i=1}^m a_i(X_i - \mu_i^X) \right) \left( \sum_{j=1}^n b_j (Y_j - \mu_j^Y) \right) \right) \\ &= E \left( \sum_{i=1}^m \sum_{j=1}^n a_i b_j (X_i - \mu_i^X) (Y_j - \mu_j^Y) \right) \\ &= \sum_{i=1}^m \sum_{j=1}^n a_i b_j E \left((X_i - \mu_i^X) (Y_j - \mu_j^Y) \right) \tag{linearity of $E$} \\ &= \sum_{i=1}^m \sum_{j=1}^n a_i b_j \text{Cov} (X_i, Y_j) \tag{definition} \end{align*}
Let
$$ f(x, y) = \begin{cases} \frac13 (x + y) & 0 \le x \le 1, 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases} $$Find $V(2X - 3Y + 8)$.
Solution:
$$E(2X - 3Y + 8) = \int_0^1 \int_0^2 (2x - 3y + 8) \frac13 (x+y) \, dydx = \frac{49}{9}$$.
$$E((2X - 3Y + 8)^2) = \int_0^1 \int_0^2 (2x - 3y + 8)^2 \frac13 (x+y) \, dydx = \frac{98}{3}$$.
Thus, $V(2X - 3Y + 8) = 245 / 81$.
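The double integrals above can be verified numerically with scipy's `dblquad` (a minimal sketch; `dblquad` integrates over the inner variable first, so the integrand takes arguments `(y, x)` with $x \in [0,1]$ outer and $y \in [0,2]$ inner).
from scipy.integrate import dblquad
f = lambda y, x: (x + y) / 3 # joint density on [0,1] x [0,2]
g = lambda y, x: 2 * x - 3 * y + 8
m1, _ = dblquad(lambda y, x: g(y, x) * f(y, x), 0, 1, 0, 2) # E(2X - 3Y + 8)
m2, _ = dblquad(lambda y, x: g(y, x) ** 2 * f(y, x), 0, 1, 0, 2) # E((2X - 3Y + 8)^2)
print(m1, 49 / 9)
print(m2, 98 / 3)
print(m2 - m1 ** 2, 245 / 81) # the variance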
Let $r(x)$ be a function of $x$ and let $s(y)$ be a function of $y$. Show that
$$E(r(X)s(Y) \mid X) = r(X)E(s(Y) \mid X).$$Also, show that $E(r(X) \mid X) = r(X)$.
Solution:
\begin{align*} E(r(X)s(Y) \mid X = x) &= \int r(x)s(y)f_{Y \mid X}(y \mid x) \, dy \tag{definition} \\ &= r(x) \int s(y) f_{Y \mid X}(y \mid x) \, dy \tag{$r(x)$ is constant in $y$} \\ &= r(x) E(s(Y) \mid X = x) \end{align*}Since this holds for every $x$, $E(r(X)s(Y) \mid X) = r(X)E(s(Y) \mid X)$. The second identity follows by taking $s \equiv 1$, since $E(1 \mid X) = 1$.
Prove that
$$V(Y) = EV(Y \mid X) + VE(Y \mid X).$$Hint: Let $m = E(Y)$ and let $b(x) = E(Y \mid X = x)$. Note that $E(b(X)) = EE(Y \mid X) = E(Y) = m$. Bear in mind that $b$ is a function of $x$. Now write $V(Y) = E(Y-m)^2 = E((Y-b(X)) + (b(X) - m))^2$. Expand the square and take the expectation. You then have to take the expectation of three terms. In each case, use the rule of the iterated expectation: $E(\text{stuff}) = EE(\text{stuff} \mid X)$.
Solution:
\begin{align*} V(Y) &= E(Y^2) - E(Y)^2 \tag{variance formula} \\ &= E\left[E(Y^2 \mid X)\right] - E\left[E(Y \mid X) \right] ^ 2\tag{rule of iterated expectation} \\ &= E\left[(E(Y \mid X))^2 + V(Y \mid X) \right] - E\left[E(Y \mid X) \right] ^ 2 \tag{variance formula} \\ &= E\left[ V(Y \mid X) \right] + E\left[(E(Y \mid X))^2\right] - E\left[E(Y \mid X) \right] ^ 2 \tag{linearity of $E$} \\ &= E\left[ V(Y \mid X) \right] + V\left[ E(Y \mid X)\right] \tag{var. form. on the r.v. $ E(Y \mid X)$} \end{align*}
Show that if $E(X \mid Y = y) = c$ for some constant $c$, then $X$ and $Y$ are uncorrelated.
Solution:
Note that $\rho = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$, so $X$ and $Y$ are uncorrelated iff $\text{Cov}(X,Y) = E(XY) - E(X)E(Y) = 0$. Since
\begin{align*} E(X) &= E(E(X \mid Y)) \tag{rule of iterated expectation} \\ &= E(c) \tag{assumption} \\ &= c \tag{expectation of constant} \end{align*}and
\begin{align*} E(XY) &= E(E(XY \mid Y)) \tag{rule of iterated expectation} \\ &= E(E(X \mid Y)Y) \tag{$Y$ is constant in the inner expectation} \\ &= E(cY) \tag{assumption} \\ &= cE(Y) \tag{linearity of expectation} \end{align*}we have $E(XY) = cE(Y) = E(X)E(Y)$, so $\text{Cov}(X,Y) = 0$ and the result follows.
This question is to help you understand the idea of a sampling distribution. Let $X_1, \dots, X_n$ be IID with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$. Then $\bar{X}_n$ is a statistic, that is, a function of the data. Since $\bar{X}_n$ is a random variable, it has a distribution. This distribution is called the sampling distribution of the statistic. Recall from Theorem 3.17 that $E(\bar{X}_n) = \mu$ and $V(\bar{X}_n) = \sigma^2 / n$. Don't confuse the distribution of the data $f_X$ and the distribution of the statistic $f_{\bar{X}_n}$. To make this clear, let $X_1, \dots, X_n \sim \text{Uniform}(0,1)$. Let $f_X$ be the density of the $\text{Uniform}(0,1)$. Plot $f_X$. Now let $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$. Find $E(\bar{X}_n)$ and $V(\bar{X}_n)$. Plot them as a function of $n$. Interpret. Now simulate the distribution of $\bar{X}_n$ for $n = 1, 5, 25, 100$. Check that the simulated values of $E(\bar{X}_n)$ and $V(\bar{X}_n)$ agree with your theoretical calculations. What do you notice about the sampling distribution of $\bar{X}_n$ as $n$ increases?
Solution:
In this case, $\mu = \frac12$ and $\sigma^2 = \frac{1}{12}$ (see Problem #12), and we have
\begin{align*} E(\bar{X}_n) &= \mu = \frac12 \\ V(\bar{X}_n) &= \sigma^2/n = \frac{1}{12n} \end{align*}
plt.figure(figsize=(6,6))
ax = plt.subplot(111)
plt.axhline(0, color='black')
plt.axvline(0, color='black')
ax.hlines(0, xmin=-0.5, xmax=0, linewidth=5)
ax.hlines(1, xmin=0, xmax=1, linewidth=5)
ax.hlines(0, xmin=1, xmax=2, linewidth=5)
ax.vlines([0,1], ymin=0, ymax=1, linewidth=5, linestyle='dashed')
ax.grid()
ax.set_xlim(-0.5, 2)
ax.set_ylim(-0.5, 1.5)
ax.set_title(r"PDF of Uniform(0,1)")
ax.set_xlabel(r"x")
ax.set_ylabel(r"f_X(x)")
plt.show()
n = 100 # n_trials
k = 1000 # n_simulations
E_X_bar = np.empty(n-1)
V_X_bar = np.empty(n-1)
for j in range(1, n):
    X = rng.uniform(0, 1, size=(k, j)).mean(axis=1) # k simulated values of X_bar_j
    E_X_bar[j-1] = X.mean()
    V_X_bar[j-1] = X.var()
plt.figure(figsize=(16,4))
nn = np.arange(1, n)
ax = plt.subplot(121)
ax.plot(nn, [0.5 for _ in nn], label='Theoretical')
ax.plot(nn, E_X_bar, label='simulated')
ax.set_title("Sampling Distribution Mean")
ax.set_xlabel(r"$n$")
ax.set_ylabel(r"$E(X_n)$")
ax.set_ylim(0.49, 0.51)
ax.grid()
ax.legend()
ax = plt.subplot(122)
ax.plot(nn, [1 / (12 * i) for i in nn], label='Theoretical')
ax.plot(nn, V_X_bar, label='simulated')
ax.set_yscale('log')
ax.set_title("Sampling Distribution Variance")
ax.set_xlabel(r"$n$")
ax.set_ylabel(r"$V(X_n)$")
ax.grid()
ax.legend()
plt.show()
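To visualize the sampling distribution itself for $n = 1, 5, 25, 100$ (as the problem asks), a minimal sketch is to histogram many simulated values of $\bar{X}_n$; all four distributions are centered at $\frac12$, and they become increasingly concentrated (and increasingly bell-shaped) as $n$ grows.
k = 10000 # number of simulated sample means per n
fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharex=True)
for ax, n in zip(axes, [1, 5, 25, 100]):
    X_bar = rng.uniform(0, 1, size=(k, n)).mean(axis=1) # draws from the sampling distribution
    ax.hist(X_bar, bins=50, density=True)
    ax.set_title(f"n = {n}, var = {X_bar.var():.4f} (theory {1 / (12 * n):.4f})")
    ax.set_xlabel(r"$\bar{X}_n$")
plt.show()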
Prove Lemma 3.21
If $a$ is a vector and $X$ is a random vector with mean $\mu$ and variance $\Sigma$, then $E(a^T X) = a^T \mu$ and $V(a^T X) = a^T \Sigma a$. If $A$ is a matrix then $E(AX) = A\mu$ and $V(AX) = A\Sigma A^T$.
Solution:
Let $a_i$ be the $i$th element of $a$. Then
\begin{align*} E(a^T X) &= E \left[ \sum_j a_j X_j \right] \\ &= \sum_j a_j E(X_j) \\ &= a^T E(X) \end{align*}Let $a_{ij}$ be the element in the $i$th row and $j$th column of the $m \times n$ matrix $A$. Let the $1 \times n$ vector $a_{i*}$ be the $i$th row of $A$. Then
\begin{align*} (E(AX))_i &= E \left[ (AX)_i \right] \\ &= E \left[ a_{i*} X \right] \\ &= a_{i*}E(X) \end{align*}hence $E(AX) = AE(X)$.
Now
\begin{align*} V(a^T X) &= V \left[ \sum_j a_j X_j \right] \\ &= \sum_{i=1}^n \sum_{j=1}^n a_i Cov(X_i, X_j) a_j \tag{Problem #14} \\ &= a^T V(X) a \end{align*}and
\begin{align*} (V(AX))_{ij} &= Cov((AX)_i, (AX)_j) \tag{definition} \\ &= Cov(a_{i*}X, a_{j*}X) \\ &= \sum_{k=1}^n\sum_{l=1}^n a_{ik} Cov(X_k, X_l) a_{jl} \tag{Problem #14} \\ &= (A \, V(X) \, A^T)_{ij} \end{align*}hence $V(AX) = AV(X)A^T$.
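As an illustrative check of the lemma (a sketch with an arbitrary $A$, $\mu$, and $\Sigma$, reusing `rng`), we can compare the sample mean and covariance of $AX$ against $A\mu$ and $A\Sigma A^T$ for simulated multivariate normal data.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]]) # a valid (positive definite) covariance matrix
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 3.0]])
X = rng.multivariate_normal(mu, Sigma, size=500000) # rows are draws of X
AX = X @ A.T
print(AX.mean(axis=0), A @ mu) # E(AX) vs A mu
print(np.cov(AX, rowvar=False), A @ Sigma @ A.T) # V(AX) vs A Sigma A^T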
Let $X$ and $Y$ be random variables. Suppose that $E(Y \mid X) = X$. Show that $\text{Cov}(X,Y) = V(X)$.
Solution:
Since
\begin{align*} E(XY) &= E(E(XY \mid X)) \tag{iterated expectation} \\ &= E(XE(Y \mid X)) \tag{X is constant in inner expectation} \\ &= E(X^2) \tag{assumption} \end{align*}and
\begin{align*} E(Y) &= E(E(Y \mid X)) \tag{iterated expectation} \\ &= E(X) \tag{assumption} \end{align*}We have \begin{align*} \text{Cov}(X,Y) &= E(XY) - E(X)E(Y) \tag{covariance formula} \\ &= E(X^2) - E(X)^2 \tag{above} \\ &= V(X) \tag{variance formula} \end{align*}
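A tiny simulation illustrates this (a sketch with an arbitrarily chosen model): if $Y = X + \varepsilon$ with $\varepsilon$ independent zero-mean noise, then $E(Y \mid X) = X$, and the sample covariance of $(X, Y)$ should match the sample variance of $X$.
n = 1000000
X = rng.normal(0, 1, size=n)
Y = X + rng.normal(0, 2, size=n) # E(Y | X) = X since the noise has mean zero
print(np.cov(X, Y)[0, 1]) # Cov(X, Y), close to 1
print(X.var(ddof=1)) # V(X), close to 1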
Let $X \sim \text{Uniform}(0,1)$. Let $0 < a < b < 1$. Let
$$ Y = \begin{cases} 1 & 0 < x < b \\ 0 & \text{otherwise} \end{cases} $$and let
$$ Z = \begin{cases} 1 & a < x < 1 \\ 0 & \text{otherwise} \end{cases} $$
(a) Are $Y$ and $Z$ independent? Why/Why not?
(b) Find $E(Y \mid Z)$. Hint: What values $z$ can $Z$ take? Now find $E(Y \mid Z = z)$.
Solution:
(a) We have $E(Y)= b$, $E(Z) = 1-a$, and $E(YZ) = P(a < X < b) = b-a$. Since $\text{Cov}(Y,Z) = E(YZ) - E(Y)E(Z) = (b-a) - b(1-a) = a(b-1) \ne 0$, and independent random variables have zero covariance, $Y$ and $Z$ are not independent.
(b) $Z$ takes only the values $0$ and $1$. If $Z = 0$ then $X \le a < b$, so $E(Y \mid Z = 0) = 1$; if $Z = 1$ then $E(Y \mid Z = 1) = P(X < b \mid X > a) = (b-a)/(1-a)$. Hence $E(Y \mid Z) = (1 - Z) + Z \, \frac{b-a}{1-a}$.
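A short simulation (a sketch reusing `rng`, with arbitrary values of $a$ and $b$) confirms both parts: the sample covariance should be near $a(b-1) \ne 0$, and the conditional means of $Y$ given $Z = 0$ and $Z = 1$ should be near $1$ and $(b-a)/(1-a)$.
a, b = 0.3, 0.7
n = 1000000
X = rng.uniform(0, 1, size=n)
Y = (X < b).astype(float)
Z = (X > a).astype(float)
print(np.cov(Y, Z)[0, 1], a * (b - 1)) # nonzero covariance, about -0.09
print(Y[Z == 0].mean(), 1.0) # E(Y | Z = 0)
print(Y[Z == 1].mean(), (b - a) / (1 - a)) # E(Y | Z = 1)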
Find the moment generating function for the Poisson, Normal, and Gamma distributions.
Poisson:
\begin{align*} \psi(t) &= \sum_{n=0}^\infty e^{tn} e^{-\lambda} \frac{\lambda^n}{n!} \tag{definition} \\ &= e^{-\lambda} \sum_{n=0}^\infty \frac{(e^t\lambda)^n}{n!} \\ &= e^{-\lambda} e^{e^t \lambda} \tag{Taylor Series expansion} \\ &= e^{\lambda(e^t - 1)} \end{align*}Normal:
\begin{align*} \psi(t) &= \frac{1}{\sqrt{2 \pi} \sigma} \int_{-\infty}^\infty \exp(tx) \exp\left[-\frac{1}{2\sigma^2}\left(x-\mu\right)^2 \right] \, dx \tag{definition} \\ &= \frac{1}{\sqrt{2 \pi} \sigma} \int_{-\infty}^\infty \exp\left[-\frac{1}{2\sigma^2}\left(x^2 -2\mu x+\mu^2 -2\sigma^2 t x\right)\right] \, dx \\ \end{align*}Focusing on the innermost parentheses: \begin{align*} x^2 -2 \mu x+\mu^2 -2 \sigma^2 t x &= x^2 - (2 \mu + 2 \sigma^2 t)x + \mu^2 \\ &= x^2 - (2 \mu + 2 \sigma^2 t)x + (\mu + \sigma^2 t)^2 - (\mu + \sigma^2 t)^2 + \mu^2 \\ &= (x - (\mu + \sigma^2 t))^2 - 2 \mu \sigma^2 t - \sigma^4 t^2 \end{align*}
thus,
\begin{align*} \psi(t) &= \frac{1}{\sqrt{2 \pi} \sigma} \exp\left[ \mu t + \frac{\sigma^ 2 t ^ 2}{2}\right] \int_{-\infty}^\infty \exp\left[-\frac{1}{2\sigma^2}\left( x - (\mu + \sigma^2 t)\right) ^ 2\right] \, dx \\ &= \exp\left[ \mu t + \frac{\sigma^ 2 t ^ 2}{2}\right] \int_{-\infty}^\infty \frac{1}{\sqrt{2 \pi} \sigma} \exp\left[-\frac{1}{2\sigma^2}\left( x - (\mu + \sigma^2 t)\right) ^ 2\right] \, dx \\ &= \exp\left[ \mu t + \frac{\sigma^ 2 t ^ 2}{2}\right] \tag{integrand is the $N(\mu + \sigma^2 t, \sigma^2)$ density} \end{align*}Gamma:
\begin{align*} \psi(t) &= \int_0^\infty e^{tx} \frac{1}{\beta^\alpha \Gamma (\alpha )} x ^{\alpha - 1} e ^{-x / \beta} \, dx \\ &= \int_0^\infty \frac{1}{\beta^\alpha \Gamma (\alpha )} x ^{\alpha - 1} \exp \left[\frac{-x}{\beta / (1-\beta t) } \right] \, dx \\ &= \left(\frac{1}{1-\beta t} \right) ^ \alpha \int_0^\infty \frac{1}{\left[\frac{\beta}{1-\beta t}\right]^\alpha \Gamma (\alpha )} x ^{\alpha - 1} \exp \left[\frac{-x}{\beta / (1-\beta t) } \right] \, dx \\ \end{align*}If $t < 1 / \beta$, then $\beta / (1 - \beta t) > 0$ and the integrand is the Gamma($\alpha, \beta / (1 - \beta t)$) density, which integrates to 1, so $\psi(t) = \left(\frac{1}{1-\beta t} \right) ^ \alpha$. Otherwise the integral diverges and $\psi(t)$ is undefined.
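These closed forms can be spot-checked by Monte Carlo, estimating $E(e^{tX})$ from samples (a sketch with arbitrary parameters; for the Gamma we need $t < 1/\beta$).
n = 1000000
t = 0.3
lam, mu, sigma, alpha, beta = 2.0, 1.0, 0.5, 2.0, 1.5 # arbitrary parameters (t < 1/beta)
mgf = lambda x: np.exp(t * x).mean() # Monte Carlo estimate of E(e^{tX})
print(mgf(rng.poisson(lam, n)), np.exp(lam * (np.exp(t) - 1)))
print(mgf(rng.normal(mu, sigma, n)), np.exp(mu * t + sigma ** 2 * t ** 2 / 2))
print(mgf(rng.gamma(alpha, beta, n)), (1 - beta * t) ** -alpha)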
Let $X_1, \dots, X_n \sim \text{Exp}(\beta)$. Find the moment generating function of $X_i$. Prove that $\sum_{i=1}^n X_i \sim \text{Gamma}(n, \beta)$.
Solution:
We have
\begin{align*} \psi_{X_i}(t) &= \int_0^\infty e^{tx} \frac{1}{\beta} e^{- x / \beta} \, dx \tag{definition} \\ &= \int_0^\infty \frac{1}{\beta} e^{- x / (\beta / (1 - \beta t))} \, dx \\ &= \frac{1}{1-\beta t} \int_0^{\infty} \frac{1}{\beta / (1 - \beta t)} e^{- x / (\beta / (1 - \beta t))} \, dx \\ &= \frac{1}{1-\beta t} \tag{for $t < 1/\beta$; integrand is the pdf of Exp($\beta/(1-\beta t)$)} \end{align*}By Lemma 3.31, the MGF of $\sum_{i=1}^n X_i$ is $\prod_{i=1}^n \psi_{X_i}(t) = \left( \frac{1}{1-\beta t} \right)^n$, which is the MGF of the $\text{Gamma}(n, \beta)$ distribution. Since the MGF characterizes the distribution, we conclude $\sum_{i=1}^n X_i \sim \text{Gamma}(n, \beta)$.
Note that this result squares with the intuitive interpretation of the Exp($\beta$) and Gamma($n, \beta$) distributions: the former models the waiting time until the first occurrence of an event, and the latter the waiting time until the $n$th occurrence, when the mean time between events is $\beta$.
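A quick simulation makes the distributional claim concrete (a minimal sketch with arbitrary $n$ and $\beta$): histogram sums of $n$ Exponential($\beta$) draws and overlay the Gamma($n, \beta$) density from scipy.
from scipy.stats import gamma
n, beta = 5, 2.0 # number of summands and scale
k = 100000 # number of simulated sums
sums = rng.exponential(beta, size=(k, n)).sum(axis=1)
xs = np.linspace(0, sums.max(), 500)
plt.hist(sums, bins=100, density=True, alpha=0.5, label='Sum of exponentials')
plt.plot(xs, gamma.pdf(xs, n, scale=beta), label='Gamma(n, beta) pdf')
plt.legend()
plt.show()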