import numpy as np
from numpy.random import default_rng
import pandas as pd
rng = default_rng()
import plotly.graph_objects as go
import plotly.express as px
The sample space $\Omega$ is the set of possible outcomes of an experiment. Points $\omega$ in $\Omega$ are called sample outcomes, realizations, or elements. Subsets of $\Omega$ are called events.
Given an event $A$, let $A^c = \{ \omega \in \Omega : \omega \notin A \}$ denote the complement of $A$. Informally, $A^c$ can be read as "not $A$". The complement of $\Omega$ is the empty set $\emptyset$. The union of events $A$ and $B$ is defined
$$ A \cup B = \{ \omega \in \Omega : \omega \in A \text{ or } \omega \in B \}$$which can be thought of as "$A$ or $B$". If $A_1, A_2, \dots$ is a sequence of sets then
$$ \bigcup_{i=1}^{\infty} A_i = \left\{ \omega \in \Omega : \omega \in A_i \text{ for at least one $i$}\right\}.$$The intersection of $A$ and $B$ is
$$ A \cap B = \{ \omega \in \Omega : \omega \in A \text{ and } \omega \in B \}$$read "$A$ and $B$." Sometimes we write $A \cap B$ as $AB$ or $(A,B)$. If $A_1, A_2, \dots$ is a sequence of sets then
$$ \bigcap_{i=1}^{\infty} A_i = \left\{ \omega \in \Omega : \omega \in A_i \text{ for all $i$}\right\}.$$The set difference is defined by $A - B = \{\omega : \omega \in A, \omega \notin B\}$. If every element of $A$ is also contained in $B$ we write $A \subset B$ or, equivalently, $B \supset A$. If $A$ is a finite set, let $|A|$ denote the number of elements in $A$.
Notation | Summary of Terminology |
---|---|
$\Omega$ | sample space |
$\omega$ | outcome (point or element) |
$A$ | event (subset of $\Omega$) |
$A^c$ | complement of $A$ (not $A$) |
$A \cup B$ | union ($A$ or $B$) |
$A \cap B$ or $AB$ | intersection ($A$ and $B$) |
$A - B$ | set difference ($\omega$ in $A$ but not in $B$) |
$A \subset B$ | set inclusion |
$\emptyset$ | null event (always false) |
$\Omega$ | true event (always true) |
We say that $A_1, A_2, \dots$ are disjoint or are mutually exclusive if $A_i \cap A_j = \emptyset$ whenever $i \ne j$. For example, $A_1 = [0, 1), A_2 = [1, 2), A_3 = [2, 3), \dots$ are disjoint. A partition of $\Omega$ is a sequence of disjoint sets $A_1, A_2, \dots$ such that $\bigcup_{i=1}^\infty A_i = \Omega$. Given an event $A$, define the indicator function of A by
$$I_A(\omega) = I(\omega \in A) = \begin{cases} 1 & \text{if $\omega \in A$} \\ 0 & \text{if $\omega \notin A$.} \end{cases}$$A sequence of sets $A_1, A_2, \dots$ is monotone increasing if $A_1 \subset A_2 \subset \dots$ and we define $\lim_{n\to\infty}A_n = \bigcup_{i=1}^\infty A_i$. A sequence of sets $A_1, A_2, \dots$ is monotone decreasing if $A_1 \supset A_2 \supset \dots$ and we define $\lim_{n\to\infty}A_n = \bigcap_{i=1}^\infty A_i$. In either case, we will write $A_n \rightarrow A$.
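As a quick illustration (a minimal sketch using arbitrary example sets, not taken from the text), these set operations and the indicator function map directly onto Python's built-in set type:
# Hypothetical sample space and events, chosen only for illustration
Omega = set(range(1, 7))              # e.g., the faces of a die
A = {2, 4, 6}
B = {1, 2, 3, 4}
print(Omega - A)                      # complement of A within Omega: {1, 3, 5}
print(A | B)                          # union (A or B): {1, 2, 3, 4, 6}
print(A & B)                          # intersection (A and B): {2, 4}
print(A - B)                          # set difference: {6}
print(A <= Omega)                     # set inclusion: True
def indicator(omega, event):
    # Indicator function I_A(omega): 1 if omega is in the event, 0 otherwise
    return 1 if omega in event else 0
print(indicator(2, A), indicator(3, A))   # 1 0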
We will assign a real number $\mathbb{P}(A)$ to every event $A$, called the probability of $A$. We call $\mathbb{P}$ a probability distribution or probability measure. To qualify as a probability, $\mathbb{P}$ must satisfy three axioms:
A function $\mathbb{P}$ that assigns a real number $\mathbb{P}(A)$ to each event $A$ is a probability distribution or a probability measure if it satisfies the following three axioms:
Axiom 1: $\mathbb{P}(A) \ge 0$ for every event $A$.
Axiom 2: $\mathbb{P}(\Omega) = 1$.
Axiom 3: If $A_1, A_2, \dots$ are disjoint then $\mathbb{P}\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty \mathbb{P}(A_i)$.
Fill in the details of the proof of Theorem 1.8. Also, prove the monotone decreasing case.
Solution:
If $A_n \rightarrow A$ then
$$\mathbb{P}(A_n) \rightarrow \mathbb{P}(A)$$as $n \rightarrow \infty$.
Proof: Suppose that $A_n$ is monotone increasing so that $A_1 \subset A_2 \subset \dots$. Let $A = \lim_{n\to\infty} A_n = \bigcup_{i=1}^\infty A_i$. Define $B_1 = A_1$, $B_2 = \{ \omega \in \Omega : \omega \in A_2, \omega \notin A_1 \}$, $B_3 = \{ \omega \in \Omega : \omega \in A_3, \omega \notin A_2, \omega \notin A_1 \}, \dots$.
Claim: $B_1, B_2, \dots$ are disjoint. Proof: Let $i, j \in \mathbb{N}$ such that $i \ne j$. WLOG, let $i < j$. Suppose $\omega \in B_i$. Then, $\omega \in A_i$. By the definition of $B_j$, $\omega \notin B_j$. Suppose now that $\omega \in B_j$. By the definition of $B_j$, $\omega \in A_j$, and $\omega \notin A_k$ for all $k < j$. In particular, $\omega \notin A_i$. Observing that $B_i \subset A_i$, we see $\omega \notin B_i$.
Claim: $A_n = \bigcup_{i=1}^n A_i = \bigcup_{i=1}^n B_i$. Proof: Suppose $\omega \in A_n$. Since $\{A_i\}$ is monotone increasing, there exists a $k \in \{1, \dots, n \}$ such that $\omega \in A_k$ and $\omega \notin A_l$ for all $l \in \{1, \dots, k-1\}$. Thus, $\omega \in B_k \subset \bigcup_{i=1}^n B_i$. Suppose now that $\omega \in \bigcup_{i=1}^n B_i$. There exists a $k \in \{1, \dots, n\}$ such that $\omega \in B_k$. Since $B_k \subset A_k \subset A_n$, $\omega \in A_n$.
Claim: $\bigcup_{i=1}^\infty A_i = \bigcup_{i=1}^\infty B_i$. Proof: Similar to the finite case.
From Axiom 3:
$$\mathbb{P}(A_n) = \mathbb{P} \left( \bigcup_{i=1}^n B_i \right) = \sum_{i=1}^n \mathbb{P}(B_i)$$and hence, using Axiom 3 again,
$$\lim_{n\to\infty} \mathbb{P}(A_n) = \lim_{n\to\infty} \sum_{i=1}^n \mathbb{P}(B_i) = \sum_{i=1}^\infty \mathbb{P}(B_i) = \mathbb{P} \left( \bigcup_{i=1}^\infty B_i \right) = \mathbb{P}(A).$$To establish the monotone decreasing case, we observe that $\{A^c_n\}$ is a monotone increasing sequence whose union is $\left( \bigcap_{i=1}^\infty A_i \right)^c = A^c$ by De Morgan's law. Thus, by the increasing case, $\mathbb{P}(A_n^c) \rightarrow \mathbb{P}(A^c)$, and
$$\mathbb{P}(A_n) = 1 - \mathbb{P}(A_n^c) \rightarrow 1 - \mathbb{P}(A^c) = \mathbb{P}(A).$$Prove the statements in equation (1.1).
$\mathbb{P}(\emptyset) = 0$: Note that $\Omega = \Omega \cup \emptyset$, and $\Omega$ and $\emptyset$ are disjoint. By Axiom 3, $\mathbb{P}(\Omega) = \mathbb{P}(\Omega \cup \emptyset) = \mathbb{P}(\Omega) + \mathbb{P}(\emptyset)$. Thus, by Axiom 2, $\mathbb{P}(\emptyset) = 0$.
$A \subseteq B \Rightarrow \mathbb{P}(A) \le \mathbb{P}(B)$: We have $\mathbb{P}(A) \le \mathbb{P}(A) + \mathbb{P}(B - A) = \mathbb{P}(A \cup (B - A)) = \mathbb{P}(B)$, using Axiom 1 in the inequality, Axiom 3 in the first equality, and the fact that $A \subseteq B$ in the second equality.
$0 \le \mathbb{P}(A) \le 1$: The first inequality is Axiom 1. The second follows from the previous result and Axiom 2.
$\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$: We have $\mathbb{P}(A^c) = \mathbb{P}(A^c) + \mathbb{P}(A) - \mathbb{P}(A) = \mathbb{P}(A^c \cup A) - \mathbb{P}(A) = \mathbb{P}(\Omega) - \mathbb{P}(A) = 1 - \mathbb{P}(A)$, using Axiom 3 in the second equality and Axiom 2 in the fourth.
$A \cap B = \emptyset \Rightarrow \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B)$: Immediate from Axiom 3.
Let $\Omega$ be a sample space and let $A_1, A_2, \dots$ be events. Define $B_n = \bigcup_{i=n}^{\infty} A_i$ and $C_n = \bigcap_{i=n}^\infty A_i$.
(a) Show that $B_1 \supset B_2 \supset \dots$ and that $C_1 \subset C_2 \subset \dots$.
We have $B_{n} = B_{n+1} \cup A_n$, meaning $B_{n} \supset B_{n+1}$. Similarly, $C_{n} = C_{n+1} \cap A_n$, meaning $C_n \subset C_{n+1}$.
(b) Show that $\omega \in \bigcap_{n=1}^\infty B_n$ if and only if $\omega$ belongs to an infinite number of the events $A_1, A_2, \dots$.
$\subseteq$: Suppose $\omega \in \bigcap_{n=1}^\infty B_n$. Then, for every $n$, $\omega \in B_n = \bigcup_{i=n}^\infty A_i$, so there exists a $k \ge n$ such that $\omega \in A_k$. If $\omega$ belonged to only finitely many of the events $A_1, A_2, \dots$, there would be a largest index $N$ with $\omega \in A_N$; but then $\omega \notin B_{N+1}$, contradicting $\omega \in \bigcap_{n=1}^\infty B_n$. Hence $\omega$ belongs to infinitely many of the $A_i$.
$\supseteq$: Suppose $\omega$ belongs to an infinite number of the events $A_1, A_2, \dots$. Let $i_1 < i_2 < \dots$ be indices such that $\omega \in A_{i_k}$ for every $k$. Fix $n$ and choose $k$ such that $i_k \ge n$. Since $\omega \in A_{i_k}$, $\omega \in \bigcup_{i=n}^\infty A_i = B_n$. Since this holds for every $n$, $\omega \in \bigcap_{n=1}^\infty B_n$.
(c) Show that $\omega \in \bigcup_{n=1}^\infty C_n$ if and only if $\omega$ belongs to all the events $A_1, A_2, \dots$ except possibly a finite number of those events.
$\subseteq$: Suppose $\omega \in \bigcup_{n=1}^\infty C_n$. Then, there is a $k$ such that $\omega \in C_k = \bigcap_{i=k}^\infty A_i$. Thus, $\omega \in A_i$ for all $i \ge k$, i.e., all the events $A_1, A_2, \dots$ except for $\{A_1, \dots, A_{k-1}\}$.
$\supseteq$: Suppose $\omega$ belongs to all the events $A_1, A_2, \dots$ except possibly a finite number of those events. Then, there is a $k$ such that $\omega$ belongs to all the events $A_{k}, A_{k+1}, \dots$. That is, $\omega \in \bigcap_{i=k}^\infty A_i = C_k \subseteq \bigcup_{n=1}^\infty C_n$.
Let $\{ A_i : i \in I \}$ be a collection of events where $I$ is an arbitrary index set. Show that
$$ \left( \bigcup_{i \in I} A_i \right)^c = \bigcap_{i \in I} A_i^c \text{ and } \left( \bigcap_{i \in I} A_i \right)^c = \bigcup_{i \in I} A_i^c$$Hint: First prove this for $I = \{1, \dots, n \}$.
Proof: Suppose $I$ is finite. WLOG, $I = \{1, \dots, n \}$; otherwise, reindex the events accordingly.
We have
$$\omega \in \left( \bigcup_{i \in I} A_i \right)^c \iff \omega \notin \bigcup_{i \in I} A_i \iff \omega \notin A_i \, \forall i \in I \iff \omega \in A_i^c \, \forall i \in I \iff \omega \in \bigcap_{i \in I} A_i^c.$$Similarly,
$$\omega \in \left( \bigcap_{i \in I} A_i \right)^c \iff \omega \notin \bigcap_{i \in I} A_i \iff \exists \, k \in I \text{ s.t. } \omega \notin A_k \iff \exists \, k \in I \text{ s.t. } \omega \in A_k^c \iff \omega \in \bigcup_{i \in I} A_i^c.$$Note that neither chain of equivalences actually uses the finiteness of $I$: membership in a union or intersection is decided element by element, so the same argument applies verbatim to an arbitrary (possibly infinite) index set $I$.
Suppose we toss a fair coin until we get exactly two heads. Describe the sample space $S$. What is the probability that exactly $k$ tosses are required?
$S$ consists of all sequences of the form $T_1 \cdots T_p \, H \, T_{p+1} \cdots T_{p+q} \, H$ (a run of $p$ tails, a head, a run of $q$ tails, then the final head), where $p, q \in \{0, 1, 2, \dots\}$.
If $k$ tosses are required, then $k-2$ of the tosses were tails, and $p + q =k - 2$. The probability of a given sequence is $0.5^k$, and there are $k-1$ such sequences. Thus, the probability is $(k - 1) \cdot 0.5^k$.
# Simulate n runs of fair coin tosses and record, for each run, the toss on which
# the second head appears; compare the empirical frequencies to (k - 1) * 0.5^k.
n = 100000
k_max = 100  # enough tosses that the second head virtually always appears
flips = rng.integers(low=0, high=1, size=(n, k_max), endpoint=True)
# 1-based index of the first toss at which the cumulative number of heads reaches 2
X = np.argmax(np.cumsum(flips, axis=1) >= 2, axis=1) + 1
p_hat = np.bincount(X) / X.size                                   # empirical frequencies
p = [(k > 0) * (k - 1) * 0.5 ** k for k in range(p_hat.size)]     # predicted probabilities
df = pd.DataFrame(zip(p, p_hat), columns=['Predicted', 'Actual'])
px.bar(df, barmode='group')
Let $\Omega = \{0, 1, \dots\}$. Prove that there does not exist a uniform distribution on $\Omega$ (i.e., if $\mathbb{P}(A) = \mathbb{P}(B)$ whenever $|A| = |B|$, then $\mathbb{P}$ cannot satisfy the axioms of probability).
Suppose $|A| = |B| \Rightarrow \mathbb{P}(A) = \mathbb{P}(B)$. Consider the probability of the singleton event $\{0\}$. By Axiom 1, $\mathbb{P}(\{0\}) \ge 0$.
Case 1: $\mathbb{P}(\{0\}) = 0$. Then, by the hypothesis, $\mathbb{P}(\{k\}) = 0$ for all $k \in \{0, 1, \dots\}$. By Axiom 3, $\mathbb{P}(\Omega) = \sum_{k=0}^\infty \mathbb{P}(\{k\}) = 0$, violating Axiom 2.
Case 2: $\mathbb{P}(\{0\}) = \epsilon > 0$. Then, by the hypothesis, $\mathbb{P}(\{k\}) = \epsilon$ for all $k \in \{0, 1, \dots\}$. By Axiom 3, $\mathbb{P}(\Omega) = \sum_{k=0}^\infty \epsilon = \infty$, violating Axiom 2.
Let $A_1, A_2, \dots$ be events. Show that
$$\mathbb{P} \left( \bigcup_{n=1}^\infty A_n \right) \le \sum_{n=1}^\infty \mathbb{P}(A_n)$$Define $B_n = A_n - \bigcup_{i=1}^{n-1} A_i$. The proof that the $B_n$ are disjoint and that $\bigcup_{n=1}^\infty A_n = \bigcup_{n=1}^\infty B_n$ follows the same lines as the proof of Theorem 1.8 (Problem #1).
Thus, since $B_n \subseteq A_n$, $\mathbb{P}(B_n) \le \mathbb{P}(A_n)$, and
$$\mathbb{P} \left( \bigcup_{n=1}^\infty A_n \right) = \mathbb{P} \left( \bigcup_{n=1}^\infty B_n \right) = \sum_{n=1}^\infty \mathbb{P}(B_n) \le \sum_{n=1}^\infty \mathbb{P}(A_n).$$Suppose that $P(A_i) = 1$ for each $i$. Prove that
$$ P \left( \bigcap_{i=1}^\infty A_i \right) = 1.$$We have
$$P \left( \bigcap_{i=1}^\infty A_i \right) = 1 - P \left( \left( \bigcap_{i=1}^\infty A_i \right) ^ c \right)$$and, from Problem 4,
$$ \left( \bigcap_{i=1}^\infty A_i \right) ^ c = \bigcup_{i=1}^\infty A_i^c.$$From Problem 7,
$$ P(\bigcup_{i=1}^\infty A_i^c) \le \sum_{i=1}^\infty P(A_i^c) = \sum_{i=1}^\infty (1 - P(A_i)) = \sum_{i=1}^\infty 0 = 0,$$and, thus, since probabilities are nonnegative, $ P(\bigcup_{i=1}^\infty A_i^c) = 0$. Therefore, $$ P \left( \bigcap_{i=1}^\infty A_i \right) = 1.$$
For fixed $B$ such that $P(B) > 0$, show that $P(\cdot \mid B)$ satisfies the axioms of probability.
Axiom 1: Let $A$ be an event. Then, $P(AB) \ge 0$, and we have $P(A \mid B) = P(AB) / P(B) \ge 0 $.
Axiom 2: Since $B \subseteq \Omega$, $P(B \cap \Omega) = P(B)$, and we have $P(\Omega \mid B) = P(B \cap \Omega) / P(B) = P(B) / P(B) = 1$.
Axiom 3: Let $\{A_i\}$ be a disjoint sequence of sets.
$$P \left( \bigcup_{i=1}^\infty A_i \mid B \right) = \frac{P(B \cap \bigcup_{i=1}^\infty A_i)}{P(B)} = \frac{P(\bigcup_{i=1}^\infty (A_i \cap B))}{P(B)} = \frac{\sum_{i=1}^\infty P(A_i \cap B)}{P(B)} = \sum_{i=1}^\infty P (A_i \mid B),$$where the third equality uses Axiom 3 and the fact that the sets $A_i \cap B$ are disjoint. You have probably heard it before. Now you can solve it rigorously. It is called the "Monty Hall Problem." A prize is placed at random behind one of three doors. You pick a door. To be concrete, let's suppose you always pick door 1. Now Monty Hall chooses one of the other two doors, opens it and shows you that it is empty. He then gives you the opportunity to keep your door or switch to the other unopened door. Should you stay or switch? Intuition suggests it doesn't matter. The correct answer is that you should switch. Prove it. It will help to specify the sample space and the relevant events carefully. Thus write $\Omega = \{(\omega_1, \omega_2) : \omega_i \in \{ 1, 2, 3 \} \}$, where $\omega_1$ is where the prize is and $\omega_2$ is the door Monty opens.
Solution:
Enumerating the possible outcomes (conditioned on the fact that we choose the first door), their probabilities, and letting $R$ be the event we win a reward from switching:
Outcome | $\omega_1$ | $\omega_2$ | Probability | $R$ |
---|---|---|---|---|
1 | 1 | 2 | $P(\omega_2 = 2 \mid \omega_1 = 1) P(\omega_1 = 1) = \frac12 \cdot \frac13 = \frac16$ | 0 |
2 | 1 | 3 | $P(\omega_2 = 3 \mid \omega_1 = 1) P(\omega_1 = 1) = \frac12 \cdot \frac13 = \frac16$ | 0 |
3 | 2 | 3 | $P(\omega_2 = 3 \mid \omega_1 = 2) P(\omega_1 = 2) = 1 \cdot \frac13 = \frac13$ | 1 |
4 | 3 | 2 | $P(\omega_2 = 2 \mid \omega_1 = 3) P(\omega_1 = 3) = 1 \cdot \frac13 = \frac13$ | 1 |
Thus $P(R) = \frac23$.
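As a sanity check (a minimal simulation sketch, not part of the original solution), we can play the game many times with the player always picking door 1 and always switching; the empirical win rate should be close to $\frac23$:
# Simulate the Monty Hall game: the player always picks door 1 and always switches.
n = 100000
prizes = rng.integers(low=1, high=3, size=n, endpoint=True)   # omega_1: door hiding the prize
wins_by_switching = 0
for prize in prizes:
    if prize == 1:
        monty = rng.choice([2, 3])            # Monty opens either empty door at random
    else:
        monty = ({2, 3} - {prize}).pop()      # Monty must open the other empty door
    switch_to = ({2, 3} - {monty}).pop()      # the remaining unopened door
    wins_by_switching += (switch_to == prize)
wins_by_switching / n                         # should be close to 2/3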
Suppose that $A$ and $B$ are independent events. Show that $A^c$ and $B^c$ are independent events.
\begin{align*} P(A^c B^c) &= P( (A \cup B) ^ c) \tag{De Morgan}\\ &= 1 - P(A \cup B) \\ &= 1 - [P(A) + P(B) - P(AB)] \tag{Lemma 1.6}\\ &= 1 - [P(A) + P(B) - P(A)P(B)] \tag{Hypothesis}\\ &= 1 - [1 - P(A^c) + 1 - P(B^c) - (1 - P(A^c))(1 - P(B^c))]\\ &= P(A^c)P(B^c) \end{align*}There are three cards. The first is green on both sides, the second is red on both sides and the third is green on one side and red on the other. We choose a card at random and we see one side (also chosen at random). If the side we see is green, what is the probability that the other side is also green? Many people intuitively answer 1/2. Show that the correct answer is 2/3.
Let $C_1, C_2$ and $C_3$ be the events that we choose the first, second, and third card, respectively. Let $A$ be the event that the side we see is green. Let $B$ be the event that the second side is green.
We have $P(A \mid C_1) = 1$, $P(A \mid C_2) = 0$, and $P(A \mid C_3) = \frac12$, with $P(C_i) = \frac13$ for each $i$. By the law of total probability, $P(A) = P(A \mid C_1) P(C_1) + P(A \mid C_2) P(C_2) + P(A \mid C_3) P(C_3) = 1 \cdot \frac13 + 0 \cdot \frac13 + \frac12 \cdot \frac13 = \frac12$.
From Bayes' Theorem,
$$P(C_1 \mid A) = \frac{P(A \mid C_1) P(C_1)}{P(A)} = \frac23$$$$P(C_2 \mid A) = \frac{P(A \mid C_2) P(C_2)}{P(A)} = 0$$$$P(C_3 \mid A) = \frac{P(A \mid C_3) P(C_3)}{P(A)} = \frac13$$Now, $P(B \mid A) = \sum_{i=1}^3 P(B \cap C_i \mid A) = \sum_{i=1}^3 P(B \mid C_i \cap A)P(C_i \mid A) = 1 \cdot \frac23 + 0 \cdot 0 + 0 \cdot \frac13 = \frac23$.
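A minimal simulation sketch (the card encoding below is my own, not from the text): draw a card and a visible side uniformly at random, and among the draws where the visible side is green, count how often the hidden side is also green.
# Cards encoded as (side 0 color, side 1 color); 'G' = green, 'R' = red.
cards = [('G', 'G'), ('R', 'R'), ('G', 'R')]
n = 100000
card_idx = rng.integers(low=0, high=2, size=n, endpoint=True)     # which card is drawn
side_idx = rng.integers(low=0, high=1, size=n, endpoint=True)     # which side we see
seen = np.array([cards[c][s] for c, s in zip(card_idx, side_idx)])
hidden = np.array([cards[c][1 - s] for c, s in zip(card_idx, side_idx)])
np.mean(hidden[seen == 'G'] == 'G')                               # should be close to 2/3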
Suppose that a fair coin is tossed repeatedly until both a head and tail have appeared at least once.
(a) Describe the sample space $\Omega$.
There are infinitely many outcomes. Each outcome is a finite sequence of the form $H \dots HT$ or $T \dots TH$.
(b) What is the probability that three tosses will be required?
Let $A$ be the event that at least three tosses are required, i.e., that the first two tosses are identical. We have
$$\mathbb{P}(A) = \mathbb{P}(HH \cup TT) = \mathbb{P}(HH) + \mathbb{P}(TT) = \frac14 + \frac14 = \frac12.$$Let $B$ be the event that at least four tosses are required, i.e., that the first three tosses are identical. Note $B \subset A$. We similarly have:
$$\mathbb{P}(B) = \mathbb{P}(HHH \cup TTT) = \mathbb{P}(HHH) + \mathbb{P}(TTT) = \frac18 + \frac18 = \frac14.$$Let $C$ be the event that exactly three tosses are required. We have $C = A - B$, and since $B \subset A$, $\mathbb{P}(C) = \mathbb{P}(A) - \mathbb{P}(B) = \frac{1}{4}$.
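As a quick check (a simulation sketch, not part of the original solution), we can estimate the probability that exactly three tosses are needed:
# Simulate fair coin tosses and find the toss on which both faces have appeared,
# i.e., the first toss that differs from the first toss of the sequence.
n = 100000
flips = rng.integers(low=0, high=1, size=(n, 20), endpoint=True)   # 20 tosses is plenty
both_seen_at = np.argmax(flips != flips[:, [0]], axis=1) + 1       # 1-based toss index
np.mean(both_seen_at == 3)                                         # should be close to 1/4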
Show that if $\mathbb{P}(A) = 0$ or $\mathbb{P}(A) = 1$ then $A$ is independent of every other event. Show that if $A$ is independent of itself then $\mathbb{P}(A)$ is either 0 or 1.
Suppose $\mathbb{P}(A) = 0$. Let $B$ be another event. Since $A \cap B \subseteq A$, $\mathbb{P}(A \cap B) \le \mathbb{P}(A) = 0$. Meanwhile $\mathbb{P}(A \cap B) \ge 0$ by Axiom 1, so $\mathbb{P}(A \cap B) = 0$. Meanwhile $\mathbb{P}(A) \mathbb{P}(B) = 0 \cdot \mathbb{P}(B) = 0$. Thus, $\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B)$.
Now suppose $\mathbb{P}(A) = 1$. Then, $\mathbb{P}(A)\mathbb{P}(B) = \mathbb{P}(B)$. We have $\mathbb{P}(A^c) = 1 - \mathbb{P}(A) = 0$, and, because $B \cap A^c \subseteq A^c$, $\mathbb{P}(B \cap A^c) = 0$. Therefore, $\mathbb{P}(B) = \mathbb{P}(B \cap A) + \mathbb{P}(B \cap A^c) = \mathbb{P}(B \cap A)$. Thus, $\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B)$.
Suppose $A$ is independent of itself. By definition, $\mathbb{P}(A) = \mathbb{P}(A)^2$, which holds only if $\mathbb{P}(A) \in \{0,1\}$.
The probability that a child has blue eyes is 1 / 4. Assume independence between children. Consider a family with 3 children.
(a) If it is known that at least one child has blue eyes, what is the probability that at least two children have blue eyes?
Note that there are $2^3 = 8$ outcomes for child eye colors (let 1 denote blue and 0 denote not blue):
# | Outcome | Probability |
---|---|---|
1 | 000 | 27 / 64 |
2 | 100 | 9 / 64 |
3 | 010 | 9 / 64 |
4 | 001 | 9 / 64 |
5 | 110 | 3 / 64 |
6 | 101 | 3 / 64 |
7 | 011 | 3 / 64 |
8 | 111 | 1 / 64 |
The first outcome is ruled out, and in the last four, at least two children have blue eyes. The probability is the sum of the last four probabilities divided by the sum of the last seven, $\frac{10/64}{37/64} = 10 / 37 \approx 27.0\%$.
(b) If it is known that the youngest child has blue eyes, what is the probability that at least two children have blue eyes?
Letting the youngest correspond to the rightmost digit in the outcome, we see there are four scenarios (4, 6, 7, and 8). The probability is the probability of outcomes 6, 7, or 8 divided by the probability of outcomes 4, 6, 7, or 8: $\frac{7/64}{16/64} = 7 / 16 = 43.75\%$.
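As a numerical check of both answers (an exact enumeration mirroring the table above, not part of the original solution):
from itertools import product
# Enumerate all 2^3 outcomes; 1 = blue eyes (prob 1/4), 0 = not blue (prob 3/4).
prob = {o: (1 / 4) ** sum(o) * (3 / 4) ** (3 - sum(o)) for o in product([0, 1], repeat=3)}
at_least_one = sum(p for o, p in prob.items() if sum(o) >= 1)
at_least_two = sum(p for o, p in prob.items() if sum(o) >= 2)
youngest_blue = sum(p for o, p in prob.items() if o[-1] == 1)
two_and_youngest = sum(p for o, p in prob.items() if sum(o) >= 2 and o[-1] == 1)
print(at_least_two / at_least_one)        # 10/37, about 0.270
print(two_and_youngest / youngest_blue)   # 7/16 = 0.4375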
Prove Lemma 1.14.
1.14 Lemma. If $A$ and $B$ are independent events then $P(A \mid B) = P(A)$. Also, for any pair of events $A$ and $B$,
$$P(AB) = P(A \mid B)P(B) = P(B \mid A)P(A)$$Solution:
For first statement: \begin{align*} P(A \mid B) &= \frac{P(AB)}{P(B)} \tag{Definition of Conditional Probability} \\ &= \frac{P(A)P(B)}{P(B)} \tag{independence of $A$ and $B$} \\ &= P(A) \end{align*}
The second statement is immediate from the definition of conditional probability.
Show that
$$P(ABC) = P(A \mid BC) P(B \mid C) P(C)$$This is called the chain rule of probability.
Since
$$P(A \mid BC) = \frac{P(ABC)}{P(BC)} \Rightarrow P(ABC) = P(A \mid BC) P(BC)$$and
$$P(B \mid C) = \frac{P(BC)}{P(C)} \Rightarrow P(BC) = P(B \mid C)P(C),$$the result follows.
Suppose $k$ events $A_1, \dots, A_k$ form a partition of the sample space $\Omega$, i.e., they are disjoint and $\bigcup_{i=1}^k A_i = \Omega$. Assume that $\mathbb{P}(B) > 0$. Prove that if $\mathbb{P}(A_1 \mid B) < \mathbb{P}(A_1)$ then $\mathbb{P}(A_i \mid B) > \mathbb{P}(A_i)$ for some $i = 2, \dots, k$.
Suppose, for a contradiction, $P(A_i \mid B) \le P(A_i)$ for all $i$.
From Bayes' Theorem:
$$P(B \mid A_i) = \frac{P(A_i \mid B)P(B)}{P(A_i)},$$therefore,
\begin{align*} P(B) & = \sum_{i} P(B \mid A_i) P(A_i) \tag{Law of total probability}\\ &= \sum_{i} P(A_i \mid B) P(B) \tag{Bayes Theorem}\\ & \Rightarrow \sum_{i} P(A_i \mid B) = 1. \end{align*}However,
\begin{align*} \sum_i P(A_i \mid B) &= P(A_1 \mid B) + \sum_{i > 1} P(A_i \mid B) \\ &< P(A_1) + \sum_{i > 1} P(A_i \mid B) \tag{Hypothesis}\\ &\le P(A_1) + \sum_{i > 1} P(A_i) = 1, \tag{Assumption for contradiction} \end{align*}a contradiction.
Suppose that 30 percent of computer owners use a Macintosh, 50 percent use Windows, and 20 percent use Linux. Suppose that 65 percent of the Mac users have succumbed to a computer virus, 82 percent of the Windows users get the virus, and 50 percent of the Linux users get the virus. We select a person at random and learn that her system was infected with the virus. What is the probability that she is a Windows user?
Let $M$, $W$, and $L$ denote the event that a given person is a Macintosh, Windows, or Linux user, respectively. Let $V$ denote the event of the person's computer having a virus. We seek $\mathbb{P}(W \mid V)$. From Bayes' Theorem:
$$\mathbb{P}(W \mid V) = \frac{\mathbb{P}(V \mid W)\mathbb{P}(W)}{\mathbb{P}(V \mid M)\mathbb{P}(M) + \mathbb{P}(V \mid W)\mathbb{P}(W) + \mathbb{P}(V \mid L)\mathbb{P}(L)}$$Thus
$$\mathbb{P}(W \mid V) = \frac{0.82 \cdot 0.5}{0.65 \cdot 0.3 + 0.82 \cdot 0.5 + 0.5 \cdot 0.2} = \frac{82}{141} \approx 58.16\%$$A box contains $5$ coins and each has a different probability of showing heads. Let $p_1, \dots, p_5$ denote the probability of heads on each coin. Suppose that
$$p_1 = 0, p_2 = 1 / 4, p_3 = 1 / 2, p_4 = 3 / 4, \text{ and } p_5 = 1.$$Let $H$ denote "heads is obtained" and let $C_i$ denote the event that coin $i$ is selected.
(a) Select a coin at random and toss it. Suppose a head is obtained. What is the posterior probability that coin $i$ was selected ($i = 1, \dots, 5$)? In other words, find $\mathbb{P}(C_i \mid H)$ for $i = 1, \dots, 5$.
From Bayes theorem,
$$\mathbb{P}(C_i \mid H) = \frac{\mathbb{P}(H \mid C_i)P(C_i)}{\mathbb{P}(H)} = \frac{\mathbb{P}(H \mid C_i)}{5\mathbb{P}(H)}$$and
$$\mathbb{P}(H) = \sum_{i=1}^5\mathbb{P}(H \mid C_i)P(C_i) = \frac15 \sum_{i=1}^5\mathbb{P}(H \mid C_i) = \frac15\sum_{i=1}^5 p_i = \frac12$$So $$\mathbb{P}(C_i \mid H) = \begin{cases} 0 & i = 1 \\ \frac{1}{10} & i = 2 \\ \frac{2}{10} & i = 3 \\ \frac{3}{10} & i = 4 \\ \frac{4}{10} & i = 5 \\ \end{cases}$$
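A short sketch reproducing these posteriors with exact arithmetic (mirroring the Bayes calculation above; not part of the original solution):
from fractions import Fraction
# Uniform prior 1/5 on each coin; posterior after one observed head is (p_i * 1/5) / P(H).
ps = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
p_H = sum(ps) / 5                                 # P(H) = 1/2
[p_i * Fraction(1, 5) / p_H for p_i in ps]        # 0, 1/10, 1/5, 3/10, 2/5 (i.e., 0, 1/10, 2/10, 3/10, 4/10)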
(b) Toss the coin again. What is the probability of another head? In other words find $\mathbb{P}(H_2 \mid H_1)$, where $H_j = $ "heads on toss $j$".
We have:
$$\mathbb{P}(H_2 \mid H_1) = \frac{\mathbb{P}(H_2 \cap H_1)}{\mathbb{P}(H_1)} = \frac{\sum_{i=1}^5\mathbb{P}(H_2 \cap H_1 \mid C_i)\mathbb{P}(C_i)}{\sum_{i=1}^5\mathbb{P}(H_1 \mid C_i)\mathbb{P}(C_i)} = \frac{\sum_{i=1}^5p_i^2 \frac15}{\sum_{i=1}^5 p_i \frac15} = \frac34$$Now suppose that the experiment was carried out as follows: We select a coin at random and toss it until a head is obtained.
(c) Find $\mathbb{P}(C_i \mid B_4)$, where $B_4 = $ "first head is obtained on toss 4".
From Bayes theorem,
$$\mathbb{P}(C_i \mid B_4) = \frac{\mathbb{P}(B_4 \mid C_i)P(C_i)}{\mathbb{P}(B_4)} = \frac{\mathbb{P}(B_4 \mid C_i)}{5\mathbb{P}(B_4)}$$We have
$$\mathbb{P}(B_4) = \sum_{i=1}^5 \mathbb{P}(B_4 \mid C_i) \mathbb{P}(C_i) = \frac15 \sum_{i=1}^5 (1-p_i)^3p_i = 23 / 640$$So $$\mathbb{P}(C_i \mid B_4) = \begin{cases} 0 & i = 1 \\ \frac{27}{46} & i = 2 \\ \frac{16}{46} & i = 3 \\ \frac{3}{46} & i = 4 \\ 0 & i = 5 \\ \end{cases}$$
# Simulate part (b): pick a coin at random, toss it once, and among the runs
# whose first toss is heads, estimate the probability of heads on the second toss.
n = 1000000
probs = rng.choice([0, .25, 0.5, 0.75, 1], size=n)                 # heads probability of the chosen coin
successes = np.argwhere(rng.random(size=n) <= probs).flatten()     # runs whose first toss is heads
np.sum(rng.random(size=successes.size) <= probs[successes]) / successes.size
0.7498635215635929
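Similarly, a simulation sketch for part (c) (my own check, not part of the original solution): pick a coin at random, toss it until the first head, and look at which coin was drawn among the runs whose first head landed on toss 4.
# Simulate part (c); cap each run at 30 tosses (the p = 0 coin never produces a head).
n = 200000
coin = rng.integers(low=0, high=4, size=n, endpoint=True)     # coin index 0..4
p = np.array([0, 0.25, 0.5, 0.75, 1.0])[coin]                 # that coin's heads probability
flips = rng.random(size=(n, 30)) < p[:, None]                 # True = heads
first_head = np.argmax(flips, axis=1) + 1                     # 1-based toss of the first head
b4 = flips.any(axis=1) & (first_head == 4)                    # event B_4
np.bincount(coin[b4], minlength=5) / b4.sum()                 # compare to [0, 27/46, 16/46, 3/46, 0]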
(Computer Experiment.) Suppose a coin has probability $p$ of falling heads up. If we flip the coin many times, we would expect the proportion of heads to be near $p$. We will make this formal later. Take $p = .3$ and $n=1000$ and simulate $n$ coin flips. Plot the proportion of heads as a function of $n$. Repeat for $p = .03$.
def plot_LLN(fig, p, n):
    # Simulate n tosses of a coin with P(heads) = p and plot the running
    # proportion of heads as a function of the number of tosses.
    head_counts = np.cumsum(rng.random(n) < p)
    totals = np.arange(1, n + 1)
    proportions = head_counts / totals
    fig.add_trace(go.Scatter(x=totals, y=proportions, name=f'p={p}'))

fig = go.Figure()
plot_LLN(fig, p=0.3, n=1000)
plot_LLN(fig, p=0.03, n=1000)
fig.update_layout(
    title="Law of Large Numbers Example",
    xaxis_title="Number of Flips",
    yaxis_title="Proportion of Heads",
    font=dict(family="Courier New, monospace", size=18, color="RebeccaPurple"),
)
fig.show()
(Computer Experiment.) Suppose we flip a coin $n$ times and let $p$ denote the probability of heads. Let $X$ be the number of heads. We call $X$ a binomial random variable, which is discussed in the next chapter. Intuition suggests that $X$ will be close to $np$. To see if this is true, we can repeat this experiment many times and average the $X$ values. Carry out a simulation and compare the average of the $X$'s to $np$. Try this for $p=.3$ and $n=10$, $n=100$, and $n=1000$.
# For each n, run many simulations of n coin flips with P(heads) = p and
# compare the average number of heads to np.
p = 0.3
n_simulations = 100000
for n in [10, 100, 1000]:
    Xs = np.sum(rng.random((n_simulations, n)) < p, axis=1)   # heads count in each simulation
    print(f'Avg(X): {np.mean(Xs):>6.2f} np: {n * p:>5.0f}')
Avg(X): 3.00 np: 3 Avg(X): 30.01 np: 30 Avg(X): 300.01 np: 300
(Computer Experiment.) Here we will get some experience simulating conditional probabilities. Consider tossing a fair die. Let $A = \{ 2, 4, 6 \}$ and $B = \{ 1, 2, 3, 4 \}$. Then, $\mathbb{P}(A) = 1 / 2$, $\mathbb{P}(B) = 2 / 3$ and $\mathbb{P}(AB) = 1 / 3$. Since $\mathbb{P}(AB) = \mathbb{P}(A)\mathbb{P}(B)$, the events $A$ and $B$ are independent. Simulate draws from the sample space and verify that $\hat{\mathbb{P}}(AB) = \hat{\mathbb{P}}(A)\hat{\mathbb{P}}(B)$, where $\hat{\mathbb{P}}(A)$ is the proportion of times $A$ occurred in the simulation and similarly for $\hat{\mathbb{P}}(AB)$ and $\hat{\mathbb{P}}(B)$. Now find two events $A$ and $B$ that are not independent. Compute $\hat{\mathbb{P}}(A)$, $\hat{\mathbb{P}}(B)$, and $\hat{\mathbb{P}}(AB)$. Compare the calculated values to their theoretical values. Report your results and interpret.
def check_independence(A, B):
    # Estimate P(A), P(B), and P(AB) from simulated die rolls and compare
    # P(A)P(B) with P(AB); a small discrepancy suggests (empirical) independence.
    n_rolls = 10000000
    AB = [i for i in A if i in B]                                   # the intersection of A and B
    rolls = rng.integers(low=1, high=6, size=n_rolls, endpoint=True)
    prob_a = np.sum(np.isin(rolls, A)) / n_rolls
    prob_b = np.sum(np.isin(rolls, B)) / n_rolls
    prob_ab = np.sum(np.isin(rolls, AB)) / n_rolls
    for name, prob in zip(['P(A)', 'P(B)', 'P(AB)'], [prob_a, prob_b, prob_ab]):
        print(f"{name} : {prob:.5f}")
    error = np.abs(prob_a * prob_b - prob_ab)
    print(f"P(A)P(B) - P(AB) : {error:.5f}")
    print("Independent" if error < 1e-3 else "Not independent")
A = [2, 4, 6]
B = [1, 2, 3, 4]
check_independence(A, B)
P(A) : 0.50002 P(B) : 0.66636 P(AB) : 0.33322 P(A)P(B) - P(AB) : 0.00003 Independent
Let $A = \{ 1 \}$ and $B = \{ 1, 2 \}$. Then, $\mathbb{P}(A) = 1 / 6$, $\mathbb{P}(B) = 1 / 3$, and $\mathbb{P}(A) \mathbb{P}(B) = 1 / 18 \ne 1 / 6 = \mathbb{P}(AB)$.
A = [1]
B = [1, 2]
check_independence(A, B)
P(A) : 0.16655 P(B) : 0.33346 P(AB) : 0.16655 P(A)P(B) - P(AB) : 0.11101 Not independent