# Mathematics

CHAPTER 2. DESCRIPTIVE AND GRAPHICAL STATISTICS 25

[Figure: scatterplot of ys against xs.]

Some sample scatterplots of variables with different population correlations are shown below.

[Figure: four sample scatterplots, with panel titles cor(x,y)=0, cor(x,y)=0.3, cor(x,y)=−0.5, and cor(x,y)=0.9.]

2.3.4 Exercises

1. With the Air Pollution Filter Noise data, construct side by side boxplots of the variable NOISE for the different levels of the factor SIZE. Comment. Do the same for NOISE and TYPE.

2. With the Payroll data, construct side by side boxplots of "employees" versus "industry" and "payroll" versus "industry". Are these boxplots as informative as the color coded scatterplot in Section 2.3.2?

3. If you are using RStudio, click on the "Packages" tab, then the checkbox next to the library MASS. Click on the word MASS and then the data set "mammals" and read about it. If you are using R alone, in the Console window at the prompt > type

> data(mammals,package="MASS")

View the data with

> mammals

Make a scatterplot with the following commands and comment on the result.

> attach(mammals)

> plot(body,brain)

Also make a scatterplot of the log transformed body and brain weights.

> plot(log(body),log(brain))

A recently discovered hominid species, Homo floresiensis, had an estimated average body weight of 25 kg. Based on the scatterplots, what would you guess its brain weight to be?

4. Let x and y be jointly distributed numeric variables and let z = a + by, where a and b are constants. Show that cov(x, z) = b ∗ cov(x, y). Show that if b > 0, cor(x, z) = cor(x, y). What happens if b < 0?

Chapter 3

Probability

3.1 Basic Definitions. Equally Likely Outcomes

Let a random experiment with sample space Ω be given. Recall from Chapter 1 that Ω is the set of all possible outcomes of the experiment. An event is a subset of Ω. A probability measure is a function which assigns numbers between 0 and 1 to events. If the sample space Ω, the collection of events, and the probability measure are all specified, they constitute a probability model of the random experiment.

The simplest probability models have a finite sample space Ω. The collection of events is the collection of all subsets of Ω and the probability of an event is simply the proportion of all possible outcomes that correspond to that event. In such models, we say that the experiment has equally likely outcomes. If the sample space has N elements, then each elementary event {ω} consisting of a single outcome has probability 1/N. If E is a subset of Ω, then

$$\Pr(E) = \frac{\#(E)}{N}.$$

Here we introduce some notation that will be used throughout this text. The probability measure for a random experiment is most often denoted by the abbreviation Pr, sometimes with subscripts. Events will be denoted by upper case Latin letters near the beginning of the alphabet. The expression #(E) denotes the number of elements of the subset E.
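The formula Pr(E) = #(E)/N can be checked directly in R. The following is a minimal sketch, not taken from the text: it assumes a hypothetical experiment of rolling one fair six-sided die, and the names `omega`, `E` and `pr_E` are introduced here only for illustration.

```r
# Hypothetical example: one roll of a fair six-sided die.
omega <- 1:6                      # sample space, N = 6 equally likely outcomes
E <- omega[omega %% 2 == 0]       # event "the roll is even" = {2, 4, 6}

# Pr(E) = #(E) / N under equally likely outcomes
pr_E <- length(E) / length(omega)
pr_E
```

Here `length()` plays the role of #( ), so the same two lines work for any event written as a subset of the sample space.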

Example 3.1. The Payroll data consists of 50 observations of 3 variables, "payroll", "employees" and "industry". Suppose that a random experiment is to choose one record from the Payroll data and suppose that the experiment has equally likely outcomes. Then, as the summary below shows, the probability that industry A is selected is

$$\Pr(\text{industry} = A) = \frac{27}{50} = 0.54.$$

> Payroll=read.table("Payroll.txt",header=TRUE)

> summary(Payroll)

    payroll        employees      industry
 Min.   :129.1   Min.   : 26.00   A:27
 1st Qu.:167.8   1st Qu.: 71.25   B:23
 Median :216.1   Median :108.50
 Mean   :228.2   Mean   :106.42
 3rd Qu.:287.8   3rd Qu.:143.25
 Max.   :354.8   Max.   :172.00

In this example we use another common and convenient notational convention. The event whose probability we want is described in quasi-natural language as "industry=A" rather than with the formal but too cumbersome {ω ∈ Payroll|industry(ω) = A}. The description "industry=A" refers to the set of all possible outcomes of the experiment for which the variable "industry" has the value "A". This sort of informal description of an event will be used again and again.

The assumption of equally likely outcomes is an assumption about the selection procedure for obtaining one record from the data. It is conceivable that a selection method is employed for which this assumption is not valid. If so, we should be able to discover that it is invalid by replicating the experiment sufficiently many times. This is a basic principle of classical statistical inference. It relies on a famous result of mathematical probability theory called the law of large numbers. One version of it is loosely stated as follows:

Law of Large Numbers: Let E be an event associated with a random experiment and let Pr be the probability measure of a true probability model of the experiment. Suppose the experiment is replicated n times and let P̂r(E) = (1/n) × #(replications in which E occurs). Then P̂r(E) → Pr(E) as n → ∞.

P̂r(E) is called the empirical probability of E.
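A short simulation can illustrate this convergence. The sketch below is an illustration only: it assumes a hypothetical experiment (rolling a fair die, with E = "the roll is a six"), an arbitrary seed, and variable names chosen here.

```r
# Simulation sketch: empirical probability of E = "the roll is a six".
set.seed(1)                         # arbitrary seed, only for reproducibility
n <- 100000                         # number of replications
rolls <- sample(1:6, n, replace = TRUE)

# Running empirical probability after 1, 2, ..., n replications
p_hat <- cumsum(rolls == 6) / seq_len(n)

p_hat[c(10, 100, 1000, n)]          # later values should settle near 1/6
```

Plotting `p_hat` against the replication number (for example with `plot(p_hat, type="l")`) shows the early fluctuations dying out as n grows.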

3.2 Combinations of Events

Events are related to other events by familiar set operations. Let E1, E2, . . . be a finite or infinite sequence of events. The union of E1 and E2 is the event

E1 ∪ E2 = {ω ∈ Ω|ω ∈ E1 or ω ∈ E2}.

More generally,

$$\bigcup_i E_i = E_1 \cup E_2 \cup \cdots = \{\omega \in \Omega \mid \omega \in E_i \text{ for some } i\}.$$

The intersection of E1 and E2 is the event

E1 ∩ E2 = {ω ∈ Ω|ω ∈ E1 and ω ∈ E2},

and, in general,

$$\bigcap_i E_i = E_1 \cap E_2 \cap \cdots = \{\omega \in \Omega \mid \omega \in E_i \text{ for all } i\}.$$

Sometimes we omit the intersection symbol ∩ and simply conjoin the symbols for the events in an intersection. In other words,

E1E2 . . . En = E1 ∩ E2 ∩ . . . ∩ En.

The complement of the event E is the event

∼E = {ω ∈ Ω|ω ∉ E}.

∼E occurs if and only if E does not occur. The event E1∼E2 occurs if and only if E1 occurs and E2 does not occur.

Finally, the entire sample space Ω is an event with complement φ, the empty event. The empty event never occurs. We need the empty event because it is possible to formulate a perfectly sensible description of an event which happens never to be satisfied. For example, if Ω = Payroll the event "employees < 25" is never satisfied, so it is the empty event.

We also have the subset relation between events. E1 ⊆ E2 means that if E1 occurs, then E2 occurs, or in more familiar language, E1 is a subset of E2. For any event E, it is true that φ ⊆ E ⊆ Ω. E2 ⊇ E1 means the same as E1 ⊆ E2.
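For a finite sample space these operations correspond to base R set functions. The sketch below assumes a small hypothetical sample space (one roll of a six-sided die) and two events invented for illustration.

```r
# Hypothetical sample space and events (one roll of a six-sided die)
omega <- 1:6
E1 <- c(2, 4, 6)                  # "the roll is even"
E2 <- c(4, 5, 6)                  # "the roll is greater than 3"

union(E1, E2)                     # E1 ∪ E2
intersect(E1, E2)                 # E1 ∩ E2, also written E1E2
setdiff(omega, E1)                # the complement ∼E1
setdiff(E1, E2)                   # E1∼E2: E1 occurs and E2 does not
all(E1 %in% omega)                # checks the subset relation E1 ⊆ Ω
```

When the outcomes are rows of a data frame, as with the Payroll data, `subset()` with a logical condition does the same job as these vector operations.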

3.2.1 Exercises

1. A random experiment consists of throwing a pair of dice, say a red die and a green die, simultaneously. They are standard 6-sided dice with one to six dots on different faces. Describe the sample space.

2. For the same experiment, let E be the event that the sum of the numbers of spots on the two dice is an odd number. Write E as a subset of the sample space, i.e., list the outcomes in E.

3. List the outcomes in the event F = "the sum of the spots is a multiple of 3".

4. Find ∼F, E ∪ F, EF = E ∩ F, and E∼F.

5. Assume that the outcomes of this experiment are equally likely. Find the probability of each of the events in Exercise 4.

6. Show that for any events E1 and E2, if E1 ⊆ E2 then ∼E2 ⊆ ∼E1.

7. Load the "mammals" data set into your R workspace. In RStudio you can click on the "Packages" tab and then on the checkbox next to MASS. Without RStudio, type

> data(mammals,package="MASS")

Attach the mammals data frame to your R search path with

> attach(mammals)

A random experiment is to choose one of the species listed in this data set. All outcomes are equally likely. You can obtain a list of the species in the event "body > 200" with the command

> subset(mammals,body>200)

What is the probability of this event, i.e., what is the probability that you randomly select a species with a body weight greater than 200 kg?

8. What are the species in the event that the ratio of brain weight to body weight is greater than 0.02? Remember that brain weight is recorded in grams and body weight in kilograms, so body weight must be multiplied by 1000 to make the two weights comparable. What is the probability of that event?

3.3 Rules for Probability Measures

The assumption of equally likely outcomes is the starting point for the construction of many probability models. There are many random experiments for which this assumption is wrong. No matter what other considerations are involved in choosing a probability measure for a model of a random experiment, there are certain rules that it must satisfy. They are:

1. 0 ≤ Pr(E) ≤ 1 for each event E.

2. Pr(Ω) = 1.

3. If E1, E2, . . . is a finite or infinite sequence of events such that EiEj = φ for i ≠ j, then $\Pr\left(\bigcup_i E_i\right) = \sum_i \Pr(E_i)$. If EiEj = φ for all i ≠ j we say that the events E1, E2, . . . are pairwise disjoint.

These are the basic rules. There are other properties that may be derived from them as theorems.

4. Pr(E∼F) = Pr(E) − Pr(EF) for all events E and F. In particular, Pr(∼E) = 1 − Pr(E).

5. Pr(φ) = 0.

6. Pr(E ∪ F) = Pr(E) + Pr(F) − Pr(EF) for all events E and F.

7. If E ⊆ F, then Pr(E) ≤ Pr(F).

8. If E1 ⊆ E2 ⊆ . . . is an infinite sequence, then $\Pr\left(\bigcup_i E_i\right) = \lim_{i\to\infty} \Pr(E_i)$.

9. If E1 ⊇ E2 ⊇ . . . is an infinite sequence, then $\Pr\left(\bigcap_i E_i\right) = \lim_{i\to\infty} \Pr(E_i)$.
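To illustrate how the derived properties follow from the basic rules, here is one possible derivation of rules 4 and 6 from rules 2 and 3. It is a standard argument offered as a sketch, not necessarily the route the author has in mind. It uses only the facts that EF and E∼F are disjoint with union E, and that F and E∼F are disjoint with union E ∪ F.

```latex
\begin{aligned}
\Pr(E) &= \Pr(EF) + \Pr(E{\sim}F)
  && \text{(rule 3, since } E = EF \cup E{\sim}F\text{)} \\
\Pr(E{\sim}F) &= \Pr(E) - \Pr(EF)
  && \text{(rule 4)} \\
\Pr({\sim}F) &= \Pr(\Omega) - \Pr(\Omega F) = 1 - \Pr(F)
  && \text{(rule 4 with } E = \Omega \text{, then rule 2)} \\
\Pr(E \cup F) &= \Pr(F) + \Pr(E{\sim}F)
  && \text{(rule 3, since } E \cup F = F \cup E{\sim}F\text{)} \\
 &= \Pr(E) + \Pr(F) - \Pr(EF)
  && \text{(rule 6)}
\end{aligned}
```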

3.4 Counting Outcomes. Sampling with and without Replacement

Suppose a random experiment with sample space Ω is replicated n times. The result is a sequence (ω1, ω2, . . . , ωn), where ωi ∈ Ω is the outcome of the ith replication. This sequence is the outcome of a so-called compound experiment, namely the sequential replications of the basic experiment. The sample space of this compound experiment is the n-fold Cartesian product Ω^n = Ω × Ω × · · · × Ω.

Now suppose that the basic experiment is to choose one member of a finite population with N elements. We may identify the sample space Ω with the population. Consider an outcome (ω1, ω2, . . . , ωn) of the replicated experiment. There are N possibilities for ω1 and for each of those there are N possibilities for ω2 and for each pair ω1, ω2 there are N possibilities for ω3, and so on. In all, there are N × N × · · · × N = N^n possibilities for the entire sequence (ω1, ω2, · · · , ωn). If all outcomes of the compound experiment are equally likely, then each has probability 1/N^n. Moreover, it can be shown that the compound experiment has equally likely outcomes if and only if the basic experiment has equally likely outcomes, each with probability 1/N.

Definition: An ordered random sample of size n with replacement from a population of size N is a randomly chosen sequence of length n of elements of the population, where repetitions are possible and each outcome (ω1, ω2, · · · , ωn) has probability 1/N^n.

Now suppose that we sample one element ω1 from the population, with all N outcomes equally likely. Next, we sample one element ω2 from the population excluding the one already chosen. That is, we randomly select one element from Ω ∼ {ω1} with all the remaining N − 1 elements being equally likely. Next, we randomly select one element ω3 from the N − 2 elements of Ω ∼ {ω1, ω2}, and so on until at last we select ωn from the remaining N − (n − 1) elements of the population. The result is a nonrepeating sequence (ω1, ω2, · · · , ωn) of length n from the population. A nonrepeating sequence of length n is also called a permutation of length n from the N objects of the population. The total number of such permutations is N × (N − 1) × · · · × (N − n + 1) = N!/(N − n)!. Obviously, we must have n ≤ N for this to make sense. The number of permutations of length N from a set of N objects is N!. It can be shown that, with the sampling scheme described above, all permutations of length n are equally likely to result. Each has probability (N − n)!/N! of occurring.

Definition: An ordered random sample of size n without replacement from a population of size N is a randomly chosen nonrepeating sequence of length n from the population where each outcome (ω1, ω2, · · · , ωn) has probability (N − n)!/N!.

Most of the time when sampling without replacement from a finite population, we do not care about the order of appearance of the elements of the sample. Two nonrepeating sequences with the same elements in different order will be regarded as equivalent. In other words, we are concerned only with the resulting subset of the population. Let us count the number of subsets of size n from a set of N objects. Temporarily, let C denote that number. Each subset of size n can be ordered in n! different ways to give a nonrepeating sequence. Thus, the number of nonrepeating sequences of length n is C times n!. So, N!/(N − n)! = C × n!, i.e.,

$$C = \frac{N!}{n!\,(N-n)!} = \binom{N}{n}.$$

This is the same binomial coefficient $\binom{N}{n}$ that appears in the binomial theorem:

$$(a+b)^N = \sum_{n=0}^{N} \binom{N}{n} a^n b^{N-n}.$$
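The three counts derived in this section are easy to compute in R. The sketch below uses a hypothetical small case (N = 5, n = 2) chosen so that the results can be verified by hand; `factorial()`, `choose()` and `combn()` are base R functions.

```r
N <- 5; n <- 2                        # hypothetical small population and sample size

N^n                                   # ordered samples with replacement: N^n
factorial(N) / factorial(N - n)       # permutations of length n: N!/(N-n)!
choose(N, n)                          # subsets of size n: N!/(n!(N-n)!)

# Check the relation N!/(N-n)! = C * n! by listing the subsets explicitly
subsets <- combn(N, n)                # each column is one subset of {1,...,N}
ncol(subsets) * factorial(n)          # agrees with the permutation count
```

For large N, `factorial(N)` overflows to `Inf`, so `choose(N, n)` is the safer way to obtain binomial coefficients.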

Definition: A simple random sample of size n from a population of size N is a randomly chosen subset of size n from the population, where each subset has the same probability of being chosen, namely $1/\binom{N}{n}$.

A simple random sample may be obtained by choosing objects from the population sequentially, in the manner described above, and then ignoring the order of their selection.
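In R, the base function `sample()` draws ordered samples either with or without replacement. The sketch below is illustrative only: the population of labels 1 to 50, the sample size 10, and the seed are arbitrary assumptions made here.

```r
set.seed(2)                                   # arbitrary seed for reproducibility
population <- 1:50                            # hypothetical labels, N = 50

with_repl    <- sample(population, 10, replace = TRUE)  # ordered, with replacement
without_repl <- sample(population, 10)                  # ordered, without replacement
srs          <- sort(without_repl)                      # order ignored: a simple random sample

anyDuplicated(without_repl) == 0              # TRUE: no element is repeated
```

Sorting is just one convenient way of discarding the order of selection; any rule that treats reorderings of the same elements as the same sample would do.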

Example: The Birthday Problem

There are N = 365 days in a year. (Ignore leap years.) Suppose n = 23 people are chosen randomly and their birthdays recorded. What is the probability that at least two of them have the same birthday?

Solution: Arbitrarily numbering the people involved from 1 to n, their birthdays form an ordered sample, with replacement, from the set of 365 birthdays. Therefore, each sequence has probability 1/N^n of occurring. No two people have the same birthday if and only if the sequence is actually nonrepeating. The number of nonrepeating sequences of birthdays is N(N − 1) · · · (N − n + 1). Therefore, the event "No two people have the same birthday" has probability

$$\frac{N(N-1)\cdots(N-n+1)}{N^n} = \frac{N(N-1)\cdots(N-n+1)}{N \times N \times \cdots \times N} = \left(1-\frac{1}{N}\right)\left(1-\frac{2}{N}\right)\cdots\left(1-\frac{n-1}{N}\right).$$

With n = 23 and N = 365 we can find this in R as follows:

> prod(1-(1:22)/365)

[1] 0.4927028

So, there is about a 49% probability that no two people in a random selection of 23 have the same birthday. In other words, the probability that at least two share a birthday is about 51%.

An important, intuitively obvious principle in statistics is that if the sample size n is very small in comparison to the population size N, a sample taken without replacement may be regarded as one taken with replacement, if it is mathematically convenient to do so. A sample of size 100 taken with replacement from a population of 100,000 has very little chance of repeating itself. The probability of a repetition is about 5%.
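The figure of about 5% can be checked with the same product used in the birthday problem. In the sketch below the function name `pr_repeat` is hypothetical, introduced only for this illustration.

```r
# Probability of at least one repetition in an ordered sample of size n
# drawn with replacement from a population of size N.
# Uses the birthday-problem product; assumes n >= 2.
pr_repeat <- function(n, N) 1 - prod(1 - (1:(n - 1)) / N)

pr_repeat(100, 100000)      # a little under 0.05
pr_repeat(23, 365)          # birthday problem, about 0.51
```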

3.4.1 Exercises

1. A red 6-sided die and a green 6-sided die are thrown simultaneously. The outcomes of this experiment are equally likely. What is the probability that at least one of the dice lands with a 6 on its upper face?

2. A hand of 5-card draw poker is a simple random sample from the standard deck of 52 cards. What is the probability that a 5-card draw hand contains the ace of hearts?

3. How many 5-card draw poker hands are there? In 5-card stud poker, the cards are dealt sequentially and the order of appearance is important. How many 5-card stud poker hands are there?

4. Everybody in Ourtown is a fool or a knave or possibly both. 70% of the citizens are fools and 85% are knaves. One citizen is randomly selected to be mayor. What is the probability that the mayor is both a fool and a knave?

5. A Martian year has 669 days. An R program for calculating the probability of no repetitions in a sample with replacement of n birthdays from a year of N days is given below.

> birthdays=function(n,N) prod(1-1:(n-1)/N)

To invoke this function with, for example, n=12 and N=400 simply type

> birthdays(12,400)

Check that the program gives the right answer for N=365 and n=23. Then use it to find the number n of Martians that must be sampled in order for the probability of a repetition to be at least 0.5.

6. A standard deck of 52 cards has four queens. Two cards are randomly drawn in succession, without replacement, from a standard deck. What is the probability that the first card is a queen? What is the probability that the second card is a queen? If three cards are drawn, what is the probability that the third is a queen? Make a general conjecture. Prove it if you can. (Hint: Does the probability change if "queen" is replaced by "king" or "seven"?)

3.5 Conditional Probability

Definition: Let A and B be events with Pr(B) > 0. The conditional probability of A, given B, is

$$\Pr(A \mid B) = \frac{\Pr(AB)}{\Pr(B)}. \tag{3.1}$$

Pr(A) itself is called the unconditional probability of A.

Example 3.2. R includes a tabulation by various factors of the 2201 passengers and crew on the Titanic. Read about it by typing

> help(Titanic)

We are going to look at these factors two at a time, starting with the class of each person (first, second, third, or crew) and whether they survived or not.

> apply(Titanic,c(1,4),sum)

      Survived
Class   No Yes
  1st  122 203
  2nd  167 118
  3rd  528 178
  Crew 673 212

Suppose that a passenger or crew member is selected randomly. The unconditional probability that that person survived is 711/2201 = 0.323.

> apply(Titanic,4,sum)

  No  Yes
1490  711

> apply(Titanic,1,sum)

 1st  2nd  3rd Crew
 325  285  706  885

Let us calculate the conditional probability of survival, given that the person selected was in a first class cabin. If A = "survived" and B = "first class", then

$$\Pr(AB) = \frac{203}{2201} = 0.0922$$

and

$$\Pr(B) = \frac{325}{2201} = 0.1477.$$

Thus,

$$\Pr(A \mid B) = \frac{0.0922}{0.1477} = 0.625.$$

First class passengers had about a 62% chance of survival. For random sampling from a finite population such as this, we can use the counts of occurrences of the events rather than their probabilities because the denominators in Pr(AB) and Pr(B) cancel.

$$\Pr(A \mid B) = \frac{\#(AB)}{\#(B)} = \frac{203}{325} = 0.625$$

For comparison, look at the conditional probabilities of survival for the other classes.

$$\Pr(\text{survived} \mid \text{second class}) = \frac{118}{285} = 0.414$$

$$\Pr(\text{survived} \mid \text{third class}) = \frac{178}{706} = 0.252$$

$$\Pr(\text{survived} \mid \text{crew}) = \frac{212}{885} = 0.240$$
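All four conditional probabilities can be obtained in one step by dividing each row of the class-by-survival table by its row total. The sketch below uses the base R function `prop.table()`; the object names `tab` and `cond` are chosen here for illustration.

```r
# Counts by class and survival, as tabulated in the text
tab <- apply(Titanic, c(1, 4), sum)

# Divide each row by its row total: Pr(Survived = . | Class = .)
cond <- prop.table(tab, margin = 1)
round(cond, 3)
```

The "Yes" column of `cond` holds the conditional survival probabilities for the four classes, and each row sums to 1.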

3.5.1 Relating Conditional and Unconditional Probabilities

The defining equation (3.1) for conditional probability can be written as

Pr(AB) = Pr(A|B)Pr(B), (3.2)

which is often more useful, especially when Pr(A|B) is easily determined from the description of the experiment. There is an even more useful result sometimes called the law of total probability. Let B1, B2, · · · , Bk be pairwise disjoint events such that each Pr(Bi) > 0 and Ω = B1 ∪ B2 ∪ · · · ∪ Bk. Let A be another event. Then,

$$\Pr(A) = \sum_{i=1}^{k} \Pr(A \mid B_i)\Pr(B_i). \tag{3.3}$$

This is quite easy to show since A = (AB1) ∪ · · · ∪ (ABk) is a union of pairwise disjoint events and Pr(ABi) = Pr(A|Bi)Pr(Bi).
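As a numerical check of (3.3), the Titanic data from Example 3.2 can be used with A = "survived" and the Bi taken to be the four classes, which are pairwise disjoint and cover the whole population. This is an illustrative sketch; the variable names are introduced here.

```r
tab <- apply(Titanic, c(1, 4), sum)           # class by survival counts
total <- sum(tab)                             # 2201 people in all

pr_B <- rowSums(tab) / total                  # Pr(B_i): probability of each class
pr_A_given_B <- tab[, "Yes"] / rowSums(tab)   # Pr(A | B_i): survival given class

sum(pr_A_given_B * pr_B)                      # right side of (3.3)
sum(tab[, "Yes"]) / total                     # Pr(A) = 711/2201 computed directly
```

The two results agree, as (3.3) requires.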