Discrete Probability Distributions
After completing this chapter, you should be able to
Is Pooling Worthwhile?
Blood samples are used to screen people for certain diseases. When the disease is rare, health care workers sometimes combine or pool the blood samples of a group of individuals into one batch and then test it. If the test result of the batch is negative, no further testing is needed since none of the individuals in the group has the disease. However, if the test result of the batch is positive, each individual in the group must be tested.
Consider this hypothetical example: Suppose the probability of a person having the disease is 0.05, and a pooled sample of 15 individuals is tested. What is the probability that no further testing will be needed for the individuals in the sample? The answer to this question can be found by using what is called the binomial distribution. See Statistics Today—Revisited at the end of the chapter.
This chapter explains probability distributions in general and a specific, often used distribution called the binomial distribution. The Poisson, hypergeometric, and multinomial distributions are also explained.
Many decisions in business, insurance, and other real-life situations are made by assigning probabilities to all possible outcomes pertaining to the situation and then evaluating the results. For example, a saleswoman can compute the probability that she will make 0, 1, 2, or 3 or more sales in a single day. An insurance company might be able to assign probabilities to the number of vehicles a family owns. A self-employed speaker might be able to compute the probabilities for giving 0, 1, 2, 3, or 4 or more speeches each week. Once these probabilities are assigned, statistics such as the mean, variance, and standard deviation can be computed for these events. With these statistics, various decisions can be made. The saleswoman will be able to compute the average number of sales she makes per week, and if she is working on commission, she will be able to approximate her weekly income over a period of time, say, monthly. The public speaker will be able to plan ahead and approximate his average income and expenses. The insurance company can use its information to design special computer forms and programs to accommodate its customers’ future needs.
This chapter explains the concepts and applications of what is called a probability distribution. In addition, a special probability distribution, the binomial distribution, is explained.
Construct a probability distribution for a random variable.
Before probability distribution is defined formally, the definition of a variable is reviewed. In Chapter 1, a variable was defined as a characteristic or attribute that can assume different values. Various letters of the alphabet, such as X, Y, or Z, are used to represent variables. Since the variables in this chapter are associated with probability, they are called random variables.
For example, if a die is rolled, a letter such as X can be used to represent the outcomes. Then the value that X can assume is 1, 2, 3, 4, 5, or 6, corresponding to the outcomes of rolling a single die. If two coins are tossed, a letter, say Y, can be used to represent the number of heads, in this case 0, 1, or 2. As another example, if the temperature at 8:00 A.M. is 43° and at noon it is 53°, then the values that the random variable T assumes are said to be random, since they are due to various atmospheric conditions at the time the temperature was taken.
A random variable is a variable whose values are determined by chance.
Also recall from Chapter 1 that you can classify variables as discrete or continuous by observing the values the variable can assume. If a variable can assume only a specific number of values, such as the outcomes for the roll of a die or the outcomes for the toss of a coin, then the variable is called a discrete variable.
Discrete variables have a finite number of possible values or an infinite number of values that can be counted. The word counted means that they can be enumerated using the numbers 1, 2, 3, etc. For example, the number of joggers in Riverview Park each day and the number of phone calls received after a TV commercial airs are examples of discrete variables, since they can be counted.
Variables that can assume all values in the interval between any two given values are called continuous variables. For example, if the temperature goes from 62 to 78° in a 24-hour period, it has passed through every possible number from 62 to 78. Continuous random variables are obtained from data that can be measured rather than counted. Continuous random variables can assume an infinite number of values and can be decimal and fractional values. On a continuous scale, a person’s weight might be exactly 183.426 pounds if a scale could measure weight to the thousandths place; however, on a digital scale that measures only to tenths of pounds, the weight would be 183.4 pounds. Examples of continuous variables are heights, weights, temperatures, and time. In this chapter only discrete random variables are used; Chapter 6 explains continuous random variables.
The procedure shown here for constructing a probability distribution for a discrete random variable uses the probability experiment of tossing three coins. Recall that when three coins are tossed, the sample space is represented as TTT, TTH, THT, HTT, HHT, HTH, THH, HHH; and if X is the random variable for the number of heads, then X assumes the value 0, 1, 2, or 3.
Probabilities for the values of X can be determined as follows:
Hence, the probability of getting no heads is 1/8, one head is 3/8, two heads is 3/8, and three heads is 1/8. From these values, a probability distribution can be constructed by listing the outcomes and assigning the probability of each outcome, as shown here.
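These probabilities can be verified by enumerating the sample space directly; the short sketch below is illustrative code, not part of the text:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of tossing three coins.
outcomes = list(product("HT", repeat=3))

# X = number of heads; add 1/8 to P(X) for each outcome with X heads.
dist = {}
for outcome in outcomes:
    x = outcome.count("H")
    dist[x] = dist.get(x, 0) + Fraction(1, len(outcomes))

for x in sorted(dist):
    print(x, dist[x])   # 0 1/8, then 1 3/8, 2 3/8, 3 1/8
```

Using exact fractions rather than floats keeps the probabilities in the same form as the text and makes the check ΣP(X) = 1 exact.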
A discrete probability distribution consists of the values a random variable can assume and the corresponding probabilities of the values. The probabilities are determined theoretically or by observation.
Discrete probability distributions can be shown by using a graph or a table. Probability distributions can also be represented by a formula. See Exercises 31–36 at the end of this section for examples.
Rolling a Die
Construct a probability distribution for rolling a single die.
Since the sample space is 1, 2, 3, 4, 5, 6 and each outcome has a probability of 1/6, the distribution is as shown.
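A minimal sketch of this uniform distribution (illustrative code only):

```python
from fractions import Fraction

# Each of the six faces is equally likely, so P(X) = 1/6 for X = 1, ..., 6.
die_dist = {x: Fraction(1, 6) for x in range(1, 7)}

print(die_dist[3])              # 1/6
print(sum(die_dist.values()))   # 1
```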
Probability distributions can be shown graphically by representing the values of X on the x axis and the probabilities P(X) on the y axis.
Represent graphically the probability distribution for the sample space for tossing three coins.
The values that X assumes are located on the x axis, and the values for P(X) are located on the y axis. The graph is shown in Figure 5–1.
Note that for visual appearances, it is not necessary to start with 0 at the origin.
Examples 5–1 and 5–2 are illustrations of theoretical probability distributions. You did not need to actually perform the experiments to compute the probabilities. In contrast, empirical probability distributions are constructed by observing the variable over a period of time, as shown in Example 5–3.
Probability Distribution for Example 5–2
Baseball World Series
The baseball World Series is played by the winner of the National League and the American League. The first team to win four games wins the World Series. In other words, the series will consist of four to seven games, depending on the individual victories. The data shown consist of the number of games played in the World Series from 1965 through 2005. (There was no World Series in 1994.) The number of games played is represented by the variable X. Find the probability P(X) for each X, construct a probability distribution, and draw a graph for the data.
|X||Number of times played|
|4||8|
|5||7|
|6||9|
|7||16|
The probability P(X) can be computed for each X by dividing the number of series that lasted X games by the total number of series played.
|For 4 games, 8/40 = 0.200||For 6 games, 9/40 = 0.225|
|For 5 games, 7/40 = 0.175||For 7 games, 16/40 = 0.400|
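The same relative frequencies can be reproduced with a short sketch, using the series counts implied by the decimals shown (8, 7, 9, and 16 of the 40 series):

```python
# Number of World Series (1965-2005, excluding 1994) lasting X games.
counts = {4: 8, 5: 7, 6: 9, 7: 16}
total = sum(counts.values())  # 40 series in all

# P(X) = frequency / total, rounded to three decimals as in the text.
dist = {x: round(n / total, 3) for x, n in counts.items()}

print(dist)   # {4: 0.2, 5: 0.175, 6: 0.225, 7: 0.4}
```

The probabilities sum to 1 (up to floating-point rounding), satisfying the first requirement for a probability distribution.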
The probability distribution is
The graph is shown in Figure 5–2.
Probability Distribution for Example 5–3
Speaking of Statistics
Coins, Births, and Other Random (?) Events
Examples of random events such as tossing coins are used in almost all books on probability. But is flipping a coin really a random event?
Tossing coins dates back to ancient Roman times when the coins usually consisted of the Emperor’s head on one side (i.e., heads) and another icon such as a ship on the other side (i.e., ships). Tossing coins was used in both fortune telling and ancient Roman games.
A Chinese form of divination called the I-Ching (pronounced E-Ching) is thought to be at least 4000 years old. It consists of 64 hexagrams made up of six horizontal lines. Each line is either broken or unbroken, representing the yin and the yang. These 64 hexagrams are supposed to represent all possible situations in life. To consult the I-Ching, a question is asked and then three coins are tossed six times. The way the coins fall, either heads up or heads down, determines whether the line is broken (yin) or unbroken (yang). Once the hexagram is determined, its meaning is consulted and interpreted to get the answer to the question. (Note: Another method used to determine the hexagram employs yarrow sticks.)
In the early 18th century, a mathematician named Abraham DeMoivre used the outcomes of tossing coins to study what later became known as the normal distribution; however, his work at that time was not widely known.
Mathematicians usually consider the outcomes of a coin toss a random event. That is, the probability of getting a head is 1/2, and the probability of getting a tail is 1/2. Also, it is not possible to predict with 100% certainty which outcome will occur. But new studies question this theory. During World War II a South African mathematician named John Kerrich tossed a coin 10,000 times while he was interned in a German prison camp. The result: 5067 heads.
Several studies have shown that when a coin-tossing device is used, the probability that a coin will land on the same side on which it is placed on the coin-tossing device is about 51%. It would take about 10,000 tosses to become aware of this bias. Furthermore, researchers showed that when a coin is spun on its edge, the coin falls tails up about 80% of the time since there is more metal on the heads side of a coin. This makes the coin slightly heavier on the heads side than on the tails side.
Another assumption commonly made in probability theory is that the number of male births is equal to the number of female births and that the probability of a boy being born is 1/2 and the probability of a girl being born is 1/2. We know this is not exactly true.
In the late 1700s, a French mathematician named Pierre Simon Laplace attempted to prove that more males than females are born. He used records from 1745 to 1770 in Paris and showed that the percentage of females born was about 49%. Although these percentages vary somewhat from location to location, further surveys show they are generally true worldwide. Even though there are discrepancies, we generally consider the outcomes to be 50-50 since these discrepancies are relatively small.
Based on this article, would you consider the coin toss at the beginning of a football game fair?
Two Requirements for a Probability Distribution
1.The sum of the probabilities of all the events in the sample space must equal 1; that is, ΣP(X) = 1.
2.The probability of each event in the sample space must be between 0 and 1, inclusive. That is, 0 ≤ P(X) ≤ 1.
The first requirement states that the sum of the probabilities of all the events must be equal to 1. This sum cannot be less than 1 or greater than 1 since the sample space includes all possible outcomes of the probability experiment. The second requirement states that the probability of any individual event must be a value from 0 to 1. The reason (as stated in Chapter 4) is that the range of the probability of any individual value can be 0, 1, or any value between 0 and 1. A probability cannot be a negative number or greater than 1.
Determine whether each distribution is a probability distribution.
a.Yes, it is a probability distribution.
b.No, it is not a probability distribution, since P(X) cannot be 1.5 or –1.0.
c.Yes, it is a probability distribution.
d.No, it is not, since ΣP(X) = 1.2.
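The two requirements can be checked mechanically. In the sketch below, the three sample lists stand in for distributions like those in parts a–d (the actual tables are not reproduced here, so these values are illustrative):

```python
def is_probability_distribution(probs):
    """Check the two requirements: sum(P(X)) = 1 and 0 <= each P(X) <= 1."""
    return abs(sum(probs) - 1) < 1e-9 and all(0 <= p <= 1 for p in probs)

# A valid distribution, e.g., tossing three coins:
print(is_probability_distribution([1/8, 3/8, 3/8, 1/8]))   # True

# Invalid: contains 1.5 and -1.0, which are outside [0, 1].
print(is_probability_distribution([1.5, -1.0, 0.5]))       # False

# Invalid: the probabilities sum to 1.2, not 1.
print(is_probability_distribution([0.3, 0.4, 0.5]))        # False
```

The tolerance in the sum comparison guards against floating-point rounding; with exact fractions an equality test would suffice.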
Many variables in business, education, engineering, and other areas can be analyzed by using probability distributions. Section 5–2 shows methods for finding the mean and standard deviation for a probability distribution.
Applying the Concepts 5–1
Dropping College Courses
Use the following table to answer the questions.
|Reason for Dropping a College Course||Frequency||Percentage|
|Change in work schedule||20|
|Change of major||14|
|No meaningful reason||3|
1.What is the variable under study? Is it a random variable?
2.How many people were in the study?
3.Complete the table.
4.From the information given, what is the probability that a student will drop a class because of illness? Money? Change of major?
5.Would you consider the information in the table to be a probability distribution?
6.Are the categories mutually exclusive?
7.Are the categories independent?
8.Are the categories exhaustive?
9.Are the two requirements for a discrete probability distribution met?
See page 295 for the answers.
1.Define and give three examples of a random variable.
2.Explain the difference between a discrete and a continuous random variable.
3.Give three examples of a discrete random variable.
4.Give three examples of a continuous random variable.
5.What is a probability distribution? Give an example.
For Exercises 6 through 11, determine whether the distribution represents a probability distribution. If it does not, state why.