This project combines a number of aspects of statistics. First, you will use probability distributions to model the possible outcomes of a particular experiment, one with its roots in randomness. Second, you will use your model to estimate or, to a degree, predict the outcome of the experiment. Lastly, this project will employ simulation to replicate the experiment, allowing you to compare your estimates with an actual outcome.
The experiment is an old one, dating from the days when probability was studied with coins, marbles, pegs, cards, or whatever else was at hand. We will analyze the paths taken by marbles as they fall through vertical boards consisting of rows of pegs. This probability experiment is frequently simulated in mathematics and science museums around the world.
The background section for the project will tell you about the history of this device.
In the 1870s, Sir Francis Galton created a device he called a quincunx for studying probability. The device was made up of a vertical board with a chute at the top. The chute was filled with marbles which were dispensed into an array of pegs. The pegs acted as obstructions, forcing the marbles to change direction, the choice of direction being random. At the bottom of the quincunx was a set of bins for catching the marbles.
Galton’s original sketch of the quincunx illustrates the setup and shows a possible arrangement of marbles in their bins after completing their journey. The idea is to study and explain that final distribution of marbles among the bins.
Notice that the pegs are arranged in alternating rows, so that between two pegs in one row there is a peg in the next row. A falling marble will therefore strike a peg in each row as it progresses. In fact, the word quincunx refers to any arrangement of five objects in a rectangle, one object at each corner and one in the middle.
The word is often generalized to mean anything made up of such patterns of five. Our goal is to describe, mathematically and probabilistically, the possible resting places for a marble passing through the quincunx.
Click here to view an animation that simulates the path of one marble through a quincunx with 10 rows of pegs.
You can see the marble wend its way down, bouncing from peg to peg until it lands in a bin. Note that there are many different paths by which the marble could have reached the same bin. Can you see another?
Let’s look at a case with a small number of rows of pegs, say 3, with 4 bins at the bottom. The picture below shows the marble beginning its descent.
Number the bins 0 to 3 from left to right. Next assume that when a marble hits a peg, the probability is ½ the marble will drop to the left and ½ it will drop to the right.
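The left-or-right assumption can be tried out with a short simulation. This is only a minimal sketch: the function names `drop_marble` and `drop_many`, the seed, and the choice of 8000 marbles are illustrative assumptions, not part of the project. It relies on the observation that, with bins numbered from the left, the bin a marble reaches equals the number of times it bounced right.

```python
import random
from collections import Counter

def drop_marble(rows, rng):
    """Return the bin index (0..rows) for one marble.

    Each peg sends the marble left or right with probability 1/2;
    the bin index equals the number of bounces to the right.
    """
    return sum(rng.random() < 0.5 for _ in range(rows))

def drop_many(marbles, rows, seed=0):
    """Drop `marbles` marbles and return a list of counts, one per bin."""
    rng = random.Random(seed)  # seeded (an assumption) so the run is repeatable
    tally = Counter(drop_marble(rows, rng) for _ in range(marbles))
    return [tally[b] for b in range(rows + 1)]

counts = drop_many(marbles=8000, rows=3)
print(counts)
```

The printed list holds the number of marbles in bins 0 through 3. Dividing each count by 8000 gives observed proportions that can be set beside the exact probabilities worked out in the text.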
How can the marble end up in the leftmost bin, bin number 0? The only way is if the marble drops to the left each time it strikes a peg. Since the probability is ½ at each stage, the probability of landing all the way to the left is
(½)(½)(½) = 1/8
In fact, any specific path through the pegs will have probability 1/8.
Now, how can the marble land in bin number 1, the second bin from the left? There are three paths starting from the top which end in the second bin, namely:
bounce left, bounce left, bounce right
bounce left, bounce right, bounce left
bounce right, bounce left, bounce left
Each path has probability 1/8, so the probability of landing in the second bin is 3/8. Similarly (or by symmetry), the probability of landing in the third bin is 3/8 and in the rightmost bin 1/8. We have completely determined the probabilities for the quincunx with 3 rows, which we summarize in the table below.
Bin            0     1     2     3
Probability   1/8   3/8   3/8   1/8
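The path counting above can be checked by listing every possible path and grouping the paths by the bin they reach. The sketch below is one way to do this; the function name `bin_probabilities` is an illustrative assumption. It uses exact fractions so the results appear as eighths rather than decimals.

```python
from fractions import Fraction
from itertools import product

def bin_probabilities(rows):
    """Exact landing probability for each bin, found by listing every path."""
    path_prob = Fraction(1, 2) ** rows        # every specific path is equally likely
    probs = [Fraction(0)] * (rows + 1)
    for path in product("LR", repeat=rows):   # all 2**rows left/right sequences
        probs[path.count("R")] += path_prob   # bin number = number of right bounces
    return probs

for b, p in enumerate(bin_probabilities(3)):
    print(f"bin {b}: {p}")
# bin 0: 1/8
# bin 1: 3/8
# bin 2: 3/8
# bin 3: 1/8
```

Listing every path is practical only for a small number of rows, since the number of paths doubles with each added row.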
Now that you understand the quincunx and how to compute the associated probabilities, you are ready to move on to the exercises for this project.
1. Suppose the quincunx has N rows of pegs, so that the bins at the bottom are numbered from 0 to N. Argue that the number of the bin in which a marble lands follows a binomial distribution, where N represents the number of trials and the probability p is ½.
2. Write down the distribution for the case N=8.
3. If 350 marbles pass through the quincunx, how many marbles would you predict would land in each of the nine bins?
4. Click here to view an animation that simulates the quincunx using 350 marbles and 8 rows of pegs. At the end of the video, count the number of marbles in each bin. Compare with your predictions from Exercise 3. Do they agree? Discuss why your predictions and the simulation may disagree.
The text has told you about the Central Limit Theorem and how important its use is in the field of statistics. At this point you may not fully understand the theorem or may not even be convinced it is true. After all, it is saying that no matter how skewed, lopsided, or disproportionate a probability distribution might be, all you have to do is randomly select samples that are large enough, find their sample means, and a bell-shaped curve will appear when you construct a histogram of the sample means. This project will use the simple act of rolling a six-sided die to both clear up any confusion you may have regarding the statement of the theorem and further convince you of its truth.
Consider the act of rolling a standard six-sided die numbered 1 through 6 as shown. We know the number that appears on top after the roll follows the probability distribution in the table below.
Value          1     2     3     4     5     6
Probability   1/6   1/6   1/6   1/6   1/6   1/6
The graph of this distribution is anything but bell-shaped and is, in fact, quite flat.
The corresponding mean and standard deviation of this distribution are
Mean: 3.5
Std. deviation: √(35/12) ≈ 1.708
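As a quick check, the mean and standard deviation of a single fair die can be computed directly from the definitions, weighting each face by its probability of 1/6. The sketch below is illustrative only; the variable names are assumptions.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)                                   # each face is equally likely

mean = sum(x * p for x in faces)                     # 7/2
variance = sum((x - mean) ** 2 * p for x in faces)   # 35/12
std_dev = sqrt(variance)

print(float(mean), round(std_dev, 3))  # 3.5 1.708
```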
So what does the Central Limit Theorem say about this distribution? Suppose instead of just rolling a die once and looking at its value, we roll the die a specific number of times and average the values of all the rolls. For example, let’s say the die is rolled twice (note this is equivalent to rolling two dice at once) and the mean of the two numbers computed and recorded. We could still record the numbers 1 – 6 (for example, a roll of (1,1) produces an average of 1, a roll of (1,3) or (2,2) gives an average of 2 and so on), but our result may now include numbers such as 1.5 or 4.5 (by rolling (1,2) and (3,6) for instance). The Central Limit Theorem states that if we were to repeat this experiment over and over and plot the probability or frequency distribution for our results, we would see a distribution which was approximately a normal or bell-shaped distribution having approximately the same mean as the distribution for one roll of a die, namely
Mean of averages: 3.5
and whose standard deviation is approximately
Std. deviation of averages: 1.708/√2 ≈ 1.21
that is, the standard deviation of the one-die distribution divided by the square root of the number of rolls being averaged, in this case 2.
The more rolls you record, the closer the observed distribution will be to the distribution the theorem predicts. Some visual simulations will help.
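The rule for the standard deviation of the averages is easy to tabulate. The following sketch assumes the single-die values of 3.5 for the mean and √(35/12), about 1.708, for the standard deviation; the function name `clt_estimates` is hypothetical.

```python
from math import sqrt

DIE_MEAN = 3.5
DIE_SD = sqrt(35 / 12)   # standard deviation of one fair die, about 1.708

def clt_estimates(n_rolls):
    """Mean and standard deviation predicted for the average of n_rolls dice."""
    return DIE_MEAN, DIE_SD / sqrt(n_rolls)

for n in (1, 2, 10):
    mean, sd = clt_estimates(n)
    print(n, mean, round(sd, 3))
```

For two rolls the standard deviation comes out near 1.21, and it keeps shrinking as more rolls are averaged, while the mean stays at 3.5.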
Click here to view an animation that simulates the experiment described in the background section, that of rolling two dice and recording the average. The animation shows the distribution growing cumulatively as more rolls are recorded until a total of 5000 rolls is reached.
The net result looks similar to the shape of a normal distribution. At the end of the 5000 experiments, the data had the following frequency distribution.
from which the mean and standard deviation of the data can be computed to be 3.5029 and 1.203 respectively, values very close to the theoretical values of 3.5 and 1.21 from the Central Limit Theorem.
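An experiment of this kind can also be repeated without the animation. The sketch below is not the code behind the animation; the function name `simulate_averages` and the seed are assumptions made for illustration. It rolls two dice 5000 times, records each average, and reports the mean and standard deviation of the recorded averages.

```python
import random
from statistics import mean, pstdev

def simulate_averages(n_dice, repetitions, seed=0):
    """Roll n_dice dice `repetitions` times and return the list of averages."""
    rng = random.Random(seed)  # seeded (an assumption) so the run is repeatable
    return [
        sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice
        for _ in range(repetitions)
    ]

averages = simulate_averages(n_dice=2, repetitions=5000)
print(round(mean(averages), 3), round(pstdev(averages), 3))
```

The exact figures depend on the seed, but they should fall close to 3.5 and 1.21. Calling `simulate_averages(n_dice=10, repetitions=5000)` gives the corresponding experiment with ten dice.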
Click here to view an animation that shows a similar experiment, except here the rolls of 10 dice are averaged and recorded over 5000 repetitions.
As you can see, the distribution looks more like a traditional normal curve. One reason for this is that more distinct values can be observed. For example, the average of 10 dice can be 1, 1.1, 1.2, …, 5.9, 6.
These examples should convince you of the basic premise in the Central Limit Theorem, but why would you expect such a theorem to be true? An intuitive argument is similar to the one we gave in the Probability and Simulation Project. When you roll one die there is no difference between a 1 and a 3. They are just two different sides of the die and are equally likely to turn up. Now suppose we are averaging the values of three dice. Getting a result of 1 is suddenly very special. The only way to get an average of 1 would be for each die to show a value of 1. An average of 3, on the other hand, could occur with any of 25 different rolls, including (1,2,6), (3,3,3), and (4,2,3), and so is much more likely to be seen than an average of 1. For this reason, values in the middle occur with greater frequency than those at the outer limits, giving the distribution a bell shape.
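The counts in this argument can be confirmed by going through all 216 ordered rolls of three dice. The sketch below does exactly that; the function name `ways_to_average` is an illustrative assumption.

```python
from itertools import product

def ways_to_average(target, n_dice=3):
    """Count ordered rolls of n_dice dice whose average equals target."""
    return sum(
        1
        for roll in product(range(1, 7), repeat=n_dice)
        if sum(roll) == target * n_dice   # compare totals to avoid fractions
    )

print(ways_to_average(1), ways_to_average(3))  # 1 25
```

Rolls are counted as ordered triples here, so (1,2,6) and (6,2,1) are treated as different rolls.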
Java is a special programming language used extensively on the Internet where the programs are designed for specific tasks tied to the material on the page. The programs are referred to as applets and are often interactive. The Department of Statistics at the University of South Carolina has a number of statistics-related Java applets posted on their Web site.
One such Java applet simulates the Central Limit Theorem with dice rolls. The applet requires the user to specify the number of dice rolled (from 1 to 5) and then to specify the number of rolls. Hitting the indicated button will show the resulting distribution. Each hit of the button will add the specified number of rolls to the current distribution in a cumulative fashion, while changing the number of dice zeros everything out for a new run of dice rolls.
At the bottom of this page you will be told how to reach the applet. Spend some time at this site interacting with the applet to reinforce your knowledge of the Central Limit Theorem, making sure you understand how your inputs to the applet affect the resulting graph. Note that instead of labeling the horizontal axis with the average of the dice, the author of the applet simply uses the total of the dice. There is really no difference, as the axis could be relabeled with averages by simply dividing by the number of dice. The link below will open in a new window. Once you are comfortable with the applet and have had some fun rolling dice millions of times, return to this page by closing that browser window. Then, proceed to the exercises. The applet can be found at http://www.stat.sc.edu/~west/javahtml/CLT.html.
When you’ve finished reviewing the simulation, go on to the exercises below.
1. Write down the theoretical distribution for the average of two dice. Compute the mean and standard deviation and compare to the estimates given by the Central Limit Theorem.
2. After one experiment where 4 dice were rolled 1,000 times, the observed distribution of averages was seen to be
Compute the mean and standard deviation of this distribution and compare to the estimates given by the Central Limit Theorem.
3. Set the parameters for the applet at http://www.stat.sc.edu/~west/javahtml/CLT.html to 2 dice with the number of rolls set to 100. Hit the “Roll the dice” button repeatedly, counting the number of times until the distribution looks approximately normal to you. Repeat this experiment with 5 dice. Which number of dice takes longer to appear normal? Can you explain this behavior?