Chs 1 – 3
1. A type of variable for which arithmetic operations do not make sense is called _______.
A) quantitative B) categorical C) distributions D) cases
2. When using a pie chart, the sum of all the percentages should be _____.
A) 0 B) 1 C) 100 D) 50
3. What method is useful when comparing two distributions using a stemplot?
A) Splitting the stem B) Trimming the leaves C) Back-to-back stemplots D) None of the above
4. The histogram at right shows data from 30 students who were asked, “How much time do you spend on the Internet in minutes?” What are some features of the data?
A) There is a potential outlier. B) Most values are around 800. C) The range of values is between 0 and 400. D) None of the above
5. In a statistics class with 136 students, the professor records how much money each student has in their possession during the first class of the semester. The histogram shown below represents the data he collected. Approximately what percentage of students had under $10 in their possession?
A) 35% B) 40% C) 44% D) 50%
6. A study is being conducted on air quality at a small college in the South. As part of this study, monitors were posted at every entrance to this college from 6:00 a.m. to 10:00 p.m. on a randomly chosen day. The monitors recorded the mode of transportation used by each person as they entered the campus. Based on the information recorded, the following bar graph was constructed. Approximately what percentage of people entering campus on this particular day arrived by car?
A) 9% B) 31% C) 53% D) 62%
7. The Insurance Institute for Highway Safety publishes data on the total damage suffered by compact automobiles in a series of controlled, low-speed collisions. The costs for a sample of nine cars, in hundreds of dollars, are provided below:
10 6 8 10 4 3.5 7.5 8 9
What is the median cost of the total damage suffered for this sample of cars?
A) $400 B) $730 C) $800 D) $1000
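As a minimal sketch (not part of the original exam), the median and quartiles of the nine costs in Question 7 can be checked as follows. The variable names are illustrative, and the quartiles use the textbook convention of taking the median of each half of the ordered data; statistical software may use a different convention and give slightly different quartiles.

```python
from statistics import median

# Costs in hundreds of dollars, as listed in Question 7.
costs = [10, 6, 8, 10, 4, 3.5, 7.5, 8, 9]

ordered = sorted(costs)
median_dollars = median(ordered) * 100  # convert hundreds of dollars to dollars

# Quartiles as medians of the lower and upper halves, excluding the
# overall median because the sample size is odd (an assumed convention).
half = len(ordered) // 2
q1 = median(ordered[:half])
q3 = median(ordered[-half:])
iqr_dollars = (q3 - q1) * 100

print(f"median = ${median_dollars:.0f}, IQR = ${iqr_dollars:.0f}")
```

Remember that the data are recorded in hundreds of dollars, so any summary must be rescaled before comparing it with the answer choices.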
8. What is the interquartile range of the above data?
A) $300 B) $350 C) $400 D) $450
9. In a statistics class with 136 students, the professor records how much money each student has in their possession during the first class of the semester. The histogram shown below represents the data he collected. From the histogram, which of the following is true?
A) The mean is larger than the median. B) The mean is smaller than the median. C) The mean and median are approximately equal. D) It is impossible to compare the mean and median for these data.
10. The following boxplot is of the birth weights (in ounces) of 160 infants born in a local hospital. About 40 of the birth weights were below _______.
A) 92 ounces B) 102 ounces C) 112 ounces D) 122 ounces
11. This is a standard deviation contest. Which of the following sets of four numbers has the largest possible standard deviation?
A) 7, 8, 9, 10 B) 5, 5, 5, 5 C) 0, 0, 10, 10 D) 0, 1, 2, 3
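A short sketch, assuming the sample standard deviation (divisor n − 1) is intended, that compares the spread of the four candidate sets in Question 11. The dictionary and variable names are illustrative only.

```python
from statistics import stdev

# The four candidate sets from Question 11.
candidates = {
    "A": [7, 8, 9, 10],
    "B": [5, 5, 5, 5],
    "C": [0, 0, 10, 10],
    "D": [0, 1, 2, 3],
}

# Sample standard deviation of each set.
spreads = {label: stdev(values) for label, values in candidates.items()}
largest = max(spreads, key=spreads.get)

for label, s in spreads.items():
    print(f"{label}: s = {s:.3f}")
print("largest spread:", largest)
```

Sets A and D have the same standard deviation because shifting every value by a constant does not change the spread.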
12. Agricultural fairs often hold competitions for produce grown by local gardeners. The following data are the weights (in pounds) of tomatoes entered into an annual fair in Roland, Manitoba, Canada, in 2007.
2.48, 1.52, 1.15, 1.13, 1.00, 0.99, 0.96, 0.94, 0.75
Apply the 1.5 × IQR rule to the data to check for outlier values. In this case,
A) there are no outliers B) the value 0.75 is the only outlier C) the values 0.75 and 2.48 are both outliers D) the value 2.48 is the only outlier E) the values 1.52 and 2.48 are both outliers
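The 1.5 × IQR rule in Question 12 can be sketched as below. This assumes the textbook quartile convention (median of each half of the ordered data, excluding the overall median when the sample size is odd); other quartile conventions may move the fences slightly. Names such as `lower_fence` are illustrative.

```python
from statistics import median

# Tomato weights in pounds from Question 12.
weights = [2.48, 1.52, 1.15, 1.13, 1.00, 0.99, 0.96, 0.94, 0.75]

ordered = sorted(weights)
half = len(ordered) // 2
q1 = median(ordered[:half])    # median of the lower half
q3 = median(ordered[-half:])   # median of the upper half
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as suspected outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [w for w in ordered if w < lower_fence or w > upper_fence]

print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
print(f"fences: [{lower_fence:.4f}, {upper_fence:.4f}]")
print("suspected outliers:", outliers)
```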
13. The number of Facebook friends that students at a university have is Normally distributed with a mean of 1200 and a standard deviation of 200. What percentage of students has exactly 1000 Facebook friends?
A) 84.13% B) 15.86% C) 42.07% D) None of the above
14. Many residents of suburban neighborhoods own more than one car but consider one of their cars to be the main family vehicle. The age of these family vehicles can be modeled by a Normal distribution with a mean of 2 years and a standard deviation of 6 months. What is the standardized value (z-score) for a family vehicle that is 3 years and 3 months old?
A) 0.22 B) 2.5 C) 2.6 D) 2.92
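A minimal sketch of the standardization in Question 14. The key step is putting the observation, mean, and standard deviation in the same unit (months here); the variable names are illustrative.

```python
# Convert everything to months so the units agree.
age_months = 3 * 12 + 3    # 3 years and 3 months
mean_months = 2 * 12       # mean of 2 years
sd_months = 6              # standard deviation of 6 months

# z = (observation - mean) / standard deviation
z = (age_months - mean_months) / sd_months
print(f"z = {z}")
```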
15. Using the standard Normal distribution tables, what is the area under the standard Normal curve corresponding to Z < 1.1?
A) 0.1357 B) 0.2704 C) 0.8413 D) 0.8643
16. Using the standard Normal distribution tables, what is the area under the standard Normal curve corresponding to –0.5 < Z < 1.2?
A) 0.3085 B) 0.8849 C) 0.5764 D) 0.2815
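The table lookups in Questions 15 and 16 can be cross-checked in software. This sketch uses `NormalDist` from Python's standard library (available in Python 3.8 and later) in place of the printed tables; an area between two z-values is the difference of two cumulative areas.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area to the left of z = 1.1.
area_below_1_1 = standard_normal.cdf(1.1)

# Area between z = -0.5 and z = 1.2: subtract the two left-tail areas.
area_between = standard_normal.cdf(1.2) - standard_normal.cdf(-0.5)

print(f"P(Z < 1.1)        = {area_below_1_1:.4f}")
print(f"P(-0.5 < Z < 1.2) = {area_between:.4f}")
```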
17. Chocolate bars produced by a certain machine are labeled 8.0 ounces. The distribution of the actual weights of these chocolate bars is Normal with a mean of 8.1 ounces and a standard deviation of 0.1 ounces. A chocolate bar is considered underweight if it weighs less than 8.0 ounces. What proportion of chocolate bars weighs less than 8.0 ounces?
A) 0.159 B) 0.341 C) 0.500 D) 0.841
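For Question 17, a sketch of the usual two-step approach: standardize the cutoff, then find the area to its left. `NormalDist` from Python's standard library stands in for the printed Normal table; the variable names are illustrative.

```python
from statistics import NormalDist

mean_weight = 8.1   # ounces
sd_weight = 0.1     # ounces
cutoff = 8.0        # bars lighter than this are underweight

# Step 1: standardize the cutoff.
z_cutoff = (cutoff - mean_weight) / sd_weight

# Step 2: area to the left of the cutoff under the Normal model.
proportion_underweight = NormalDist(mean_weight, sd_weight).cdf(cutoff)

print(f"z = {z_cutoff:.1f}, proportion underweight = {proportion_underweight:.3f}")
```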
18. Which of the following statements about the standardized z-score of a value of a variable X, which has a mean of μ and a standard deviation of σ, is/are TRUE?
A) The z-score has a mean equal to 0. B) The z-score has a standard deviation equal to 1. C) The z-score tells us how many standard deviation units the original observation falls away from the mean. D) The z-score tells us the direction the observation falls away from the mean. E) All of the above statements about the z-score are true.
19. A researcher measured the height (in feet) and volume of usable lumber (in cubic feet) of 32 cherry trees. The goal is to determine if the volume of usable lumber can be estimated from the height of a tree. The results are plotted at right. Select all descriptions that apply to the scatterplot.
A) There is a positive association between height and volume. B) There is a negative association between height and volume. C) There is an outlier in the plot. D) The plot is skewed to the left. E) Both A and C
20. John’s parents recorded his height at various ages between 36 and 66 months. Below is a record of the results:
Age (months)      36   48   54   60   66
Height (inches)   34   38   41   43   45
John’s parents decide to use the least-squares regression line of John’s height on age to predict his height at age 21 years (252 months). What conclusion can we draw?
A) John’s height, in inches, should be about half his age, in months. B) The parents will get a fairly accurate estimate of his height at age 21 years, because the data are clearly correlated. C) Such a prediction could be misleading, because it involves extrapolation. D) All of the above.
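To see what the parents' plan in Question 20 would produce, the sketch below fits the least-squares line to the five recorded points and evaluates it at 252 months. The formulas are the standard slope and intercept expressions; the variable names are illustrative.

```python
# Ages in months and heights in inches from the table in Question 20.
ages = [36, 48, 54, 60, 66]
heights = [34, 38, 41, 43, 45]

n = len(ages)
mean_age = sum(ages) / n
mean_height = sum(heights) / n

# Least-squares slope and intercept for height regressed on age.
sxy = sum((x - mean_age) * (y - mean_height) for x, y in zip(ages, heights))
sxx = sum((x - mean_age) ** 2 for x in ages)
slope = sxy / sxx
intercept = mean_height - slope * mean_age

# Evaluate the fitted line far outside the observed ages.
target_age = 252
predicted_height = intercept + slope * target_age
inside_observed_range = min(ages) <= target_age <= max(ages)

print(f"height = {intercept:.2f} + {slope:.4f} * age")
print(f"predicted height at {target_age} months: {predicted_height:.1f} inches")
print("target age inside observed range:", inside_observed_range)
```

The line fits the recorded ages well, but evaluating it at 252 months gives a height of more than nine feet, which shows how unreliable a prediction far outside the observed ages can be.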
Questions 21 – 23
Colorectal cancer (CRC) is the third most commonly diagnosed cancer among Americans (with nearly 147,000 new cases), and the third leading cause of cancer death (with over 50,000 deaths annually). Research was done to determine whether there is a link between obesity and CRC mortality rates among African Americans in the United States by county. Below are the results of a least-squares regression analysis from the software StatCrunch.

Simple linear regression results:
Dependent Variable: Mortality.rate
Independent Variable: Obesity.rate
Mortality.rate = 13.458199 – 0.21749489 Obesity.rate
Sample size: 3098
R (correlation coefficient) = –0.0067
R-sq = 4.5304943E-5
Estimate of error standard deviation: 111.20661

Parameter estimates:
Parameter   Estimate      Std. Err.    Alternative   DF     T-Stat        P-Value
Intercept   13.458199     15.9797735   ≠ 0           3096   0.84220207    0.3997
Slope       –0.21749489   0.5807189    ≠ 0           3096   –0.37452698   0.708

Analysis of variance table for regression model:
Source   DF     SS            MS          F-stat       P-value
Model    1      1734.7122     1734.7122   0.14027046   0.708
Error    3096   3.8287952E7   12366.91
Total    3097   3.8289688E7
21. What is the equation to predict mortality rates from obesity rates?
A) Mortality.rate = 13.458199 – 0.21749489 Obesity.rate B) Obesity.rate = 13.458199 – 0.21749489 Mortality.rate C) Mortality.rate = 13.458199 + 0.21749489 Obesity.rate D) Mortality.rate = 13.458199 – 0.0067 Obesity.rate
22. What fraction of the variation in mortality rates is explained by the least-squares regression?
A) 0.000045 B) 111.201 C) –0.0067 D) 13.45
23. A study of the salaries of full professors at a small university shows that the median salary for female professors is considerably less than the median male salary. Further investigation shows that the median salaries for male and female full professors are about the same in every department (English, physics, etc.) of the university. Which phenomenon explains the reversal in this example?
A) extrapolation B) Simpson’s paradox C) causation D) correlation
Questions 24 – 26
Is age a good predictor of salary for CEOs? Sixty CEOs between the ages of 32 and 74 were asked their salary (in thousands of dollars). The results of a statistical analysis are shown below:

Simple linear regression results:
Dependent Variable: SALARY
Independent Variable: AGE
SALARY = 242.70212 + 3.1327114 AGE
Sample size: 59
R (correlation coefficient) = 0.1276
R-sq = 0.016270384
Estimate of error standard deviation: 220.64246

Parameter estimates:
Parameter   Estimate    Std. Err.   Alternative   DF   T-Stat      P-Value
Intercept   242.70212   168.7604    ≠ 0           57   1.4381461   0.1559
Slope       3.1327114   3.2264276   ≠ 0           57   0.9709536   0.3357

Analysis of variance table for regression model:
Source   DF   SS          MS          F-stat      P-value
Model    1    45896.027   45896.027   0.9427509   0.3357
Error    57   2774936.2   48683.094
Total    58   2820832.2

24. Suppose a CEO is 57 years old. What do you predict his/her salary to be?
A) over $400,000 B) between $100,000 and $400,000 C) under $100,000 D) None of the above.
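A small sketch of the prediction in Question 24: plug the age into the fitted line from the output above and convert from thousands of dollars. The function name `predict_salary_thousands` is illustrative, not part of the StatCrunch output.

```python
# Coefficients of the fitted line SALARY = intercept + slope * AGE,
# with SALARY measured in thousands of dollars.
INTERCEPT = 242.70212
SLOPE = 3.1327114


def predict_salary_thousands(age: float) -> float:
    """Return the fitted salary, in thousands of dollars, for a given age."""
    return INTERCEPT + SLOPE * age


predicted = predict_salary_thousands(57)
print(f"predicted salary: about ${predicted * 1000:,.0f}")
```

The fitted line is only meaningful for ages inside the surveyed range of 32 to 74, and with a correlation near 0.13 any single prediction carries a large error.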
25. Suppose you wanted to predict the salary of the CEO of Facebook, Mark Zuckerberg, based on the information here. How good do you think your prediction would be, assuming Mr. Zuckerberg was 23 when he started Facebook and became CEO?
A) The prediction would be accurate and around $300,000. B) The prediction would require extrapolation and therefore would not be accurate. C) The prediction would be accurate and around $240,000. D) None of the above.
26. What are possible reasons for a correlation around 0.13 for the above data?
A) Age is a very strong predictor of CEO salary. B) Age is not a good predictor and something else may be a better predictor. C) There is not enough data to accurately estimate the correlation. D) The range of ages is too small.
27. Consider the scatterplot at right. What do we call the point indicated by the plotting symbol O?
A) a residual B) influential C) a z-score
Questions 28 – 29
The 94 students in a statistics class are categorized by gender and by the year in school. The numbers obtained are displayed below:

                         Year in school
Gender   Freshman   Sophomore   Junior   Senior   Graduate   Total
Male     1          2           9        17       2          31
Female   23         17          13       7        3          63
Total    24         19          22       24       5          94
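Marginal and conditional proportions from a two-way table like the one above can be sketched as follows. The nested-dictionary layout and helper names are illustrative choices, not part of the exam; the counts are copied from the table.

```python
# Counts from the two-way table: gender by year in school.
counts = {
    "Male":   {"Freshman": 1,  "Sophomore": 2,  "Junior": 9,  "Senior": 17, "Graduate": 2},
    "Female": {"Freshman": 23, "Sophomore": 17, "Junior": 13, "Senior": 7,  "Graduate": 3},
}

row_totals = {gender: sum(years.values()) for gender, years in counts.items()}
grand_total = sum(row_totals.values())

# Marginal proportion: a row total divided by the grand total.
prop_male = row_totals["Male"] / grand_total

# Conditional proportion: restrict to the female row, then divide
# the cell count by that row's total.
prop_sophomore_given_female = counts["Female"]["Sophomore"] / row_totals["Female"]

print(f"P(male) = {prop_male:.3f}")
print(f"P(sophomore | female) = {prop_sophomore_given_female:.3f}")
```

The denominator is what distinguishes the two: the whole class for a marginal proportion, and only the conditioning group for a conditional one.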
28. What proportion of the statistics students in this class are sophomores, given they are female?
A) 0.11 B) 0.202 C) 0.27 D) 19
29. What proportion of the statistics students in this class are male?
A) 0.065 B) 0.105 C) 0.33 D) 31
30. What is the best way to control for lurking variables?
A) Compare two or more treatments B) Randomize to assign experimental units to treatments C) Repeat each treatment on many units D) None of the above
Extra Credit
1. In order to determine if drinking from plastic water bottles causes cancer, researchers surveyed a large sample of adults. For each adult they recorded whether the person drank regularly from plastic water bottles at any period in their life and whether the person had cancer. They then compared the proportion of cancer cases in those who drank from plastic water bottles regularly at some time in their lives with the proportion of cases in those who never drank from plastic water bottles at any point in their lives. The researchers found a higher proportion of cancer cases among those who drank from plastic water bottles regularly than among those who never drank from plastic water bottles. What type of study is this?
A) An observational study B) An experiment but not a double-blind experiment C) A double-blind experiment D) A block design
2. Which of the following best describes a simple random sample (SRS) of size n?
A) It is a random sample of size n selected so that everyone in the population has a known probability of being included in the sample.
B) It is a random sample of size n selected so that everyone in the population has the same chance of being included in the sample.
C) It is a probability sample of size n with known probabilities of selection.
D) It is a sample selected from the population in such a way that every set of n individuals has an equal chance of being in the sample actually selected.
E) It is a sample of n individuals selected in such a way that only chance determines who is included in the sample.
A common fear for incoming freshmen in college is the dreaded “freshman fifteen.” The combination of being in a new environment away from home, a high stress level, alcohol consumption, and eating dining hall food can cause weight gain in college students. A study examined weight gained during the first year of college and what factors contribute to it. A 27-question survey was sent to 252 students at over 50 universities in the United States. Questions included information on demographics, weight gain, diet, family relationships, etc. Ninety-five survey responses were received from students across 37 United States colleges and universities, with 32 respondents from Rose-Hulman Institute of Technology.
3. What is the sample in this study?
A) U.S. college students B) All college students C) The survey respondents D) The 50 universities
4. What is the response rate?
A) 50/252 B) 95/252 C) 32/50 D) 32/25
5. What type of sample is this?
A) Simple random sample B) Probability sample C) Stratified random sample D) Voluntary response sample
6. Does the survey suffer from nonresponse?
A) No, everyone chosen for the survey participated. B) No, this was an experiment so nonresponse is not an issue. C) Yes, because not everyone chosen for the survey participated. D) Yes, because the survey contained too many questions and it is likely participants did not answer all the questions.