Week 8 Discussion Board Example – Linear Regression – Certification Time
An oil company wants to develop an aptitude test that can predict how efficient a new employee will be,
before they spend thousands of dollars training and outfitting them to work on their floating oil rigs.
They want you to evaluate the relationship between the test scores and the length of time it took the
workers to complete their on-site certification.
You collected data on 50 new hires and performed a linear regression of the time it took them
to complete their certification on their test scores.
Use what you have learned about linear regression to answer the following questions. The output from
the Excel Analysis ToolPak's Regression tool is shown below.
DATA
#    Test Score   Hours to Complete Certification  |  #    Test Score   Hours to Complete Certification
1    119          277                              |  26   164          237
2    91           266                              |  27   114          275
3    110          258                              |  28   202          261
4    256          227                              |  29   208          154
5    239          223                              |  30   259          193
6    292          103                              |  31   247          145
7    176          217                              |  32   210          228
8    193          176                              |  33   264          254
9    211          281                              |  34   93           221
10   196          183                              |  35   200          242
11   124          287                              |  36   172          181
12   120          276                              |  37   203          168
13   263          110                              |  38   251          152
14   190          202                              |  39   193          174
15   252          231                              |  40   264          108
16   179          236                              |  41   196          228
17   237          191                              |  42   290          141
18   290          275                              |  43   236          185
19   198          288                              |  44   120          233
20   286          108                              |  45   122          275
21   180          232                              |  46   166          290
22   206          219                              |  47   106          279
23   231          175                              |  48   104          251
24   176          265                              |  49   121          206
25   174          233                              |  50   230          213
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.6070713
R Square             0.368535564
Adjusted R Square    0.355380055
Standard Error       41.99982907
Observations         50

ANOVA
             df   SS             MS           F          Significance F
Regression    1   49415.90919    49415.91     28.01378   2.95E-06
Residual     48   84671.31081    1763.986
Total        49   134087.22

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    323.5161       21.0445          15.3729   0.0000    281.2032    365.8290    281.2032      365.8290
Test Score   -0.5494        0.1038           -5.2928   0.0000    -0.7582     -0.3407     -0.7582       -0.3407
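As a cross-check, the least-squares line can be refit outside Excel. The sketch below is plain Python using only the standard library; it assumes the 50 rows of the data table were transcribed correctly, and `simple_ols` is a hypothetical helper name, not an Excel or library function. It uses the usual simple-regression formulas: slope $b_1 = S_{xy}/S_{xx}$ and intercept $b_0 = \bar{y} - b_1\bar{x}$.

```python
import math

# Test scores (x) and hours to complete certification (y) for the 50 new hires,
# in employee order 1-50, transcribed from the data table above.
x = [119, 91, 110, 256, 239, 292, 176, 193, 211, 196, 124, 120, 263, 190, 252,
     179, 237, 290, 198, 286, 180, 206, 231, 176, 174, 164, 114, 202, 208, 259,
     247, 210, 264, 93, 200, 172, 203, 251, 193, 264, 196, 290, 236, 120, 122,
     166, 106, 104, 121, 230]
y = [277, 266, 258, 227, 223, 103, 217, 176, 281, 183, 287, 276, 110, 202, 231,
     236, 191, 275, 288, 108, 232, 219, 175, 265, 233, 237, 275, 261, 154, 193,
     145, 228, 254, 221, 242, 181, 168, 152, 174, 108, 228, 141, 185, 233, 275,
     290, 279, 251, 206, 213]


def simple_ols(x, y):
    """Hypothetical helper: least-squares fit of y = b0 + b1*x with summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx                # slope
    b0 = y_bar - b1 * x_bar       # intercept
    ss_reg = b1 * sxy             # regression (explained) sum of squares
    ss_res = syy - ss_reg         # residual (unexplained) sum of squares
    mse = ss_res / (n - 2)        # residual mean square, df = n - 2
    return {
        "b0": b0,
        "b1": b1,
        "r_squared": ss_reg / syy,
        "std_error": math.sqrt(mse),      # Excel's "Standard Error" of the regression
        "se_b1": math.sqrt(mse / sxx),    # standard error of the slope
        "ss_reg": ss_reg,
        "ss_res": ss_res,
        "ss_total": syy,
    }


fit = simple_ols(x, y)
for name, value in fit.items():
    print(f"{name:>10}: {value:.4f}")
```

If the transcription is faithful, the printed values should agree with the Summary Output up to rounding (intercept about 323.52, slope about -0.5494, R Square about 0.3685).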
What is the regression equation from the Summary Output? Is this a useful model? How do you know?
$\hat{y} = 323.5 - 0.5494x$
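The tool automatically performs a hypothesis test to determine whether the slope of the population
regression model ($\beta_1$) is equal to zero.
$H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$
If $\beta_1$ is equal to zero, the predicted y value would be a constant and there would be no
value in knowing the regression equation. The t statistic for the slope is -5.2928, and its p-value
is reported as 0.0000, meaning it is below 0.00005 after rounding (the exact value, shown as
Significance F, is 2.95E-06). This is far below any common significance level such as 0.05, so we
reject the null hypothesis and conclude that the model is useful.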
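The p-value can be reproduced from the rounded values in the Test Score row. The sketch below is a minimal, standard-library-only illustration, assuming the two-sided t test with n - 2 = 48 degrees of freedom that is used for a simple-regression slope. The helper names `t_pdf` and `two_sided_p` are hypothetical, and the tail probability is approximated by Simpson's rule rather than an exact library routine (with SciPy available, `scipy.stats.t.sf` would be the usual choice).

```python
import math

# Values read from the Excel Summary Output (Test Score row and Observations).
b1 = -0.5494        # estimated slope
se_b1 = 0.1038      # standard error of the slope
n = 50              # observations
df = n - 2          # residual degrees of freedom for simple regression

t_stat = b1 / se_b1     # test statistic for H0: beta1 = 0
f_stat = t_stat ** 2    # in simple regression, F equals t squared


def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def two_sided_p(t, nu, upper=200.0, steps=20000):
    """Approximate 2 * P(T > |t|) by integrating the upper tail with Simpson's rule."""
    a = abs(t)
    h = (upper - a) / steps          # steps must be even for Simpson's rule
    total = t_pdf(a, nu) + t_pdf(upper, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(a + k * h, nu)
    return 2 * total * h / 3         # the tail beyond `upper` is negligible here


p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.4f}, F = {f_stat:.2f}, p = {p_value:.2e}")
```

Because the slope and its standard error are rounded to four decimals, t and F can differ from Excel's values in the last digits, but the p-value should land close to the reported Significance F of 2.95E-06.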
Are the assumptions of regression satisfied? How did you verify them?
There are four assumptions we need to verify before we try to use the regression model.
Linearity – Is there a linear relationship between the dependent and independent
variables? We check this assumption by looking at the scatter plot of the original data.
The scatter plot shows a relatively weak negative linear relationship.
Independence – Are the errors (residuals) independent of each other? We check this
assumption by looking at the residuals plot. The residuals plot does not show any
patterns or trends, which suggests the errors are independent.
Normality – Are the residuals normally distributed around the regression line? We check
this assumption by looking at the Normal Probability Plot of the residuals. A roughly straight line
indicates the residuals are normally distributed. Our plot shows a small step near the
left end but otherwise looks fairly linear.
Equal variance – Is the spread of the residuals approximately the same across the range
of the independent variable (test score)? We check this assumption by looking at the residuals plot.
The spread appears wider on the far right and narrower on the far left, but it
appears fairly equal across the middle.
The equal variance assumption appears questionable, but the other three assumptions appear to be
satisfied. Regression is fairly robust to moderate departures from these assumptions, but we should
still use care when relying on the regression model.
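The checks above are visual. As a rough numeric complement, the sketch below (plain Python, assuming the same transcribed data) computes the residuals, a Durbin-Watson statistic in hire order as one common independence check, and the residual standard deviation for the lowest and highest thirds of test scores as a crude equal-variance check. These statistics are an illustrative choice, not part of the Excel output above, and they supplement rather than replace the plots.

```python
import statistics

# The 50 (test score, hours) pairs from the data table, in employee order (assumed transcription).
x = [119, 91, 110, 256, 239, 292, 176, 193, 211, 196, 124, 120, 263, 190, 252,
     179, 237, 290, 198, 286, 180, 206, 231, 176, 174, 164, 114, 202, 208, 259,
     247, 210, 264, 93, 200, 172, 203, 251, 193, 264, 196, 290, 236, 120, 122,
     166, 106, 104, 121, 230]
y = [277, 266, 258, 227, 223, 103, 217, 176, 281, 183, 287, 276, 110, 202, 231,
     236, 191, 275, 288, 108, 232, 219, 175, 265, 233, 237, 275, 261, 154, 193,
     145, 228, 254, 221, 242, 181, 168, 152, 174, 108, 228, 141, 185, 233, 275,
     290, 279, 251, 206, 213]

# Refit the least-squares line and compute residuals e_i = y_i - y_hat_i.
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Independence: Durbin-Watson statistic on residuals in hire order.
# Values near 2 suggest little first-order autocorrelation.
dw = (sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
      / sum(e * e for e in residuals))

# Equal variance: compare residual spread for the lowest and highest thirds of test scores.
by_score = sorted(zip(x, residuals))
third = len(by_score) // 3
low_sd = statistics.stdev([e for _, e in by_score[:third]])
high_sd = statistics.stdev([e for _, e in by_score[-third:]])

print(f"Durbin-Watson statistic: {dw:.2f}")
print(f"Residual SD, lowest {third} test scores:  {low_sd:.1f}")
print(f"Residual SD, highest {third} test scores: {high_sd:.1f}")
```

A noticeably larger residual SD in one group would echo the uneven spread seen in the residuals plot; a formal test would be needed before drawing firm conclusions.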
Does test score appear to be a good predictor for the certification time? Why do you think that?
The coefficient of determination ($R^2$) is the proportion of the variation in y (certification
time) that is explained by the linear relationship with x (test score). The $R^2$ value for these two
variables is 0.3685, so about 37% of the variation in certification time is explained by test
scores. The corresponding correlation coefficient is $r \approx -0.607$ (Excel reports its absolute
value as Multiple R).
The test scores appear to be a fair predictor of the time required to complete certification, but
the remaining 63% of the variation is unexplained, so other factors are probably at least as important.
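These figures follow directly from the ANOVA table. A short sketch, assuming only the values printed in the Summary Output: $R^2 = SS_{\text{Regression}} / SS_{\text{Total}}$, adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$ with $k = 1$ predictor, and $r$ takes the sign of the slope.

```python
import math

# Sums of squares and counts from the ANOVA table of the Summary Output.
ss_regression = 49415.90919
ss_total = 134087.22
n, k = 50, 1   # observations, number of predictors

r_squared = ss_regression / ss_total                          # coefficient of determination
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)   # penalizes extra predictors
r = -math.sqrt(r_squared)   # correlation takes the sign of the slope (-0.5494)

print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}, r = {r:.4f}")
```

The results match Excel's R Square, Adjusted R Square, and (up to sign) Multiple R.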
One of the company’s new employees scored a 150 on the test while another scored 25. What is the
predicted time to complete certification for these two employees?
If we start with the regression equation and substitute 150 for x, we can calculate the predicted
time to complete certification for the first employee.
$\hat{y} = 323.5 - 0.5494x$
$\hat{y} = 323.5 - 0.5494(150)$
$\hat{y} \approx 241$
This employee should complete certification in approximately 241 hours.
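We should not use this regression model to predict the second employee's certification time.
The model was built from test scores between 91 and 292, and a score of 25 lies far below that range.
Using the model there would be extrapolation, and we have no evidence that the linear relationship
holds for such low scores.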
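The prediction and the extrapolation rule can be combined in a small helper. This is a hypothetical sketch: `predict_hours` is an illustrative name, the coefficients are the four-decimal values from the Summary Output, and the allowed range is simply the lowest and highest test scores in the data table (91 and 292).

```python
# Coefficients from the Excel Summary Output (rounded to four decimals).
B0, B1 = 323.5161, -0.5494
# Observed range of test scores in the data table; predictions outside it are extrapolation.
X_MIN, X_MAX = 91, 292


def predict_hours(score):
    """Hypothetical helper: predicted certification hours, or None if outside the data range."""
    if not X_MIN <= score <= X_MAX:
        return None   # the linear model was not fit on scores this far out
    return B0 + B1 * score


for score in (150, 25):
    hours = predict_hours(score)
    if hours is None:
        print(f"score {score}: outside {X_MIN}-{X_MAX}, no prediction")
    else:
        print(f"score {score}: about {hours:.0f} hours")
```

With these coefficients, a score of 150 gives about 241.1 hours, and a score of 25 returns `None` because it falls outside the observed range.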