# Mathematics

Week 8 Discussion Example:

Week 8 Discussion Board Example – Linear Regression – Certification Time

An oil company wants to develop an aptitude test that can predict how efficient a new employee will be before it spends thousands of dollars training and outfitting that employee to work on its floating oil rigs. The company wants you to evaluate the relationship between the test scores and the length of time it took the workers to complete their on-site certification.

You collected information on 50 new hires and performed a linear regression of the time it took them to complete their certification on their test scores.

Use what you have learned about linear regression to answer the following questions. The output from the Excel Analysis ToolPak Regression tool is shown below.

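DATA

| Employee | Test Score | Hours to Complete Certification | Employee | Test Score | Hours to Complete Certification |
|---|---|---|---|---|---|
| 1 | 119 | 277 | 26 | 164 | 237 |
| 2 | 91 | 266 | 27 | 114 | 275 |
| 3 | 110 | 258 | 28 | 202 | 261 |
| 4 | 256 | 227 | 29 | 208 | 154 |
| 5 | 239 | 223 | 30 | 259 | 193 |
| 6 | 292 | 103 | 31 | 247 | 145 |
| 7 | 176 | 217 | 32 | 210 | 228 |
| 8 | 193 | 176 | 33 | 264 | 254 |
| 9 | 211 | 281 | 34 | 93 | 221 |
| 10 | 196 | 183 | 35 | 200 | 242 |
| 11 | 124 | 287 | 36 | 172 | 181 |
| 12 | 120 | 276 | 37 | 203 | 168 |
| 13 | 263 | 110 | 38 | 251 | 152 |
| 14 | 190 | 202 | 39 | 193 | 174 |
| 15 | 252 | 231 | 40 | 264 | 108 |
| 16 | 179 | 236 | 41 | 196 | 228 |
| 17 | 237 | 191 | 42 | 290 | 141 |
| 18 | 290 | 275 | 43 | 236 | 185 |
| 19 | 198 | 288 | 44 | 120 | 233 |
| 20 | 286 | 108 | 45 | 122 | 275 |
| 21 | 180 | 232 | 46 | 166 | 290 |
| 22 | 206 | 219 | 47 | 106 | 279 |
| 23 | 231 | 175 | 48 | 104 | 251 |
| 24 | 176 | 265 | 49 | 121 | 206 |
| 25 | 174 | 233 | 50 | 230 | 213 |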

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.6070713 |
| R Square | 0.368535564 |
| Adjusted R Square | 0.355380055 |
| Standard Error | 41.99982907 |
| Observations | 50 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 49415.90919 | 49415.91 | 28.01378 | 2.95E-06 |
| Residual | 48 | 84671.31081 | 1763.986 | | |
| Total | 49 | 134087.22 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 323.5161 | 21.0445 | 15.3729 | 0.0000 | 281.2032 | 365.8290 | 281.2032 | 365.8290 |
| Test Score | -0.5494 | 0.1038 | -5.2928 | 0.0000 | -0.7582 | -0.3407 | -0.7582 | -0.3407 |
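As a cross-check, the main numbers in the Summary Output can be recomputed directly from the 50 observations with the usual least squares formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. This is a minimal sketch in plain Python (no Excel or third-party libraries assumed); `fit_simple_regression` is a hypothetical helper name, and its dictionary keys are chosen only for this example.

```python
"""Recompute the key Summary Output numbers from the raw data (illustrative sketch)."""
import math

# Test scores (x) and hours to complete certification (y) for employees 1-50,
# copied from the data table.
SCORES = [119, 91, 110, 256, 239, 292, 176, 193, 211, 196,
          124, 120, 263, 190, 252, 179, 237, 290, 198, 286,
          180, 206, 231, 176, 174, 164, 114, 202, 208, 259,
          247, 210, 264, 93, 200, 172, 203, 251, 193, 264,
          196, 290, 236, 120, 122, 166, 106, 104, 121, 230]
HOURS = [277, 266, 258, 227, 223, 103, 217, 176, 281, 183,
         287, 276, 110, 202, 231, 236, 191, 275, 288, 108,
         232, 219, 175, 265, 233, 237, 275, 261, 154, 193,
         145, 228, 254, 221, 242, 181, 168, 152, 174, 108,
         228, 141, 185, 233, 275, 290, 279, 251, 206, 213]


def fit_simple_regression(x, y):
    """Hypothetical helper: OLS fit of y = b0 + b1*x plus summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                   # slope
    b0 = y_bar - b1 * x_bar          # intercept
    sst = sum((yi - y_bar) ** 2 for yi in y)                        # Total SS
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # Residual SS
    ssr = sst - sse                                                 # Regression SS
    mse = sse / (n - 2)              # Residual MS, df = n - 2
    return {
        "intercept": b0,
        "slope": b1,
        "r_squared": ssr / sst,
        "standard_error": math.sqrt(mse),
        "f_stat": ssr / mse,         # regression df = 1
        "slope_se": math.sqrt(mse / sxx),
    }


fit = fit_simple_regression(SCORES, HOURS)
for name, value in fit.items():
    print(f"{name:>15}: {value:.4f}")
```

The printed slope, intercept, R², standard error, F statistic, and slope standard error should agree with the Summary Output to the precision Excel displays.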

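What is the regression equation from the Summary Output? Is this a useful model? How do you know?

$$\hat{y} = 323.5 - 0.5494x$$

where $x$ is the test score and $\hat{y}$ is the predicted number of hours to complete certification.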

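The Regression tool automatically performs a hypothesis test to determine whether the slope of the population regression model ($\beta_1$) is equal to zero:

$$H_0\colon \beta_1 = 0 \qquad H_a\colon \beta_1 \neq 0$$

If $\beta_1$ were equal to zero, the predicted $y$ value would be a constant and there would be no value in knowing the regression equation. The p-value for this test is $2.95 \times 10^{-6}$ (Significance F in the ANOVA table; the coefficient table rounds it to 0.0000), which is far below the usual significance level of $\alpha = 0.05$. Therefore, we reject the null hypothesis and conclude that the model is useful.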
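The decision can be double-checked by hand from the coefficient table. This minimal sketch recomputes the slope's t statistic, confirms that its square matches the ANOVA F statistic (which holds in simple linear regression), and rebuilds the 95% confidence interval for $\beta_1$. The critical value 2.0106 for 48 degrees of freedom is taken from a standard t-table; it is an assumption of this sketch and is not printed in the Excel output.

```python
"""Check the slope t-test and 95% interval from the Summary Output (sketch)."""

# Values copied from the coefficient table and ANOVA table.
slope = -0.5494
slope_se = 0.1038
f_from_anova = 28.01378

# t statistic for H0: beta1 = 0.
t_stat = slope / slope_se

# In simple regression, the ANOVA F statistic equals the square of the slope's t statistic.
print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.3f}, F = {f_from_anova}")

# Assumed two-sided 95% critical value for df = 48, from a t-table.
T_CRIT_48 = 2.0106
lower = slope - T_CRIT_48 * slope_se
upper = slope + T_CRIT_48 * slope_se
print(f"95% CI for beta1: ({lower:.4f}, {upper:.4f})")

# An interval that excludes zero agrees with rejecting H0 at alpha = 0.05.
rejects_h0 = not (lower <= 0 <= upper)
print("Reject H0:", rejects_h0)
```

The rebuilt interval matches the Lower 95% and Upper 95% columns to within rounding.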

Are the assumptions of regression satisfied? How did you verify them?

There are four assumptions we need to verify before we try to use the regression model.

Linearity – Is there a linear relationship between the dependent and independent variables? We check this assumption by looking at the scatter plot of the original data. The scatter plot shows a moderate negative linear relationship (Multiple R = 0.607 and the slope is negative, so r ≈ −0.61).

Independence – Are the errors (residuals) independent of each other? We check this assumption by looking at the residuals plot. The residuals plot shows no patterns or trends, which suggests the errors are independent.

Normality – Are the residuals normally distributed around the regression line? We check this assumption by looking at the Normal Probability Plot of the residuals. A straight line indicates the residuals are normally distributed. Our plot shows a small step near the left end, but otherwise looks fairly linear.

Equal variance – Is the spread of the residuals approximately the same across the range of the independent variable (test score)? We check this assumption by looking at the residuals plot. The spread appears wider on the far right and narrower on the far left, but fairly equal across the middle.

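The equal variance assumption appears questionable, but the other three assumptions are satisfied. Least squares regression is fairly robust to moderate departures from these assumptions, so the model can still be used, but with care: unequal variance mainly makes the standard errors and p-values less trustworthy.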
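The plots themselves are not reproduced here, but two numeric summaries can accompany them. This is a minimal sketch: it correlates the sorted residuals with standard normal quantiles (a value near 1 corresponds to a nearly straight normal probability plot) and compares the residual spread for the lowest and highest thirds of test scores. The plotting positions $(i - 0.5)/n$ and the split into thirds are illustrative assumptions, not something the ToolPak computes, and `least_squares` and `pearson` are hypothetical helper names.

```python
"""Numeric companions to the residual plots (illustrative sketch, not Excel output)."""
import math
import statistics
from statistics import NormalDist

# Test scores (x) and certification hours (y) for employees 1-50, from the data table.
SCORES = [119, 91, 110, 256, 239, 292, 176, 193, 211, 196,
          124, 120, 263, 190, 252, 179, 237, 290, 198, 286,
          180, 206, 231, 176, 174, 164, 114, 202, 208, 259,
          247, 210, 264, 93, 200, 172, 203, 251, 193, 264,
          196, 290, 236, 120, 122, 166, 106, 104, 121, 230]
HOURS = [277, 266, 258, 227, 223, 103, 217, 176, 281, 183,
         287, 276, 110, 202, 231, 236, 191, 275, 288, 108,
         232, 219, 175, 265, 233, 237, 275, 261, 154, 193,
         145, 228, 254, 221, 242, 181, 168, 152, 174, 108,
         228, 141, 185, 233, 275, 290, 279, 251, 206, 213]


def least_squares(x, y):
    """Hypothetical helper: return (intercept, slope) of the OLS line."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pearson(a, b):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


b0, b1 = least_squares(SCORES, HOURS)
residuals = [y - (b0 + b1 * x) for x, y in zip(SCORES, HOURS)]
n = len(residuals)

# Normality: sorted residuals vs. standard normal quantiles at assumed positions (i - 0.5)/n.
normal_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
ppcc = pearson(sorted(residuals), normal_scores)

# Equal variance: residual standard deviation for the lowest and highest thirds of scores.
by_score = sorted(zip(SCORES, residuals))
third = n // 3
low_spread = statistics.stdev([r for _, r in by_score[:third]])
high_spread = statistics.stdev([r for _, r in by_score[-third:]])

print(f"sum of residuals (should be ~0): {sum(residuals):.2e}")
print(f"normal probability plot correlation: {ppcc:.3f}")
print(f"residual SD, lowest {third} scores:  {low_spread:.1f}")
print(f"residual SD, highest {third} scores: {high_spread:.1f}")
```

These summaries can be compared with what the plots show; they supplement the visual checks rather than replace them.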

Does test score appear to be a good predictor for the certification time? Why do you think that?

The coefficient of determination (R²) is the proportion of the variation in y (certification time) that can be explained by the linear relationship with x (test score). The R² value for these two variables is 0.3685, so about 37% of the variation in certification time can be explained by test scores.

Test scores appear to be a fair predictor of the time required to complete certification, but the remaining 63% of the variation is unexplained, so other factors are probably at least as important.
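The R² value can be reproduced from the ANOVA table as SSR/SST. This short sketch also recovers the correlation coefficient r (Multiple R, carrying the sign of the slope) and the share of variation left unexplained; all inputs are copied from the Summary Output.

```python
"""R-squared, r, and the unexplained share from the ANOVA table (sketch)."""
import math

# Sums of squares and slope copied from the Summary Output.
SSR = 49415.90919   # Regression SS
SST = 134087.22     # Total SS
SLOPE = -0.5494

r_squared = SSR / SST
r = math.copysign(math.sqrt(r_squared), SLOPE)  # r takes the sign of the slope
unexplained = 1 - r_squared

print(f"R^2 = {r_squared:.4f} ({r_squared:.1%} of the variation in hours explained)")
print(f"r = {r:.4f}")
print(f"unexplained share = {unexplained:.1%}")
```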

One of the company’s new employees scored a 150 on the test while another scored 25. What is the

predicted time to complete certification for these two employees?

If we start with the regression equation and plug in 150 for x, we can calculate the predicted time to complete certification for the first employee.

$$
\begin{aligned}
\hat{y} &= 323.5 - 0.5494x \\
\hat{y} &= 323.5 - 0.5494(150) = 323.5 - 82.41 \\
\hat{y} &\approx 241
\end{aligned}
$$

We predict that the first employee will complete certification in approximately 241 hours.
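The substitution is a single line of arithmetic. This small sketch (the function name `predict_hours` is hypothetical) evaluates the fitted line at x = 150 both with the coefficients exactly as printed in the Summary Output and with the intercept rounded to 323.5, showing that the rounding changes the prediction by only a few hundredths of an hour.

```python
def predict_hours(score, intercept=323.5161, slope=-0.5494):
    """Hypothetical helper: evaluate the fitted line y-hat = intercept + slope * score."""
    return intercept + slope * score


as_printed = predict_hours(150)                          # Summary Output coefficients
rounded_intercept = predict_hours(150, intercept=323.5)  # intercept rounded to one decimal
print(f"{as_printed:.2f} hours (as printed), {rounded_intercept:.2f} hours (intercept 323.5)")
print(f"difference: {as_printed - rounded_intercept:.4f} hours")
```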

We should not use this regression model to predict the second employee's certification time. The observed test scores range from 91 to 292, and a score of 25 is far outside that range, so any prediction would be an extrapolation that the data cannot support.
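One way to make the extrapolation rule explicit is to refuse to predict outside the observed range of test scores. This is a minimal sketch, assuming the valid range is exactly the smallest and largest scores in the data table (91 and 292); `predict_if_in_range` is a hypothetical name, and returning `None` is just one design choice (raising an exception would also work).

```python
from typing import Optional

# Smallest and largest test scores among the 50 observations in the data table.
OBSERVED_MIN, OBSERVED_MAX = 91, 292


def predict_if_in_range(score: float) -> Optional[float]:
    """Hypothetical helper: predicted hours, or None if the score requires extrapolation."""
    if not OBSERVED_MIN <= score <= OBSERVED_MAX:
        return None  # outside the data, the fitted line is not a reliable guide
    return 323.5161 - 0.5494 * score


for score in (150, 25):
    prediction = predict_if_in_range(score)
    if prediction is None:
        print(f"score {score}: outside {OBSERVED_MIN}-{OBSERVED_MAX}, no prediction")
    else:
        print(f"score {score}: about {prediction:.0f} hours")
```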