Effects of an Outlier. Refer to the Minitab-generated scatterplot given in Exercise 11 of Section 10-1 on page 485.

a. Using the pairs of values for all 10 points, find the equation of the regression line.

b. After removing the point with coordinates (10, 10), use the pairs of values for the remaining 9 points and find the equation of the regression line.

c. Compare the results from parts (a) and (b).

Short Answer


a. The regression equation is \(\hat y = 0.264 + 0.906x\).

b. The regression equation excluding the pair (10, 10) is \(\hat y = 2.00 + 0.00x\), that is, \(\hat y = 2.00\).

c. The regression equations obtained in parts (a) and (b) are very different: the slope drops from 0.906 to 0.00, and the intercept rises from 0.264 to 2.00. The single outlier (10, 10) therefore affects the regression equation dramatically.

Step by step solution

01

Given information

A set of 10 pairs of values is considered.

02

Regression equation using all values

a.

The regression equation of y on x has the following notation:

\(\hat y = {b_0} + {b_1}x\), where

\({b_0}\) is the intercept term, and

\({b_1}\) is the slope coefficient.

For the 10 data pairs, the sums needed in the formulas are \(n = 10\), \(\sum x = 28\), \(\sum y = 28\), \(\sum {x^2} = 142\), and \(\sum {xy} = 136\).

The value of the y-intercept is computed below.

\(\begin{array}{c}{b_0} = \frac{{\left( {\sum y } \right)\left( {\sum {{x^2}} } \right) - \left( {\sum x } \right)\left( {\sum {xy} } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( {28} \right)\left( {142} \right) - \left( {28} \right)\left( {136} \right)}}{{10\left( {142} \right) - {{\left( {28} \right)}^2}}}\\ = 0.264\end{array}\).

The value of the slope coefficient is computed below.

\(\begin{array}{c}{b_1} = \frac{{n\left( {\sum {xy} } \right) - \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( {10} \right)\left( {136} \right) - \left( {28} \right)\left( {28} \right)}}{{10\left( {142} \right) - {{\left( {28} \right)}^2}}}\\ = 0.906\end{array}\).

Thus, the regression equation becomes

\(\hat y = 0.264 + 0.906x\).
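The arithmetic in this step can be checked with a few lines of Python. This is a minimal sketch that plugs the sums quoted in the solution into the same two formulas; the variable names are illustrative only.

```python
# Sums quoted in the worked solution for all 10 points.
n = 10
sum_x, sum_y = 28, 28
sum_x2, sum_xy = 142, 136

# Shared denominator: n * sum(x^2) - (sum(x))^2 = 1420 - 784 = 636
denominator = n * sum_x2 - sum_x ** 2

# Intercept: (3976 - 3808) / 636 = 168 / 636
b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator

# Slope: (1360 - 784) / 636 = 576 / 636
b1 = (n * sum_xy - sum_x * sum_y) / denominator

print(f"y-hat = {b0:.3f} + {b1:.3f}x")  # y-hat = 0.264 + 0.906x
```

Both the numerator and the denominator of the slope are positive, so the slope enters the regression equation with a plus sign.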

03

Regression equation excluding the pair (10, 10)

b.

For the remaining 9 data pairs, the sums needed in the formulas are \(n = 9\), \(\sum x = 18\), \(\sum y = 18\), \(\sum {x^2} = 42\), and \(\sum {xy} = 36\).

The value of the y-intercept is computed below.

\(\begin{array}{c}{b_0} = \frac{{\left( {\sum y } \right)\left( {\sum {{x^2}} } \right) - \left( {\sum x } \right)\left( {\sum {xy} } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( {18} \right)\left( {42} \right) - \left( {18} \right)\left( {36} \right)}}{{9\left( {42} \right) - {{\left( {18} \right)}^2}}}\\ = 2.000\end{array}\).

The value of the slope coefficient is computed below.

\(\begin{array}{c}{b_1} = \frac{{n\left( {\sum {xy} } \right) - \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( 9 \right)\left( {36} \right) - \left( {18} \right)\left( {18} \right)}}{{9\left( {42} \right) - {{\left( {18} \right)}^2}}}\\ = 0.000\end{array}\).

Thus, the regression equation becomes

\(\hat y = 2.00 + 0.00x\), which is the horizontal line \(\hat y = 2.00\).
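Writing out the intermediate arithmetic for the nine-point case makes it clear why the slope vanishes: the numerator of \({b_1}\) is exactly zero.

\(\begin{array}{c}{b_0} = \frac{{756 - 648}}{{378 - 324}} = \frac{{108}}{{54}} = 2.000\\{b_1} = \frac{{324 - 324}}{{378 - 324}} = \frac{0}{{54}} = 0.000\end{array}\)

A zero slope means that, without the outlier, x gives no linear information about y, and the fitted line simply predicts the mean of the y values, \(\bar y = 18/9 = 2\).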

04

Comparison

c.

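The regression equations obtained in parts (a) and (b) are very different. With all 10 points, the line rises with slope 0.906; without the point (10, 10), the slope is 0.00 and the line is horizontal at \(\hat y = 2.00\).

Thus, a single extreme data pair such as (10, 10) can greatly influence the regression equation.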
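The comparison can also be reproduced from raw data. The sketch below is only illustrative: the scatterplot is not shown in this solution, so the code assumes the nine non-outlier points form a 3-by-3 grid with x and y each taking the values 1, 2, and 3. That assumed layout reproduces the sums used in the solution (\(\sum x = 18\), \(\sum {x^2} = 42\), \(\sum {xy} = 36\)), but it is not confirmed by the text. The helper name `fit_line` is hypothetical.

```python
def fit_line(points):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    denominator = n * sum_x2 - sum_x ** 2
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    return b0, b1


# ASSUMPTION: the nine clustered points form a 3x3 grid; this matches the
# sums in the solution but the actual scatterplot is not reproduced here.
grid = [(x, y) for x in (1, 2, 3) for y in (1, 2, 3)]
outlier = (10, 10)

with_outlier = fit_line(grid + [outlier])
without_outlier = fit_line(grid)

print("all 10 points:   b0 = %.3f, b1 = %.3f" % with_outlier)
print("outlier removed: b0 = %.3f, b1 = %.3f" % without_outlier)
```

Under that assumption, the clustered points show no linear trend on their own, and the positive slope in part (a) is produced entirely by the one point far from the cluster.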


Most popular questions from this chapter

The following exercises are based on the following sample data consisting of numbers of enrolled students (in thousands) and numbers of burglaries for randomly selected large colleges in a recent year (based on data from the New York Times).

Repeat the preceding exercise, assuming that the linear correlation coefficient is r = 0.997.

Interpreting \({R^2}\). For the multiple regression equation given in Exercise 1, we get \({R^2} = 0.928\). What does that value tell us?

In Exercises 5–8, we want to consider the correlation between heights of fathers and mothers and the heights of their sons. Refer to the StatCrunch display and answer the given questions or identify the indicated items. The display is based on Data Set 5 “Family Heights” in Appendix B.

Identify the following:

a. The P-value corresponding to the overall significance of the multiple regression equation

b. The value of the multiple coefficient of determination \({R^2}\)

c. The adjusted value of \({R^2}\)

Cigarette Nicotine and Carbon Monoxide Refer to the table of data given in Exercise 1 and use the amounts of nicotine and carbon monoxide (CO).

a. Construct a scatterplot using nicotine for the x scale, or horizontal axis. What does the scatterplot suggest about a linear correlation between amounts of nicotine and carbon monoxide?

b. Find the value of the linear correlation coefficient and determine whether there is sufficient evidence to support a claim of a linear correlation between amounts of nicotine and carbon monoxide.

c. Letting y represent the amount of carbon monoxide and letting x represent the amount of nicotine, find the regression equation.

d. The Raleigh brand king size cigarette is not included in the table, and it has 1.3 mg of nicotine. What is the best predicted amount of carbon monoxide?

Tar        25    27    20    24    20    20    21    24
CO         18    16    16    16    16    16    14    17
Nicotine   1.5   1.7   1.1   1.6   1.1   1.0   1.2   1.4

Interpreting the Coefficient of Determination. In Exercises 5–8, use the value of the linear correlation coefficient r to find the coefficient of determination and the percentage of the total variation that can be explained by the linear relationship between the two variables.

Crickets and Temperature r = 0.874 (x = number of cricket chirps in 1 minute, y = temperature in °F)
