Effects of Clusters. Refer to the Minitab-generated scatterplot given in Exercise 12 of Section 10-1 on page 485.

a. Using the pairs of values for all 8 points, find the equation of the regression line.

b. Using only the pairs of values for the 4 points in the lower left corner, find the equation of the regression line.

c. Using only the pairs of values for the 4 points in the upper right corner, find the equation of the regression line.

d. Compare the results from parts (a), (b), and (c).

Short Answer

a. The regression equation is \(\hat y = 0.085 + 0.985x\).

b. The regression equation with only the 4 lower-left corner values is \(\hat y = 1.5 + 0.00x\).

c. The regression equation with only the 4 upper-right corner values is \(\hat y = 9.5 + 0.00x\).

d. The regression equations obtained in parts (a), (b), and (c) are very different from one another. With all 8 points the slope is close to 1, but within each cluster the slope is 0, so the choice of points strongly affects the regression equation.

Step by step solution

01

Given information

A set of 8 pairs of values is considered.

02

Regression equation using all values

a.

The regression equation of y on x has the following notation:

\(\hat y = {b_0} + {b_1}x\), where

\({b_0}\) is the intercept term, and

\({b_1}\) is the slope coefficient.

From the 8 pairs of values, the sums needed for the calculations are \(n = 8\), \(\sum x = 44\), \(\sum y = 44\), \(\sum {x^2} = 372\), and \(\sum {xy} = 370\).


The value of the y-intercept is computed below.

\(\begin{array}{c}{b_0} = \frac{{\left( {\sum y } \right)\left( {\sum {{x^2}} } \right) - \left( {\sum x } \right)\left( {\sum {xy} } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( {44} \right)\left( {372} \right) - \left( {44} \right)\left( {370} \right)}}{{8\left( {372} \right) - {{\left( {44} \right)}^2}}}\\ = 0.085\end{array}\).

The value of the slope coefficient is computed below.

\(\begin{array}{c}{b_1} = \frac{{n\left( {\sum {xy} } \right) - \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( 8 \right)\left( {370} \right) - \left( {44} \right)\left( {44} \right)}}{{8\left( {372} \right) - {{\left( {44} \right)}^2}}}\\ = 0.985\end{array}\).

Thus, the regression equation becomes

\(\hat y = 0.085 + 0.985x\).
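The arithmetic for part (a) can be checked with a short script. This is a minimal sketch that applies the two formulas above to the sums substituted in this step; the function name `regression_from_sums` is a hypothetical helper, not part of any library.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1) for y-hat = b0 + b1*x, given the summary sums."""
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        # Every x value is identical, so the slope formula divides by zero.
        raise ValueError("all x values are identical; slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denom        # slope
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denom   # y-intercept
    return b0, b1


# Sums for all 8 points, as substituted in the formulas above.
b0, b1 = regression_from_sums(n=8, sum_x=44, sum_y=44, sum_x2=372, sum_xy=370)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")  # prints: y-hat = 0.085 + 0.985x
```

Both coefficients share the denominator \(n\sum {x^2} - {\left( {\sum x } \right)^2} = 1040\), so the script computes it once.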

03

Regression equation using only lower-left points

b.

From the 4 pairs of values in the lower-left corner, the sums needed for the calculations are \(n = 4\), \(\sum x = 6\), \(\sum y = 6\), \(\sum {x^2} = 10\), and \(\sum {xy} = 9\).

The value of the y-intercept is computed below.

\(\begin{array}{c}{b_0} = \frac{{\left( {\sum y } \right)\left( {\sum {{x^2}} } \right) - \left( {\sum x } \right)\left( {\sum {xy} } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( 6 \right)\left( {10} \right) - \left( 6 \right)\left( 9 \right)}}{{4\left( {10} \right) - {{\left( 6 \right)}^2}}}\\ = 1.5\end{array}\).

The value of the slope coefficient is computed below.

\(\begin{array}{c}{b_1} = \frac{{n\left( {\sum {xy} } \right) - \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( 4 \right)\left( 9 \right) - \left( 6 \right)\left( 6 \right)}}{{4\left( {10} \right) - {{\left( 6 \right)}^2}}}\\ = 0.000\end{array}\).

Thus, the regression equation becomes

\(\hat y = 1.5 + 0.00x\).

04

Regression equation using upper-right corner values

c.

From the 4 pairs of values in the upper-right corner, the sums needed for the calculations are \(n = 4\), \(\sum x = 38\), \(\sum y = 38\), \(\sum {x^2} = 362\), and \(\sum {xy} = 361\).

The value of the y-intercept is computed below.

\(\begin{array}{c}{b_0} = \frac{{\left( {\sum y } \right)\left( {\sum {{x^2}} } \right) - \left( {\sum x } \right)\left( {\sum {xy} } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( {38} \right)\left( {362} \right) - \left( {38} \right)\left( {361} \right)}}{{4\left( {362} \right) - {{\left( {38} \right)}^2}}}\\ = 9.5\end{array}\).

The value of the slope coefficient is computed below.

\(\begin{array}{c}{b_1} = \frac{{n\left( {\sum {xy} } \right) - \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( 4 \right)\left( {361} \right) - \left( {38} \right)\left( {38} \right)}}{{4\left( {362} \right) - {{\left( {38} \right)}^2}}}\\ = 0.000\end{array}\).

Thus, the regression equation becomes

\(\hat y = 9.5 + 0.00x\).
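The two cluster fits in parts (b) and (c) can be verified together. The sketch below reuses the same formulas with the sums substituted in those parts, and also checks that when the slope is 0 the intercept equals the mean of the y values. The names `fit_from_sums` and `clusters` are illustrative assumptions.

```python
def fit_from_sums(n, sx, sy, sxx, sxy):
    """Return (b0, b1) from n, sum of x, sum of y, sum of x^2, sum of x*y."""
    denom = n * sxx - sx ** 2
    b1 = (n * sxy - sx * sy) / denom
    b0 = (sy * sxx - sx * sxy) / denom
    return b0, b1


# Sums taken from the substitutions shown in parts (b) and (c).
clusters = {
    "lower-left": dict(n=4, sx=6, sy=6, sxx=10, sxy=9),
    "upper-right": dict(n=4, sx=38, sy=38, sxx=362, sxy=361),
}

results = {}
for name, sums in clusters.items():
    b0, b1 = fit_from_sums(**sums)
    results[name] = (b0, b1)
    mean_y = sums["sy"] / sums["n"]
    # With zero slope, the fitted line is the horizontal line y = mean of y.
    print(f"{name}: b0 = {b0}, b1 = {b1}, mean of y = {mean_y}")
```

In both clusters the numerator of the slope, \(n\sum {xy} - \left( {\sum x } \right)\left( {\sum y } \right)\), is exactly 0, which is why each fitted line is horizontal.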

05

Comparison

d.

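The regression equations obtained in parts (a), (b), and (c) differ markedly. Using all 8 points, the slope is about 0.985, which suggests that y increases by almost one unit for each unit increase in x. Within each cluster of 4 points, however, the slope is 0, so the regression line is horizontal and x is of no use in predicting y.

Thus, the presence of clusters can greatly influence the regression equation: the steep line in part (a) reflects the gap between the two clusters rather than any relationship within them.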
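The comparison can also be reproduced from raw points. The individual coordinates are not listed in this solution, so the points below are hypothetical values, chosen only because they reproduce the sums used in parts (a) to (c) (for example \(\sum x = 44\) and \(\sum {xy} = 370\) for all 8 points). Treat this as a sketch under that assumption.

```python
def least_squares(points):
    """Fit y-hat = b0 + b1*x to a list of (x, y) pairs; return (b0, b1)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx ** 2
    b1 = (n * sxy - sx * sy) / denom
    b0 = (sy * sxx - sx * sxy) / denom
    return b0, b1


# ASSUMPTION: hypothetical coordinates consistent with the sums in the solution.
lower_left = [(1, 1), (1, 2), (2, 1), (2, 2)]
upper_right = [(9, 9), (9, 10), (10, 9), (10, 10)]

fits = {
    "all 8 points": least_squares(lower_left + upper_right),
    "lower-left": least_squares(lower_left),
    "upper-right": least_squares(upper_right),
}

for label, (b0, b1) in fits.items():
    print(f"{label}: y-hat = {b0:.3f} + {b1:.3f}x")
```

Under this assumption, the combined fit has a slope near 1 while each cluster alone has a slope of 0, matching the results above.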

