In Exercises 9 and 10, use the given data to find the equation of the regression line. Examine the scatterplot and identify a characteristic of the data that is ignored by the regression line.

Short Answer

The regression equation is \(\hat y = 3.00 + 0.500x\).

[Scatterplot of y versus x]

The scatterplot shows an outlier in the data, which the regression line ignores.

Step by step solution

01

Given information

Paired values are given for two variables, x and y, with n = 11 pairs.

02

Calculate the mean of x and y

The mean value of x is given as,

\(\begin{array}{c}\bar x = \frac{{\sum\limits_{i = 1}^n {{x_i}} }}{n}\\ = \frac{{10 + 8 + .... + 5}}{{11}}\\ = 9\end{array}\)

Therefore, the mean value of x is 9.

The mean value of y is given as,

\(\begin{array}{c}\bar y = \frac{{\sum\limits_{i = 1}^n {{y_i}} }}{n}\\ = \frac{{7.46 + 6.77 + .... + 5.73}}{{11}}\\ = 7.5\end{array}\)

Therefore, the mean value of y is 7.5.

03

Calculate the standard deviation of x and y

The standard deviation of x is given as,

\(\begin{array}{c}{s_x} = \sqrt {\frac{{\sum\limits_{i = 1}^n {{{({x_i} - \bar x)}^2}} }}{{n - 1}}} \\ = \sqrt {\frac{{{{\left( {10 - 9} \right)}^2} + {{\left( {8 - 9} \right)}^2} + ... + {{\left( {5 - 9} \right)}^2}}}{{11 - 1}}} \\ = 3.3166\end{array}\)

Therefore, the standard deviation of x is 3.3166.

The standard deviation of y is given as,

\(\begin{array}{c}{s_y} = \sqrt {\frac{{\sum\limits_{i = 1}^n {{{({y_i} - \bar y)}^2}} }}{{n - 1}}} \\ = \sqrt {\frac{{{{\left( {7.46 - 7.5} \right)}^2} + {{\left( {6.77 - 7.5} \right)}^2} + ..... + {{\left( {5.73 - 7.5} \right)}^2}}}{{11 - 1}}} \\ = 2.0304\end{array}\)

Therefore, the standard deviation of y is 2.0304.
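As a quick check, both means and standard deviations can be recovered from the column sums that appear in the correlation step (n = 11, Σx = 99, Σx² = 1001, Σy = 82.5, Σy² = 659.9762). The sketch below uses the shortcut form \(s = \sqrt{\frac{n\sum v^2 - (\sum v)^2}{n(n-1)}}\), which is algebraically equivalent to the definition above. The function name `sample_sd_from_sums` is only illustrative and is not part of the original solution.

```python
import math

def sample_sd_from_sums(n, total, total_sq):
    """Sample standard deviation from n, sum(v) and sum(v**2)."""
    return math.sqrt((n * total_sq - total ** 2) / (n * (n - 1)))

n = 11
mean_x = 99 / n      # sum of x divided by n
mean_y = 82.5 / n    # sum of y divided by n
s_x = sample_sd_from_sums(n, 99, 1001)
s_y = sample_sd_from_sums(n, 82.5, 659.9762)

print(mean_x, mean_y, round(s_x, 4), round(s_y, 4))
```

The rounded results agree with the values 9, 7.5, 3.3166, and 2.0304 obtained above.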

04

Calculate the correlation coefficient

The correlation coefficient is given as,

\(r = \frac{{n\left( {\sum {xy} } \right) - \left( {\sum x } \right)\left( {\sum y } \right)}}{{\sqrt {\left( {\left( {n\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}} \right)\left( {\left( {n\sum {{y^2}} } \right) - {{\left( {\sum y } \right)}^2}} \right)} }}\)

Substituting n = 11, \(\sum x = 99\), \(\sum y = 82.5\), \(\sum xy = 797.47\), \(\sum x^2 = 1001\), and \(\sum y^2 = 659.9762\) into the formula gives,

\(\begin{array}{l}r = \frac{{n\left( {\sum {xy} } \right) - \left( {\sum x } \right)\left( {\sum y } \right)}}{{\sqrt {\left( {\left( {n\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}} \right)\left( {\left( {n\sum {{y^2}} } \right) - {{\left( {\sum y } \right)}^2}} \right)} }}\\ = \frac{{11\left( {797.47} \right) - \left( {99} \right)\left( {82.5} \right)}}{{\sqrt {\left( {\left( {11 \times 1001} \right) - {{\left( {99} \right)}^2}} \right)\left( {\left( {11 \times 659.9762} \right) - {{\left( {82.5} \right)}^2}} \right)} }}\\ = 0.8163\end{array}\)

Therefore, the correlation coefficient is 0.8163.
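The substitution above can be reproduced with a few lines of Python. This is a minimal sketch that takes the sums quoted in the calculation as inputs; the helper name `pearson_r_from_sums` is an assumption made for illustration.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Linear correlation coefficient computed from column sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

r = pearson_r_from_sums(11, 99, 82.5, 797.47, 1001, 659.9762)
print(round(r, 4))
```

Rounded to four decimal places, the result matches the value 0.8163 stated above.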

05

Calculate the slope of the regression line

The slope of the regression line is given as,

\(\begin{array}{c}{b_1} = r\frac{{{s_y}}}{{{s_x}}}\\ = 0.8163 \times \frac{{2.030}}{{3.317}}\\ = 0.4996\\ \approx 0.500\end{array}\)

Therefore, the value of slope is 0.500.
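The slope can also be computed directly from the sums, since \(r\frac{s_y}{s_x}\) is algebraically equal to \(\frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}\). The sketch below compares that direct value with the one obtained from the rounded intermediate values, to show that the small difference comes from rounding only. The variable names are illustrative.

```python
n, sum_x, sum_y, sum_xy, sum_x2 = 11, 99, 82.5, 797.47, 1001

# Slope computed straight from the sums (no intermediate rounding)
b1_from_sums = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Slope computed from the rounded values of r, s_y and s_x used in this step
b1_rounded_inputs = 0.8163 * 2.030 / 3.317

print(round(b1_from_sums, 4), round(b1_rounded_inputs, 4))
print(round(b1_from_sums, 3), round(b1_rounded_inputs, 3))
```

The two values differ in the fourth decimal place, but both round to 0.500.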

06

Calculate the intercept of the regression line

The intercept is computed as,

\(\begin{array}{c}{b_0} = \bar y - {b_1}\bar x\\ = 7.5 - \left( {0.500 \times 9} \right)\\ = 3.00\end{array}\)

Therefore, the value of intercept is 3.00.

07

Form a regression equation

The regression equation is given as,

\(\begin{array}{c}\hat y = {b_0} + {b_1}x\\ = 3.00 + 0.500x\end{array}\)

Thus, the regression equation is \(\hat y = 3.00 + 0.500x\).
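With the rounded slope 0.500, the intercept works out to \(7.5 - 0.500 \times 9 = 3.00\). A least-squares line always passes through the point \((\bar x, \bar y)\), which gives a simple sanity check on the equation. The sketch below assumes the rounded slope, and the function name `predict` is hypothetical.

```python
mean_x, mean_y = 9, 7.5
b1 = 0.500                  # rounded slope from the slope step
b0 = mean_y - b1 * mean_x   # intercept

def predict(x):
    """Predicted y for a given x using the fitted line."""
    return b0 + b1 * x

print(b0, predict(mean_x))
```

Evaluating the line at x = 9 returns the mean of y, 7.5, as expected.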

08

Construct a scatter plot

Use the following steps to plot a scatter plot between x and y:

  • Place x on the horizontal axis and y on the vertical axis.
  • Mark the values 0, 5, 10, and 15 on the horizontal axis.
  • Mark the values 0, 2, and so on up to 14 on the vertical axis.
  • Plot a point on the graph for each (x, y) pair of values.
  • Label the horizontal axis as “x” and the vertical axis as “y”.

[Scatterplot of y versus x]

09

State the characteristic ignored in the data

It can be observed from the scatterplot that the observation (13, 12.74) is extreme and deviates strongly from the straight-line pattern of the other points. Thus, the characteristic ignored by the regression line is the outlier at (13, 12.74).
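One way to put a number on how far this point sits from the fitted line is its residual, the observed y minus the predicted y. The sketch below assumes the rounded equation \(\hat y = 3.00 + 0.500x\). It only illustrates the size of the gap and is not a formal outlier test.

```python
b0, b1 = 3.00, 0.500    # rounded regression coefficients
x_out, y_out = 13, 12.74

predicted = b0 + b1 * x_out    # value on the regression line at x = 13
residual = y_out - predicted   # vertical distance from the line

print(predicted, round(residual, 2))
```

The point lies about 3.24 units above the line at x = 13.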

Most popular questions from this chapter

\({s_e}\) Notation Using Data Set 1 “Body Data” in Appendix B, if we let the predictor variable x represent heights of males and let the response variable y represent weights of males, the sample of 153 heights and weights results in \({s_e}\) = 16.27555 cm. In your own words, describe what that value of \({s_e}\) represents.

Let the predictor variable x be the first variable given. Use the given data to find the regression equation and the best predicted value of the response variable. Be sure to follow the prediction procedure summarized in Figure 10-5 on page 493. Use a 0.05 significance level.

For 50 randomly selected speed dates, attractiveness ratings by males of their female date partners (x) are recorded along with the attractiveness ratings by females of their male date partners (y); the ratings are from Data Set 18 “Speed Dating” in Appendix B. The 50 paired ratings yield \(\bar x = 6.5\), \(\bar y = 5.9\), r = -0.277, P-value = 0.051, and \(\hat y = 8.18 - 0.345x\). Find the best predicted value of \(\hat y\) (attractiveness rating by female of male) for a date in which the attractiveness rating by the male of the female is x = 8.

In Exercises 5–8, use a significance level of 0.05 and refer to the accompanying displays. Cereal Killers The amounts of sugar (grams of sugar per gram of cereal) and calories (per gram of cereal) were recorded for a sample of 16 different cereals. TI-83/84 Plus calculator results are shown here. Is there sufficient evidence to support the claim that there is a linear correlation between sugar and calories in a gram of cereal? Explain.

Ages of Moviegoers The table below shows the distribution of the ages of moviegoers (based on data from the Motion Picture Association of America). Use the data to estimate the mean, standard deviation, and variance of ages of moviegoers. Hint: For the open-ended category of “60 and older,” assume that the category is actually 60–80.

Age     | 2-11 | 12-17 | 18-24 | 25-39 | 40-49 | 50-59 | 60 and older
Percent | 7    | 15    | 19    | 19    | 15    | 11    | 14

Explore! Exercises 9 and 10 provide two data sets from “Graphs in Statistical Analysis,” by F. J. Anscombe, The American Statistician, Vol. 27. For each exercise,

a. Construct a scatterplot.

b. Find the value of the linear correlation coefficient r, then determine whether there is sufficient evidence to support the claim of a linear correlation between the two variables.

c. Identify the feature of the data that would be missed if part (b) was completed without constructing the scatterplot.

x | 10   | 8    | 13   | 9    | 11   | 14   | 6    | 4    | 12   | 7    | 5
y | 9.14 | 8.14 | 8.74 | 8.77 | 9.26 | 8.10 | 6.13 | 3.10 | 9.13 | 7.26 | 4.74
