Explore! Exercises 9 and 10 provide two data sets from “Graphs in Statistical Analysis,” by F. J. Anscombe, The American Statistician, Vol. 27. For each exercise,

a. Construct a scatterplot.

b. Find the value of the linear correlation coefficient r, then determine whether there is sufficient evidence to support the claim of a linear correlation between the two variables.

c. Identify the feature of the data that would be missed if part (b) was completed without constructing the scatterplot.

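| x | 10 | 8 | 13 | 9 | 11 | 14 | 6 | 4 | 12 | 7 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 9.14 | 8.14 | 8.74 | 8.77 | 9.26 | 8.10 | 6.13 | 3.10 | 9.13 | 7.26 | 4.74 |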

Short Answer


a. The scatter plot is shown below:

b. The correlation coefficient is 0.8162. There is enough evidence to support the claim that there is a linear correlation between the two variables.

c. The scatterplot shows that the data follow a nonlinear (curved) pattern, a feature that the correlation analysis in part (b) does not reveal.

Step by step solution

01

Given information

The paired data for two variables are recorded.

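| x | 10 | 8 | 13 | 9 | 11 | 14 | 6 | 4 | 12 | 7 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 9.14 | 8.14 | 8.74 | 8.77 | 9.26 | 8.10 | 6.13 | 3.10 | 9.13 | 7.26 | 4.74 |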

02

Sketch a scatterplot

a.

A scatterplot is a graph that represents observations for a paired set of data.

Steps to sketch a scatterplot:

  1. Define the x and y axes for the two variables. The horizontal axis is the x-axis, and the vertical axis is the y-axis.
  2. Plot each (x, y) pair as a point at the corresponding position on the axes.
  3. The resulting set of points is the scatterplot for the paired data.
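The steps above can be sketched in code. The following is a minimal, dependency-free illustration that prints a coarse text scatterplot; the function name `text_scatter` and the choice of 8 vertical bands are assumptions made for this sketch, and it relies on the x values being distinct integers, as they are in this data set. A plotting library would normally be used instead.

```python
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def text_scatter(xs, ys, rows=8):
    """Return a coarse text scatterplot (hypothetical helper).

    One column per integer x value, `rows` horizontal bands for y.
    Assumes integer x values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (x_max - x_min + 1) for _ in range(rows)]
    for x, y in zip(xs, ys):
        # Scale y to a band index; band 0 is the lowest y value.
        band = round((y - y_min) / (y_max - y_min) * (rows - 1))
        # Row 0 is the top of the plot, so flip the band index.
        grid[rows - 1 - band][x - x_min] = "*"
    return "\n".join("".join(row) for row in grid)


plot = text_scatter(xs, ys)
print(plot)
```

Even at this low resolution, the points rise steeply on the left and level off or dip on the right, rather than following a straight line.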

03

Compute the measure of the correlation coefficient

b.

The correlation coefficient is computed below:

\(r = \frac{{n\sum {xy} - \left( {\sum x } \right)\left( {\sum y } \right)}}{{\sqrt {n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}} \sqrt {n\left( {\sum {{y^2}} } \right) - {{\left( {\sum y } \right)}^2}} }}\)

The values are listed in the table below:

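| x | y | \({x^2}\) | \({y^2}\) | \(xy\) |
|---|---|---|---|---|
| 10 | 9.14 | 100 | 83.5396 | 91.4 |
| 8 | 8.14 | 64 | 66.2596 | 65.12 |
| 13 | 8.74 | 169 | 76.3876 | 113.62 |
| 9 | 8.77 | 81 | 76.9129 | 78.93 |
| 11 | 9.26 | 121 | 85.7476 | 101.86 |
| 14 | 8.1 | 196 | 65.61 | 113.4 |
| 6 | 6.13 | 36 | 37.5769 | 36.78 |
| 4 | 3.1 | 16 | 9.61 | 12.4 |
| 12 | 9.13 | 144 | 83.3569 | 109.56 |
| 7 | 7.26 | 49 | 52.7076 | 50.82 |
| 5 | 4.74 | 25 | 22.4676 | 23.7 |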

\(\sum x = 99\)

\(\sum y = 82.51\)

\(\sum {x^2} = 1001\)

\(\sum {y^2} = 660.1763\)

\(\sum {xy} = 797.59\)

Substitute the values in the formula:

\(\begin{aligned} r &= \frac{{11\left( {797.59} \right) - \left( {99} \right)\left( {82.51} \right)}}{{\sqrt {11\left( {1001} \right) - {{\left( {99} \right)}^2}} \sqrt {11\left( {660.1763} \right) - {{\left( {82.51} \right)}^2}} }}\\ &= 0.8162\end{aligned}\)

Thus, the correlation coefficient is 0.8162.
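As a check on the arithmetic, the same formula can be evaluated directly from the raw data. This is a small sketch using only the Python standard library; the function name `pearson_r` is chosen here for illustration.

```python
import math

xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson_r(xs, ys):
    """Linear correlation coefficient via the computational formula above."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(xs, ys)
print(round(r, 4))  # 0.8162
```

Note that the second square root uses \(n\sum {y^2}\), that is, 11 times 660.1763, not the square of 660.1763.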

04

Conduct a hypothesis test for correlation

Let \(\rho \) be the true (population) linear correlation coefficient for the paired variables.

For testing the claim, form the hypotheses as shown below:

\(\begin{array}{l}{H_0}:\rho = 0\\{H_a}:\rho \ne 0\end{array}\)

The sample size is n = 11.

The test statistic is computed as follows:

\(\begin{aligned} t &= \frac{r}{{\sqrt {\frac{{1 - {r^2}}}{{n - 2}}} }}\\ &= \frac{{0.8162}}{{\sqrt {\frac{{1 - {{0.8162}^2}}}{{11 - 2}}} }}\\ &= 4.238\end{aligned}\)

Thus, the test statistic is 4.238.

The degrees of freedom are computed below:

\(\begin{aligned} df &= n - 2\\ &= 11 - 2\\ &= 9\end{aligned}\)

The p-value is computed using the t-distribution table.

\(\begin{aligned} p{\rm{ - value}} &= 2P\left( {T > t} \right)\\ &= 2P\left( {T > 4.238} \right)\\ &= 2\left( {1 - P\left( {T < 4.238} \right)} \right)\\ &= 0.002\end{aligned}\)

Thus, the p-value is 0.002.

Since the p-value is less than the 0.05 significance level, the null hypothesis is rejected.

Therefore, there is sufficient evidence to conclude that variables x and y have a linear correlation between them.
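The test statistic and p-value can also be reproduced numerically instead of reading a t table. The sketch below is one possible approach, not the method used above: it integrates the Student t density with Simpson's rule using only the standard library. The helper names `t_pdf` and `two_tailed_p` and the step count are assumptions of this illustration.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value, 2 * P(T > |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area


r, n = 0.8162, 11
df = n - 2
t_stat = r / math.sqrt((1 - r ** 2) / df)
p_value = two_tailed_p(t_stat, df)
print(round(t_stat, 3), round(p_value, 3))
```

The printed values should agree with the test statistic of 4.238 and the p-value of about 0.002 found above.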

05

Analyze the importance of the scatterplot

c.

The scatterplot reveals that the data follow a strong nonlinear pattern: the points lie along a smooth curve rather than a straight line.

This feature would be missed if part (b) were completed without constructing the scatterplot, because the value of r and the hypothesis test only measure linear association.
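One way to see the curvature numerically, offered here as a supplementary check rather than part of the original solution, is to sort the points by x and look at successive differences in y. Because the x values are consecutive integers, roughly constant second differences indicate a parabola-like curve, whereas a straight line would give second differences near zero.

```python
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# Order the y values by increasing x (x runs over 4, 5, ..., 14).
pairs = sorted(zip(xs, ys))
sorted_x = [x for x, _ in pairs]
sorted_y = [y for _, y in pairs]

# First differences: change in y per unit step in x.
first_diffs = [b - a for a, b in zip(sorted_y, sorted_y[1:])]
# Second differences: how that change itself changes.
second_diffs = [b - a for a, b in zip(first_diffs, first_diffs[1:])]

print([round(d, 2) for d in first_diffs])
print([round(d, 2) for d in second_diffs])
```

The first differences fall steadily from positive to negative, and every second difference is close to -0.25, which is consistent with the curved pattern seen in the scatterplot.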


Most popular questions from this chapter

Exercises 13–28 use the same data sets as Exercises 13–28 in Section 10-1. In each case, find the regression equation, letting the first variable be the predictor (x) variable. Find the indicated predicted value by following the prediction procedure summarized in Figure 10-5 on page 493.

Use the CPI/subway fare data from the preceding exercise and find the best predicted subway fare for a time when the CPI reaches 500. What is wrong with this prediction?

Stocks and Sunspots. Listed below are annual high values of the Dow Jones Industrial Average (DJIA) and annual mean sunspot numbers for eight recent years. Use the data for Exercises 1–5. A sunspot number is a measure of sunspots or groups of sunspots on the surface of the sun. The DJIA is a commonly used index that is a weighted mean calculated from different stock values.

| DJIA | 14,198 | 13,338 | 10,606 | 11,625 | 12,929 | 13,589 | 16,577 | 18,054 |
|---|---|---|---|---|---|---|---|---|
| Sunspot Number | 7.5 | 2.9 | 3.1 | 16.5 | 55.7 | 57.6 | 64.7 | 79.3 |

Correlation Use a 0.05 significance level to test for a linear correlation between the DJIA values and the sunspot numbers. Is the result as you expected? Should anyone consider investing in stocks based on sunspot numbers?

Hypothesis Test The mean sunspot number for the past three centuries is 49.7. Use a 0.05 significance level to test the claim that the eight listed sunspot numbers are from a population with a mean equal to 49.7.

Interpreting a Computer Display. In Exercises 9–12, refer to the display obtained by using the paired data consisting of Florida registered boats (tens of thousands) and numbers of manatee deaths from encounters with boats in Florida for different recent years (from Data Set 10 in Appendix B). Along with the paired boat, manatee sample data, StatCrunch was also given the value of 85 (tens of thousands) boats to be used for predicting manatee fatalities.

Predicting Manatee Fatalities Using x = 85 (for 850,000 registered boats), what is the single value that is the best predicted number of manatee fatalities resulting from encounters with boats?

The following exercises are based on the following sample data consisting of numbers of enrolled students (in thousands) and numbers of burglaries for randomly selected large colleges in a recent year (based on data from the New York Times).

Exercise 1 stated that r is found to be 0.499. Does that value change if the actual enrollment values of 53,000, 28,000, 27,000, 36,000, and 42,000 are used instead of 53, 28, 27, 36, and 42?
