Explore! Exercises 9 and 10 provide two data sets from “Graphs in Statistical Analysis,” by F. J. Anscombe, The American Statistician, Vol. 27. For each exercise,

a. Construct a scatterplot.

b. Find the value of the linear correlation coefficient r, then determine whether there is sufficient evidence to support the claim of a linear correlation between the two variables.

c. Identify the feature of the data that would be missed if part (b) was completed without constructing the scatterplot.

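| x | 10 | 8 | 13 | 9 | 11 | 14 | 6 | 4 | 12 | 7 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 9.14 | 8.14 | 8.74 | 8.77 | 9.26 | 8.10 | 6.13 | 3.10 | 9.13 | 7.26 | 4.74 |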

Short Answer


a. The scatterplot is obtained below:

b. The value of the correlation coefficient is 0.8163. There is enough evidence to support the claim that there is a linear correlation between the two variables.

c. The scatterplot shows that the data contain an outlier that strongly affects the value of the correlation coefficient.

Step by step solution

01

Given information

The sample size is n = 11.

The data for the two variables are shown below.

| x | y |
|---|---|
| 10 | 7.46 |
| 8 | 6.77 |
| 13 | 12.74 |
| 9 | 7.11 |
| 11 | 7.81 |
| 14 | 8.84 |
| 6 | 6.08 |
| 4 | 5.39 |
| 12 | 8.15 |
| 7 | 6.42 |
| 5 | 5.73 |

02

Sketch a scatterplot

a.

When paired data are plotted on a graph, the result is called a scatterplot. One axis represents the values of x, and the other axis represents the values of y.

Steps to sketch a scatterplot:

  1. Make two axes, x and y, for each of the two variables.
  2. Map each paired value corresponding to the scale of the axes.
  3. Thus, a scatter plot for the paired data is obtained.
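Since the plotted figure is not reproduced here, the steps above can be sketched in Python as a rough character-grid plot. This is only an illustration using the standard library: the helper name `text_scatter` and the grid size are assumptions made for this sketch, and a plotting package would normally be used instead.

```python
# Paired data from the "Given information" step of this solution.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def text_scatter(xs, ys, width=44, height=16):
    """Return a rough character-grid scatterplot (hypothetical helper)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for xv, yv in zip(xs, ys):
        # Map each pair onto the scale of the axes (step 2 of the list).
        col = round((xv - x_min) / (x_max - x_min) * (width - 1))
        row = round((yv - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    lines = ["|" + "".join(cells).rstrip() for cells in grid]
    lines.append("+" + "-" * width)
    lines.append(f" x from {x_min} to {x_max}, y from {y_min} to {y_max}")
    return "\n".join(lines)


plot = text_scatter(x, y)
print(plot)
```

In the printed grid, one point sits alone on the top row, well above the band formed by the other ten points.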

03

Compute the measure of the correlation coefficient

b.

The formula for the correlation coefficient is given below:

\(r = \frac{{n\sum {xy} - \left( {\sum x } \right)\left( {\sum y } \right)}}{{\sqrt {n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}} \sqrt {n\left( {\sum {{y^2}} } \right) - {{\left( {\sum y } \right)}^2}} }}\)

The values needed for the formula are listed in the following table:

| x | y | \(x^2\) | \(y^2\) | \(xy\) |
|---|---|---|---|---|
| 10 | 7.46 | 100 | 55.6516 | 74.6 |
| 8 | 6.77 | 64 | 45.8329 | 54.16 |
| 13 | 12.74 | 169 | 162.3076 | 165.62 |
| 9 | 7.11 | 81 | 50.5521 | 63.99 |
| 11 | 7.81 | 121 | 60.9961 | 85.91 |
| 14 | 8.84 | 196 | 78.1456 | 123.76 |
| 6 | 6.08 | 36 | 36.9664 | 36.48 |
| 4 | 5.39 | 16 | 29.0521 | 21.56 |
| 12 | 8.15 | 144 | 66.4225 | 97.8 |
| 7 | 6.42 | 49 | 41.2164 | 44.94 |
| 5 | 5.73 | 25 | 32.8329 | 28.65 |
| \(\sum x = 99\) | \(\sum y = 82.5\) | \(\sum x^2 = 1001\) | \(\sum y^2 = 659.9762\) | \(\sum xy = 797.47\) |

Substitute the values to obtain the value of r.

\(\begin{aligned} r &= \frac{{11\left( {797.47} \right) - \left( {99} \right)\left( {82.5} \right)}}{{\sqrt {11\left( {1001} \right) - {{\left( {99} \right)}^2}} \sqrt {11\left( {659.9762} \right) - {{\left( {82.5} \right)}^2}} }}\\ &= 0.8163\end{aligned}\)

Thus, the correlation coefficient is 0.8163.
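As a check on the arithmetic, the sums and the formula for r can be reproduced with a short Python sketch. The variable names are chosen for this illustration only; the data are the eleven pairs from the "Given information" step.

```python
import math

# Paired data from the "Given information" step of this solution.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
               * math.sqrt(n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sum x = {sum_x}, sum y = {sum_y:.2f}")
print(f"sum x^2 = {sum_x2}, sum y^2 = {sum_y2:.4f}, sum xy = {sum_xy:.2f}")
print(f"r = {r:.4f}")
```

Note that \(\sum y^2\) enters the denominator as \(n\sum y^2\), not as the square of the sum of squares.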

04

Conduct a hypothesis test for correlation

The statistical hypotheses are formulated below:

\(\begin{array}{l}{H_0}:\rho = 0\\{H_a}:\rho \ne 0\end{array}\)

Here, \(\rho \) is the population value of the correlation coefficient for the two variables.

Calculate the test statistic as shown below:

\(\begin{aligned} t &= \frac{r}{{\sqrt {\frac{{1 - {r^2}}}{{n - 2}}} }}\\ &= \frac{{0.8163}}{{\sqrt {\frac{{1 - {{0.8163}^2}}}{{11 - 2}}} }}\\ &= 4.239\end{aligned}\)

Thus, the value of the test statistic is 4.239.

The degrees of freedom are computed below:

\(\begin{aligned} df &= n - 2\\ &= 11 - 2\\ &= 9\end{aligned}\)

The p-value is computed from the t-distribution table.

\(\begin{aligned} p{\rm{ - value}} &= 2P\left( {T > 4.239} \right)\\ &= 2\left( {1 - P\left( {T < 4.239} \right)} \right)\\ &= 0.002\end{aligned}\)

Since the p-value (0.002) is less than the significance level (assumed here to be 0.05, since the exercise does not state one), the null hypothesis is rejected.

Therefore, there is sufficient evidence to support the claim of a linear correlation between the two variables.
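The test statistic and p-value can also be approximated in Python. The solution reads the p-value from a t-distribution table; the sketch below instead integrates the t density numerically with Simpson's rule, which is a substitute technique chosen only so that the example needs nothing beyond the standard library. The helper names `t_pdf` and `two_tailed_p` and the 0.05 significance level are assumptions of this sketch.

```python
import math

r = 0.8163   # rounded correlation coefficient from the previous step
n = 11       # sample size
alpha = 0.05  # assumed significance level (not stated in the exercise)

df = n - 2
t = r / math.sqrt((1 - r ** 2) / df)


def t_pdf(u, dof):
    """Density of Student's t distribution with dof degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + u * u / dof) ** (-(dof + 1) / 2)


def two_tailed_p(t_value, dof, steps=2000):
    """Approximate 2 * P(T > |t|) using Simpson's rule on [0, |t|]."""
    upper = abs(t_value)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3  # approximates P(0 < T < |t|)
    return 2 * (0.5 - area)


p_value = two_tailed_p(t, df)
reject_null = p_value < alpha
print(f"t = {t:.3f}, df = {df}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

Because the rounded value r = 0.8163 is used, the printed t may differ from 4.239 in the third decimal place.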

05

Analyze the importance of the scatterplot

c.

The scatterplot for the data reveals that one point lies far beyond the straight-line pattern established by the other points. As a result, the correlation coefficient, which would otherwise be close to 1, is lower.
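The effect of that single point can be illustrated by recomputing r with and without it. In the sketch below, the helper `pearson_r` is a hypothetical name, and treating the pair (13, 12.74) as the outlier is an assumption based on its y-value being far larger than the others.

```python
import math

# Paired data from the "Given information" step of this solution.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def pearson_r(xs, ys):
    """Linear correlation coefficient using the same sum formula as Step 3."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    syy = sum(v * v for v in ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))


outlier = (13, 12.74)  # assumed outlier: the pair with the largest y-value
trimmed = [pair for pair in zip(x, y) if pair != outlier]
trimmed_x = [px for px, _ in trimmed]
trimmed_y = [py for _, py in trimmed]

r_all = pearson_r(x, y)
r_trimmed = pearson_r(trimmed_x, trimmed_y)
print(f"r with all 11 points:   {r_all:.4f}")
print(f"r without the outlier:  {r_trimmed:.4f}")
```

With the outlier excluded, the remaining ten points fall almost exactly on a straight line, so r is very close to 1.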


Most popular questions from this chapter

Critical Thinking: Is the pain medicine Duragesic effective in reducing pain? Listed below are measures of pain intensity before and after using the drug Duragesic (fentanyl) (based on data from Janssen Pharmaceutical Products, L.P.). The data are listed in order by row, and corresponding measures are from the same subject before and after treatment. For example, the first subject had a measure of 1.2 before treatment and a measure of 0.4 after treatment. Each pair of measurements is from one subject, and the intensity of pain was measured using the standard visual analog score. A higher score corresponds to higher pain intensity.

Pain intensity before Duragesic treatment: 1.2, 1.3, 1.5, 1.6, 8, 3.4, 3.5, 2.8, 2.6, 2.2, 3, 7.1, 2.3, 2.1, 3.4, 6.4, 5, 4.2, 2.8, 3.9, 5.2, 6.9, 6.9, 5, 5.5, 6, 5.5, 8.6, 9.4, 10, 7.6

Pain intensity after Duragesic treatment: 0.4, 1.4, 1.8, 2.9, 6.0, 1.4, 0.7, 3.9, 0.9, 1.8, 0.9, 9.3, 8.0, 6.8, 2.3, 0.4, 0.7, 1.2, 4.5, 2.0, 1.6, 2.0, 2.0, 6.8, 6.6, 4.1, 4.6, 2.9, 5.4, 4.8, 4.1

Regression: Use the given data to find the equation of the regression line. Let the response (y) variable be the pain intensity after treatment. What would be the equation of the regression line for a treatment having absolutely no effect?

Testing for a Linear Correlation. In Exercises 13–28, construct a scatterplot, and find the value of the linear correlation coefficient r. Also find the P-value or the critical values of r from Table A-6. Use a significance level of \(\alpha = 0.05\). Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.)

Sports Diameters (cm), circumferences (cm), and volumes (cm³) from balls used in different sports are listed in the table below. Is there sufficient evidence to conclude that there is a linear correlation between diameters and circumferences? Does the scatterplot confirm a linear association?


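| Ball | Diameter | Circumference | Volume |
|---|---|---|---|
| Baseball | 7.4 | 23.2 | 212.2 |
| Basketball | 23.9 | 75.1 | 7148.1 |
| Golf | 4.3 | 13.5 | 41.6 |
| Soccer | 21.8 | 68.5 | 5424.6 |
| Tennis | 7 | 22 | 179.6 |
| Ping-Pong | 4 | 12.6 | 33.5 |
| Volleyball | 20.9 | 65.7 | 4780.1 |
| Softball | 9.7 | 30.5 | 477.9 |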

\({s_e}\) Notation Using Data Set 1 “Body Data” in Appendix B, if we let the predictor variable x represent heights of males and let the response variable y represent weights of males, the sample of 153 heights and weights results in \({s_e}\) = 16.27555 cm. In your own words, describe what that value of \({s_e}\) represents.

Exercises 13–28 use the same data sets as Exercises 13–28 in Section 10-1. In each case, find the regression equation, letting the first variable be the predictor (x) variable. Find the indicated predicted value by following the prediction procedure summarized in Figure 10-5 on page 493.

Use the pizza costs and subway fares to find the best predicted subway fare, given that the cost of a slice of pizza is $3.00. Is the best predicted subway fare likely to be implemented?

Ages of Moviegoers Based on the data from Cumulative Review Exercise 7, assume that ages of moviegoers are normally distributed with a mean of 35 years and a standard deviation of 20 years.

a. What is the percentage of moviegoers who are younger than 30 years of age?

b. Find \({P_{25}}\), which is the 25th percentile.

c. Find the probability that a simple random sample of 25 moviegoers has a mean age that is less than 30 years.

d. Find the probability that for a simple random sample of 25 moviegoers, each of the moviegoers is younger than 30 years of age. For a particular movie and showtime, why might it not be unusual to have 25 moviegoers all under the age of 30?
