Outlier. Refer to the accompanying Minitab-generated scatterplot.

a. Examine the pattern of all 10 points and subjectively determine whether there appears to be a correlation between x and y.

b. After identifying the 10 pairs of coordinates corresponding to the 10 points, find the value of the correlation coefficient r and determine whether there is a linear correlation.

c. Now remove the point with coordinates (10, 10) and repeat parts (a) and (b).

d. What do you conclude about the possible effect from a single pair of values?

Short Answer

a. The points show an overall upward trend. Hence, a correlation between x and y can be expected.

b. The values are recorded as shown below:

| x | y |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 3 | 2 |
| 3 | 3 |
| 10 | 10 |

The correlation coefficient is 0.906.

Also, there is sufficient evidence to support the existence of a linear correlation between the two variables.

c. The correlation coefficient is 0, and there is insufficient evidence to support the claim that there is a linear correlation between the two variables.

d. A single pair of values has a substantial effect on the correlation measure.

Step by step solution

01

Given information

The scatterplot generated on Minitab is given.

02

Analyze the scatterplot

a.

A scatterplot is a two-dimensional graph that represents pairs of values for two variables.

Here, the scatterplot shows an overall upward pattern, which means the values of one variable tend to increase with the values of the other.

Because of this pattern, and because the points lie reasonably close together, a linear correlation between the two variables can be expected.

03

Compute the measure of the correlation coefficient

b.

The observations obtained from the scatterplot are as follows:

| x | y |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 3 | 2 |
| 3 | 3 |
| 10 | 10 |

The formula for the correlation coefficient is shown below:

\(r = \frac{{n\sum {xy} - \left( {\sum x } \right)\left( {\sum y } \right)}}{{\sqrt {n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}} \sqrt {n\left( {\sum {{y^2}} } \right) - {{\left( {\sum y } \right)}^2}} }}\)

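The values are in the table below:

| x | y | \(x^2\) | \(y^2\) | \(xy\) |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 4 | 1 | 2 |
| 3 | 1 | 9 | 1 | 3 |
| 1 | 2 | 1 | 4 | 2 |
| 1 | 3 | 1 | 9 | 3 |
| 2 | 2 | 4 | 4 | 4 |
| 2 | 3 | 4 | 9 | 6 |
| 3 | 2 | 9 | 4 | 6 |
| 3 | 3 | 9 | 9 | 9 |
| 10 | 10 | 100 | 100 | 100 |
| \(\sum x = 28\) | \(\sum y = 28\) | \(\sum x^2 = 142\) | \(\sum y^2 = 142\) | \(\sum xy = 136\) |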

Substitute the values to obtain the correlation coefficient.

\(\begin{aligned} r &= \frac{10\left( 136 \right) - \left( 28 \right)\left( 28 \right)}{\sqrt{10\left( 142 \right) - \left( 28 \right)^2}\,\sqrt{10\left( 142 \right) - \left( 28 \right)^2}}\\ &= \frac{576}{636}\\ &= 0.906\end{aligned}\)

Thus, the correlation coefficient is 0.906.
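The arithmetic above can be checked with a short script. This is a minimal sketch in Python using only the standard library; the variable names (`points`, `sum_x`, and so on) are chosen here for illustration and are not part of the original solution.

```python
import math

# The ten (x, y) pairs read from the scatterplot.
points = [(1, 1), (2, 1), (3, 1), (1, 2), (1, 3),
          (2, 2), (2, 3), (3, 2), (3, 3), (10, 10)]

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_x2 = sum(x * x for x, _ in points)
sum_y2 = sum(y * y for _, y in points)
sum_xy = sum(x * y for x, y in points)

# r = [n*Sxy - Sx*Sy] / [sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2)]
numerator = n * sum_xy - sum_x * sum_y
denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
               * math.sqrt(n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 28 28 142 142 136
print(round(r, 3))                           # 0.906
```

Note that each square root contains the sum of squares itself, not its square; that is, \(10(142) - 28^2 = 636\) under both radicals.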

04

Conduct a hypothesis test for correlation

Let \(\rho\) be the true correlation coefficient.

Form the hypotheses as shown:

\(\begin{array}{l}H_0:\rho = 0\\H_a:\rho \ne 0\end{array}\)

The sample size is \(n = 10\).

The test statistic is computed as follows:

\(\begin{aligned} t &= \frac{r}{{\sqrt {\frac{{1 - {r^2}}}{{n - 2}}} }}\\ &= \frac{{0.906}}{{\sqrt {\frac{{1 - {{0.906}^2}}}{{10 - 2}}} }}\\ &= 6.054\end{aligned}\)

Thus, the test statistic is 6.054.

The degrees of freedom are computed below:

\(\begin{aligned} df &= n - 2\\ &= 10 - 2\\ &= 8\end{aligned}\)

The p-value is computed from the t-distribution table.

\(\begin{aligned} p{\rm{ - value}} &= 2P\left( {T > t} \right)\\ &= 2P\left( {T > 6.054} \right)\\ &= 2\left( {1 - P\left( {T < 6.054} \right)} \right)\\ &= 0.0003\end{aligned}\)

As the p-value is less than the 0.05 significance level, the null hypothesis is rejected.

Therefore, there is sufficient evidence to support the existence of a linear correlation between the two variables.
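The test statistic and p-value can also be reproduced numerically instead of being read from a table. The following is a sketch under stated assumptions: it uses only the Python standard library and approximates the area under the t density with Simpson's rule. The helper names `t_pdf` and `two_sided_p_value` are hypothetical and introduced here only for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(r, n, steps=2000):
    """Return (t, p) for H0: rho = 0, given sample correlation r and size n.

    Assumes |r| < 1 and an even number of Simpson steps.
    """
    df = n - 2
    t = r / math.sqrt((1 - r * r) / df)
    # Simpson's rule for the area under the density between 0 and |t|.
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-|t| < T < |t|), by symmetry
    return t, 1 - central_area

t, p = two_sided_p_value(0.906, 10)
print(round(t, 3), round(p, 4))  # 6.054 0.0003
```

The same helper applies to any other sample; for example, `two_sided_p_value(0.0, 9)` gives a test statistic of 0 and a p-value of 1, since no area lies between 0 and 0.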

05

Analyze the scatterplot after removing the point (10, 10)

c.

The data without the point (10, 10) are as follows:

| x | y |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 3 | 2 |
| 3 | 3 |

In the resulting scatterplot, the nine remaining points form a 3 × 3 grid with no upward or downward trend.

Thus, there does not appear to be a linear correlation between the two variables.

06

Compute the correlation coefficient

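The values are in the table below:

| x | y | \(x^2\) | \(y^2\) | \(xy\) |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 4 | 1 | 2 |
| 3 | 1 | 9 | 1 | 3 |
| 1 | 2 | 1 | 4 | 2 |
| 1 | 3 | 1 | 9 | 3 |
| 2 | 2 | 4 | 4 | 4 |
| 2 | 3 | 4 | 9 | 6 |
| 3 | 2 | 9 | 4 | 6 |
| 3 | 3 | 9 | 9 | 9 |
| \(\sum x = 18\) | \(\sum y = 18\) | \(\sum x^2 = 42\) | \(\sum y^2 = 42\) | \(\sum xy = 36\) |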

Substitute the values to obtain the correlation coefficient:

\(\begin{aligned} r &= \frac{9\left( 36 \right) - \left( 18 \right)\left( 18 \right)}{\sqrt{9\left( 42 \right) - \left( 18 \right)^2}\,\sqrt{9\left( 42 \right) - \left( 18 \right)^2}}\\ &= \frac{0}{54}\\ &= 0\end{aligned}\)

Thus, the correlation coefficient is 0.

07

Conduct a hypothesis test for correlation

Let \(\rho\) denote the true correlation coefficient.

The hypotheses are formulated as shown below:

\(\begin{array}{l}H_0:\rho = 0\\H_a:\rho \ne 0\end{array}\)

The sample size is \(n = 9\).

The test statistic is computed as follows:

\(\begin{aligned} t &= \frac{r}{{\sqrt {\frac{{1 - {r^2}}}{{n - 2}}} }}\\ &= \frac{0}{{\sqrt {\frac{{1 - {0^2}}}{{9 - 2}}} }}\\ &= 0\end{aligned}\)

Thus, the test statistic is 0.

The degrees of freedom are computed below:

\(\begin{aligned} df &= n - 2\\ &= 9 - 2\\ &= 7\end{aligned}\)

The p-value is computed from the t-distribution table.

\(\begin{aligned} p{\rm{ - value}} &= 2P\left( {T > 0} \right)\\ &= 2\left( {1 - P\left( {T < 0} \right)} \right)\\ &= 2\left( {1 - 0.5} \right)\\ &= 1\end{aligned}\)

As the p-value is greater than the 0.05 significance level, we fail to reject the null hypothesis.

Therefore, there is not sufficient evidence to support the claim of a linear correlation between the two variables.

08

Discuss the effect of a single pair of values

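d. Removing a single pair of values changes the result dramatically: the correlation coefficient drops from 0.906 to 0 when the point (10, 10) is removed, and the conclusion about a linear correlation reverses. A single pair of values can therefore have a substantial effect on the correlation coefficient.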
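To see how unusual the point (10, 10) is, one can recompute r after leaving out each of the ten points in turn. This is an illustrative sketch, not part of the original solution; the function name `pearson_r` and the list `leave_one_out` are assumptions introduced here.

```python
import math

def pearson_r(points):
    """Linear correlation coefficient computed from the sums formula."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_y2 = sum(y * y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

points = [(1, 1), (2, 1), (3, 1), (1, 2), (1, 3),
          (2, 2), (2, 3), (3, 2), (3, 3), (10, 10)]

full_r = pearson_r(points)

# r recomputed with the i-th point removed, in the same order as `points`.
leave_one_out = [pearson_r(points[:i] + points[i + 1:])
                 for i in range(len(points))]

print("all 10 points:", round(full_r, 3))
for removed, r in zip(points, leave_one_out):
    print("without", removed, "->", round(r, 3))
```

Removing any of the nine clustered points leaves r at roughly 0.9 or higher, whereas removing (10, 10) gives exactly 0, which is why that one pair is called an outlier with a strong influence on r.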
