Chapter 10: Q9BSC (page 468)

Explore! Exercises 9 and 10 provide two data sets from “Graphs in Statistical Analysis,” by F. J. Anscombe, The American Statistician, Vol. 27. For each exercise,
a. Construct a scatterplot.
b. Find the value of the linear correlation coefficient r, then determine whether there is sufficient evidence to support the claim of a linear correlation between the two variables.
c. Identify the feature of the data that would be missed if part (b) was completed without constructing the scatterplot.
x	10	8	13	9	11	14	6	4	12	7	5
y	9.14	8.14	8.74	8.77	9.26	8.10	6.13	3.10	9.13	7.26	4.74

Short Answer

a. The scatterplot is obtained by plotting the 11 paired values, with x on the horizontal axis and y on the vertical axis.

b. The correlation coefficient is 0.8162. There is enough evidence to support the claim that there is a linear correlation between the two variables.

c. The scatterplot shows that the data follow a nonlinear (curved) pattern, a feature that would be missed if only the correlation coefficient in part (b) were examined.

Step by step solution

Given information

The paired data for two variables are recorded.

x	10	8	13	9	11	14	6	4	12	7	5
y	9.14	8.14	8.74	8.77	9.26	8.1	6.13	3.1	9.13	7.26	4.74

Sketch a scatterplot

A scatterplot is a graph that represents observations for a paired set of data.

Steps to sketch a scatterplot:

Define the x- and y-axes for the two variables. The horizontal axis is the x-axis, and the vertical axis is the y-axis.
Plot each paired value as a point relative to the axes.
Thus, a scatter plot for the paired data is obtained.
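
The steps above can be sketched in code. The following is a minimal text-based plot using only the Python standard library; the function name `text_scatter` and the choice of 10 rows are illustrative assumptions, and the layout relies on the x-values being distinct integers, as they are in this data set.

```python
# Paired data from the exercise.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def text_scatter(xs, ys, rows=10):
    """Return text lines: one column per integer x, `rows` levels for y."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (x_max - x_min + 1) for _ in range(rows)]
    for xi, yi in zip(xs, ys):
        # Scale y to a level between 0 (lowest) and rows - 1 (highest).
        level = round((yi - y_min) / (y_max - y_min) * (rows - 1))
        # Row 0 is the top of the plot, so flip the level.
        grid[rows - 1 - level][xi - x_min] = "*"
    return ["".join(row) for row in grid]


lines = text_scatter(x, y)
for line in lines:
    print("|" + line)           # vertical (y) axis
print("+" + "-" * len(lines[0]))  # horizontal (x) axis, x = 4 on the left
```

The printed points rise, level off, and then fall as x increases, which is the curved shape that the scatterplot is meant to reveal.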

Compute the correlation coefficient

The correlation coefficient is computed below:

\(r = \frac{{n\sum {xy} - \left( {\sum x } \right)\left( {\sum y } \right)}}{{\sqrt {n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}} \sqrt {n\left( {\sum {{y^2}} } \right) - {{\left( {\sum y } \right)}^2}} }}\)

The values are listed in the table below:

x	y	\({x^2}\)	\({y^2}\)	\(xy\)
10	9.14	100	83.5396	91.4
8	8.14	64	66.2596	65.12
13	8.74	169	76.3876	113.62
9	8.77	81	76.9129	78.93
11	9.26	121	85.7476	101.86
14	8.1	196	65.61	113.4
6	6.13	36	37.5769	36.78
4	3.1	16	9.61	12.4
12	9.13	144	83.3569	109.56
7	7.26	49	52.7076	50.82
5	4.74	25	22.4676	23.7
\(\sum x = 99\)	\(\sum y = 82.51\)	\(\sum {x^2} = 1001\)	\(\sum {y^2} = 660.1763\)	\(\sum {xy} = 797.59\)

Substitute the values in the formula:

\(\begin{aligned} r &= \frac{{11\left( {797.59} \right) - \left( {99} \right)\left( {82.51} \right)}}{{\sqrt {11\left( {1001} \right) - {{\left( {99} \right)}^2}} \sqrt {11\left( {660.1763} \right) - {{\left( {82.51} \right)}^2}} }}\\ &= \frac{{605}}{{\sqrt {1210} \sqrt {454.0392} }}\\ &= 0.8162\end{aligned}\)

Thus, the correlation coefficient is 0.8162.
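
As a check on the arithmetic, the same formula can be evaluated directly from the data. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
import math

# Paired data from the exercise.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(round(r, 4))  # agrees with the hand computation, about 0.8162
```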

Conduct a hypothesis test for correlation

Let \(\rho \) be the true correlation coefficient for the paired variables.

For testing the claim, form the hypotheses as shown below:

\(\begin{array}{l}{H_0}:\rho = 0\\{H_a}:\rho \ne 0\end{array}\)

The sample size is n = 11.

The test statistic is computed as follows:

\(\begin{aligned} t &= \frac{r}{{\sqrt {\frac{{1 - {r^2}}}{{n - 2}}} }}\\ &= \frac{{0.8162}}{{\sqrt {\frac{{1 - {{0.8162}^2}}}{{11 - 2}}} }}\\ &= 4.238\end{aligned}\)

Thus, the test statistic is 4.238.

The degrees of freedom are computed below:

\(\begin{aligned} df &= n - 2\\ &= 11 - 2\\ &= 9\end{aligned}\)

The two-tailed p-value is obtained from the t-distribution with 9 degrees of freedom.

\(\begin{aligned} p{\rm{ - value}} &= 2P\left( {T > t} \right)\\ &= 2P\left( {T > 4.238} \right)\\ &= 2\left( {1 - P\left( {T < 4.238} \right)} \right)\\ &= 0.002\end{aligned}\)

Thus, the p-value is 0.002.
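
The test statistic and p-value can also be checked numerically. The sketch below avoids a t-table by integrating the Student t density with Simpson's rule; that numerical approach, and the helper names `t_pdf` and `two_tailed_p`, are assumptions made for illustration rather than the method used in the solution above.

```python
import math

r = 0.8162  # correlation coefficient from the earlier step
n = 11      # sample size

# Test statistic t = r / sqrt((1 - r^2) / (n - 2)) with n - 2 degrees of freedom.
t = r / math.sqrt((1 - r ** 2) / (n - 2))
df = n - 2


def t_pdf(u, nu):
    """Density of the Student t distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + u * u / nu) ** (-(nu + 1) / 2)


def two_tailed_p(t_value, nu, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t_value)
    h = b / steps
    total = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - area)       # both tails beyond |t|


p_value = two_tailed_p(t, df)
print(round(t, 3), df, round(p_value, 3))
```

The printed values should match the hand computation to the precision shown there: a test statistic near 4.238, 9 degrees of freedom, and a p-value near 0.002.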

Since the p-value is less than the 0.05 significance level, the null hypothesis is rejected.

Therefore, there is sufficient evidence to conclude that variables x and y have a linear correlation between them.
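
For part (c), the curvature that the correlation test does not reveal can also be seen numerically. One possible check, offered here as an illustrative sketch and not as part of the original solution, uses second differences: when the x-values are equally spaced (here the integers 4 through 14), nearly constant second differences in y indicate a quadratic pattern instead of a straight line, for which they would be close to zero.

```python
# Paired data from the exercise.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# Order the y-values by increasing x (x runs over the integers 4..14).
ys = [yi for _, yi in sorted(zip(x, y))]

# First differences: change in y for each unit step in x.
first = [b - a for a, b in zip(ys, ys[1:])]

# Second differences: change in the first differences.
second = [round(b - a, 2) for a, b in zip(first, first[1:])]

print(second)  # every entry is close to -0.25
```

The first differences shrink steadily and turn negative, and the second differences stay near -0.25, which is consistent with a downward-opening parabola.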