Chapter 12: Q. 54 (page 818)

A scatterplot of $y$ versus $x$ shows a positive, nonlinear association. Two different transformations are attempted to try to linearize the association: using the logarithm of the $y$-values and using the square root of the $y$-values. Two least-squares regression lines are calculated, one that uses $x$ to predict $\log(y)$ and the other that uses $x$ to predict $\sqrt{y}$. Which of the following would be the best reason to prefer the least-squares regression line that uses $x$ to predict $\log(y)$?
a. The value of $r^{2}$ is smaller.
b. The standard deviation of the residuals is smaller.
c. The slope is greater.
d. The residual plot has more random scatter.
e. The distribution of residuals is more Normal.

Short Answer

The correct option is (d).

Step by step solution

Given information

Two least-squares regression lines are calculated: one that uses $x$ to predict $\log(y)$ and the other that uses $x$ to predict $\sqrt{y}$.

Explanation

a. A smaller value of $r^{2}$ means that $x$ explains less of the variation in $\log(y)$ than it does in $\sqrt{y}$. That counts against the model, so it is not a reason to prefer the line that predicts $\log(y)$ from $x$.

b. A smaller standard deviation of the residuals usually means the predictions fall closer to the observed values. Here, however, the residuals of the two models are measured in different units ($\log(y)$ versus $\sqrt{y}$), so the two standard deviations cannot be compared directly. This is therefore not the best reason to prefer the line that predicts $\log(y)$ from $x$.

c. The size of the slope has no bearing on how well a model fits, so this is not a reason to prefer the line that predicts $\log(y)$ from $x$.

d. The purpose of each transformation is to linearize the association. A residual plot with more random scatter, meaning no leftover curved pattern, shows that the relationship between $x$ and $\log(y)$ is closer to linear, so a line is a more appropriate model for the transformed data. This is the best reason to prefer the line that predicts $\log(y)$ from $x$.

e. Normality of the residuals matters when doing inference about the regression line, but it does not show that the transformed association is linear. As a result, this is not the best reason to prefer the line that predicts $\log(y)$ from $x$.

So the correct option is (d).
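To make the comparison of the two transformed fits concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the data are hypothetical (roughly exponential growth with a small fixed wobble), and the helper names `fit_line`, `residuals`, `residual_sd`, and `curvature` are not from the textbook. For each transformation it computes the residuals, their standard deviation (in the units of the transformed response), and a crude stand-in for reading a residual plot: the correlation between the residuals and $(x - \bar{x})^2$, which is near 0 when no curved pattern is left over.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residuals(xs, ys):
    """Observed minus predicted values for the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def residual_sd(res):
    """Standard deviation of the residuals, using n - 2 degrees of freedom."""
    return math.sqrt(sum(e * e for e in res) / (len(res) - 2))

def correlation(us, vs):
    """Pearson correlation between two equal-length lists."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    suu = sum((u - u_bar) ** 2 for u in us)
    svv = sum((v - v_bar) ** 2 for v in vs)
    return suv / math.sqrt(suu * svv)

def curvature(xs, res):
    """Correlation of residuals with (x - x_bar)^2; near 0 means no leftover curve."""
    x_bar = sum(xs) / len(xs)
    return correlation([(x - x_bar) ** 2 for x in xs], res)

# Hypothetical data: y grows roughly exponentially in x (assumed for illustration).
xs = list(range(1, 11))
wobble = [1.05, 0.96, 1.02, 0.97, 1.04, 0.95, 1.03, 0.98, 1.01, 0.99]
ys = [2 * 1.5 ** x * w for x, w in zip(xs, wobble)]

# Fit x -> log(y) and x -> sqrt(y), then inspect each set of residuals.
res_log = residuals(xs, [math.log10(y) for y in ys])
res_sqrt = residuals(xs, [math.sqrt(y) for y in ys])

curv_log = curvature(xs, res_log)
curv_sqrt = curvature(xs, res_sqrt)

print(f"log(y):  residual sd = {residual_sd(res_log):.4f}, curvature = {curv_log:.2f}")
print(f"sqrt(y): residual sd = {residual_sd(res_sqrt):.4f}, curvature = {curv_sqrt:.2f}")
```

With these made-up data the square-root fit leaves a strong curved pattern in its residuals while the logarithm fit does not. The two residual standard deviations are printed for completeness, but they are on different scales, one in units of $\log(y)$ and the other in units of $\sqrt{y}$.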