Chapter 12: Q. T12.12 (page 825)

T12.12 Foresters are interested in predicting the amount of usable lumber they can harvest from various tree species. They collect data on the diameter at breast height (DBH) in inches and the yield in board feet of a random sample of 20 Ponderosa pine trees that have been harvested. (Note that a board foot is defined as a piece of lumber 12 inches by 12 inches by 1 inch.) Here is a scatterplot of the data.
a. Here is some computer output and a residual plot from a least-squares regression on these data. Explain why a linear model may not be appropriate in this case.
The foresters are considering two possible transformations of the original data: (1) cubing the diameter values or (2) taking the natural logarithm of the yield measurements. After transforming the data, a least-squares regression analysis is performed. Here is some computer output and a residual plot for each of the two possible regression models:
b. Use both models to predict the amount of usable lumber from a Ponderosa pine with diameter 30 inches.
c. Which of the predictions in part (b) seems more reliable? Give appropriate evidence to support your choice.

Short Answer

(a) The residual plot shows substantial curvature, so a linear model is not appropriate: the relationship between the variables is curved.

(b) The predicted yield for option $1$ is $117.0899$ board feet and the predicted yield for option $2$ is $102.967$ board feet.

(c) The prediction from option $1$ seems more reliable, because its residual plot shows no strong curvature while the residual plot for option $2$ does.

Step by step solution

Part (a) Step 1: Given information

We must explain why a linear model may not be appropriate in this case.

Part (a) Step 2: Explanation

Foresters want to know how much usable lumber they can harvest from different tree species.
They measured the diameter at breast height (in inches) and the yield (in board feet) of a random sample of 20 Ponderosa pine trees.
The question provides computer output and a residual plot from a least-squares regression on these data.
The residual plot shows substantial curvature, so a linear model is not appropriate: the relationship between the variables is curved.
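
The curvature argument can be illustrated with a small sketch. The diameters and yields below are hypothetical values that follow an exact cubic curve; they are not the foresters' data, and the helper `fit_line` is an illustrative name. Fitting a straight line to such data leaves residuals that are positive at both ends and negative in the middle, which is the kind of curved pattern that signals a linear model is a poor fit.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical diameters in inches (made up for illustration only).
xs = list(range(10, 42, 2))
# Hypothetical yields on an exact cubic curve (an assumption, not real data).
ys = [0.004 * x ** 3 for x in xs]

intercept, slope = fit_line(xs, ys)

# Residual = observed value minus value predicted by the straight line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Print the sign of each residual in order of increasing diameter.
print("".join("+" if r > 0 else "-" for r in residuals))
```

A residual plot with no leftover pattern would instead show signs scattered with no systematic run of positives or negatives.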

Part (b) Step 1: Given information

We must use both models to predict the amount of usable lumber from a Ponderosa pine with diameter $30$ inches.

Part (b) Step 2: Explanation

Foresters want to know how much usable lumber they can harvest from different tree species. They measured the diameter at breast height (in inches) and the yield (in board feet) of a random sample of Ponderosa pine trees. The question provides computer output and a residual plot from a least-squares regression. The foresters are considering two possible transformations of the original data: cubing the diameter values or taking the natural logarithm of the yield measurements.
For option $1$, the explanatory variable is the cubed diameter, so with $x$ denoting the diameter the least-squares regression equation has the form:
$\hat{y} = b_{0} + b_{1} x^{3}$
The value of $b_{0}$ is given in the computer output's row "Constant" and column "Coef":
$b_{0} = 2.078$
The value of $b_{1}$ is given in the computer output's row "DBH3" and column "Coef":
$b_{1} = 0.0042597$

Replace $b_{0}$ with $2.078$ and $b_{1}$ with $0.0042597$:

$\hat{y} = 2.078 + 0.0042597 x^{3}$

Substitute $x = 30$:

$\hat{y} = 2.078 + 0.0042597 (30)^{3}$

$= 2.078 + 115.0119$

$= 117.0899$

As a result, the predicted yield is $117.0899$ board feet.
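
The arithmetic for option $1$ can be checked with a short sketch. The coefficients are the ones quoted from the computer output; the function name `predict_yield_cubed` is an illustrative choice, not part of the original problem.

```python
# Coefficients quoted from the computer output for the cubed-diameter model.
B0 = 2.078        # row "Constant", column "Coef"
B1 = 0.0042597    # row "DBH3", column "Coef"

def predict_yield_cubed(dbh_inches):
    """Predicted yield in board feet from yield-hat = B0 + B1 * DBH^3."""
    return B0 + B1 * dbh_inches ** 3

prediction = predict_yield_cubed(30)
print(f"{prediction:.4f}")  # prints 117.0899
```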

Part (b) Step 3: Explanation

For option $2$, the response variable is the natural logarithm of the yield, so with $x$ denoting the diameter the least-squares regression equation has the form:
$\ln \hat{y} = b_{0} + b_{1} x$
The value of $b_{0}$ is given in the computer output's row "Constant" and column "Coef":
$b_{0} = 1.2319$

The value of $b_{1}$ is given in the computer output's row "DBH" and column "Coef":
$b_{1} = 0.113417$
Replace $b_{0}$ with $1.2319$ and $b_{1}$ with $0.113417$:
$\ln \hat{y} = 1.2319 + 0.113417 x$
Substitute $x = 30$:
$\ln \hat{y} = 1.2319 + 0.113417 (30)$
$= 4.63441$
Exponentiate both sides to undo the logarithm:
$\hat{y} = e^{4.63441}$
$\approx 102.967$

As a result, the predicted yield is $102.967$ board feet.
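
The back-transformation for option $2$ can be sketched the same way. The coefficients are the ones quoted from the computer output; the function name `predict_yield_log` is illustrative. The model predicts the natural logarithm of the yield, so the result must be exponentiated to get board feet.

```python
import math

# Coefficients quoted from the computer output for the log-yield model.
A0 = 1.2319       # row "Constant", column "Coef"
A1 = 0.113417     # row "DBH", column "Coef"

def predict_yield_log(dbh_inches):
    """Predicted yield in board feet from ln(yield-hat) = A0 + A1 * DBH."""
    ln_yield = A0 + A1 * dbh_inches   # prediction on the log scale
    return math.exp(ln_yield)         # undo the natural logarithm

prediction_log = predict_yield_log(30)
print(f"{prediction_log:.3f}")
```

Forgetting the final exponentiation would report about $4.63$, which is the predicted log-yield rather than a yield in board feet.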

Part (c) Step 1: Given information

We must decide which of the predictions in part (b) seems more reliable and support the choice with appropriate evidence.

Part (c) Step 2: Explanation

Foresters want to know how much usable lumber they can harvest from different tree species. They measured the diameter at breast height (in inches) and the yield (in board feet) of a random sample of Ponderosa pine trees.
The question provides computer output and a residual plot from a least-squares regression. The foresters are considering two possible transformations of the original data: cubing the diameter values or taking the natural logarithm of the yield measurements.
The residual plot for option $1$ shows no strong curvature, whereas the residual plot for option $2$ shows strong curvature.
So the model in option $1$ is appropriate for making predictions, but the model in option $2$ is not.
Therefore, the prediction from option $1$ is expected to be more reliable.