Chapter 12: Q114E (page 781)

Question: Accuracy of software effort estimates. Periodically, software engineers must provide estimates of their effort in developing new software. In the Journal of Empirical Software Engineering (Vol. 9, 2004), multiple regression was used to predict the accuracy of these effort estimates. The dependent variable, defined as the relative error in estimating effort, y = (Actual effort − Estimated effort)/(Actual effort), was determined for each task in a sample of n = 49 software development tasks. Eight independent variables were evaluated as potential predictors of relative error using stepwise regression. Each of these was formulated as a dummy variable, as shown in the table.
Company role of estimator: x₁ = 1 if developer, 0 if project leader
Task complexity: x₂ = 1 if low, 0 if medium/high
Contract type: x₃ = 1 if fixed price, 0 if hourly rate
Customer importance: x₄ = 1 if high, 0 if low/medium
Customer priority: x₅ = 1 if time of delivery, 0 if cost or quality
Level of knowledge: x₆ = 1 if high, 0 if low/medium
Participation: x₇ = 1 if estimator participates in work, 0 if not
Previous accuracy: x₈ = 1 if more than 20% accurate, 0 if less than 20% accurate
a. In step 1 of the stepwise regression, how many different one-variable models are fit to the data?
b. In step 1, the variable x₁ is selected as the best one-variable predictor. How is this determined?
c. In step 2 of the stepwise regression, how many different two-variable models (where x₁ is one of the variables) are fit to the data?
d. The only two variables selected for entry into the stepwise regression model were x₁ and x₈. The stepwise regression yielded the following prediction equation:
ŷ = β̂₀ − 0.28x₁ + 0.27x₈
Give a practical interpretation of the β estimates multiplied by x₁ and x₈.
e. Why should a researcher be wary of using the model in part d as the final model for predicting effort (y)?

Short Answer

Answer

a. Since there are 8 candidate independent variables, 8 one-variable models are fit to the data in step 1.

b. x₁ is selected because, among the 8 one-variable models, its t-statistic for testing H₀: β₁ = 0 has the largest absolute value (equivalently, the smallest p-value), and it is significant at the α level chosen for entry.

c. Seven two-variable models are fitted: x₁ paired with each of the remaining 7 variables.

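d. The estimates are β̂₁ = −0.28 and β̂₈ = 0.27. Holding x₈ fixed, the mean relative error for developers is estimated to be 0.28 less than for project leaders. Holding x₁ fixed, the mean relative error for estimators whose previous accuracy was more than 20% is estimated to be 0.27 greater than for those whose previous accuracy was less than 20%.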

e. A researcher should be wary for two reasons. First, stepwise regression conducts an extremely large number of t-tests, so the probability of making one or more Type I or Type II errors is high. Second, the stepwise model includes no higher-order or interaction terms, so important terms may be missing from the final model.

Step by step solution

1-variable models

In step 1, each candidate independent variable is fit alone in a model of the form E(y) = β₀ + βᵢxᵢ. Since there are 8 candidate variables (x₁ through x₈), 8 one-variable models are fit to the data.

Best predictor variable

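For each of the 8 one-variable models, the t-statistic for testing H₀: βᵢ = 0 is computed. The variable whose t-statistic has the largest absolute value (equivalently, the smallest p-value) is selected as the best one-variable predictor, provided it is significant at the α level chosen for entry. Here that variable is x₁.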
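
The step 1 selection rule can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the study's computation: the data are simulated (hypothetical), the helper names `t_statistic_single` and `stepwise_step1` are made up for this example, and the α-to-enter significance check that a full stepwise procedure applies is omitted.

```python
import numpy as np


def t_statistic_single(x, y):
    """Fit y = b0 + b1*x by least squares and return the t-statistic for b1."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])  # design matrix: intercept column plus x
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)  # MSE on n - (k + 1) = n - 2 degrees of freedom
    cov = s2 * np.linalg.inv(X.T @ X)  # estimated covariance matrix of (b0, b1)
    return beta[1] / np.sqrt(cov[1, 1])


def stepwise_step1(X, y):
    """Fit one single-variable model per column of X; return (index of max |t|, all t's)."""
    t_values = [t_statistic_single(X[:, j], y) for j in range(X.shape[1])]
    best = int(np.argmax(np.abs(t_values)))
    return best, t_values


# HYPOTHETICAL simulated data (not the study's data): n = 49 tasks, 8 dummy predictors.
rng = np.random.default_rng(0)
n, k = 49, 8
X = rng.integers(0, 2, size=(n, k)).astype(float)
y = 0.1 - 0.3 * X[:, 0] + rng.normal(0, 0.2, size=n)  # only x1 truly affects y here

best, t_values = stepwise_step1(X, y)
print(len(t_values))  # 8: one fitted model per candidate variable
print(f"x{best + 1}", round(float(t_values[best]), 2))
```

Because the simulated y was generated to depend only on the first column, the sketch should pick x1; with real data, the chosen variable is simply whichever one has the largest |t|.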

2-variable model

In step 2, x₁ is kept in the model and each of the remaining k − 1 = 8 − 1 = 7 variables (x₂ through x₈) is added to it, one at a time, in a model of the form E(y) = β₀ + β₁x₁ + β₂xᵢ.

So, 7 two-variable models are fitted.
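
The counts in parts a and c follow one pattern: at entry step j, j − 1 variables are already in the model, so k − (j − 1) candidate models are fit. A minimal Python sketch follows; the function name `models_fit_at_step` is hypothetical, and the sketch ignores the removal checks that stepwise regression makes after each entry (those re-test variables already in the current model rather than fitting new candidate models).

```python
def models_fit_at_step(k, step):
    """Number of candidate models fit at a given entry step of stepwise regression.

    At step j, j - 1 variables are already in the model, so each of the remaining
    k - (j - 1) candidates is tried once alongside them.
    """
    if not 1 <= step <= k:
        raise ValueError("step must be between 1 and k")
    return k - (step - 1)


k = 8  # candidate dummy variables x1, ..., x8
print(models_fit_at_step(k, 1), models_fit_at_step(k, 2))  # 8 7

# Explicit list of the step-2 models, with x1 already selected in step 1
selected = ["x1"]
candidates = [f"x{j}" for j in range(1, k + 1) if f"x{j}" not in selected]
step2_models = [selected + [c] for c in candidates]
print(step2_models[0], step2_models[-1], len(step2_models))
```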

Interpretation of β estimates

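Because x₁ and x₈ are 0-1 dummy variables, each β estimate is the estimated difference in mean relative error between the two levels of that variable, with the other variable held fixed.

β̂₁ = −0.28: for estimators with the same level of previous accuracy, the mean relative error for developers (x₁ = 1) is estimated to be 0.28 less than for project leaders (x₁ = 0). Since y = (Actual effort − Estimated effort)/(Actual effort), a smaller y means developers' estimates tend to fall short of actual effort by less.

β̂₈ = 0.27: for estimators in the same company role, the mean relative error for those whose previous accuracy was more than 20% (x₈ = 1) is estimated to be 0.27 greater than for those whose previous accuracy was less than 20% (x₈ = 0).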
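
Because the model has no interaction term, the estimated effect of each dummy is the same at either level of the other dummy. The Python sketch below checks this by differencing predictions from the part d equation. The intercept estimate is not reproduced in this solution, so `b0` is a hypothetical placeholder; the differences do not depend on it.

```python
B1, B8 = -0.28, 0.27  # coefficient estimates from part d


def predicted_relative_error(x1, x8, b0):
    """Prediction equation y-hat = b0 + B1*x1 + B8*x8 (b0 = intercept estimate)."""
    return b0 + B1 * x1 + B8 * x8


b0 = 0.0  # HYPOTHETICAL placeholder intercept; the differences below do not depend on it

# Effect of x1 (developer vs. project leader) at each fixed level of x8
for x8 in (0, 1):
    diff = predicted_relative_error(1, x8, b0) - predicted_relative_error(0, x8, b0)
    print(f"x8 = {x8}: developer minus project leader = {diff:+.2f}")

# Effect of x8 (>20% vs. <20% previous accuracy) at each fixed level of x1
for x1 in (0, 1):
    diff = predicted_relative_error(x1, 1, b0) - predicted_relative_error(x1, 0, b0)
    print(f"x1 = {x1}: >20% minus <20% previous accuracy = {diff:+.2f}")
```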

Precautions while using stepwise model

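First, an extremely large number of t-tests are conducted during the stepwise search (eight in step 1, seven in step 2, and more in any later steps), so the probability of making one or more Type I or Type II errors is high, and variables may have been included or excluded by chance.

Second, the stepwise procedure considers only first-order (main-effect) terms, so the model contains no interaction terms such as x₁x₈. Because every predictor is a 0-1 dummy variable (so x² = x), interactions are the only higher-order terms that could add information. The model should therefore be treated as a starting point for further model building rather than as the final model.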
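
The multiple-testing concern can be made concrete with a rough calculation. If every null hypothesis tested were true, each t-test were run at level α, and the tests were independent, the chance of at least one Type I error among m tests would be 1 − (1 − α)^m. The stepwise tests are not independent, and α = 0.05 is an assumed value, so this Python sketch only indicates the order of magnitude.

```python
def prob_at_least_one_type_i(alpha, m):
    """P(at least one Type I error) in m tests, each at level alpha, ASSUMING independence
    and that every null hypothesis tested is true."""
    return 1 - (1 - alpha) ** m


alpha = 0.05  # ASSUMED per-test level; the exercise does not state the level used
m = 8 + 7     # entry t-tests in steps 1 and 2 alone; any later steps add more
p = prob_at_least_one_type_i(alpha, m)
print(round(p, 3))
```

Under these assumptions the result is roughly 0.54, about an even chance of at least one spurious rejection in the first two steps alone.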