Question: Accuracy of software effort estimates. Periodically, software engineers must provide estimates of their effort in developing new software. In the Journal of Empirical Software Engineering (Vol. 9, 2004), multiple regression was used to predict the accuracy of these effort estimates. The dependent variable, defined as the relative error in estimating effort, y = (Actual effort - Estimated effort)/(Actual effort), was determined for each task in a sample of n = 49 software development tasks. Eight independent variables were evaluated as potential predictors of relative error using stepwise regression. Each of these was formulated as a dummy variable, as shown in the table.

Company role of estimator: x1 = 1 if developer, 0 if project leader

Task complexity: x2 = 1 if low, 0 if medium/high

Contract type: x3 = 1 if fixed price, 0 if hourly rate

Customer importance: x4 = 1 if high, 0 if low/medium

Customer priority: x5 = 1 if time of delivery, 0 if cost or quality

Level of knowledge: x6 = 1 if high, 0 if low/medium

Participation: x7 = 1 if estimator participates in work, 0 if not

Previous accuracy: x8 = 1 if more than 20% accurate, 0 if less than 20% accurate

a. In step 1 of the stepwise regression, how many different one-variable models are fit to the data?

b. In step 1, the variable x1 is selected as the best one-variable predictor. How is this determined?

c. In step 2 of the stepwise regression, how many different two-variable models (where x1 is one of the variables) are fit to the data?

d. The only two variables selected for entry into the stepwise regression model were x1 and x8. The stepwise regression yielded the following prediction equation:

Give a practical interpretation of the β estimates multiplied by x1 and x8.

e. Why should a researcher be wary of using the model, part d, as the final model for predicting effort (y)?

Short Answer

a. Because there are 8 independent variables, 8 one-variable models of the form E(y) = β0 + β1xi are fit to the data.

b. For each one-variable model, a t-test of H0: β1 = 0 is conducted. The variable with the largest absolute t-value (equivalently, the smallest p-value) is selected as the best predictor.

c. Seven two-variable models are fit, each pairing x1 with one of the remaining 7 variables.

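d. The β estimates for x1 and x8 are -0.28 and 0.27. Because both are dummy variables, each estimate is a difference in mean relative error. Holding x8 fixed, the estimated mean relative error for developers (x1 = 1) is 0.28 lower than for project leaders (x1 = 0). Holding x1 fixed, the estimated mean relative error when x8 = 1 (previous accuracy more than 20% accurate) is 0.27 higher than when x8 = 0.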

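e. There are two reasons for caution. First, stepwise regression conducts a very large number of t-tests, so the probability of making one or more Type I or Type II errors is high: unimportant variables may have been included and important ones left out. Second, the stepwise model contains only main effects; no higher-order or interaction terms were considered. Stepwise regression is best treated as a variable-screening tool, not as a way to build the final model.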

Step by step solution

01

1-variable models

Because there are 8 independent variables, step 1 fits 8 one-variable models of the form E(y) = β0 + β1xi, one for each of x1 through x8.

02

Best predictor variable

For each one-variable model, a t-test of H0: β1 = 0 is conducted. The best predictor is the variable with the largest absolute t-value (equivalently, the smallest p-value).
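Steps 1 and 2 of this solution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (the study's 49 observations are not given here), the helper name slope_t_value is hypothetical, and the simulated effect of x1 is chosen arbitrarily so that x1 stands out.

```python
import numpy as np

def slope_t_value(x, y):
    """t statistic for H0: beta1 = 0 in the model y = b0 + b1*x + error."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])          # intercept column plus one predictor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimates
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)                  # estimated error variance
    cov = s2 * np.linalg.inv(X.T @ X)             # covariance matrix of the estimates
    return beta[1] / np.sqrt(cov[1, 1])           # estimate divided by its standard error

# Synthetic stand-in data (NOT the study's data): 49 tasks, 8 dummy predictors.
rng = np.random.default_rng(0)
n, k = 49, 8
X = rng.integers(0, 2, size=(n, k)).astype(float)
y = 0.1 - 0.5 * X[:, 0] + rng.normal(0, 0.2, size=n)  # assumed: only x1 affects y

# Step 1: one model per candidate variable, so 8 fits in total.
t_values = np.array([slope_t_value(X[:, j], y) for j in range(k)])
best = int(np.argmax(np.abs(t_values)))           # largest |t| enters the model first
print(f"{len(t_values)} one-variable models fit; x{best + 1} enters first")
```

The selection rule is the argmax over absolute t-values; the sign of the t-value does not matter for entry into the model.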

03

2-variable model

With x1 already in the model, step 2 pairs x1 with each of the remaining k - 1 = 8 - 1 = 7 variables, fitting models of the form E(y) = β0 + β1x1 + β2xi.

So, 7 two-variable models are fitted.
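As a quick check of this count, the candidate models for step 2 can be enumerated directly. The sketch below uses only the variable names from the problem; it also contrasts the count with the number of all possible two-variable models, which stepwise regression does not fit.

```python
from math import comb

names = [f"x{i}" for i in range(1, 9)]        # the 8 dummy variables x1..x8
selected = "x1"                               # entered in step 1
candidates = [v for v in names if v != selected]

# Step 2 fits one model per remaining variable, always including x1.
step2_models = [(selected, v) for v in candidates]
for pair in step2_models:
    print("E(y) = b0 + b1*{} + b2*{}".format(*pair))

print(len(step2_models), "models fit in step 2")
print(comb(8, 2), "two-variable models exist in total")
```

Stepwise regression examines only the 7 pairs containing x1, not all 28 possible pairs, which is one reason it can miss a good combination of predictors.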

04

Interpretation of β estimates

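The β estimates for x1 and x8 are -0.28 and 0.27. Because both are dummy variables, each estimate is a difference in mean relative error. Holding x8 fixed, the estimated mean relative error for developers (x1 = 1) is 0.28 lower than for project leaders (x1 = 0). Holding x1 fixed, the estimated mean relative error when x8 = 1 (previous accuracy more than 20% accurate) is 0.27 higher than when x8 = 0.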
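A small numerical sketch can make the dummy-variable interpretation concrete. The two slope estimates come from the solution above; the intercept is not given here, so b0 below is a hypothetical placeholder, and the function name is an assumption for illustration. The point is that the differences between groups do not depend on the intercept.

```python
B1, B8 = -0.28, 0.27   # estimates for x1 and x8 from the solution

def predicted_relative_error(x1, x8, b0):
    """Predicted y for given dummy values; b0 is a placeholder intercept."""
    return b0 + B1 * x1 + B8 * x8

differences = []
for b0 in (0.0, 0.5):  # arbitrary intercepts, chosen only for illustration
    # Developer vs. project leader, with x8 held at 0.
    role_gap = predicted_relative_error(1, 0, b0) - predicted_relative_error(0, 0, b0)
    # x8 = 1 vs. x8 = 0, with x1 held at 0.
    accuracy_gap = predicted_relative_error(0, 1, b0) - predicted_relative_error(0, 0, b0)
    differences.append((round(role_gap, 2), round(accuracy_gap, 2)))
    print(f"b0={b0}: role gap {role_gap:.2f}, previous-accuracy gap {accuracy_gap:.2f}")
```

Whatever value the intercept takes, the gap between the two levels of a dummy variable equals that variable's β estimate.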

05

Precautions while using stepwise model

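First, stepwise regression conducts a very large number of t-tests, so the probability of making one or more Type I or Type II errors is high: unimportant variables may have been included and important ones left out. Second, the stepwise model contains only main effects; no higher-order or interaction terms were considered. Stepwise regression is best treated as a variable-screening tool, not as a way to build the final model.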
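To get a rough sense of scale for both cautions, the arithmetic below counts the entry tests and the omitted interaction terms. It rests on assumptions not stated in the problem: a significance level of 0.05 for each test, tests treated as independent (they are not, so the figure is only indicative), and a count of entry tests only (8 in step 1, 7 in step 2, and 6 in a third step in which no variable entered).

```python
from math import comb

alpha = 0.05                      # assumed per-test significance level
entry_tests = 8 + 7 + 6           # steps 1, 2, and 3 (step 3 added no variable)

# Chance of at least one Type I error if all tested betas were truly zero
# and the tests were independent (a simplifying assumption).
prob_at_least_one_type1 = 1 - (1 - alpha) ** entry_tests

# Two-way interaction terms among 8 dummies that stepwise never examined.
omitted_interactions = comb(8, 2)

print(f"{entry_tests} entry tests")
print(f"P(at least one Type I error) is about {prob_at_least_one_type1:.2f}")
print(f"{omitted_interactions} two-way interactions not considered")
```

Under these assumptions the chance of at least one false positive is roughly two in three, and 28 possible two-way interactions were never considered.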
