Question: Entry-level job preferences. Benefits Quarterly published a study of entry-level job preferences. A number of independent variables were used to model the job preferences (measured on a 10-point scale) of 164 business school graduates. Suppose stepwise regression is used to build a model for job preference score (y) as a function of the following independent variables:

x1 = 1 if flextime position, 0 if not
x2 = 1 if day care support required, 0 if not
x3 = 1 if spousal transfer support required, 0 if not
x4 = number of sick days allowed
x5 = 1 if applicant married, 0 if not
x6 = number of children of applicant
x7 = 1 if male applicant, 0 if female applicant

a. How many models are fit to the data in step 1? Give the general form of these models.

b. How many models are fit to the data in step 2? Give the general form of these models.

c. How many models are fit to the data in step 3? Give the general form of these models.

d. Explain how the procedure determines when to stop adding independent variables to the model.

e. Describe two major drawbacks to using the final stepwise model as the best model for job preference score y.

Short Answer

Answer

a. Step 1 of stepwise regression fits 7 one-variable linear models, one for each of the 7 independent variables. The general form is E(y) = β0 + β1xi, for i = 1, 2, …, 7.

b. Step 2 fits 6 two-variable linear models, each pairing the variable selected in step 1 with one of the 6 remaining variables. The general form is E(y) = β0 + β1x1 + β2xi, where x1 here denotes the variable selected in step 1 (not necessarily the flextime indicator) and xi is one of the remaining variables.

c. Step 3 fits 5 three-variable linear models, each adding one of the 5 remaining variables to the two already selected. The general form is E(y) = β0 + β1x1 + β2x2 + β3xi, where x1 and x2 denote the variables selected in steps 1 and 2.

d. Stepwise regression keeps adding independent variables until none of the remaining variables yields a significant t-value when added to the model.

e. The final stepwise model does not consider interaction or higher-order terms, which might fit the data better. Also, because a large number of t-tests are conducted across the steps, there is a high probability of making one or more Type I or Type II errors.

Step by step solution

01

Given Information

There are seven independent variables in total: five are qualitative (binary) and two are quantitative.

02

Models in step 1 of stepwise regression

In step 1 of stepwise regression, a linear model in one independent variable is fit for each of the k candidate variables.

Here k = 7, so 7 one-variable linear models are fit to the data, one per independent variable.

The general form of the step 1 models is E(y) = β0 + β1xi, for i = 1, 2, …, 7. The variable with the largest absolute t-value is selected.

03

Models in step 2 of stepwise regression 

In step 2 of stepwise regression, linear models in two independent variables are fit: each contains the variable selected in step 1 plus one of the remaining (k - 1) variables.

Here k = 7, so 7 - 1 = 6 two-variable linear models are fit to the data. The general form of the step 2 models is E(y) = β0 + β1x1 + β2xi, where x1 here denotes the variable selected in step 1 (not necessarily the flextime indicator) and xi is one of the 6 remaining variables.

04

Models in step 3 of stepwise regression

In step 3 of stepwise regression, linear models in three independent variables are fit: each contains the two variables selected in steps 1 and 2 plus one of the remaining (k - 2) variables.

Here k = 7, so 7 - 2 = 5 three-variable linear models are fit to the data. The general form of the step 3 models is E(y) = β0 + β1x1 + β2x2 + β3xi, where x1 and x2 denote the variables selected in steps 1 and 2 and xi is one of the 5 remaining variables.
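The counting rule behind steps 1 to 3 is that step s fits k - (s - 1) candidate models, one per variable not yet selected. A minimal Python sketch of that rule follows; the function names `models_at_step` and `general_form` are hypothetical helpers introduced only for illustration.

```python
def models_at_step(k: int, step: int) -> int:
    """Number of candidate models fit at a given step when adding one variable at a time.

    k is the number of candidate independent variables; step is 1-based.
    """
    if not 1 <= step <= k:
        raise ValueError("step must be between 1 and k")
    # (step - 1) variables are already in the model, so k - (step - 1) remain to try.
    return k - (step - 1)


def general_form(step: int) -> str:
    """General form of the models fit at a given step.

    x1, ..., x(step-1) stand for the variables selected in earlier steps;
    xi stands for one of the remaining candidates.
    """
    terms = ["β0"] + [f"β{j}x{j}" for j in range(1, step)] + [f"β{step}xi"]
    return "E(y) = " + " + ".join(terms)


if __name__ == "__main__":
    k = 7  # seven candidate variables in the job-preference problem
    for step in (1, 2, 3):
        print(step, models_at_step(k, step), general_form(step))
```

For k = 7 this gives 7, 6, and 5 models for steps 1, 2, and 3.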

05

Procedure of step wise regression

Stepwise regression keeps adding independent variables until none of the remaining variables yields a significant t-value when added to the model. With 7 independent variables, the procedure can run for at most 7 steps. At each step a t-test checks the significance of each candidate variable, and the procedure stops earlier if no candidate is significant.
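The stopping rule can be illustrated with a small simulation. The sketch below is a simplified forward-selection loop, not the exact routine used in the study: it runs on synthetic data (the real data are not available), uses a fixed cutoff of |t| ≥ 1.96 as an assumed stand-in for a 5% significance test, and omits the re-check of earlier variables that a full stepwise routine performs. The names `t_stat_last` and `forward_stepwise` are hypothetical.

```python
import numpy as np


def t_stat_last(X, y):
    """t-statistic of the last column's coefficient in an OLS fit of y on X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)            # estimated error variance
    cov = s2 * np.linalg.inv(X.T @ X)       # estimated covariance of the coefficients
    return beta[-1] / np.sqrt(cov[-1, -1])


def forward_stepwise(X, y, t_crit=1.96):
    """Add one variable per step until no remaining candidate has |t| >= t_crit."""
    n, k = X.shape
    selected, remaining = [], list(range(k))
    while remaining:
        base = [np.ones(n)] + [X[:, j] for j in selected]
        # One candidate model per remaining variable, as in steps 1, 2, 3, ...
        t_vals = {j: t_stat_last(np.column_stack(base + [X[:, j]]), y)
                  for j in remaining}
        best = max(t_vals, key=lambda j: abs(t_vals[j]))
        if abs(t_vals[best]) < t_crit:
            break                            # stopping rule: nothing significant left
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data (assumption): 164 graduates, only x1 and x4 affect y.
rng = np.random.default_rng(0)
n = 164
X = np.column_stack([
    rng.integers(0, 2, n),    # x1 flextime
    rng.integers(0, 2, n),    # x2 day care
    rng.integers(0, 2, n),    # x3 transfer support
    rng.integers(0, 15, n),   # x4 sick days
    rng.integers(0, 2, n),    # x5 married
    rng.integers(0, 5, n),    # x6 children
    rng.integers(0, 2, n),    # x7 male
]).astype(float)
y = 2 + 2 * X[:, 0] + 0.4 * X[:, 3] + rng.normal(0, 0.5, n)

selected = forward_stepwise(X, y)
print([f"x{j + 1}" for j in selected])
```

Because only x1 and x4 drive y in this simulated data, they should be picked first; any further variable that enters does so by chance, which is the Type I error risk discussed in the drawbacks.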

06

Drawback of using stepwise regression model

The final stepwise model does not consider interaction or higher-order terms, which might fit the data better. Also, because a large number of t-tests are conducted across the steps, there is a high probability of making one or more Type I or Type II errors.
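To get a rough feel for the second drawback, note that steps 1 to 3 alone involve 7 + 6 + 5 = 18 t-tests. The Python sketch below computes the chance of at least one Type I error across m tests, under the simplifying assumption that the tests are independent and each uses α = 0.05. Stepwise t-tests are not actually independent, so this is an illustration rather than an exact figure. The function name is hypothetical.

```python
def prob_at_least_one_type_i(alpha: float, m: int) -> float:
    """P(at least one false rejection) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


# t-tests performed in steps 1, 2, and 3 with k = 7 candidate variables
tests_in_first_three_steps = 7 + 6 + 5

p = prob_at_least_one_type_i(alpha=0.05, m=tests_in_first_three_steps)
print(f"{tests_in_first_three_steps} tests: P(at least one Type I error) is about {p:.2f}")
```

Under these assumptions the probability is roughly 0.6, far above the nominal 0.05 for a single test.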

Most popular questions from this chapter

Impact of race on football card values. University of Colorado sociologists investigated the impact of race on the value of professional football players’ “rookie” cards (Electronic Journal of Sociology, 2007). The sample consisted of 148 rookie cards of National Football League (NFL) players who were inducted into the Football Hall of Fame. The price of the card (in dollars) was modeled as a function of several qualitative independent variables: race of player (black or white), card availability (high or low), and player position (quarterback, running back, wide receiver, tight end, defensive lineman, linebacker, defensive back, or offensive lineman).

  1. Create the appropriate dummy variables for each of the qualitative independent variables.
  2. Write a model for price (y) as a function of race. Interpret the β’s in the model.
  3. Write a model for price (y) as a function of card availability. Interpret the β’s in the model.
  4. Write a model for price (y) as a function of position. Interpret the β’s in the model.

Question: Diet of ducks bred for broiling. Corn is high in starch content; consequently, it is considered excellent feed for domestic chickens. Does corn possess the same potential in feeding ducks bred for broiling? This was the subject of research published in Animal Feed Science and Technology (April 2010). The objective of the study was to establish a prediction model for the true metabolizable energy (TME) of corn regurgitated from ducks. The researchers considered 11 potential predictors of TME: dry matter (DM), crude protein (CP), ether extract (EE), ash (ASH), crude fiber (CF), neutral detergent fiber (NDF), acid detergent fiber (ADF), gross energy (GE), amylose (AM), amylopectin (AP), and amylopectin/amylose (AMAP). Stepwise regression was used to find the best subset of predictors. The final stepwise model yielded the following results:

Estimated TME = 7.70 + 2.14(AMAP) + 0.16(NDF), R² = 0.988, s = .07, Global F p-value = .001

a. Determine the number of t-tests performed in step 1 of the stepwise regression.

b. Determine the number of t-tests performed in step 2 of the stepwise regression.

c. Give a full interpretation of the final stepwise model regression results.

d. Explain why it is dangerous to use the final stepwise model as the “best” model for predicting TME.

e. Using the independent variables selected by the stepwise routine, write a complete second-order model for TME.

f. Refer to part e. How would you determine if the terms in the model that allow for curvature are statistically useful for predicting TME?

Question: The complete model E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε was fit to n = 20 data points, with SSE = 152.66. The reduced model, E(y) = β0 + β1x1 + β2x2 + ε, was also fit, with SSE = 160.44.

a. How many β parameters are in the complete model? The reduced model?

b. Specify the null and alternative hypotheses you would use to investigate whether the complete model contributes more information for the prediction of y than the reduced model.

c. Conduct the hypothesis test of part b. Use α = .05.

To model the relationship between y, a dependent variable, and x, an independent variable, a researcher has taken one measurement on y at each of three different x-values. Drawing on his mathematical expertise, the researcher realizes that he can fit the second-order model E(y) = β0 + β1x + β2x² and it will pass exactly through all three points, yielding SSE = 0. The researcher, delighted with the excellent fit of the model, eagerly sets out to use it to make inferences. What problems will he encounter in attempting to make inferences?

Suppose the mean value E(y) of a response y is related to the quantitative independent variables x1 and x2 by

E(y) = 2 + x1 - 3x2 - x1x2

a) Identify and interpret the slope for x2.

b) Plot the linear relationship between E(y) and x2 for x1 = 0, 1, 2, where 1 ≤ x2 ≤ 3.

c) How would you interpret the estimated slopes?

d) Use the lines you plotted in part b to determine the changes in E(y) for each x1 = 0, 1, 2.

e) Use your graph from part b to determine how much E(y) changes when 3 ≤ x1 ≤ 5 and 1 ≤ x2 ≤ 3.
