Question: Household food consumption. The data in the table below were collected for a random sample of 26 households in Washington, D.C. An economist wants to relate household food consumption, y, to household income, x1, and household size, x2, with the first-order model.

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

  1. Fit the model to the data. Do you detect any signs of multicollinearity in the data? Explain.
  2. Is there visual evidence (from a residual plot) that a second-order model may be more appropriate for predicting household food consumption? Explain.
  3. Comment on the assumption of constant error variance, using a residual plot. Does it appear to be satisfied?
  4. Are there any outliers in the data? If so, identify them.
  5. Based on a graph of the residuals, does the assumption of normal errors appear to be reasonably satisfied? Explain.

Short Answer


Answers

  1. The estimated coefficient of household income, x1, is negative, although logically household food consumption should increase as income increases. A coefficient whose sign contradicts theory is a common symptom of multicollinearity, so this may indicate that multicollinearity is present.
  2. Yes. The residual plot suggests that a second-order model is more appropriate for the data.
  3. No. The error variance does not appear constant: the residuals are tightly clustered for the early observations, but their spread increases for the later observations.
  4. Yes. Observation 26 is an outlier; its residual is 2.789.
  5. The assumption of normal errors does not appear to be satisfied, because the graph shows that the error variance is not constant.

Step by step solution

01

Given information  

The sample consists of 26 households, and the first-order model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.

02

Model fitting 

a.

The question gives data on food consumption, y, household income, x1, and household size, x2, for 26 households. The Excel summary output is attached below. The estimated coefficient of household income is negative, although logically household food consumption should increase with income. A coefficient whose sign contradicts theory is a common symptom of multicollinearity, so this may indicate that multicollinearity is present.
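The sign-based argument above can be complemented by a direct check of how strongly the two predictors are related. A minimal sketch, assuming the 26 values of x1 and x2 are available as Python lists (the function names are hypothetical): with only two independent variables, each has variance inflation factor VIF = 1/(1 − r²), where r is the sample correlation between x1 and x2. A VIF above 10 is a common rule of thumb for severe multicollinearity.

```python
import math


def correlation(a, b):
    """Pearson sample correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def vif_two_predictors(x1, x2):
    """With exactly two predictors, VIF(x1) = VIF(x2) = 1 / (1 - r^2)."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)
```

Calling `vif_two_predictors(x1, x2)` on the household data gives a single number to compare with the rule of thumb.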

The model can be fitted with Excel's Data Analysis tool: on the Data tab, open Data Analysis, choose Regression, and supply the y values as the Input Y Range and the x1 and x2 values as the Input X Range. Excel then produces the summary output of the fitted model automatically.
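Outside Excel, the same least squares fit can be reproduced in a few lines. A minimal sketch, assuming NumPy is available and the data are supplied as three equal-length sequences (the function name `fit_first_order` is hypothetical); the returned estimates should agree with the Coefficients column of Excel's summary output up to rounding.

```python
import numpy as np


def fit_first_order(x1, x2, y):
    """Least squares estimates (b0, b1, b2) and fitted values for E(y) = b0 + b1*x1 + b2*x2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then x1 and x2.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    # lstsq minimizes ||y - X b||^2, i.e. ordinary least squares.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ coef
    return coef, y_hat
```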

For the ANOVA table, first compute the mean of the dependent variable, ȳ, then the sums of squares SSR, SSE, and SST, the degrees of freedom, the mean squares, and finally the F statistic.

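The regression sum of squares is $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, and the error (residual) sum of squares is obtained by squaring each residual and adding them: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. The total sum of squares is $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = SSR + SSE$. With $k = 2$ independent variables and $n = 26$ observations, the regression has $k = 2$ degrees of freedom and the residual has $n - k - 1 = 23$. The mean square for regression is $MSR = SSR/k$, the mean square for residuals is $MSE = SSE/(n - k - 1)$, and $F = MSR/MSE$.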
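The quantities in the preceding paragraph can be computed directly from the observed and fitted values. A minimal sketch, assuming NumPy and that `y_hat` comes from a least squares fit with an intercept (otherwise SST = SSR + SSE need not hold); `anova_table` is a hypothetical name, and `k` is the number of independent variables (2 here).

```python
import numpy as np


def anova_table(y, y_hat, k):
    """Regression ANOVA quantities for a least squares fit with k predictors plus an intercept."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    y_bar = y.mean()
    ssr = np.sum((y_hat - y_bar) ** 2)  # explained (regression) sum of squares
    sse = np.sum((y - y_hat) ** 2)      # residual sum of squares
    sst = np.sum((y - y_bar) ** 2)      # total; equals ssr + sse for least squares fits
    df_reg, df_res = k, n - k - 1
    msr = ssr / df_reg
    mse = sse / df_res
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "df_reg": df_reg, "df_res": df_res,
            "MSR": msr, "MSE": mse, "F": msr / mse}
```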

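For a model with a single independent variable x, the least squares slope and intercept are

$$\hat{\beta}_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad \hat{\beta}_0 = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}.$$

With two independent variables, as here, the estimates $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2$ instead solve the least squares normal equations, $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$; these are the estimates reported in the Excel output.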
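The single-predictor formulas can be checked numerically. A minimal sketch in plain Python (`simple_ls` is a hypothetical name); it applies only to one independent variable, so it does not reproduce the two-variable estimates in the Excel output.

```python
def simple_ls(x, y):
    """Closed-form least squares intercept and slope for ONE predictor."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    denom = n * sxx - sx ** 2
    b1 = (n * sxy - sx * sy) / denom     # slope
    b0 = (sy * sxx - sx * sxy) / denom   # intercept
    return b0, b1
```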

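The standard error reported under Regression Statistics in Excel is $s = \sqrt{MSE}$, the estimated standard deviation of the random error. The standard error of each coefficient is $s\sqrt{c_{jj}}$, where $c_{jj}$ is the corresponding diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$.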
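A minimal sketch of both standard errors, assuming NumPy; `standard_errors` is a hypothetical name, and it accepts any number of predictor columns so the same function covers the two-variable model here.

```python
import numpy as np


def standard_errors(y, *predictors):
    """Return s = sqrt(MSE) and the standard error of each coefficient (intercept first)."""
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors]
    X = np.column_stack(cols)
    n, p = X.shape                      # p = k + 1 estimated parameters
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s = np.sqrt(resid @ resid / (n - p))
    c = np.linalg.inv(X.T @ X)          # c_jj are the diagonal elements
    return s, s * np.sqrt(np.diag(c))
```

For the household model this would be called as `standard_errors(y, x1, x2)`.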

The Excel summary output is attached here.

03

Residual plot

b.

A residual plot can be used to check the following assumptions about the random error ε:

  • Mean of ε equal to 0: A residual plot can detect a model in which the hypothesized relationship between E(y) and an independent variable x is misspecified. Such models violate the assumption that the mean error is 0.
  • Constant error variance: Residual plots can also detect violations of the assumption of constant error variance.
  • Errors normally distributed: Several graphical methods are available for assessing whether the random error ε has an approximately normal distribution. If the errors are normally distributed, we expect approximately 95% of the residuals to fall within 2 standard deviations of the mean of 0, and almost all of them to lie within 3 standard deviations of 0.
  • Errors independent: The assumption of independent errors is violated when successive errors are correlated.
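One common numerical check of the independence assumption in the last item is the Durbin-Watson statistic, $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. A minimal sketch in plain Python, assuming the residuals are listed in time (or observation) order; `durbin_watson` is a hypothetical name. Values of d near 2 suggest no first-order correlation, values well below 2 suggest positive correlation, and values well above 2 suggest negative correlation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```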

The residual plot suggests that a second-order model is more appropriate for the data.
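The visual impression can be backed by a nested-model F test. A minimal sketch, assuming NumPy and that "second-order model" means the complete second-order model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$; the function names are hypothetical. The statistic $F = [(SSE_R - SSE_C)/3] / [SSE_C/(n - 6)]$ compares the first-order (reduced) and second-order (complete) fits and would be referred to an F distribution with 3 and n − 6 degrees of freedom (3 and 20 for the 26 households).

```python
import numpy as np


def sse_of_fit(X, y):
    """Residual sum of squares of the least squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)


def second_order_f_test(x1, x2, y):
    """Partial F statistic for adding x1*x2, x1^2, x2^2 to the first-order model."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    ones = np.ones_like(x1)
    reduced = np.column_stack([ones, x1, x2])
    complete = np.column_stack([ones, x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
    sse_r = sse_of_fit(reduced, y)
    sse_c = sse_of_fit(complete, y)
    q = 3                                   # number of extra second-order terms
    df_c = len(y) - complete.shape[1]       # residual df of the complete model
    f = ((sse_r - sse_c) / q) / (sse_c / df_c)
    return f, q, df_c
```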

To draw the graph, plot the residuals, $y - \hat{y}$, on the vertical axis against the observation number on the horizontal axis, and then connect the plotted points with a line to show the pattern. (Plotting the residuals against each independent variable, x1 and x2, or against $\hat{y}$ is the usual way to look for curvature.)
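A minimal sketch of the data behind this plot, in plain Python (`residuals_by_observation` is a hypothetical name); each residual is the observed value minus the fitted value.

```python
def residuals_by_observation(y, y_hat):
    """Return (observation number, residual) pairs, with residual = y - y_hat."""
    return [(i, yi - fi) for i, (yi, fi) in enumerate(zip(y, y_hat), start=1)]
```

The pairs can then be drawn with any plotting tool, for example matplotlib's `plt.plot` for the connected points and `plt.axhline(0)` for the zero reference line.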

04

Constant error variance assumption

c.

The error variance does not appear constant in the residual plot: the residuals are tightly clustered for the early observations, but their spread increases for the later observations.
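A rough numerical counterpart of this visual comparison is to compare the average squared residual in the later observations with that in the earlier ones. A minimal sketch in plain Python, assuming the residuals are in the same observation order as the plot; `spread_ratio` is a hypothetical name, and the comparison is an informal heuristic, not a formal test. A ratio well above 1 is consistent with spread that increases across the observations.

```python
def spread_ratio(residuals):
    """Mean squared residual of the later half divided by that of the earlier half.

    With an odd number of residuals, the middle one is left out.
    """
    half = len(residuals) // 2
    early = residuals[:half]
    late = residuals[-half:]
    ms_early = sum(e * e for e in early) / half
    ms_late = sum(e * e for e in late) / half
    return ms_late / ms_early
```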

05

Outlier

d.

Observation 26 is an outlier: its residual is 2.789, and the outlier is also visible in the graph.
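A common rule of thumb treats a residual more than 3s from 0 as an outlier, where s = √MSE. A minimal sketch in plain Python, assuming the 26 residuals from the first-order fit are available (`flag_outliers` is a hypothetical name and `k` is the number of independent variables); it returns each flagged observation number with its standardized residual, e/s.

```python
import math


def flag_outliers(residuals, k):
    """Return (observation number, e/s) for residuals more than 3s from 0."""
    n = len(residuals)
    s = math.sqrt(sum(e * e for e in residuals) / (n - k - 1))  # s = sqrt(MSE)
    return [(i, e / s) for i, e in enumerate(residuals, start=1) if abs(e) > 3 * s]
```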

06

Assumption of normal errors

e.

If the assumption of normally distributed errors is satisfied, we expect approximately 95% of the residuals to fall within 2 standard deviations of the mean of 0, and almost all of the residuals to lie within 3 standard deviations of 0.
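These guidelines can be checked by counting. A minimal sketch in plain Python, with s = √MSE computed from the residuals and `k` the number of independent variables (`empirical_rule_check` is a hypothetical name); proportions well below about 0.95 and 1.0 would cast doubt on normality, though with 26 observations the counts are only a rough guide.

```python
import math


def empirical_rule_check(residuals, k):
    """Proportions of residuals within 2s and within 3s of 0, with s = sqrt(MSE)."""
    n = len(residuals)
    s = math.sqrt(sum(e * e for e in residuals) / (n - k - 1))
    within_2s = sum(abs(e) <= 2 * s for e in residuals) / n
    within_3s = sum(abs(e) <= 3 * s for e in residuals) / n
    return within_2s, within_3s
```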

Here, the assumption of normal errors does not appear to be satisfied, because the graph shows that the error variance is not constant. Some observations lie close to the regression line, indicating small variance, while others lie far from it, indicating large differences between the observed and fitted y values. This indicates that the error variance is not the same across observations.
