The National Oceanic and Atmospheric Administration publishes temperature information for cities around the world in Climates of the World. A random sample of \(50\) cities gave the data on average high and low temperatures in January that are provided on the WeissStats site.

a. Obtain a scatterplot for the data.

b. Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f).

c. Determine and interpret the regression equation for the data.

d. Identify potential outliers and influential observations.

e. In case a potential outlier is present, remove it and discuss the effect.

f. In case a potential influential observation is present, remove it and discuss the effect.

Short Answer

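Part a. The scatterplot places high temperature on the horizontal axis and low temperature on the vertical axis.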

Part b. The scatterplot shows no strong curvature, so it is reasonable to find a regression line for the data.

Part c. \(\hat{y}=-7.5692+0.9168x\)

Part d. There are no potential outliers or influential observations.

Part e. Not applicable

Part f. Not applicable

Step by step solution

01

Part a. Step 1. Given information

The given data are the average January high and low temperatures of the \(50\) sampled cities.

02

Part a. Step 2. Graph

The scatterplot shows the given points, with high temperature on the horizontal axis and low temperature on the vertical axis.
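One way to produce such a scatterplot is sketched below. It is a minimal illustration, not the method used in the solution: the column names `HIGH` and `LOW`, the CSV layout, and the output file name are assumptions, since the actual format of the WeissStats file is not given here. The plotting function uses matplotlib, which must be installed separately.

```python
import csv
import io


def read_temperatures(csv_text):
    """Parse CSV text into two lists of floats.

    The column names HIGH and LOW are hypothetical; adjust them to
    match the real data file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    highs, lows = [], []
    for row in reader:
        highs.append(float(row["HIGH"]))
        lows.append(float(row["LOW"]))
    return highs, lows


def plot_scatter(highs, lows, path="scatter.png"):
    """Save a scatterplot: high temperature on x, low temperature on y."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(highs, lows)
    ax.set_xlabel("Average January high temperature")
    ax.set_ylabel("Average January low temperature")
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_scatter(*read_temperatures(text))` on the contents of the data file would write the figure to `scatter.png`.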

03

Part b. Step 1. Explanation

Finding a regression line for the data is reasonable when the scatterplot shows no strong curvature.

Here the scatterplot shows no strong curvature, so it is reasonable to find a regression line for the data.

04

Part c. Step 1. Explanation

Take the low temperature as the response variable \((y)\) and the high temperature as the predictor variable \((x)\).

The sample size is \(n=50\).

The necessary sums are:

\(\sum x_{i}=2843\)

\(\sum y_{i}=2228\)

\(\sum x_{i}^{2}=181233\)

\(\sum x_{i}y_{i}=144636\)

Compute \(s_{xx}\) and \(s_{xy}\):

\(s_{xx}=181233-\frac{2843^{2}}{50}=19580.02\)

\(s_{xy}=144636-\frac{(2843)(2228)}{50}=17951.92\)
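For reference, the two values above are instances of the standard computing formulas, written here in general form with the same lowercase notation:

\(s_{xx}=\sum x_{i}^{2}-\frac{\left(\sum x_{i}\right)^{2}}{n}\)

\(s_{xy}=\sum x_{i}y_{i}-\frac{\left(\sum x_{i}\right)\left(\sum y_{i}\right)}{n}\)

These are algebraically equal to \(\sum (x_{i}-\bar{x})^{2}\) and \(\sum (x_{i}-\bar{x})(y_{i}-\bar{y})\), respectively, but need only the sums listed earlier.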

Compute the means:

\(\bar{x}=\frac{2843}{50}=56.86\)

\(\bar{y}=\frac{2228}{50}=44.56\)

Hence the slope and intercept are:

\(b_{1}=\frac{s_{xy}}{s_{xx}}=\frac{17951.92}{19580.02}=0.9168\)

\(b_{0}=\bar{y}-b_{1}\bar{x}=44.56-(0.9168)(56.86)=-7.5692\)

The regression equation for predicting the low temperature \((y)\) from the high temperature \((x)\) is

\(\hat{y}=-7.5692+0.9168x\)

From the regression equation, the low temperature increases on average by about \(0.9168\) degrees for each \(1\)-degree increase in the high temperature.
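The arithmetic in part (c) can be checked with a short script. The sketch below uses only the sums quoted in the solution; the variable names and the helper `predict_low` are illustrative choices, not part of the original solution.

```python
# Summary statistics quoted in the solution
n = 50
sum_x = 2843      # sum of high temperatures
sum_y = 2228      # sum of low temperatures
sum_x2 = 181233   # sum of squared high temperatures
sum_xy = 144636   # sum of products (high * low)

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

x_bar = sum_x / n
y_bar = sum_y / n

b1 = s_xy / s_xx          # slope at full precision
b0 = y_bar - b1 * x_bar   # intercept from the unrounded slope

# The solution rounds the slope to four decimals before
# computing the intercept; reproduce that here.
b1_rounded = round(b1, 4)
b0_from_rounded = round(y_bar - b1_rounded * x_bar, 4)


def predict_low(high, intercept=b0_from_rounded, slope=b1_rounded):
    """Predicted low temperature for a given high temperature."""
    return intercept + slope * high


print("s_xx =", round(s_xx, 2), " s_xy =", round(s_xy, 2))
print("slope =", b1_rounded, " intercept =", b0_from_rounded)
print("intercept from unrounded slope =", round(b0, 4))
```

Because the solution rounds the slope before computing the intercept, its intercept of \(-7.5692\) differs slightly from the value obtained with the unrounded slope (about \(-7.572\)). The difference has no practical effect on predictions.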

05

Part d. Step 1. Concept Used

A data point that lies far from the regression line, relative to the other data points, is an outlier.

A data point whose removal causes a considerable change in the regression equation, and hence in the direction of the regression line, is an influential observation.
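The idea of "far from the regression line" can be made concrete with residuals, the vertical distances between observed and predicted values. The sketch below ranks points by the size of their residuals under the fitted equation from part (c). The function names are illustrative, and the demo points are made up for the example; they are NOT the WeissStats data. The solution itself judges outliers visually from the graph.

```python
B0, B1 = -7.5692, 0.9168  # coefficients from part (c)


def residuals(xs, ys, b0=B0, b1=B1):
    """Observed y minus the value predicted by y-hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def rank_by_distance(xs, ys, b0=B0, b1=B1):
    """Return (index, residual) pairs, largest absolute residual first."""
    res = residuals(xs, ys, b0, b1)
    return sorted(enumerate(res), key=lambda pair: abs(pair[1]), reverse=True)


# Made-up points for illustration only (not the real temperature data)
demo_highs = [30, 45, 60, 75, 90]
demo_lows = [20, 34, 47, 75, 75]

for index, e in rank_by_distance(demo_highs, demo_lows):
    print(index, round(e, 4))
```

A point at the top of this ranking with a residual much larger than the rest is a candidate outlier; whether it counts as one remains a judgment call.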

06

Part d. Step 2. Calculation

The predicted values for the given data are obtained by substituting each high temperature into the regression equation.

The fitted regression line is then plotted together with the given points.

From the plotted graph,

  • All the points are close to the regression line, so there are no potential outliers in the data set.
  • Removing any single point does not cause a considerable change in the direction of the regression line, so there are no potential influential observations.
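
The influence check can also be done numerically by refitting the line with each point left out in turn and comparing the slopes. This is a minimal sketch of that leave-one-out idea, not the procedure used in the solution, which relies on the graph. The function names are illustrative and the demo points are made up; they are NOT the WeissStats data.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope via the computing formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


def leave_one_out(xs, ys):
    """Refit with each point removed; return (index, b0, b1) triples."""
    results = []
    for i in range(len(xs)):
        xs_without = xs[:i] + xs[i + 1:]
        ys_without = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(xs_without, ys_without)
        results.append((i, b0, b1))
    return results


# Made-up points for illustration only (not the real temperature data)
demo_x = [1, 2, 3, 4, 10]
demo_y = [2, 4, 6, 8, 5]

print("all points:", fit_line(demo_x, demo_y))
for i, b0, b1 in leave_one_out(demo_x, demo_y):
    print("without point", i, "->", round(b0, 4), round(b1, 4))
```

A point whose removal changes the slope or intercept far more than the removal of any other point is a candidate influential observation. Both functions expect plain Python lists.
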
07

Part e. Step 1. Explanation

Not applicable, because part (d) found no potential outliers.

08

Part f. Step 1. Explanation

Not applicable, because part (d) found no potential influential observations.
