Why Divide by n − 1? Let a population consist of the values 9 cigarettes, 10 cigarettes, and 20 cigarettes smoked in a day (based on data from the California Health Interview Survey). Assume that samples of two values are randomly selected with replacement from this population. (That is, a selected value is replaced before the second selection is made.)

a. Find the variance $\sigma^2$ of the population {9 cigarettes, 10 cigarettes, 20 cigarettes}.

b. After listing the nine different possible samples of two values selected with replacement, find the sample variance $s^2$ (which includes division by $n - 1$) for each of them; then find the mean of the nine sample variances $s^2$.

c. For each of the nine different possible samples of two values selected with replacement, find the variance by treating each sample as if it is a population (using the formula for population variance, which includes division by $n$); then find the mean of those nine population variances.

d. Which approach results in values that are better estimates of $\sigma^2$: part (b) or part (c)? Why? When computing variances of samples, should you use division by $n$ or $n - 1$?

e. The preceding parts show that $s^2$ is an unbiased estimator of $\sigma^2$. Is $s$ an unbiased estimator of $\sigma$? Explain.

Short Answer

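(a) The population variance is $\sigma^2 \approx 24.7$.

(b) $s_1^2 = 0.5$, $s_2^2 = 60.5$, $s_3^2 = 50.0$, $s_4^2 = 0.0$, $s_5^2 = 0.0$, $s_6^2 = 0.0$, $s_7^2 = 0.5$, $s_8^2 = 60.5$, and $s_9^2 = 50.0$. The mean of the nine sample variances is 24.7.

(c) $\sigma_1^2 = 0.25$, $\sigma_2^2 = 30.25$, $\sigma_3^2 = 25.0$, $\sigma_4^2 = 0.0$, $\sigma_5^2 = 0.0$, $\sigma_6^2 = 0.0$, $\sigma_7^2 = 0.25$, $\sigma_8^2 = 30.25$, and $\sigma_9^2 = 25.0$. The mean of the nine population variances is 12.3.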

(d) Part (b) gives the better estimates. The mean of the nine sample variances (24.7) equals the population variance, whereas the mean of the nine variances from part (c) (12.3) is only half of it. Division by $n - 1$ makes $s^2$ an unbiased estimator of $\sigma^2$, so use $n - 1$ when computing the variance of a sample.

(e) No, $s$ is not an unbiased estimator of $\sigma$, because the mean of the nine sample standard deviations (3.5) is not equal to the population standard deviation (5.0).

Step by step solution

01

Given information

A population of three values (number of cigarettes) is given.

Samples of two values are selected from this population with replacement, which gives nine different possible samples.

02

Population variance and sample variance

The population variance $\sigma^2$ is calculated by dividing the sum of the squared differences of the population observations from the mean by the count of observations.

Mathematically,

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$

Here, n is the total number of observations.

The sample variance $s^2$ is calculated by dividing the sum of the squared differences of the sample observations from the mean by $n - 1$.

Mathematically,

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
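The two formulas differ only in the divisor. A minimal Python sketch of both, written directly from the definitions above, is shown here; the function names `population_variance` and `sample_variance` are illustrative and do not come from the original solution.

```python
from math import fsum


def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    mean = fsum(values) / n
    return fsum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = fsum(values) / n
    return fsum((x - mean) ** 2 for x in values) / (n - 1)


# The full population from the exercise, then one sample of two values.
print(round(population_variance([9, 10, 20]), 1))  # 24.7
print(sample_variance([9, 20]))                    # 60.5
```

Keeping the two divisors in separate functions makes it explicit which formula is being applied to a given set of values.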

03

Compute the population variance

(a)

To compute the value of the population variance, find the population mean μ as shown below.

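$$\mu = \frac{9 + 10 + 20}{3} = 13.0$$

The population variance is computed as follows:

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n} = \frac{(9 - 13.0)^2 + (10 - 13.0)^2 + (20 - 13.0)^2}{3} \approx 24.7$$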

The population variance is 24.7.
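As a check on the arithmetic, the same result can be reproduced with Python's standard `statistics` module. This sketch uses `Fraction` inputs, a choice made here (not in the original solution) so that the unrounded value is visible.

```python
from fractions import Fraction
from statistics import mean, pvariance

# The population from the exercise, stored as exact fractions.
population = [Fraction(9), Fraction(10), Fraction(20)]

mu = mean(population)                # population mean
sigma_sq = pvariance(population, mu) # population variance (division by n)

print(mu)                            # 13
print(sigma_sq)                      # 74/3
print(round(float(sigma_sq), 1))     # 24.7
```

The exact variance is 74/3, so 24.7 is a value rounded to one decimal place.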

04

Describe the feasible samples of size two from the collection

(b)

The nine different samples selected with replacement are shown below:

| Sample | First value | Second value |
|---|---|---|
| 1 | 9 | 10 |
| 2 | 9 | 20 |
| 3 | 10 | 20 |
| 4 | 9 | 9 |
| 5 | 10 | 10 |
| 6 | 20 | 20 |
| 7 | 10 | 9 |
| 8 | 20 | 9 |
| 9 | 20 | 10 |

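The mean of each sample is computed using the formula $\bar{x} = \frac{\sum x}{n}$.

The mean of each sample is stated in the last column of the following table.

| Sample | First value | Second value | Sample mean |
|---|---|---|---|
| 1 | 9 | 10 | $\bar{x}_1 = 9.5$ |
| 2 | 9 | 20 | $\bar{x}_2 = 14.5$ |
| 3 | 10 | 20 | $\bar{x}_3 = 15$ |
| 4 | 9 | 9 | $\bar{x}_4 = 9$ |
| 5 | 10 | 10 | $\bar{x}_5 = 10$ |
| 6 | 20 | 20 | $\bar{x}_6 = 20$ |
| 7 | 10 | 9 | $\bar{x}_7 = 9.5$ |
| 8 | 20 | 9 | $\bar{x}_8 = 14.5$ |
| 9 | 20 | 10 | $\bar{x}_9 = 15$ |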

The sample variances are computed as shown below.

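$$s_1^2 = \frac{(9 - 9.5)^2 + (10 - 9.5)^2}{2 - 1} = 0.5$$

$$s_2^2 = \frac{(9 - 14.5)^2 + (20 - 14.5)^2}{2 - 1} = 60.5$$

$$s_3^2 = \frac{(10 - 15)^2 + (20 - 15)^2}{2 - 1} = 50.0$$

$$s_4^2 = \frac{(9 - 9)^2 + (9 - 9)^2}{2 - 1} = 0.0$$

$$s_5^2 = \frac{(10 - 10)^2 + (10 - 10)^2}{2 - 1} = 0.0$$

$$s_6^2 = \frac{(20 - 20)^2 + (20 - 20)^2}{2 - 1} = 0.0$$

$$s_7^2 = \frac{(10 - 9.5)^2 + (9 - 9.5)^2}{2 - 1} = 0.5$$

$$s_8^2 = \frac{(20 - 14.5)^2 + (9 - 14.5)^2}{2 - 1} = 60.5$$

$$s_9^2 = \frac{(20 - 15)^2 + (10 - 15)^2}{2 - 1} = 50.0$$

The mean of the nine sample variances is

$$\overline{s^2} = \frac{\sum_{i=1}^{9} s_i^2}{9} = \frac{222}{9} \approx 24.7$$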

Thus, the mean of the sample variances is 24.7.
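The enumeration in part (b) can be reproduced with a short Python sketch. It assumes that `itertools.product` is an acceptable way to list the nine ordered samples; the samples come out in a different order from the numbering used above, which does not affect the mean.

```python
from itertools import product
from statistics import mean, variance

population = [9, 10, 20]

# All ordered samples of size two drawn with replacement: 3 * 3 = 9.
samples = list(product(population, repeat=2))

# statistics.variance divides by n - 1 (the sample variance).
sample_variances = [variance(s) for s in samples]
mean_of_sample_variances = mean(sample_variances)

for s, v in zip(samples, sample_variances):
    print(s, v)
print(round(mean_of_sample_variances, 1))  # 24.7
```

The rounded mean matches the population variance found in part (a).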

05

Compute the variance of each sample using the population variance formula

(c)

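Treating each of the nine samples above as a population, compute its variance with the population variance formula (division by $n$), as shown below.

$$\sigma_1^2 = \frac{(9 - 9.5)^2 + (10 - 9.5)^2}{2} = 0.25$$

$$\sigma_2^2 = \frac{(9 - 14.5)^2 + (20 - 14.5)^2}{2} = 30.25$$

$$\sigma_3^2 = \frac{(10 - 15)^2 + (20 - 15)^2}{2} = 25.0$$

$$\sigma_4^2 = \frac{(9 - 9)^2 + (9 - 9)^2}{2} = 0.0$$

$$\sigma_5^2 = \frac{(10 - 10)^2 + (10 - 10)^2}{2} = 0.0$$

$$\sigma_6^2 = \frac{(20 - 20)^2 + (20 - 20)^2}{2} = 0.0$$

$$\sigma_7^2 = \frac{(10 - 9.5)^2 + (9 - 9.5)^2}{2} = 0.25$$

$$\sigma_8^2 = \frac{(20 - 14.5)^2 + (9 - 14.5)^2}{2} = 30.25$$

$$\sigma_9^2 = \frac{(20 - 15)^2 + (10 - 15)^2}{2} = 25.0$$

The mean of the nine population variances is

$$\overline{\sigma^2} = \frac{\sum_{i=1}^{9} \sigma_i^2}{9} = \frac{111}{9} \approx 12.3$$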

Thus, the mean of the population variances is 12.3.
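The part (c) figures can be checked the same way. This sketch swaps in `statistics.pvariance`, which divides by n, and then compares the resulting mean with the true population variance; the variable names are illustrative.

```python
from itertools import product
from statistics import mean, pvariance

population = [9, 10, 20]
samples = list(product(population, repeat=2))  # nine ordered samples

# Treat every sample as if it were a population (division by n).
variances_div_n = [pvariance(s) for s in samples]
mean_div_n = mean(variances_div_n)

true_variance = pvariance(population)

print(round(mean_div_n, 1))     # 12.3
print(round(true_variance, 1))  # 24.7
print(round(mean_div_n / true_variance, 2))  # 0.5
```

For samples of size two, division by n recovers only half of the population variance on average.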

06

Compare the results of parts (b) and (c)

(d)

Part (b) gives the better estimates. When the sample variance is computed with division by $n - 1$, the mean of the nine sample variances (24.7) equals the population variance, so $s^2$ is an unbiased estimator of $\sigma^2$.

Over all possible samples, the values of $s^2$ center on the population variance instead of falling systematically above or below it. With division by $n$, the mean of the nine variances is 12.3, which underestimates the population variance of 24.7. Therefore, use division by $n - 1$, not $n$, when computing the variance of a sample.
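The size of the underestimate follows from a standard result for independent draws, which applies here because sampling is with replacement. In this sketch, $\hat{\sigma}_n^2$ is a symbol introduced only for illustration (it is not used in the original solution) and denotes the variance of a sample computed with division by $n$:

$$E\left[\hat{\sigma}_n^2\right] = \frac{n - 1}{n}\,\sigma^2 = \frac{1}{2} \cdot \frac{74}{3} = \frac{37}{3} \approx 12.3, \qquad E\left[s^2\right] = \sigma^2 = \frac{74}{3} \approx 24.7$$

With $n = 2$, the factor $\frac{n - 1}{n}$ equals $\frac{1}{2}$, which matches the means found in parts (c) and (b).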

07

Explain if the sample standard deviation is an unbiased estimator of the population standard deviation

(e)

An unbiased estimator is a sample statistic whose values, averaged over all possible samples, equal the population parameter being estimated. In other words, the mean of its sampling distribution equals the parameter.

The standard deviations for the nine samples are calculated below:

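$$s_1 = \sqrt{s_1^2} = \sqrt{0.5} \approx 0.7$$

$$s_2 = \sqrt{s_2^2} = \sqrt{60.5} \approx 7.8$$

$$s_3 = \sqrt{s_3^2} = \sqrt{50.0} \approx 7.1$$

$$s_4 = \sqrt{s_4^2} = \sqrt{0.0} = 0.0$$

$$s_5 = \sqrt{s_5^2} = \sqrt{0.0} = 0.0$$

$$s_6 = \sqrt{s_6^2} = \sqrt{0.0} = 0.0$$

$$s_7 = \sqrt{s_7^2} = \sqrt{0.5} \approx 0.7$$

$$s_8 = \sqrt{s_8^2} = \sqrt{60.5} \approx 7.8$$

$$s_9 = \sqrt{s_9^2} = \sqrt{50.0} \approx 7.1$$

The mean of these nine sample standard deviations is

$$\bar{s} = \frac{\sum_{i=1}^{9} s_i}{9} \approx 3.5$$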

Therefore, the mean of the sample standard deviations is 3.5.

The population standard deviation is

$$\sigma = \sqrt{\sigma^2} = \sqrt{24.7} \approx 5.0$$

Thus, the value of the population standard deviation is 5.0.

Here, the mean of the sample standard deviations (3.5) is not equal to the population standard deviation (5.0); on average, the sample standard deviation underestimates it.

Therefore, the sample standard deviation s is not an unbiased estimator of the population standard deviation σ.
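The comparison in part (e) can be sketched in Python as well. The sketch assumes that `statistics.stdev` (square root of the divide-by-(n - 1) variance) and `statistics.pstdev` (square root of the divide-by-n variance) correspond to $s$ and $\sigma$ as used above.

```python
from itertools import product
from statistics import mean, pstdev, stdev

population = [9, 10, 20]
samples = list(product(population, repeat=2))  # nine ordered samples

# Sample standard deviation s for every possible sample.
sample_sds = [stdev(s) for s in samples]
mean_sd = mean(sample_sds)

# Population standard deviation sigma.
sigma = pstdev(population)

print(round(mean_sd, 1))  # 3.5
print(round(sigma, 1))    # 5.0
```

The unrounded values are about 3.46 and 4.97. The gap shows that taking the square root of an unbiased variance estimate does not give an unbiased estimate of the standard deviation.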

Most popular questions from this chapter

In Exercises 5–20, find the range, variance, and standard deviation for the given sample data. Include appropriate units (such as “minutes”) in your results. (The same data were used in Section 3-1, where we found measures of center. Here we find measures of variation.) Then answer the given questions.

California Smokers In the California Health Interview Survey, randomly selected adults are interviewed. One of the questions asks how many cigarettes are smoked per day, and results are listed below for 50 randomly selected respondents. How well do the results reflect the smoking behavior of California adults?

9 10 10 20 40 50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

In Exercises 13–16, use z scores to compare the given values.

Birth Weights Based on Data Set 4 “Births” in Appendix B, newborn males have weights with a mean of 3272.8 g and a standard deviation of 660.2 g. Newborn females have weights with a mean of 3037.1 g and a standard deviation of 706.3 g. Who has the weight that is more extreme relative to the group from which they came: a male who weighs 1500 g or a female who weighs 1500 g?

In Exercises 33–36, use the range rule of thumb to identify the limits separating values that are significantly low or significantly high.

Body Temperatures Based on Data Set 3 “Body Temperatures” in Appendix B, body temperatures of adults have a mean of 98.20°F and a standard deviation of 0.62°F. (The data from 12 AM on day 2 are used.) Is an adult body temperature of 100°F significantly low or significantly high?

In Exercises 37–40, refer to the frequency distribution in the given exercise and find the standard deviation by using the formula below, where x represents the class midpoint, f represents the class frequency, and n represents the total number of sample values. Also, compare the computed standard deviations to these standard deviations obtained by using Formula 3-4 with the original list of data values: (Exercise 37) 11.5 years; (Exercise 38) 8.9 years; (Exercise 39) 59.5; (Exercise 40) 65.4.

Standard deviation for frequency distribution

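$$s = \sqrt{\frac{n\left[\sum (f \cdot x^2)\right] - \left[\sum (f \cdot x)\right]^2}{n(n - 1)}}$$

| Blood Platelet Count of Males | Frequency |
|---|---|
| 0-99 | 1 |
| 100-199 | 51 |
| 200-299 | 90 |
| 300-399 | 10 |
| 400-499 | 0 |
| 500-599 | 0 |
| 600-699 | 1 |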

Chebyshev’s Theorem Based on Data Set 3 “Body Temperatures” in Appendix B, body temperatures of healthy adults have a bell-shaped distribution with a mean of 98.20°F and a standard deviation of 0.62°F (using the data from 12 AM on day 2). Using Chebyshev’s theorem, what do we know about the percentage of healthy adults with body temperatures that are within 2 standard deviations of the mean? What are the minimum and maximum body temperatures that are within 2 standard deviations of the mean?
