Benford’s Law. According to Benford’s law, a variety of different data sets include numbers with leading (first) digits that follow the distribution shown in the table below. In Exercises 21–24, test for goodness-of-fit with the distribution described by Benford’s law.

| Leading digit | Benford’s law: distribution of leading digits |
|---|---|
| 1 | 30.10% |
| 2 | 17.60% |
| 3 | 12.50% |
| 4 | 9.70% |
| 5 | 7.90% |
| 6 | 6.70% |
| 7 | 5.80% |
| 8 | 5.10% |
| 9 | 4.60% |
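After rounding, the percentages in this table agree with the usual closed form of Benford’s law, \(P(d) = \log_{10}(1 + 1/d)\) for \(d = 1, 2, \dots, 9\). The short Python sketch below is not part of the textbook solution, and the function name `benford_proportions` is our own. It computes these probabilities so they can be compared with the table.

```python
import math


def benford_proportions() -> dict[int, float]:
    """Return P(d) = log10(1 + 1/d) for leading digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_proportions()
for d, p in probs.items():
    # Rounded to 3 decimals, these match the table (0.301, 0.176, ..., 0.046).
    print(d, round(p, 3))
```

The nine probabilities sum to exactly 1 because the terms telescope: \(\sum_{d=1}^{9} \log_{10}\frac{d+1}{d} = \log_{10} 10 = 1\).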

Tax Cheating? Frequencies of leading digits from IRS tax files are 152, 89, 63, 48, 39, 40, 28, 25, and 27 (corresponding to the leading digits of 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively, based on data from Mark Nigrini, who provides software for Benford data analysis). Using a 0.05 significance level, test for goodness-of-fit with Benford’s law. Does it appear that the tax entries are legitimate?

Short Answer


There is not enough evidence to conclude that the observed frequencies of the leading digits are not the same as the frequencies expected from Benford’s law.

Yes, it appears that the tax entries are legitimate.

Step by step solution

01

Given information

The frequencies of leading digits 1 through 9 in IRS tax files are 152, 89, 63, 48, 39, 40, 28, 25, and 27, respectively. The test is conducted at the 0.05 significance level.

02

Check the requirements

Assume that the data come from a random sample.

Let O denote the observed frequencies of the leading digits.

The observed frequencies are listed below:

\(O_1 = 152,\; O_2 = 89,\; O_3 = 63,\; O_4 = 48,\; O_5 = 39,\; O_6 = 40,\; O_7 = 28,\; O_8 = 25,\; O_9 = 27\)

The sum of all observed frequencies is computed below:

\[\begin{aligned} n &= 152 + 89 + \dots + 27 \\ &= 511 \end{aligned}\]
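As a quick arithmetic check, the total can be reproduced in Python. This is a trivial sketch, and the variable names are illustrative:

```python
# Observed frequencies of leading digits 1 through 9 (from the exercise)
observed = [152, 89, 63, 48, 39, 40, 28, 25, 27]

n = sum(observed)  # total number of tax entries
print(n)  # 511
```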

Let E denote the expected frequencies.

Let \(p_i\) denote the expected proportion of the ith leading digit given by Benford’s law. The corresponding expected frequency is \(E_i = np_i\). These values are shown in the table below.

| Leading digit \(i\) | Benford’s law | Proportion \(p_i\) | Expected frequency \(E_i = np_i\) |
|---|---|---|---|
| 1 | 30.10% | \(p_1 = 0.301\) | \(E_1 = 511(0.301) = 153.811\) |
| 2 | 17.60% | \(p_2 = 0.176\) | \(E_2 = 511(0.176) = 89.936\) |
| 3 | 12.50% | \(p_3 = 0.125\) | \(E_3 = 511(0.125) = 63.875\) |
| 4 | 9.70% | \(p_4 = 0.097\) | \(E_4 = 511(0.097) = 49.567\) |
| 5 | 7.90% | \(p_5 = 0.079\) | \(E_5 = 511(0.079) = 40.369\) |
| 6 | 6.70% | \(p_6 = 0.067\) | \(E_6 = 511(0.067) = 34.237\) |
| 7 | 5.80% | \(p_7 = 0.058\) | \(E_7 = 511(0.058) = 29.638\) |
| 8 | 5.10% | \(p_8 = 0.051\) | \(E_8 = 511(0.051) = 26.061\) |
| 9 | 4.60% | \(p_9 = 0.046\) | \(E_9 = 511(0.046) = 23.506\) |

Since every expected frequency is at least 5 (the smallest is \(E_9 = 23.506\)), the requirements of the goodness-of-fit test are satisfied.
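The expected frequencies and the requirement check can be reproduced with a short Python sketch. It assumes the rounded Benford proportions from the table (0.301, 0.176, and so on) rather than the exact logarithms, so that the results match the hand calculations. The names `observed`, `benford`, and `expected` are illustrative.

```python
observed = [152, 89, 63, 48, 39, 40, 28, 25, 27]
# Rounded Benford proportions for digits 1..9, as listed in the table
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]

n = sum(observed)  # 511
# Expected frequency for digit i is E_i = n * p_i
expected = [n * p for p in benford]

for digit, e in enumerate(expected, start=1):
    print(digit, round(e, 3))

# Requirement for the goodness-of-fit test: every expected frequency is at least 5
requirement_met = all(e >= 5 for e in expected)
print(requirement_met)  # True
```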

03

State the hypotheses

The null and alternative hypotheses are as follows:

\(H_0\): The observed frequencies of leading digits are the same as the frequencies expected from Benford’s law.

\(H_1\): The observed frequencies of leading digits are not the same as the frequencies expected from Benford’s law.

04

Conduct the hypothesis test

The table below shows the necessary calculations:

| Leading digit | \(O\) | \(E\) | \(O - E\) | \(\frac{(O - E)^2}{E}\) |
|---|---|---|---|---|
| 1 | 152 | 153.811 | -1.811 | 0.021323 |
| 2 | 89 | 89.936 | -0.936 | 0.009741 |
| 3 | 63 | 63.875 | -0.875 | 0.011986 |
| 4 | 48 | 49.567 | -1.567 | 0.049539 |
| 5 | 39 | 40.369 | -1.369 | 0.046426 |
| 6 | 40 | 34.237 | 5.763 | 0.970067 |
| 7 | 28 | 29.638 | -1.638 | 0.090527 |
| 8 | 25 | 26.061 | -1.061 | 0.043196 |
| 9 | 27 | 23.506 | 3.494 | 0.519358 |

The value of the test statistic is:

\[\begin{aligned} \chi^2 &= \sum \frac{(O - E)^2}{E} \\ &= 0.021323 + 0.009741 + \dots + 0.519358 \\ &= 1.762163 \end{aligned}\]

Thus, \(\chi^2 = 1.762\).
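The test statistic \(\chi^2 = \sum (O - E)^2/E\) can be recomputed directly from the observed counts. This Python sketch again uses the rounded Benford proportions from the table. The helper `chi_square_statistic` is a hypothetical name, not a library function.

```python
def chi_square_statistic(observed, expected):
    """Return the list of (O - E)^2 / E components and their sum."""
    components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return components, sum(components)


observed = [152, 89, 63, 48, 39, 40, 28, 25, 27]
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
n = sum(observed)
expected = [n * p for p in benford]  # E_i = n * p_i

components, chi2 = chi_square_statistic(observed, expected)
print([round(c, 6) for c in components])  # per-digit contributions
print(round(chi2, 3))  # 1.762
```

Most of the statistic comes from digits 6 and 9, which are the only digits observed more often than Benford’s law predicts.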

Let \(k = 9\) be the number of categories (the leading digits 1 through 9).

The number of degrees of freedom for \(\chi^2\) is:

\[\begin{aligned} df &= k - 1 \\ &= 9 - 1 \\ &= 8 \end{aligned}\]

05

State the conclusion

From the chi-square table, the critical value of \(\chi^2\) at \(\alpha = 0.05\) with 8 degrees of freedom is 15.507.

The p-value is equal to 0.987.
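For an even number of degrees of freedom \(df = 2m\), the chi-square right-tail probability has the closed form \(P(\chi^2 > x) = e^{-x/2}\sum_{k=0}^{m-1} \frac{(x/2)^k}{k!}\). The Python sketch below uses that identity to reproduce both the p-value and the critical value without a statistics library. The identity is valid only for even df, which covers df = 8 here. The critical value is found by bisection. The helper names are our own. In practice, a library routine such as SciPy’s `scipy.stats.chi2` (its `sf` and `ppf` methods) should give the same numbers.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Right-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    m = df // 2
    # Poisson-sum identity: P(X > x) = exp(-x/2) * sum_{k<m} (x/2)^k / k!
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(m))


def chi2_critical_even_df(alpha: float, df: int) -> float:
    """Value c with P(X > c) = alpha, found by bisection (the tail shrinks as x grows)."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid  # tail area still too large, so c lies further right
        else:
            hi = mid
    return (lo + hi) / 2


k = 9       # number of categories (leading digits 1 through 9)
df = k - 1  # 8
chi2_stat = 1.762163
p_value = chi2_sf_even_df(chi2_stat, df)
critical = chi2_critical_even_df(0.05, df)
print(round(p_value, 3), round(critical, 3))  # 0.987 15.507
```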

The test statistic (1.762) is less than the critical value (15.507), and the p-value (0.987) is greater than 0.05. Therefore, we fail to reject the null hypothesis.
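This goodness-of-fit test is right-tailed, so the critical-value rule and the p-value rule lead to the same decision, apart from rounding at the boundary. The minimal sketch below applies both rules to the values from this solution. The helper `decide` is hypothetical.

```python
def decide(stat: float, critical: float, p_value: float, alpha: float = 0.05):
    """Return the decision under the critical-value rule and under the p-value rule."""
    # Right-tailed test: reject when the statistic falls in the upper critical region
    by_critical = "reject H0" if stat >= critical else "fail to reject H0"
    # Equivalent rule: reject when the p-value is at or below the significance level
    by_p_value = "reject H0" if p_value <= alpha else "fail to reject H0"
    return by_critical, by_p_value


print(decide(stat=1.762, critical=15.507, p_value=0.987))
# ('fail to reject H0', 'fail to reject H0')
```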

There is not enough evidence to conclude that the observed frequencies of the leading digits are not the same as the frequencies expected from Benford’s law.

Yes, it appears that the tax entries are legitimate.


