Chapter 34: Explaining Benford's Law

Solving Mystery #2

The second mystery is: Why does one set of numbers follow Benford's law, while another set of numbers does not? Again we can answer this question by examining Fig. 34-5. Our goal is to find the characteristics of pdf(g) that result in ost(g) having a constant value of 0.301. As shown above, the average value of ost(g) will always be 0.301, regardless of whether Benford's law is being followed. So our only concern is whether ost(g) has oscillations or is a flat line.

For ost(g) to be a flat line, it must have no sinusoidal components. In the frequency domain this means that OST(f) must be equal to zero at all frequencies above f=0. However, OST(f) is equal to SF(f) × PDF(f), and SF(f) is nonzero only at the integer frequencies, f = 0, 1, 2, 3, 4, and so on. Therefore, ost(g) will be flat if and only if PDF(f) has a value of zero at the nonzero integer frequencies. The particular example in Fig. 34-5 clearly does not meet this condition, and therefore does not follow Benford's law. In Fig. 34-5d, PDF(1) has a value of 0.349. Multiplying this by the value of SF(1) = 0.516, we find OST(1) = 0.18. Therefore, ost(g) has a sinusoidal component with a period of one and an amplitude of 0.18. This is a key result, describing the criterion a distribution must meet to follow Benford's law. It is important enough that we will express it as a theorem.
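The arithmetic in this paragraph can be checked with a short calculation. This is a minimal sketch, assuming that sf(g) is a unit-height pulse of width log10(2) repeating with a period of one on the log scale, so that the amplitude of its harmonic at integer frequency k is 2·sin(πk·log10(2))/(πk). That form of sf(g) and the function name sf_amplitude are assumptions made for illustration; the value PDF(1) = 0.349 is simply copied from the text.

```python
import math

def sf_amplitude(k, base=10.0):
    """Amplitude of the k-th harmonic of a period-one pulse train whose
    pulses have unit height and width log_base(2) (assumed form of sf(g))."""
    width = math.log(2.0, base)
    return abs(2.0 * math.sin(math.pi * k * width) / (math.pi * k))

sf1 = sf_amplitude(1)   # amplitude of SF(f) at f = 1
pdf1 = 0.349            # value of PDF(1) quoted in the text for Fig. 34-5d
ost1 = sf1 * pdf1       # amplitude of the period-one component of ost(g)

print(f"SF(1) = {sf1:.3f}, OST(1) = {ost1:.2f}")
```

Under these assumptions the product reproduces the amplitude of 0.18 stated above.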

Benford's Law Compliance Theorem
Let P be a random process generating numbers in base B on the linear number line, pdf(g) its probability density function expressed on the base B logarithmic number line, and PDF(f) the Fourier transform of pdf(g). The numbers generated by P will follow Benford's law, if and only if, PDF(f) = 0 at all nonzero integer frequencies.
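The condition in the theorem can be tested numerically for any candidate pdf(g). The sketch below is one possible way to do this, not the method used to produce the figures: it approximates PDF(f) with a trapezoidal sum and reports the largest magnitude found at the first few nonzero integer frequencies. The function names, the integration limits, and the standard deviation of 0.1 used for the narrow case are all hypothetical choices made for illustration.

```python
import cmath
import math

def fourier_magnitude(pdf, f, lo=-10.0, hi=10.0, n=20001):
    """Approximate |PDF(f)| = |integral of pdf(g) * exp(-j*2*pi*f*g) dg|
    with the trapezoidal rule on the interval [lo, hi]."""
    step = (hi - lo) / (n - 1)
    total = 0j
    for i in range(n):
        g = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * pdf(g) * cmath.exp(-2j * math.pi * f * g)
    return abs(total * step)

def gaussian_pdf(sigma, mean=0.0):
    """Normal density on the log scale with the given standard deviation."""
    def pdf(g):
        z = (g - mean) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf

def max_integer_harmonic(pdf, max_k=5):
    """Largest |PDF(k)| over k = 1..max_k; near zero means the theorem's
    condition is (approximately) met at those frequencies."""
    return max(fourier_magnitude(pdf, k) for k in range(1, max_k + 1))

wide = max_integer_harmonic(gaussian_pdf(0.5))    # wide on the log scale
narrow = max_integer_harmonic(gaussian_pdf(0.1))  # hypothetical narrow case
print(f"wide: {wide:.4f}   narrow: {narrow:.4f}")
```

A result close to zero suggests the distribution follows Benford's law closely; a large result indicates a strong oscillation in ost(g).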

Our next step is to examine what types of distributions comply with this theorem. There are two distinct ways that PDF(f) can have a value of zero at the nonzero integer frequencies. As shown in Fig. 34-6b, PDF(f) can be oscillatory, periodically hitting zero at frequencies that include the integers. In the logarithmic domain this corresponds to two or more discontinuities spaced an integer distance apart, such as sharp edges or abrupt changes in the slope. Figure 34-6a shows an example of this, a rectangular pulse with edges at -1 and 1. These discontinuities can easily be created by human manipulation, but seldom occur in natural or unforced processes. This type of distribution does follow Benford's law, but it is mainly just a footnote, not the bulk of the mystery.
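The oscillatory case can be illustrated with the closed-form transform of a rectangular pulse. The sketch below assumes a unit-area pulse centered at zero, whose Fourier transform is sin(πf·width)/(πf·width). A width of 2 matches the pulse with edges at -1 and 1; the width of 1.5 is a hypothetical counterexample whose edges are not an integer distance apart. The function name is likewise hypothetical.

```python
import math

def uniform_pdf_transform(f, width):
    """Fourier transform of a unit-area rectangular pulse of the given width
    centered at zero: sin(pi*f*width) / (pi*f*width), with value 1 at f = 0."""
    x = math.pi * f * width
    return 1.0 if x == 0.0 else math.sin(x) / x

for width in (2.0, 1.5):
    # Magnitude of the transform at the first few nonzero integer frequencies
    magnitudes = [abs(uniform_pdf_transform(k, width)) for k in (1, 2, 3)]
    print(width, [f"{m:.3f}" for m in magnitudes])
```

With a width of 2 the zeros of the transform land on every integer frequency, so the theorem's condition is met. With a width of 1.5 the transform is clearly nonzero at f = 1, so it is not.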

Figure 34-6d shows a far more important situation, where PDF(f) smoothly decreases in value with increasing frequency. This behavior is more than common; it is the rule. It is what you would find for almost any set of random numbers you examine. The key parameter we want to examine is how fast the curve drops to zero. For instance, the curve in Fig. 34-6d drops so rapidly that it has a negligible value at f=1 and all higher frequencies. Therefore, this distribution will follow Benford's law to a very high degree. Now compare this with Fig. 34-5d, an example where PDF(f) drops much more slowly. Since it has a significant value at f=1, this distribution follows Benford's law very poorly.

Now look at pdf(g) for the above two examples, Figs. 34-6c and 34-5a. Both of these are normal distributions on the logarithmic scale; the only difference between them is their width. A key property of the Fourier transform is the compression/expansion between the domains. If you need to refresh your memory, look at Figure 10-12 in Chapter 10. In short, if the signal in one domain is made narrower, the signal in the other domain will become wider, and vice versa. For example, in Fig. 34-5a the standard deviation of pdf(g) is σg = 0.25. This results in PDF(f) having a standard deviation of σf = 1/(2πσg) = 0.637. In Fig. 34-6c, pdf(g) is twice as wide, σg = 0.50, making PDF(f) half as wide, σf = 0.318. In these figures the width of the distribution is indicated as 2σ, that is, from -σ to σ. This is common, but certainly not the only way to measure the width.
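The two standard deviations quoted above follow directly from the relation σf = 1/(2πσg). A minimal sketch of the calculation, with a hypothetical helper name:

```python
import math

def sigma_f(sigma_g):
    """Standard deviation of PDF(f) for a normal pdf(g) with standard
    deviation sigma_g, using sigma_f = 1 / (2 * pi * sigma_g)."""
    return 1.0 / (2.0 * math.pi * sigma_g)

for sigma_g in (0.25, 0.50):
    print(f"sigma_g = {sigma_g:.2f} -> sigma_f = {sigma_f(sigma_g):.3f}")
```

Doubling σg halves σf, which is the compression/expansion property at work.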

In short, if pdf(g) is narrow, then PDF(f) will be wide. This results in PDF(f) having a significant amplitude at f=1, and possibly at higher frequencies. Therefore, the distribution will not follow Benford's law. However, if pdf(g) is wide, then PDF(f) will be narrow. This results in PDF(f) falling near zero before f=1, and Benford's law is followed.
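This narrow-versus-wide behavior can also be checked directly, without the Fourier transform. The sketch below assumes base 10 and a normal pdf(g), and computes the exact probability that a generated number has a leading digit of 1, which happens when the logarithm falls between k and k + log10(2) for some integer k. Sliding the mean across one unit of the log scale corresponds to multiplying all the numbers by a constant. The function names and the range of decades summed over are hypothetical choices.

```python
import math

def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def leading_one_fraction(mean, sigma, decades=range(-20, 21)):
    """Probability that 10**G has leading digit 1 when G ~ Normal(mean, sigma).
    Leading digit 1 means k <= G < k + log10(2) for some integer k."""
    log2 = math.log10(2.0)
    return sum(normal_cdf(k + log2, mean, sigma) - normal_cdf(k, mean, sigma)
               for k in decades)

def spread(sigma, steps=100):
    """Smallest and largest leading-one fraction as the mean of pdf(g)
    slides across one unit of the log scale."""
    fractions = [leading_one_fraction(i / steps, sigma) for i in range(steps)]
    return min(fractions), max(fractions)

for sigma in (0.25, 0.50):
    low, high = spread(sigma)
    print(f"sigma_g = {sigma:.2f}: fraction ranges from {low:.3f} to {high:.3f}")
```

For the wider distribution the fraction stays very close to 0.301 wherever the mean sits, while for the narrower one it swings substantially above and below that value.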

A key issue is how wide or narrow pdf(g) needs to be to toggle between the two behaviors. To follow Benford's law, PDF(f) must drop to near zero by f=1. Further, f=1 in the frequency domain corresponds to a sinusoid with a period of one on the log scale, making this the critical distance. This gives us the answer to our question. With a few caveats, Benford's law is followed by distributions that are wide compared with unit distance along the logarithmic scale. Likewise, the law is not followed by distributions that are narrow compared with unit distance.

To be clear, one exception occurs when PDF(f) is oscillatory, such as in Fig. 34-6b. The other exception is when PDF(f) does not smoothly decrease in value with increasing frequency. Also, the definition of "width" used here is slightly fuzzy. We will improve upon this in the next section. However, these are minor issues and details; do not let them distract from your understanding of the mainstream phenomenon.
