Wednesday, January 11, 2012

Scale invariance and the scaling laws of Zipf and Benford

Scaling laws have been empirically observed in the size-distributions of parameters of complex systems, including (but not limited to): 1) incomes; 2) personal wealth; 3) cities (both population and area); 4) earthquakes, both locally and globally; 5) avalanches; 6) forest fires; 7) mineral deposits; and 8) market returns. Several years ago one of my students showed that various measures of the magnitude of terrorist attacks also obey scaling laws.

The general prevalence of scale invariance in geological phenomena is the reason for one of the first rules taught to all geology students--every picture must have a scale. The reason for this is that there is no characteristic scale for many geological phenomena--so one cannot tell without some sort of visual cue whether that photo of folded rocks is a satellite photo or one taken through a microscope--whereas one can make such a distinction about a picture of, say, a moose.

Numerous empirical laws (by which I mean equations) have been developed to describe the size-distribution of scale-invariant phenomena. Most of these empirical laws were developed before the idea of scale invariance was well understood. One famous example is the Gutenberg-Richter law, which describes the size distribution of earthquakes.

Another statistical law, Zipf's Law, describes the relationship between size and rank. For cities, for instance, the largest city in a country will tend to have twice the population of the second-largest city and three times the population of the third. More formally, the relationship is stated as follows:

$$y = C\,r^{-1/k}$$

for a distribution where C is the magnitude of the largest individual in the population, y is the magnitude of an individual with rank r, and k is a constant which characterizes the system--but is commonly about 1.

Taking logarithms gives log y = log C - (1/k) log r, so if we plot size against rank on a log-log plot, the graph should approximate a straight line with a slope of -1/k.
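As a minimal sketch of this relationship, assuming it takes the form y = C r^(-1/k) (which is consistent with a log-log slope of -1/k), the predicted size at each rank and the fitted slope can be computed as below. The function names and the example figures are mine and purely illustrative.

```python
import math

def zipf_size(C, r, k=1.0):
    """Predicted size of the individual with rank r, assuming y = C * r**(-1/k)."""
    return C * r ** (-1.0 / k)

def loglog_slope(sizes):
    """Least-squares slope of log10(size) against log10(rank).

    `sizes` must be ordered from largest to smallest; rank 1 is the largest.
    """
    xs = [math.log10(r) for r in range(1, len(sizes) + 1)]
    ys = [math.log10(s) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical example: a largest city of 8 million people and k = 1.
sizes = [zipf_size(8_000_000, r, k=1.0) for r in range(1, 11)]
slope = loglog_slope(sizes)  # close to -1/k
```

With k = 1 the second-ranked city comes out at half the largest and the third at one third, matching the verbal description of Zipf's Law.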

For instance, a plot of city size vs rank for US cities appears as follows:


Data sourced here.

From the same data source we find a similar relationship when city size is determined from area rather than population:


In the first plot we obtain a value for k very close to 1. The plot where cities are ranked by area is not as clear, but this may be due to the arbitrary nature of city limits. To characterize either of the above plots by Zipf's law is fairly straightforward--draw the straight line from the top-ranked city that best follows the line of observations.
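I draw that line by eye, but the same idea can be sketched numerically. The following is one possible approach, not the method used for the plots above: a least-squares line in log-log space that is forced to pass through the top-ranked point. The function name `fit_k_through_top` and the test data are hypothetical.

```python
import math

def fit_k_through_top(sizes):
    """Estimate k from a log-log line forced through the top-ranked point.

    Assumes y = C * r**(-1/k), with C taken as the largest observed size.
    Needs at least two sizes, and the fitted slope must not be zero.
    """
    sizes = sorted(sizes, reverse=True)
    log_c = math.log10(sizes[0])
    num = 0.0
    den = 0.0
    for rank, size in enumerate(sizes[1:], start=2):
        x = math.log10(rank)
        num += x * (math.log10(size) - log_c)
        den += x * x
    slope = num / den  # slope of the line through (log 1, log C)
    return -1.0 / slope

# Hypothetical data that follow Zipf's Law exactly with k = 1.
example = [1000.0 / r for r in range(1, 21)]
k_est = fit_k_through_top(example)
```

Anchoring the line at rank 1 means the estimate of k inherits any error in the size of the largest individual.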

A recent article published in Economic Geology argues that mines in Australia follow Zipf's Law. In summary, not only do the known deposits in Australian greenstone belts follow Zipf's Law fairly closely, but early estimates of as-yet-undiscovered gold, projected from early Zipf's Law characterizations, compared favourably with the amount of gold eventually discovered.

The weakness that I see with this approach is that it is all rather strongly dependent on the estimates of the size of the largest deposit. In any given area, it will be true that the largest known deposit will be well studied, but history has shown us that mines can be "mined out" only to be rejuvenated by a new geological or mining idea.
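To illustrate that dependence, here is a rough sketch that assumes y = C r^(-1/k) with k = 1 and a hypothetical number of deposits n. Neither the function name nor the figures come from the article. The projected total is C multiplied by a sum that does not involve C, so any error in the size of the largest deposit carries through to the total in the same proportion.

```python
def zipf_total(C, n, k=1.0):
    """Total size of the n largest deposits, assuming y = C * r**(-1/k)."""
    return sum(C * r ** (-1.0 / k) for r in range(1, n + 1))

# Hypothetical figures, for illustration only (units arbitrary).
base = zipf_total(C=10.0, n=50)
revised = zipf_total(C=12.0, n=50)  # largest deposit revised upward by 20%
ratio = revised / base              # the projected total rises by the same 20%
```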

I am unable to reconcile the size-distribution data from the Nevada mineral properties presented recently with Zipf's Law, although they do seem to follow some sort of power law.


Using the straightforward approach to a Zipf's Law characterization gives us the red line, which appears to show that there have been far too many gold deposits of more than 1/2 million ounces, given the size of the largest mine. To reconcile the known gold discoveries with Zipf's Law (green line), someone would need to find a 100-million-ounce deposit (if that doesn't get explorers interested in Nevada, I don't know what will)!
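The reasoning behind the green line can be sketched by running Zipf's Law backwards: if y = C r^(-1/k), then a deposit of size y at rank r implies a top-ranked deposit of C = y r^(1/k). The numbers below are made up to show the arithmetic and are not the Nevada data.

```python
def implied_largest(size, rank, k=1.0):
    """Size of the rank-1 deposit implied by one observation, assuming y = C * r**(-1/k)."""
    return size * rank ** (1.0 / k)

# Hypothetical observation: a 2-million-ounce deposit sitting at rank 50, with k = 1.
c_implied = implied_largest(2.0e6, 50)
```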

I, however, would prefer to use the interpretation of the above data developed in our last installment--that there is a power-law relationship between size and rank, but this relationship breaks down for the largest deposits because there is some sort of limit to the size of gold deposits (at least near the Earth's surface), although I do not know what the limiting factor(s) would be.

Another scaling law is Benford's Law, which is an empirical observation that the first digits of measurements of many kinds of phenomena are not uniformly distributed. In particular, the first digit is a '1' approximately 30% of the time, '2' 18% of the time, '3' 12% of the time, and so on, with the probability descending as the digit increases.

First digit  Probability of occurrence

1            0.30103
2            0.176091
3            0.124939
4            0.09691
5            0.0791812
6            0.0669468
7            0.0579919
8            0.0511525
9            0.0457575
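The values in this table match the usual statement of Benford's Law, in which the probability of a first digit d is log10(1 + 1/d). A short sketch that regenerates them (the function name is my own):

```python
import math

def benford_probability(d):
    """Probability that the first significant digit is d (1 to 9) under Benford's Law."""
    return math.log10(1.0 + 1.0 / d)

table = {d: benford_probability(d) for d in range(1, 10)}
for d, p in table.items():
    print(f"{d}    {p:.6f}")
```

The nine probabilities sum to 1, because the terms telescope to log10(10).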


So if you had a table of the lengths of every river in the world, for instance, you would find that approximately 30% of the first digits were '1'--rivers with lengths of 1 904 km, 161 km, or 11 km would fall into this category.

Furthermore, it doesn't matter what units you use--if you had measured the river lengths in inches, you would observe the same relationship. The reason for this is that if you were to double a number which begins with '1', you end up with a number which begins with either '2' or '3'. Hence, the probability that the first digit is either '2' or '3' must be the same as the probability of it being '1'. In the table above, we see this is the case.
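The unit-independence can be checked with a small experiment. The sketch below uses synthetic "lengths" spread evenly in log space over six decades, which is an assumption chosen because such data follow Benford's Law closely; it is not real river data. The helper names are mine.

```python
import math
from collections import Counter

def first_digit(x):
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])

def digit_frequencies(values):
    """Fraction of values whose first significant digit is 1, 2, ..., 9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

# Synthetic lengths from 1 km up to (but not including) 1 000 000 km.
n = 60_000
lengths_km = [10 ** (6 * i / n) for i in range(n)]
lengths_in = [km * 39_370.08 for km in lengths_km]  # approx. inches per kilometre

freq_km = digit_frequencies(lengths_km)
freq_in = digit_frequencies(lengths_in)

# Share of leading 1s in each unit, next to the Benford value log10(2).
print(freq_km[1], freq_in[1], math.log10(2))
```

In both units the share of leading 1s sits close to 0.301, and the shares of leading 2s and 3s together add up to roughly the same figure.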

It isn't only natural phenomena that are characterized by Benford's Law. It has also been used as a tool to identify fraud in forensic accounting.

The deposit-size data from Nevada seem to conform to Benford's Law.
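One way to put a number on "seem to conform" is a Pearson chi-square statistic comparing the observed first-digit counts with the Benford expectation. This is a sketch of a standard test, not the comparison I made for the plots, and the example inputs are hypothetical stand-ins rather than the Nevada data.

```python
import math
from collections import Counter

def benford_chi_square(values):
    """Pearson chi-square statistic of observed first digits against Benford's Law."""
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat

# Hypothetical inputs: powers of 2 track Benford's Law closely; 1..999 do not.
benford_like = benford_chi_square([2 ** i for i in range(1, 201)])
uniform_like = benford_chi_square(range(1, 1000))
```

With nine digit classes there are 8 degrees of freedom, for which the 5% critical value is about 15.5; a statistic below that gives no grounds to reject Benford's Law. The same function can be rerun after multiplying every value by a unit-conversion factor.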


And if I convert the deposit size from ounces to metric tonnes . . .


So although Zipf's Law doesn't describe Nevada gold deposits well (at least at present), Benford's Law does.
