Attention conservation notice: Academics with blogs quibbling about obscure corners of applied statistics.
Lurkers in e-mail point me to this pushback against the general pushback against power laws, and ask me to comment. It might be a mistake to do so, but I'm feeling under the weather and so splenetic, so I will.
In our paper, we looked at 24 quantities which people claimed showed power law distributions. Of these, there were seven cases where we could flat-out reject a power law, without even having to consider an alternative, because the departures of the actual distribution from even the best-fitting power law were much too large to be explained away as fluctuations. (One of the wonderful things about a stochastic model is that it tells you how big its own errors should be.) In contrast, there was only one data set where we could rule out the log-normal distribution.
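The "how big its own errors should be" step can be made concrete. What follows is a minimal sketch, assuming a continuous power law above a known $x_{\min}$. It fits the exponent $a$ (called `alpha` in the code) by maximum likelihood, $\hat{a} = 1 + n / \sum_i \ln(x_i/x_{\min})$. It then measures the Kolmogorov-Smirnov distance between the data and the fitted law, and finally asks how often samples drawn from the fitted law itself land at least that far off. To our understanding the published procedure is more elaborate, for instance re-estimating $x_{\min}$ inside the bootstrap. This sketch holds $x_{\min}$ fixed, and the function names and bootstrap size are hypothetical.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Continuous power-law MLE for the exponent: 1 + n / sum(ln(x / xmin))."""
    n = len(tail)
    return 1.0 + n / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, xmin, alpha):
    """Largest gap between the tail's empirical CDF and the fitted power-law CDF."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d


def sample_power_law(n, xmin, alpha, rng):
    """Inverse-CDF sampling: x = xmin * (1 - u) ** (-1 / (alpha - 1))."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def gof_pvalue(tail, xmin, n_boot=200, seed=0):
    """Return (alpha, observed KS distance, p), where p is the fraction of
    synthetic samples from the fitted law whose own refit is at least as bad."""
    rng = random.Random(seed)
    alpha = fit_alpha(tail, xmin)
    d_obs = ks_distance(tail, xmin, alpha)
    at_least_as_bad = 0
    for _ in range(n_boot):
        fake = sample_power_law(len(tail), xmin, alpha, rng)
        if ks_distance(fake, xmin, fit_alpha(fake, xmin)) >= d_obs:
            at_least_as_bad += 1
    return alpha, d_obs, at_least_as_bad / n_boot
```

A small p means that even the best-fitting power law misses the data by more than its own sampling noise would explain. For example, a tail that is really a shifted exponential, `[1 + rng.expovariate(1.0) for _ in range(1000)]` with `xmin=1`, comes out with p near zero.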
In some of those cases, you can patch things up, sort of, by replacing a pure power law with a power law with an exponential cut-off. That is, rather than the probability density being proportional to $x^{-a}$, it's proportional to $x^{-a} e^{-x/L}$. (Either way, I am only talking about the probability density in the "right tail", i.e., for x above some $x_{\min}$.) This gives the infamous straight-ish patch on a log-log plot, for values of x much smaller than L, but otherwise it has substantially different properties. In ten of the twelve cases we looked at, the only way to save the idea of a power law at all is to include this exponential cut-off. But that exponentially shrinking factor is precisely what squelches the WTF, X IS ELEVENTY TIMES LARGER THAN EVER! THE BIG ONE IS IN OUR BASE KILLING OUR DOODZ!!!!1!! mega-events. There were ten more cases where we judged the support for power laws as "moderate", meaning "the power law is a good fit but that there are other plausible alternatives as well" (pardon the self-quotation). Again, those alternatives, like log-normals and stretched exponentials, give very different tail-behavior, with not so much OMG DOOM.
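The difference between the two forms is easy to see numerically. The local slope of $\log p$ against $\log x$ for the cut-off form is $-a - x/L$. That slope is close to $-a$ for $x \ll L$, which is where the straight-ish patch comes from. The ratio $p(kx)/p(x)$, meaning how much rarer an event $k$ times bigger is, equals $k^{-a}$ at every scale for the pure power law. The cut-off multiplies this by $e^{-(k-1)x/L}$. The sketch below uses made-up values $a = 2$, $L = 100$ and $k = 10$. These numbers and all the function names are hypothetical, chosen only for illustration.

```python
import math


def pure_density(x, a):
    """Unnormalized pure power-law density x**(-a), for x above xmin."""
    return x ** (-a)


def cutoff_density(x, a, L):
    """Unnormalized power law with exponential cut-off, x**(-a) * exp(-x / L)."""
    return x ** (-a) * math.exp(-x / L)


def pure_ratio(x, k, a):
    """p(k*x) / p(x) for the pure power law; normalizing constants cancel."""
    return pure_density(k * x, a) / pure_density(x, a)


def cutoff_ratio(x, k, a, L):
    """p(k*x) / p(x) with the cut-off: k**(-a) * exp(-(k - 1) * x / L)."""
    return cutoff_density(k * x, a, L) / cutoff_density(x, a, L)


def local_slope(x, a, L):
    """d log p / d log x for the cut-off form: close to -a only when x << L."""
    return -a - x / L


if __name__ == "__main__":
    a, L, k = 2.0, 100.0, 10.0  # hypothetical parameter values
    for x in (1.0, 10.0, 100.0, 1000.0):
        print(f"x={x:6.0f}  slope={local_slope(x, a, L):7.2f}  "
              f"pure={pure_ratio(x, k, a):.2e}  cut-off={cutoff_ratio(x, k, a, L):.2e}")
```

At x = 100, for instance, a tenfold-larger event is 100 times rarer under the pure power law. Under the cut-off it is rarer by a further factor of $e^{9}$, so the ratio is $0.01\,e^{-9} \approx 1.2 \times 10^{-6}$.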
We found exactly one case where the statistical evidence for the power-law was "good", meaning that "the power law is a good fit and that none of the alternatives considered is plausible", which was Zipf's law of word frequency distributions. We were of course aware that when people claim there are power laws, they usually only mean that the tail follows a power law. This is why all these comparisons were about how well the different distributions fit the tail, excluding the body of the data. We even selected where "the tail" begins to maximize the fit to a power law for each case. Even so, there was just this one case where the data compellingly support a power-law tail.
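Choosing where "the tail" begins can also be automated. The sketch below makes two assumptions: that the tail is a continuous power law, and (as we understand the method) that $x_{\min}$ is chosen to minimize the Kolmogorov-Smirnov distance between the data above it and the power law fitted to them. `choose_xmin` and `min_tail` are hypothetical names. The floor on tail size is an added safeguard against "fitting" a handful of points.

```python
import math
import random


def choose_xmin(data, min_tail=50):
    """Return (xmin, alpha, ks, n_tail) for the candidate xmin whose power-law
    fit to the points x >= xmin has the smallest Kolmogorov-Smirnov distance,
    or None if there are fewer than min_tail positive points."""
    xs = sorted(x for x in data if x > 0)
    best = None
    for start in range(len(xs) - min_tail + 1):
        xmin = xs[start]
        if start > 0 and xs[start - 1] == xmin:
            continue  # tied value: this xmin (with its full tail) was already tried
        tail = xs[start:]  # already sorted, every element >= xmin
        n = len(tail)
        log_sum = sum(math.log(x / xmin) for x in tail)
        if log_sum == 0.0:
            continue  # all tail values equal xmin, so the exponent is undefined
        alpha = 1.0 + n / log_sum  # continuous power-law MLE
        ks = 0.0
        for i, x in enumerate(tail):
            model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
            ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
        if best is None or ks < best[2]:
            best = (xmin, alpha, ks, n)
    return best


if __name__ == "__main__":
    rng = random.Random(7)
    body = [rng.uniform(0.1, 1.0) for _ in range(500)]  # not a power law
    tail = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(500)]  # a = 2.5 above 1
    print(choose_xmin(body + tail, min_tail=100))
```

Each candidate costs one pass over its tail, so this exhaustive scan is quadratic in the sample size. For large data sets one would scan a coarser grid of candidate values instead.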
(All of this — the meaning of "with cut-off", the meaning of our categorizations, the fact that we only compare the tails, etc. — is clear enough from our paper, if you actually read the text. Or even just the tables and their captions.)
I bring up the OMG DOOM because some people, Hanson very much included, like to extrapolate from supposed power laws for various Bad Things to scenarios where THE BIG ONE kills off most of humanity. But, at least with the data we found, the magnitudes of forest fires, solar flares, earthquakes and wars were all better fit by log-normals, by stretched exponentials and by cut-off power laws than by power laws. For fires, flares and quakes, the differences are large enough that they clearly fall into the "with cut-off only" category. The differences in fits for the war-death data are smaller, as (mercifully) is the sample size, so we put it in the "moderate" support category. If you had some other compelling reason to insist on a power law rather than (e.g.) a log-normal there, the data wouldn't slap you down, but they wouldn't back you up either.
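"Better fit" here means a head-to-head comparison on the tail. To our understanding, it is a normalized log-likelihood ratio test in the style of Vuong. What follows is a minimal sketch of that test. To stay short and dependency-free, it pits the power law against an exponential tail, which is not one of the alternatives named above but has a closed-form maximum-likelihood fit. The log-normal and stretched exponential, once truncated at $x_{\min}$, need numerical optimization. All function names are hypothetical.

```python
import math
import random


def loglik_power_law(tail, xmin):
    """Pointwise log-density of the MLE power law (a-1)/xmin * (x/xmin)**(-a)."""
    a = 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)
    return [math.log(a - 1.0) - math.log(xmin) - a * math.log(x / xmin) for x in tail]


def loglik_exponential(tail, xmin):
    """Pointwise log-density of the MLE exponential lam * exp(-lam * (x - xmin))."""
    lam = 1.0 / (sum(tail) / len(tail) - xmin)
    return [math.log(lam) - lam * (x - xmin) for x in tail]


def vuong(ll_a, ll_b):
    """Return (R, z, p): total log-likelihood ratio, its normalized version,
    and the two-sided p-value under a standard normal approximation."""
    n = len(ll_a)
    diffs = [u - v for u, v in zip(ll_a, ll_b)]
    r = sum(diffs)
    mean = r / n
    var = sum((d - mean) ** 2 for d in diffs) / n
    if var == 0.0:
        return r, 0.0, 1.0  # identical fits: no evidence either way
    z = r / math.sqrt(n * var)
    return r, z, math.erfc(abs(z) / math.sqrt(2.0))


def compare_tails(tail, xmin):
    """Positive R and z favor the power law; negative favor the exponential."""
    return vuong(loglik_power_law(tail, xmin), loglik_exponential(tail, xmin))


if __name__ == "__main__":
    rng = random.Random(11)
    pareto = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(2000)]
    expo = [1.0 + rng.expovariate(1.0) for _ in range(2000)]
    print("power-law data:", compare_tails(pareto, 1.0))
    print("exponential data:", compare_tails(expo, 1.0))
```

The sign of R says which model fits the tail better, and p says whether that sign could plausibly be noise. With a small sample, R can point one way while p stays large. That is the "data wouldn't slap you down, but they wouldn't back you up either" situation.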
Now, I relish the schadenfreude-laden flavors of a mega-disaster scenario as much as the next misanthropic, science-fiction-loving geek, especially when it's paired with some "The fools! Can't they follow simple math?" on the side. Truly, I do. But squeezing that savory, juicy DOOM out of (for instance) the distribution of solar flares relies on the shape of the tail, i.e., whether it's a pure power law or not. The weak support, in the data, for such power laws means you don't really have empirical evidence for your scenarios, and in some cases what evidence there is tells against them. It's a free country, so you can go on telling those stories, but don't pretend that they owe more to confronting hard truths than to literary traditions.
Posted at February 15, 2012 14:00