September 14, 2013

"Bayes and Big Data" (Next Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you care a lot about computational statistics.

For our first seminar of the year, we are very pleased to have a talk which will combine two themes close to the heart of the statistics department:

Steve Scott, "Bayes and Big Data"
Abstract: A useful definition of "big data" is data that is too big to fit on a single machine, whether because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory or disk bottlenecks can only be alleviated by splitting "big data" across multiple machines. Communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each machine, and then averaging the individual Monte Carlo draws. Depending on the model, the resulting draws can be nearly indistinguishable from the draws that would have been obtained by running a single-machine algorithm for a very long time. Examples of consensus Monte Carlo will be shown for simple models where single-machine solutions are available, for large single-layer hierarchical models, and for Bayesian additive regression trees (BART).
Time and place: 4--5 pm on Monday, 16 September 2013, in 1212 Doherty Hall

As always, the talk is free and open to the public.
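
To give a concrete, if cartoonish, sense of the averaging idea in the abstract, here is a minimal sketch in Python. The model, numbers, and code are my own toy illustration, not Dr. Scott's: a conjugate Gaussian mean, split across shards, where exact conjugate sampling on each shard stands in for a per-machine MCMC run, and the draws are combined by a precision-weighted average.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup (illustrative assumptions, not Dr. Scott's examples):
    # y_i ~ N(mu, sigma^2) with sigma known, prior mu ~ N(0, tau^2).
    mu_true, sigma, tau = 2.0, 1.0, 10.0
    n, n_shards, n_draws = 100_000, 10, 5_000
    y = rng.normal(mu_true, sigma, size=n)
    shards = np.array_split(y, n_shards)

    def shard_draws(y_s):
        # Each "machine" samples from its subposterior, which uses the full
        # prior raised to the 1/S power; for a N(0, tau^2) prior that is
        # N(0, S * tau^2).  The model is conjugate, so exact sampling
        # stands in for the per-machine MCMC run.
        prior_prec = 1.0 / (n_shards * tau**2)
        like_prec = len(y_s) / sigma**2
        post_prec = prior_prec + like_prec
        post_mean = like_prec * y_s.mean() / post_prec
        return rng.normal(post_mean, np.sqrt(1.0 / post_prec), size=n_draws)

    # Run the samplers "separately", then combine the draws draw-by-draw
    # with a precision-weighted average.
    draws = np.stack([shard_draws(y_s) for y_s in shards])   # shape (S, n_draws)
    weights = 1.0 / draws.var(axis=1)
    consensus = (weights[:, None] * draws).sum(axis=0) / weights.sum()

    print("consensus posterior mean:", consensus.mean())
    print("full-data posterior mean:",
          (n / sigma**2 * y.mean()) / (1.0 / tau**2 + n / sigma**2))

For a conjugate model like this the consensus average essentially reproduces the full-data posterior; the interesting question, per the abstract, is how well that carries over to hierarchical models and to BART.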

— A slightly cynical historical-materialist take on the rise of Bayesian statistics is that it reflects a phase in the development of the means of computation, namely the PC era. The theoretical or ideological case for Bayesianism was pretty set by the early 1960s, say with Birnbaum's argument for the likelihood principle[1]. It nonetheless took a generation or more for Bayesian statistics to actually become common. This is because, under the material conditions of the early 1960s, such ideas could only be defended, not applied. What changed this was not better theory, or better models, or a sudden awakening to the importance of shrinkage and partial pooling. Rather, it became possible to actually calculate posterior distributions. Specifically, Monte Carlo methods developed in statistical mechanics permitted stochastic approximations to non-trivial posteriors. These Monte Carlo techniques quickly became (pardon the expression) hegemonic within Bayesian statistics, to the point where I have met younger statisticians who thought Monte Carlo was a Bayesian invention[2]. One of the ironies of applied Bayesianism, in fact, is that nobody actually knows the posterior distribution which supposedly represents their beliefs, but rather (nearly[3]) everyone works out that distribution by purely frequentist inference from Monte Carlo samples. ("How do I know what I think until I see what the dice say?", as it were.)

So: if you could do Monte Carlo, you could work out (approximately) a posterior distribution, and actually do Bayesian statistics, instead of talking about it. To do Monte Carlo, you needed enough computing power to be able to calculate priors and likelihoods, and to do random sampling, in a reasonable amount of time. You needed a certain minimum amount of memory, and you needed clock speed. Moreover, to try out new models, to tweak specifications, etc., you needed to have this computing power under your control, rather than its being something expensive and difficult to access. You needed, in other words, a personal computer, or something very like it.
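
To make "enough computing power" concrete: all a basic Monte Carlo treatment of a posterior requires is the ability to evaluate an unnormalized prior-times-likelihood and to draw random numbers. Here is a minimal random-walk Metropolis sketch, with made-up data and a made-up prior of my own choosing:

    import numpy as np

    rng = np.random.default_rng(1)

    # Made-up data and model (purely illustrative): y_i ~ N(theta, 1),
    # prior theta ~ N(0, 5^2).
    y = rng.normal(1.5, 1.0, size=50)

    def log_posterior(theta):
        log_prior = -0.5 * theta**2 / 5.0**2
        log_lik = -0.5 * np.sum((y - theta)**2)
        return log_prior + log_lik

    # Random-walk Metropolis: evaluate the unnormalized posterior, draw
    # random numbers, accept or reject.  Nothing else is required.
    theta, draws = 0.0, []
    for _ in range(20_000):
        proposal = theta + rng.normal(scale=0.5)
        if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
            theta = proposal
        draws.append(theta)

    samples = np.array(draws[5_000:])   # discard burn-in
    # The posterior is only ever summarized through the samples
    # (naive standard error, ignoring autocorrelation).
    print("posterior mean ~", samples.mean(),
          "+/-", samples.std() / np.sqrt(len(samples)))

The last line is exactly the "frequentist inference from Monte Carlo samples" mentioned above: the posterior mean is estimated by a sample mean, with a Monte Carlo standard error attached.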

The problem now is that while our computers keep getting faster, and their internal memory keeps expanding, our capacity to generate, store, and access data is increasing even more rapidly. This is a problem if your method requires you to touch every data point, and especially a problem if you not only have to touch every data point but do all possible pairwise comparisons, because, say, your model says all observations are dependent. This raises the possibility that Bayesian inference will become computationally infeasible again in the near future, not because our computers have regressed but because the size and complexity of interesting data sets will have rendered Monte Carlo infeasible. Bayesian data analysis would then have been a transient historical episode, belonging to the period when a desktop machine could hold a typical data set in memory and thrash through it a million times in a weekend.
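
To put rough numbers on the "all possible pairwise comparisons" problem (my illustrative figures, not anyone's benchmark): a model in which every observation depends on every other typically needs something like an n-by-n matrix of pairwise terms, and storage alone outruns a single machine well before n looks impressive.

    # Back-of-the-envelope arithmetic (illustrative only): a dense n-by-n
    # matrix of pairwise terms, stored as 8-byte floats, needs 8 * n^2 bytes.
    for n in (10_000, 1_000_000, 100_000_000):
        print(f"n = {n:>11,}: {8 * n**2 / 1e9:>14,.1f} GB")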

Of course, I don't know that Bayesian inference is doomed to become obsolete because it will grow computationally intractable. One possibility is that "Bayesian inference" will be redefined in ways which depart further and further from the noble-Savage ideal, but are computationally tractable — variational Bayes, approximate Bayesian computation, and the generalized updating of Bissiri et al. are three (very different) moves in that direction. Another possibility is that algorithm designers are going to be clever enough to make distributed Monte Carlo approximations for posteriors as feasible as, say, a distributed bootstrap. This is, implicitly, the line Scott is pursuing. I wish him and those like him every success; whatever the issues with Bayesianism and some of its devotees, the statistical world would lose something valuable if Bayes as we know it were to diminish into a relic.

Update, 16 September 2013: It apparently needs saying that the ill-supported speculations here about the past and future of Bayesian computing are mine, not Dr. Scott's.

Update, 18 December 2013: "Asymptotically Exact, Embarrassingly Parallel MCMC" by Neiswanger et al. (arxiv:1311.4780) describes and analyses a very similar scheme to that proposed by Dr. Scott.

Manual trackback: Homoclinic Orbit

[1]: What Birnbaum's result actually proves is another story for another time; in the meanwhile, see Mayo, Evans and Gandenberger. ^

[2]: One such statistician persisted in this belief after reading Geyer and Thompson, and even after reading Metropolis et al., though there were other issues at play in his case. ^

[3]: The most interesting exception to this I know of is Rasmussen and Ghahramani's "Bayesian Monte Carlo" (NIPS, 2002). But despite its elegance and the reputations of its authors, it's fair to say this work has not had much impact. ^

Enigmas of Chance

Posted at September 14, 2013 23:22 | permanent link

Three-Toed Sloth