Attention conservation notice: Only of interest if you care a lot about computational statistics.

For our first seminar of the year, we are very pleased to have a talk which will combine two themes close to the heart of the statistics department:

- Steve Scott, "Bayes and Big Data"
*Abstract*: A useful definition of "big data" is data that is too big to fit on a single machine, whether because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory or disk bottlenecks can only be alleviated by splitting "big data" across multiple machines. Communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each machine, and then averaging the individual Monte Carlo draws. Depending on the model, the resulting draws can be nearly indistinguishable from the draws that would have been obtained by running a single-machine algorithm for a very long time. Examples of consensus Monte Carlo will be shown for simple models where single-machine solutions are available, for large single-layer hierarchical models, and for Bayesian additive regression trees (BART).

*Time and place*: 4--5 pm on Monday, 16 September 2013, in 1212 Doherty Hall

As always, the talk is free and open to the public.
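As a toy illustration of the consensus idea in the abstract (a minimal sketch, not Dr. Scott's actual algorithm), consider estimating a normal mean with known variance: each "machine" samples from the posterior given only its own shard of the data, and the draws are then averaged across machines. With equal-sized shards and a flat prior, the averaged draws have exactly the full-data posterior distribution, which is why the example is a useful sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: normal with unknown mean and known sigma
sigma = 2.0
data = rng.normal(1.5, sigma, size=10_000)

S = 10  # number of "machines"
shards = np.array_split(data, S)
n_draws = 50_000

# Each worker samples from its shard's subposterior: with a flat prior,
# the posterior for a normal mean given shard s is N(shard mean, sigma^2 / n_s)
sub_draws = np.stack([
    rng.normal(shard.mean(), sigma / np.sqrt(len(shard)), size=n_draws)
    for shard in shards
])

# Consensus step: average the s-th draw across all machines
consensus = sub_draws.mean(axis=0)

# Compare with the exact full-data posterior, N(mean(data), sigma^2 / n)
exact_mean = data.mean()
exact_sd = sigma / np.sqrt(len(data))
print(consensus.mean(), exact_mean)  # should agree closely
print(consensus.std(), exact_sd)
```

The only communication here is shipping one array of draws per machine at the end; for non-Gaussian posteriors, plain averaging is only an approximation, which is where the interesting work lies.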

—

A slightly ~~cynical~~ historical-materialist take on the rise of Bayesian
statistics is that it reflects a phase in the development of the means of
computation, namely the PC era. The theoretical or ideological case for
Bayesianism was pretty well settled by the early 1960s, say with
Birnbaum's argument for the
likelihood principle^{1}. It
nonetheless took a generation or more for Bayesian statistics to actually
become common. This is because, under the material conditions of the early
1960s, such ideas could only be defended, not applied.
What changed this was not better theory, or better models, or a sudden
awakening to the importance
of shrinkage
and partial pooling. Rather, it became possible to actually *calculate*
posterior distributions. Specifically, Monte Carlo methods developed in
statistical mechanics permitted stochastic approximations to non-trivial
posteriors. These Monte Carlo techniques quickly became (pardon the
expression) hegemonic within Bayesian statistics, to the point where I have met
younger statisticians who thought Monte Carlo was a Bayesian
invention^{2}. One of the ironies of
applied Bayesianism, in fact, is that nobody actually knows the posterior
distribution which supposedly represents their beliefs, but rather
(nearly^{3}) everyone works out that
distribution by purely *frequentist* inference from Monte Carlo samples.
("How do I know what I think until I see what the dice say?", as it were.)

So: if you could do Monte Carlo, you could work out (approximately) a
posterior distribution, and actually *do* Bayesian statistics, instead
of *talking* about it. To do Monte Carlo, you needed enough computing
power to be able to calculate priors and likelihoods, and to do random
sampling, in a reasonable amount of time. You needed a certain minimum amount
of memory, and you needed clock speed. Moreover, to try out new models, to
tweak specifications, etc., you needed to have this computing power under your
control, rather than being something expensive and difficult to access. You
needed, in other words, a personal computer, or something very like it.

The problem now is that while our computers keep getting faster, and their
internal memory keeps expanding, our capacity to generate, store, and access
data is increasing even more rapidly. This is a problem if your method
requires you to touch every data point, and *especially* a problem if
you not only have to touch every data point but do all possible pairwise
comparisons, because, say, your model says all observations are dependent.
This raises the possibility that Bayesian inference will become computationally
infeasible *again* in the near future, not because our computers have
regressed but because the size and complexity of interesting data sets will
have rendered Monte Carlo infeasible. Bayesian data analysis would then have
been a transient historical episode, belonging to the period when a desktop
machine could hold a typical data set in memory and thrash through it a million
times in a weekend.

Of course, I don't *know* that Bayesian inference is doomed to become
obsolete because it will grow computationally intractable. One possibility is
that "Bayesian inference" will be redefined in ways which depart further and
further from
the noble-Savage
ideal, but
*are* computationally tractable
— variational Bayes,
approximate
Bayesian computation, and
the generalized updating of Bissiri et
al. are three (very different) moves in that direction. Another
possibility is that algorithm designers are going to be clever enough to make
distributed Monte Carlo approximations for posteriors as feasible as, say,
a distributed bootstrap. This is,
implicitly, the line Scott is pursuing. I wish him and those like him every
success; whatever the issues with Bayesian*ism*
and some of
its devotees, the
statistical world would lose something valuable if Bayes as we know it were to
diminish into a relic.
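To make one of those departures concrete: in its simplest rejection form (a generic sketch, not tied to any particular paper mentioned above), approximate Bayesian computation replaces likelihood evaluation with simulation. Draw a parameter from the prior, simulate a data set, and keep the draw only if a summary statistic of the simulation lands within some tolerance of the observed one.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Observed" data: exponential with unknown rate (true rate 3, for checking)
observed = rng.exponential(scale=1 / 3.0, size=200)
obs_stat = observed.mean()  # summary statistic

# Rejection ABC: sample rates from the prior, simulate a data set for each,
# and accept a rate when its simulated mean is within eps of the observed mean
n_props = 20_000
rates = rng.uniform(0.1, 10.0, size=n_props)  # prior on the rate
sims = rng.exponential(scale=1.0 / rates[:, None], size=(n_props, 200))
sim_stats = sims.mean(axis=1)
eps = 0.01
accepted = rates[np.abs(sim_stats - obs_stat) < eps]

# The accepted draws approximate the posterior, and no likelihood
# was ever evaluated
print(len(accepted), accepted.mean())
```

The price of never evaluating a likelihood is that most proposals are thrown away, and the answer depends on the choice of summary statistic and tolerance; hence the "approximate" in the name.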

**Update**, 16 September 2013: It apparently needs saying that
the ill-supported speculations here about the past and future of Bayesian
computing are mine, not Dr. Scott's.

**Update**, 18 December 2013: "Asymptotically Exact,
Embarrassingly Parallel MCMC" by Neiswanger et
al. (arxiv:1311.4780) describes
and analyses a very similar scheme to that proposed by Dr. Scott.

*Manual trackback:* Homoclinic Orbit

[1]: What Birnbaum's result actually proves is another story for another time; in the meanwhile, see Mayo, Evans and Gandenberger. ^

[2]: One such statistician persisted in this belief after reading Geyer and Thompson, and even after reading Metropolis et al., though there were other issues at play in his case. ^

[3]: The most interesting exception to this I know of is Rasmussen and Ghahramani's "Bayesian Monte Carlo" (NIPS, 2002). But despite its elegance and the reputations of its authors, it's fair to say this work has not had much impact. ^

Posted at September 14, 2013 23:22 | permanent link