Attention conservation notice: 1900+ words of log-rolling promotion of an attempt by friends to stir up an academic controversy, in a matter where pedantic points of statistical theory intersect the artificial dilemmas of psychological experiments.

There's a growing interest among psychologists in modeling how people think as a process of Bayesian learning. Many of the papers that come from this are quite impressive as exercises in hypothetical engineering, in the Design for a Brain tradition, but long-time readers will be bored and unsurprised to hear that I don't buy them as psychology. Not only do I deny that Bayesianism is any sort of normative ideal (and so that Bayesian models are standards of rationality), but the obstacles to implementing Bayesian methods on the nervous system of the East African Plains Ape seem quite insurmountable, even invoking the computational power of the unconscious mind*. Nonetheless, there are all those experimental papers, and it's hard to argue with experimental results...

Unless, of course, the experimental results don't show what they seem to. This is the core message of a new paper, whose insight is completely correct and something I kick myself for not having realized.

- Frederick Eberhardt and David Danks, "Confirmation in the Cognitive Sciences: The Problematic Case of Bayesian Models", *Minds and Machines* **21** (2011): 389--410, phil-sci/8778. *Abstract*: Bayesian models of human learning are becoming increasingly popular in cognitive science. We argue that their purported confirmation largely relies on a methodology that depends on premises that are inconsistent with the claim that people are Bayesian about learning and inference. Bayesian models in cognitive science derive their appeal from their normative claim that the modeled inference is in some sense rational. Standard accounts of the rationality of Bayesian inference imply predictions that an agent selects the option that maximizes the posterior expected utility. Experimental confirmation of the models, however, has been claimed because of groups of agents that "probability match" the posterior. Probability matching only constitutes support for the Bayesian claim if additional unobvious and untested (but testable) assumptions are invoked. The alternative strategy of weakening the underlying notion of rationality no longer distinguishes the Bayesian model uniquely. A new account of rationality — either for inference or for decision-making — is required to successfully confirm Bayesian models in cognitive science.

Let me give an extended quotation from the paper to unfold the logic.

In a standard experimental set-up used to confirm a Bayesian model, experimental participants are provided with a cover story about the evidence they are about to see. This cover story indicates (either implicitly or explicitly) the possible hypotheses that could explain the forthcoming data. Either the cover story or pre-training is used to induce in participants a prior probability distribution over this space. Eliciting participants' prior probabilities over various hypotheses is notoriously difficult, and so the use of a novel cover story or pre-training helps ensure that every participant has the same hypothesis space and nearly the same prior distribution. In addition, cover stories are almost always designed so that each hypothesis has equal utility for the participants, and so the participant should care only about the correctness of her answer. In many experiments, an initial set of questions elicits the participant's beliefs to check whether she has extracted the appropriate information from the cover story. Participants are then presented with evidence relevant to the hypotheses under consideration. Typically, in at least one condition of the experiment, the evidence is intended to make a subset of the hypotheses more likely than the remaining hypotheses. After, or sometimes even during, the presentation of the evidence, subjects are asked to identify the most likely hypothesis in light of the new evidence. This identification can take many forms, including binary or n-ary forced choice, free response (e.g., for situations with infinitely many hypotheses), or the elicitation of numerical ratings (for a close-to-continuous hypothesis space, such as causal strength, or to assess the participant's confidence in their judgment that a specific hypothesis is correct).

Any change over time in the responses is taken to indicate learning in light of evidence, and those changes are exactly what the Bayesian model aims to capture. These experiments must be carefully designed so that the experimenter controls the prior probability distribution, the likelihood functions, and the evidence. This level of control ensures that we can confirm the predictions of the Bayesian model by directly comparing the participants' belief changes (as measured by the various elicitation methods) with the mathematically computed posterior probability distribution predicted by the model. As is standard in experimental research, results are reported for a participant population (split over the experimental conditions) to control for any remaining individual variation. Since the model is supposed to provide an account of each participant in the population individually, experimental results must be compared to the predictions of an aggregate (or "population") of model predictions.

Here's the problem: in these experiments (at least the published ones...),
there is a decent match between the distribution of choices made by the
population, and the posterior distribution implied by plugging the
experimenters' choices of prior distribution, likelihood, and data into Bayes's
rule. This is however *not* what Bayesian decision theory predicts.
After all, the optimal action should be a function of the posterior
distribution (what a subject believes about the world) and the utility function
(the subjects' preferences over various sorts of error or correctness). Having
carefully ensured that the posterior distributions will be the same across the
population, and having also (as Eberhardt and Danks say) made the utility
function homogeneous across the population, Bayesian decision theory quite
straightforwardly predicts that everyone should make the *same* choice,
because the action with the highest (posterior) expected utility will be the
same for everyone. Picking actions with frequencies proportional to the posterior
probability is simply irrational by Bayesian lights ("incoherent"). It is all
very well and good to say that each subject contains multitudes, but the
experimenters have contrived it that each subject should contain
the *same* multitude, and so should acclaim the same choice. Taking the
distribution of choices *across* individuals to confirm the Bayesian
model of a distribution *within* individuals then amounts to a fallacy
of composition. It's as
though the poet saw
two of his three blackbirds fly east and one west, and concluded
that *each* of the birds "was of three minds", two of said minds
agreeing that it was best to go east.
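To make the contrast concrete, here is a minimal simulation (my own illustration, with made-up numbers) of a population of ideal Bayesian agents who all share the same posterior over three hypotheses and the same care-only-about-being-right utility, exactly as the experiments are designed to induce. Every such agent picks the posterior mode, so the population distribution of choices is degenerate; a population of "probability matchers" is shown alongside for comparison:

```python
import collections
import random

random.seed(0)

# Hypothetical numbers: a shared posterior over three hypotheses, and a
# 0/1 utility for being right, as the experimental designs aim to induce.
posterior = {"H1": 0.6, "H2": 0.3, "H3": 0.1}
n_agents = 10_000

# Bayesian decision theory: every agent maximizes posterior expected
# utility, so every agent makes the same choice -- the posterior mode.
map_choice = max(posterior, key=posterior.get)
bayes_choices = collections.Counter({map_choice: n_agents})

# "Probability matching": choice frequencies track the posterior instead.
hyps, probs = zip(*posterior.items())
match_choices = collections.Counter(
    random.choices(hyps, weights=probs, k=n_agents))

print("Expected-utility maximizers:", dict(bayes_choices))
print("Probability matchers:",
      {h: round(match_choices[h] / n_agents, 2) for h in posterior})
```

The first distribution puts all its mass on one hypothesis; only the second resembles the posterior, which is the pattern the experiments actually report.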

By hypothesis, then, the mind is going to great lengths to maintain and
update a posterior distribution, but then doesn't *use* it in any
sensible way. This hardly seems coherent, let alone rational or adaptive.
Something has to give. One possibility, of course, is that this sort of
cognition is not "Bayesian" in any strong or interesting sense, and this is
certainly the view I'm most sympathetic to. But in fairness we should (as
Eberhardt and Danks do) explore the branches of the escape tree for the
Bayesians.

There are, of course, situations where the utility-maximizing strategy is
randomized; but the conditions needed for that don't seem to hold for these
sorts of experiments. The decision problem the experimentalists
are *trying* to set up is one where the optimal decision is indeed a
deterministic function of the posterior distribution. And even when a
randomized strategy is optimal, it rarely just matches posterior probabilities.
An alternative escape is to consider that while the
experimentalists *try* to make prior, likelihood, data and utility
homogeneous across the subject population, they almost certainly don't succeed
completely. One way this could be modeled is to actually include a random term
in the decision model. This sort of technology has actually been fairly well
developed by economists, who also try to
match actual human behavior to
(specious, over-precise) models of choice. This "curse of determinism" is
broken by economists by adding a purely stochastic term to the utility being
maximized, leading to a distribution of choices. Such random-utility models
have not been applied to Bayesian cognition experiments, and, yet again, even
supposing the individual-level noise terms *could* be adjusted *just so* that
the distribution of individual choices approximated the noise-free posterior
distribution, why should they be?
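To see why the noise would have to be tuned *just so*, here is a toy logit calculation, my own sketch with made-up numbers. Under the standard random-utility assumption, each agent adds an independent Gumbel shock of scale s to each option's expected utility and then maximizes, which gives the familiar softmax choice probabilities, proportional to exp(u/s):

```python
import math

# Made-up shared posterior; the utility of choosing h is the posterior
# probability that h is correct (0/1 loss), as in the experimental designs.
posterior = {"H1": 0.6, "H2": 0.3, "H3": 0.1}

def logit_choice_probs(utilities, scale):
    """Population choice frequencies when each agent perturbs every option's
    utility with an i.i.d. Gumbel(0, scale) shock and then maximizes.
    Analytically, this is the softmax of u / scale."""
    exps = {h: math.exp(u / scale) for h, u in utilities.items()}
    z = sum(exps.values())
    return {h: e / z for h, e in exps.items()}

# Small noise: nearly everyone picks the mode. Large noise: nearly uniform.
# Nothing in the theory pins the scale to the value that would make the
# choice frequencies reproduce the posterior itself.
for scale in (0.05, 0.2, 1.0):
    probs = logit_choice_probs(posterior, scale)
    print(scale, {h: round(p, 3) for h, p in probs.items()})
```

As the noise scale goes to zero the population collapses onto the mode, and as it grows the choices become uniform; matching the posterior requires an unmotivated intermediate tuning.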

Now, I do want to raise a possibility which goes beyond Eberhardt and Danks,
which goes to the *specificity* of the distributional
evidence. The dynamics of Bayesian updating is an example
of the replicator dynamics from evolutionary theory, with hypotheses as
replicators and fitness as likelihood. But not only is Bayes a very narrow
special case of the replicator equations (no sources of variation analogous to
mutation or sex; no interaction between replicators analogous
to frequency dependence),
lots of other adaptive processes approximately follow those equations as well.
Evolutionary search processes (a la Holland et
al.'s Induction)
naturally do so, for instance, but so does mere reinforcement learning,
as several authors
have shown. At the level of changing probability distributions within an
individual, all of these would look extremely similar to each other and to
Bayesian updating. Even if Bayesian models find a way to link distributions
within subjects to distributions across populations, specifically supporting
*Bayesian* models would need evidence which *differentially*
favored them over all other replicator-ish models. One way to provide such
differential support would be to show that Bayesian models are not only rough
matches to the data, they fit it in detail, and fit it better than non-Bayesian
models could. Another kind of differential support would be showing that the
Bayesian models account for other features of the data, beyond the dynamics of
distributions, that their rivals do not. It's for the actual psychologists to
say how much hope there is for any such approach; I will content myself by
observing that it is *very easy* to tell an evolutionary-search or
reinforcement-learning story that ends with the distribution of people's
choices matching the global probability distribution**.
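The formal identity is easy to check numerically. Treating hypotheses as replicators and the likelihood of the observed data as fitness, one step of the discrete-time replicator dynamics (multiply each share by its fitness and divide by the mean fitness) is exactly Bayes's rule; the numbers below are a toy illustration of my own:

```python
prior = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
likelihood = {"H1": 0.9, "H2": 0.5, "H3": 0.1}  # P(observed data | H)

# Bayes's rule: posterior proportional to prior times likelihood.
z = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / z for h in prior}

# Discrete replicator dynamics: new share = old share * fitness / mean fitness,
# with fitness(h) = likelihood of the data under h. The mean fitness is the
# same normalizing constant z, so the two updates coincide term by term.
mean_fitness = sum(prior[h] * likelihood[h] for h in prior)
replicator = {h: prior[h] * likelihood[h] / mean_fitness for h in prior}

assert all(abs(posterior[h] - replicator[h]) < 1e-12 for h in prior)
print({h: round(p, 4) for h, p in posterior.items()})
```

The point is only that the dynamics coincide: any adaptive process whose weights obey (even approximately) the same update, such as a reinforcement learner, would trace out the same trajectory of distributions.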

The specificity issue is, in any case, a secondary worry. What is not
secondary at all is the main point of this paper: Bayesian models of inference
and decision *do not* predict that the population distribution of choices
across individuals should mirror the posterior distribution of beliefs within
each individual. The observed behavior is so far from the models' predictions
as to *refute* the models. Perhaps, with a lot of technical work in redefining
the decision problem and/or modeling experimental noise, the theories could be
reconciled with the data. Unless that work is done, and done successfully,
these theories are doomed as accounts of human cognition. Anyone who finds
these issues interesting would do well to read the paper.

*Disclaimer*: Frederick is a friend, and David is on the faculty
here, though in a different department. Neither of them is responsible for
anything I'm saying here.

*Manual trackback*: Faculty of Language

*: There *are* times when uninstructed people
are quite good at using Bayes's rule: these are situations where they are
presented with some population frequencies and need to come up with others.
See Gerd Gigerenzer and Ulrich Hoffrage, "How to Improve Bayesian Reasoning
without Instruction: Frequency Formats", *Psychological Review* **102**
(1995): 684--704, and Leda Cosmides and John Tooby, "Are Humans Good Intuitive
Statisticians After All? Rethinking Some Conclusions from the Literature on
Judgement Under Uncertainty", *Cognition* **58** (1996): 1--73.
In my supremely arrogant and unqualified
opinion, this is one of those places
where evolutionary psychology is not
only completely appropriate, but where Cosmides and Tooby's specific ideas are
also quite persuasive.
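For concreteness, here is a frequency-format version of a standard base-rate problem (the numbers are illustrative inventions of mine, not taken from the cited papers): out of 1000 people, 10 have a condition; 8 of those 10 test positive, and 95 of the 990 without it also test positive. In the frequency format the answer is just a ratio of counts, and it agrees exactly with the probability-format calculation through Bayes's rule:

```python
from fractions import Fraction

# Frequency format: just count. Of the 8 + 95 = 103 positives, 8 truly
# have the condition.
true_pos, false_pos = 8, 95
freq_answer = Fraction(true_pos, true_pos + false_pos)

# Probability format: the same numbers pushed through Bayes's rule.
prior = Fraction(10, 1000)
sensitivity = Fraction(8, 10)
false_pos_rate = Fraction(95, 990)
bayes_answer = (prior * sensitivity) / (
    prior * sensitivity + (1 - prior) * false_pos_rate
)

assert bayes_answer == freq_answer
print(freq_answer)  # 8/103
```

The counting version is the one uninstructed subjects tend to get right; the probability version, though arithmetically identical, is the one they notoriously flub.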

**: It is also very easy to tell an
evolutionary-search story in which people
*have new ideas*, while (as Andy and I discussed)
it's impossible for a Bayesian agent to believe something it hasn't always
already believed at least a little.

Bayes, Anti-Bayes; Minds, Brains, and Neurons; Enigmas of Chance; Kith and Kin

Posted at September 18, 2011 21:29 | permanent link