September 18, 2011

"I was of three minds, / Like a tree / In which there are three blackbirds"

Attention conservation notice: 1900+ words of log-rolling promotion of an attempt by friends to stir up an academic controversy, in a matter where pedantic points of statistical theory intersect the artificial dilemmas of psychological experiments.

There's a growing interest among psychologists in modeling how people think as a process of Bayesian learning. Many of the papers that come from this are quite impressive as exercises in hypothetical engineering, in the Design for a Brain tradition, but long-time readers will be bored and unsurprised to hear that I don't buy them as psychology. Not only do I deny that Bayesianism is any sort of normative ideal (and so that Bayesian models are standards of rationality), but the obstacles to implementing Bayesian methods on the nervous system of the East African Plains Ape seem quite insurmountable, even invoking the computational power of the unconscious mind*. Nonetheless, there are all those experimental papers, and it's hard to argue with experimental results...

Unless, of course, the experimental results don't show what they seem to. This is the core message of a new paper, whose insight is completely correct, and which I kick myself for not having realized on my own.

Frederick Eberhardt and David Danks, "Confirmation in the Cognitive Sciences: The Problematic Case of Bayesian Models", Minds and Machines 21 (2011): 389--410, phil-sci/8778
Abstract: Bayesian models of human learning are becoming increasingly popular in cognitive science. We argue that their purported confirmation largely relies on a methodology that depends on premises that are inconsistent with the claim that people are Bayesian about learning and inference. Bayesian models in cognitive science derive their appeal from their normative claim that the modeled inference is in some sense rational. Standard accounts of the rationality of Bayesian inference imply predictions that an agent selects the option that maximizes the posterior expected utility. Experimental confirmation of the models, however, has been claimed because of groups of agents that "probability match" the posterior. Probability matching only constitutes support for the Bayesian claim if additional unobvious and untested (but testable) assumptions are invoked. The alternative strategy of weakening the underlying notion of rationality no longer distinguishes the Bayesian model uniquely. A new account of rationality — either for inference or for decision-making — is required to successfully confirm Bayesian models in cognitive science.

Let me give an extended quotation from the paper to unfold the logic.

In a standard experimental set-up used to confirm a Bayesian model, experimental participants are provided with a cover story about the evidence they are about to see. This cover story indicates (either implicitly or explicitly) the possible hypotheses that could explain the forthcoming data. Either the cover story or pre-training is used to induce in participants a prior probability distribution over this space. Eliciting participants' prior probabilities over various hypotheses is notoriously difficult, and so the use of a novel cover story or pre-training helps ensure that every participant has the same hypothesis space and nearly the same prior distribution. In addition, cover stories are almost always designed so that each hypothesis has equal utility for the participants, and so the participant should care only about the correctness of her answer. In many experiments, an initial set of questions elicits the participant's beliefs to check whether she has extracted the appropriate information from the cover story. Participants are then presented with evidence relevant to the hypotheses under consideration. Typically, in at least one condition of the experiment, the evidence is intended to make a subset of the hypotheses more likely than the remaining hypotheses. After, or sometimes even during, the presentation of the evidence, subjects are asked to identify the most likely hypothesis in light of the new evidence. This identification can take many forms, including binary or n-ary forced choice, free response (e.g., for situations with infinitely many hypotheses), or the elicitation of numerical ratings (for a close-to-continuous hypothesis space, such as causal strength, or to assess the participant's confidence in their judgment that a specific hypothesis is correct). Any change over time in the responses is taken to indicate learning in light of evidence, and those changes are exactly what the Bayesian model aims to capture.

These experiments must be carefully designed so that the experimenter controls the prior probability distribution, the likelihood functions, and the evidence. This level of control ensures that we can confirm the predictions of the Bayesian model by directly comparing the participants' belief changes (as measured by the various elicitation methods) with the mathematically computed posterior probability distribution predicted by the model. As is standard in experimental research, results are reported for a participant population (split over the experimental conditions) to control for any remaining individual variation. Since the model is supposed to provide an account of each participant in the population individually, experimental results must be compared to the predictions of an aggregate (or "population") of model predictions.

Here's the problem: in these experiments (at least the published ones...), there is a decent match between the distribution of choices made by the population, and the posterior distribution implied by plugging the experimenters' choices of prior distribution, likelihood, and data into Bayes's rule. This, however, is not what Bayesian decision theory predicts. After all, the optimal action should be a function of the posterior distribution (what a subject believes about the world) and the utility function (the subject's preferences over various sorts of error or correctness). Since the experimenters have carefully ensured that the posterior distributions will be the same across the population, and have also (as Eberhardt and Danks say) made the utility function homogeneous across the population, Bayesian decision theory quite straightforwardly predicts that everyone should make the same choice, because the action with the highest (posterior) expected utility will be the same for everyone. Picking actions with frequencies proportional to their posterior probabilities is simply irrational by Bayesian lights ("incoherent"). It is all very well and good to say that each subject contains multitudes, but the experimenters have contrived it that each subject should contain the same multitude, and so should acclaim the same choice. Taking the distribution of choices across individuals to confirm the Bayesian model of a distribution within individuals then amounts to a fallacy of composition. It's as though the poet saw two of his three blackbirds fly east and one west, and concluded that each of the birds "was of three minds", two of said minds agreeing that it was best to go east.
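To make the gap concrete, here is a minimal sketch in Python, with made-up numbers: a posterior of (0.6, 0.3, 0.1) over three hypotheses, shared by every subject, and a utility that only rewards being right. The Bayesian prediction is that everyone picks the first hypothesis; probability matching instead predicts a 60/30/10 split across the population, and it is the latter pattern which gets reported as confirmation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative numbers only: the posterior the experimenters have induced in
# every subject, and a 0/1 utility for picking the correct hypothesis.
posterior = np.array([0.6, 0.3, 0.1])
n_subjects = 1000

# Bayesian decision theory: with identical posteriors and utilities, every
# subject should pick the hypothesis with the highest posterior probability.
optimal_choices = np.full(n_subjects, posterior.argmax())

# What the experiments actually find: the *population* of choices roughly
# matches the posterior, i.e., subjects behave as if sampling from it.
matching_choices = rng.choice(len(posterior), size=n_subjects, p=posterior)

for label, choices in [("expected-utility maximization", optimal_choices),
                       ("probability matching", matching_choices)]:
    freqs = np.bincount(choices, minlength=len(posterior)) / n_subjects
    print(label, freqs.round(2))
# The first distribution is degenerate (all mass on one option); the second
# tracks [0.6, 0.3, 0.1].  Observing the latter does not confirm the former.
```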

By hypothesis, then, the mind is going to great lengths to maintain and update a posterior distribution, but then doesn't use it in any coherent way. This hardly seems sensible, let alone rational or adaptive. Something has to give. One possibility, of course, is that this sort of cognition is not "Bayesian" in any strong or interesting sense, and this is certainly the view I'm most sympathetic to. But in fairness we should, as Eberhardt and Danks do, explore the branches of the escape tree open to the Bayesians.

There are, of course, situations where the utility-maximizing strategy is randomized, but the conditions needed for that don't seem to hold in these sorts of experiments. The decision problem the experimentalists are trying to set up is one where the optimal decision is indeed a deterministic function of the posterior distribution. And even when a randomized strategy is optimal, it rarely just matches posterior probabilities. An alternative escape is to note that while the experimentalists try to make prior, likelihood, data and utility homogeneous across the subject population, they almost certainly don't succeed completely. One way to model this is to include an explicitly random term in the decision model. This technology has been fairly well developed by economists, who also try to match actual human behavior to (specious, over-precise) models of choice; they break the "curse of determinism" by adding a purely stochastic term to the utility being maximized, which yields a distribution of choices. Such random-utility models have not been applied to Bayesian cognition experiments, and, yet again, even granting that the individual-level noise terms could be adjusted just so, to make the distribution of individual choices approximate the noise-free posterior distribution, why should they be adjusted that way?
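To see what "adjusted just so" would amount to, here is a sketch of the standard random-utility recipe, again with made-up numbers: give each simulated subject the same deterministic utilities, add independent Gumbel noise, and let them maximize (the textbook logit model). The population of choices matches the posterior only when the deterministic utility is the log of the posterior and the noise scale is exactly one; nothing in the Bayesian story privileges that particular specification.

```python
import numpy as np

rng = np.random.default_rng(1)
posterior = np.array([0.6, 0.3, 0.1])   # illustrative numbers again
n_subjects = 100_000

def random_utility_choices(deterministic_utility, noise_scale):
    # Additive random-utility model: each simulated subject maximizes
    # utility plus an i.i.d. Gumbel disturbance, the usual way economists
    # get a *distribution* of choices out of maximizing agents.
    noise = rng.gumbel(scale=noise_scale, size=(n_subjects, len(posterior)))
    return (deterministic_utility + noise).argmax(axis=1)

def choice_freqs(choices):
    return np.bincount(choices, minlength=len(posterior)) / n_subjects

# Gumbel noise of scale exactly 1 on log-posterior utility: the logit
# formula then reproduces the posterior, i.e., probability matching.
print(choice_freqs(random_utility_choices(np.log(posterior), 1.0)).round(2))

# Any other specification -- the same noise on the posterior itself, or a
# different noise scale -- gives some other distribution of choices.
print(choice_freqs(random_utility_choices(posterior, 1.0)).round(2))
print(choice_freqs(random_utility_choices(np.log(posterior), 0.3)).round(2))
```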

Now, I do want to raise a possibility which goes beyond Eberhardt and Danks, and which concerns the specificity of the distributional evidence. The dynamics of Bayesian updating are an instance of the replicator dynamics from evolutionary theory, with hypotheses as replicators and likelihood as fitness. But not only is Bayes a very narrow special case of the replicator equations (no sources of variation analogous to mutation or sex; no interaction between replicators analogous to frequency dependence), lots of other adaptive processes approximately follow those equations as well. Evolutionary search processes (a la Holland et al.'s Induction) naturally do so, for instance, but so does mere reinforcement learning, as several authors have shown. At the level of changing probability distributions within an individual, all of these would look extremely similar to each other and to Bayesian updating. Even if Bayesians find a way to link distributions within subjects to distributions across populations, specifically supporting the Bayesian models would require evidence that differentially favors them over all the other replicator-ish models. One way to provide such differential support would be to show that Bayesian models are not only rough matches to the data but fit it in detail, and fit it better than non-Bayesian models could. Another kind of differential support would be showing that the Bayesian models account for other features of the data, beyond the dynamics of distributions, that their rivals do not. It's for the actual psychologists to say how much hope there is for any such approach; I will content myself with observing that it is very easy to tell an evolutionary-search or reinforcement-learning story that ends with the distribution of people's choices matching the posterior probability distribution**.
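For the formal point, a minimal sketch with a made-up coin-flipping setup: the discrete-time replicator update, with the likelihood of each observation playing the role of fitness, reproduces Bayes's rule exactly, which is why the two are so hard to tell apart from the dynamics of distributions alone.

```python
import numpy as np

rng = np.random.default_rng(2)

# Three hypotheses about a coin's bias, with a uniform prior (made-up setup).
biases = np.array([0.2, 0.5, 0.8])
prior = np.ones(3) / 3
data = rng.binomial(1, 0.8, size=50)   # flips generated by the third hypothesis

def bayes_update(p, x):
    likelihood = biases**x * (1 - biases)**(1 - x)
    return p * likelihood / np.sum(p * likelihood)

def replicator_update(p, fitness):
    # Discrete-time replicator equation: each "type" grows in proportion to
    # its fitness relative to the population average.
    return p * fitness / np.sum(p * fitness)

p_bayes, p_repl = prior.copy(), prior.copy()
for x in data:
    p_bayes = bayes_update(p_bayes, x)
    # Taking the likelihood of the observation as the fitness recovers Bayes.
    p_repl = replicator_update(p_repl, biases**x * (1 - biases)**(1 - x))

print(np.allclose(p_bayes, p_repl))   # True: Bayes is a special replicator
```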

That possibility, however, is secondary. What is not secondary at all is the main point of this paper: Bayesian models of inference and decision do not predict that the population distribution of choices across individuals should mirror the posterior distribution of beliefs within each individual. Indeed, the observed pattern is so far from the models' predictions as to refute them. Perhaps, with a lot of technical work in redefining the decision problem and/or modeling experimental noise, the theories could be reconciled with the data. Unless that work is done, and done successfully, these theories are doomed as accounts of human cognition. Anyone who finds these issues interesting would do well to read the paper.

Disclaimer: Frederick is a friend, and David is on the faculty here, though in a different department. Neither of them is responsible for anything I'm saying here.

Manual trackback: Faculty of Language

*: There are times when uninstructed people are quite good at using Bayes's rule: these are situations where they are presented with some population frequencies and need to come up with others. See Gerd Gigerenzer and Ulrich Hoffrage, "How to Improve Bayesian Reasoning without Instruction: Frequency Formats", Psychological Review 102 (1995): 684--704, and Leda Cosmides and John Tooby, "Are Humans Good Intuitive Statisticians After All? Rethinking Some Conclusions from the Literature on Judgement Under Uncertainty", Cognition 58 (1996): 1--73 [PDF]. In my supremely arrogant and unqualified opinion, this is one of those places where evolutionary psychology is not only completely appropriate, but where Cosmides and Tooby's specific ideas are also quite persuasive.
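A sketch of the contrast, using round numbers in the spirit of the classic screening problem (the exact figures in the experimental literature vary):

```python
# Illustrative numbers only: a 1% base rate, 80% sensitivity, and a 9.6%
# false-positive rate, roughly the classic screening example.
base_rate, sensitivity, false_positive_rate = 0.01, 0.80, 0.096

# Probability format: most untrained subjects badly misjudge this quantity.
p_positive = base_rate * sensitivity + (1 - base_rate) * false_positive_rate
print(base_rate * sensitivity / p_positive)       # ~0.078

# Frequency format: "of 1000 people, 10 have the condition, 8 of them test
# positive, and about 95 of the other 990 also test positive"; at that point
# 8 out of (8 + 95) is an easy, and much more often correct, answer.
print(8 / (8 + 95))                               # ~0.078
```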

**: It is also very easy to tell an evolutionary-search story in which people have new ideas, while (as Andy and I discussed) it's impossible for a Bayesian agent to believe something it hasn't always already believed at least a little.
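The reason is elementary: the posterior is proportional to prior times likelihood, so a hypothesis with prior probability zero stays at zero no matter what the data say. A tiny illustration, with made-up numbers:

```python
import numpy as np

# The third hypothesis is "unimagined" (prior weight zero); even data that
# overwhelmingly favor it cannot raise its posterior above zero.
prior = np.array([0.5, 0.5, 0.0])
likelihood = np.array([0.01, 0.05, 0.99])
posterior = prior * likelihood
posterior /= posterior.sum()
print(posterior)   # third entry is still exactly 0
```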

Bayes, Anti-Bayes; Minds, Brains, and Neurons; Enigmas of Chance; Kith and Kin

Posted at September 18, 2011 21:29 | permanent link

Three-Toed Sloth