Clinical and Actuarial Judgment Compared
09 Jun 2021 13:52
Modified: 12 April 2004; 31 August 2008; 10 February 2009; 29 August 2014; 14 January 2019
For something like fifty years now, psychologists have been studying the question of "clinical versus actuarial judgment". The idea goes like this. (This is not any actual experiment, just a description of the general idea.) Say you're interested in diagnosing heart disease from electrocardiograms. Normally we have clinicians, i.e., expert doctors, look at a chart and say whether the patient has (to be definite) a heart condition requiring treatment within one year. Alternatively, we could ask the experts what features they look at when making their prognosis, and then fit a statistical model that tries to predict the outcome or classification from those features (the features themselves can still be evaluated by human experts). This is the actuarial approach, since it's just based on averages --- "of patients with features x, y and z, q percent have a serious heart condition".
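To make the "q percent" idea concrete, here is a minimal sketch of the crudest possible actuarial rule: tabulate past cases by their features and read off the observed rate. The records, the three binary features and the 0.5 threshold are all invented for illustration; nothing here comes from an actual study.

```python
from collections import defaultdict

# Hypothetical past cases: (features, outcome). The three binary features stand
# for chart findings an expert has coded; outcome is 1 if the patient turned out
# to have the condition. All values are made up for illustration.
RECORDS = [
    ((1, 1, 0), 1),
    ((1, 1, 0), 1),
    ((1, 1, 0), 0),
    ((1, 1, 0), 1),
    ((0, 1, 0), 0),
    ((0, 1, 0), 0),
    ((0, 1, 0), 1),
    ((0, 0, 0), 0),
    ((0, 0, 0), 0),
]


def actuarial_table(records):
    """Map each feature combination to the fraction of past cases with outcome 1."""
    counts = defaultdict(lambda: [0, 0])  # features -> [positives, total]
    for features, outcome in records:
        counts[features][0] += outcome
        counts[features][1] += 1
    return {f: pos / total for f, (pos, total) in counts.items()}


def actuarial_predict(table, features, threshold=0.5):
    """Predict 1 when the observed rate for these features reaches the threshold."""
    if features not in table:
        raise KeyError(f"no past cases with features {features}")
    return int(table[features] >= threshold)


table = actuarial_table(RECORDS)
print(table[(1, 1, 0)])  # 3 of the 4 matching cases had the condition
```

A raw table like this falls apart as soon as there are more than a handful of features, since most combinations will never have been seen; that is one reason to fit a model instead.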
The rather surprising, and completely consistent, result of these studies is that there are no known cases where clinicians reliably out-perform actuarial methods, even when the statistical models are just linear classification rules, i.e., about as simple a model as you can come up with. In many areas, statistical classifiers significantly out-perform human experts. They even out-perform experts who have access to the statistical results, apparently because the experts place too much weight on their own judgment, and not enough on the statistics. Whether you find this depressing depends, to some degree, on your feelings about "clinical" experts. (I first learned of this area of research from Brent Staples's memoir Parallel Time, where he talked about doing his Ph.D. work in this area, and his taking a certain malicious satisfaction in the thought that his linear decision rules were smarter than doctors and psychiatrists.) So: human experts are really bad, or at least no better than simple statistical models.
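For readers who have not met one, a linear classification rule is just a weighted sum of the features compared against a cutoff. The sketch below fits such a rule with the perceptron update, which is my choice for brevity and not the procedure used in the studies (those typically used regression-type fits). The four training cases are hypothetical.

```python
def predict(weights, bias, x):
    """Linear classification rule: 1 if the weighted sum exceeds zero, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return int(score > 0)


def fit_perceptron(data, max_epochs=100):
    """Fit weights with the perceptron update; stop after an error-free pass."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            error = y - predict(weights, bias, x)  # +1, 0, or -1
            if error != 0:
                # Nudge the rule toward the correct answer for this case.
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


# Hypothetical cases: two binary chart features; the outcome is 1 only when
# both are present. Invented purely for illustration.
CASES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = fit_perceptron(CASES)
accuracy = sum(predict(weights, bias, x) == y for x, y in CASES) / len(CASES)
print(weights, bias, accuracy)
```

Note that this only reports accuracy on the cases used for fitting; the comparisons with clinicians are about performance on new cases, which a toy like this says nothing about.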
On the other hand, there is another body of experimental work, admittedly more recent, on "simple heuristics that make us smart", which seems to show that people are often very good judges, under natural conditions. That is to say, we're very good at solving the problems we tend to actually encounter, presented in the way we encounter them. The heuristics we use to solve those problems may not be generally applicable, but they are adapted to our environments, and, in those environments, are fast, simple and effective.
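One standard example from that literature is the "take-the-best" heuristic studied by Gigerenzer and colleagues: to choose between two options, go through the cues in order of how much you trust them, and decide on the first cue that tells the options apart, ignoring everything else. Below is a minimal sketch; the cue names, their ordering and the two options are hypothetical.

```python
def take_the_best(cues_a, cues_b, cue_order):
    """Compare two options cue by cue, in the given order.

    Returns (choice, cues_used): choice is "a" or "b" at the first cue whose
    values differ, or None if no cue discriminates; cues_used counts how many
    cues were looked up before stopping.
    """
    used = 0
    for cue in cue_order:
        used += 1
        a, b = cues_a[cue], cues_b[cue]
        if a != b:
            return ("a" if a > b else "b"), used
    return None, used


# Hypothetical task: which of two cities is larger? Cue values are binary and
# invented; the ordering stands in for an assumed ranking by cue validity.
CUE_ORDER = ["capital", "airport", "university"]
city_x = {"capital": 0, "airport": 1, "university": 1}
city_y = {"capital": 0, "airport": 0, "university": 1}

choice, cues_used = take_the_best(city_x, city_y, CUE_ORDER)
print(choice, cues_used)  # a 2
```

The point of the design is frugality: the rule stops at the first discriminating cue, so here the third cue is never consulted, and no weights or sums are involved at all.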
I have a bit of difficulty reconciling these two pictures in my mind. I can think of three resolutions.
- The "clinical versus actuarial" results are wrong, or at least irrelevant. The experiments do not reflect the "natural" conditions of clinical judgment. There are many possibilities here, but the one which springs immediately to mind is that clinicians may not actually have much insight into the way they really make decisions, and that the factors they think they attend to may not really be the ones that matter to them. What one really wants is a representative sample of actual cases, comparing the normal judgment of clinicians to that of the statistical models. This may have been done; I don't know.
- The "fast and frugal heuristics" results are wrong, or at least irrelevant. Whatever adaptive mechanisms let us figure out good heuristics in everyday life don't apply in the situations where we rely on clinical expertise, or at least not in a lot of them. (See, for instance, the discussion of projective tests like the Rorschach ink-blots in Holland et al.'s Induction.) The problem can't just be that we didn't evolve to make psychiatric diagnoses, since we didn't evolve to do most of the diagnostic/prognostic tasks the fast-and-frugal-heuristics experiments show we can do, presumably by exapting the mechanisms that let our ancestors answer questions like "Just how angry will my neighbors be if they catch me fishing in their stream?". There has to be something special about the conditions of clinical judgment that renders our normal cognitive mechanisms ineffective there.
- Clinical judgment is a "fast and frugal heuristic", with emphasis on the fast and frugal. That is, it is true that (e.g.) linear classifiers are more accurate, but the decision procedures clinicians are using may be as accurate as one can get, using only a reasonable amount of information and a reasonable amount of time, while still using the human brain, which is not a computing platform well-suited to floating-point operations. The problem here is that there are areas where clinicians do seem to do as well as statistical methods.
I am unable to judge between these.
- Jason Collins, Simple Heuristics That Make Algorithms Smart [Another take on the dilemma above, which I found some years later...]
- Robyn M. Dawes, House of Cards: Psychology and Psychotherapy Built on Myth
- Robyn M. Dawes, David Faust and Paul E. Meehl, "Clinical Versus Actuarial Judgment", Science 243 (1989): 1668--1674
- Gerd Gigerenzer, Adaptive Thinking: Rationality in the Real World
- Gerd Gigerenzer, Peter Todd et al., Simple Heuristics That Make Us Smart
- William M. Grove, "Clinical Versus Statistical Prediction: The Contribution of Paul E. Meehl", Journal of Clinical Psychology 61 (2005): 1233--1243 [PDF reprint]
- Bernard E. Harcourt, Against Prediction: Profiling, Policing, and Punishing in an Actuarial Age [Review: Harcourt contra divinationem]
- John H. Holland, Keith J. Holyoak, Richard E. Nisbett and Paul R. Thagard, Induction: Processes of Learning, Inference and Discovery [Review: The Best-Laid Schemes o' Mice an' Men]
- Brent Staples, Parallel Time: Growing Up in Black and White
- Sonja B. Starr, "Evidence-Based Sentencing and the Scientific Rationalization of Discrimination", Stanford Law Review 66 (2014) 803--872 [The strongest part of this, to my mind, is the causal-inference critique: predicting the risk that someone will re-offend within \( k \) years, under current conditions, is not at all the same as predicting the risk of their committing another crime as a function of the sentence they receive. I am also very sympathetic to the points about the very modest predictive power of the existing algorithms, the possibility of great unmeasured heterogeneity within groups, and the ethical dubiousness of punishing someone more because of demographic groups they belong to. About the legal-constitutional issues I'm not fit to comment. One point to which Starr doesn't, I think, give enough weight is that even if risk-prediction formulas aren't any fairer or more accurate than what judges do now, they are however more explicit and public, and so both more subject to democratic control and to improvement over time. (Comment written in 2014.)]
- Frits Tazelaar and Chris Snijders, "The myth of purchasing professionals' expertise. More evidence on whether computers can make better procurement decisions", Journal of Purchasing and Supply Management 10 (2004): 211--222 [PDF reprint via Snijders]
Noted without recommendation:
- Gregory A. Caldeira, "Expert Judgment versus Statistical Models: Explanation versus Prediction", Perspectives on Politics 2 (2004): 777--780
To read:
- Ian Ayres, Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart [Despite the painful title, Ayres has done a lot of interesting work on social statistics]
- Michael A. Bishop and J. D. Trout, "50 Years of Successful Predictive Modeling Should Be Enough: Lessons for Philosophy of Science", Philosophy of Science 69 (2002): S197--S208 [Implications of actuarial prediction studies for philosophy of science. PDF reprint]
- Robyn M. Dawes, "The Ethics of Using or Not Using Statistical Prediction Rules in Psychological Practice and Related Consulting Activities", Philosophy of Science 69 (2002): S178--S184
- K. Anders Ericsson and Jacqui Smith (eds.), Towards a General Theory of Expertise: Prospects and Limits
- Klaus Fiedler and Peter Juslin (eds.), Information Sampling and Adaptive Cognition
- Howard N. Garb, Studying the Clinician: Judgment Research and Psychological Assessment
- William M. Grove, David H. Zald, Boyd S. Lebow, Beth E. Snitz, and Chad Nelson, "Clinical Versus Mechanical Prediction: A Meta-Analysis", Psychological Assessment 12 (2000): 19--30 [PDF reprint via Prof. Grove]
- Konstantinos V. Katsikopoulos, Thorsten Pachur, Edouard Machery and Annika Wallin, "From Meehl to Fast and Frugal Heuristics (and Back): New Insights into How to Bridge the Clinical-Actuarial Divide", Theory and Psychology 18 (2008): 443--464
- Paul E. Meehl, "Causes and Effects of My Disturbing Little Book", Journal of Personality Assessment 50 (1986): 370--375 [PDF reprint]
- Nate Silver, The Signal and the Noise: Why So Many Predictions Fail --- But Some Don't
- Ewout W. Steyerberg, Clinical Prediction Models: A Practical Approach to Development, Validation and Updating
- Philip E. Tetlock, Expert Political Judgment: How Good Is It? How Can We Know?
- George G. Woodworth and Joseph B. Kadane, "Expert Testimony Supporting Post-Sentence Incarceration of Violent Sexual Offenders", Law, Probability and Risk 3 (2004): 221--241