The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi 172

Theory and Credibility

Integrating Theoretical and Empirical Social Science

by Scott Ashworth, Christopher R. Berry and Ethan Bueno de Mesquita

Princeton, New Jersey: Princeton University Press, 2021

One Hand Washes the Other

This is a textbook on social science methodology, aimed at those aspiring to Ph.D.s in political science. But the poli. sci. content is mostly in the specific examples; I imagine a teacher who was willing to provide examples from economics, sociology, demography, etc., could use it perfectly well.

The big point that the authors want to drive home is that rigorous formal modeling and careful analysis of empirical data ought to be seen as complements, not rivals. Formal models are how we work out the consequences of our ideas about how some part of the world works, or might work. (They give a careful account of modeling, based on Ronald Giere's ideas, emphasizing selective similarity of the model to the target.) It is by articulating formal models that we see what our ideas imply, and what assumptions are needed to reach various conclusions. (I am not super fond of the particular style of "rational choice" modeling used here, but nothing vital turns on that.) Since we don't want to just elaborate ideas but actually check them, we need data and we need to analyze it. But at this point we have to make sure that the data we have is actually measuring the variables we are theorizing about (or close enough), and that what we estimate from that data is actually relevant to checking or developing the theory.

This is where "credibility" comes in, as a reference to the "credibility revolution", the rather self-congratulatory name given to the wide adoption of more careful methods for non-experimental causal inference in the social sciences since the 1980s. (I wish I could remember who joked that it's really been a revolution of in-credulity about everyone else's identification assumptions.) Good courses in econometrics and data analysis for the social sciences will now give these methods a lot of attention. (So, too, will some courses I cannot in good conscience recommend.) This book is not a textbook on causal-inference techniques. Rather, they're relevant here because they often give us a way to estimate the quantities we're theorizing about, and not other quantities which resemble our favorites only superficially. (*)

Going the other way, the great temptation of the new causal inference tools is to use them to precisely answer whatever questions they can be applied to, irrespective of whether or not those questions are worth answering. (In the immortal words of Tom Pepinsky: "OMFG Exogenous Variation!") Since there are many, many average treatment effects (or whatever) that you could, in principle, estimate, why this one? "Because I can" is not, in fact, going to reliably advance knowledge. Maybe more subtly: as some of us emphasize when teaching causal inference, all of these methods require assumptions about the data-generating process and the causal structure of the world in order to work reliably. Their credibility rests on scientific theory and is not intrinsic.
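A toy simulation can make that last point concrete. The sketch below is purely illustrative and is not taken from the book: the function name, the true effect of 1.0, and the coefficient of 2.0 on the background factor are all assumptions made up for the example. It applies the same estimator, a plain difference in mean outcomes, to two simulated worlds, one where treatment is as good as randomly assigned and one where uptake depends on a factor that also drives the outcome.

```python
import random


def naive_difference(confounded, n=20000, seed=0):
    """Difference in mean outcomes, treated minus untreated.

    Hypothetical data-generating process: the true effect of treatment
    is 1.0 in both worlds; only the assignment mechanism differs.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # background factor that affects the outcome
        noise = rng.gauss(0, 1)  # idiosyncratic part of treatment uptake
        if confounded:
            d = (u + noise) > 0  # uptake depends on the background factor
        else:
            d = noise > 0  # uptake is unrelated to the background factor
        y = 1.0 * d + 2.0 * u + rng.gauss(0, 1)
        (treated if d else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    print("as-if random assignment:", round(naive_difference(False), 2))
    print("confounded assignment:  ", round(naive_difference(True), 2))
```

In the first world the estimate lands near the true effect; in the second it comes out several times too large, though the estimator and the sample size are identical. Nothing in the data alone says which world we are in, which is the sense in which the credibility comes from the theory.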

So: formal modeling that aspires to say something about the world finds modern data-analysis very helpful, and scientific theories guide and underlie the data analysis. The two complement each other usefully. (I am curious about the authors' class-room experience here: is there more need to persuade data analysts that theory matters, or vice versa? do they have more success in doing so?) Thus the core of the book.

Beyond the core, the authors go over a range of ideas for making theory and empirics work productively together: finding distinct empirical implications for competing theories, elaborating on theories to help make them identifiable, etc. All of these are illustrated with case studies based on papers, or series of papers, from the political science literature, mostly since 2000. (This is where an economist or demographer or epidemiologist might need to swap in their own examples.) Their commentary here is always thoughtful, and does a good job of balancing respect for the particulars of the examples with drawing more general lessons.

This is not a very technical work on either the theoretical or the data-analytic side. (Readers are expected to be able to do basic algebra and a little calculus, to find expected values, and to at least sort-of remember how linear least squares works.) This is appropriate because the technicalities are abundantly covered elsewhere, and aren't really the point here. The writing is straightforward and clear, if a bit dry (though of course I savored the sporadic snarky footnotes). I don't have any very direct use for the book in my teaching, but I almost wish I did. I strongly recommend it to all those who would like to see better social science.

*: the authors also make an assertion that theories generally only make all-else-being-equal claims, which line up especially well with causal-inference techniques, even if we're not necessarily interested in causal parameters. I am not quite as sold on this part. It seems to me that, for instance, a DSGE model makes a lot of claims about the economy which aren't just of the all-else-being-equal form. But this is a tangent.

Probability and Statistics; Philosophy of Science; Sociology, Social Theory, etc.

Drafted 19 November 2021, posted 7 December 2021