The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi 176

Sociology as a Population Science

by John H. Goldthorpe

Cambridge, England: Cambridge University Press, 2015, doi:10.1017/CBO9781316412565

Goldthorpe helpfully summarizes each chapter in a proposition at its head, as follows:

1. "Sociology should be understood as a population science in the sense of Neyman (1975)."
   Goldthorpe goes on to gloss Neyman (1975):
       [P]opulations in this technical sense could, substantively, be of quite different kinds. They could be human or other animal populations, but also populations of, say, molecules or galaxies. The common feature of such populations was that, while their individual elements were subject to considerable variability and might appear, at least in some respects, indeterminate in their states and behaviour, they could nonetheless exhibit aggregate-level regularities of a probabilistic kind.
       The aims of a science dealing with such pluralistic subjects of study --- or, that is, of what could be called a 'population science' --- were then twofold. The initial aim was to investigate, and to establish, the probabilistic regularities that characterise a particular population, or its appropriately defined subpopulations....
       However, Neyman also made it clear that once population regularities had been empirically established, the further aim of a population science had to be that of determining the processes or 'mechanisms' which in their operation at the individual level actually produced these regularities. And since the regularities --- the explananda of a population science --- were probabilistic, the mechanisms that would need to be envisaged would be ones that, rather than being entirely grounded in deterministic laws, incorporated chance. [pp. 7--8, Goldthorpe's emphasis]
2. "Sociology has to be understood as a population science, primarily on account of the degree of variability evident in human social life, at the level of sociocultural entities, but also, and crucially, at the individual level --- this latter variability being inadequately treated within the 'holistic' paradigm of inquiry, for long prevalent in sociology but now increasingly called into question."
3. "In sociology, understood as a population science, an 'individualistic' rather than a holistic paradigm of inquiry is required because of the high degree of variability existing at the individual level, and, further, because individual action, while subject to sociocultural conditioning and constraints, has to be accorded causal primacy in human social life, on account of the degree of autonomy that it retains."
4. "For sociology understood as a population science, the basic explananda are probabilistic population regularities rather than singular events or events that are grouped together under some rubric but without any adequate demonstration of the underlying regularities that would warrant such a grouping."
5. "Statistics has to be regarded as foundational for sociology as a population science in the sense that, as the means through which population regularities are established, it actually constitutes the explananda or 'objects of study' of sociology --- although always in conjunction with the concepts that sociologists form."
6. "In sociology as a population science, the foundational role played by statistics in establishing population regularities stems, in the first place, from the need for methods of data collection that are able to accommodate the degree of variability characteristic of human social life, in particular at the individual level, and that can thus provide an adequate basis for the analysis of regularities occurring within the variation that exists."
7. "In sociology as a population science the foundational role played by statistics in establishing population regularities stems, in the second place, from the need for methods of data analysis that are able to demonstrate the presence and the form of the population regularities that are emergent from the variability of human social life."
8. "While statistically informed methods of data collection and analysis are foundational in establishing the probabilistic population regularities that constitute sociological explananda, statistical analysis alone cannot lead to causal explanations of these regularities."
9. "In order to provide causal explanations for established population regularities, causal processes, or mechanisms, must be hypothesized in terms of individual action and interaction that meet two requirements: they should be in principle adequate to generate the regularities in question and their actual operation should be open to empirical test. Advantage lies with mechanisms explicitly specified in terms of action that is in some sense rational."

This may give the impression the book is all abstract argument, which is not the case at all; there are many concrete illustrations of why he thinks his ideas are better than alternatives, some drawn from his own career (particularly on social stratification and the transmission of inequality), but also from other areas of sociology.

I like where Goldthorpe's heart is at, but find his defense of status quo practice in sociology (nos. 7 and 8, mostly) unpersuasive. As a methodological individualist, he very sensibly wants to explain social phenomena in terms of the actions, and interactions, of individuals, especially actions which are (at least) subjectively rational (no. 9). So far so good; this is just having your head screwed on right, as agreed to by everyone from Karl Popper and Jon Elster through Raymond Boudon and Peter Hedström to Manuel DeLanda [*]. But then you'd want to model interacting individuals, and ideally you'd want to compare those models to data on individuals' actions and interactions.

What Goldthorpe instead defends is running regressions on survey data, so both the data and the statistical models bear only very complicated, indirect and lossy relationships to the phenomena described in the kind of theory Goldthorpe (very correctly) advocates. Sociologists of his convictions should almost exclusively build, and compare to data [**], agent-based models --- and some do. [***] Goldthorpe's right that cross-sectional survey data on individuals' attributes are the most accurate and representative kind of social data we've got. But lower-quality data on individuals' actions and interactions might still lead to better inferences for the kind of models Goldthorpe should want. Consider, for example, the sort of narrative-relational data Roberto Franzosi extracts from newspapers. This gives measurements of who did what to whom when, and (perhaps) why, and so is a lot more directly aligned with the sort of models Goldthorpe ought to want (because those models, in turn, are more directly aligned with the sort of theory and explanations he wants). There are issues about sampling bias and coding schemes, etc., for such data, and maybe those drawbacks generally outweigh the advantages, but that can hardly be decided a priori. Alternatively, we might seek new methodology, to figure out how to do statistical inference for ABMs based on our surveys.

(That methodology doesn't exist, yet, but I will geek out about it nonetheless. On the one hand, this might actually create a useful role for the usual sociologists' regressions, as auxiliary models in indirect inference for the ABMs. [But then they might be replaced by totally random features.] On the other hand, I worry that trying to do inference on interactive dynamical models from cross-sectional distributions of individual outcomes would run smack into serious identification problems. [Multiple Markov processes can have the same invariant distribution.] But I shall draw these speculations to a close, and return to Goldthorpe's book.)
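To make the bracketed remark about invariant distributions concrete, here is a minimal sketch in Python. The two transition matrices, `SLOW` and `FAST`, are invented for illustration and come from no actual sociological model; the helper names are likewise hypothetical. Both chains settle into the same cross-sectional distribution (2/3, 1/3), yet one is sticky and the other flips state constantly, so a single cross-section could not tell them apart.

```python
def stationary(P, iters=1000):
    """Approximate the invariant distribution of a row-stochastic matrix P
    by repeatedly pushing a uniform starting distribution through the chain."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def second_eigenvalue(P):
    """For a two-state chain the eigenvalues are 1 and trace(P) - 1.
    The second one sets the lag-1 autocorrelation of the stationary chain."""
    return P[0][0] + P[1][1] - 1.0


# Hypothetical two-state chains (illustrative numbers only).
SLOW = [[0.9, 0.1],
        [0.2, 0.8]]   # individuals mostly stay where they are
FAST = [[0.5, 0.5],
        [1.0, 0.0]]   # individuals in state 1 always leave at once

if __name__ == "__main__":
    for name, P in (("SLOW", SLOW), ("FAST", FAST)):
        pi = stationary(P)
        print(name, "invariant distribution:", [round(p, 4) for p in pi],
              "second eigenvalue:", round(second_eigenvalue(P), 4))
```

Telling the two apart needs data with a time dimension, such as panels or event histories, which is part of why data on actions and interactions could be worth their lower quality.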

As I said, I think lots of this is right-headed, and that it would be good for other sociologists to adopt these positions. I also think that, as I indicated, if Goldthorpe just followed his own ideas a bit more, he'd end up in an interesting and intellectually productive place. It may be that some of his readers in sociology will take this next step. This is, self-consciously, an old warrior's reflections on battles past (and tales of even older warriors few now remember) --- but I have always been fond of such works.

*: At least, DeLanda should agree if we sprinkled in some talk of assemblages; see chapters 1 and 2 of his A New Philosophy of Society (mini-review at the link above). Whether he actually does, I wouldn't presume to say.

**: Like Goldthorpe, I am taking it for granted here that we want to compare our models to data (for estimation, for hypothesis testing, for model-checking, and even for prediction). That being said, an important part of developed sciences is ~~playing around with~~ analyzing deliberately stylized and simplified models which aren't brought to the data and shouldn't be --- their point is to explore the consequences of assumptions, and build understanding for more complicated situations. (I don't think Goldthorpe would disagree.) If theoretical sociology actually existed as a sub-discipline, subjecting data-free ABMs to analysis would be an important topic for it.

***: Since, of course, ABMs are just interacting Markov processes, in some cases their microstructure won't matter and they can be usefully aggregated to compartment models, as in demography. This will, naturally, simplify both the probabilistic analysis of the models and their statistical inference.

Sociology, Social Theory, Social-Science Methodology / Probability and Statistics

Drafted 6--31 October 2022, posted 7 January 2023, small revisions and amplifications 8 January 2023