The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi 168

Measurement in Psychology

A Critical History of a Methodological Concept

by Joel Michell

It is rare for a work in the history and philosophy of science to have a villain, but this one does: the Harvard psychologist S. S. Stevens, who in 1946 propounded the following definition, which has become received wisdom (even folklore) in psychology: "Measurement is the assignment of numerals to objects or events according to rule". Michell hates this definition with a seething passion that comes through his very academic prose, and it is not hard to see why. According to this definition, social security numbers are "measurements" of people!

The definition becomes especially horrible when combined with notions about "operational definitions", which Stevens enthusiastically embraced, and which still retain currency in psychology. If you buy this package of ideas, then creating a test you call a "narcissism test" automatically measures narcissism. This is because (i) you're measuring something (by Stevens's definition of "measurement") and (ii) "narcissism is what you measure with a narcissism test, just like length is what you measure with a ruler" (operational definitions). You may then go on to correlate narcissism with the number of times someone uses a first-person singular pronoun (or, if you're really sophisticated, throw both variables into a path model), which presumes that arithmetic on narcissism scores means anything. You may do this blissfully unconcerned with whether there really is a one-dimensional variable across people which your test responds to, or whether it makes any sense to say that the difference between Irene's narcissism and Joey's is twice as great as that between Joey and Karl. The mystery is how this appallingly bad notion became so entrenched within an academic field.
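To make the worry about arithmetic on scores concrete, here is a minimal sketch in Python. The scores are invented for illustration, and the only assumption is that the test establishes nothing more than an ordering of people. In that case any strictly increasing rescaling of the scores is as defensible as the raw numbers, yet the claim "twice as great" does not survive the rescaling.

```python
import math

# Hypothetical raw test scores (invented numbers; names from the example above).
scores = {"Irene": 30.0, "Joey": 20.0, "Karl": 15.0}

def difference_ratio(s):
    """Ratio of the Irene-Joey gap to the Joey-Karl gap."""
    return (s["Irene"] - s["Joey"]) / (s["Joey"] - s["Karl"])

def rescale(s, f):
    """Apply a strictly increasing transformation f to every score."""
    return {name: f(x) for name, x in s.items()}

def ranking(s):
    """Names ordered from highest to lowest score."""
    return sorted(s, key=s.get, reverse=True)

log_scores = rescale(scores, math.log)                  # order-preserving, not linear
linear_scores = rescale(scores, lambda x: 3 * x + 7)    # order-preserving and linear

raw_ratio = difference_ratio(scores)
log_ratio = difference_ratio(log_scores)
linear_ratio = difference_ratio(linear_scores)

print(ranking(scores) == ranking(log_scores))  # the ordering is unchanged
print(raw_ratio, log_ratio, linear_ratio)      # only the linear rescaling keeps the ratio
```

Whether the ratio of differences means anything thus depends on showing that the attribute has more than ordinal structure, which is an empirical question and not something the scoring rule can settle by itself.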

The mystery deepens when it's made clear, as Michell is at pains to do, that much better ideas about measurement are certainly available, and indeed are targeted specifically at the problems of psychological measurement. This was the work of Patrick Suppes, R. D. Luce and collaborators in the 1960s and 1970s, culminating in David Krantz et al.'s Foundations of Measurement. This alternative tradition explored questions like "what are some sets of properties which are strong enough to ensure that a variable can be represented by real numbers, with arithmetic operations being meaningful?", "what kinds of structure are necessary for representation by real numbers?", and "how can the presence of such structure be tested, without presuming any particular numerical representation?" This theory allows, for example, for situations where two variables can only be measured together ("conjoint measurement"), as, to pick an example not at all at random, with using scores of multiple people on multiple tests to get at both "how smart is the test-taker?" and "how hard is the test?". In other words, this sub-field used serious mathematics to enable researchers to answer questions like "is this narcissism test really capturing a one-dimensional attribute with quantitative structure?" That is, this line of work on the representational theory of measurement makes (at least some aspects of) the validity of specific forms of measurement into scientific, rather than metaphysical, problems. One would then naturally expect (at least some) scientists in the relevant field, namely psychology, to address those scientific problems. This has conspicuously not happened.

Thus, a propos Spearman's "two-factor" theory of intelligence, where scores on an intelligence test which is "homogeneous" for subject matter are supposed to be a sum of general ability, plus subject-matter specific ability, Michell (correctly) writes

The theory of conjoint measurement applies directly to any theory of this form, and, in doing so, brings out clearly (i) that such a theory could be mistaken in its requirements that the relevant attributes be quantitative, and (ii) that, as a consequence, in the absence of relevant evidence, confidence that these attributes are measurable is misplaced.
In this instance there are four further requirements necessary to apply conjoint measurement theory: a theory of problem solving capable of distinguishing homogeneous from non-homogeneous tests; some way of identifying values of general ability that is independent of test scores, some way of identifying values of specific ability, also independently of test scores, and some way of, first, identifying and, then, controlling other relevant causes, so that the features of the data diagnostic of additive structure are not swamped by error. These are matters that require the theory [of general intelligence] to be elaborated well beyond its present state. [pp. 206--207]
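The "features of the data diagnostic of additive structure" can be given a rough illustration. The sketch below, in Python, checks two of the standard conditions from conjoint measurement theory, independence (single cancellation) and double cancellation, on a small table whose rows stand for test-takers and whose columns stand for tests. Only the ordering of the cells is used. The function names and both tables are invented for this illustration, and the data are assumed error-free; the conditions are necessary for an additive representation, and passing them on a finite table does not by itself establish one.

```python
from itertools import product

def satisfies_independence(table):
    """Single cancellation: the ordering of two rows may not depend on the
    column in which they are compared, and likewise for two columns."""
    rows, cols = range(len(table)), range(len(table[0]))
    for a, b in product(rows, repeat=2):
        for x, y in product(cols, repeat=2):
            if table[a][x] >= table[b][x] and not table[a][y] >= table[b][y]:
                return False
            if table[a][x] >= table[a][y] and not table[b][x] >= table[b][y]:
                return False
    return True

def satisfies_double_cancellation(table):
    """If (a,y) >= (b,x) and (b,z) >= (c,y), then (a,z) >= (c,x) must hold."""
    rows, cols = range(len(table)), range(len(table[0]))
    for a, b, c in product(rows, repeat=3):
        for x, y, z in product(cols, repeat=3):
            if table[a][y] >= table[b][x] and table[b][z] >= table[c][y]:
                if not table[a][z] >= table[c][x]:
                    return False
    return True

# Hypothetical additive case: each cell is (person ability) + (test easiness).
ability = [0, 1, 3]
easiness = [0, 2, 5]
additive = [[p + t for t in easiness] for p in ability]

# Hypothetical non-additive case: increasing along every row and column,
# so independence holds, but double cancellation fails.
non_additive = [[1, 3, 4],
                [2, 5, 8],
                [6, 7, 9]]

print(satisfies_independence(additive), satisfies_double_cancellation(additive))
print(satisfies_independence(non_additive), satisfies_double_cancellation(non_additive))
```

The second table shows why confidence in measurability without such checks is misplaced: its scores look perfectly orderly, and yet no assignment of numbers to rows and columns can reproduce the ordering of its cells by addition.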

Now, following what Michell is at pains to show is an ancient, indeed a Euclidean, tradition, he wants to reserve the word "measurement" for determining the magnitudes of "quantities", basically variables isomorphic to the positive real numbers. (The paradigm case is length, with mass in a supporting role.) He traces the history of this conception, and the fact that the pioneers of both psychophysics (Fechner) and mental testing (Spearman; interestingly, not Binet) thought they were conducting measurement in this sense. He also traces mathematical work, especially by Hölder, which illuminates what structure is required for such measurement to be possible. (This discussion led me to learn of very cool work by Norbert Wiener in 1914--1921, working out the sort of order or structure that can be constructed by combining relations of the form "just noticeably stronger than" and "not noticeably stronger than". This apparently had no impact at all, not least because he presumed readers already familiar with Principia Mathematica.) Michell investigates how the psychologists felt compelled to produce measurement in this sense, and also confronted obvious difficulties in showing that psychological attributes or variables were, in fact, quantitative.
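Why relations like "not noticeably stronger than" need special treatment can be seen from a toy case. The sketch below is in Python and is emphatically not Wiener's construction; the threshold model and the stimulus values are assumptions made up for illustration. It shows only that "not noticeably different" can fail to be transitive, so that it cannot simply be treated as equality of magnitudes.

```python
# Hypothetical just-noticeable difference, in arbitrary intensity units.
THRESHOLD = 1.0

def noticeably_stronger(a, b):
    """True when a exceeds b by more than the threshold."""
    return a - b > THRESHOLD

def indistinguishable(a, b):
    """True when neither stimulus is noticeably stronger than the other."""
    return not noticeably_stronger(a, b) and not noticeably_stronger(b, a)

# Three invented stimulus intensities, each 0.6 units above the previous one.
weak, middle, strong = 10.0, 10.6, 11.2

print(indistinguishable(weak, middle))     # True
print(indistinguishable(middle, strong))   # True
print(indistinguishable(weak, strong))     # False: the small steps add up
```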

Michell then surveys developments in philosophy and methodology which loosened some of the strictness of the old, Euclidean notion of measurement, but which also paved the way for the representational work he likes so much --- ideas in which numbers are used to represent aspects of the world, and "representation" involves similarity of structure, e.g., isomorphism. (Bertrand Russell's early work, around 1900, features here, and Michell doesn't care for it too much. I think he would find Russell's later thinking, e.g., in The Analysis of Matter, more congenial.) This, in Michell's telling, prepared the way both for Stevens, and for the representationalist work of Suppes, Luce and co.

Controversies with physicists about whether psychologists could really measure the intensity of sensations made it clear that a defense of psychologists' existing practices of quantifying was needed. This Stevens delivered in spades. (Michell makes it clear that Stevens's 1946 paper seriously distorted the views of participants in the earlier debates, though his evidence leaves open, to my mind, the question of whether the distortion was deliberate or merely ignorant.) His definition, which truly has all the advantages of theft over honest toil, has been part of the canon of psychology ever since.

As is probably clear, I find Michell's story largely convincing. There are places where I think he is not an ideal guide. Some of these are minor quirks (he has bees in his bonnet about the reality of numbers, and about the importance of mathematizing late Scholastics like Oresme, despite their conspicuous failure to perform any measurements [1]). Others are a bit more troubling --- I am pretty sure that according to his favored definitions, position does not count as a quantitative variable, because it is a three-dimensional vector. You could say that each coordinate of the vector is a length (or a difference in lengths), hence a quantity sensu Michell, but that depends on a certain system of coordinates. It's not clear to me that coordinate-free expressions for laws relating vectors, as we find in differential-geometric formulations of physics, would thus count as "quantitative" for Michell. Worse, rotations around different axes do not combine additively, indeed they don't even commute (as combining position vectors does). Now, if the psychologists had shown that the (supposed) Big Five personality factors were really a 5-vector, or isomorphic to a subgroup of SO(4), or anything remotely like that, I think Michell would have to admit they had done everything he could possibly want by way of establishing that these variables can be measured. Indeed, sequencing a genome seems very much like a form of measurement, and the representation there is just a sequence of categorical values, of "nominals" (A, C, G, T). If psychologists could show that possible personalities were (isomorphic to) strings from some regular grammar, we should all be very happy, and never mind additivity.
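The point about rotations is easy to check numerically. Here is a minimal sketch in Python using plain lists as 3x3 matrices; the helper names are mine, and the quarter-turn angles and the test vectors are arbitrary choices for illustration.

```python
import math

def rot_x(theta):
    """Rotation by theta radians about the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    """Product of two 3x3 matrices; matmul(A, B) applies B first, then A."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(M, v):
    """Apply the matrix M to the 3-vector v."""
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def add(u, v):
    """Componentwise sum of two vectors."""
    return [p + q for p, q in zip(u, v)]

def close(u, v, tol=1e-9):
    """Componentwise comparison up to floating-point tolerance."""
    return all(abs(p - q) <= tol for p, q in zip(u, v))

quarter = math.pi / 2
v = [0.0, 1.0, 0.0]

x_then_z = apply(matmul(rot_z(quarter), rot_x(quarter)), v)
z_then_x = apply(matmul(rot_x(quarter), rot_z(quarter)), v)
print(close(x_then_z, z_then_x))   # False: the order of the rotations matters

p, q = [1.0, 2.0, 3.0], [-4.0, 0.0, 5.0]
print(add(p, q) == add(q, p))      # True: combining position vectors commutes
```

Rotations are nonetheless measured routinely and unproblematically, which is why a definition of measurement tied to additive magnitudes looks too narrow.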

Thus I don't get what Michell has against broadening the definition of "measurement" to something like "finding mathematical representations of attributes that preserve the empirical structure of the attributes, and ascertaining the correct representations of particular cases". If, to preserve continuity with tradition, he reserves "measurement" for numerical magnitudes, then we need some other word for the broader notion. (Maybe "observation"?) Of course, psychologists have said that intelligence, narcissism, etc., are one-dimensional magnitudes, and they haven't (for the most part) expressed interest in other forms of mathematical structure, so however the terminology shakes out, Michell's main critical points stand. This really ought to be read by anyone interested in psychological (or social) measurement.

[1]: See, briefly, Alfred W. Crosby, The Measure of Reality: Quantification and Western Society, 1250--1600, pp. 56--69.

xvi + 246 pp., bibliography, index

In print as a hardback, ISBN 0-521-62120-8, and as a paperback

History of Science
Philosophy of Science
Cognitive Science
Mathematics
Debunking

27 February 2018; small wording fixes, 13 April 2018