It is rare for a work in the history and philosophy of science to have a
villain, but this one does: the Harvard psychologist S. S. Stevens, who
in 1946 propounded the following definition, which has become received wisdom
(even folklore) in
psychology: "Measurement is the assignment of numerals to objects or events
according to rule". Michell *hates* this definition with a seething
passion that comes through his very academic prose, and it is not hard to see
why. According to this definition, social security numbers are "measurements"
of people!

The definition becomes especially horrible when combined with notions about
"operational definitions", which Stevens enthusiastically embraced, and which
still retain currency in psychology. If you buy this package of ideas, then
creating a test you *call* a "narcissism test" *automatically*
measures narcissism. This is because (i) you're measuring something (by
Stevens's definition of "measurement") and (ii) "narcissism is what you measure
with a narcissism test, just like length is what you measure with a ruler"
(operational definitions). You may then go on to correlate narcissism with the
number of times someone uses a first-person singular pronoun (or, if you're
really sophisticated, throw both variables into a path model), which presumes
that arithmetic on narcissism scores means anything. You may do this
blissfully unconcerned with whether there really is a one-dimensional variable
across people which your test responds to, or whether it makes any sense to say
that the difference between Irene's neuroticism and Joey's is twice as great as
that between Joey and Karl. The mystery is how this appallingly bad notion
became so entrenched within an academic field.
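
To make the "twice as great" worry concrete, here is a minimal sketch. The three scores are invented for illustration, and the squaring step is just one arbitrary order-preserving relabelling, not anything proposed by Stevens or Michell. If all we know is that the test orders people, both sets of numbers are equally good "measurements", yet they disagree about ratios of differences.

```python
from fractions import Fraction

# Hypothetical raw test scores; the numbers are invented for illustration.
scores = {"Irene": 9, "Joey": 5, "Karl": 3}


def difference_ratio(s):
    """Return (Irene - Joey) / (Joey - Karl) as an exact fraction."""
    return Fraction(s["Irene"] - s["Joey"], s["Joey"] - s["Karl"])


def rank_order(s):
    """Return the names sorted from lowest to highest score."""
    return sorted(s, key=s.get)


# Squaring is strictly increasing on positive numbers, so this relabelling
# keeps everyone in the same order while changing the spacing.
rescaled = {name: value ** 2 for name, value in scores.items()}

print(difference_ratio(scores))    # 2
print(difference_ratio(rescaled))  # 7/2
print(rank_order(scores) == rank_order(rescaled))  # True
```

The ordering survives the relabelling but the "twice as great" claim does not, so that claim only means something if the attribute has more than ordinal structure.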

The mystery deepens when it's made clear, as Michell is at pains to do, that
much better ideas about measurement are certainly available, and indeed are
targeted specifically at the problems of psychological measurement. This was
the work of Patrick Suppes, R. D. Luce and collaborators in the 1960s and
1970s, culminating in David Krantz et al.'s Foundations of Measurement. This
alternative tradition explored questions like
"what are some sets of properties which are strong enough to ensure that a
variable can be represented by real numbers, with arithmetic operations being
meaningful?", "what kinds of structure are *necessary* for
representation by real numbers?", and "how can the presence of such structure
be *tested*, without presuming any particular numerical representation?"
This theory allows, for example, for situations where two variables can only be
measured together ("conjoint measurement"), as, to pick an example not at all
at random, when using
scores of multiple people on multiple tests to get at *both* "how smart
is the test-taker?" and "how hard is the test?". In other words, this
sub-field used serious mathematics to enable researchers to answer questions
like "is this narcissism test really capturing a one-dimensional attribute with
quantitative structure?" That is, this line of work on the representational
theory of measurement makes (at least some aspects of) the validity of specific
forms of measurement into scientific, rather than metaphysical, problems. One
would then naturally expect (at least some) scientists in the relevant field,
namely psychology, to address those scientific problems. This has
conspicuously *not* happened.
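
As a rough illustration of what "testing for structure without presuming a numerical representation" can look like, here is a minimal sketch of one of the ordinal conditions used in additive conjoint measurement, usually called double cancellation. The function name and the example tables are my own inventions. The check is deterministic, so it ignores measurement error, and it covers only one of the conditions the theory requires.

```python
from itertools import product


def satisfies_double_cancellation(table):
    """Check the double cancellation condition on a two-way table.

    table[i][j] is the observed outcome for row level i and column level j.
    Only the ORDER of the entries is used, never their arithmetic.
    Condition: if (a, y) >= (b, x) and (b, z) >= (c, y), then (a, z) >= (c, x).
    """
    rows = range(len(table))
    cols = range(len(table[0]))
    for a, b, c in product(rows, repeat=3):
        for x, y, z in product(cols, repeat=3):
            premise = table[a][y] >= table[b][x] and table[b][z] >= table[c][y]
            if premise and not table[a][z] >= table[c][x]:
                return False
    return True


# A table built from hypothetical additive row and column effects.
row_effects = [0, 1, 3]
col_effects = [0, 2, 5]
additive = [[r + c for c in col_effects] for r in row_effects]

# The same table after an order-preserving relabelling of every entry.
relabelled = [[value ** 3 for value in row] for row in additive]

# Increasing along every row and column, yet not additively representable.
non_additive = [[1, 4, 5], [3, 7, 9], [6, 8, 10]]

print(satisfies_double_cancellation(additive))      # True
print(satisfies_double_cancellation(relabelled))    # True
print(satisfies_double_cancellation(non_additive))  # False
```

The third table is monotone in both factors, so it looks perfectly orderly, but no assignment of row and column values can reproduce its ordering by addition. That is the kind of failure which data could, in principle, reveal.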

Thus, a propos Spearman's "two-factor" theory of intelligence, where scores on an intelligence test which is "homogeneous" for subject matter are supposed to be a sum of general ability, plus subject-matter specific ability, Michell (correctly) writes:

The theory of conjoint measurement applies directly to any theory of this form, and, in doing so, brings out clearly (i) that such a theory could be mistaken in its requirements that the relevant attributes be quantitative, and (ii) that, as a consequence, in the absence of relevant evidence, confidence that these attributes are measurable is misplaced. In this instance there are three further requirements necessary to apply conjoint measurement theory: a theory of problem solving capable of distinguishing homogeneous from non-homogeneous tests; some way of identifying values of general ability that is independent of test scores, some way of identifying values of specific ability, also independently of test scores, and some way of, first, identifying and, then, controlling other relevant causes, so that the features of the data diagnostic of additive structure are not swamped by error. These are matters that require the theory [of general intelligence] to be elaborated well beyond its present state. [pp. 206--207]

Now, following what Michell is at pains to show is an ancient, indeed a
Euclidean, tradition, he wants to reserve the word "measurement" for
determining the magnitudes of "quantities", basically variables isomorphic to
the positive real numbers. (The paradigm case is length, with mass in a
supporting role.) He traces the history of this conception, and the fact that
the pioneers of both psychophysics (Fechner) and mental testing (Spearman;
interestingly, not Binet) thought they were conducting measurement in this
sense. He also traces mathematical work, especially by Hölder, which
illuminates what structure is required for such measurement to be possible.
(This discussion led me to learn of very cool work by Norbert Wiener in
1914--1921, working out the sort of order or structure that can be constructed
by combining relations of the form "just noticeably stronger than" and "not
noticeably stronger than". This apparently had no impact at all, not least
because he presumed readers were already familiar with Principia Mathematica.)
Michell
investigates how the psychologists felt compelled to produce measurement in
this sense, and *also* confronted obvious difficulties in showing that
psychological attributes or variables were, in fact, quantitative.
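
To see why relations like "just noticeably stronger than" need special treatment, here is a toy sketch. It is not a reconstruction of Wiener's construction. It assumes, purely for illustration, that sensations have hidden integer intensities and that a difference is noticed only when it reaches a fixed threshold. All names and numbers are hypothetical.

```python
THRESHOLD = 5  # hypothetical just-noticeable difference, in arbitrary units


def noticeably_stronger(a, b, threshold=THRESHOLD):
    """True when stimulus a exceeds stimulus b by at least the threshold."""
    return a - b >= threshold


def indistinguishable(a, b, threshold=THRESHOLD):
    """True when neither stimulus is noticeably stronger than the other."""
    return not noticeably_stronger(a, b, threshold) and not noticeably_stronger(
        b, a, threshold
    )


# Hypothetical underlying intensities of three stimuli.
weak, medium, strong = 10, 14, 18

print(indistinguishable(weak, medium))    # True
print(indistinguishable(medium, strong))  # True
print(indistinguishable(weak, strong))    # False
```

"Indistinguishable" is not transitive here, so it cannot simply be treated as equality of magnitudes, and any ordering has to be built up from the comparisons more carefully.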

Michell then surveys developments in philosophy and methodology which loosened some of the strictness of the old, Euclidean notion of measurement, but which also paved the way for the representational work he likes so much --- ideas in which numbers are used to represent aspects of the world, and "representation" involves similarity of structure, e.g., isomorphism. (Bertrand Russell's early work, around 1900, features here, and Michell doesn't care for it too much. I think he would find Russell's later thinking, e.g., in The Analysis of Matter, more congenial.) This, in Michell's telling, prepared the way both for Stevens, and for the representationalist work of Suppes, Luce and co.

Controversies with physicists about whether psychologists could
really *measure* the intensity of sensations made it clear that a
defense of psychologists' existing *practices* of quantifying was
needed. This Stevens delivered in spades. (Michell makes it clear that
Stevens's 1946 paper seriously distorted the views of participants in the
earlier debates, though his evidence leaves open, to my mind, the question of
whether the distortion was deliberate or merely ignorant.) His definition,
which truly has all the advantages of theft over honest toil, has been part of
the canon of psychology ever since.

As is probably clear, I find Michell's story largely convincing. There are
places where I think he is not an ideal guide. Some of these are minor quirks
(he has bees in his bonnet about the reality of numbers, and about the
importance of mathematizing late Scholastics like Oresme, despite their
conspicuous failure to perform any measurements [1]). Others are a bit more troubling --- I am pretty sure
that according to his favored definitions, *position* does not count as
a quantitative variable, because it is a three-dimensional vector. You could
say that each coordinate of the vector is a length (or a difference in
lengths), hence a quantity *sensu Michell*, but that depends on a certain
system of coordinates. It's not clear to me that coordinate-free expressions for
laws relating vectors, as we find in differential-geometric formulations of
physics, would thus count as "quantitative" for Michell. Worse, *rotations*
around different axes do not combine additively; indeed they don't even commute
(as combining position vectors does). Now,
if the psychologists had shown that the (supposed) Big Five personality factors
were really a 5-vector, or isomorphic to a subgroup of SO(4), or anything
remotely like that, I think Michell would have to admit they had
done everything he could possibly want by way of establishing that these
variables can be *measured*. Indeed, sequencing a genome *seems*
very much like a form of measurement, and the representation there is just a
sequence of categorical values, of "nominals" (A, C, G, T). If psychologists
could show that possible personalities were (isomorphic to) strings from some
regular grammar, we should all be very happy, and never mind additivity.
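
The point about rotations can be checked directly. The following is a minimal sketch using quarter-turn rotation matrices about the x and z axes, with integer entries so that no rounding is involved. The helper names are mine.

```python
def matmul(p, q):
    """Multiply two 3x3 matrices given as nested lists."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def add_vectors(u, v):
    """Add two 3-vectors componentwise."""
    return [a + b for a, b in zip(u, v)]


# Rotations by 90 degrees about the x axis and about the z axis.
ROT_X = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
ROT_Z = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]

# Composing the two rotations in different orders gives different results.
print(matmul(ROT_X, ROT_Z))  # [[0, -1, 0], [0, 0, -1], [1, 0, 0]]
print(matmul(ROT_Z, ROT_X))  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

# Combining position vectors, by contrast, does not depend on the order.
u, v = [1, 2, 3], [4, 5, 6]
print(add_vectors(u, v) == add_vectors(v, u))  # True
```

Nobody would deny that the orientation of a rigid body can be measured, even though orientations combine in a way that is neither additive nor commutative.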

Thus I don't get what Michell has against broadening the definition of
"measurement" to something like "finding mathematical representations of
attributes that preserve the empirical structure of the attributes, and
ascertaining the correct representations of particular cases". If, to preserve
continuity with tradition, he reserves "measurement" for numerical magnitudes,
then we need some other word for the broader notion. (Maybe "observation"?)
Of course, psychologists *have* said that intelligence, narcissism,
etc., are one-dimensional magnitudes, and they *haven't* (for the
most part) expressed interest in other forms of mathematical structure, so
however the terminology shakes out, Michell's main critical points stand. This
really ought to be read by anyone interested in psychological (or social)
measurement.

[1]: See, briefly, Alfred W. Crosby, The Measure of Reality: Quantification and Western Society, 1250--1600, pp. 56--69.

xvi + 246 pp., bibliography, index

In print as a hardback, ISBN 0-521-62120-8, and as a paperback

History of Science

Philosophy of Science

Cognitive Science

Mathematics

Debunking

27 February 2018; small wording fixes, 13 April 2018