The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi   141

What Is Intelligence?

Beyond the Flynn Effect

by James R. Flynn

Expanded edition

Cambridge University Press, 2009 (first edition, 2008)

The Domestication of the Savage Mind

[A much shorter version of this review appeared in American Scientist 97:3 (May-June 2009): 244.]

In 1980, James Flynn wrote a book called Race, IQ, and Jensen, where he tried to assess the then-current state of the IQ controversy, especially the claim, prominently pushed by Arthur Jensen, that the mean IQ differences between black and white Americans were due to the former being hereditarily dumber than the latter, rendering all attempts to change the situation futile (at best). The book was a valuable exercise in clarification, but Flynn, like many people, found the IQ literature unpleasant, and in his preface he swore that he was going to ignore the whole matter forever after.

Fortunately, Flynn broke this oath, and went on to write a series of papers, culminating in the now-classic "Massive IQ Gains in 14 Nations: What IQ Tests Really Measure" (Psychological Bulletin 101 (1987): 171--191), establishing the phenomenon that Charles Murray and Richard Herrnstein later named "the Flynn effect". In every country where we can find records of consistent IQ tests given to large numbers of people, scores have been rising as far back as the records go, in some cases to the early 20th century, and by large amounts, sometimes (e.g., for draftees in the Netherlands) as much as twenty IQ points every thirty years. This book is Flynn's attempt to explain this phenomenon, and to explore some of the implications of that explanation.

To explain Flynn's hypothesis, I first need to talk about how IQ scores are calculated, which will also explain how the Flynn effect went unnoticed for so long. (He did have a few predecessors.) By convention, IQ tests are designed so that the mean score is 100 points, the standard deviation is 15 points, and the scores follow a Gaussian probability distribution, the now-infamous bell-shaped curve. At least, all of this is true of a norming or reference sample of test-takers, assembled when the test is put together in the hope that it is representative of future test-takers. Scores on individual questions are weighted and added up, and then transformed, since the distribution of raw scores is quite skewed rather than symmetrically bell-shaped. In essence, the IQ score of a future test-taker is computed by seeing where their raw score falls in the distribution of the original reference sample, and reading off the corresponding Gaussian value. There are wrinkles (e.g., some test-makers set the standard deviation to be 16 or even 24 points), but those are the basics.
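As a rough illustration of that last step, here is a minimal Python sketch, assuming a simple mid-rank percentile and no smoothing of the norms table; real test publishers use more elaborate procedures, and the function name `iq_from_raw` and its parameters are hypothetical. It locates a raw score within the reference sample and reads off the matching point of a Gaussian with mean 100 and standard deviation 15. Because it works through ranks, it does not care how skewed the raw scores are.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def iq_from_raw(raw, reference_raw_scores, mean=100.0, sd=15.0):
    """Convert a raw score to an IQ-style score using a norming sample.

    Hypothetical, simplified sketch: real norms tables are smoothed and
    usually broken down by age group.
    """
    ref = sorted(reference_raw_scores)
    n = len(ref)
    below = bisect_left(ref, raw)          # reference scores strictly below raw
    ties = bisect_right(ref, raw) - below  # reference scores equal to raw
    # Mid-rank percentile, clamped away from 0 and 1 so inv_cdf stays finite.
    p = (below + 0.5 * ties) / n
    p = min(max(p, 0.5 / n), 1.0 - 0.5 / n)
    # Read off the Gaussian value at that percentile.
    return NormalDist(mu=mean, sigma=sd).inv_cdf(p)


# The same raw score, normed against two hypothetical reference samples.
older_norms = list(range(10, 51))            # raw scores 10..50
newer_norms = [s + 5 for s in older_norms]   # everyone answers 5 more items right
print(round(iq_from_raw(30, older_norms)))   # middle of the older sample
print(round(iq_from_raw(30, newer_norms)))   # same answers, lower IQ
```

The usage lines show, in miniature, why identical answers can earn different IQ scores: only the position within the reference sample matters, so a better-performing norming sample pushes the same raw score down.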

Two test-takers who give exactly the same set of answers to the same questions can thus get different IQ scores, if they are normed against different reference samples. Test-makers periodically re-norm their tests against new samples, keeping the mean at 100, but that mean score can represent very different levels of absolute performance. Flynn's discovery came from intelligence tests which had been consistently given with the same sets of questions over time, and where the raw scores had been recorded. What he found is that someone who gets an IQ score of 100 today gets more questions right than did someone who got a score of 100 in 1950, who in turn answered more right than did someone with a score of 100 in 1900. The exact rate of gain depends on the country and on the test, from a high of 6--7 IQ points per decade to a low of only a few points over a half-century. A rough summary is that measured IQ has been rising at, conservatively, 3 points per decade for as far back as the data go, across the industrialized world. This rate is enough that someone who had an IQ of 100 in 1900 would have had an IQ of only 70 in 2000 — low enough to be classified as mentally retarded, and so, in the US, exempt from capital punishment, as being incapable of fully understanding their own actions. (Flynn's chapter 6, aptly titled "IQ Gains Can Kill", is devoted to the implications of that fact, but space precludes going into it here.)

A number of explanations have been suggested for the Flynn effect, most of which Flynn swats down with little trouble. It is just too large, too widespread, and too steady, to be due to improved nutrition, greater familiarity with IQ tests, or (a personal favorite) hybrid vigor from mixing previously-isolated populations, all of which have been seriously proposed. Nobody seems to have bitten the bullet and suggested that modern societies have natural or sexual selection for higher IQ; but the numbers wouldn't add up in any case.

The Flynn effect seems to imply at least one of two things: either our ancestors of a century ago were astonishingly stupid, or IQ tests measure intelligence badly. Flynn contends that our ancestors were no dumber than we are, but that most of them used their minds in different ways than we do, to which IQ tests are more or less insensitive; we have become increasingly skilled at the uses of intelligence that IQ tests do catch. Though he doesn't put it this way, he thinks that IQ tests are massively culturally biased, and that the culture they favor has been imposed on the populations of the developed countries (and, increasingly, the rest of the world) through a far-reaching, sustained and successful campaign of cultural imperialism and social engineering.

This can be seen in Flynn's discussion of a hypothetical, but typical, test question: "How are rabbits and dogs alike?" Answers like "both are raised on farms", "both come in breeds with different colors", "both are eaten by people in some parts of the world and kept as pets in others", "both have claws", "both can destroy gardens", and Flynn's example answer, "you can use dogs to hunt rabbits" are true, but not what IQ testers look for. (Even the answer "they're not alike, in any way that matters" could be sensibly defended.) The test-makers want you to say "both are mammals". What the testers look for, in other words, is not knowledge of the concrete world or of functional relationships, but mastery of one set of abstract concepts, which the test-makers themselves have internalized as highly-trained scientific professionals and literate intellectuals.

All thought involves some degree of abstraction, but IQ testers, like intellectuals in general, tend to value abstraction as such. For instance, a (now-dropped) item on the standard WISC test for children was "What do liberty and justice have in common?", scored as follows: "2 points for the answer that both are ideals or that both are moral rights, 1 point for both are freedom, 0 for both are what we have in America. The examiner is told that 'freedoms' gets 1 point while 'free things' gets 0 because the latter is a more concrete response" (pp. 27--28). Flynn does not inform us how to score a response like "Things America will never restore while it remains shackled by political correctness", which, agree or disagree, would definitely show more thought than the rote response "moral values".

As well as preferring answers which show familiarity with our current scientific concepts, IQ tests also reward certain kinds of problem-solving abilities, what Flynn describes as solving "problems not solvable by mechanical application of a learned method" (p. 53; I don't think he really means to deny the possibility of AI). Prime examples, to his mind, are things like tests of similarities and analogies, and pattern-completion tests like Raven's Progressive Matrices. In the latter, each question consists of a series of line drawings, followed by a choice of several extra drawings from which the test-taker is supposed to pick the one that completes or finishes the sequence. Raven hoped that his test would be a fairly pure measurement of ability to "educe relations", i.e., to discover patterns, which he regarded as the essence of intelligence. Raven's test is often said to be subject to little or no cultural bias (a claim resting on basically no evidence whatsoever). Yet it is on tests of this type that the Flynn effect is strongest, 5 points per decade at the least. Below them come similarities and analogies tests of the rabbit/dog kind. Scores on vocabulary, arithmetic and general-information tests, on the other hand, show the lowest rates of improvement, and even some small declines.

Flynn refers to these transformations in how we think as "liberation from the concrete" and "putting on scientific spectacles". He claims that the Flynn effect is a consequence of the changes in how people live, and in what skills they cultivate, brought about by the industrial revolution. We now overwhelmingly keep dogs as pets, not to hunt, and we go to schools where we are taught not just to read but to think abstractly, and to use a common set of abstractions. Flynn refers here to the well-known work done by the great Soviet psychologist A. R. Luria in the 1930s, described in his Cognitive Development: Its Social and Cultural Foundations (1974). Luria claimed to show, by means of fieldwork among peasants and nomads in Uzbekistan, that the kind of abstract reasoning skills Flynn points to developed in tandem with literacy, schooling, and participation in the modern economy. While Luria's work has flaws (an Uzbekistani peasant who had abstract reasoning skills, confronted in the 1930s by a Russian Communist official asking them strange and leading questions, had many excellent reasons to play dumb), his findings are broadly consonant with later work on cross-cultural psychology.

At a larger scale, there is a connection, which Flynn does not draw, to the investigations of historians and sociologists into links between industrialization, nationalism and schooling. Americans may recall that our public schools were consciously used to make this country a melting-pot, to turn the descendants of immigrants from dozens of countries with many languages and cultures into a more-or-less unified people. Similar processes took place in the late 19th and early 20th centuries in all the developed countries — and, somewhat later, took off in the rest of the world. Governments and educated classes sought, in historian Eugen Weber's phrase, to turn "peasants into Frenchmen" — or into Dutchmen, Germans, Italians, Poles, Serbs, Russians, etc.; at the time Luria worked, the Soviet government was busy turning peasants into Uzbeks.

Out of the blooming, buzzing confusion of local dialects and traditions, intellectuals invented (or, as they saw it, codified) standardized literary languages and "ancient folk customs", which they then propagated through state-organized universal education and the new mass media. Simultaneously, they took modes of thinking which previously had been the reserve of their own small minority of literate specialists and made them part of everyone's education. As the sociologist Ernest Gellner emphasized, this was not just an exercise in cultural domination. An industrial economy constantly creates new jobs and destroys old ones, so learning a trade, probably one's father's, by immersion from childhood won't work any longer; more generic, and so more abstract, training is required. In an industrial society, people constantly face strangers and novelties. Action then cannot be guided by custom and familiar context, but must instead be guided by explicit impersonal rules, by cultural conventions shared across whole countries rather than single villages, and by original thought and decision. An industrial society is one in which the whole economically effective population has to deal with machines and with written communications, again with minimal help from context, and where a large fraction of workers must have some mastery of the abstract, scientific concepts which make industrial technologies comprehensible. Finally, in an industrial society everyone routinely deals with large bureaucracies (when privately owned we call them "corporations"), and indeed most people work within them. All of this points towards not just standardized and literate cultures, but also ones which reward abstract thinking, and, even more, towards a change of attitudes: a willingness or even eagerness to follow arbitrary-seeming abstract rules with no immediate point or relevance, just because a person in authority tells you to do so.

Again, this did not create new ways of thinking so much as spread ones which had existed for a few millennia but had been very rare. If you had asked medieval scholars like Averroes or William of Ockham "how is a rabbit like a dog?", they would have replied that rabbits and dogs are both species of the genus "quadruped animals". (Ockham might have quibbled about the difference between names and things.) They were already "liberated from the concrete", but they used a somewhat different system of abstractions than we do. William Gibson once said that "the future is already here, it just isn't widely distributed yet"; the same was once true of this aspect of the present.

If this is right, two consequences follow for IQ tests. First, schooling should increase IQ scores. Though Flynn does not address this, the best estimates (e.g., those of Winship and Korenman) show that, in contemporary American samples, each additional year of secondary education increases IQ by, on average, between 2 and 4 points. (These estimates ignore school quality, but they do control for early-childhood IQ, and so for the possibility that kids with lower IQ leave school earlier.) If (and it is a big if!) this holds over time as well as in cross-section, then to account for the US Flynn effect, educational attainment would have had to rise by about one year per decade, which is a bit more than it actually did.
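The arithmetic behind that last sentence fits in a few lines of Python. This is only a back-of-the-envelope sketch, assuming a Flynn effect of 3 points per decade and that the cross-sectional schooling effect of 2 to 4 points per year carries over unchanged through time; the function name is hypothetical.

```python
def school_years_needed_per_decade(flynn_points_per_decade, iq_points_per_school_year):
    """Extra years of schooling per decade needed to account for the Flynn effect,
    if each year of schooling adds a fixed number of IQ points (a big if)."""
    return flynn_points_per_decade / iq_points_per_school_year


# Assumed Flynn effect of 3 points per decade, schooling effect of 2, 3 or 4 points per year.
for per_year in (2.0, 3.0, 4.0):
    needed = school_years_needed_per_decade(3.0, per_year)
    print(f"{per_year} points per school year -> {needed} extra years per decade")
```

At the midpoint of 3 points per year this gives one extra year of schooling per decade; the 2 to 4 point range stretches the requirement to between 0.75 and 1.5 years per decade.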

Second, IQ score gains should not be equal across different tests, but rather should vary depending on the content of the tests, being highest in those which rely most on mastering abstract taxonomies and on-the-spot problem-solving. This is precisely where the gains are highest. They are lowest in tests like arithmetic, vocabulary, and general information, i.e., questions of the form "What is the capital of Argentina?"

That such trivia-quiz questions appear in tests which supposedly gauge mental ability brings us to the question Flynn poses in his title. He begins well, correctly saying that the task is to take a pre-theoretical notion and try to shape it into something which is a moving part in a theoretical explanatory mechanism. His pre-theoretical notion, following Jensen, is that "intelligence" means "how well and how quickly someone learns"; the most intelligent person is the one who learns best and fastest. This is plausible, at least to my ears, but also not the only possible choice. John Dewey, for instance, said intelligence was the "capacity to estimate the possibilities of a situation and to act in accordance with [that] estimate". This also sounds plausible — it's the intelligence of Odysseus, the man who is never at a loss — but it would lead to a rather different theory. After all, "the people who learn best and fastest are the people who always know what to do" is not a tautology!

Still, let's give Flynn and Jensen this, and even suppose (as they do implicitly) that there's no trade-off between learning well and learning quickly; it doesn't follow that this is a single attribute. Who learns best and fastest depends on what is being learned, on what is already known, on how people try to learn, on how (if at all) others try to teach them, etc. Flynn knows this, of course, and asserts that intelligence consists of the combination of "(1) mental acuity ... (2) habits of mind ... (3) attitudes ... (4) knowledge and information ... (5) speed of information processing ... (6) memory". (He does not say how he came up with this list, and gives no attention to the cognitive science literatures on any of these topics.) He also claims that in a narrow sense intelligence is just mental acuity, "the ability to provide on-the-spot solutions to problems we have never encountered before". There may, for all I know, be one such ability, completely independent of problem content, but it's not obvious, and it's conceivable, though perhaps false, that the first item on Flynn's inventory doesn't actually exist, though the others do.

The flaw in this aspect of Flynn's book doesn't turn on that point, however, so much as the way that he basically stops with the inventory. This is not a mechanism but a sketch of a mechanism's outline, and it does no work at all. It says that "Jack solved all the Raven's Matrices problems because he is very intelligent" means "Jack solved all the Raven's Matrices problems because he has a lot of ability to provide solutions to problems", which as an explanation is no better than "The pill put Jack to sleep because it has a lot of dormitive ability". The most charitable take would be that such statements might focus our attention on what needs explaining.

Though Flynn's attempt to explicate intelligence doesn't go very far, it at least points in the direction of an explanatory theory and a substantive account of what is and is not relevant to its variables. This is far superior to the current practice in IQ testing (very much subscribed to by Jensen, among others), which fetishizes certain statistical methods, especially the data-reduction tool called "factor analysis". Starting with measurements of different variables which are correlated with one another, factor analyses mathematically construct new, unobserved variables, the "factors", which can reproduce the observed correlations. Specifically, the model supposes that the observed variables are directly correlated solely with the factors, and only indirectly correlated with each other. If this works, one can reduce many measured values to estimates of a few factors, without losing information about the correlations.
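To make the model's key assumption concrete: if each observed score is a weighted copy of a single factor plus independent noise, then the correlation between any two scores is just the product of their weights, or "loadings". The Python sketch below simulates data from such a one-factor model, with made-up loadings and a standard-normal factor and noise (all assumptions of the illustration), and checks that the sample correlations come out close to those products. It simulates the model rather than fitting it; estimating loadings from real data is the job of factor-analysis software.

```python
import math
import random


def simulate_one_factor(loadings, n, seed=0):
    """Draw n observations from a one-factor model:
    x_j = loading_j * f + sqrt(1 - loading_j**2) * e_j,
    with f and every e_j independent standard normals, so each x_j has variance 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)  # the unobserved factor, shared by all variables
        row = [l * f + math.sqrt(1.0 - l * l) * rng.gauss(0.0, 1.0) for l in loadings]
        data.append(row)
    return data


def correlation(xs, ys):
    """Pearson sample correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


loadings = [0.8, 0.6, 0.5]  # hypothetical loadings
data = simulate_one_factor(loadings, n=20000)
columns = list(zip(*data))
for j in range(len(loadings)):
    for k in range(j + 1, len(loadings)):
        observed = correlation(columns[j], columns[k])
        implied = loadings[j] * loadings[k]  # correlation implied by the model
        print(j, k, round(observed, 3), implied)
```

All of the correlation between the observed columns runs through the shared `f`, which is exactly the "only indirectly correlated with each other" assumption described above.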

Looking at the components of an IQ test (arithmetic, vocabulary, general information, analogies, Raven's, etc.), one finds that they are all positively correlated: those who do well on one tend to do well on the others. The usual factor-analytic methods then produce a "general factor", or g, with which each sub-test is more or less positively correlated. To simplify slightly but not unfairly, in current practice what makes something an IQ test is that it correlates sufficiently strongly with things which are already accepted as IQ tests and so with g, and what makes something a good IQ test question is that it correlates with other, accepted IQ test questions and with g. To correlate it has to vary, so "What is the capital of Argentina?" might work as an IQ item in North America or South Africa, but not very well in Argentina.

As data reduction, factor analysis is harmless, but there has always been a temptation to "reify" the factors, to suppose that factor analysis discovers the hidden causal structure which generates the observations. This is a temptation which many psychologists, especially IQ-testers, have failed to resist, or have even eagerly embraced. Flynn protests the "conceptual imperialism" of g. He correctly insists that factor analysis (and related techniques, like item response theory) at most finds patterns of correlation, and these arise from a complicated mixture of our current social arrangements and priorities and actual functional or causal relationships between mental abilities. Factor analysis is helpless to separate these components, and gives no reason to expect that "factor loadings" will persist. Indeed, the pattern of Flynn-effect gains on different types of IQ test is basically unrelated to the results of factor analysis.

But really the whole enterprise rests on circularities. It's mathematically necessary that any group of positively-correlated variables has a "positively loaded" general factor. (This follows from the Perron-Frobenius theorem of linear algebra.) A sub-test is "highly g loaded" if and only if it is comparatively strongly correlated with all the other tests; or, to adapt a slogan, positive correlation does not imply common causation. (Saying "Jack solved all the Raven's problems because he had high scores on many other tests which are positively correlated with scores on Raven's" is even more defective as an attempted explanation than attributing sleep to a dormitive power.) Since IQ test questions are selected to be positively correlated, the appearance of g in factor analyses just means that none of the calculations was botched. The only parts of the enterprise which are neither mathematical tautologies nor true by construction are the facts that (1) it is possible to assemble large batteries of positively-correlated questions, and (2) the test scores correlate with non-test variables, though more weakly than one is often led to believe. Flynn does not make this argument, and some of his remarks suggest he still attributes too much inferential power to factor analysis, though he correctly says that it has contributed little to our understanding of the brain or cognition.
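The Perron-Frobenius point can be checked numerically. The Python sketch below builds a correlation matrix from a toy "sampling" model (an assumption of this illustration, loosely after Godfrey Thomson's old sampling theory of abilities): each test adds up a random subset of many independent abilities, so there is no single common cause. It then finds the leading eigenvector by power iteration, starting from a vector of mixed signs, and uses the first principal component as a stand-in for g. Every loading comes out positive anyway, because every entry of the matrix is positive; the same argument applies to the reduced correlation matrix that factor analysis works with, which is also entrywise positive. All function names and parameter values are hypothetical.

```python
import math
import random


def sampling_model_correlations(n_tests, n_abilities, per_test, seed=0):
    """Correlation matrix for toy tests that each add up a random subset of
    independent, unit-variance 'abilities'. No single cause is shared by all tests.

    With per_test > n_abilities / 2, any two subsets must overlap, so every
    correlation is strictly positive.
    """
    rng = random.Random(seed)
    subsets = [set(rng.sample(range(n_abilities), per_test)) for _ in range(n_tests)]
    # For sums of independent unit-variance terms: corr = |a & b| / sqrt(|a| * |b|).
    return [[len(a & b) / math.sqrt(len(a) * len(b)) for b in subsets] for a in subsets]


def leading_eigenpair(matrix, iterations=1000, seed=1):
    """Power iteration from a random start vector with mixed signs."""
    rng = random.Random(seed)
    n = len(matrix)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvalue via the Rayleigh quotient (v has unit length).
    mv = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    # An eigenvector's overall sign is arbitrary; fix it so the entries sum to a positive number.
    if sum(v) < 0:
        v = [-x for x in v]
    return eigenvalue, v


R = sampling_model_correlations(n_tests=8, n_abilities=200, per_test=120)
lam, v = leading_eigenpair(R)
loadings = [math.sqrt(lam) * x for x in v]  # first-component "g loadings"
print(all(x > 0 for x in loadings))  # True, as Perron-Frobenius guarantees
print(round(lam / len(R), 2))        # share of total variance on the first component
```

The data-generating process here has two hundred independent causes and no general one, yet the arithmetic still hands back a positively loaded general factor.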

After a century of IQ testing, there is still no theory which says which questions belong on an intelligence test, just correlational analyses and tradition. This is no help in deciding whether IQ tests do measure intelligence, and so whether the Flynn effect means we are becoming smarter. If we accept Flynn's idea that intelligence is how well and how quickly we learn, an IQ test is an odd way to measure it. None of the tests, for instance, sets standardized learning tasks and measures the performance achieved within a fixed time. At best they gauge the success of past learning, which could indirectly measure how well and how quickly people learn if we presume that the test-takers had similar opportunities to learn the material they're being tested on. Even then it would be confounded with things like executive function and current and past motivation. For instance, in 1998 Lovaglia et al. (American Journal of Sociology 104: 195--228) did an experiment where they took groups of college students and spent fifteen minutes creating a situation in which either the right- or left-handed students could expect to be better rewarded for their efforts and abilities; the favored hand was randomly varied by the experimenters. This consistently made students in the favored group score about 7 IQ points higher on Raven's Matrices than those in the disfavored group. That is, a quarter of an hour of motivational priming can be worth a decade or more of the Flynn effect.

By now, the reader may be protesting that, after all, at least the more mathematical questions on IQ tests are objective. This mistakes the issue. If asked to continue the sequence "1, 1, 2, 3, 5", most readers would recognize the Fibonacci sequence and say "8". But there are infinitely many other sequences where the next number is 7 (e.g., pick the largest prime number less than or equal to the sum of the previous two numbers), or for that matter 11 (the smallest prime number greater than or equal to, etc.). Similarly, what Raven's matrices test is not how well you can "educe relations", but how well you can find the patterns Raven liked — personally, I can solve such puzzles only by guessing what was going through the test-maker's mind. In either case, to even begin to respond appropriately requires certain culturally-transmitted cognitive tools, and the motivation to use them on command.
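The three rules in the previous paragraph are easy to check by machine. The Python sketch below (all helper names are hypothetical) builds each sequence from the seed 1, 1, applying its rule to the sum of the last two terms. All three agree on 1, 1, 2, 3, 5 and part ways only at the sixth term.

```python
def is_prime(n):
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def largest_prime_at_most(n):
    """Largest prime <= n (assumes n >= 2)."""
    while not is_prime(n):
        n -= 1
    return n


def smallest_prime_at_least(n):
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


def continue_sequence(rule, length, start=(1, 1)):
    """Extend start by applying rule to the sum of the previous two terms."""
    seq = list(start)
    while len(seq) < length:
        seq.append(rule(seq[-2] + seq[-1]))
    return seq


fibonacci = continue_sequence(lambda s: s, 6)
prime_below = continue_sequence(largest_prime_at_most, 6)
prime_above = continue_sequence(smallest_prime_at_least, 6)
print(fibonacci)    # [1, 1, 2, 3, 5, 8]
print(prime_below)  # [1, 1, 2, 3, 5, 7]
print(prime_above)  # [1, 1, 2, 3, 5, 11]
```

Nothing in the first five terms singles out the Fibonacci rule; answering "8" reflects which rule one has been trained to reach for.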

This, and my re-phrasing of Flynn in terms of cultural bias and imperialism, may have given the wrong impression. (I admit to some deliberate provocation.) I am thoroughly committed to the kind of culture IQ tests favor, as I suspect are most of my readers, because that culture has much to recommend it. Knowing that rabbits and dogs are both mammals is a different kind of knowledge than knowing that you can use dogs to hunt rabbits, and our kind of knowledge grants both a deeper understanding of the world and (when embedded into a vast division of labor) greater power over the world. Progress of many kinds is difficult or impossible without scientific knowledge and the habits of abstract thought which go with it. Spreading this kind of thinking is a Good Thing, and worth a lot of effort. It's just that it's also true that thinking this way entails a specific kind of culture, and we do no one any favors by confusing this, our favorite use of the mind or exercise of intellect, with thinking or intelligence as such.

That mistake is particularly tempting because of how we use IQ tests. Up through the nineteenth century, intellectuals' feelings about the prospect of democracy mostly ranged from ambivalence to terror, even in France and the United States; the masses, they said, were incapable of thinking, and letting them rule, rather than be led, was full of peril. "Meritocracy" was a later compromise with democracy: there would still be elite rulers, but they could be recruited on the basis of objectively-assessed merit, rather than mere birth. (This ideal helped institutionalize IQ testing, including such modified IQ tests as the SAT.) What Flynn's arguments suggest is that these fears and hopes were at most half-right. The masses were, back in the day, mostly very bad at thinking like intellectuals; they were not bad at managing their own affairs. (The twentieth century was over-supplied with disasters, but few of them can be blamed on democratic decision-making, and plenty on the actions of elites.) Meritocracy, as Flynn says, is an incoherent ideal — even if we agreed on "merit", and allocated rewards on that basis once, the meritorious would use some of their resources to give their kith and kin more than those people merited. But spreading educational opportunities and opening up positions of influence to broader peaceful competition has been widely beneficial.

If Flynn is right, the issue of how many picture-puzzles different vintages of teenage Dutch boys could solve is actually a window through which we can see a momentous change, the "liberation from the concrete", not just among a few clerics and scribes, but as the common condition of humanity. This book has flaws, some of which I have indicated above, others of which I could expand upon (the self-indulgent sections on postmodernism and relativism; the weird naivete about people like Arthur Jensen and Charles Murray), but these are not that significant. It would almost be damning this book with faint praise to say it's a valuable addition to the IQ debate (though it is); it's an important take on what we have made of ourselves over the last few centuries, and might yet make of ourselves in the future.

xii + 216 pp., line drawings, bibliography, index (weak)

Cognitive Science / Education

Currently in print as a paperback, ISBN 9780521741477, US$18.99

With thanks to those who helped me shorten the original; and to Dave Bacon for typo-catching

27 April 2009