The Bactra Review: Occasional and eclectic book reviews by Cosma Shalizi 153

Networks, Crowds, and Markets

Reasoning about a Highly Connected World

by David Easley and Jon Kleinberg

Cambridge University Press, 2010

Connecting the Dots

This review ran in American Scientist 99 (2011): 335, with some links added.

This is one of the first textbooks on what could reasonably be called "network science" — the study of networks of semi-autonomous but interdependent units and of the way those networks shape both the behavior of individuals and the large-scale patterns that emerge from small-scale interactions. This is, of course, a very broad description, and it's not at all obvious that a single book should try to explain, within a common framework, information search on the Web, the spread of epidemic diseases, patterns of scientific collaboration, and much else besides. That these topics are grouped together not by rambling paranoiacs (who find connections everywhere), but by sober, mathematically minded scientists, employing a common and coherent set of concepts, testifies to a remarkable change in perception over the past few decades among scientists and the general educated public: We now see networks everywhere.

Studies dealing with what we now recognize to be social networks go back to the years around 1900, when political economists, social reformers and muckraking journalists began looking at interlocking directorates of corporate boards and other institutions through which the ruling classes (as they were then called) coordinated their actions without actually having an executive committee. People spoke of "social circles." By the 1950s, sociologists had a notion of "social networks", a concept that had a small band of enthusiastic devotees but was an esoteric idea even within mathematical social science. Even 25 years ago, the idea of networks as a form of social organization was reasonably avant-garde. (One can trace some of this evolution in Linton C. Freeman's The Development of Social Network Analysis, but a proper history of the "network" concept has not yet been written.)

Nowadays, companies whose sole and explicit purpose is the formalization of social networks have hundreds of millions of active customers. (Although they are not often seen this way, these firms are massive exercises in centrally planned social engineering, inspired by sociological theories.) These companies rely, of course, on the Internet, which has made itself an essential part of daily life throughout the civilized world. Even humble and sometimes ancient pieces of infrastructure like telephones, electrical power and roads are naturally called networks now (whereas we used to speak more often of, for instance, "the electrical grid"). The concept of the network as a form of organization that is neither a centralized hierarchy nor completely unstructured has become incredibly widespread.

So widespread, in fact, that it is already impossible to cover all aspects of network science within one set of covers. Some selection is necessary. Given the authors' backgrounds — Easley is a professor of economics and Kleinberg a professor of computer science, both at Cornell — it is fitting that their book focuses on social networks, especially those defined by transactions, and on computer networks, with a special devotion to places where these intersect (online auctions, for example).

The first few chapters cover basic ideas about networks and their structure. Fundamentally, networks are represented as graphs that consist of nodes, or vertices, some of which are connected by links called "edges". These edges may or may not have directions, or attributes like strength or weakness, but the fundamental assumption is that the important relations between individual nodes are binary ones, or ones that can be built up out of binary relations. (So an edge can represent affection, but not jealousy — at least not so easily.) Graph theory then provides a set of mathematical tools for reasoning about networks, and for articulating theories about them.

Two prominent ideas that receive considerable attention in the early chapters are "triadic closure", or "clustering", and "structural balance". To oversimplify, triadic closure occurs when two members of a network — Irene and Karl, say — who are both friends with a third node — Joey, say — become friends themselves. This is of course common in many social networks, but it is hardly mathematically necessary, and the presence of triadic closure tells us something about the processes generating the network. (For instance, in most but certainly not all milieus, Irene and Karl are especially unlikely to become romantically involved if they have both been romantically involved with Joey in the past.) Structural balance is the observation that a situation in which Joey is a friend of both Irene and Karl, but Irene and Karl can't stand each other, is an uncomfortable and thus unstable one — it is unbalanced. In contrast, balanced triads are ones in which everyone likes everyone else or in which two people (Irene and Karl, for instance) are driven together by their common loathing for a third party (Joey). Actual social networks tend to contain vastly more balanced than unbalanced triads and thus resolve into blocks in which everyone inside the block is friends with everyone else but dislikes everyone in all the other blocks.

After these opening chapters on network structure, the authors turn to a more detailed consideration of how to model interactions between individuals. The extensive sixth chapter, which takes up about 50 of the book's 700 pages, is devoted to classical game theory, built up from scratch. What game theorists somewhat disturbingly call "rationality" is assumed throughout — in other words, game players are assumed to be hedonistic yet infinitely calculating sociopaths endowed with supernatural computing abilities. Evolutionary game theory occupies the much shorter (18 pages) chapter 7. Rather than focusing on forward-looking, calculating agents who choose strategies, this chapter centers on the strategies themselves, which agents adopt in a backward-looking, adaptive fashion. That is, the agents do more of what seemed to work well for others in the past, and less of what didn't. So evolutionary game models require only agents that can learn through imitation — not the literally superhuman computational feats of classical game theory. Thus they constitute a substantial step toward realism. (Behavioral game theory, and models that actually match extensive experimental results, do not get even a look.) After the reader is armed with game theory, chapter 8 looks at traffic flow as a game (one in which players interact by crowding the roads) and chapters 9 through 12 examine network structure and strategic interaction in several settings: auctions, matching markets, trade with networks of intermediaries and bargaining games on networks. (The chapter on bargaining does discuss actual experimental findings — and the fact that they do not agree with the kind of models laid out in the text — but rather unsystematically.)

Next come three chapters on information networks and the World Wide Web. These deal in large part with structural considerations. The chapter on searching the Web for information is excellent; it has great explanations both of the "page rank" method that launched Google and of Kleinberg's rival system of link analysis using "hubs and authorities." Roughly speaking, the page rank of a webpage reflects how many links it has from pages that have high rank themselves. This sounds like a vicious circle, but in this case the circle can, so to speak, be shrunk to a unique fixed point, which is related to the trajectory of a random walk on the Web graph. The "hubs and authorities" system is similar but distinguishes between pages based on their quality as an authority (a source of information about a particular topic, such as Bessel functions) or as a hub (an aggregator of pointers to many sources of information, such as guides to mathematics). This approach is in many ways more mathematically and conceptually elegant than page rank, and it has never been clear to me (or, I think, to Kleinberg) why it didn't take off.

Chapters 16 through 18 look at patterns of interaction and their consequences. Chapter 16 considers the phenomenon of "herding", or "information cascades", in which people facing uncertain information take the behavior or actions of others as cues to what they should do themselves, thereby amplifying perhaps very small or even entirely random or misleading signals. The upshot of the chapter is to cast doubt not just on "the wisdom of crowds," but on the whole enterprise of trusting the opinions of others — a remarkable example of arguing-against-interest for any writer, let alone textbook authors. Chapter 17 considers "network externalities", situations in which the value of a technology or other choice depends on how many others are also using it or choosing it; this is, of course, why most of us have QWERTY keyboards, and for that matter why nearly every personal computer runs either the Windows operating system or some version of Unix. Naturally, this feeds directly off the idea of cascades and herding. Chapter 18 looks at power-law distributions and the "rich get richer" stochastic processes that can generate them, which can naturally arise from cascades and from network externalities.

Chapters 19 through 21 combine dynamical and structural considerations, looking at how different aspects of network architecture can affect cascades, the spread of epidemics and rumors, and information diffusion. These chapters provide some of the clearest expositions of theories about these topics that I have ever seen, and they offer compelling examples of the insight that comes from considering differentiated networks rather than homogeneous populations.

The final three chapters (22 through 24) are fairly conventional expositions of incomplete and asymmetric information in markets; social choice theory (building up to a proof of Arrow's impossibility theorem, which states that no voting system can provide all of a set of innocent-seeming desiderata); and the "tragedy of the commons," which is presented as an argument for assigning property rights in nearly everything.

As I said at the outset, this is a textbook, rather than a contribution to the research literature or a popularization. The reader is expected to be able to follow a basic numerical argument expressed in algebraic symbols but needs to know almost nothing more about mathematics and is guided skillfully through the elements of constructing a proof, and of multiplying a vector by a matrix; even that is a special topic. The book could be taught to first-year undergraduates or even bright high school students.

Despite its many virtues, Networks, Crowds, and Markets is not without flaws. As someone who comes to networks from a background in statistical physics, I would have liked to see more about models of network structure; as someone who is now a statistician, I would have liked to see anything at all about data analysis. But the book is aimed at readers who would lack the mathematical knowledge to appreciate those topics — which fortunately are now covered by Newman's Networks: An Introduction and Kolaczyk's Statistical Analysis of Network Data, respectively.

A more serious shortcoming, to my mind, is what I can only call a certain lack of concern with the realities behind the mathematics. Any mathematical model is a story about how a part of the world works, and so one wants to know whether the story gets things right; but Easley and Kleinberg have a curious air of not caring about this. I will give three examples, which deal with topics about which I happen to know a reasonable amount.

The statistical basis for many claimed power laws is shockingly bad, and many such claims are just wrong. (It doesn't help that Easley and Kleinberg endorse a popular, but inaccurate and obsolete, technique for estimating such laws.) The relevance of models that produce clear power laws is thus more than a little obscure.
The account of the tragedy of the commons in the last chapter largely follows the famous, and rhetorically quite compelling, essay of that title by Garrett Hardin. Unfortunately for Hardin, and for the authors of this book, but fortunately for humanity, it turns out that Hardin was pretty completely wrong about commons and the range of institutions that make them work — and these errors of description and explanation are not exactly obscure, because in 2009 Elinor Ostrom won the Nobel Prize in Economics for elucidating those mechanisms, which involve social networks quite intimately.
Thomas Schelling's model of residential segregation, discussed in chapter 4, contains profound insights about social self-organization and shows how aggregate outcomes can depart drastically from what anyone intends or wants. But Schelling knew better than to say that his model explained how racial segregation in the United States arose or is maintained. To see it presented as such an explanation to students too young and callow to know better — giving Chicago, of all cities, as an example — is a remarkable, but I hope inadvertent, piece of pedagogical malpractice.

The common thread in these examples, and others I could mention, is finding a model's story so elegant and compelling that actually checking it against data seems superfluous. (How could it be wrong?) As most of us have discovered to our cost, however, infatuation is a poor guide to truth, and this good book would have been even better had its authors paid more attention to the more complicated, but empirically better-grounded, accounts of such matters available within their own disciplines.

For this is a good book. Despite my carping, it is thorough, wide-ranging, clear, intellectually serious and often thought-provoking. It will introduce students to aspects of our social life that are either newly important or whose importance has only recently become clear, and it will provide them with tools for thought that will serve them well for a long time to come. I would have no hesitation about using it in my classroom — although I would argue with it.

xvi + 727 pp., numerous black-and-white figures, bibliography, index

Hardback, ISBN 978-0-521-19533-1, US$50
A next-to-final draft is free online from the authors

Computers and Computing / Economics / The Information Society / Probability and Statistics / Self-Organization / Sociology

8 May 2011, lightly edited 8 January 2012