August 22, 2007

Making a Suitable Match

Last summer, I was conned into arranging happily volunteered to organize a small session on complex networks at the Joint Statistical Meeting, and naturally invited people whose only real connection was that they study networks and I found their work interesting. I am immensely pleased to be able to say that one outcome of this is a new joint paper between two of the speakers, who I dare say would not have collaborated if I hadn't brought them together in Seattle:

Jörg Reichardt and Douglas R. White, "Role models for complex networks", arxiv:0708.0958
Abstract: We present a framework for automatically decomposing ("block-modeling") the functional classes of agents within a complex network. These classes are represented by the nodes of an image graph ("block model") depicting the main patterns of connectivity and thus functional roles in the network. Using a first principles approach, we derive a measure for the fit of a network to any given image graph allowing objective hypothesis testing. From the properties of an optimal fit, we derive how to find the best fitting image graph directly from the network and present a criterion to avoid overfitting. The method can handle both two-mode and one-mode data, directed and undirected as well as weighted networks and allows for different types of links to be dealt with simultaneously. It is non-parametric and computationally efficient. The concepts of structural equivalence and modularity are found as special cases of our approach. We apply our method to the world trade network and analyze the roles individual countries play in the global economy.

Roughly speaking, the idea behind the (horribly-named) "generalized block models" of networks is that the nodes (people, countries, companies, proteins) can be broken up into a certain number of discrete classes, or blocks. All of the nodes in a given block have the same kind of pattern of connectivity, meaning that they connect to other nodes in other blocks, or within the block in the same kind of ways. Put like this, it sounds (like many network concepts) viciously self-referential, but then so does page-rank. Still, its resemblance to some aspects of structuralist anthropology is no accident, as many sorts of kinship-groups and castes are similarly defined by how they relate to other kinship-groups and castes, which in turn are defined by their relations. [Update: see below.] The conventional ethnographic approach to figuring out such groupings and their relationships might be caricatured as a combination of asking your native informants how they see the matter, combined with more or less inspired guess-work on the part of the ethnographer. This does not work so well when dealing with, say, protein networks, since it is hard to find a native informant among the proteins. Even with social networks, it is not at all obvious that the members understand all the different roles people play in the network (though they may think they do), so block-modelers generally rely less on self-presentations, and more on trial-and-error efforts to find blocks that fit the connectivity patterns.

Rather than floating off into the empyrean, what Jörg and Doug have done here is to develop an algorithm for automatically extracting a set of roles and their inter-relations from the observable ties in the network. While their idea of structurally equivalent roles comes from mathematical theories of social structure (see below), the algorithm grows out of earlier work by Jörg and Stefan Bornholdt on decomposing networks into nearly-independent communities or modules (cond-mat/0402349 and cond-mat/0603718), itself part of a recent burst of work on that problem by statistical physicists.

What Jörg and Stefan realized is that the problem of finding modules could be mapped on to a problem in statistical mechanics, called the Potts model. Imagine that each node can have one of a certain number of colors. When two nodes share the same color, they are in the same module or community. The Potts model also has nodes with multiple colors, which can interact in one of two ways. Two nodes can have an attractive ("ferromagnetic") interaction, which means it is energetically favorable for them to have the same color; or they can have a repulsive ("antiferromagnetic") interaction, which means it is energetically favorable for them to have different colors. The problem then is to find the assignment of colors to nodes which gives the best over-all value of the energy. This is hard to do exactly, because different interactions can push the same node in different directions (cf.), but one can quite rapidly find configurations which come close to the minimum, through the magic of simulated annealing. (If you play around with this Java applet, cooling it slowly, you get a bit of a sense of how that works.)

In the original modularity papers, what Jörg and Stefan did was introduce an attractive interaction between two nodes if they were more tightly linked to each other than average, and a repulsive interaction if they were less tightly linked. (See their papers for a more exact statement.) This let them very quickly discover significant modular structure in very large networks. It also nicely adapts to more information-theoretic notions of "connection".

In the new paper, they employ essentially the same idea. Given a "role model" of blocks and their connections, links, or lacks of links, which follow what one expects from the model are energetically rewards; those which deviate from their assigned roles are energetically dis-favored. Optimizing the total energy then gives the best-fitting assignment of nodes to roles. (Community discovery re-emerges as a special case, when each block ideally connects only to itself.) This still leaves the problem of discovering the roles and their relations from the data; here they use what I can only describe as a very clever trick, which I will not attempt to explain in this space.

As an illustration, they apply this new method to analyzing a part of the international flow of commodities. Amusing though it is to see patterns of combined and uneven development pop out of adjacency matrices, I keep wondering what could be done with this technique and good data on brains...

Incidentally, while Jörg is a statistical physicist, Doug has been studying social networks, and doing anthropological fieldwork, for longer than I've been alive; so I'd very much like to know whether their paper would be a black dot or a white one in the graph showing the near-disconnection of the two approaches to networks.

Update, that afternoon: Just to clarify, block modeling, and the notions of structural and regular equivalence underlying it, are autonomous developments of sociology and anthropology in the 1970s and 1980s, and they do in fact grow out of structuralist anthropology. In Doug's words (from an e-mail which he was kind enough to let me quote):

The first definition that I know of for structural equivalence, which is what we measure, was in an MA thesis by Fran&ccdeil;ois Lorrain, 1968, Ecole des Hautes Etudes en Sciences Sociales, Paris, under Guilbaud and Lévi Strauss, showing up an English version in Parts I and II, 1968 and 1969 MS, Harvard, "Tools for the Formal Study of Networks." and in Lorrain's Harvard PhD thesis (1969). From there it travelled to Lorrain and Harrison C. White's 1971 article while Lorrain was a graduate student at Harvard ["Structural Equivalence in Social Networks", Journal of Mathematical Sociology 1: 49--80], then to H. White et al under the name of blockmodeling... and on through a long series of sociological publications and those in related fields, only a few of which we cite in our paper. ** Lorrain was working on a more general algorithm of for "correlative classes" defined not in their own terms, or in terms of attributes, but in terms of relations that connect their elements, a concept found in Ossowski (1963, Class Structure in the Social Consciousness, Routledge and Kagan Paul). Now called "regular equivalence" as in my 1983 paper with Karl Reitz (a term linked to earlier work of Claude Flament), algorithms for computing regular equivalence (or "correlative") classes were also developed in the fields of logic and computer science (e.g.. by Higgens (1971), working with groupoids and category theory, similar to the work on Lorrain, and again by Stark (1972 in automata theory) under the name of dynamic logic (Harel et al. 2002). Mark and Masuch's (2003) article in Social Networks provides a review and further formalization.

Significantly, the weightings used in Reichardt and my Potts' method for finding role model structures in a network of multiple relations can be adjusted to identify regular equivalence of positions.

So, never let it be said that structuralist anthropology was completely barren of fruit; it's just the productive part had to do with social organization and not with culture — as Ernest Gellner used to say, who you can marry, as opposed to what to wear to the wedding.

Networks; Incestuous Amplification

Posted at August 22, 2007 12:10 | permanent link

Three-Toed Sloth