Attention conservation notice: Puffery about a new manuscript on the statistical theory of some mathematical models of networks. In the staggeringly unlikely event this is actually of interest to you, why not check back later, and see if peer review has exposed it all as a tissue of fallacies?

A new paper, which I flatter myself is of some interest to those who care about network models, or exponential families, or especially about exponential families of network models:

- CRS and Alessandro Rinaldo, "Consistency under Sampling of Exponential Random Graph Models", arxiv:1111.3054 [As of 2012, forthcoming in Annals of Statistics]
*Abstract*: The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general ones about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.

*Obligatory Disclaimer*: Ale didn't approve this post.

This started because Ale and I shared an interest in
exponential family random graph models
(ERGMs), whose basic idea is sheer elegance in its simplicity. You want to
establish some distribution over graphs or networks; you decree some set of
functions of the graph to be the sufficient statistics; and then you make the
log probability of any given graph proportional to a weighted sum of these
statistics. The weights are the parameters, and this is an exponential
family^{1}. They inherit all of the wonderfully convenient
mathematical and statistical properties of exponential families in general,
e.g., finding the maximum likelihood estimator by equating
expected and observed values of the sufficient statistics. (This
is also the maximum entropy distribution, though I set little store by that.) They are also,
with judicious choices of the statistics, quite spiffy-looking network models.
This paper by Goodreau et
al., for instance, is exemplary in using them to investigate teenage
friendship networks and what they can tell us about general social mechanisms,
and deserves a post of its own. (Indeed, a half-written post sits in my drafts
folder.) This is probably the best class of statistical models of networks now
going, which I have happily taught and recommended to students, with a special
push for `statnet`.
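To make the definition concrete, here is a minimal brute-force sketch of an ERGM on tiny graphs, with edge and triangle counts as the sufficient statistics. The parameter values are hypothetical, and enumerating all graphs is feasible only for a handful of nodes; real fitting uses packages like `statnet`.

```python
# Brute-force ERGM: log-probability of a graph is proportional to a
# weighted sum of sufficient statistics (here: edge and triangle counts).
from itertools import combinations, product
from math import exp

def ergm_distribution(n, theta_edges, theta_triangles):
    """Return {graph: probability} over all simple graphs on n nodes."""
    pairs = list(combinations(range(n), 2))
    weights = {}
    for bits in product([0, 1], repeat=len(pairs)):
        g = frozenset(p for p, b in zip(pairs, bits) if b)
        n_tri = sum(1 for a, b, c in combinations(range(n), 3)
                    if {(a, b), (a, c), (b, c)} <= g)
        # unnormalized weight = exp(theta . T(g))
        weights[g] = exp(theta_edges * len(g) + theta_triangles * n_tri)
    z = sum(weights.values())  # normalizing constant (partition function)
    return {g: w / z for g, w in weights.items()}

# Hypothetical parameters: mild penalty on edges, mild bonus for triangles.
dist = ergm_distribution(4, -0.5, 0.2)
# Expected sufficient statistics; the MLE equates these to observed values.
mean_edges = sum(p * len(g) for g, p in dist.items())
```

The maximum likelihood estimator is the parameter vector at which `mean_edges` (and the analogous expected triangle count) match the statistics of the observed graph.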

What Ale and I wanted to do was to find conditions under which maximum
likelihood estimation would be consistent --- when we saw more and more data
from the same source, our estimates of the parameters would come closer and
closer to each other, and to the truth. The consistency of maximum likelihood
estimates for *independent* observations is classical, but networks,
of course, are full of *dependent* data. People have proved the
consistency of maximum likelihood for some kinds of models of time series and
of spatial data, but those proofs (at least the ones we know) mostly turned on
ordering or screening-off properties of time and space, lacking in arbitrary
graphs. Those which didn't turned on the "blocking" trick, where one argues
that widely-separated events are nearly independent, and so approximates the
dependent data by independent surrogates, plus weak corrections. This can work
with random fields *on* networks, as
in this
excellent paper by Xiang and Neville, but it doesn't seem to work for
models *of* networks, where distance itself is endogenous.

I remember very distinctly sitting in Ale's office on a sunny October
afternoon just over a year ago^{2}, trying to find some way of making
the blocking trick work, when it occurred to us that maybe the reason we
couldn't show that estimates converged as we got more and more data from the
same ERGM was that the very idea of "more and more data from the same ERGM" did
not, in general, make sense. *What* exactly prompted this thought I do
not recall, though I dare say the fact that we had both recently
read Lauritzen's book on
sufficiency, with its emphasis on repetitive and projective structures, had
something to do with it.

The basic point is this. Suppose we observe a social network among (say) a
sample of 500 students at a high school, but know there are 2000 students in
all. We might think that the whole network should be described by some ERGM or
other. How, however, are we to estimate it from the mere sample? Any graph
for the whole network implies a graph for the sampled 500 students, so the
toilsome and infeasible, but correct, approach would be to enumerate all
whole-network graphs compatible with the observed sample graph, and take the
likelihood to be the sum of their probabilities in the whole-network ERGM. (If
you do not strictly know how large the whole network is, then I believe you are
strictly out of luck.) This is not, of course, what people actually do.
Rather, guided by experience with problems of survey sampling, regression, time
series, etc., they have assumed that the *same* ERGM, with the same
sufficient statistics and the same parameter values, applies to both the whole
network and to the sample. They have assumed, in other words, that the ERGMs
form a projective family.
Once you recognize this, it turns out to be straightforward to show that
projectibility imposes very strong restrictions on the sufficient statistics
--- they have to obey a condition about how they "add up" across sub-graphs
which we called^{3} "having separable increments". This condition is
"physically" reasonable but not automatic, and I will not attempt to write it
out in HTML. (Read the paper!) Conversely, so long as the statistics have
such "separable increments", the exponential family is projectible. (Pinning
down the converse was the tricky bit.) Once we have this, conditions for
consistency of maximum likelihood turn out to be straightforward, as all the
stuff about projectibility implies the *change* to the statistics when
adding new data must be unpredictable from the old data. The sufficient
statistics themselves form a stochastic process with independent increments,
something for which there is a lot of convergence theory. (This
does *not* mean the data must be independent, as we show by example.)
All of these results prove to be perfectly general facts about exponential
families of dependent variables, with no special connection to networks.

The punch-line, though, is that the most commonly used specifications for ERGMs all include — for good reasons! — statistics which break projectibility. Models with "dyadic independence", including the models implicit or explicit in a lot of community discovery work, turn out to be spared. Anything more sophisticated, however, has got a very real, though admittedly somewhat subtle, mathematical pathology. Consistency of estimation doesn't even make sense, because there is no consistency under sampling.

We have some thoughts on where this leaves statistical models of networks, and especially about how to actually move forward constructively, but I will let you read about them in the paper.

*Update*, next day: fixed typos, clarified a sentence and added a
reference.

1: Or if, like me, you were brought up in statistical mechanics, a Boltzmann-Gibbs ensemble, with the statistics being the extensive thermodynamic variables (think "volume" or "number of oxygen molecules"), and the parameters their conjugate intensive variables (think "pressure" or "chemical potential of oxygen"). If this line of thought intrigues you, read Mandelbrot.

2: With merely a year between the idea and the submission, this project went forward with what is, for me, unseemly haste.

3: We couldn't find a name for the property the statistics needed to have, so we made one up. If you have encountered it before, please let me know.

Posted at November 14, 2011 22:00 | permanent link