Notebooks (http://bactra.org/notebooks)
Graph Limits and Infinite Exchangeable Arrays
http://bactra.org/notebooks/2024/04/17#graph-limits
<P>An exchangeable random sequence \( X \) is a sequence of random variables,
\( X_1, X_2, \ldots \), whose distribution is invariant under permutation of
the indices. All such sequences are formed by taking mixtures of independent
and identically-distributed (IID) sequences.
(See <a href="exchangeable.html">Exchangeable Random Sequences</a>.) An
exchangeable random <em>array</em>, \( G \), is simply a matrix or array of
random variables \( G_{ij} \) whose distribution is invariant under permutation
of row and column indices (\( i \) and \( j \)). I mostly care about what are
sometimes called <strong>jointly exchangeable</strong> arrays, where the same
permutation is applied to the rows and the columns. If we can apply different
permutations to rows and to columns and still get invariance in distribution,
then the process is <strong>separately exchangeable</strong>; this, however, does not
interest me so much, for reasons which I hope will be clear in a moment.
<P>As I said, infinite-dimensional exchangeable distributions are all formed by
taking mixtures of certain basic or <strong>extremal</strong> distributions, which are
the infinite-dimensional IID distributions. To generate an exchangeable
sequence, one first randomly draws a probability law from some prior
distribution, and then draws from that law independently until the end of time.
(Again, see <a href="exchangeable.html">Exchangeable Random Sequences</a>.) Is
there an analogous set of extremal distributions for exchangeable arrays?
Well, yes, or else I wouldn't have asked the question...
<P>It's easiest to understand what's going on if we restrict ourselves to
binary arrays, so \( G_{ij} \) must be either 0 or 1. One very important
instance of this --- or at least one I use a lot --- makes \( G \)
the <strong>adjacency matrix</strong> (or "sociomatrix") of
a <a href="complex-networks.html">network</a>, with \( G_{ij} = 1 \) if there is
an edge linking \( i \) and \( j \), and \( G_{ij}=0 \) otherwise.
<P>For each \( i \), draw an independent random number \( U_i \) uniformly on the unit
interval. Now, separately, fix a function \( w(u,v) \) from the unit square
\( {[0,1]}^2 \) to the unit interval \( [0,1] \), with the symmetry \( w(u,v) = w(v,u) \).
Finally, set \( G_{ij} = 1 \) with probability \( w(U_i, U_j) \), independently
across <strong>dyads</strong> \( ij \). Conditional on the \( U_i \), all edges are
now independent (though not identically distributed). Moreover, \( G_{ij} \) and
\( G_{kl} \) are independent, unless the indices overlap. (However, \( G_{ij} \) and \( G_{kl} \) can be dependent given, say, \( G_{jk} \).) But edges with nodes
in common are not independent, nor are edges identically distributed, unless
the function \( w \) is constant almost everywhere. Call the resulting
stochastic graph \( G \) a \( w \)-random graph.
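This recipe is short enough to sketch in code; a minimal illustration (the function name is mine, not standard):

```python
import numpy as np

def w_random_graph(n, w, rng=None):
    """Generate the adjacency matrix of an n-node w-random graph:
    draw U_i ~ Uniform(0,1) IID, then include each edge {i,j}
    independently with probability w(U_i, U_j)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=n)                  # latent node variables U_i
    g = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):            # one coin flip per dyad, i < j
            g[i, j] = g[j, i] = rng.uniform() < w(u[i], u[j])
    return g

# Example: w(u,v) = u*v gives higher expected degree to nodes with large U_i
g = w_random_graph(200, lambda u, v: u * v, rng=42)
assert (g == g.T).all() and (np.diag(g) == 0).all()
```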
<P>(Using the unit interval for the \( U \) variables is inessential; if we have a
measurable mapping \( f \) from \( [0,1] \) to any other space, with a measurable
inverse \( f^{-1} \), then set \( V_i = f(U_i) \), and
\[
\Pr{\left(G_{ij} = 1\right)} = w^{\prime}(V_i, V_j) = w(f^{-1}(V_i),f^{-1}(V_j)) ~.
\]
So if you really want to make the variable for each node a 7-dimensional
Gaussian rather than a standard uniform, go ahead.)
<P>What are some examples of \( w \)-random graphs? Well, as I said, setting \( w \)
to a constant, say \( p \), does in fact force the edges to be IID, each edge being
present with probability \( p \), so the whole family of Erdos-Renyi random graphs,
i.e., random graphs in the strict sense, is included. Beyond this, a simple
possibility is to partition the unit interval into sub-intervals, and force \( w \)
to be constant on the rectangles we get by taking products of the
sub-intervals. This corresponds exactly to what the sociologists call
<a href="stochastic-block-models.html">"stochastic block models"</a>, where
each node belongs to a discrete type or <strong>block</strong> of nodes (=
sub-interval), and the probability of an edge between \( i \) and \( j \) depends only
on which blocks they are in. <a href="community-discovery.html">Community- or
module-discovery</a> in networks is mostly based on the assumption that not
only is there some underlying block model, but that the probability of an
intra-block connection is greater than that of an inter-block edge, no matter
the blocks; that is, \( w \) is peaked along the diagonal. Since every measurable
function can be approximated arbitrarily closely by piecewise-constant "simple
functions", one can in fact conclude that every \( w \)-random graph can be
approximated arbitrarily closely (in distribution) by a stochastic block model,
though it might need a truly huge number of blocks to get an adequate
approximation. This also gives an easy way to see that two different \( w \)
functions can give rise to the same distribution on graphs, so we'll ignore the
difference between \( w \) and \( w^{\prime} \) if \( w(u,v) = w^{\prime}(T(u), T(v)) \),
where \( T \) is an invertible map from \( [0,1] \) onto \( [0,1] \) that preserves the
length of intervals (i.e., preserves Lebesgue measure). The reason we ignore
this difference is that \( T \) just "relabels the nodes", without changing the
distribution of <em>graphs</em>.
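A piecewise-constant \( w \) of this block-model kind is easy to write down explicitly; a sketch, with hypothetical helper names of my own:

```python
import numpy as np

def block_w(breaks, probs):
    """Build a piecewise-constant w-function from interval breakpoints
    (partitioning [0,1] into blocks) and a symmetric matrix of
    block-to-block connection probabilities."""
    breaks = np.asarray(breaks)
    probs = np.asarray(probs)
    def w(u, v):
        # which sub-interval (block) each latent value falls in
        bu = np.searchsorted(breaks, u, side="right")
        bv = np.searchsorted(breaks, v, side="right")
        return probs[bu, bv]
    return w

# Two blocks, [0, 0.5) and [0.5, 1]: dense within blocks, sparse between,
# i.e. w peaked along the diagonal, as in community discovery
w = block_w([0.5], [[0.8, 0.05], [0.05, 0.8]])
assert w(0.1, 0.2) == 0.8 and w(0.1, 0.9) == 0.05
```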
<P>It's not hard to convince yourself that every \( w \)-random graph is
exchangeable. (Remember that we see only the edges \( G_{ij} \), and not the
node-specific random variables \( U_i \).) What is very hard to show, but is in
fact true, is that the distribution of every infinite exchangeable random graph
is a mixture of \( w \)-random graph distributions. Symbolically, the way to
produce an infinite exchangeable graph is always to go through the recipe
\[
\begin{eqnarray*}
W & \sim & p\\
U_i|W & \sim_{\mathrm{IID}} & \mathcal{U}(0,1)\\
G_{ij}| W, U_i, U_j &\sim & \mathrm{Bernoulli}(W(U_i,U_j))
\end{eqnarray*}
\]
for some prior distribution \( p \) over \( w \)-functions.
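For concreteness, the three-step recipe can be sampled directly; a toy sketch in which the prior \( p \) is discrete over a short list of \( w \)-functions (all names mine):

```python
import numpy as np

def exchangeable_graph(n, prior_ws, prior_probs, rng=None):
    """Sample an exchangeable graph via the three-step recipe:
    W ~ p (here p is a discrete prior over a list of w-functions),
    U_i ~ IID Uniform(0,1), G_ij ~ Bernoulli(W(U_i, U_j))."""
    rng = np.random.default_rng(rng)
    w = prior_ws[rng.choice(len(prior_ws), p=prior_probs)]  # W ~ p
    u = rng.uniform(size=n)                                 # U_i | W
    g = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):                           # G_ij | W, U_i, U_j
            g[i, j] = g[j, i] = rng.uniform() < w(u[i], u[j])
    return g

# p puts mass 1/2 each on two constant w's, i.e. a mixture of two
# Erdos-Renyi edge densities
g = exchangeable_graph(100, [lambda u, v: 0.1, lambda u, v: 0.5],
                       [0.5, 0.5], rng=0)
assert (g == g.T).all()
```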
<P>In the exchangeable-sequence case, if all we have is a single realization of
the process, we cannot learn anything about the prior distribution over IID
laws. (Similarly, if we have only a single realization of a stationary
process, we can only learn about the one ergodic component that realization
happens to be in, though in principle we can learn everything about it.) If we
have only a single network to learn from, then we cannot learn anything about
the prior distribution \( p \), but we can learn about the particular \( W \) that it
generated, and that will let us extrapolate to other, currently-unseen parts of
the network.
<P>Here is where a very interesting connection comes in to what at first sight
seems like a totally different set of ideas. Suppose I have a sequence of
graphs \( G^1, G^2, \ldots \), all of finite size. When can I say that this
sequence of graphs is converging to a limit, and what kind of object is its
limit?
<P>Experience with analysis tells us that we would like converging objects to
get more and more similar in their various properties, and one important set of
properties for graphs is the appearance of specific sub-graphs,
or <strong>motifs</strong>. For instance, when \( G_{ij} = G_{jk} = G_{ki} = 1 \),
we say that \( i,j,k \) form a triangle, and we are often interested in the
number of triangles in \( G \). More broadly, let \( H \) be some graph with fewer
nodes than \( G \), and define \( m(H,G) \) to be the number of ways of mapping \( H \)
into \( G \) --- picking out nodes in \( G \) and identifying them with nodes in \( H \)
such that whenever two nodes are linked in \( H \), their counterpart nodes in
\( G \) are linked as well. (In a phrase, the number of homomorphisms from \( H \) into \( G \).)
The maximum possible number of such mappings is limited by the number of nodes
in the two graphs. The <strong>density</strong> of \( H \) in \( G \) is
\[
t(H,G) \equiv \frac{m(H,G)}{{|G| \choose |H|}}
\]
If \( H \) has more nodes than \( G \), we define \( m(H,G) \) and \( t(H,G) \) to be 0.
(Actually, there are a couple of different choices for defining the allowed
mappings from \( G \) to \( H \), and so for the normalizing factor in the
denominator of \( t \), but these end up not making much difference.)
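A brute-force density calculation makes the definition concrete; this sketch uses injective maps and normalizes by their total count, which is one of the conventions just mentioned (function names are mine):

```python
import numpy as np
from itertools import permutations

def hom_density(h_edges, k, g):
    """Density t(H, G): the fraction of injective maps from the k nodes
    of H into G that send every edge of H to an edge of G."""
    n = len(g)
    hits = total = 0
    for nodes in permutations(range(n), k):   # all injective maps H -> G
        total += 1
        hits += all(g[nodes[a], nodes[b]] for a, b in h_edges)
    return hits / total

# Triangle density of the complete graph K_4 is 1: every triple of
# distinct nodes carries all three edges
k4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
triangle = [(0, 1), (1, 2), (2, 0)]
assert hom_density(triangle, 3, k4) == 1.0
```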
<P>We can now at last define convergence of a graph sequence: \( G^1, G^2, \ldots
\) converge when, for each motif \( H \), the density sequence \( t(H,G^1),
t(H,G^2), \ldots \) converges. There are several points to note about this
definition:
<ol>
<li> If, after a certain point \( n \), the graph sequence becomes constant, \( G^n
= G^{n+m} \) for all \( m \), then the sequence converges. This is a reasonable sanity-check on
our using the word "convergence" here.
<li> A sequence of isomorphic graphs (i.e., ones which are the same after some
re-labeling of the nodes) has already converged, since they all have the same
density for every motif. So the definition of convergence is insensitive to
isomorphisms. This is good, in a way, because isomorphic graphs really are the
same in a natural sense, but bad, because deciding whether two graphs are
isomorphic is <a href="http://en.wikipedia.org/wiki/Graph_isomorphism_problem">computationally non-trivial</a>, and may even be <a href="http://bactra.org/reviews/nature-of-computation.html">NP-complete</a>.
<li> If the sequence of graphs keeps growing, then convergence of the sequence
implies convergence not of the <em>number</em> of edges, triangles, four-stars,
etc., but of their suitably-normalized densities.
<li> The definition is strongly analogous to that of "convergence in
distribution" (a.k.a. "weak convergence") in probability theory. A sequence of
distributions \( P^1, P^2, \ldots \), converges if and only if, for every bounded
and continuous function \( f \), the sequence of expected values
\[
P^i f \equiv \int{f(x) dP^{i}(x)}
\]
converges. Densities of motifs act like bounded and continuous "test
functions".
<li> The limit of a sequence of graphs is not necessarily a graph.
Analogously, the limit of a sequence of discrete probability distributions,
like our empirical distribution at any \( n \), is not necessarily discrete —
it might be a distribution with a continuous density, a mixture of a continuous
and a discrete part, etc.
The people who developed the
theory of such graph limits called the limiting objects
<strong>graphons</strong>. Roughly speaking, graphons are to graphs as general
probability distributions are to discrete ones.
</ol>
<P>How are graphons represented, if they are not graphs? Well, they turn out
to be representable as symmetric functions from the unit square to the unit
interval, i.e., \( w \)-functions! It is easy to see how to turn any finite
graph's adjacency matrix into a \( 0-1 \)-valued \( w \)-function: divide the unit
interval into \( n \) equal segments, and make \( w \) 0 or 1 on each square depending
on whether the corresponding nodes had an edge or not. Call this \( w_G \). It
turns out, through an argument I do not feel up to even sketching today, that
the density \( t(H,G) \) can be expressed as an integral, depending on \( H \),
with respect to the \( w \)-function derived from \( G \):
\[
t(H,G) = \int_{[0,1]^{|H|}}{\prod_{(i,j)\in H}{w_{G}(u_i,u_j)} du_1 \ldots du_{|H|}}
\]
This carries over to the limit: if the sequence \( G^n \) converges, then
\[
\lim_{n\rightarrow\infty}{t(H,G^n)} = \int_{[0,1]^{|H|}}{\prod_{(i,j)\in H}{w(u_i,u_j)} du_1 \ldots du_{|H|}}
\]
for some limiting function \( w \). (If you are the kind of person who finds the
analogy to convergence in distribution helpful, you can fill in this part of
the analogy now.) We identify the limiting object, the graphon, with the
limiting \( w \)-function, or rather with the equivalence class of limiting
\( w \)-functions.
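The limiting integral can be checked numerically; a Monte Carlo sketch under my own naming (not a standard routine):

```python
import numpy as np

def graphon_density(w, h_edges, k, n_samples=200_000, rng=None):
    """Monte Carlo estimate of the integral above:
    t(H, w) = E[ prod over edges (i,j) of H of w(U_i, U_j) ],
    with U_1, ..., U_k IID Uniform(0,1)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=(n_samples, k))     # one row per Monte Carlo draw
    prod = np.ones(n_samples)
    for i, j in h_edges:
        prod *= w(u[:, i], u[:, j])
    return prod.mean()

# For w(u,v) = uv, the triangle density is E[(U1 U2 U3)^2] = (1/3)^3 = 1/27
triangle = [(0, 1), (1, 2), (2, 0)]
t = graphon_density(lambda u, v: u * v, triangle, 3, rng=0)
assert abs(t - 1 / 27) < 0.002
```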
<P>To sum up: If we start with an infinite exchangeable graph distribution,
then what gets realized comes from a (randomly-chosen) extremal distribution.
But the limits of sequences of graphs are, precisely, the extremal
distributions of the family of exchangeable graphs. So we would seem to have
the kind of nice, closed circle which makes statistical inference possible: a
sufficiently large realization becomes representative of the underlying
process, which lets us infer that process by examining the realization. What I
am <em>very much</em> interested in is how to actually use this suggestion to
do some concrete, non-parametric statistics for networks. In particular, it
would seem that understanding this would open the way to being able to smooth
networks and/or bootstrap them, and either one of those would make me very
happy.
<P>Specific points of interest:
<ol>
<li> Understand how to metrize graph convergence, and efficiently calculate
the metrics; use for tests of network difference.
<li> Suppose that the sequence of graphs \( G^n \) are <strong>sparse</strong>,
so that the number of edges per node grows less than proportionally to the number
of nodes. Then all motif densities tend to zero
and we lose the ability to distinguish between graph sequences. What is the
best way of defining convergence of sparse graphs? What does this do to the
probabilistic analogs of graphons? A huge literature has sprung up around this question (samples from it below).
<li> How does this relate to the issues of <a href="../weblog/837.html">projectibility</a>
for <a href="ergms.html">exponential-family random graph models</a>?
<li> Given a graph sequence, when can we consistently estimate the, or a,
limiting \( w \)-function? Bickel, Chen and Levina (below) define a set of
statistics whose expected values characterize the \( w \)-function and which can be
consistently estimated. This was extremely clever, but inverting the mapping
from \( w \) to those expectations looks totally intractable — and indeed
they don't even try. My own feeling is that this is more of a job for
smoothing than for the method of moments, but I'm not comfortable saying much
more, yet.
</ol>
<h4>An idea on sparsity I have failed, and am failing, to turn into something useful</h4>
\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \]
I said above that a graph sequence is "sparse" when the number of edges per
node doesn't grow linearly with the number of nodes. The alternative
is that a graph sequence is <strong>dense</strong>, so the number of edges
per node is proportional to the number of nodes. The troublesome point
is that sequences of graphs generated from the same \( w \)-function can't be
sparse, in this sense. To see this, pick your favorite node \( i \), of degree
\( D_i \). Then for each \( j \), \( \Expect{G_{ij} | U_i = u} = \int_{[0,1]}{w(u, v) dv} \),
and, by additivity of expectation,
\( \Expect{D_i | U_i=u} = (n-1) \int_{[0,1]}{w(u, v) dv} \) when there are \( n \)
nodes. But the latent variables \( U_i \) are IID, so
\( \frac{D_i}{n-1} \rightarrow \int_{0}^{1}{w(u, v) dv} \) almost surely,
conditional on \( U_i=u \), by the law of large numbers. Averaging over \( u \),
\( \Expect{\frac{D_i}{n-1}} \rightarrow \int_{[0,1]^2}{w(u,v) du dv} \) as well.
And, unless the \( w \) function is 0 almost everywhere, this limiting expected density is \( > 0 \). Thus the degree of a typical node will grow proportionately to the number
of nodes.
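This linear degree growth shows up immediately in simulation; a small illustration of my own (not from any cited source):

```python
import numpy as np

def mean_degree(n, w, rng):
    """Average degree in one realization of an n-node w-random graph."""
    u = rng.uniform(size=n)                      # latent U_i
    p = w(u[:, None], u[None, :])                # matrix of edge probabilities
    g = rng.uniform(size=(n, n)) < p
    g = np.triu(g, 1)                            # keep one coin flip per dyad
    g = g + g.T                                  # symmetrize, no self-loops
    return g.sum(axis=1).mean()

rng = np.random.default_rng(0)
w = lambda u, v: u * v                           # double integral of w is 1/4
degs = [mean_degree(n, w, rng) for n in (200, 400, 800)]
# mean degree scales like n * (1/4): the per-node ratio stabilizes near 0.25
ratios = [d / n for d, n in zip(degs, (200, 400, 800))]
assert all(0.15 < r < 0.35 for r in ratios)
```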
<P>Some people find this a disturbing prospect in an asymptotic framework for
network analysis. After all, if I look at larger and larger samples of a
collaboration network, it doesn't <em>seem</em> as though everyone's degree
should keep growing in proportion to the number of nodes --- that every
doubling in the number of scientists should on average double everyone's
degree. On Mondays, Wednesdays and Fridays I share this unease. On Tuesdays
and Thursdays, I remind myself that our data-collection processes bear little
resemblance to this story, and that anyway asymptotics is all about
approximation. After all, no <em>one</em> graph is "dense" or "sparse" in this
sense. (On the weekend I try not to think about the issue.)
<P>But, as I said above, the Aldous-Hoover theorem tells us that every
exchangeable random graph is either a \( w \)-random graph, or a mixture of
\( w \)-random graphs --- and the mixtures will still give us dense graph
sequences, so that's no escape. What this implies is that if you think
sparse graph sequences are a desirable property in a network model,
you need to abandon exchangeability. What we should use instead of
exchangeability is still very much an open question.
<P>Here is one idea which I have been toying with for a number of years,
without getting very far. I put it out here now in case anyone else can make
something of it; if you do, I'd appreciate an acknowledgment.
<P>In the ordinary time-series / random sequence world, a very natural symmetry
that's weaker than exchangeability (= invariance under permutation) is
stationarity (= invariance under translation). I <em>suspect</em> we may be
able to do something with stationary rather than exchangeable random graphs.
For a random sequence, we say that it's stationary if for every block length
\( k \) and translation \( h \), the sub-sequence \( (X_1, X_2, \ldots X_k) \) and the
sub-sequence \( (X_{h+1}, X_{h+2} \ldots X_{h+k}) \) have the same distribution.
(But this doesn't require that \( (X_1, X_2) \) and \( (X_2, X_1) \) have the same
distribution, which exchangeability does.) So we could say that a graph is
stationary, with respect to a certain ordering of the nodes, when, for every
\( k \) and \( h \), the sub-graph formed by nodes \( 1:k \) and that formed by nodes
\( (h+1):(h+k) \) are equal in distribution. This would preserve the notion that
(in probability) the graph "looks the same everywhere", without requiring
the extremely strong form of this that \( w \)-random graphs do.
<P>The program then would be to answer the following questions:
<ol>
<li> What are the extremal distributions with this symmetry like? (The extremal distributions for exchangeable sequences are IID sequences; for exchangeable random graphs, \( w \)-random graphs; for stationary sequences, stationary and ergodic sequences; etc.).
<li> With a characterization of the extremal distributions in hand, in what sense can sequences of individual graphs converge on those limits? (This would presumably be some sort of ergodic theorem.)
<li> <em>Can</em> distributions over graphs with this symmetry produce sparse graph sequences?
</ol>
<P>I will just say a little, here, about the third item. From the way I've set
up the symmetry in the distribution, the <em>expected</em> number of edges
within any group of \( k \) nodes that are contiguous in the given order has to be
the same --- we get the same expected number of edges among nodes 1--5 as among
6--10 as among 501017--501021. So that's a contribution to the expected number
of edges which is growing proportionately to the number of nodes. Moreover, by
considering contiguous groups of length \( 2k \), we see that the expected number
of edges between groups of length \( k \) is also going to grow proportionately to
\( n \). But there doesn't seem to be any reason why that number of between-group
edges couldn't be considerably less than the number within the groups. In
particular, it'd seem like we could have a lower and lower probability of edges
between nodes which are further and further apart in the ordering. So
I <em>think</em> it should be possible to get distributions over sparse
graph sequences which obey this symmetry.
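To make the possibility concrete, here is a toy model of my own devising (not a construction from the literature) with this translation symmetry: edges are independent with a probability that depends only on the distance between nodes in the ordering, so every contiguous block of \( k \) nodes has the same subgraph distribution, yet expected degrees stay bounded.

```python
import numpy as np

def banded_graph(n, f, rng=None):
    """Edges independent with P(G_ij = 1) = f(|i - j|).  Since edge
    probabilities depend only on distance in the ordering, the subgraph
    on nodes 1:k has the same distribution as that on (h+1):(h+k)."""
    rng = np.random.default_rng(rng)
    g = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            g[i, j] = g[j, i] = rng.uniform() < f(j - i)
    return g

# Edge probability decaying like 2^{-d}: the expected degree is bounded
# (about sum over d >= 1 of 2 * 2^{-d} = 2), no matter how large n gets,
# so these graph sequences are sparse
g = banded_graph(500, lambda d: 0.5 ** d, rng=1)
assert g.sum(axis=1).mean() < 5
```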
<P>There's also another possible notion of "stationarity", which would go as
follows. Pick our favorite node \( i \), and define its "radius 1" neighborhood as
the subgraph among \( i \) and its neighbors. We could say that the
distribution is radius-1 stationary if the distributions for the radius-1
neighborhood of any two nodes \( i \) and \( j \) are equal (up to isomorphism). We
define the radius-\( k \) neighborhood around \( i \) recursively, as the subgraph of
all nodes whose distance to \( i \) is \( \leq k \), and similarly stationarity out to
radius \( k \), and finally over-all stationarity as stationarity out to
arbitrarily large radii. (This is a little bit more like how we define
stationarity for <a href="random-fields.html">random fields</a>.) I find this
notion a bit less satisfying, because it seems more dependent on the
randomly-generated graph, but on the other hand my first notion invoked an
ordering of the nodes pulled from (to be polite) the air.
<P>Finally, I should add that my own contribution to the sparse-graph-models
literature, with Neil Spencer, doesn't invoke either of these notions of
symmetry --- we <em>started</em> with a latent-space generative model
and showed it had good properties. (It's stationarity in the latent space.)
Tackling the issue from the side of the symmetry is, as I said, something I've
played around with for some years, but haven't made much headway with, hence
this addition to this notebook.
<ul>See also:
<li><a href="mixtures-of-processes.html">Characterizing Mixtures of Processes by Summarizing Statistics</a>
<li><a href="exchangeable.html">Exchangeable Random Sequences</a>
<li><a href="graph-theory.html">Graph Theory</a>
</ul>
<ul>Recommended, big picture:
<li>Christian Borgs, Jennifer Chayes, László Lovász, Vera T. Sós, Balázs Szegedy and Katalin Vesztergombi,
"Graph Limits and Parameter Testing", <cite>Proceedings of the 38th Annual {ACM} Symposium on the Theory of Computing [STOC 2006]</cite>, pp. 261--270
[<a href="http://research.microsoft.com/en-us/um/people/jchayes/Papers/TestStoc.pdf">PDF reprint</a> via Dr. Chayes]
<li>Persi Diaconis and Svante Janson, "Graph Limits and Exchangeable Random Graphs", <cite>Rendiconti di Matematica e delle sue Applicazioni</cite>
<strong>28</strong> (2008): 33--61, <a href="http://arxiv.org/abs/0712.2749">arxiv:0712.2749</a>
<li>Olav Kallenberg, <cite>Probabilistic Symmetries and Invariance
Principles</cite> [Chapter 7 has the best treatment of exchangeable arrays I've
seen. The key results are due to Aldous and Hoover in the early 1980s, but
their proofs are notoriously hard, and Kallenberg provided the first "natural",
probabilistic proofs.]
<li>László Lovász
<ul>
<li>"Very large graphs", <a href="http://arxiv.org/abs/0902.0132">arxiv:0902.0132</a>
<li><cite><a href="http://bactra.org/weblog/algae-2013-07.html#lovasz">Large Networks and Graph Limits</a></cite>
</ul>
<li>Steffen L. Lauritzen
<ul>
<li>"Exchangeable Rasch Matrices", <cite>Rendiconti di Matematica e delle sue Applicazioni</cite> <strong>28</strong> (2008): 83--95
[<a href="http://www.stats.ox.ac.uk/~steffen/papers/rendiconti.pdf">PDF reprint</a> via Prof. Lauritzen]
<li>"Exchangeable Matrices and Random Networks",
[<a href="http://www.stats.ox.ac.uk/~steffen/teaching/grad/arrays.pdf">PDF slides</a>; earlier lectures (<a href="http://www.stats.ox.ac.uk/~steffen/teaching/grad/definetti.pdf">1</a>, <a href="http://www.stats.ox.ac.uk/~steffen/teaching/grad/partial.pdf">2</a>) probably useful for context]
</ul>
<li>Patrick J. Wolfe, Sofia C. Olhede, "Nonparametric graphon estimation", <a href="http://arxiv.org/abs/1309.5936">arxiv:1309.5936</a>
</ul>
<ul>Recommended, close-ups, the general theory of graph limits and exchangeable random graphs:
<li>David J. Aldous, "Representations for partially exchangeable arrays of random variables", <a href="https://doi.org/10.1016/0047-259X(81)90099-3"><cite>Journal of Multivariate Analysis</cite> <strong>11</strong> (1981): 581--598</a>
<li>Christian Borgs, Jennifer Chayes and László Lovász, "Moments of Two-Variable Functions and the Uniqueness of Graph Limits",
<a href="http://dx.doi.org/10.1007/s00039-010-0044-0">Geometric and Functional Analysis</cite> <strong>19</strong> (2010): 1597--1619</a> [<a href="http://research.microsoft.com/pubs/101953/4-Unique-GAFA.pdf">PDF preprint</a>]
<li>Christian Borgs, Jennifer Chayes, László Lovász, Vera T. Sós and Katalin Vesztergombi, "Convergent Sequences of Dense Graphs I: Subgraph Frequencies, Metric Properties and Testing", <cite>Advances in Mathematics</cite> <strong>219</strong> (2008): 1801--1851 [<a href="http://research.microsoft.com/en-us/um/people/jchayes/Papers/ConvMetric.pdf">PDF reprint</a> via Dr. Chayes]
<li>Christian Borgs, Jennifer Chayes, László Lovász, Vera T. Sós and Katalin Vesztergombi, "Convergent Sequences of
Dense Graphs II: Multiway Cuts and Statistical Physics" [<a href="http://research.microsoft.com/en-us/um/people/borgs/Papers/ConRight.pdf">PDF preprint</a> via Dr. Borgs]
<li>Olav Kallenberg, "On the representation theorem for exchangeable arrays",
<a href="https://doi.org/10.1016/0047-259X(89)90092-4"><cite>Journal of Multivariate Analysis</cite> <strong>30</strong> (1989): 137--154</a>
<li>Steffen Lauritzen, "Harmonic Analysis of Symmetric Random Graphs", <a href="http://arxiv.org/abs/1908.06456">arxiv:1908.06456</a> [This is an alternative way of getting to graphons and graph limits, by exploiting a correspondence between exchangeable distributions and "characters" on Abelian semi-groups, i.e., functions which act like exponentials. From this point of view,
graphons are a more natural (generalized) <a href="exponential-family.html">exponential family</a> for networks than are <a href="ergms.html">exponential-family random graphs</a>. This relates to work Prof. Lauritzen has done on statistical sufficiency and generalized exponential families in other areas of statistics, linked to below.]
<li>László Lovász, Balázs Szegedy, "Limits of dense graph
sequences", <a href="http://arxiv.org/abs/math/0408173">arxiv:math/0408173</a>
[The original graph-limits paper. Note especially theorem 2.5, which shows
that the probability of \( t(H,G^n) \) being very different from the limiting
value is exponentially small in \( n \).]
</ul>
<ul>Recommended, close-ups, graphon estimation:
<li>Edoardo M. Airoldi, Thiago B. Costa, Stanley H. Chan, "Stochastic blockmodel approximation of a graphon: Theory and consistent estimation", <a href="http://arxiv.org/abs/1311.1731">arxiv:1311.1731</a>
<li>Peter J. Bickel, Aiyou Chen, and Elizaveta Levina, "The method of moments and degree distributions for network models", <a href="http://projecteuclid.org/euclid.aos/1321020525"><cite>Annals
of Statistics</cite> <strong>39</strong> (2011): 38--59</a>, <a href="http://arxiv.org/abs/1202.5101">arxiv:1202.5101</a>
<li>Stanley H. Chan, Edoardo M. Airoldi, "A Consistent Histogram Estimator for Exchangeable Graph Models", <a href="http://arxiv.org/abs/1402.1888">arxiv:1402.1888</a>
<li>Sourav Chatterjee, "Matrix estimation by Universal Singular Value Thresholding", <a href="http://arxiv.org/abs/1212.1247">arxiv:1212.1247</a>
<li>David S. Choi, Patrick J. Wolfe, "Co-clustering separately exchangeable network data", <a href="http://arxiv.org/abs/1212.4093">arxiv:1212.4093</a>
<li>Olav Kallenberg, "Multivariate Sampling and the Estimation Problem for Exchangeable Arrays", <a href="http://dx.doi.org/10.1023/A:1021692202530"><cite>Journal of Theoretical Probability</cite> <strong>12</strong> (1999): 859--883</a>
<li>James Robert Lloyd, Peter Orbanz, Zoubin Ghahramani and Daniel M. Roy, "Random function priors for exchangeable arrays with applications to graphs and relational data", <a href="http://books.nips.cc/papers/files/nips25/NIPS2012_0487.pdf">NIPS 2012</a>
<li>M. E. J. Newman and Tiago P. Peixoto, "Generalized communities in networks", <a href="http://dx.doi.org/10.1103/PhysRevLett.115.088701"><cite>Physical Review Letters</cite> <strong>115</strong> (2015): 088701</a>, <a href="http://arxiv.org/abs/1505.07478">arxiv:1505.07478</a>
</ul>
<ul>Recommended, close-ups, the issue of sparsity:
<li>Christian Borgs, Jennifer T. Chayes, Henry Cohn, and Yufei
Zhao, "An \( L^p \) Theory of Sparse Graph Convergence I: Limits, Sparse Random Graph Models, and Power Law Distributions", <a href="http://arxiv.org/abs/1401.2906">arxiv:1401.2906</a>
<li>Christian Borgs, Jennifer Chayes and David Gamarnik, "Convergent sequences of sparse graphs: A large deviations approach", <a href="http://arxiv.org/abs/1302.4615">arxiv:1302.4615</a> [Defining the limit of a sequence of sparse
graphs in terms of <a href="large-deviations.html">large deviations</a> of random measures on them]
<li>Francois Caron, Emily B. Fox, "Bayesian nonparametric models of sparse and exchangeable random graphs", <a href="http://arxiv.org/abs/1401.1137">arxiv:1401.1137</a>
<li>David Gamarnik, "Right-convergence of sparse random graphs", <a href="http://arxiv.org/abs/1202.3123">arxiv:1202.3123</a>
</ul>
<ul>Recommended, close-ups, tangents touched on above:
<li>J. F. C. Kingman, "Uses of Exchangeability",
<a href="https://doi.org/10.1214/aop/1176995566"><cite>Annals of Probability</cite> <strong>6</strong> (1978): 183--197</a>
<li>Steffen L. Lauritzen
<ul>
<li>"Extreme Point Models in Statistics" (with discussion), <cite>Scandinavian Journal of Statistics</cite> <strong>11</strong> (1984): 65--91
[<a href="http://www.jstor.org/pss/4615945">JSTOR</a>]
<li><cite>Extremal Families and Systems of Sufficient Statistics</cite> [<a href="../weblog/algae-2010-09.html#lauritzen">Mini-review</a>]
</ul>
</ul>
<ul>Recommended, close-ups, not otherwise classified but no less valuable on that account:
<li>Peter J. Bickel and Aiyou Chen, "A nonparametric view of network
models and Newman-Girvan and other
modularities", <a href="http://dx.doi.org/10.1073/pnas.0907096106"><cite>Proceedings
of the National Academy of Sciences</cite> (USA) <strong>106</strong> (2009):
21068--21073</a> [This is the paper which introduced me, and many others in the
network area, to the possibility of using graph-limit and exchangeable-array
theory, but in retrospect it is by no means an easy read.]
<li>Sourav Chatterjee, Persi Diaconis and Allan Sly,
"Random graphs with a given degree sequence", <a href="http://projecteuclid.org/euclid.aoap/1312818840"><cite>Annals of Applied Probability</cite> <strong>21</strong> (2011): 1400--1435</a>, <a href="http://arxiv.org/abs/1005.1136">arxiv:1005.1136</a> [Interesting application of the new technology
of graph limits to a classic model. May not be terribly practical yet but definitely promising.]
<li>Sourav Chatterjee and S. R. S. Varadhan, "The large deviation principle for the Erdos-Renyi random graph", <a href="http://arxiv.org/abs/1008.1946">arxiv:1008.1946</a> [Ditto]
</ul>
<ul>Pride compels me to recommend:
<li>Lawrence Wang, <cite><a href="https://www.dropbox.com/s/czp4r8g3s89s3c8/lw_thesis.pdf?dl=0">Network Comparisons using Sample Splitting</a></cite> [Ph.D. thesis, CMU Department of Statistics, 2016]
</ul>
<ul>Modesty forbids me to recommend:
<li>Alden Green and CRS, "Bootstrapping Exchangeable Random Graphs",
<a href="https://10.1214/21-EJS1896"><cite>Electronic Journal of Statistics</cite> <strong>16</strong> (2022): 1058--1095</a>, <a href="https://arxiv.org/abs/1711.00813">arxiv:1711.00813</a>
<li>CRS, <a href="http://stat.cmu.edu/~cshalizi/networks/16-2/">36-781, Advanced Statistical Network Models</a>, fall 2016
<li>Neil Spencer and CRS, "Projective, Sparse, and Learnable Latent Position Network Models", <a href="https://doi.org/10.1214/23-AOS2340"><cite>Annals of Statistics</cite> <strong>51</strong> (2023): 2506--2525</a>, <a href="http://arxiv.org/abs/1709.09702">arxiv:1709.09702</a>
</ul>
<ul>To read:
<li>Miklós Abért, Tamás Hubai, "Benjamini-Schramm convergence and the distribution of chromatic roots for sparse graphs", <a href="http://arxiv.org/abs/1201.3861">arxiv:1201.3861</a>
<li>David Aldous, Russell Lyons, "Processes on Unimodular Random Networks", <a href="http://arxiv.org/abs/math/0603062">arxiv:math/0603062</a>
<li>Tim Austin, Dmitry Panchenko, "A hierarchical version of the de Finetti and Aldous-Hoover representations", <a href="http://arxiv.org/abs/1301.1259">arxiv:1301.1259</a>
<li>Itai Benjamini, Russell Lyons, Oded Schramm, "Unimodular Random Trees", <a href="http://arxiv.org/abs/1207.1752">arxiv:1207.1752</a>
<li>Béla Bollobás, Svante Janson and Oliver Riordan,
"The Phase Transition in Inhomogeneous Random Graphs"
<li>Béla Bollobás and Oliver Riordan, "Sparse
graphs: metrics and random models", <a href="http://arxiv.org/abs/0812.2656">arxiv:0812.2656</a>
<li>Marián Boguñá and Romualdo Pastor-Satorras,
"Class of correlated random networks with hidden variables",
<a href="http://dx.doi.org/10.1103/PhysRevE.68.036112"><cite>Physical Review E</cite> <strong>68</strong> (2003): 036112</a>, <a href="http://arxiv.org/abs/cond-mat/0306072">arxiv:cond-mat/0306072</a>
<li>Christian Borgs, Jennifer T. Chayes, Souvik Dhara, Subhabrata Sen, "Limits of Sparse Configuration Models and Beyond: Graphexes and Multi-Graphexes", <a href="http://arxiv.org/abs/1907.01605">arxiv:1907.01605</a>
<li>Christian Borgs, Jennifer T. Chayes, Henry Cohn, Shirshendu Ganguly, "Consistent nonparametric estimation for heavy-tailed sparse graphs", <a href="http://arxiv.org/abs/1508.06675">arxiv:1508.06675</a>
<li>Christian Borgs, Jennifer T. Chayes, Henry Cohn, László Miklós Lovász, "Identifiability for graphexes and the weak kernel metric", <a href="http://arxiv.org/abs/1804.03277">arxiv:1804.03277</a>
<li>Fan Chung, "From quasirandom graphs to graph limits and graphlets",
<a href="http://arxiv.org/abs/1203.2269">arxiv:1203.2269</a>
<li>Harry Crane, "Infinitely exchangeable random graphs generated from a Poisson point process on monotone sets and applications to cluster analysis for networks", <a href="http://arxiv.org/abs/1110.4088">arxiv:1110.4088</a>
<li>Persi Diaconis, Susan Holmes and Svante Janson, "Threshold
Graph Limits and Random Threshold Graphs", <a href="http://arxiv.org/abs/0908.2448">arxiv:0908.2448</a>
<li>Mahya Ghandehari, Teddy Mishura, "Robust recovery of Robinson Lp-graphons", <a href="http://arxiv.org/abs/2303.16598">arxiv:2303.16598</a> [We may have been scooped...]
<li>Rajat Subhra Hazra, Frank den Hollander, Maarten Markering, "Large deviation principle for the norm of the Laplacian matrix of inhomogeneous Erdos-Renyi random graphs", <a href="http://arxiv.org/abs/2307.02324">arxiv:2307.02324</a>
<li>Tue Herlau, Mikkel N. Schmidt, Morten Morup, "Completely random measures for modelling block-structured networks", <a href="http://arxiv.org/abs/1507.02925">arxiv:1507.02925</a>
<li>Brian Karrer, M. E. J. Newman, "Random graphs containing arbitrary distributions of subgraphs", <a href="http://arxiv.org/abs/1005.1659">arxiv:1005.1659</a> [Not sure if this really connects or not...]
<li>P. Latouche, S. Robin, "Bayesian Model Averaging of Stochastic Block Models to Estimate the Graphon Function and Motif Frequencies in a W-graph Model", <a href="http://arxiv.org/abs/1310.6150">arxiv:1310.6150</a>
<li>Tâm Le Minh, Sophie Donnet, François Massol, Stéphane Robin, "Hoeffding-type decomposition for U-statistics on bipartite networks", <a href="http://arxiv.org/abs/2308.14518">arxiv:2308.14518</a>
<li>A. Martina Neuman, Jason J. Bramburger, "Transferability of Graph Neural Networks using Graphon and Sampling Theories", <a href="http://arxiv.org/abs/2307.13206">arxiv:2307.13206</a>
<li>Terence Tao, "A correspondence principle between (hyper)graph theory and probability theory, and the (hyper)graph removal lemma", <a href="http://arxiv.org/abs/math/0602037">arxiv:math/0602037</a>
<li>Johan Ugander, Lars Backstrom, Jon Kleinberg, "Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections", <a href="http://arxiv.org/abs/1304.1548">arxiv:1304.1548</a>
</ul>
<ul>To write:
<li>CRS + co-conspirators to be named later, "Detecting Differences in Network Structure"
<li>Co-conspirators to be named later + CRS, "Smoothing Networks"
</ul>