Notebooks

Random Boolean Networks, Nk Networks

Last update: 11 Dec 2024 09:54
First version: 21 June 2006; major expansion 25 November 2024

Attention conservation notice: Anything useful I know about this I learned from Bill Tozier, many years ago now, but he's not to blame for my misapprehensions and bad judgment calls.

Somewhat confusingly, Stuart Kauffman invented two models which are both called "the Kauffman's \( Nk \) model", because in both, there are $N$ genes, each of which interacts, somehow, with $k$ others.

In the older model, which is also referred to as "random Boolean networks", goes as follows. Time is discrete, and at each time $t$, each of the $N$ genes is either active/expressed (1) or inactive/un-expressed (0). The state of each gene at time $t+1$ is a function of the state of $k$ genes at time $t$, and those $k$ genes are chosen at random. This randomness is however "quenched", i.e., the random partners are chosen once and for all at time $t=0$ and then don't change. In symbols: \( X_i(t) \) is the state of gene $i$ at time $t$, and \( \mathbf{A} \) is an $N \times N$ matrix, where \( A_{ij} = 1 \) if gene $j$ is an input into the state of gene $i$. The matrix is random, subject only to the constraint that, for each $i$, \[ \sum_{j}{A_{ij}} = k \] Define $P(i)$ as the set of all $j$ where \( A_{ij} = 1 \), and \( X_{P(i)}(t)\) as the vector of their states at time $t$. Then the dynamics are given by \[ X_i(t+1) = F_i(X_{P(i)}(t)) \] where \( F_i \) is a binary-valued function of $k$ binary arguments, and the capital letter is an indication that this, too, is chosen randomly at the beginning of time.

To appreciate what's going on here, a little history may help. Kauffman first published this model in 1969. Just a few years before, the Nobel Prize in physiology had gone to Jacob, Lwoff and Monod for (basically) establishing how gene expression is regulated --- how some genes work to inhibit or allow the expression of other genes. In the specific model system Jacob and Monod worked with, expression was indeed pretty much binary, and the expression of one gene was controlled by that of a few other genes, and by environmental conditions; you could write it as a small Boolean expression. (The point of the circuit is to make the enzyme for eating lactose only when lactose is present, and no better food is available.)

Kauffman's model kept the binary states, and the small Boolean expressions, but dropped the adaptation-to-the-environment part. * The model then asks what happens when you have a lot of such overlapping and interacting circuits --- what patterns do they produce? (For instance, it's easy to see that any state must either be part of a periodic cycle, or be a transient, never-revisited state which eventually feeds into a cycle. [The cycle might have period 1, i.e., be a fixed point.] How many cycles are there? What's the distribution of periodicities? How stable are they under small perturbations?) Phenomena which appear with high probability in these random Boolean networks are in some sense easy for Boolean networks to exhibit. Said differently, if your favorite Boolean network doesn't display those phenomena, it is weird and rare and probably more or less engineered / selected to avoid them.

(You can at this point begin introducing all sorts of variations: is \( X_i(t) \) always an input into \( X_{i}(t+1) \), or never, or only with probability $1/N$? [Jacob and Monod's circuit wasn't self-referential this way, but other ] Is updating synchronous, as I've implicitly assumed, or asynchronous, and what kind of asynchrony? Is the updating deterministic, as I've assumed above, or just probabilistic, and if so with what probabilities? Etc., etc.)

The other Kauffman \( Nk \) model, more properly the Kauffman-Levin model, is about fitness landscapes, in the sense of evolutionary genetics. Again, each gene is one of two states, say \( g_i \) for gene $i$, but now those are two genetic variants, two "alleles". Each gene interacts with $k$ other genes to make an additive contribution to the fitness of that genotype: \[ \Phi(g) = \sum_{i}{\phi_i(g_i, g_{P(i)})} \] where the values of the functions $\phi_i$ are drawn iidly from some fixed distribution. The questions now concern stuff like "how many local maxima are there?", "how easy is it to escape a local maximum under different sorts of search/evolution?", "are the local maxima clustered in any interesting way in the search space?", etc., etc. But this is not a model of dynamics at all. (Though trying to optimize on one of these landscapes can introduce lots of complicated dynamics...)

*: The original paper presents this as temporary but harmless simplification:

Since the autonomous, undriven behavior of a system must be elucidated before the effect of exogeneous [sic] inputs can be understood, I have studied the behavior of switching nets free of external inputs. A bacterium in a constant environment undergoes autonomous changes in the concentrations of molecular species, and the sea urchin, in a similarly homogeneous surrounding, develops in an orderly sequence of states from its zygote. Since constant external input to a net is equivalent to a similar net held free of external input, stable oscillations of chemical species and cell differentiation seem to be largely autonomous behaviors of metabolic nets." [p. 440]
I do wonder if this was every really intended as a mere preliminary , for two reasons. One is that Kauffman's later thought amplified themes of second part of my quotation, very much emphasizing the internal dynamics of living things, to the point of neglecting adaptation to the environment. (Or so it seems to me; but he's a great biologist and I very much am not.) Second, it was the 1960s, structuralism was in the air, and self-referential dynamics of binary signs is about as structuralist as you can get. ^


Notebooks: