June 16, 2009

Chaos, Complexity and Inference: 2009 Syllabus

See this earlier post or the course homepage for more. There should be an RSS feed for this page, so you can follow updates, which will generally be posted after lectures.

Final exam
take-home, due 12 May
Lecture 28, (Thursday, 30 April): Agents 4: A real-world example of agents on networks
Hedstrom and Aberg, "Quantitative research, agent-based modelling and theories of the social", ch. 6 (pp. 114--144) in Hedstrom, Dissecting the Social [PDF]; (*) Guttorp, ch. 4
Lecture 27, (Tuesday, 28 April): Networks 5: Community Discovery
No slides
Readings: Clauset, Moore and Newman, arxiv:0811.0484 (code); Girvan and Newman, arxiv:cond-mat/0112110; Hofman and Wiggins, arxiv:0709.3512; Newman, arxiv:physics/0605087; Reichardt and Bornholdt, arxiv:cond-mat/0603718;
Reichardt and White, arxiv:0708.0958 [discussion]
Lecture 26, (Thursday, 23 April): Complex networks 4: inference for network models
Reading: Hanneke and Xing, "Discrete Temporal Models of Social Networks" [PDF]; Handcock and Jones, "Likelihood-based inference for stochastic models of sexual network formation" [PDF]; Hunter, Goodreau and Handcock, "Goodness of Fit of Social Network Models" [PDF]; Middendorf, Ziv and Wiggins, "Inferring Network Mechanisms: The Drosophila melanogaster Protein Interaction Network", arxiv:q-bio/0408010; Newman, "Structure and Function", sections IV B and V; Newman, Strogatz and Watts, "Random graphs with arbitrary degree distributions and their applications", arxiv:cond-mat/0007235; Wiuf, Brameier, Hagberg and Stumpf, "A likelihood approach to analysis of network data", Proceedings of the National Academy of Sciences (USA) 103 (2006): 7566--7570 [discussion]
Lecture 25, (Tuesday, 21 April): Agents 3: social complexity
Reading: Flake, ch. 17; Miller and Page, chs. 10--11; Krugman, ch. 6; Salganik, Dodds and Watts, "Experimental study of inequality and unpredictability in an artificial cultural market", Science 311 (2006):854--856; (*) Skyrms and Pemantle, "A Dynamic Model of Social Network Formation", arxiv:math.PR/0404101
Lecture 24, (Thursday, 9 April): Complex networks 3: contagion on networks
Reading: Guttorp, sec. 2.11; Newman, "Structure and Function", sec. VIII; Davis, Trapman, Leirs, Begon and Heesterbeek, "The abundance threshold for plague as a critical percolation phenomenon", Nature 454 (2008): 634--637; Bell, Maiden, Munoz-Solomando and Reddy, "'Mind control experiences' on the Internet: Implications for the psychiatric diagnosis of delusions", Psychopathology 39 (2006): 87--91 [PDF];
Smith and Novella, "HIV Denial in the Internet Era", PLoS Medicine 4:8 (2007): e256; (*) Kenah and Robins, "Second look at the spread of epidemics on networks", arxiv:q-bio.QM/0610057
Lecture 23, (Tuesday, 7 April): Agents 2: collective phenomena and self-organization
Reading: Flake, ch. 16; Miller and Page, ch. 9; Krugman, introduction and ch. 1; Healy, Walking to School; Perlstein, The Meaning of Box 722
Lecture 22, (Thursday, 2 April): Agent-based models 1
Reading: Miller and Page, chs. 6 and 7; Flake, ch. 12
Lecture 21, (Tuesday, 30 March): Complex networks 2: growth models
Reading: Newman, "Structure and function", sec. VII
Lecture 20, (Thursday, 26 March): Complex networks 1: basics, network properties
Definition of networks and basic terms. Pictures, emphasizing sex and death. Bipartite networks and the Galois lattice construction. The small world problem. What makes a network complex? The Erdos-Renyi model: first properties and weaknesses.
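The Erdos-Renyi model's first properties are easy to check by simulation. A minimal sketch of mine (in Python, not the course's R): each of the n(n-1)/2 possible edges appears independently with probability p, so the mean degree should come out near (n-1)p.

```python
import random

def erdos_renyi(n, p, rng):
    """Sample G(n, p): each of the n(n-1)/2 possible edges is present
    independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

n, p = 500, 0.02
edges = erdos_renyi(n, p, random.Random(5))
mean_degree = 2 * len(edges) / n
print(mean_degree)   # should be close to (n - 1) * p = 9.98
```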
Watts, "The 'New' Science of Networks", Annual Review of Sociology 30 (2004): 243--270
Newman, "The Structure and Function of Complex Networks", SIAM Review 45 (2003): 167--256, arxiv:cond-mat/0303516 (through sec. VI, but skipping or skimming IV B and V)
Lecture 19, (Tuesday, 24 March): Inference from simulations 2: direct and indirect estimation
Slides and R
The method of simulated moments. Example with the logistic map. Issues with the MSM. The trade-off between tractable models and good models. The indirect inference trick: don't match moments, match a bad-but-tractable model. Examples of indirect inference. Convergence properties.
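The method of simulated moments can be sketched on the logistic-map example in a few lines. This is my own toy version, not the course's R code: pick the parameter whose simulated long-run mean and variance best match the "observed" ones, with a grid search standing in for proper optimization.

```python
def moments(r, x0=0.3, n=20_000, burn=1000):
    """Long-run mean and variance of the logistic map at parameter r."""
    x = x0
    for _ in range(burn):                  # discard the transient
        x = r * x * (1 - x)
    xs = []
    for _ in range(n):
        x = r * x * (1 - x)
        xs.append(x)
    m = sum(xs) / n
    v = sum((xi - m) ** 2 for xi in xs) / n
    return m, v

true_r = 3.9
data_m, data_v = moments(true_r, x0=0.6)   # pretend these are the data

def mismatch(r):
    m, v = moments(r)
    return (m - data_m) ** 2 + (v - data_v) ** 2

grid = [3.6 + 0.01 * i for i in range(41)]  # candidate r values
r_hat = min(grid, key=mismatch)
print(r_hat)   # should land near the true r = 3.9
```

Indirect inference replaces the moments here with the parameters of an auxiliary, tractable-but-wrong model fit to both the data and the simulations.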
Reading: Smith, "Indirect Inference"; Kendall et al., "Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses"; (*) Gourieroux, Monfort and Renault, "Indirect Inference" [JSTOR]
Lecture 18, (Thursday, 19 March): Inference from simulations 1
The virtues and limits of simulations, especially stochastic simulations. Active testing: deliberately trying to break your model to gain (or lose!) confidence in it. Bootstrapping once again.
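The bootstrap recalled here fits in a few lines; a minimal Python illustration of mine (the course materials use R), estimating the standard error of a sample mean by resampling with replacement.

```python
import random
import statistics

random.seed(11)
data = [random.gauss(10.0, 2.0) for _ in range(100)]

def bootstrap_se(data, stat, reps=2000, rng=random):
    """Nonparametric bootstrap: resample with replacement, recompute the
    statistic, and use the spread of the replicates as its standard error."""
    vals = [stat([rng.choice(data) for _ in data]) for _ in range(reps)]
    return statistics.stdev(vals)

se = bootstrap_se(data, statistics.mean)
print(se)   # should be near sigma / sqrt(n) = 0.2
```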
Reading: ch. 5 and appendix B of Miller and Page; Miller, "Active Nonlinear Tests (ANTs) of Complex Simulation Models"
Lecture 17, (Tuesday, 17 March): Inference in general: error statistics and severe testing
Errors and reliable inference. Statistics as principled argument: mostly arguments that certain errors are (or are not) absent or small, unless we're very unlucky. Kinds of errors in statistical inferences. Construction of reliable hypotheses: confidence sets, "probably approximately correct", more. Severe testing. Null hypotheses. Testing for mis-specification, especially by looking at residuals. The two chief world systems.
Reading: Mayo and Cox, "Frequentist statistics as a theory of inductive inference", arxiv:math.ST/0610846; Mayo and Spanos, "Methodology in Practice: Statistical Misspecification Testing" [PDF]; idem, "Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction" [journal]; Spanos, "Curve-Fitting, the Reliability of Inductive Inference and the Error-Statistical Approach" [journal]
Lecture 16, (Thursday, 5 March): Heavy-tailed distributions 4: Comparing models
Slides, R
Goodness-of-fit tests: Glivenko-Cantelli, "the fundamental theorem of statistics"; the Kolmogorov-Smirnov test; Monte Carlo version of the K-S test when parameters are estimated from data; general remarks on goodness-of-fit. Relative distribution methods for comparing distributions: their virtues; using relative distributions to check power laws; HTTP file size example. Vuong's likelihood-ratio test for model selection. The flight of the albatross. Zombie-hunting.
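The K-S statistic and its Monte Carlo p-value can be sketched directly. My own minimal version: the null here is fully specified (Uniform(0,1)); as the lecture stresses, if parameters had been estimated from the data they would have to be re-estimated inside every simulated replicate.

```python
import random

def ks_stat(xs, cdf):
    """Maximum gap between the empirical CDF of xs and a theoretical CDF."""
    xs = sorted(xs)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

def mc_pvalue(xs, cdf, sampler, reps=500):
    """Monte Carlo p-value: the fraction of simulated null datasets whose
    K-S statistic is at least as large as the observed one."""
    d_obs = ks_stat(xs, cdf)
    n = len(xs)
    hits = sum(ks_stat([sampler() for _ in range(n)], cdf) >= d_obs
               for _ in range(reps))
    return hits / reps

def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)

random.seed(3)
data = [random.random() for _ in range(200)]   # the null is actually true
p = mc_pvalue(data, uniform_cdf, random.random)
print(p)
```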
Readings: Clauset et al. continued; Handcock and Morris, "Relative Distribution Methods"; Freckleton and Sutherland, "Do power laws imply self-regulation?"; idem, "Do in-hospital waiting lists show self-regulation?" (thanks to Nick Watkins for pointing out those papers); Edwards et al., "Revisiting Lévy flight search patterns of wandering albatrosses, bumblebees and deer"; (*) Vuong, "Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses"
Lecture 15 (Tuesday, 3 March) Heavy-tailed distributions 3: Estimation
The right way to do it: maximum likelihood estimation for power laws; properties of the MLE. Nonparametric bootstrapping for error estimation. The wrong way to do it: linear regression on a log-log plot; why it should never be used. Determining the scaling range. Examples with city sizes. Non-parametric density estimation, and special considerations for heavy-tailed distributions.
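The continuous power-law MLE discussed in Clauset, Shalizi and Newman is short enough to sketch here. This is my own minimal implementation (in Python, not their reference code), checked on synthetic data with a known exponent.

```python
import math
import random

def alpha_mle(xs, xmin):
    """MLE of the exponent of a continuous power law p(x) ~ x^-alpha for
    x >= xmin: alpha_hat = 1 + n / sum(log(x / xmin))."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

# synthetic data with alpha = 2.5, via inverse-transform sampling
random.seed(0)
alpha, xmin = 2.5, 1.0
xs = [xmin * random.random() ** (-1.0 / (alpha - 1.0)) for _ in range(50_000)]
a_hat = alpha_mle(xs, xmin)
print(a_hat)   # should be close to 2.5
```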
Readings: Clauset, Shalizi and Newman, "Power law distributions in empirical data", arxiv:0706.1062; (*) Markovitch and Krieger, "Nonparametric estimation of long-tailed density functions and its application to the analysis of World Wide Web traffic"
Lecture 14, (Thursday, 26 February): Heavy-tailed distributions 2: how they arise
"How the distributions got their tails". Deflating explanations: measuring ex instead of x; measuring x1/a instead of x; the central limit theorem for multiplicative terms. Everyday explanations: exponential growth from random times; why isn't Acoma, N.M., not the largest city in the US?; the Yule-Simon mechanism for random growth. Exciting and mysterious explanations: why physicists expect everything to follow a Gaussian, except at phase transitions; self-organized criticality; a skeptical note.
Readings: Newman, "Power laws", section IV; Krugman, ch. 3 and the last part of ch. 8; Mitzenmacher, "A Brief History of Generative Models for Power Law and Lognormal Distributions"; video of Mitzenmacher giving a talk on this material; Sornette, "Mechanism for Powerlaws without Self-Organization", arxiv:cond-mat/0110426
Lecture 13, (Tuesday, 24 February): Heavy-tailed distributions 1: what they are
Slides and R
Highly skewed and heavy-tailed distributions: definitions. Real-data examples: words, cities, money, papers, phone calls, the Internet, earthquakes, wars, terrorism, solar flares ("die, puny humans!"). Models: the pure power laws (Pareto and Zipf distributions). Properties of power laws: scale-freedom, 80-20, order statistics. Inequality of condition in America. Modifications: generalized Pareto, Yule-Simon. Non-power-law distributions: truncated power laws, log-normal, stretched exponential/Weibull distribution.
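The "80-20" property can be made concrete. My own restatement of a standard Pareto calculation (equivalent to the top-wealth formula in Newman's review): under a pure power law with survival function (x/xmin)^(-a), the richest fraction p holds a share p^((a-1)/a) of the total, which a quick Monte Carlo confirms.

```python
import random

def top_share(p, a):
    """Share of the total held by the richest fraction p under a pure
    power law with survival function (x/xmin)^(-a); needs a > 1."""
    return p ** ((a - 1) / a)

random.seed(7)
a, n = 2.5, 100_000
# inverse-transform sampling with xmin = 1
xs = sorted(random.random() ** (-1 / a) for _ in range(n))
share_mc = sum(xs[int(0.8 * n):]) / sum(xs)   # share of the top 20%
print(share_mc, top_share(0.2, a))
```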
Newman, "Power laws, Pareto distributions and Zipf's law", arxiv:cond-mat/0412004 (through section III)
Lecture 12, (Thursday, 19 February): Self-organization 2
Readings: Shalizi, Klinkner and Haslinger, "Quantifying Self-Organization with Optimal Predictors", arxiv:nlin.AO/0409024; Shalizi, Haslinger, Rouquier, Klinkner and Moore, "Automatic Filters for the Detection of Coherent Structure in Spatiotemporal Systems", arxiv:nlin.CG/0508001
Lecture 11, (Tuesday, 17 February): Cellular automata 2: excitable media; mechanisms and abstractions
The heart as an assemblage of excitable elements. Excitable media as an abstract mechanism. Philosophical excursus on "abstraction" and "mechanism". Examples of excitable media. Chemical oscillators: Belousov-Zhabotinsky type. Chemical oscillators: Turing type. City growth as Turing morphogenesis. Slime mold morphogenesis. The spiral ice-canyons of Mars. Cellular automata models of excitability: Greenberg-Hastings, cyclic CA. Some math for these models.
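The Greenberg-Hastings model is simple enough to sketch. My own minimal version, on a one-dimensional ring rather than the two-dimensional lattices of the lecture: states are 0 = resting, 1 = excited, 2 = refractory; a resting cell fires if a neighbor is excited, excited cells become refractory, refractory cells recover.

```python
def gh_step(cells):
    """One synchronous update of 1-D Greenberg-Hastings on a ring."""
    n = len(cells)
    new = []
    for i, c in enumerate(cells):
        if c == 1:
            new.append(2)                       # excited -> refractory
        elif c == 2:
            new.append(0)                       # refractory -> resting
        elif cells[(i - 1) % n] == 1 or cells[(i + 1) % n] == 1:
            new.append(1)                       # resting cell is excited
        else:
            new.append(0)
    return new

cells = [0] * 10
cells[0] = 1                                    # a single excited seed
for _ in range(3):
    cells = gh_step(cells)
    print(cells)
```

A single seed launches two counter-propagating waves of excitation, the one-dimensional shadow of the spirals seen in two dimensions.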
Readings: Fisch, Gravner and Griffeath, "Threshold-range scaling of excitable cellular automata" [PDF]; idem, "Cyclic Cellular Automata in Two Dimensions" [zipped PostScript]; Greenberg and Hastings, "Spatial Patterns for Discrete Models of Diffusion in Excitable Media" [JSTOR]; Griffeath, "Self-Organization of Random Cellular Automata: four snapshots" [zipped PostScript]; Krugman, ch. 2 and first part of ch. 8; ch. 4 of Guttorp
Lecture 10, (Thursday, 12 February): Cellular automata 1
What cellular automata are; basic elements. Examples: diffusion-limited aggregation, lattice gases, majority vote, Life. Mechanical self-reproduction.
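One of these examples, majority vote, takes only a few lines. My own toy version on a one-dimensional ring (the lecture's examples are two-dimensional): each cell adopts the majority state of its neighborhood.

```python
def majority_step(cells):
    """Each cell takes the majority of itself and its two ring neighbors."""
    n = len(cells)
    return [1 if cells[(i - 1) % n] + cells[i] + cells[(i + 1) % n] >= 2
            else 0
            for i in range(n)]

cells = [0, 1, 0, 1, 1, 1, 0, 0, 1, 0]
for _ in range(5):
    cells = majority_step(cells)
print(cells)   # isolated disagreements get smoothed into solid blocks
```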
Readings: Flake, ch. 15; Miller and Page, ch. 8
Lecture 9, (Tuesday, 10 February): Self-organization 1, some examples
Levels of description; micro/macro distinctions, aggregation. Emergence of higher levels from lower levels; how to be a good reductionist. Self-organization. Excursus into thermodynamic entropy and the second law of thermodynamics. How self-organization is compatible with the second law. Some examples from fluids. Classroom demo with fluids.
Reading: Miller and Page, ch. 4
Office of Charles and Ray Eames, Powers of Ten, narration by Philip Morrison
Lecture 8, (Thursday, 5 February): Randomness and determinism
Data compression and description length. Algorithmic information content as the length of the shortest effective description. Random sequences are incompressible; incompressible sequences have all the testable properties of randomness. "Any signal distinguishable from noise is insufficiently compressed." Algorithmic information content is uncomputable. Dynamical systems as sources of randomness; Brudno's theorem connecting algorithmic information content to entropy rate and so to Lyapunov exponents. "Any determinism distinguishable from randomness is insufficiently complicated."
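"Random sequences are incompressible" can be illustrated with an ordinary compressor, though zlib is only a crude stand-in for the (uncomputable) shortest effective description. A rough sketch of mine:

```python
import random
import zlib

random.seed(1)
periodic = b"01" * 5000                                   # highly patterned
noisy = bytes(random.getrandbits(8) for _ in range(10000))  # pseudorandom

print(len(zlib.compress(periodic)))   # tiny: the pattern is easy to describe
print(len(zlib.compress(noisy)))      # nearly incompressible
```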
Side-note: Algorithmic Information Content and Marginal Entropies
Readings: Flake, ch. 14; Smith, ch. 11; Poincaré, "Chance", from Science and Method [PDF]
Lecture 7 (Tuesday, 3 February): Information Theory
Entropy of random variables, interpretations (randomness, variability, uncertainty, coding). Mutual information, statistical independence, Markov properties. Relative entropy, a.k.a. Kullback-Leibler divergence. Relative entropy measures how easy it is to tell two distributions apart: statistical uses in sampling, large deviations, estimation (via Fisher information). Divergence is the negative expected log-likelihood; maximum likelihood attempts to minimize divergence. Entropy rates measure dynamical randomness; connection between metric entropy rate, topological entropy rate, and Lyapunov exponents. Asymptotic equipartition; convergence of log likelihoods to their expected values.
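The basic quantities here are one-liners for discrete distributions. A sketch of mine, in bits, on toy distributions of my own choosing:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p||q) in bits; assumes q puts
    positive probability wherever p does."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair = [0.5, 0.5]
biased = [0.9, 0.1]
print(entropy(fair))              # 1 bit: a fair coin is maximally uncertain
print(kl_divergence(biased, fair))
```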
Reading: Feldman, "Information Theory" [PDF]
M.C. Hawking, "Entropy", from Fear of a Black Hole [lyrics; mp3 (radio-safe Brief History of Rhyme version)]
Ray and Charles Eames, A Communications Primer (via Britta Gustafson)
Lecture 6 (Thursday, 29 January): Statistical Inference for Discrete Random Sequences
Inference for Markov chains: the likelihood function; the maximum likelihood estimate of the transition matrix; its convergence; MLE for parameterized Markov chains. Error estimation and hypothesis testing for Markov chains. Parametric bootstrapping. Higher-order Markov chains. Random functions of Markov chains; stochastic finite automata and their inference. Variable length Markov chains. CSSR.
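The transition-matrix MLE is just the normalized transition counts. A minimal sketch of mine (variable names are not from the handout):

```python
from collections import Counter

def mle_transition(seq, states):
    """MLE of a Markov chain's transition matrix: p_hat(j|i) = n_ij / n_i,
    where n_ij counts i -> j transitions in the observed sequence."""
    counts = Counter(zip(seq[:-1], seq[1:]))        # n_ij
    totals = Counter(seq[:-1])                      # n_i
    return {i: {j: counts[(i, j)] / totals[i] if totals[i] else 0.0
                for j in states}
            for i in states}

seq = "AABABBAABBBABAABA"
P = mle_transition(seq, "AB")
print(P["A"]["B"])   # the fraction of A's that were followed by B
```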
Handout: Maximum Likelihood Estimation for Markov Chains
Readings: Guttorp, 2.7--2.9 and 2.12 (I, II, III); Smith, ch. 10; Foulkes, "A Class of Machines Which Determine the Statistical Structure of a Sequence of Characters" [PDF]; (*) Smith, "The Maintenance of Uncertainty" [PDF]
Lecture 5 (Tuesday, 27 January): Symbolic Dynamics
Discrete coarse-grainings of the continuous state space. Symbol sequences determined by initial conditions. The shift map; moving the complexity from the update rule to the space. Motivation: "Continuous math is hard; let's go discretize." Refinement of partitions under dynamics. Generating partitions; discretizing continuous dynamics without loss of information. Symbolic dynamics as discrete stochastic processes; the full logistic map as the Bernoulli process. Allowed and forbidden words. The topological entropy rate; how many things can your process do? Formal languages as tools for describing discrete sequences ("we have ways of making your dynamics talk"). Regular expressions and their grammars. Regular expressions and their machines ("circles and arrows and a paragraph on the back of each one"). Examples of languages and machines from the logistic map. Finite-type vs. sofic processes.
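The coarse-graining of the full logistic map is easy to carry out. My own illustration: with the generating partition [0, 1/2) -> '0', [1/2, 1] -> '1', the symbol sequence of the r = 4 map behaves like a fair Bernoulli process, so both symbols should appear with frequency near 1/2.

```python
# symbolize an orbit of the full logistic map with the binary partition
x = 0.3
symbols = []
for _ in range(10_000):
    symbols.append('1' if x >= 0.5 else '0')
    x = 4.0 * x * (1.0 - x)

freq1 = symbols.count('1') / len(symbols)
print(freq1)   # near 1/2, as for a fair coin
```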
Handout: More on the topological entropy rate
Readings: Daw, Finney and Tracy, "A review of symbolic analysis of experimental data" [reprint]
Lecture 4 (Thursday, 22 January): Attractor Reconstruction and Prediction
Slides, lorenz-generator.pl
"Geometry from a time series": how "time-delay embedding" and Takens's theorem let us reconstruct attractors without knowing equations. An example with the Lorenz system. Attractor reconstruction as a form of inference. Change of coordinates, change of observables. Choice of reconstruction settings, false nearest neighbor method. Prediction in the reconstructed state space. Nearest neighbor prediction; an example with the Lorenz system. Picking predictive settings via cross-validation.
Side-notes: "Smooth change of coordinates"; Nonlinear predictors
Readings: Kantz and Schreiber, chs. 3--4; Smith, chs. 7--9.
Lecture 3 (Tuesday, 20 January): Attractors and Mixing
Slides and R
Attractors as generalizations of fixed points and limit cycles. Geometry of attractors: where do trajectories go? The Hénon map, our first multi-dimensional dynamical system. Multiple dimensions and memory. Stretching and folding again. Stable and unstable directions. The Lorenz equations, our first continuous-time dynamical system. Lyapunov exponents to measure stability and instability. Probability on attractors. Invariant distributions again. Subtleties of convergence to the invariant distribution. Mixing and decay of correlations. A central limit theorem for mixing processes. CLT behavior of the logistic map.
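For one-dimensional maps the Lyapunov exponent is just the time average of log|f'(x_t)|. A sketch of mine using that standard textbook formula (not the course's R code) on the logistic map at r = 4, where the exact value is log 2.

```python
import math

def lyapunov(r, x0, n=100_000, burn=1000):
    """Time average of log|f'(x_t)| along an orbit of the logistic map,
    f(x) = r x (1 - x), f'(x) = r (1 - 2x)."""
    x = x0
    for _ in range(burn):                      # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return total / n

lam = lyapunov(4.0, 0.2)
print(lam)   # the exact value at r = 4 is log 2, about 0.693
```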
Readings: Flake, ch. 11; Miller and Page, chs. 1--3; Smith, chs. 4--6.
Lecture 2 (Thursday, 15 January): More dynamics
Slides and R
Stability of fixed points and cycles. Bifurcations; bifurcation diagrams; the period-doubling route to chaos; periodic windows within chaos. Devaney's definition of chaos; how it implies sensitive dependence; stretching and folding. The Arnold cat map. Intermittency. More ergodicity: the evolution of densities and the Frobenius-Perron operator; time averages; the world's simplest ergodic theorem.
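Sensitive dependence is easy to exhibit numerically. My own sketch (not course code): two logistic-map trajectories at r = 4 starting 1e-10 apart separate to order-one distance within a few dozen steps.

```python
def step(x, r=4.0):
    """One step of the logistic map in its chaotic regime."""
    return r * x * (1.0 - x)

x, y = 0.3, 0.3 + 1e-10
gap = [abs(x - y)]
for _ in range(60):
    x, y = step(x), step(y)
    gap.append(abs(x - y))

print(gap[0], max(gap))   # a 1e-10 difference grows by many orders of magnitude
```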
Readings: Flake, ch. 10 and sec. 11.1; Guttorp, ch. 1; Smith, chs. 1--3.
The Arnold Cat Map Movie (starring Marlowe the Cat, directed by Evelyn Sander)
Lecture 1 (Tuesday, 13 January): Introduction to the course; starting with dynamics
Slides and R
Administrivia. What is a model? What is a simulation? What is a dynamical system? What is chaos? First acquaintance with the logistic map. Fixed points and cycles. Exploring some properties of chaos.
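A first acquaintance with the logistic map takes only a few lines. A minimal sketch of mine in Python (the course code is R): iterate the map and watch the stable fixed point attract.

```python
def logistic(r, x):
    """One step of the logistic map x -> r x (1 - x)."""
    return r * x * (1.0 - x)

def orbit(r, x0, n):
    """The trajectory x0, f(x0), f(f(x0)), ... of length n + 1."""
    xs = [x0]
    for _ in range(n):
        xs.append(logistic(r, xs[-1]))
    return xs

# below r = 3 the nonzero fixed point x* = 1 - 1/r attracts:
xs = orbit(2.5, 0.2, 100)
print(xs[-1])   # settles near 1 - 1/2.5 = 0.6
```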


Posted at June 16, 2009 09:24

Three-Toed Sloth