September 04, 2010

Links, Pleading to be Dumped

Attention conservation notice: Yet more cleaning out of to-be-blogged bookmarks, with links of a more technical nature than last time. Contains log-rolling promotion of work by friends, acquaintances, and senior colleagues.

Pleas for Attention

Wolfgang Beirl raises an interesting question in statistical mechanics: what is "the current state-of-the-art if one needs to distinguish a weak 1st order phase transition from a 2nd order transition with lattice simulations?" (This is presumably unrelated to Wolfgang's diabolical puzzle-picture.)

Maxim Raginsky's new blog, The Information Structuralist. Jon Wilkins's new blog, Lost in Transcription. Jennifer Jacquet's long-running blog, Guilty Planet.

Larry Wasserman has started a new wiki for inequalities in statistics and machine learning; I contributed an entry on Markov's inequality. Relatedly: Larry's lecture notes for intermediate statistics, starting with Vapnik-Chervonenkis theory. (It really does make more sense that way.)
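For the record (stating the inequality from memory here, not quoting the wiki entry): for a non-negative random variable X and any a > 0,

```latex
% Markov's inequality: for X >= 0 and a > 0,
\Pr(X \geq a) \;\leq\; \frac{\mathbb{E}[X]}{a}
```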

Pleas for Connection

Sharad Goel on birds of a feather shopping together, on the basis of a data set that sounds really quite incredible. "It's perhaps tempting to conclude from these results that shopping is contagious .... Though there is probably some truth to that claim, establishing such is neither our objective nor justified from our analysis." (Thank you!)

Mark Liberman on the Wason selection test. There is, I feel, something quite deep here for ideas that connect the meaning of words to their use, or, more operationally, test whether someone understands a concept by their ability to use it; but I'm not feeling equal to articulating this.
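For readers who don't remember the task: in its classic abstract form (which may not be the exact variant Liberman discusses), you see four cards and must say which ones to turn over to test the rule "if a card has a vowel on one side, it has an even number on the other." A toy sketch of the logic:

```python
# Toy illustration of the classic Wason selection task (abstract version;
# the concrete details in Liberman's post may differ).
# Rule: "if a card has a vowel on one side, it has an even number on the other."
cards = ["A", "K", "4", "7"]

def could_falsify(face):
    """A card can falsify the rule only if its hidden side might reveal
    a vowel paired with an odd number."""
    if face.isalpha():
        # Visible letter: the rule is at stake only if it's a vowel
        # (the hidden number might be odd).
        return face.upper() in "AEIOU"
    else:
        # Visible number: the rule is at stake only if it's odd
        # (the hidden letter might be a vowel).
        return int(face) % 2 == 1

print([c for c in cards if could_falsify(c)])   # ['A', '7']
```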

What it's like being a bipolar writer. What it's like being a schizophrenic neuroscientist (the latter via Mind Hacks).

Pleas for Correction

The Phantom of Heilbronn, in which the combined police forces of Europe spend years chasing a female serial killer, known solely from DNA evidence, only to find that it's all down to contaminated cotton swabs from a single supplier. Draw your own morals for data mining and the national surveillance state. (Via arsyed on delicious.)

Herbert Simon and Paul Samuelson take turns, back in 1962, beating up on Milton Friedman's "Methodology of Positive Economics", an essay whose exquisite awfulness is matched only by its malign influence. (This is a very large scan of a xerox copy, from the CMU library's online collection of Simon's personal files.) Back in July, Robert Solow testified before Congress on "Building a Science of Economics for the Real World" (via Daniel McDonald). To put it in "shorter Solow" form: I helped invent macroeconomics, and let me assure you that this was not what we had in mind. Related, James Morley on DSGEs (via Brad DeLong).

Pleas for Scholarly Attention

This brings us to the paper-link-dump portion of the program.

James K. Galbraith, Olivier Giovannoni and Ann J. Russo, "The Fed's Real Reaction Function: Monetary Policy, Inflation, Unemployment, Inequality — and Presidential Politics", University of Texas Inequality Project working paper 42, 2007
A crucial posit of the kind of models Solow and Morley complain about above is that the central bank acts as a benevolent (and far-sighted) central planner. Concretely, they generally assume that the central bank follows some version of the Taylor Rule, which basically says "keep both the rate of inflation and the rate of real economic growth steady". What Galbraith et al. do is look at what actually predicts the Fed's actions. The Taylor Rule works much less well, it turns out, than the assumption that Fed policy is a tool of class and partisan struggle. It would amuse me greatly to see what happens in something like the Kydland-Prescott model with this reaction function.
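For concreteness, the textbook form of the rule, with Taylor's original 1993 coefficients (the versions wired into DSGE models differ in details, but this is the flavor), sets the nominal policy rate as

```latex
% Taylor (1993) rule: i_t = nominal policy rate, r^* = equilibrium real rate,
% \pi_t = inflation, \pi^* = inflation target, y_t - \bar{y}_t = output gap.
i_t \;=\; r^* + \pi_t + 0.5\,(\pi_t - \pi^*) + 0.5\,(y_t - \bar{y}_t)
```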
Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", Journal of Machine Learning Research 11 (2010): 1709--1731
The Galbraith et al. paper, like a great deal of modern macroeconometrics, uses a structural vector autoregression. The usual ways of estimating such models have a number of drawbacks — oh, I'll just turn it over to the abstract. "Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR's. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data." (Disclaimer: Patrik is an acquaintance.)
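A very rough sketch of the two-stage idea as I understand it (fit an ordinary reduced-form VAR, then un-mix the residuals with an ICA-style analysis, which non-Gaussianity makes identifiable). This is a cartoon of the approach, not the authors' actual estimator, and the data and variable names below are made up:

```python
# Cartoon of putting non-Gaussian instantaneous structure on top of a VAR.
# Not the estimator of Hyvarinen et al.; just the two-stage idea:
# 1. fit a reduced-form VAR by least squares;
# 2. un-mix the residuals with ICA, exploiting their non-Gaussianity.
import numpy as np
from statsmodels.tsa.api import VAR
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Fake data: 3 series with lagged dependence and heavy-tailed shocks.
T, k = 500, 3
shocks = rng.laplace(size=(T, k))
data = np.zeros((T, k))
for t in range(1, T):
    data[t] = 0.5 * data[t - 1] + shocks[t]

# Stage 1: reduced-form VAR.
var_fit = VAR(data).fit(maxlags=2)
residuals = var_fit.resid            # these mix the instantaneous effects

# Stage 2: ICA on the residuals.
ica = FastICA(n_components=k, random_state=0)
sources = ica.fit_transform(residuals)
mixing = ica.mixing_                 # estimated instantaneous mixing matrix

print(var_fit.coefs.shape, mixing.shape)
```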
Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, Gert R.G. Lanckriet, "Hilbert Space Embeddings and Metrics on Probability Measures", Journal of Machine Learning Research 11 (2010): 1517--1561
There's been a lot of work recently on representing probability distributions as points in Hilbert spaces, because really, who doesn't love a Hilbert space? (One can see this as both the long-run recognition that Wahba was on to something profound when she realized that splines became much more comprehensible in reproducing-kernel Hilbert spaces, and the influence of the kernel trick itself.) But there are multiple ways to do this, and it would be nicest if we could choose a representation which has useful probabilistic properties --- distance in the Hilbert space should be zero only when the distributions are the same, and for many purposes it would be even better if the distance in the Hilbert space "metrized" weak convergence, a.k.a. convergence in distribution. This paper gives comprehensible criteria for these properties to hold in a lot of important domains.
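The basic object here (if I have it right) is the maximum mean discrepancy: embed each distribution as the mean of the kernel's feature map and measure the distance between the two embeddings; with a characteristic kernel, such as the Gaussian, this distance is zero only when the distributions coincide. A sketch of the standard plug-in estimator from two samples, which is a simplification of what the paper studies, not code from it:

```python
# Plug-in estimate of the (squared) maximum mean discrepancy between two
# samples, using a Gaussian RBF kernel.  A sketch of the general idea only.
import numpy as np
from scipy.spatial.distance import cdist

def mmd2(x, y, bandwidth=1.0):
    """Biased plug-in estimate of squared MMD with a Gaussian RBF kernel."""
    def k(a, b):
        return np.exp(-cdist(a, b, "sqeuclidean") / (2 * bandwidth**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)))
diff = mmd2(rng.normal(size=(500, 2)), rng.normal(1.0, 1.0, size=(500, 2)))
print(same, diff)   # the second should be clearly larger
```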
Robert Haslinger, Gordon Pipa and Emery Brown, "Discrete Time Rescaling Theorem: Determining Goodness of Fit for Discrete Time Statistical Models of Neural Spiking", Neural Computation 22 (2010): 2477--2506
A broad principle in statistics is that if you have found the right model, whatever the model can't account for should look completely structureless. One expression of this is the bit of folklore in information theory that an optimally compressed signal is indistinguishable from pure noise (i.e., a Bernoulli process with p=0.5). Another manifestation is residual checking in regression models: to the extent there are patterns in your residuals, you are missing systematic effects. One can make out a good case that this is a better way of comparing models than just asking which has smaller residuals. For example, Aris Spanos argues (Philosophy of Science 74 (2007): 1046--1066; PDF preprint) that looking for small residuals might well lead one to prefer a Ptolemaic model for the motion of Mars to that of Kepler, but the Ptolemaic residuals are highly systematic, while Kepler's are not.
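A toy version of the residual-checking point (made-up data, obviously): fit a straight line to data with curvature, and the residuals, though small, are visibly systematic.

```python
# Toy illustration: a model can have small residuals that are nonetheless
# highly structured, which is the tell-tale of misspecification.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.05, size=x.size)

# Fit a straight line (the "wrong" model).
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Structureless residuals should be roughly uncorrelated with functions of x
# and with each other; these are neither.
print(np.corrcoef(residuals, np.sin(2 * np.pi * x))[0, 1])
print(np.corrcoef(residuals[:-1], residuals[1:])[0, 1])  # lag-1 autocorrelation
```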
Getting this idea into a usable form for a particular kind of data requires knowing what "structureless noise" means in that context. For point processes, "structureless noise" is a homogeneous Poisson process, where events occur at a constant rate per unit time, and nothing ever alters the rate. If you have another sort of point process, and you know the intensity function, you can use that to transform the original point process into something that looks just like a homogeneous Poisson process, by "time-rescaling" --- you stretch out the distance between points when the intensity is high, and squeeze them together where the intensity is low, to achieve a constant density of points. (Details.) This forms the basis for a very cute goodness-of-fit test for point processes, but only in continuous time. As you may have noticed, actual continuous-time observations are rather scarce; we almost always have data with a finite time resolution. The usual tactic has been to hope that the time bins are small enough that we can pretend our observations are in continuous time, i.e., to ignore the issue. This paper shows how to make the same trick work in discrete time, with really minimal modifications. (Disclaimer: Rob is an old friend and frequent collaborator, and two of the co-authors on the original time-rescaling paper are senior faculty in my department.)
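A sketch of the continuous-time version of the trick (the discrete-time correction is what is new in the paper; this just illustrates the basic rescaling): simulate an inhomogeneous Poisson process with a known intensity, rescale the event times by the integrated intensity, and the rescaled inter-event intervals should look like standard exponentials. The intensity and numbers below are invented for illustration.

```python
# Continuous-time time-rescaling check for an inhomogeneous Poisson process.
# If the intensity is right, the rescaled inter-event intervals are i.i.d.
# Exponential(1).  (Classic continuous-time version, not the discrete-time
# correction of Haslinger et al.)
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)

lam = lambda t: 5 + 4 * np.sin(t)            # true intensity, bounded by 9
lam_max, T = 9.0, 200.0

# Simulate by thinning: propose homogeneous Poisson(lam_max) events and keep
# each with probability lam(t) / lam_max.
n_prop = rng.poisson(lam_max * T)
proposals = np.sort(rng.uniform(0, T, n_prop))
events = proposals[rng.uniform(size=n_prop) < lam(proposals) / lam_max]

# Time-rescale: tau_k = integral of lam between successive events.
Lam = lambda t: 5 * t - 4 * np.cos(t) + 4    # integrated intensity, Lam(0) = 0
taus = np.diff(Lam(events))

# Goodness of fit: compare rescaled intervals to Exponential(1).
print(kstest(taus, "expon"))                 # a large p-value is expected
```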

And now, back to work.

Manual trackback: Beyond Microfoundations

Linkage; Enigmas of Chance; The Dismal Science; Minds, Brains, and Neurons; Physics; Networks; Commit a Social Science; Incestuous Amplification

Posted at September 04, 2010 11:05 | permanent link

Three-Toed Sloth