Attention conservation notice: Yet more cleaning out of to-be-blogged bookmarks, with links of a more technical nature than last time. Contains log-rolling promotion of work by friends, acquaintances, and senior colleagues.

Wolfgang
Beirl raises
an interesting question in statistical mechanics: what *is* "the
current state-of-the-art if one needs to distinguish a weak 1st order phase
transition from a 2nd order transition with lattice simulations?" (This is
presumably unrelated to
Wolfgang's diabolical
puzzle-picture.)

Maxim Raginsky's new blog, The Information Structuralist. Jon Wilkins's new blog, Lost in Transcription. Jennifer Jacquet's long-running blog, Guilty Planet.

Larry Wasserman has started a new wiki for inequalities in statistics and machine learning; I contributed an entry on Markov's inequality. Relatedly: Larry's lecture notes for intermediate statistics, starting with Vapnik-Chervonenkis theory. (It really does make more sense that way.)
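For readers who haven't run into it, Markov's inequality says that for a non-negative random variable X and any a > 0, P(X >= a) <= E[X]/a. A quick empirical sanity check (my own illustration, not taken from the wiki entry), with a hypothetical Exponential(1) variable:

```python
import random

# Markov's inequality: for non-negative X and a > 0, P(X >= a) <= E[X] / a.
# Illustration with X ~ Exponential(1), so E[X] = 1, and a = 3.
random.seed(0)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]

mean = sum(xs) / n                  # empirical E[X], should be near 1
a = 3.0
tail = sum(x >= a for x in xs) / n  # empirical P(X >= a), near exp(-3)
bound = mean / a                    # Markov bound, near 1/3

print(f"P(X >= {a}) ~ {tail:.4f} <= E[X]/a ~ {bound:.4f}")
```

The bound is loose here (the true tail is about 0.05 against a bound of about 1/3), which is typical: Markov buys generality, not tightness.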

Sharad Goel on birds
of a feather shopping together, on the basis of a data set that sounds really
quite incredible. "It's perhaps tempting to conclude from these results that
shopping is contagious .... Though there is probably some truth to that claim,
establishing such is neither our objective nor justified from our analysis."
(*Thank* you!)

Mark Liberman on the Wason selection test. There is, I feel, something quite deep here for ideas that connect the meaning of words to their use, or, more operationally, test whether someone understands a concept by their ability to use it; but I'm not feeling equal to articulating this.

What it's like being a bipolar writer. What it's like being a schizophrenic neuroscientist (the latter via Mind Hacks).

The Phantom of Heilbronn, in which the combined police forces of Europe spend years chasing a female serial killer, known solely from DNA evidence, only to find that it's all down to contaminated cotton swabs from a single supplier. Draw your own morals for data mining and the national surveillance state. (Via arsyed on delicious.)

Herbert Simon and Paul Samuelson take turns, back in 1962, beating up on Milton Friedman's "Methodology of Positive Economics", an essay whose exquisite awfulness is matched only by its malign influence. (This is a very large scan of a xerox copy, from the CMU library's online collection of Simon's personal files.) Back in July, Robert Solow testified before Congress on "Building a Science of Economics for the Real World" (via Daniel McDonald). To put it in "shorter Solow" form: I helped invent macroeconomics, and let me assure you that this was not what we had in mind. Related, James Morley on DSGEs (via Brad DeLong).

This brings us to the paper-link-dump portion of the program.

- James K. Galbraith, Olivier Giovanni and Ann J. Russo, "The Fed's *Real* Reaction Function: Monetary Policy, Inflation, Unemployment, Inequality — and Presidential Politics", University of Texas Inequality Project working paper 42, 2007. A crucial posit of the kind of models Solow and Morley complain about above is that the central bank acts as a benevolent (and far-sighted) central planner. Concretely, they generally assume that the central bank follows some version of the Taylor Rule, which basically says "keep both the rate of inflation and the rate of real economic growth steady". What Galbraith *et al.* do is look at what *actually* predicts the Fed's actions. The Taylor Rule works much less well, it turns out, than the assumption that Fed policy is a tool of class and partisan struggle. It would amuse me greatly to see what happens in something like the Kydland-Prescott model with this reaction function.

- Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", Journal of Machine Learning Research **11** (2010): 1709--1731. The Galbraith *et al.* paper, like a great deal of modern macroeconometrics, uses a structural vector autoregression. The usual ways of estimating such models have a number of drawbacks — oh, I'll just turn it over to the abstract. "Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR's. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data." (*Disclaimer*: Patrik is an acquaintance.)

- Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, Gert R. G. Lanckriet, "Hilbert Space Embeddings and Metrics on Probability Measures", Journal of Machine Learning Research
**11** (2010): 1517--1561. There's been a lot of work recently on representing probability distributions as points in Hilbert spaces, because really, who doesn't love a Hilbert space? (One can see this as both the long-run recognition that Wahba was on to something profound when she realized that splines became much more comprehensible in reproducing-kernel Hilbert spaces, and the influence of the kernel trick itself.) But there are multiple ways to do this, and it would be nicest if we could choose a representation which has useful *probabilistic* properties --- distance in the Hilbert space should be zero only when the distributions are the same, and for many purposes it would be even better if the distance in the Hilbert space "metrized" weak convergence, a.k.a. convergence in distribution. This paper gives comprehensible criteria for these properties to hold in a lot of important domains.

- Robert Haslinger, Gordon Pipa and Emery Brown, "Discrete Time Rescaling Theorem: Determining Goodness of Fit for Discrete Time Statistical Models of Neural Spiking", Neural Computation
**22** (2010): 2477--2506. A broad principle in statistics is that if you have found the right model, whatever the model can't account for should look completely structureless. One expression of this is the bit of folklore in information theory that an optimally compressed signal is indistinguishable from pure noise (i.e., a Bernoulli process with *p* = 0.5). Another manifestation is residual checking in regression models: to the extent there are patterns in your residuals, you are missing systematic effects. One can make out a good case that this is a better way of comparing models than just asking which has *smaller* residuals. For example, Aris Spanos argues (Philosophy of Science **74** (2007): 1046--1066; PDF preprint) that looking for small residuals might well lead one to prefer a Ptolemaic model for the motion of Mars to that of Kepler, but the Ptolemaic residuals are highly systematic, while Kepler's are not.

  Getting this idea into a usable form for a particular kind of data requires knowing what "structureless noise" means in that context. For point processes, "structureless noise" is a homogeneous Poisson process, where events occur at a constant rate per unit time, and nothing ever alters the rate. If you have another sort of point process, and you know the intensity function, you can use it to transform the original point process into something that looks just like a homogeneous Poisson process, by "time-rescaling" --- you stretch out the distance between points when the intensity is high, and squeeze them together where the intensity is low, to achieve a constant density of points. (Details.) This forms the basis for a very cute goodness-of-fit test for point processes, but only in continuous time. As you may have noticed, *actual* continuous-time observations are rather scarce; we almost always have data with a finite time resolution. The usual tactic has been to hope that the time bins are small enough that we can pretend our observations are in continuous time, i.e., to ignore the issue. This paper shows how to make the same trick work in discrete time, with really minimal modifications. (*Disclaimer*: Rob is an old friend and frequent collaborator, and two of the co-authors on the original time-rescaling paper are senior faculty in my department.)
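For concreteness on the Taylor Rule mentioned above: Taylor's original 1993 version can be written i = r* + pi + 0.5 (pi - pi*) + 0.5 (output gap), with the equilibrium real rate r* and the inflation target pi* both set to 2 percent. A toy implementation (my own sketch, nothing to do with the Galbraith *et al.* estimates):

```python
def taylor_rule(inflation, output_gap, r_star=2.0, pi_star=2.0,
                a_pi=0.5, a_y=0.5):
    """Nominal policy rate under a Taylor (1993)-style rule, in percent.

    i = r* + pi + a_pi * (pi - pi*) + a_y * (output gap)
    r_star: equilibrium real rate; pi_star: inflation target.
    """
    return r_star + inflation + a_pi * (inflation - pi_star) + a_y * output_gap

# At target inflation and zero output gap, the rule returns the
# "neutral" nominal rate r* + pi* = 4 percent.
print(taylor_rule(2.0, 0.0))  # 4.0
# Above-target inflation and a positive output gap both call for tightening:
# 2 + 4 + 0.5*(4-2) + 0.5*1 = 7.5 percent.
print(taylor_rule(4.0, 1.0))  # 7.5
```

The point of the Galbraith *et al.* exercise is precisely that regressions of this form fit the Fed's observed behavior worse than political and distributional variables do.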
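On the Hilbert-space embeddings: one such distance between distributions is the maximum mean discrepancy (MMD), the distance between kernel mean embeddings; with a characteristic kernel such as the Gaussian, the population MMD is zero exactly when the two distributions coincide, which is the kind of property the Sriperumbudur *et al.* paper characterizes. A naive one-dimensional estimator, as a sketch of my own (not code from the paper):

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel, a characteristic kernel on the real line."""
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    m, n = len(xs), len(ys)
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / m ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / n ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy

random.seed(1)
same = [random.gauss(0, 1) for _ in range(200)]
also_same = [random.gauss(0, 1) for _ in range(200)]
shifted = [random.gauss(2, 1) for _ in range(200)]

# Same distribution: estimate near zero; shifted distribution: clearly positive.
print(mmd2(same, also_same), mmd2(same, shifted))
```

Because the Gaussian kernel is characteristic, the second number converges to a strictly positive constant while the first shrinks toward zero as the samples grow.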
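To make the time-rescaling idea concrete, here is a toy continuous-time illustration of my own (not the paper's discrete-time construction): simulate an inhomogeneous Poisson process with a known, made-up intensity, map the event times through the integrated intensity, and check that the rescaled process looks like a unit-rate Poisson process, i.e., that the rescaled inter-event intervals average out to 1.

```python
import math
import random

random.seed(2)

# Hypothetical intensity lambda(t) = 2 + 2*sin(t), chosen so the
# integrated intensity has a closed form: Lambda(t) = 2t + 2(1 - cos(t)).
lam = lambda t: 2 + 2 * math.sin(t)
Lam = lambda t: 2 * t + 2 * (1 - math.cos(t))
lam_max = 4.0  # upper bound on lambda(t), needed for thinning

# Simulate the inhomogeneous Poisson process on [0, T] by thinning:
# draw candidate points at rate lam_max, keep each with prob lambda(t)/lam_max.
T = 1000.0
t, events = 0.0, []
while True:
    t += random.expovariate(lam_max)
    if t > T:
        break
    if random.random() < lam(t) / lam_max:
        events.append(t)

# Time-rescaling: pushing each event time through Lambda should yield a
# homogeneous unit-rate Poisson process, whose inter-event intervals are
# i.i.d. Exponential(1), hence with mean 1.
rescaled = [Lam(s) for s in events]
gaps = [b - a for a, b in zip(rescaled, rescaled[1:])]
mean_gap = sum(gaps) / len(gaps)
print(f"mean rescaled interval: {mean_gap:.3f} (should be ~1)")
```

In practice one would go further and run, say, a Kolmogorov-Smirnov test of the rescaled intervals against Exponential(1); systematic departures then indicate a misspecified intensity, which is exactly the goodness-of-fit logic the discrete-time paper carries over to binned data.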

And now, back to work.

*Manual trackback*: Beyond Microfoundations

Linkage; Enigmas of Chance; The Dismal Science; Minds, Brains, and Neurons; Physics; Networks; Commit a Social Science; Incestuous Amplification

Posted at September 04, 2010 11:05 | permanent link