These are my preprints; also, publishers are to be destroyed
Attention conservation notice:
Advertisements for myself and my co-authors. If these were of interest, you'd
probably already have seen them on arxiv.
I don't seem to have publicized new papers at all this year. Except for the
first, they're all working their way through the peer-review
system.
- CRS, "Comment on `Why and When ``Flawed'' Social Network Analyses Still
Yield Valid Tests of No
Contagion'", Statistics,
Politics, and Policy 3 (2012): 5
[PDF reprint]
- Abstract: VanderWeele et al.'s paper is a useful contribution to
the on-going scientific conversation about the detection of contagion from
purely observational data. It is especially helpful as a corrective to some of
the more extreme statements of Lyons
(2011). Unfortunately, this paper, too, goes too far in some places, and
so needs some correction itself.
- Comment: As you can tell, this is an invited comment on a paper by
VanderWeele,
Ogburn and Tchetgen Tchetgen, which is
on VanderWeele's
website. It began life as my referee report on their paper.
- Editores delenda sunt: The journal Statistics, Politics, and
Policy used to be published by
the Berkeley Electronic Press, but the
title was recently taken over by De
Gruyter. The latter did not, of course, devote any resources to helping me
write my paper, or to ensuring its scholarly merit (such as it might have),
since peer reviewers and editors are unpaid volunteers. De Gruyter provided
copy editing, which amounted to misunderstanding how LaTeX's hyperref package
works and telling me to "fix" it, while not catching any of my actual mistakes
(e.g., the editing fragment "could at be" on p. 2). For all this, they charge
readers $42 for a copy of my paper, i.e., $14 per page of text. (Of course,
fees like that are really meant to force libraries to subscribe to the whole
journal, for a more dependable revenue stream.) The experience left
me feeling dirty, and not in a good way. Again,
for-profit journal publishing is a racket and should be destroyed.
- Georg M. Goerg and CRS,
"LICORS: Light Cone Reconstruction of States for Non-parametric Forecasting of
Spatio-Temporal
Systems", arxiv:1206.2398
- Abstract: We present a new, non-parametric forecasting method for
data where continuous values are observed discretely in space and time. Our
method, "light-cone reconstruction of states" (LICORS), uses physical
principles to identify predictive states which are local properties of the
system, both in space and time. LICORS discovers the number of predictive
states and their predictive distributions automatically, and consistently,
under mild assumptions on the data source. We provide an algorithm to
implement our method, along with a cross-validation scheme to pick control
settings. Simulations show that CV-tuned LICORS outperforms standard methods
in forecasting challenging spatio-temporal dynamics. Our work provides applied
researchers with a new, highly automatic method to analyze and forecast
spatio-temporal data.
- Comment: This descends from the old "Our New
Filtering Techniques Are Unstoppable" paper, and so from the work
on self-organization
and prediction on networks,
and ultimately from the last chapter of my
own dissertation. One reason I use the sloth as an emblem is that I really
am very slow this way. (Georg is not, as a comparison of this to his
dissertation proposal will show.) All of the older methods had to assume
that space, time, and the field being predicted were all discrete. The last
point, the discretized field, was annoying, since the mathematical theory had
no such restriction, and discretizing the data before working with it is
dubious.
- What Georg did here was figure out how, exactly, to use non-parametric
density estimation and high-dimensional two-sample tests to work with
continuous-valued fields. (We still need space and time to be discrete.) To
prove consistency, we assumed a limited number of predictive states, but we let
that number grow with the sample size. (A toy sketch of the light-cone
construction appears below.)
- We are indebted to Larry
Wasserman for a great deal of advice, and most of all for the acronym
LICORS, pronounced like "liquors".
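- To make the light-cone construction concrete, here is a toy sketch in
Python. This is my illustration, not our implementation: it pairs each
point's depth-one past light cone with the value that cone predicts, lumps
the cones into a fixed number of candidate states with k-means, and forecasts
with each state's mean future. The real algorithm instead merges cones whose
predictive distributions pass a nonparametric two-sample test, and discovers
the number of states from the data.

    # Toy sketch of the light-cone idea, not the authors' code.
    # Assumptions: a 1D field, past light cones of depth one and speed one,
    # and k-means as a crude stand-in for the paper's two-sample-test merging.
    import numpy as np
    from sklearn.cluster import KMeans

    def light_cones(field):
        """field: (T, X) array. Pair each interior point's past light cone
        (its three spatial neighbors one step back) with the value it predicts."""
        pasts, futures = [], []
        T, X = field.shape
        for t in range(1, T):
            for x in range(1, X - 1):
                pasts.append(field[t - 1, x - 1:x + 2])
                futures.append(field[t, x])
        return np.array(pasts), np.array(futures)

    def fit(field, n_states=8):
        pasts, futures = light_cones(field)
        km = KMeans(n_clusters=n_states, n_init=10).fit(pasts)
        # Summarize each predictive state's distribution by its mean future.
        means = np.array([futures[km.labels_ == s].mean() for s in range(n_states)])
        return km, means

    def forecast(km, means, field):
        pasts, _ = light_cones(field)
        return means[km.predict(pasts)]

    # A linear, diffusive stochastic field to exercise the sketch on.
    rng = np.random.default_rng(1)
    T, X = 200, 50
    field = np.zeros((T, X))
    field[0] = rng.normal(size=X)
    for t in range(1, T):
        field[t] = (0.5 * field[t - 1]
                    + 0.2 * (np.roll(field[t - 1], 1) + np.roll(field[t - 1], -1))
                    + 0.1 * rng.normal(size=X))

    km, means = fit(field)
    print(forecast(km, means, field)[:5])

- Even this crude version shows the division of labor: the light-cone
geometry picks out the relevant local past, and clustering turns recurring
pasts into predictive states.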
- Xiaoran Yan, Jacob E. Jensen, Florent Krzakala, Cristopher Moore, CRS, Lenka Zdeborova, Pan Zhang and Yaojia Zhu, "Model Selection for Degree-corrected Block Models", arxiv:1207.3994
- Abstract: A central problem in analyzing networks is partitioning
them into modules or communities, clusters with a statistically homogeneous
pattern of links to each other or to the rest of the network. One of the best
tools for this is the stochastic block model, which in its basic form imposes a
Poisson degree distribution on all nodes within a community or block. In
contrast, degree-corrected block models allow for heterogeneity of degree
within blocks. Since these two model classes often lead to very different
partitions of nodes into communities, we need an automatic way of deciding
which model is more appropriate to a given graph. We present a principled and
scalable algorithm for this model selection problem, and apply it to both
synthetic and real-world networks. Specifically, we use belief propagation to
efficiently approximate the log-likelihood of each class of models, summed over
all community partitions, in the form of the Bethe free energy. We then derive
asymptotic results on the mean and variance of the log-likelihood ratio we
would observe if the null hypothesis were true, i.e., if the network were
generated according to the non-degree-corrected block model. We find that for
sparse networks, significant corrections to the classic asymptotic
likelihood-ratio theory (underlying \( \chi^2 \) hypothesis testing or the AIC)
must be taken into account. We test our procedure against two real-world
networks and find excellent agreement with our theory.
- Comment: Initially, I was unthinkingly sure the log-likelihood
ratio would have a \( \chi^2 \) distribution. Writing this paper was
educational in many ways, and gave me a new appreciation of how interestingly
weird network data really is. (The two model classes are contrasted
schematically below.)
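- For readers who want the contrast in symbols: in the Poisson
parameterization of Karrer and Newman (2011), from which the degree-corrected
model descends, with \( g_i \) the block label of node \( i \), the two model
classes differ only in whether each node carries its own propensity parameter
\( \theta_i \):
\[ A_{ij} \sim \mathrm{Poisson}\left(\omega_{g_i g_j}\right) \qquad \text{versus} \qquad A_{ij} \sim \mathrm{Poisson}\left(\theta_i \theta_j \omega_{g_i g_j}\right) \]
(This display is schematic; the paper's model-selection procedure compares
the two classes by summing the likelihood over all partitions \( \{g_i\} \),
approximated via belief propagation as the Bethe free energy.)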
- Georg M. Goerg and CRS, "Mixed LICORS: A Nonparametric Algorithm for
Predictive State
Reconstruction", arxiv:1211.3760
- Abstract: We introduce "mixed LICORS", an algorithm for learning
nonlinear, high-dimensional dynamics from spatio-temporal data which can be
used for both prediction and simulation. Mixed LICORS extends the recent
LICORS algorithm of Goerg and Shalizi (2012) from hard clustering of predictive
distributions to a non-parametric, EM-like soft clustering. This retains the
asymptotic predictive optimality of LICORS, but, as we show in simulations,
greatly improves out-of-sample forecasts with limited data. We also implement
the proposed methodology in the R
package LICORS.
- Comment: We as a community really ought to understand
nonparametric expectation-maximization better. (The generic shape of such an
iteration is sketched below.)
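- Schematically, and in my notation rather than the paper's, an iteration of
this nonparametric EM alternates soft assignments of each light cone \( i \)
to states \( j \) with re-estimation of each state's predictive density as a
weighted kernel density estimate over the observed futures \( F_i \):
\[ \text{E-step:} \quad w_{ij} \propto \hat{\pi}_j \, \hat{f}_j(F_i), \qquad \text{M-step:} \quad \hat{f}_j(f) = \frac{\sum_i w_{ij} K_h(f - F_i)}{\sum_i w_{ij}}, \quad \hat{\pi}_j = \frac{1}{n} \sum_i w_{ij} . \]
Part of what makes the theory awkward is that the \( \hat{f}_j \) live in an
infinite-dimensional space, so the standard parametric EM convergence
arguments do not apply directly.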
- Daniel J. McDonald, CRS,
and Mark J. Schervish, "Time
series forecasting: model evaluation and selection using nonparametric risk
bounds", arxiv:1212.0463
- Abstract: We derive generalization error bounds --- bounds on the
expected inaccuracy of the predictions --- for traditional time series
forecasting models. Our results hold for many standard forecasting tools
including autoregressive models, moving average models, and, more generally,
linear state-space models. These bounds allow forecasters to select among
competing models and to guarantee that with high probability, their chosen
model will perform well without making strong assumptions about the data
generating process or appealing to asymptotic theory. We motivate our
techniques with and apply them to standard economic and financial forecasting
tools --- a GARCH model for predicting equity volatility and a dynamic
stochastic general equilibrium (DSGE) model, the standard tool in macroeconomic
forecasting. We demonstrate in particular how our techniques can aid
forecasters and policy makers in choosing models which behave well under
uncertainty and mis-specification.
- Comment: Another chapter or two from Daniel's
dissertation. Keeping our set-up close to what time-series-wallahs usually
do — the ARMA alphabet soup plus state-space models, mean-squared error,
etc. — was very much the goal of the project. (At the same time,
assuming any of those models is ever correctly specified is silly.)
Controlling mean-squared error in particular is why I've grown so interested
in generalization error guarantees for unbounded loss
functions. (The generic shape of such a bound is sketched below.)
- I would say more about the economic implications of the results here, but
we're preparing a separate paper on that, aimed at economists, and I don't want
to blunt its edge.
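- For concreteness, the results have, very roughly, the familiar finite-sample
shape: writing \( \hat{R}_n(f) \) for the in-sample risk of a forecaster
\( f \) and \( R(f) \) for its expected out-of-sample risk, we get statements
which hold with probability at least \( 1 - \eta \),
\[ R(\hat{f}) \leq \hat{R}_n(\hat{f}) + \Delta(n, \eta, \mathcal{F}), \]
where the slack \( \Delta \) shrinks with the sample size \( n \) and grows
with the richness of the model class \( \mathcal{F} \) and the strength of
dependence in the process. The display is my schematic, not a theorem from
the paper; the actual work lies in handling unbounded losses and temporal
dependence.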
Update, 20 December: more on Georg's thesis
work.
Self-Centered;
Enigmas of Chance;
Networks;
Complexity;
The Dismal Science;
Learned Folly
Posted at December 06, 2012 21:05 | permanent link