Statistics

Last update: 07 Jul 2025 12:03
First version: 15 February 2000; substantial edit, 5 December 2024

Since June 2005, I have been a professor of statistics. My considered view, after observing the discipline from within and without for some time, is that the discipline of statistics is (at least) three things at once.

One of the mathematical sciences, elaborating on certain applications of probability which are closely related to (or just are) non-demonstrative inference and induction.
A branch of engineering, concerned with systems and procedures for drawing reliable inferences from partial and noisy data. Like any engineering discipline, it has both prescriptive and descriptive parts; embraces design, analysis, implementation, operation, critique and repair; has a place for theoretical studies of simplified model set-ups to guide practice, etc., etc.
A form of rhetoric appropriate to persuading highly numerate audiences. This is the least common perspective --- I learned it from Abelson's Statistics as Principled Argument --- but I find it's immediately convincing to anyone who's familiar with both statistics and classical rhetoric.

The first view is what you'll get from most textbooks. The second perspective is an explicit part of my teaching, but then, my school did begin life as "Carnegie Tech". I don't think the third viewpoint is taken in teaching anywhere, though a school where it was would be a very interesting place to teach. (Maybe you could get away with it at St. John's, or even the University of Chicago?) See also: Properties vs. principles in defining "good statistics"

Teaching Statistics
Dependent data: Statistical inference for stochastic processes, a.k.a. time-series analysis. Signal processing and filtering. Spatial statistics. Spatio-temporal statistics.
Model selection: Especially: adapting to unknown characteristics of the data, like unknown noise distributions, or unknown smoothness of the regression function. Inference after model selection.
Model discrimination: That is, designing experiments so as to discriminate between competing classes of model. Adaptation to data issues here.
Rates of convergence of estimators to true values: Empirical process theory. (Cf. some questions in ergodic theory).
Estimating distribution functions: And estimating entropies, or other functionals of distributions.
Non-parametric methods: Both those that are genuinely distribution-free, and those that would more accurately be mega-parametric (even infinitely-parametric) methods, such as neural networks
Regression
Bootstrapping and other resampling methods
Cross-validation
Sufficient statistics
Exponential families
Information Geometry
Partial identification of parametric statistical models
Causal Inference
Decision theory: Conventional, and the sorts with some connection to how real decisions are made.
Graphical models
Monte Carlo and other simulation methods
"De-Bayesing": Ways of taking Bayesian procedures and eliminating dependence on priors, either by replacing them by initial point-estimates, or by showing the prior doesn't matter, asymptotically or hopefully sooner. See: Frequentist consistency of Bayesian procedures.
Computational Statistics
Statistics of structured data
Statistics on manifolds: i.e., what to do when the data live in a continuous but non-Euclidean space.
Grammatical Inference
Factor analysis
Mixture models
Multiple testing
Predictive distributions: ... especially if they have confidence/coverage properties
Density estimation: especially conditional density estimation; and density estimation on graphical models
Indirect inference: And other species of simulation-based inference
"Missing mass" and species abundance problems: I.e., how much of the distribution have we not yet seen?
Independence Tests, Conditional Independence Tests, Measures of Dependence and Conditional Dependence
Two-Sample Tests
Statistical Emulators for Simulation Models
Hilbert Space Methods for Statistics and Probability
Large Deviations and Information Theory in the Foundations of Statistics
Confidence Sets, Confidence Intervals
Nonparametric Confidence Sets for Functions
Conformal prediction
Optimal Linear Prediction and Estimation
Empirical Likelihood
(Decision, Classification, Regression, Prediction) Trees in Statistics and Machine Learning
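For the "missing mass" entry, the classic starting point is the Good-Turing estimate. The total probability of all species not yet observed is estimated by N1/n, the fraction of the sample made up of species seen exactly once. What follows is a minimal sketch, assuming the sample is just a list of i.i.d. species labels; the function name is mine and purely illustrative.

```python
from collections import Counter

def good_turing_missing_mass(sample):
    """Good-Turing estimate of the total probability of unseen species.

    sample: a sequence of hashable species labels, assumed to be i.i.d. draws.
    Returns N1 / n, where N1 is the number of species observed exactly once
    and n is the sample size.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n

# 3 of the 10 draws are singletons ('c', 'd', 'e'), so this prints 0.3
print(good_turing_missing_mass(list("aaaabbbcde")))
```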
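Here is a toy version of the "prior doesn't matter, asymptotically" route in the De-Bayesing entry. With a Beta(a, b) prior on a coin's success probability, the posterior mean after s successes in n flips is (a + s)/(a + b + n). This differs from the sample frequency s/n by at most max(a, b)/(a + b + n), so two quite different priors give posterior means whose gap shrinks like 1/n. The two priors, the value 0.3, and the idealized data (observed frequency exactly 0.3) are arbitrary illustrative choices.

```python
def beta_posterior_mean(a, b, successes, n):
    """Posterior mean of a Bernoulli success probability under a Beta(a, b) prior."""
    return (a + successes) / (a + b + n)

priors = {"flat": (1.0, 1.0), "skeptical": (1.0, 20.0)}   # Beta(a, b) parameters
true_p = 0.3

for n in (10, 100, 1000, 10000):
    s = round(true_p * n)   # idealized data: observed frequency equal to true_p
    means = {name: beta_posterior_mean(a, b, s, n) for name, (a, b) in priors.items()}
    gap = abs(means["flat"] - means["skeptical"])
    print(f"n = {n:>5}   flat: {means['flat']:.4f}   skeptical: {means['skeptical']:.4f}   gap: {gap:.5f}")
```

This is only the easy, conjugate case. The real work in de-Bayesing is showing that kind of prior-insensitivity for realistic models, or at finite n.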

Larry Gonick and Woollcott Smith, The Cartoon Guide to Statistics
Ian Hacking, The Taming of Chance
D. Huff, How to Lie with Statistics
Theodore Porter, The Rise of Statistical Thinking, 1820--1900
Constance Reid, Neyman from Life [Biography of Jerzy Neyman, one of the makers of modern statistical theory, and, I am happy to say, among the brighter lights of my alma mater. Reid does an excellent job of explaining Neyman's work in terms accessible to the general reader. There is a new edition, titled simply Neyman, but otherwise unchanged. Review by Steve Laniel]
Edward R. Tufte
- The Visual Display of Quantitative Information
- Visual Explanations

Jordan Ellenberg, How Not to Be Wrong: The Power of Mathematical Thinking
Francis Galton, "Statistical Inquiries into the Efficacy of Prayer," Fortnightly Review 12 (1872): 125--135 [online]

Richard A. Berk, Regression Analysis: A Constructive Critique
Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16 (2001): 199--231 [very much including the discussion by others and the reply by Breiman. Thanks to Chris Wiggins for alerting me to this.]
Robert E. Kass, "Statistical Inference: The Big Picture", Statistical Science 26 (2011): 1--19, arxiv:1106.2895
Galit Shmueli, "To Explain or to Predict?", Statistical Science 25 (2010): 289--310, arxiv:1101.0891

Ole E. Barndorff-Nielsen and David R. Cox, Inference and Asymptotics
M. S. Bartlett
- "Inference and Stochastic Processes", Journal of the Royal Statistical Society A 130 (1967): 457--478 [JSTOR]
- "Chance or Chaos?", Journal of the Royal Statistical Society A 153 (1990): 321--347 [JSTOR]
David R. Brillinger, "The 2005 Neyman Lecture: Dynamic Indeterminism in Science", Statistical Science 23 (2008): 48--64, arxiv:0808.0620 [With discussions and response]
D. R. Cox and Christl A. Donnelly, Principles of Applied Statistics [Review: Turning Scientific Perplexity into Ordinary Statistical Uncertainty]
Harald Cramér, Mathematical Methods of Statistics
A. C. Davison, Statistical Models
Peter Guttorp, Stochastic Modeling of Scientific Data [Good introduction to using dependent data]
Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction [Website, with full text free in PDF]
Tony Lin [Prof. Dr. Lin was working on his doctorate when I was an undergrad at Berkeley; we became friends at the I-House, if that is the word I want for someone who offered to keep my brain alive in a jigger-glass and subject it to random electrical shocks ("Jzzt! Jzzt!"). But despite his questionable tastes in acquaintances, he's a damn good statistician and a model teacher.]
- Virtual Statistics 50 [Intro. statistics]
- Virtual Statistics 154A [Intro. statistics with algebra and calculus]
Deborah Mayo, Error and the Growth of Experimental Knowledge [Review: We Have Ways of Making You Talk, or, Long Live Peircism-Popperism-Neyman-Pearson Thought!]
NIST, Electronic Handbook of Statistical Methods
E. J. G. Pitman, Some Basic Theory for Statistical Inference
Jorma Rissanen, Stochastic Complexity in Statistical Inquiry
Mark Schervish, Theory of Statistics
Aad van der Vaart, Asymptotic Statistics
Larry Wasserman
- All of Statistics
- All of Nonparametric Statistics

A. C. Atkinson and A. N. Donev, Optimum Experimental Design
F. Bacchus, H. E. Kyburg and M. Thalos, "Against Conditionalization," Synthese 85 (1990): 475--506 [Why "Dutch book" arguments do not, in fact, mean that rational agents must be Bayesian reasoners]
M. J. Bayarri and James O. Berger, "P Values for Composite Null Models", Journal of the American Statistical Association 95 (2000): 1127--1142 [To be read in conjunction with Robins, van der Vaart and Ventura, below. JSTOR]
Anil K. Bera and Aurobindo Ghosh, "Neyman's Smooth Test and Its Applications in Econometrics", pp. 177--230 in Aman Ullah, Alan T. K. Wan and Anoop Chaturvedi (eds.), Handbook of Applied Econometrics and Statistical Inference, SSRN/272888
Julian Besag, "A Candidate's Formula: A Curious Result in Bayesian Prediction", Biometrika 76 (1989): 183 [A wonderful and bizarre expression for the Bayesian predictive density, in terms of how adding a new data point would change the posterior. JSTOR]
P. J. Bickel, "On Adaptive Estimation", Annals of Statistics 10 (1982): 647--671
Pier Bissiri, Chris Holmes, Stephen Walker, "A General Framework for Updating Belief Distributions", arxiv:1306.6430
David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions
Leo Breiman, "No Bayesians in Foxholes", IEEE Expert: Intelligent Systems and Their Applications 12 (1997): 21--24 [PDF reprint; comments by Andy Gelman]
Peter Bühlmann and Sara van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications
Ronald W. Butler, "Predictive Likelihood Inference with Applications", Journal of the Royal Statistical Society B 48 (1986): 1--38 ["in the predictive setting, all parameters are nuisance parameters". JSTOR]
Venkat Chandrasekaran and Michael I. Jordan, "Computational and Statistical Tradeoffs via Convex Relaxation", Proceedings of the National Academy of Sciences (USA) 110 (2013): E1181--E1190, arxiv:1211.1073
Hwan-sik Choi and Nicholas M. Kiefer, "Differential Geometry and Bias Correction in Nonnested Hypothesis Testing" [PDF preprint via Kiefer]
J. Bradford DeLong and Kevin Lang, "Are All Economic Hypotheses False?", Journal of Political Economy 100 (1992): 1257--1272 [PDF preprint. The point is about abuses of hypothesis testing, not economic hypotheses as such.]
Amir Dembo and Yuval Peres, "A Topological Criterion for Hypothesis Testing", Annals of Statistics 22 (1994): 106--117 ["A simple topological criterion is given for the existence of a sequence of tests for composite hypothesis testing problems, such that almost surely only finitely many errors are made."]
John Earman, Bayes or Bust? A Critical Account of Bayesian Confirmation Theory
Bradley Efron
- "Bootstrap Methods: Another Look at the Jackknife", Annals of Statistics 7 (1979): 1--26 [The original paper; staggeringly understandable]
- The Jackknife, the Bootstrap, and Other Resampling Plans
- "Maximum Likelihood and Decision Theory", The Annals of Statistics 10 (1982): 340--356
Mikhail Ermakov, "On Consistent Hypothesis Testing", arxiv:1403.6296
Michael Evans, "What does the proof of Birnbaum's theorem prove?", arxiv:1302.5468
S. N. Evans and P. B. Stark, "Inverse Problems as Statistics" [Abstract, PDF]
Steve Fienberg, The Analysis of Cross-Classified Categorical Data
Andrew Gelman and Iain Pardoe, "Average predictive comparisons for models with nonlinearity, interactions, and variance components", Sociological Methodology 37 (2007): 23--51 [PDF preprint, Gelman's comments]
Christopher Genovese, Peter Freeman, Larry Wasserman, Robert C. Nichol and Christopher Miller, "Inference for the Dark Energy Equation of State Using Type IA Supernova Data", Annals of Applied Statistics 3 (2009): 144--178, arxiv:0805.4136 [I am biased, because Genovese and Wasserman are friends, but this seems to me a model of a modern applied statistics paper: use interesting statistical ideas to say something helpful about an important scientific problem on its own terms, rather than distorting the problem until it "looks like a nail".]
Charles J. Geyer, "Le Cam Made Simple: Asymptotics of Maximum Likelihood without the LLN or CLT or Sample Size Going to Infinity", arxiv:1206.4762 [There are two separable points here. One is that much of the usual asymptotic theory of maximum likelihood follows from the quadratic form of the likelihood alone; whenever and however that is reached, those consequences follow. Approximately quadratic likelihoods imply approximations to the usual asymptotics. This is unquestionably correct. The other is some bashing of results like the law of large numbers and central limit theorem, which seems misguided to me.]
Tilmann Gneiting, "Making and Evaluating Point Forecasts", Journal of the American Statistical Association 106 (2011): 746--762, arxiv:0912.0902
Tilmann Gneiting, Fadoua Balabdaoui and Adrian E. Raftery, "Probabilistic Forecasts, Calibration and Sharpness", Journal of the Royal Statistical Society B 69 (2007): 243--268
Mark S. Handcock and Martina Morris, Relative Distribution Methods in the Social Sciences
Bruce E. Hansen
- "The Likelihood Ratio Test Under Nonstandard Conditions: Testing the Markov Switching Model of GNP", Journal of Applied Econometrics 7 (1992): S61--S82 [I very much like the approach of treating the likelihood ratio as an empirical process; why haven't I seen it before? (Also, the state-of-the-art in simulating Gaussian processes must be much better now than what Hansen had in '92, which would make this even more practical.) PDF reprint.]
- "Inference when a nuisance parameter is not identified under the null hypothesis", Econometrica 64 (1996): 413--430
Jeffrey D. Hart, Nonparametric Smoothing and Lack-of-Fit Tests
Christopher C. Heyde, Quasi-Likelihood and Its Applications: A General Approach to Optimal Parameter Estimation
Kieran Healy, Data Visualization: A Practical Introduction
Nils Lid Hjort and David Pollard, "Asymptotics for minimisers of convex processes", arxiv:1107.3806 [Very elegant]
Peter J. Huber
- "On the Non-Optimality of Optimal Procedures"
- "The behavior of maximum likelihood estimates under nonstandard conditions", Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1 (Univ. of Calif. Press, 1967), pp. 221--233
Wilbert C. M. Kallenberg and Teresa Ledwina, "Data-driven smooth tests when the hypothesis is composite", Journal of the American Statistical Association 92 (1997): 1094--1104 [Abstract, PDF reprint; JSTOR]
Gary King, A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data
Gary King and Margaret Roberts, "How Robust Standard Errors Expose Methodological Problems They Do Not Fix" [PDF preprint]
Evelyn M. Kitagawa, "Components of a Difference Between Two Rates", Journal of the American Statistical Association 50 (1955): 1168--1194
Solomon W. Kullback, Information Theory and Statistics
Michael Lavine and Mark J. Schervish, "Bayes Factors: What They Are and What They Are Not" [PS preprint]
Steffen Lauritzen, Extremal Families and Systems of Sufficient Statistics [See comments under sufficient statistics]
J. F. Lawless and Marc Fredette, "Frequentist prediction intervals and predictive distributions", Biometrika 92 (2005): 529--542 ["Frequentist predictive distributions are defined as confidence distributions .... A simple pivotal-based approach that produces prediction intervals and predictive distributions with well-calibrated frequentist probability interpretations is introduced, and efficient simulation methods for producing predictive distributions are considered. Properties related to an average Kullback-Leibler measure of goodness for predictive or estimated distributions are given."]
Lucien Le Cam
- "Neyman and Stochastic Models" [PDF. Some vignettes of Neyman putting together models, and his model-building process.]
- "Maximum Likelihood; An Introduction" [PDF. Not an introduction, but rather a collection of examples of where it just does not work, or at least doesn't work well. That this is presented as "an introduction" is entirely characteristic of the author.]
Erich L. Lehmann, "On likelihood ratio tests", math.ST/0610835
Bing Li, "A minimax approach to consistency and efficiency for estimating equations," Annals of Statistics 24 (1996): 1283--1297
Bruce Lindsay and Liawei Liu, "Model Assessment Tools for a Model False World", Statistical Science 24 (2009): 303--318, arxiv:1010.0304 [Their model-adequacy index is, essentially, the number of samples needed to detect the falsity of the model with some reasonable, pre-set level of power, with fixed size/significance level. This is a very natural quantity. In fact, by results which go back to Kullback's book, the power grows exponentially, with a rate equal to the Kullback-Leibler divergence rate. (More exactly, one minus the power goes to zero exponentially at that rate, but you know what I meant.) Large deviations theory includes generalizations of this result. Many statisticians, I'd guess, would prefer the Lindsay-Liu index because they will find it more natural to gauge error in terms of a sample size rather than bits, but to each their own.]
Brad Luen and Philip B. Stark, "Testing earthquake predictions", pp. 302--315 in Deborah Nolan and Terry Speed (eds.), Probability and Statistics: Essays in Honor of David A. Freedman [The issues arise, however, not just for earthquakes but for all sorts of clustered events.]
Charles Manski, Identification for Prediction and Decision
Deborah G. Mayo and D. R. Cox, "Frequentist statistics as a theory of inductive inference", math.ST/0610846
Karthika Mohan, Judea Pearl and Jin Tian, "Graphical Models for Inference with Missing Data", NIPS 2013 [There was at least one preprint version with the more pointed title "Missing Data as a Causal Inference Problem"]
M. B. Nevel'son and R. Z. Has'minskii, Stochastic Approximation and Recursive Estimation
Andrey Novikov, "Optimal sequential multiple hypothesis tests", arxiv:0811.1297
David Pollard
- "Asymptotics via Empirical Processes", Statistical Science 4 (1989): 341--354
- Empirical Processes: Theory and Applications
Jeffrey S. Racine, "Nonparametric Econometrics: A Primer", Foundations and Trends in Econometrics 3 (2008): 1--88 [Good primer of nonparametric techniques for regression, density estimation and hypothesis testing; next to no economic content (except for examples). Presumes reasonable familiarity with parametric statistics. PDF reprint]
J. N. K. Rao, "Some recent advances in model-based small area estimation", Survey Methodology 25 (1999): 175--186
James M. Robins and Ya'acov Ritov, "Toward a Curse of Dimensionality Appropriate (CODA) Asymptotic Theory for Semi-Parametric Models", Statistics in Medicine 16 (1997): 285--319 [PDF reprint via Prof. Robins]
James M. Robins, Aad van der Vaart and Valérie Ventura, "Asymptotic Distribution of P Values in Composite Null Models", Journal of the American Statistical Association 95 (2000): 1143--1156 [JSTOR. Paired article with Bayarri and Berger, above. The discussions and rejoinders (pp. 1157--1172) are valuable.]
George G. Roussas, Contiguity of Probability Measures: Some Applications in Statistics
C. Scott and R. Nowak, "A Neyman-Pearson Approach to Statistical Learning", IEEE Transactions on Information Theory 51 (2005): 3806--3819 [Comments: Learning Your Way to Maximum Power]
Steven G. Self and Kung-Yee Liang, "Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests Under Nonstandard Conditions", Journal of the American Statistical Association 82 (1987): 605--610 [JSTOR]
Tom Shively, Stephen Walker, "On the Equivalence between Bayesian and Classical Hypothesis Testing", arxiv:1312.0302
Jeffrey S. Simonoff, Smoothing Methods in Statistics
Spyros Skouras, "Decisionmetrics: Towards a Decision-Based Approach to Econometrics," SFI Working Paper 2001-11-064 [Applies far outside econometrics. If what you really want to do is to minimize a known loss function, optimizing a conventional accuracy measure, e.g. least squares, can be highly counterproductive.]
Aris Spanos
- "The Curve-Fitting Problem, Akaike-type Model Selection, and the Error Statistical Approach" [Or: could your model selection tell you that Kepler is better than Ptolemy? Technical report, economics dept., Virginia Tech, 2006. PDF]
- "Where do statistical models come from? Revisiting the problem of specification", math.ST/0610849
Yun Ju Sung, Charles J. Geyer, "Monte Carlo likelihood inference for missing data models", Annals of Statistics 35 (2007): 990--1011, arxiv:0708.2184
Alexandre B. Tsybakov, Introduction to Nonparametric Estimation
Sara van de Geer, Empirical Process Theory in M-Estimation
Quang H. Vuong, "Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses", Econometrica 57 (1989): 307--333
Grace Wahba, Spline Models for Observational Data
Michael E. Wall, Andreas Rechtsteiner and Luis M. Rocha, "Singular Value Decomposition and Principal Component Analysis," physics/0208101
Michael D. Ward, Brian D. Greenhill and Kristin M. Bakke, "The perils of policy by p-value: Predicting civil conflicts", Journal of Peace Research 47 (2010): 363--375
Larry Wasserman, "Low Assumptions, High Dimensions", RMM 2 (2011): 201--209
Halbert White, Estimation, Inference and Specification Analysis
Achilleas Zapranis and Apostolos-Paul Refenes, Principles of Neural Model Identification, Selection and Adequacy, with Applications to Financial Econometrics
Sven Zenker, Jonathan Rubin, Gilles Clermont, "From Inverse Problems in Mathematical Physiology to Quantitative Differential Diagnoses", PLoS Computational Biology 3 (2007): e205
Johanna F. Ziegel and Tilmann Gneiting, "Copula Calibration", arxiv:1307.7650
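The large-deviations claim in the Lindsay and Liu annotation can be checked numerically in the simplest possible case. What follows is a minimal sketch, assuming i.i.d. Bernoulli data, a (false) model p0 = 0.5, a truth p1 = 0.3 and size α = 0.05, all chosen purely for illustration. The helper functions are mine, not from the paper. The sketch uses the Neyman-Pearson test of the model against the truth, which rejects when the number of successes is small. Stein's lemma says that, at fixed size, the type II error β_n (one minus the power) behaves like exp(-n D(p0 || p1)), where D is the Kullback-Leibler divergence of the null from the alternative. So -log(β_n)/n should approach D, slowly, because of corrections of order 1/√n.

```python
import math

def log_binom_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), kept in log space to avoid underflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def logsumexp(xs):
    """log(sum(exp(x) for x in xs)), computed stably; xs must contain a finite value."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(Bernoulli(p) || Bernoulli(q)), in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def log_type2_error(n, p0, p1, alpha):
    """log(beta_n) for the test of H0: p = p0 that rejects when X <= c, with c the
    largest cutoff keeping the size at or below alpha. For p1 < p0 this is the
    Neyman-Pearson most powerful test at its own (exact) size."""
    c, log_size = -1, -math.inf
    for k in range(n + 1):
        candidate = logsumexp([log_size, log_binom_pmf(k, n, p0)])
        if candidate > math.log(alpha):
            break
        c, log_size = k, candidate
    if c >= n:  # the test always rejects, so it never wrongly accepts p0
        return -math.inf
    # beta_n = P1(X > c): the chance of accepting the (false) model p0 when p1 holds.
    return logsumexp([log_binom_pmf(k, n, p1) for k in range(c + 1, n + 1)])

p0, p1, alpha = 0.5, 0.3, 0.05   # illustrative "model", "truth" and size
D = kl_bernoulli(p0, p1)
for n in (100, 1000, 10000):
    rate = -log_type2_error(n, p0, p1, alpha) / n
    print(f"n = {n:>5}   -log(beta_n)/n = {rate:.4f}   D(p0 || p1) = {D:.4f}")
```

To first order, then, the sample size needed to push β_n below a target β is about log(1/β)/D. This is one way to see that a sample-size index like Lindsay and Liu's and a divergence measured in bits (here, nats) carry much the same information.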
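Besag's candidate's formula, mentioned above, is a rearrangement of Bayes's rule. Write y for the data so far and z for a new data point, and assume z is conditionally independent of y given θ. Then, for any fixed parameter value θ, p(z | y) = p(z | θ) p(θ | y) / p(θ | y, z). The left side does not involve θ, so the right side must come out the same whichever θ is plugged in. The quick numerical check below assumes a conjugate Beta(a, b) prior for a Bernoulli probability. That choice is mine, made so that the exact predictive probability (a + s)/(a + b + n) is available for comparison; the prior and data values are arbitrary.

```python
import math

def beta_logpdf(theta, a, b):
    """Log density of the Beta(a, b) distribution at theta in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(theta) + (b - 1) * math.log1p(-theta))

def candidate_predictive(z, theta, a, b, s, n):
    """P(z | data) from the candidate's formula, evaluated at an arbitrary theta.

    Prior Beta(a, b); data are s successes in n Bernoulli trials; z is 0 or 1.
    """
    log_lik_new = math.log(theta) if z == 1 else math.log1p(-theta)      # p(z | theta)
    log_post = beta_logpdf(theta, a + s, b + n - s)                      # p(theta | y)
    log_post_new = beta_logpdf(theta, a + s + z, b + n - s + (1 - z))    # p(theta | y, z)
    return math.exp(log_lik_new + log_post - log_post_new)

a, b, s, n = 2.0, 3.0, 7, 10
exact = (a + s) / (a + b + n)   # conjugate Beta-Bernoulli predictive P(z = 1 | y)
for theta in (0.1, 0.5, 0.9):
    print(theta, candidate_predictive(1, theta, a, b, s, n), exact)
```

Every θ gives the same value as the closed form, up to floating-point rounding.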
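Skouras's point, and the theme of Gneiting's paper on point forecasts, is that the best point forecast depends on the loss function. A standard illustration (mine, not taken from either paper) is the asymmetric linear, or "pinball", loss at level τ, where under-forecasting costs τ per unit and over-forecasting costs 1 - τ. Under that loss, expected loss is minimized by the τ-quantile of the outcome's distribution, not by the mean that least squares aims at. The sketch below assumes a skewed (exponential) outcome and τ = 0.9; both choices, and the sample size, are arbitrary.

```python
import random

def pinball_loss(forecast, y, tau):
    """Asymmetric linear loss: tau per unit of under-forecast, (1 - tau) per unit of over-forecast."""
    diff = y - forecast
    return tau * diff if diff >= 0 else (tau - 1) * diff

def average_loss(forecast, ys, tau):
    """Mean pinball loss of a single constant forecast over the outcomes ys."""
    return sum(pinball_loss(forecast, y, tau) for y in ys) / len(ys)

random.seed(1)
ys = [random.expovariate(1.0) for _ in range(20000)]   # skewed outcomes, mean 1
tau = 0.9

mean_forecast = sum(ys) / len(ys)                    # the least-squares optimum
quantile_forecast = sorted(ys)[int(tau * len(ys))]   # an empirical tau-quantile

print("average loss at the mean:        ", average_loss(mean_forecast, ys, tau))
print("average loss at the 0.9-quantile:", average_loss(quantile_forecast, ys, tau))
```

The quantile forecast has lower average loss. A forecaster who fit by least squares and reported the mean would be optimizing the wrong criterion for this loss.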

Trygve Haavelmo, "The Probability Approach in Econometrics", Econometrica 12 (1944, supplement): iii--115 [JSTOR]
Jerzy Neyman, "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection", Journal of the Royal Statistical Society 97 (1934): 558--625 [This is an astonishing paper on multiple levels. One is the thoroughness with which it achieves its main objective, of demonstrating the superiority of random sampling over alternatives. Another is that it seems to be the first conscious use of confidence intervals. Yet another is the way it set the pattern for a huge fraction of all subsequent statistics down to the present.]
Henry Scheffe, "Statistical Inference in the Non-Parametric Case", Annals of Mathematical Statistics 14 (1943): 305--332 [Recommended not as a historical study, but as a historical document]
Abraham Wald, "Estimation of a Parameter When the Number of Unknown Parameters Increases Indefinitely with the Number of Observations", Annals of Mathematical Statistics 19 (1948): 220--227

Nicola Giocoli, "From Wald to Savage: homo economicus becomes a Bayesian statistician" [preprint]
Richard William Farebrother, Fitting Linear Relationships: A History of the Calculus of Observations 1750--1900
Erich L. Lehmann, "On the history and use of some standard statistical models", pp. 114--126 in Deborah Nolan and Terry Speed (eds.), Probability and Statistics: Essays in Honor of David A. Freedman
Stephen M. Stigler, "The Epic Story of Maximum Likelihood", Statistical Science 22 (2007): 598--620, arxiv:0804.2996
Aad van der Vaart, "The Statistical Work of Lucien Le Cam", Annals of Statistics 30 (2002): 631--682

Andrew Gelman and CRS, "Philosophy and the practice of Bayesian statistics", submitted to the Journal of the American Statistical Association, arxiv:1006.3868
CRS
- Advanced Data Analysis from an Elementary Point of View
- The Truth About Linear Regression

Peter J. Diggle and Amanda G. Chetwynd, Statistics and Scientific Method: An Introduction for Students and Researchers [A missed opportunity.]
Peter McCullagh, "What is a statistical model?", Annals of Statistics 30 (2002): 1225--1310 [I'm not sure what to think about this; some of the ideas about requiring invariance (or equivariance) under transformations make sense, but I don't know that they lead to anything positive, or need such arcane category-theoretic expression. We should however have cited this in our paper on projectibility and consistency under sampling. (I blame our referees for not making the connection.) --- The discussion and rejoinder are worth reading. Kalman's contribution is very special.]

R. Harald Baayen, Analyzing Linguistic Data: A Practical Introduction to Statistics Using R
Vic Barnett, Comparative Statistical Inference
Anirban DasGupta, Asymptotic Theory of Statistics and Probability
George Estabrook, Computational Approach to Statistical Arguments in Ecology and Evolution
William W. Hsieh, Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels
Alan Julian Izenman, Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning
Erich L. Lehmann, Elements of Large-Sample Theory
Yudi Pawitan, In All Likelihood: Statistical Modeling and Inference Using Likelihood
Russell A. Poldrack, Statistical Thinking: Analyzing Data in an Uncertain World
Aris Spanos, Probability Theory and Statistical Inference: Econometric Modeling with Observational Data

Odd O. Aalen, Per Kragh Andersen, Ørnulf Borgan, Richard D. Gill, Niels Keiding, "History of applications of martingales in survival analysis", Electronic Journal for History of Probability and Statistics 5 (2009), arxiv:1003.0188
Carolina Armenteros, "From Human Nature to Normal Humanity: Joseph de Maistre, Rousseau, and the Origins of Moral Statistics", Journal of the History of Ideas 68 (2007): 107--130 [Abstract, text links]
Dan Bouk, How Our Days Became Numbered: Risk and the Rise of the Statistical Individual
Phaedra Daipha, Masters of Uncertainty: Weather Forecasters and the Quest for Ground Truth
Michael Friendly and Howard Wainer, A History of Data Visualization and Graphic Communication
Ian Hacking, "The Theory of Probable Inference: Neyman, Peirce and Braithwaite," in Science, Belief and Behavior: Essays in Honor of R. B. Braithwaite ed. D. H. Mellor
Anders Hald, A History of Parametric Statistical Inference from Bernoulli to Fisher, 1713--1935
Orit Halpern, Beautiful Data: A History of Vision and Reason Since 1945
Kendall and Plackett (eds.), Studies in the History of Statistics and Probability
Kyburg, Uncertain Inference
Erich Lehmann and Juliet Shaffer, Fisher, Neyman, and the Creation of Classical Statistics
Mayo and Hollander (eds.), Acceptable Evidence: Science and Values in Risk Management
Leland Gerson Neuberg, Conceptual Anomalies in Economics and Statistics: Lessons from the Social Experiment
Theodore Porter, Trust in Numbers
Nico Randeraad, States and Statistics in the Nineteenth Century: Europe by Numbers
Thomas A. Stapleford, The Cost of Living in America: A Political History of Economic Statistics, 1880--2000
Stephen M. Stigler
- The History of Statistics: The Measure of Uncertainty before 1900
- Statistics on the Table: The History of Statistical Concepts and Methods
Gregory D. Wilson, Articulation Theory and Disciplinary Change: Unpacking the Bayesian-Frequentist Paradigm Conflict in Statistical Science [Ph.D. thesis, New Mexico State University, 2001]
S. L. Zabell, Symmetry and Its Discontents: Essays on the History of Inductive Probability

W. J. Dixon and W. L. Nicholson (eds.), Exploring Data Analysis: The Computer Revolution in Statistics (1974)
A Selection of Early Statistical Papers of J. Neyman
J. Neyman and E. S. Pearson, Joint Statistical Papers

Felix Abramovich, Yoav Benjamini, David L. Donoho and Iain M. Johnstone, "Adapting to Unknown Sparsity by controlling the False Discovery Rate", math.ST/0505374 [I don't really care about sparsity, but they promise novel relations between the FDR control and asymptotic minimaxity and complexity-penalized model selection.]
Elizabeth S. Allman, Catherine Matias, John A. Rhodes, "Identifiability of parameters in latent structure models with many observed variables", Annals of Statistics 37 (2009): 3099--3132, arxiv:0809.5032
Nabil I. Al-Najjar, Alvaro Sandroni, Rann Smorodinsky and Jonathan Weinstein, "Testing Theories with Learnable and Predictive Representations", Journal of Economic Theory 145 (2010): 2203--2217
Barry C. Arnold et al., Conditional Specification of Statistical Models
R. A. Bailey, Design of Comparative Experiments
Sivaraman Balakrishnan, Edward H. Kennedy, Larry Wasserman, "The Fundamental Limits of Structure-Agnostic Functional Estimation", arxiv:2305.04116
Sivaraman Balakrishnan, Alessandro Rinaldo, Don Sheehy, Aarti Singh, Larry Wasserman, "Minimax Rates for Homology Inference", arxiv:1112.5627
Roger Barlow, "Asymmetric Errors", physics/0401042
Ole E. Barndorff-Nielsen and David R. Cox, "Prediction and Asymptotics", Bernoulli 2 (1996): 319--340
Ole E. Barndorff-Nielsen, David R. Cox and Claudia Klüppelberg (eds.), Complex Stochastic Systems
M. J. Bayarri and M. E. Castellanos, "Bayesian Checking of the Second Levels of Hierarchical Models", Statistical Science 22 (2007): 322--343
Zvika Ben-Haim and Yonina C. Eldar, "The Cramer-Rao Bound for Sparse Estimation", arxiv:0905.4378
David R. Bickel
- "The Strength of Statistical Evidence for Composite Hypotheses: Inference to the Best Explanation" [Preprint]
- "Resolving conflicts between statistical methods by probability combination: Application to empirical Bayes analyses of genomic data", arxiv:1111.6174
- "A prior-free framework of coherent inference and its derivation of simple shrinkage estimators" [preprint]
Peter J. Bickel, C. A. J. Klaassen, Y. Ritov and J. A. Wellner, Efficient and Adaptive Estimation for Semiparametric Models
Peter J. Bickel and Bo Li, "Regularization in Statistics", Test 15 (2006): 271--344 [PDF reprint]
Peter J. Bickel and Y. Ritov, "Non-Parametric Estimators Which Can Be 'Plugged-In'", UCB Stat. Tech. Rep. 602 [abstract, pdf]
Lucien Birgé, "About the non-asymptotic behaviour of Bayes estimators", arxiv:1402.3695
Gilles Blanchard, Sylvain Delattre, Etienne Roquain, "Testing over a continuum of null hypotheses", arxiv:1110.3599
Ingwer Borg and Patrick J. F. Groenen, Modern Multidimensional Scaling: Theory and Application
A. R. Brazzale and A. C. Davison, "Accurate Parametric Inference for Small Samples", Statistical Science 23 (2008): 465--484 [Apparently, a preview for the book.]
A. R. Brazzale, A. C. Davison and N. Reid, Applied Asymptotics: Case Studies in Small-Sample Statistics
Trevor S. Breusch, "Hypothesis Testing in Unidentified Models", Review of Economic Studies 53 (1986): 635--651 [JSTOR]
Florentina Bunea, Alexandre B. Tsybakov, Marten H. Wegkamp, Adrian Barbu, "Spades and Mixture Models", Annals of Statistics 38 (2010): 2525--2558, arxiv:0901.2044
Dizza Bursztyn and David M. Steinberg, "Comparison of designs for computer experiments", Journal of Statistical Planning and Inference 136 (2006): 1103--1119
Emmanuel Candes and Terence Tao, "Near Optimal Signal Recovery from Random Projections and Universal Encoding Strategies", math.CA/0410542
Hervé Cardot, David Degras, Etienne Josserand, "Confidence bands for Horvitz-Thompson estimators using sampled noisy functional data", Bernoulli 19 (2013): 2067--2097, arxiv:1105.2135
Hervé Cardot, Andre Mas and Pascal Sarda, "CLT in Functional Linear Regression Models", math.ST/0508073
Kamalika Chaudhuri and Daniel Hsu, "Convergence Rates for Differentially Private Statistical Estimation", arxiv:1206.6395
Djalil Chafai and Didier Concordet, "On the strong consistency of approximated M-estimators", math.ST/0507102
In Hong Chang and Rahul Mukerjee, "Asymptotic results on the frequentist mean squared error of generalized Bayes point predictors", Statistics and Probability Letters 67 (2004): 65--71 [Note to self: file this one under "de-Bayesing".]
Sandra Chapman, George Rowlands and Nicholas Watkins
- "Extremum statistics: A framework for data analysis," cond-mat/0106015
- "Extremum Statistics and Signatures of Long Range Correlations," cond-mat/0106015
- "The relationship between extremum statistics and universal fluctuations," cond-mat/0007275
Xiaohong Chen, Markus Reiss, "On rate optimality for ill-posed inverse problems in econometrics", arxiv:0709.2003 [Non-parametric instrumental variables?]
Xinjia Chen, "Sequential Tests of Statistical Hypotheses with Confidence Limits", arxiv:1007.4278
Russell Cheng, Non-Standard Parametric Statistical Inference
N. N. Chentsov, Statistical Decision Rules and Optimal Inference
Christine Choirat and Raffaello Seri, "Estimation in Discrete Parameter Models", Statistical Science 27 (2012): 278--293
Bertrand Clarke, "Desiderata for a Predictive Theory of Statistics", Bayesian Analysis 5 (2010): 1--36
Jon Cockayne, Matthew M. Graham, Chris J. Oates, T. J. Sullivan, Onur Teymur, "Testing whether a Learning Procedure is Calibrated", arxiv:2012.12670
John Copas and Shinto Eguchi, "Likelihood for statistically equivalent models", Journal of the Royal Statistical Society B 72 (2010): 193--217
Daniel Commenges, "Statistical models: Conventional, penalized and hierarchical likelihood", Statistics Surveys 3 (2009): 1--17, arxiv:0808.4042
Daniel Commenges, Helene Jacqmin-Gadda, Cecile Proust, and Jeremie Guedj, "A Newton-Like Algorithm for Likelihood Maximization: The Robust-Variance Scoring Algorithm", math.ST/0610402
D. R. Cox, "A return to an old paper: 'Tests of separate families of hypotheses'", Journal of the Royal Statistical Society B 75 (2013): 207--215
Cox and Wermuth, Multivariate Dependencies: Models, Analysis and Interpretation
Alexandre d'Aspremont, Onureena Banerjee, Laurent El Ghaoui, "First-order methods for sparse covariance selection", math.OC/0609812
I. Dattner, A. Goldenshluger, A. Juditsky, "On deconvolution of distribution functions", arxiv:1006.3918 ["nonparametric estimation of a continuous distribution function from observations with measurement errors... rate optimal estimators based on direct inversion of empirical characteristic function"]
P. L. Davies
- "Data Features", Statistica Neerlandica 49 (1995): 185--245
- "Approximating Data", Journal of the Korean Statistical Society 37 (2008): 191--211 [With discussion and rejoinder. Open access?]
A. Philip Dawid, Steven de Rooij, Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, Vladimir Vovk, "Martingales and p-values as measures of evidence", arxiv:0912.4269
Pierpaolo De Blasi and Stephen G. Walker, "Bayesian Estimation of the Discrepancy with Misspecified Parametric Models", Bayesian Analysis 8 (2013): 781--800
Aurore Delaigle, Peter Hall and Jiashun Jin, "Robustness and accuracy of methods for high dimensional data analysis based on Student's t-statistic", Journal of the Royal Statistical Society B forthcoming (2011)
Joshua V Dillon, Guy Lebanon, "Stochastic Composite Likelihood", Journal of Machine Learning Research 11 (2010): 2597--2633, apparently the final version of arxiv:1003.0691
David L. Donoho, "Estimation by epsilon-nets" (Le Cam Lecture, 2003; find citation)
David L. Donoho and Richard C. Liu, "The 'Automatic' Robustness of Minimum Distance Functionals", Annals of Statistics 16 (1988): 552--586
David L. Donoho and Jared Tanner, "Observed Universality of Phase Transitions in High-Dimensional Geometry, with Implications for Modern Data Analysis and Signal Processing", arxiv:0906.2530
Mathias Drton, "Likelihood ratio tests and singularities", Annals of Statistics 37 (2009): 979--1012, arxiv:math.ST/0703360
Mathias Drton and Seth Sullivant, "Algebraic statistical models", math.ST/0703609
Jin-Chuan Duan and Andras Fulop, "A stable estimator of the information matrix under EM for dependent data", Statistics and Computing 21 (2011): 83--91
John C. Duchi, Michael I. Jordan, Martin J. Wainwright, "Local Privacy and Statistical Minimax Rates", arxiv:1302.3203
Morris L. Eaton, Multivariate Statistics: A Vector Space Approach ["a version of multivariate statistical theory in which vector space and invariance methods replace, to a large extent, more traditional multivariate methods"]
Sam Efromovich
- "Distribution estimation for biased data", Journal of Statistical Planning and Inference 124 (2004): 1--43
- "Dimension Reduction and Adaptation in Conditional Density Estimation", Journal of the American Statistical Association 105 (2010): 761--774
- Nonparametric Curve Estimation
Bradley Efron and Trevor Hastie, Computer Age Statistical Inference: Algorithms, Evidence, and Data Science
Thibault Espinasse, Paul Rochet, "A Cramér-Rao inequality for non differentiable models", arxiv:1204.2763
Michael Evans and Gun Ho Jang, "Invariant P-values for model checking", Annals of Statistics 38 (2010): 512--525
Stefano Favaro, Antonio Lijoi, and Igor Prünster, "Asymptotics for a Bayesian nonparametric estimator of species variety", Bernoulli 18 (2012): 1267--1283
Thomas S. Ferguson, A Course in Large Sample Theory
Jean-David Fermanian and Bernard Salanié, "A Nonparametric Simulated Maximum Likelihood Estimation Method", Econometric Theory 20 (2004): 701--734
Ana K. Fermin and Carenne Ludena, "A Statistical view of Iterative Methods for Linear Inverse Problems", math.ST/0504064
Luisa Turrin Fernholz, von Mises Calculus for Statistical Functionals
S. E. Fienberg, P. Hersh, A. Rinaldo and Y. Zhou, "Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data", arxiv:0709.3535
D. A. S. Fraser, N. Reid, E. Marras and G. Y. Yi, "Default priors for Bayesian and frequentist inference", Journal of the Royal Statistical Society B 72 (2010): 631--654
A. Fraysse, "Why minimax is not that pessimistic", arxiv:0902.3311 [Because, apparently, learning a generic function is just as hard as minimax leads you to think. Bummer if true.]
Magalie Fromont and Béatrice Laurent, "Adaptive goodness-of-fit tests in a density model", Annals of Statistics 34 (2006): 680--720, math.ST/0607013
Surya Ganguli and Haim Sompolinsky, "Statistical Mechanics of Compressed Sensing", Physical Review Letters 104 (2010): 188701
Seymour Geisser, Predictive Inference
Josep Ginebra, "On the Measure of the Information in a Statistical Experiment", Bayesian Analysis 2 (2007): 167--212
Tilmann Gneiting and Roopesh Ranjan, "Combining predictive distributions", Electronic Journal of Statistics 7 (2013): 1747--1782
Yuri Golubev, Vladimir Spokoiny, "Exponential bounds for minimum contrast estimators", arxiv:0901.0655
Grassberger and Nadal (eds.), From Statistical Physics to Statistical Inference and Back
Ulf Grenander, Abstract Inference
Peter Guttorp, "Statistics and Climate", Annual Review of Statistics and Its Application 1 (2014): 87--101
Robert Hable, "Asymptotic Normality of Support Vector Machines for Classification and Regression", arxiv:1010.0535
Peter Hall, Hans-Georg Müller, Fang Yao, "Estimation of functional derivatives", Annals of Statistics 37 (2009): 3307--3329, arxiv:0909.1157
Marc Hallin, Davy Paindaveine, and Miroslav Siman, "Multivariate quantiles and multiple-output regression quantiles: From L1 optimization to halfspace depth", Annals of Statistics 38 (2010): 635--669
Bruce E. Hansen, "Interval Forecasts and Parameter Uncertainty", Journal of Econometrics 135 (2006): 377--398 [Preprint]
Wolfgang Härdle, Marlene Müller, Stefan Sperlich and Axel Werwatz, Nonparametric and Semiparametric Models: An Introduction
Matthew T. Harrison, "Valid p-Values using Importance Sampling", arxiv:104.2910
David F. Hendry and Jurgen A. Doornik, Empirical Model Discovery and Theory Evaluation: Automatic Selection Methods in Econometrics
David A. Hensher et al., Applied Choice Analysis: A Primer ["Application of quantitative statistical methods to study choices made by individuals"]
Tim Hesterberg, Nam Hee Choi, Lukas Meier, Chris Fraley, "Least angle and $\ell_1$ penalized regression: A review", Statistics Surveys 2 (2008): 61--93, arxiv:0802.0964
David Hinkley, "Predictive Likelihood", Annals of Statistics 7 (1979): 718--728
Peter D. Hoff, "A hierarchical eigenmodel for pooled covariance estimation", Journal of the Royal Statistical Society B 71 (2009): 971--992
Peter Hoff, Jon Wakefield, "Bayesian sandwich posteriors for pseudo-true parameters", arxiv:1211.0087
Torsten Hothorn, Thomas Kneib, Peter Bühlmann, "Conditional transformation models", Journal of the Royal Statistical Society B forthcoming
Joel L. Horowitz, Semiparametric and Nonparametric Methods in Econometrics
Serkan Hosten, Amit Khetan and Bernd Sturmfels, "Solving the Likelihood Equations", math.ST/0408270
Ping-Hung Hsieh, "A nonparametric assessment of model adequacy based on Kullback-Leibler divergence", Statistics and Computing 23 (2013): 149--162
Mia Hubert, Peter J. Rousseeuw, Stefan Van Aelst, "High-Breakdown Robust Multivariate Methods", Statistical Science 23 (2008): 92--119, arxiv:0808.0657
Eyke Hüllermeier, Willem Waegeman, "Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods", arxiv:1910.09457
Alexander Ilin, Tapani Raiko, "Practical Approaches to Principal Component Analysis in the Presence of Missing Values", Journal of Machine Learning Research 11 (2010): 1957--2000
Stefano M. Iacus and Davide La Torre
- "Approximating Distribution Functions by Iterated Function Systems," math.PR/0111152
- "Nonparametric estimation of distribution and density functions in presence of missing data: an IFS approach," math.PR/0302016
Jean Jacod and Michael Sorensen, "A review of asymptotic theory of estimating functions", Statistical Inference for Stochastic Processes 21 (2018): 415--434
Leah Jager, Jon A. Wellner, "Goodness-of-fit tests via phi-divergences", Annals of Statistics 35 (2007): 2018--2053, arxiv:math/0603238
Thomas Jaki and R. Webster West, "Maximum Kernel Likelihood Estimation", Journal of Computational and Graphical Statistics 17 (2008): 976--993
Michael Jansson and Demian Pouzo, "Towards a General Large Sample Theory for Regularized Estimators", arxiv:1712.07248
Jiantao Jiao, Kartik Venkat, Tsachy Weissman, "Maximum Likelihood Estimation of Functionals of Discrete Distributions", arxiv:1406.6959
Adam M. Johansen, Arnaud Doucet and Manuel Davy, "Particle methods for maximum likelihood estimation in latent variable models", Statistics and Computing 18 (2008): 47--57
Ana Justel, Daniel Pena, Ruben Zamar, "A multivariate Kolmogorov-Smirnov test of goodness of fit", Statistics and Probability Letters 35 (1997): 251--259 [PDF reprint via Prof. Pena]
Paul Kabaila and Kreshna Syuhada, "The Asymptotic Efficiency of Improved Prediction Intervals", arxiv:0901.1911
Ata Kaban, "Non-parametric detection of meaningless distances in high dimensional data", Statistics and Computing 22 (2011): 375--385
Oscar Kempthorne, "The classical problem of inference--goodness of fit", Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 235--249
Ioannis Kontoyiannis and S. P. Meyn, "Computable exponential bounds for screened estimation and simulation", Annals of Applied Probability 18 (2008): 1491--1518, arxiv:math/0612040
Jim Kuelbs and Anand N. Vidyashankar, "Asymptotic inference for high-dimensional data", Annals of Statistics 38 (2010): 836--869
Solomon Kullback, "Probability densities with given marginals," Annals of Mathematical Statistics 39 (1968): 1236--1243
Masayuki Kumon and Akimichi Takemura, "On a simple strategy weakly forcing the strong law of large numbers in the bounded forecasting game", math.PR/0508190 ["In the framework of the game-theoretic probability of Shafer and Vovk (2001) ... construct an explicit strategy weakly forcing the strong law of large numbers (SLLN) in the bounded forecasting game. ... simple finite-memory strategy based on the past average of Reality's moves, which weakly forces the strong law of large numbers with the convergence rate of $O(\sqrt{\log n/n})$.... We show that if Reality violates SLLN, then the exponential growth rate of Skeptic's capital process is explicitly described in terms of the Kullback divergence between the average of Reality's moves when she violates SLLN and the average when she observes SLLN."]
Tze Leung Lai, Shulamith T. Gross, and David Bo Shen, "Evaluating probability forecasts", Annals of Statistics 39 (2011): 2356--2382, arxiv:1202.5140
Mikhail Langovoy, "Data-driven goodness-of-fit tests", arxiv:0708.0169
Nicole A. Lazar, "A Review of Empirical Likelihood", Annual Review of Statistics and Its Application 8 (2021): 329--344
Guy Lebanon, The Analysis of Data
Youngjo Lee and John A. Nelder, "Likelihood Inference for Models with Unobservables: Another View", Statistical Science 24 (2009): 255--269, arxiv:1010.0303 [with discussion and replies following]
M. Lerasle, R. I. Oliveira, "Robust Empirical Mean Estimators", arxiv:1112.3914
Bo Li and Marc G. Genton, "Nonparametric Identification of Copula Structures", Journal of the American Statistical Association 108 (2013): 666--675
Feng Liang, Sayan Mukherjee, Mike West, "The Use of Unlabeled Data in Predictive Modeling", Statistical Science 22 (2007): 189--205, arxiv:0710.4618
Percy Liang, Francis Bach, Guillaume Bouchard and Michael I. Jordan, "Asymptotically Optimal Regularization in Smooth Parametric Models" [PDF preprint via Prof. Jordan]
Bruce G. Lindsay, Marianthi Markatou, Surajit Ray, Ke Yang, Shu-Chuan Chen, "Quadratic distances on probabilities: A unified foundation", Annals of Statistics 36 (2008): 983--1006, arxiv:0804.0991
Richard A. Lockhart, "Conditional limit laws for goodness-of-fit tests", Bernoulli 18 (2012): 857--882
Thomas Lumley, Complex Surveys: A Guide to Analysis Using R
Victor S. L'vov, Anna Pomyalov and Itamar Procaccia, "Outliers, Extreme Events and Multiscaling," nlin.CD/0009049
Christian K. Machens, "Adaptive sampling by information maximization," physics/0112070
Edouard Machery, "Power and Negative Results", Philosophy of Science 79 (2012): 808--820
Ryan Martin, Chuanhai Liu, "Inferential models: A framework for prior-free posterior probabilistic inference", arxiv:1206.4091
McCabe and Tremayne, Modern Asymptotic Theory
Peter McCullagh, Tensor Methods in Statistics
Mead et al., Statistical Principles for the Design of Experiments: Applications to Real Experiments
Alexander Meister, Deconvolution Problems in Nonparametric Statistics ["e.g., density estimation based on contaminated data, errors-in-variables regression, and image reconstruction"]
Vladimir N. Minin, John D. O'Brien, Arseni Seregin, "Empirically corrected estimation of complete-data population summaries under model misspecification", arxiv:0911.0930
David Mumford and Agnes Desolneux, Pattern Theory: The Stochastic Analysis of Real-World Signals
Jerzy Neyman and Elizabeth L. Scott, "Consistent Estimates Based on Partially Consistent Observations", Econometrica 16 (1948): 1--32 [What to do when some parameters only influence a finite number of observations, even as the data size grows to infinity...]
Richard Nickl, "Donsker-type theorems for nonparametric maximum likelihood estimators", Probability Theory and Related Fields 138 (2007): 411--449
Wojciech Olszewski, Alvaro Sandroni, "A nonmanipulable test", Annals of Statistics 37 (2009): 1013--1039, arxiv:0904.0338
Giulio Palombo, "Multivariate Goodness of Fit Procedures for Unbinned Data: An Annotated Bibliography", arxiv:1102.2407
Leandro Pardo, Statistical Inference Based on Divergence Measures
William Perkins, Mark Tygert, Rachel Ward, "Significance testing without truth", arxiv:1301.1208
Mario Peruggia, Jason Hsu, and Yifan Huang, "Cartesian displays of many interval estimates", Electronic Journal of Statistics 7 (2013): 91--104
Tomaz Podobnik and Tomi Zivko, "On Consistent and Calibrated Inference about the Parameters of Sampling Distributions", physics/0508017
Thorsten Poeschel, Werner Ebeling, and Helge Rose, "Guessing probability distributions from small samples", Journal of Statistical Physics 80 (1995): 1443, cond-mat/0203467
Dimitris N. Politis, Model-Free Prediction and Regression: A Transformation-Based Approach to Inference
David Pollard, "Some thoughts on Le Cam's statistical decision theory", arxiv:1107.3811
Joel Predd, Robert Seiringer, Elliott H. Lieb, Daniel Osherson, Vincent Poor, Sanjeev Kulkarni, "Probabilistic coherence and proper scoring rules", IEEE Transactions on Information Theory 55 (2009): 4786, arxiv:0710.3183
Ramsay and Silverman, Functional Data Analysis
C. Radhakrishna Rao, "Diversity: Its measurement, decomposition, apportionment and analysis", Sankhya: The Indian Journal of Statistics 44(A) (1982): 1--22 [Sankhya is not in JSTOR! Why is Sankhya not in JSTOR?!?!]
R.-D. Reiss and M. Thomas, Statistical Analysis of Extreme Values: With Applications to Insurance, Finance, Hydrology and Other Fields
Irina Rish, Sparse Modeling: Theory, Algorithms, and Applications
Irina Rish et al. (eds.), Practical Applications of Sparse Modeling
James Robins, Lingling Li, Eric Tchetgen, Aad van der Vaart, "Higher order influence functions and minimax estimation of nonlinear functionals", arxiv:0805.3040
Sylvain Rubenthaler, Tobias Ryden and Magnus Wiktorsson, "Fast simulated annealing in $\mathbb{R}^d$ and an application to maximum likelihood estimation", math.PR/0609353
Birgit Rudloff, Ioannis Karatzas, "Testing composite hypotheses via convex duality", Bernoulli 16 (2010): 1224--1239, arxiv:0809.4297
Emilio Seijo and Bodhisattva Sen, "A continuous mapping theorem for the smallest argmax functional", Electronic Journal of Statistics 5 (2011): 421--439
Haochang Shou, Russell T. Shinohara, Han Liu, Daniel Reich, and Ciprian Crainiceanu, "Soft Null Hypotheses: A Case Study of Image Enhancement Detection in Brain Lesions", Johns Hopkins University, Dept. of Biostatistics Working Paper 257
Karline Soetaert, Thomas Petzoldt, "Inverse Modelling, Sensitivity and Monte Carlo Analysis in R Using Package FME", Journal of Statistical Software 33 (2010): 3
Jascha Sohl-Dickstein, "The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use", arxiv:1205.1828
Jascha Sohl-Dickstein, Peter Battaglino, Michael R. DeWeese, "Minimum Probability Flow Learning", arxiv:0906.4779
Christopher G. Small, The Statistical Theory of Shape
Aris Spanos
- "Revisiting the Omitted Variables Argument: Substantive vs. Statistical Adequacy" [PDF preprint]
- "Is Frequentist Testing Vulnerable to the Base-Rate Fallacy?", Philosophy of Science 77 (2010): 565--583
Vladimir Spokoiny
- "A penalized exponential risk bound in parametric estimation", arxiv:0903.1721
- "Parametric estimation. Finite sample theory", Annals of Statistics 40 (2012): 2877--2909, arxiv:1111.3029
Pablo Sprechmann, Ignacio Ramírez, Guillermo Sapiro, Yonina Eldar, "C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework", arxiv:1006.1346
Trevor J. Sweeting, "Parameter-Based Asymptotics", Biometrika 79 (1992): 219--230 [JSTOR]
Olivier Thas, Comparing Distributions [mostly about goodness-of-fit tests]
Samuel Vaiter, Mohammad Golbabaee, Jalal Fadili, Gabriel Peyré, "Model Selection with Piecewise Regular Gauges", arxiv:1307.2342
Mark J. van der Laan and Sherri Rose, Targeted Learning: Causal Inference for Observational and Experimental Data
Aki Vehtari and Janne Ojanen, "A survey of Bayesian predictive methods for model assessment, selection and comparison", Statistics Surveys 6 (2012): 142--228
Xiaogang Wang and James V. Zidek, "Selecting likelihood weights by cross-validation", Annals of Statistics 33 (2005): 463--500, math.ST/0505599
Fabian L. Wauthier, Michael I. Jordan, "Heavy-Tailed Processes for Selective Shrinkage", arxiv:1006.3901
Holger Wendland, Scattered Data Approximation
Halbert White
- Asymptotic Theory for Econometricians [Useful source, it seems, for non-IID central limit theorems]
- "A Reality Check for Data Snooping", Econometrica 68 (2000): 1097--1126
Christopher K. I. Williams, "How to Pretend That Correlated Variables Are Independent by Using Difference Observations", Neural Computation 17 (2005): 1--6

Statistics of Inequality and Discrimination
The Half-True Essentials of Asymptotic Statistics: Materials for a First-and-Last Course in Statistical Theory