Bootstrapping, and Other Resampling Methods

Last update: 21 Apr 2025 21:17
First version: 23 May 2010

Bootstrapping is a way of figuring out the properties of statistical estimators (and other procedures, like hypothesis tests) by simulation. What we would really like to know is how different our answers could have been, if we re-ran our experiment. We can't actually do this, but we can fit a model to our data and simulate from it, and see what answers we'd get from the simulations. We can even do this from exceedingly general non-parametric estimates, like re-sampling the original data. This is a brilliant idea, and my default way of handling the uncertainty of estimation in complex models or with complex systems. But having written 3500 words on this for a magazine, plus a textbook chapter, I feel absolutely no inclination to explain myself further.
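For concreteness, here is a minimal sketch of the re-sampling version of the idea, using only the Python standard library. The function names, the simulated Gaussian data, and the number of replicates are all illustrative assumptions rather than anything taken from the references below: re-sample the data with replacement, re-compute the estimator on each re-sample, and treat the spread of the re-computed values as the estimator's uncertainty.

```python
import random
import statistics


def bootstrap_replicates(data, estimator, n_boot=2000, seed=0):
    """Re-compute `estimator` on n_boot resamples of `data`, each drawn
    with replacement and of the same size as the original data."""
    rng = random.Random(seed)
    n = len(data)
    return [estimator(rng.choices(data, k=n)) for _ in range(n_boot)]


def bootstrap_std_error(data, estimator, n_boot=2000, seed=0):
    """Standard deviation of the bootstrap replicates, used as an
    estimate of the estimator's standard error."""
    return statistics.stdev(bootstrap_replicates(data, estimator, n_boot, seed))


# Made-up illustrative data: 200 draws from a standard Gaussian.
_rng = random.Random(42)
sample = [_rng.gauss(0.0, 1.0) for _ in range(200)]

# The same code handles an estimator with a textbook standard-error
# formula (the mean) and one without a convenient formula (the median).
se_mean = bootstrap_std_error(sample, statistics.fmean)
se_median = bootstrap_std_error(sample, statistics.median)
```

For the mean, `se_mean` should land close to the usual sample standard deviation divided by the square root of the sample size, which is a handy sanity check; the point of the method is that nothing changes in the code when the estimator is something less tractable.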

I am most interested in resampling techniques for dependent data, and would be ecstatic if I could figure out a non-parametric bootstrap for networks. (Update: see my paper with Alden Green; the ecstasy was real but inevitably fleeting.) --- Presumably universal prediction algorithms could be used for this purpose?
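As a rough illustration of why dependent data need their own resampling schemes, the sketch below applies a moving block bootstrap to a simulated time series: instead of re-sampling individual observations, it re-samples contiguous, overlapping blocks and glues them together, so short-range dependence survives inside each block. Everything specific here is an assumption made for the example (the AR(1) series with coefficient 0.8, the block length of 25, the function names); choosing the block length well is a real problem that this sketch ignores.

```python
import random
import statistics


def moving_block_resample(series, block_len, rng):
    """Build one pseudo-series of the original length by concatenating
    randomly chosen overlapping blocks of `block_len` consecutive values."""
    n = len(series)
    n_starts = n - block_len + 1  # number of admissible block start points
    out = []
    while len(out) < n:
        start = rng.randrange(n_starts)
        out.extend(series[start:start + block_len])
    return out[:n]  # trim the last block if it overshoots


def block_bootstrap_std_error(series, estimator, block_len, n_boot=500, seed=0):
    """Standard deviation of the estimator across block-bootstrap pseudo-series."""
    rng = random.Random(seed)
    reps = [estimator(moving_block_resample(series, block_len, rng))
            for _ in range(n_boot)]
    return statistics.stdev(reps)


# Made-up AR(1) series with strong positive dependence (coefficient 0.8).
_rng = random.Random(1)
x, series = 0.0, []
for _ in range(1000):
    x = 0.8 * x + _rng.gauss(0.0, 1.0)
    series.append(x)

# Blocks of length 25 keep much of the dependence; blocks of length 1
# reduce to resampling the observations as though they were independent.
se_block = block_bootstrap_std_error(series, statistics.fmean, block_len=25)
se_iid = block_bootstrap_std_error(series, statistics.fmean, block_len=1)
```

With positive dependence like this, `se_iid` should come out noticeably smaller than `se_block`: treating the observations as independent understates the uncertainty in the sample mean.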

A. C. Davison and D. V. Hinkley, Bootstrap Methods and their Application
Bradley Efron
- "Bootstrap Methods: Another Look at the Jackknife", Annals of Statistics 7 (1979): 1--26 [The original paper; staggeringly understandable]
- The Bootstrap, the Jackknife, and Other Resampling Plans [1982 notes volume]

Peter Bühlmann
- "Bootstraps for Time Series", Statistical Science 17 (2002): 52--72
- "Sieve Bootstrap with Variable Length Markov Chains for Stationary Categorical Time Series", Journal of the American Statistical Association 97 (2002): 443--456 [PDF preprint]
Paul Doukhan, Silika Prohl, and Christian Y. Robert, "Subsampling weakly dependent time series and application to extremes", arxiv:1009.0805 [Thanks to Dr. Prohl for a pre-pre-print]
A. C. Field and A. H. Welsh, "Bootstrapping clustered data", Journal of the Royal Statistical Society B 69 (2007): 369--390
Silvia Goncalves and Halbert White, "Maximum likelihood and the bootstrap for nonlinear dynamic models", Journal of Econometrics 119 (2004): 199--219
Peter Hall, "On Bootstrap Confidence Intervals in Nonparametric Regression", Annals of Statistics 20 (1992): 695--711
Peter Hall and Joel Horowitz, "A simple bootstrap method for constructing nonparametric confidence bands for functions", Annals of Statistics 41 (2013): 1892--1921, arxiv:1309.4864
Peter Hall and Tapabrata Maiti, "On parametric bootstrap methods for small area prediction", Journal of the Royal Statistical Society B 68 (2006): 221--238
Tim Hesterberg, "What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum", arxiv:1411.5279
Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar, Michael I. Jordan
- "A Scalable Bootstrap for Massive Data", arxiv:1112.5016
- "The Big Data Bootstrap", arxiv:1206.6415
Hans R. Künsch, "The Jackknife and the Bootstrap for General Stationary Observations", Annals of Statistics 17 (1989): 1217--1241
S. N. Lahiri, Resampling Methods for Dependent Data [Mini-review]
Elizaveta Levina and Peter J. Bickel, "Texture synthesis and nonparametric resampling of random fields", Annals of Statistics 34 (2006): 1751--1773
Peter McCullagh, "Resampling and exchangeable arrays", Bernoulli 6 (2000): 285--301 [Well, half-recommended. Everything he says here is right, but I think one could construct an exactly parallel argument to show that resampling could not get correct standard errors for the mean of a stationary sequence; and of course it can't if you insist on resampling dependent data as though it were independent. See the papers by Owen and by Owen and Eckles.]
Art B. Owen, "The pigeonhole bootstrap", Annals of Applied Statistics 1 (2007): 386--411
Art B. Owen and Dean G. Eckles, "Bootstrapping data arrays of arbitrary order", Annals of Applied Statistics 6 (2012): 895--927, arxiv:1106.2125
Xianyang Zhang and Guang Cheng, "Bootstrapping High Dimensional Time Series", arxiv:1406.1037

William D. Fahy, CRS and Ryan Christopher Sullivan, "A universally applicable method of calculating confidence bands for ice nucleation spectra derived from droplet freezing experiments", Atmospheric Measurement Techniques (under review, 2022)
Alden Green and CRS, "Bootstrapping Exchangeable Random Graphs", Electronic Journal of Statistics 16 (2022): 1058--1095, arxiv:1711.00813
Robert Lunde and CRS, "Bootstrapping Generalization Error Bounds for Time Series", arxiv:1711.02834
CRS
- "The Bootstrap", American Scientist 98 (2010): 186--190 [Self-commentary]
- The chapter on the bootstrap in Advanced Data Analysis from an Elementary Point of View

Michael R. Chernick, Bootstrap Methods: A Practitioner's Guide

Fumiya Akashi, Shuyang Bai, Murad S. Taqqu, "Robust Regression on Stationary Time Series: A Self-Normalized Resampling Approach", Journal of Time Series Analysis 39 (2018): 417--432
Sylvain Arlot, Gilles Blanchard, and Etienne Roquain, "Some nonasymptotic results on resampling in high dimension, I: Confidence regions", Annals of Statistics 38 (2010): 51--82
Eytan Bakshy, Dean Eckles, "Uncertainty in Online Experiments with Dependent Data: An Evaluation of Bootstrap Methods", arxiv:1304.7406
Raymond Chambers and Hukum Chandra, "A Random Effect Block Bootstrap for Clustered Data", Journal of Computational and Graphical Statistics 22 (2013): 452--470
Snigdhansu Chatterjee and Arup Bose, "Generalized bootstrap for estimating equations", Annals of Statistics 33 (2005): 414--436, math.ST/0504515
Guang Cheng and Jianhua Z. Huang, "Bootstrap consistency for general semiparametric M-estimation", Annals of Statistics 38 (2010): 2884--2915
Andreas Christmann, Matias Salibian-Barrera, Stefan Van Aelst, "On the stability of bootstrap estimators", arxiv:1111.1876
Miklos Csorgo, Masoud M Nasari, "Bootstrapped Pivots for Sample and Population Means and Distribution Functions", arxiv:1307.5476
H. Dehling, O.Sh. Sharipov, M. Wendler, "Bootstrap for dependent Hilbert space-valued random variables with application to von Mises statistics", arxiv:1312.3870
Herold Dehling, Martin Wendler, "Central Limit Theorem and the Bootstrap for U-Statistics of Strongly Mixing Data", arxiv:0811.1888
Mathias Drton and Benjamin Williams, "Quantifying the failure of bootstrap likelihood ratio tests", Biometrika 98 (2011): 919--934
Bradley Efron, "Bayesian inference and the parametric bootstrap", Annals of Applied Statistics 6 (2012): 1971--1997
Efron and Tibshirani, An Introduction to the Bootstrap
Mikhail Ermakov, "A moderate deviation principle for empirical bootstrap measure", arxiv:1206.1459
Yanqin Fan, Qi Li and Insik Min, "A Nonparametric Bootstrap Test of Conditional Distributions", Econometric Theory 22 (2006): 587--613
Cheng-Der Fuh and Inchi Hu, "Estimation in hidden Markov models via efficient importance sampling", Bernoulli 13 (2007): 492--513, arxiv:0708.4152
Jurgen Franke, Jens-Peter Kreiss and Enno Mammen, "Bootstrap of Kernel Smoothing in Nonlinear Time Series", Bernoulli 8 (2002): 1--37
Axel Gandy, Patrick Rubin-Delanchy, "An algorithm to compute the power of Monte Carlo tests with guaranteed precision", arxiv:1110.1248
Philip Good
- Permutation, Parametric, and Bootstrap Tests of Hypotheses
- Resampling Methods: A Practical Guide to Data Analysis
Peter G. Hall, The Bootstrap and Edgeworth Expansion
Peter Hall and Hugh Miller, "Bootstrap confidence intervals and hypothesis tests for extrema of parameters", Biometrika 97 (2010): 881--892 [E.g., looking at the largest (or smallest) of a bunch of regression coefficients]
Eunju Hwang, Dong Wan Shin, "Stationary bootstrapping for non-parametric estimator of nonlinear autoregressive model", Journal of Time Series Analysis forthcoming (2011)
Arnold Janssen and Thorsten Pauls, "How Do Bootstrap and Permutation Tests Work?", Annals of Statistics 31 (2003): 768--806
Carsten Jentsch, Dimitris N. Politis and Efstathios Paparoditis, "Block Bootstrap Theory for Multivariate Integrated and Cointegrated Processes", Journal of Time Series Analysis forthcoming (2014)
Jens-Peter Kreiss, Efstathios Paparoditis, Dimitris N. Politis, "On the range of validity of the autoregressive sieve bootstrap", Annals of Statistics 39 (2011): 2103--2130, arxiv:1201.6211
S. N. Lahiri
- "Asymptotic expansions for sums of block-variables under weak dependence", arxiv:math/0606739
- "Edgeworth expansions for studentized statistics under weak dependence", Annals of Statistics 38 (2010): 388--434
S. N. Lahiri, C. Spiegelman, J. Appiah, and L. Rilett, "Gap bootstrap methods for massive data sets with an application to transportation engineering", Annals of Applied Statistics 6 (2012): 1552--1587
Stephen M. S. Lee and P. Y. Lai, "Improving coverage accuracy of block bootstrap confidence intervals", arxiv:0804.4361
Anne Leucht, "Degenerate U- and V-statistics under weak dependence: Asymptotic theory and bootstrap consistency", Bernoulli 18 (2012): 552--585
Chris J. Lloyd, "Some non-asymptotic properties of parametric bootstrap P-values in discrete models", Electronic Journal of Statistics 6 (2012): 2449--2462
W. S. Lok and Stephen M. S. Lee, "Robustness Diagnosis for Bootstrap Inference", Journal of Computational and Graphical Statistics 20 (2011): 448--460
Marco Meyer and Jens-Peter Kreiss, "On the Vector Autoregressive Sieve Bootstrap", Journal of Time Series Analysis forthcoming (2014)
Michael H. Neumann, Efstathios Paparoditis, "Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations", Bernoulli 14 (2008): 14--46, arxiv:0803.0835
Daniel J. Nordman, "A note on the stationary bootstrap's variance", Annals of Statistics 37 (2009): 359--370, arxiv:0903.0474
Daniel J. Nordman and Soumendra N. Lahiri, "Convergence rates of empirical block length selectors for block bootstrap", Bernoulli 20 (2014): 958--978
Dimitris N. Politis, "The Impact of Bootstrap Methods on Time Series Analysis", Statistical Science 18 (2003): 219--230
Dimitris N. Politis, Joseph P. Romano and Michael Wolf, Subsampling
Zacharias Psaradakis
- "A sieve bootstrap test for stationarity", Statistics and Probability Letters 62 (2003): 263--274
- "Blockwise bootstrap testing for stationarity", Statistics and Probability Letters 76 (2006): 562--570
Joseph P. Romano, Azeem M. Shaikh, "On the Uniform Asymptotic Validity of Subsampling and the Bootstrap", Annals of Statistics 40 (2012): 2798--2822, arxiv:1204.2762
Matias Salibian-Barrera, Stefan van Aelst and Gert Willems, "Fast and robust bootstrap", Statistical Methods and Applications 17 (2009): 41--71
Jun Shao and Dongsheng Tu, The Jackknife and the Bootstrap
Xiaofeng Shao, "The Dependent Wild Bootstrap", Journal of the American Statistical Association 105 (2010): 218--235
Xiaofeng Shao and Dimitris N. Politis, "Fixed $b$ subsampling and the block bootstrap: improved confidence sets based on $p$-value calibration", Journal of the Royal Statistical Society B forthcoming (2012)
Olimjon Sh. Sharipov and Martin Wendler
- "Bootstrap for the Sample Mean and for U-Statistics of Stationary Processes", arxiv:0911.3083
- "Normal Limits, Nonnormal Limits, and the Bootstrap for Quantiles of Dependent Data", arxiv:1204.5633
Jan Sprenger, "Science without (parametric) models: the case of bootstrap resampling", Synthese 180 (2011): 65--76
Johannes Tewes, "Block Bootstrap for the Empirical Process of Long-Range Dependent Data", Journal of Time Series Analysis 39 (2018): 28--53
José Trashorras, Olivier Wintenberger, "Large deviations for bootstrapped empirical measures", arxiv:1110.4620
Lionel Truquet, "On a nonparametric resampling scheme for Markov random fields", Electronic Journal of Statistics 5 (2011): 1503--1536
Stefan Van Aelst and Gert Willems, "Fast and Robust Bootstrap for Multivariate Inference: The R Package FRB", Journal of Statistical Software 53:3 (2013)
Stanislav Volgushev, Xiaofeng Shao, "A general approach to the joint asymptotic analysis of statistics from sub-samples", arxiv:1305.5618
D. Volk and M. G. Stepanov, "Resampling methods for document clustering", cond-mat/0109006
Herwig Wendt, Patrice Abry and Stephane Jaffard, "Bootstrap for Empirical Multifractal Analysis", IEEE Signal Processing Magazine July 2007, pp. 38--48 [+ technical papers by these authors]
Michael Wolf and Dan Wunderli, "Bootstrap Joint Prediction Regions", Journal of Time Series Analysis forthcoming (2014)
Guosheng Yin and Yanyuan Ma, "Pearson-type goodness-of-fit test with bootstrap maximum likelihood estimation", Electronic Journal of Statistics 7 (2013): 412--427
Ting Zhang, Hwai-Chung Ho, Martin Wendler, Wei Biao Wu, "Block Sampling under Strong Dependence", arxiv:1312.5807
Abdelhak M. Zoubir and D. Robert Iskander, Bootstrap Techniques for Signal Processing