For a project I just finished, I produced this figure:
The project gave me an excuse to finally read Efron's original paper on the bootstrap, "Bootstrap Methods: Another Look at the Jackknife", where my eye was caught by "Remark A" on p. 19 (my linkage):
> Method 2, the straightforward calculation of the bootstrap distribution by repeated Monte Carlo sampling, is remarkably easy to implement on the computer. Given the original algorithm for computing R, only minor modifications are necessary to produce bootstrap replications R*1, R*2, ..., R*N. The amount of computer time required is just about N times that for the original computations. For the discriminant analysis problem reported in Table 2, each trial of N = 100 replications, [sample size] m = n = 20, took about 0.15 seconds and cost about 40 cents on Stanford's 370/168 computer. For a single real data set with m = n = 20, we might have taken N=1000, at a cost of \$4.00.
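To see just how minor the modifications are, here is a minimal sketch of Method 2 for a smoothing-spline confidence band like mine, in Python on made-up data. The residual-resampling scheme and scipy's `UnivariateSpline` are illustrative stand-ins, not necessarily what produced the actual figure:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1977)

# Made-up data standing in for the real sample of n = 2527 points.
n, N = 2527, 800
x = np.sort(rng.uniform(0.0, 10.0, size=n))
y = np.sin(x) + rng.normal(scale=0.3, size=n)

fit = UnivariateSpline(x, y)          # the original fit, playing the role of R
resid = y - fit(x)
grid = np.linspace(0.0, 10.0, 200)

# "Only minor modifications": rerun the original algorithm on each
# pseudo-dataset to get the replications R*1, ..., R*N.
curves = np.empty((N, grid.size))
for i in range(N):
    y_star = fit(x) + rng.choice(resid, size=n, replace=True)
    curves[i] = UnivariateSpline(x, y_star)(grid)

# Pointwise 95% confidence band from the bootstrap replications.
lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
```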
My bootstrapping used N = 800, n = 2527. Ignoring the differences between fitting Efron's linear classifier and my smoothing spline, and scaling his 40 cents linearly in both the number of replications and the sample size, creating my figure would have cost \$404.32 in 1977, or \$1436.90 in today's dollars (using the consumer price index). But I just paid about \$2400 for my laptop, which will have a useful life of (conservatively) three years; a ten-minute pro rata share of that comes to 1.5 cents.
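For anyone who wants to check that arithmetic, here it is spelled out; the only assumptions beyond the figures above are the linear cost scaling and the three-year laptop life:

```python
# Efron's price point: N = 100 replications at n = 20 cost $0.40.
cost_1977 = 0.40 * (800 / 100) * (2527 / 20)        # = $404.32
cpi_multiplier = 1436.90 / 404.32                   # implied by the CPI adjustment above
cost_today = cost_1977 * cpi_multiplier             # = $1436.90

# The laptop: $2400 over a (conservative) three-year life, pro-rated to a
# ten-minute computation.
minutes_in_three_years = 3 * 365 * 24 * 60          # 1,576,800 minutes
laptop_cost = 2400 * (10 / minutes_in_three_years)  # ~ $0.015, i.e. 1.5 cents

print(round(cost_today / laptop_cost))              # ~ 94,000: "about 100,000 times"
```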
The inexorable economic logic of the price mechanism forces me to conclude that bootstrapping is about 100,000 times less valuable for me now than it was for Efron in 1977.
Update: Thanks to D.R. for catching a typo.
[1]: Yes, yes, unless the real regression function is a smooth piecewise cubic, there's some approximation bias from using splines, so this is really a confidence band for the optimal spline approximation to the true regression curve. I hope you are as scrupulous when people talk about confidence bands for "the" slope of their linear regression models. (Added 7 March to placate quibblers.)
Posted at March 04, 2010 13:35 | permanent link