Notebooks

## Nearest Neighbor Methods in Statistics and Data Mining

24 Feb 2024 17:24

--- In lieu of explaining here, I will point readers at my lecture notes on this subject from my class on data mining / statistical learning (2022 version of said notes). Much of those notes are "Cover (1968a, 1968b) and Cover and Hart (1967) made even simpler"...

--- Something which, in retrospect, I should have paid more attention to: one-nearest-neighbor regression and one-nearest-neighbor classification "interpolate", i.e., they always fit the training data perfectly. (Every training point is its own nearest neighbor.) And yet 1NN regression or classification does generalize; in fact the classical analyses by Cover show that, as the sample size goes to infinity, its risk approaches at most twice the risk of the optimal regression function/classifier. (There are some continuity assumptions buried in that, but not onerous ones.) Now if the optimal classifier is only right 75% of the time, that guarantee merely keeps 1NN from being worse than a coin-toss, but if the optimal classifier has an error rate of, say, 0.001%, 1NN will approach an error rate of no more than 0.002%, which doesn't sound that bad. The point, however, is that this clearly shows it's possible both to interpolate and to generalize. So some people, like me, really shouldn't have been so surprised when this turned out to happen a lot with deep neural networks.
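(A minimal sketch of the point above, using only the standard library; the toy problem, its 10% label-noise rate, and all function names here are my own inventions for illustration. Because every training point is its own nearest neighbor, the training accuracy is exactly 1 --- interpolation --- while accuracy on fresh draws stays well above chance.)

```python
# 1-NN classification interpolates the training set exactly, yet generalizes.
import random

random.seed(0)

def bayes_class(p):
    # Optimal (Bayes) classifier for this toy problem: which side of the
    # line x + y = 0 the point falls on.
    return 1 if p[0] + p[1] > 0 else 0

def draw(n, noise=0.1):
    # n points uniform on [-1,1]^2; labels are the Bayes labels, flipped
    # independently with probability `noise` (so the Bayes error is ~10%).
    pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
    ys = [bayes_class(p) if random.random() > noise else 1 - bayes_class(p)
          for p in pts]
    return pts, ys

def one_nn(train_pts, train_ys, x):
    # Predict the label of the single nearest training point
    # (squared Euclidean distance; no need to take the root).
    d2 = lambda p: (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2
    i = min(range(len(train_pts)), key=lambda j: d2(train_pts[j]))
    return train_ys[i]

train_x, train_y = draw(500)
test_x, test_y = draw(500)

# Every training point is its own nearest neighbor, so this is exactly 1.0:
train_acc = sum(one_nn(train_x, train_y, p) == y
                for p, y in zip(train_x, train_y)) / len(train_y)
# ...and yet predictions on fresh points are still far better than coin-tossing:
test_acc = sum(one_nn(train_x, train_y, p) == y
               for p, y in zip(test_x, test_y)) / len(test_y)
print(train_acc)  # 1.0: perfect interpolation of the (noisy!) training labels
print(test_acc)
```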

Recommended, close-ups:
• Alberto Abadie and Guido W. Imbens, "Large Sample Properties of Matching Estimators for Average Treatment Effects", Econometrica 74 (2006): 235--267 [PDF. "Matching" is essentially how you say "nearest-neighbors" in causal-inference-ese...]
• Mona Azadkia, "Optimal choice of $k$ for $k$-nearest neighbor regression", arxiv:1909.05495
• Gérard Biau, Benoît Cadre, Laurent Rouvière, "Statistical analysis of $k$-nearest neighbor collaborative recommendation", Annals of Statistics 38 (2010): 1568--1592, arxiv:1010.0499
• Samory Kpotufe, "k-NN Regression Adapts to Local Intrinsic Dimension", pp. 729--737 in NIPS-2011
Recommended, historical:
• Thomas M. Cover
• "Estimation by the Nearest Neighbor Rule", IEEE Transactions on Information Theory 14 (1968): 50--55 [PDF reprint]
• "Rates of Convergence for Nearest Neighbor Procedures", pp. 413--415 in B. K. Kinariwala and F. F. Kuo (eds.), Proceedings of the Hawaii International Conference on Systems Sciences (Honolulu: University of Hawaii Press, 1968) [PDF reprint]
• Thomas M. Cover and P. E. Hart, "Nearest Neighbor Pattern Classification", IEEE Transactions on Information Theory 13 (1967): 21--27 [PDF reprint]
Not sure if I recommend:
• Marcello Pelillo, "Alhazen and the nearest neighbor rule", Pattern Recognition Letters 38 (2014): 34--37 [This sounds good to me, but I'd be very curious what professional historians of science would make of it.]