## Nearest Neighbor Methods in Statistics and Data Mining

*24 Feb 2024 17:24*

Yet Another Inadequate Placeholder

--- In lieu of explaining here, I will point readers at my lecture notes on this subject from my class on data mining / statistical learning (2022 version of said notes). Much of those notes are "Cover (1968a, 1968b) and Cover and Hart (1967) made even simpler"...

--- Something which, in retrospect, I should have paid more attention to:
One-nearest-neighbor regression, and one-nearest-neighbor classification,
"interpolates", i.e., it always fits the training data perfectly. (Every
training point is its own nearest neighbor.) And yet 1NN regression or
classification *does* generalize; in fact the classical analyses by
Cover show that, as the sample size goes to infinity, its risk approaches twice
the risk of the optimal regression function/classifier. (There are some
continuity assumptions buried in that, but not onerous ones.) Now if the
optimal classifier is only right 75% of the time, that guarantee merely says
1NN will approach a coin-toss (twice a 25% error rate is 50%), but if the
optimal classifier has an error rate of, say, 0.001%, 1NN will approach an
error rate of no more than 0.002%, which doesn't sound that bad. The *point*,
however, is that this clearly shows it's
possible both to interpolate and to generalize. So some people, like me,
really *shouldn't* have been so surprised when this turned out to happen
a lot with deep neural networks.
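To make the interpolation-yet-generalization point concrete, here is a minimal sketch (not from the lecture notes; the target function, noise level, and sample sizes are arbitrary choices for the illustration) of 1NN regression in pure Python. Since every training point is its own nearest neighbor, the training error is exactly zero, yet the error on fresh points is still small:

```python
import random

def one_nn_predict(x, train):
    """Return the y-value of the training pair whose x is closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

random.seed(42)
f = lambda x: x * x  # a stand-in "true" regression function (assumed)
xs = [random.uniform(0, 1) for _ in range(500)]
train = [(x, f(x) + random.gauss(0, 0.01)) for x in xs]

# Interpolation: each training point is its own nearest neighbor, so the
# fit reproduces every (noisy) training y exactly.
train_mse = sum((one_nn_predict(x, train) - y) ** 2 for x, y in train) / len(train)
print(train_mse)  # 0.0

# Generalization: squared error at fresh points is nonetheless small
# (and shrinks as the training sample grows).
test_xs = [random.uniform(0, 1) for _ in range(200)]
test_mse = sum((one_nn_predict(x, train) - f(x)) ** 2 for x in test_xs) / len(test_xs)
print(test_mse)
```

The test error here is dominated by the noise variance plus a bias term from the gap to the nearest training point, which shrinks as the sample grows, in line with Cover's asymptotic analysis.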

- See also:
- Data Mining
- Estimating Entropies and Informations
- Interpolation in Statistical Learning
- Recommendation Engines and Collaborative Filtering
- Regression
- Statistics

- Recommended, big picture:
- László Györfi, Michael Kohler, Adam Krzyżak and Harro Walk, A Distribution-Free Theory of Nonparametric Regression
- Trevor Hastie and Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Alexander Kraskov and Harald Stögbauer and Peter Grassberger, "Estimating Mutual Information", Physical Review E
**69**(2004): 066138, arxiv:cond-mat/0305641

- Recommended, close-ups:
- Alberto Abadie and Guido W. Imbens, "Large Sample Properties of Matching Estimators for Average Treatment Effects", Econometrica
**74**(2006): 235--267 [PDF. "Matching" is essentially how you say "nearest-neighbors" in causal-inference-ese...]
- Mona Azadkia, "Optimal choice of \( k \) for \( k \)-nearest neighbor regression", arxiv:1909.05495
- Gérard Biau, Benoît Cadre, Laurent Rouvière, "Statistical analysis of \( k \)-nearest neighbor collaborative recommendation",
Annals of Statistics
**38**(2010): 1568--1592, arxiv:1010.0499
- Samory Kpotufe, "k-NN Regression Adapts to Local Intrinsic Dimension", pp. 729--737 in NIPS-2011

- Recommended, historical:
- Thomas M. Cover
- "Estimation by the Nearest Neighbor Rule",
IEEE Transactions on Information Theory
**14**(1968): 50--55 [PDF reprint]
- "Rates of Convergence for Nearest Neighbor Procedures", pp. 413--415 in B. K. Kinariwala and F. F. Kuo (eds.), Proceedings of the Hawaii International Conference on Systems Sciences (Honolulu: University of Hawaii Press, 1968) [PDF reprint]

- Thomas M. Cover and P. E. Hart, "Nearest Neighbor Pattern Classification", IEEE Transactions on Information Theory
**13**(1967): 21--27 [PDF reprint]

- Modesty forbids me to recommend:
- Octavio César Mesner and CRS, "Conditional Mutual Information Estimation for Mixed Discrete and Continuous Variables with Nearest Neighbors",
IEEE Transactions on Information Theory
**67**(2021): 464--484, arxiv:1912.03387

- Not sure if I recommend:
- Marcello Pelillo, "Alhazen and the nearest neighbor rule",
Pattern Recognition Letters
**38**(2014): 34--37 [This sounds good to me, but I'd be very curious what professional historians of science would make of it.]

- To read:
- Gérard Biau and Luc Devroye, Lectures on the Nearest Neighbor Method
- Benoît Cadre and Qian Dong, "Dimension reduction for regression
estimation with nearest neighbor method", Electronic
Journal of Statistics
**4**(2010): 436--460
- Deng Cai, "A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search", arxiv:1612.07545
- Timothy I. Cannings, Thomas B. Berrett, Richard J. Samworth, "Local nearest neighbour classification with applications to semi-supervised learning",
Annals of Statistics
**48**(2020): 1789--1814
- Yao-ban Chan, Peter Hall, "Robust nearest-neighbor methods for classifying high-dimensional data", Annals of Statistics
**37**(2009): 3186--3203, arxiv:0909.0184
- Yihua Li, Devavrat Shah, Dogyoon Song, Christina Lee Yu, "Nearest Neighbors for Matrix Estimation Interpreted as Blind Regression for Latent Variable Model", arxiv:1705.04867
- Xingye Qiao, Jiexin Duan, Guang Cheng, "Rates of Convergence for Large-scale Nearest Neighbor Classification", arxiv:1909.01464
- Yue Xing, Qifan Song, Guang Cheng, "Benefit of Interpolation in Nearest Neighbor Algorithms", arxiv:1909.11720