Nearest Neighbor Methods in Statistics and Data Mining
Last update: 14 Dec 2024 15:53First version:
Yet Another Inadequate Placeholder
--- In lieu of explaining here, I will point readers at my lecture notes on this subject from my class on data mining / statistical learning (2022 version of said notes). Much of those notes are "Cover (1968a, 1968b) and Cover and Hart (1967) made even simpler"...
--- Something which, in retrospect, I should have paid more attention to: One-nearest-neighbor regression, and one-nearest-neighbor classification, "interpolates", i.e., it always fits the training data perfectly. (Every training point is its own nearest neighbor.) And yet 1NN regression or classification does generalize; in fact the classical analyses by Cover show that, as the sample size goes to infinity, its risk approaches twice the risk of the optimal regression function/classifier. (There are some continuity assumptions buried in that, but not onerous ones.) Now if the optimal classifier is only right 75% of the time, that's guaranteeing 1NN will approach a coin-toss, but if the optimal classifier has an error rate of, say, 0.001%, 1NN will approach an error rate of no more than 0.002%, which doesn't sound that bad. The point, however, is that this clearly shows it's possible both to interpolate and to generalize. So some people, like me, really shouldn't have been so surprised when this turned out to happen a lot with deep neural networks.
To revisit: using clustering algorithms (like k-means) as locality sensitive hashing for finding nearest neighbors more efficiently. (This is not an original idea of mine, I just need to find where I saw it, and figure out if it really works...)
- See also:
- Data Mining
- Estimating Entropies and Informations
- Interpolation in Statistical Learning
- Recommendation Engines and Collaborative Filtering
- Regression
- Statistics
- Recommended, big picture:
- Lászlo Györfi, Michael Kohler, Adam Krzyzak and Harro Walk, A Distribution-Free Theory of Nonparametric Regression
- Trevor Hastie and Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Recommended, close-ups:
- Alberto Abadie and Guido W. Imbens, "Large Sample Properties of Matching Estimators for Average Treatment Effects", Econometrica 74 (2006): 235--267 [PDF. "Matching" is essentially how you say "nearest-neighbors" in causal-inference-ese...]
- Mona Azadkia, "Optimal choice of \( k \) for \( k \)-nearest neighbor regression", arxiv:1909.05495
- Gérard Biau, Benoît Cadre, Laurent Rouvière, "Statistical analysis of \( k \)-nearest neighbor collaborative recommendation", Annals of Statistics 38 (2010): 1568--1592, arxiv:1010.0499
- Samory Kpotufe, "k-NN Regression Adapts to Local Intrinsic Dimension", pp. 729--737 in NIPS-2011
- Alexander Kraskov and Harald Stögbauer and Peter Grassberger, "Estimating Mutual Information", Physical Review E 69 (2004): 066138, arxiv:cond-mat/0305641
- Recommended, historical:
- Thomas M. Cover
- "Estimation by the Nearest Neighbor Rule", IEEE Transactions on Information Theory 14 (1968): 50--55 [PDF reprint]
- "Rates of Convergence for Nearest Neighbor Procedures", pp. 413--415 in B. K. Kinariwala and F. F. Kuo (eds.), Proceedings of the Hawaii International Conference on Systems Sciences (Honolulu: University of Hawaii Press, 1968) [PDF reprint]
- Thomas M. Cover and P. E. Hart, "Nearest Neighbor Pattern Classification", IEEE Transactions on Information Theory 13 (1967): 21--27 [PDF reprint]
- Modesty forbids me to recommend:
- Octavio César Mesner and CRS, "Conditional Mutual Information Estimation for Mixed Discrete and Continuous Variables with Nearest Neighbors", IEEE Transactions on Information Theory 67 (2021): 464--484, arxiv:1912.03387
- Not sure if I recommend:
- Marcello Pelillo, "Alhazen and the nearest neighbor rule", Pattern Recognition Letters 38 (2014): 34--37 [This sounds good to me, but I'd be very curious what professional historians of science would make of it.]
- To read:
- Gerard Biau and Luc Devroye, Lectures on the Nearest Neighbor Method
- Benoit Cadre and Qian Dong, "Dimension reduction for regression estimation with nearest neighbor method", Electronic Journal of Statistics 4 (2010): 436--460
- Deng Cai, "A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search", arxiv:1612.07545
- Timothy I. Cannings, Thomas B. Berrett, Richard J. Samworth, "Local nearest neighbour classification with applications to semi-supervised learning", Annals of Statistics 48 (2020): 1789--1814
- Yao-ban Chan, Peter Hall, "Robust nearest-neighbor methods for classifying high-dimensional data", Annals of Statistics 37 (2009): 3186--3203, arxiv:0909.0184
- Yihua Li, Devavrat Shah, Dogyoon Song, Christina Lee Yu, "Nearest Neighbors for Matrix Estimation Interpreted as Blind Regression for Latent Variable Model", arxiv:1705.04867
- Xingye Qiao, Jiexin Duan, Guang Cheng, "Rates of Convergence for Large-scale Nearest Neighbor Classification", arxiv:1909.01464
- Yue Xing, Qifan Song, Guang Cheng, "Benefit of Interpolation in Nearest Neighbor Algorithms", arxiv:1909.11720