Notebooks

Nearest Neighbor Methods in Statistics and Data Mining

Last update: 14 Dec 2024 15:53
First version:

Yet Another Inadequate Placeholder

--- In lieu of explaining here, I will point readers at my lecture notes on this subject from my class on data mining / statistical learning (2022 version of said notes). Much of those notes are "Cover (1968a, 1968b) and Cover and Hart (1967) made even simpler"...

--- Something which, in retrospect, I should have paid more attention to: One-nearest-neighbor regression, and one-nearest-neighbor classification, "interpolates", i.e., it always fits the training data perfectly. (Every training point is its own nearest neighbor.) And yet 1NN regression or classification does generalize; in fact the classical analyses by Cover show that, as the sample size goes to infinity, its risk approaches twice the risk of the optimal regression function/classifier. (There are some continuity assumptions buried in that, but not onerous ones.) Now if the optimal classifier is only right 75% of the time, that's guaranteeing 1NN will approach a coin-toss, but if the optimal classifier has an error rate of, say, 0.001%, 1NN will approach an error rate of no more than 0.002%, which doesn't sound that bad. The point, however, is that this clearly shows it's possible both to interpolate and to generalize. So some people, like me, really shouldn't have been so surprised when this turned out to happen a lot with deep neural networks.

To revisit: using clustering algorithms (like k-means) as locality sensitive hashing for finding nearest neighbors more efficiently. (This is not an original idea of mine, I just need to find where I saw it, and figure out if it really works...)


Notebooks: