## Variable or Feature Selection for Regression and Classification

*23 Jul 2020 17:27*

"Variable selection" tends to be the statisticians' name; people from data
mining talk about "feature selection". The idea is the same: given some big
pool of variables (or features) which could be used as inputs to predict some
target feature (or variable), which ones actually go into the predictor? Saying
"all" can lead to computational issues, and also statistical ones. (If you
throw in lots of variables which don't matter, at finite sample sizes you'll
have degraded inferences about how, and how much, all the variables matter.
Even if you don't care about inferring parameters [or response functions,
etc.], the *predictions* will suffer, because you'll be over-fitting.)
Of course, sometimes every available feature really *does* matter.
(This is, annoyingly, especially likely to be the case when the features are
pre-selected on the basis of strong subject-matter knowledge.)
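To make the overfitting point concrete, here is a small illustrative simulation (mine, not from any of the papers below): fit ordinary least squares once with just the relevant predictors, and once with fifty irrelevant noise columns thrown in, and compare out-of-sample error. All the numbers (two relevant variables, fifty noise variables, sample sizes) are arbitrary choices for illustration.

```python
# Illustrative sketch: adding irrelevant predictors inflates
# out-of-sample error for OLS at finite sample sizes.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p_noise = 100, 1000, 50

def make_data(n):
    X_rel = rng.normal(size=(n, 2))          # 2 predictors that matter
    X_noise = rng.normal(size=(n, p_noise))  # 50 that don't
    y = X_rel @ np.array([1.0, -2.0]) + rng.normal(size=n)
    return X_rel, X_noise, y

Xr, Xn, y = make_data(n_train)
Xr_t, Xn_t, y_t = make_data(n_test)

def ols_test_mse(X, y, X_t, y_t):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y_t - X_t @ beta) ** 2)

mse_small = ols_test_mse(Xr, y, Xr_t, y_t)            # relevant vars only
mse_big = ols_test_mse(np.hstack([Xr, Xn]), y,
                       np.hstack([Xr_t, Xn_t]), y_t)  # all 52 vars
print(mse_small, mse_big)  # the 52-variable fit predicts worse
```

The gap here is the familiar variance penalty, roughly the noise variance times the ratio of the number of fitted coefficients to the training sample size.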

All of this is a special case of model selection, so I incorporate all the comments, and the recommended readings, in that notebook by reference.

--- An important model selection problem which is *also* sometimes
called "variable selection" is deciding which nodes in
a graphical model are immediately connected.
(This is very important, for instance,
in causal model discovery.) The
obvious way to go about doing this is to run variable selection (in this sense)
for each node variable. This *may* work, but may also *not* be
what we want, since the goal is direct (and sometimes directed!) connections.
I defer those references to the notebooks just linked to.
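The per-node idea can be sketched in a few lines of numpy (my illustration, with arbitrary simulation parameters, in the spirit of lasso neighborhood selection): regress each node variable on all the others with an l1 penalty, and treat the variables getting nonzero coefficients as candidate neighbors.

```python
# Sketch: per-node variable selection for graph structure. Regress
# each node on all the others with an l1 penalty (lasso); variables
# with nonzero coefficients are candidate neighbors in the graph.
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]  # residual excluding j
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam * n, 0.0) / col_sq[j]
    return beta

# Simulate a chain graph x0 - x1 - x2 - x3:
rng = np.random.default_rng(1)
n = 500
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])

# Neighborhood of node 2: regress x2 on the other three columns.
# Expect sizeable weights on x1 and x3 (true neighbors), ~0 on x0.
others = [0, 1, 3]
beta = lasso_cd(X[:, others], X[:, 2], lam=0.1)
print(dict(zip(others, np.round(beta, 3))))
```

Note that this finds each node's Markov blanket in the undirected sense; recovering *directed* edges, as in causal discovery, needs more than one regression per node.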

See also: Regression

- Recommended (again, see also the model selection notebook):
- Genevera I. Allen, "KNIFE: Kernel Iterative Feature Extraction", arxiv:0906.4391
- Leo Breiman and Philip Spector, "Submodel Selection and Evaluation in Regression: The X-Random Case", International Statistical Review **60** (1992): 291--319 [JSTOR]
- Gavin Brown, Adam Pocock, Ming-Jie Zhao, Mikel Luján, "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection", Journal of Machine Learning Research **13** (2012): 27--66
- Peter Bühlmann, M. Kalisch and M. H. Maathuis, "Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm", Biometrika **97** (2010): 261--278
- Peter Bühlmann and Sara van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications [State-of-the-art (2011) compendium of what's known about using the Lasso, and related methods, for model selection. Mini-review]
- Pascal Lavergne and Quang H. Vuong, "Nonparametric Selection of Regressors: The Nonnested Case", Econometrica **64** (1996): 207--219 [Picking which variables belong in a regression, by looking at the error of non-parametric kernel regressions. JSTOR]
- Nicolai Meinshausen and Peter Bühlmann, "Stability Selection", arxiv:0809.2932
- Wesley Tansey, Victor Veitch, Haoran Zhang, Raul Rabadan, David M. Blei, "The Holdout Randomization Test: Principled and Easy Black Box Feature Selection", arxiv:1811.00645 [This is a brilliant little paper. I have some reservations about the way they use importance sampling to improve power --- I am not at all sure that getting the upper and lower bounds on the importance weights they need is really that much more feasible than just improving your estimate of the conditional density --- but that's a refinement.]

- To read:
- Francis Bach
- "Model-Consistent Sparse Estimation through the Bootstrap", arxiv:0901.3202 ["if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection" --- compare with the "stability selection" of Meinshausen and Buhlmann]
- "High-Dimensional Non-Linear Variable Selection through Hierarchical Kernel Learning", arxiv:0909.0844

- Rina Foygel Barber, Emmanuel J. Candès, Richard J. Samworth, "Robust inference with knockoffs", arxiv:1801.03896
- Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli, "Feature Selection via Mutual Information: New Theoretical Insights", arxiv:1907.07384
- Justin Bleich, Adam Kapelner, Edward I. George, Shane T. Jensen, "Variable selection for BART: An application to gene regulation", Annals of Applied Statistics **8** (2014): 1750--1781, arxiv:1310.4887
- Kasper Brink-Jensen, Claus Thorn Ekstrom, "Inference for feature selection using the Lasso with high-dimensional data", arxiv:1403.4296
- Tom Burr, Herb Fry, Brian McVey, Eric Sander, Joseph Cavanaugh and Andrew Neath, "Performance of Variable Selection Methods in Regression Using Variations of the Bayesian Information Criterion", Communications in Statistics - Simulation and Computation **37** (2008): 507--520
- Xin Chen, Changliang Zou, and R. Dennis Cook, "Coordinate-independent sparse sufficient dimension reduction and variable selection", Annals of Statistics **38** (2010): 3696--3723
- Laëtitia Comminges and Arnak S. Dalalyan, "Tight conditions for consistency of variable selection in the context of high dimensionality", Annals of Statistics **40** (2012): 2667--2696
- Laurie Davies, Lutz Dümbgen, "A Model-free Approach to Linear Least Squares Regression with Exact Probabilities and Applications to Covariate Selection", arxiv:1906.01990
- Jianqing Fan and Runze Li, "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties", Journal of the American Statistical Association **96** (2001): 1348--1360 [PDF reprint via Prof. Fan]
- Jianqing Fan, Richard Samworth, Yichao Wu, "Ultrahigh dimensional variable selection: beyond the linear model", arxiv:0812.3201
- Mladen Kolar, Han Liu, "Optimal Feature Selection in High-Dimensional Discriminant Analysis", arxiv:1306.6557 [Heard the talk...]
- Nicole Kraemer, "On the Peaking Phenomenon of the Lasso in Model Selection", arxiv:0904.4416
- Pascal Lavergne, Samuel Maistre, Valentin Patilea, "A Significance Test for Covariates in Nonparametric Regression", arxiv:1403.7063
- Hugh Miller and Peter Hall, "Local polynomial regression and variable selection", arxiv:1006.3342
- Martin Wahl, "Variable selection in high-dimensional additive models based on norms of projections", arxiv:1406.0052
- Adriano Zanin Zambom, Michael G. Akritas, "Significance Testing and Group Variable Selection", arxiv:1205.6843