Regression, especially Nonparametric Regression

Last update: 21 Apr 2025 21:17
First version:

"Regression", in statistical jargon, is the problem of guessing the average level of some quantitative response variable from various predictor variables.

Linear regression is perhaps the single most common quantitative tool in economics, sociology, and many other fields; it's certainly the most common use of statistics. (Analysis of variance, arguably more common in psychology and biology, is a disguised form of regression.) While linear regression deserves a place in statistics, that place should be nowhere near as large and prominent as it currently is. There are very few situations where we actually have scientific support for linear models. Fortunately, very flexible nonlinear regression methods now exist, and from the user's point of view are just as easy as linear regression, and at least as insightful. (Regression trees and additive models, in particular, are just as interpretable.) At the very least, if you do have a particular functional form in mind for the regression, linear or otherwise, you should use a non-parametric regression to test the adequacy of that form.
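As an informal illustration of checking a functional form against a nonparametric fit, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the simulated data, the Gaussian-kernel (Nadaraya-Watson) smoother, the fixed bandwidth, and the helper names `fit_line`, `fit_kernel` and `loo_mse`. It compares leave-one-out prediction error only; it is not one of the formal specification tests in the references below (e.g., Härdle and Mammen).

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return lambda x: a + b * x

def fit_kernel(xs, ys, bandwidth):
    """Nadaraya-Watson smoother with a Gaussian kernel; returns a prediction function."""
    def predict(x):
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return predict

def loo_mse(fit, xs, ys):
    """Leave-one-out cross-validated mean squared error of a fitting procedure."""
    total = 0.0
    for i in range(len(xs)):
        # Refit without point i, then predict the held-out response.
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - predict(xs[i])) ** 2
    return total / len(xs)

def simulate(n, truth, noise_sd, rng):
    """Draw n predictor values uniformly on [0, 1] and add Gaussian noise to truth(x)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [truth(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

# Hypothetical data: the true regression function is a sine wave, not a line.
rng = random.Random(1)
xs, ys = simulate(200, lambda x: math.sin(2 * math.pi * x), 0.3, rng)

linear_cv = loo_mse(fit_line, xs, ys)
kernel_cv = loo_mse(lambda a, b: fit_kernel(a, b, bandwidth=0.05), xs, ys)
print(f"linear LOO-MSE: {linear_cv:.3f}   kernel LOO-MSE: {kernel_cv:.3f}")
```

If the smoother predicts held-out points much better than the parametric form, that is a warning sign about the form. In practice the bandwidth would itself be chosen by cross-validation rather than fixed in advance.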

From a technical point of view, the main drawback of modern regression methods is that their extra flexibility comes at the price of less "efficiency" --- estimates converge more slowly, so you have less precision for the same amount of data. There are some situations where you'd prefer to have more precise estimates from a bad model than less precise estimates from a model which makes smaller systematic errors, but I don't think that's what most users of linear regression are choosing to do; they're just taught to type lm rather than gam. In this day and age, though, I don't understand why not.
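To put rough numbers on that trade-off, here are the standard textbook rates, given as background under stated assumptions rather than as anything derived on this page. Assume $n$ independent observations, $d$ continuous predictors, and a true regression function $\mu$ with two bounded derivatives. The symbols $\hat{\mu}_{\mathrm{par}}$, $\hat{\mu}_{\mathrm{np}}$, $\mu^{*}$ and $b$ are notation introduced for this sketch. For a correctly specified parametric model versus a well-tuned nonparametric smoother, the mean squared errors at a point $x$ scale as

```latex
\mathbb{E}\big[(\hat{\mu}_{\mathrm{par}}(x) - \mu(x))^2\big] = O\!\left(n^{-1}\right),
\qquad
\mathbb{E}\big[(\hat{\mu}_{\mathrm{np}}(x) - \mu(x))^2\big] = O\!\left(n^{-4/(4+d)}\right).
```

If the parametric form is wrong, its estimate still settles down at the parametric rate, but around the best approximation $\mu^{*}$ within the family rather than around $\mu$:

```latex
\mathbb{E}\big[(\hat{\mu}_{\mathrm{par}}(x) - \mu(x))^2\big] = b(x)^2 + O\!\left(n^{-1}\right),
\qquad
b(x) = \mu^{*}(x) - \mu(x).
```

The systematic error $b(x)^2$ does not shrink with more data, whereas the nonparametric error does, only more slowly.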

(Of course, for the statistician, a lot of the more flexible regression methods look more or less like linear regression in some disguised form, because fundamentally all it does is project onto a function basis. So it's not crazy to make it a foundational topic for statisticians. We should not, however, give the rest of the world the impression that the hat matrix is the source of all knowledge.)
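A minimal sketch of the "projection onto a function basis" point, in plain Python. The two bases (a constant plus a slope, and indicator functions of equal-width bins), the design points, and the helper names are all assumptions chosen for illustration. In both cases the fitted values are a fixed matrix times the responses, that matrix is idempotent, and its trace counts the basis functions.

```python
def hat_matrix_linear(xs):
    """Hat matrix for simple linear regression with an intercept:
    H[i][j] = 1/n + (x_i - xbar)(x_j - xbar) / Sxx."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [[1.0 / n + (xi - xbar) * (xj - xbar) / sxx for xj in xs] for xi in xs]

def hat_matrix_bins(xs, n_bins):
    """Hat matrix for a piecewise-constant fit (bin means) on equal-width bins:
    H[i][j] = 1/(points in the bin) if i and j share a bin, else 0."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    bins = [min(int((x - lo) / width), n_bins - 1) for x in xs]
    counts = {}
    for b in bins:
        counts[b] = counts.get(b, 0) + 1
    return [[1.0 / counts[bi] if bi == bj else 0.0 for bj in bins] for bi in bins]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(m):
    """Sum of the diagonal entries, i.e. the degrees of freedom of the fit."""
    return sum(m[i][i] for i in range(len(m)))

def max_abs_diff(a, b):
    """Largest entrywise discrepancy between two matrices of the same shape."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical design: 20 equally spaced points on [0, 1].
xs = [i / 19 for i in range(20)]
H_lin = hat_matrix_linear(xs)
H_bin = hat_matrix_bins(xs, n_bins=5)

print("degrees of freedom, line:", round(trace(H_lin), 6))
print("degrees of freedom, bins:", round(trace(H_bin), 6))
print("line hat matrix idempotent:", max_abs_diff(matmul(H_lin, H_lin), H_lin) < 1e-9)
print("bin hat matrix idempotent:", max_abs_diff(matmul(H_bin, H_bin), H_bin) < 1e-9)
```

Kernel smoothers and penalized splines are also linear in the responses, but their smoothing matrices are generally not idempotent, so the projection picture holds for them only approximately.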

The use of regression, linear or otherwise, for causal inference, rather than prediction, is a different, and far more sordid, story.

Richard A. Berk
- Regression Analysis: A Constructive Critique
- Statistical Learning from a Regression Perspective
Julian J. Faraway
- Linear Models with R
- Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models
Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction [This is a cornerstone book, but is about much, much more than just regression.]
Jeffrey S. Racine, "Nonparametric Econometrics: A Primer", Foundations and Trends in Econometrics 3 (2008): 1--88 [Good primer of nonparametric techniques for regression, density estimation and hypothesis testing; next to no economic content (except for examples). PDF reprint]
Jeffrey S. Simonoff, Smoothing Methods in Statistics
Larry Wasserman
- All of Statistics
- All of Nonparametric Statistics
- Notes for 36-707, Regression Analysis
Sanford Weisberg, Applied Linear Regression

Norman H. Anderson and James Shanteau, "Weak inference with linear models", Psychological Bulletin 84 (1977): 1155--1170 [A demonstration of why you should not rely on $R^2$ to back up your claims]
Mikhail Belkin, Partha Niyogi, Vikas Sindhwani, "Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples", Journal of Machine Learning Research 7 (2006): 2399--2434
Peter J. Bickel and Bo Li, "Local polynomial regression on unknown manifolds", pp. 177--186 in Regina Liu, William Strawderman and Cun-Hui Zhang (eds.), Complex Datasets and Inverse Problems: Tomography, Networks and Beyond (2007) ["`naive' multivariate local polynomial regression can adapt to local smooth lower dimensional structure in the sense that it achieves the optimal convergence rate for nonparametric estimation of regression functions ... when the predictor variables live on or close to a lower dimensional manifold"]
Michael H. Birnbaum, "The Devil Rides Again: Correlation as an Index of Fit", Psychological Bulletin 79 (1973): 239--242
Lawrence D. Brown and Mark G. Low, "Asymptotic Equivalence of Nonparametric Regression and White Noise", Annals of Statistics 24 (1996): 2384--2398 [JSTOR]
Peter Bühlmann, M. Kalisch and M. H. Maathuis, "Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm", Biometrika 97 (2010): 261--278
Peter Bühlmann and Sara van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications [State-of-the-art (2011) compendium of what's known about using high-dimensional regression, especially but not just the Lasso.]
A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, K. Zhan, L. Zhao, "Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression", arxiv:1404.1578
Raymond J. Carroll, Aurore Delaigle, and Peter Hall, "Nonparametric Prediction in Measurement Error Models", Journal of the American Statistical Association 104 (2009): 993--1003
Raymond J. Carroll, J. D. Maca and D. Ruppert, "Nonparametric regression in the presence of measurement error", Biometrika 86 (1999): 541--554
Kevin A. Clarke, "The Phantom Menace: Omitted Variables Bias in Econometric Research" [PDF. Or: Kitchen-sink regressions considered harmful. Including extra variables in your linear regression may or may not reduce the bias in your estimate of any particular coefficients of interest, depending on the correlations between the added variables, the predictors of interest, the response, and omitted relevant variables. Adding more variables always increases the variance of your estimates.]
Eduardo Corona, Terran Lane, Curtis Storlie, Joshua Neil, "Using Laplacian Methods, RKHS Smoothing Splines and Bayesian Estimation as a framework for Regression on Graph and Graph Related Domains" [Technical report, University of New Mexico Computer Science, 2008-06, PDF]
Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, Lyle H. Ungar, "A Risk Comparison of Ordinary Least Squares vs. Ridge Regression", Journal of Machine Learning Research 14 (2013): 1505--1511
William H. DuMouchel and Greg J. Duncan, "Using Sample Survey Weights in Multiple Regression Analysis of Stratified Samples", Proceedings of the Survey Research Methods Section, American Statistical Association (1981), pp. 629--637 [PDF reprint; presumably very similar to "Using Sample Survey Weights to Compare Various Linear Regression Models", Journal of the American Statistical Association 78 (1983): 535--543, but I have not looked at the latter]
Andrew Gelman and Iain Pardoe, "Average predictive comparisons for models with nonlinearity, interactions, and variance components", Sociological Methodology forthcoming (2007) [PDF preprint, Gelman's comments]
Lee-Ad Gottlieb, Aryeh Kontorovich, Robert Krauthgamer, "Efficient Regression in Metric Spaces via Approximate Lipschitz Extension", arxiv:1111.4470
László Györfi, Michael Kohler, Adam Krzyzak and Harro Walk, A Distribution-Free Theory of Nonparametric Regression
Berthold R. Haag, "Non-parametric Regression Tests Using Dimension Reduction Techniques", Scandinavian Journal of Statistics 35 (2008): 719--738
Peter Hall, "On Bootstrap Confidence Intervals in Nonparametric Regression", Annals of Statistics 20 (1992): 695--711
Peter Hall and Joel Horowitz, "A simple bootstrap method for constructing nonparametric confidence bands for functions", Annals of Statistics 41 (2013): 1892--1921, arxiv:1309.4864
W. Härdle and E. Mammen, "Comparing Nonparametric Versus Parametric Regression Fits", Annals of Statistics 21 (1993): 1926--1947
Jeffrey D. Hart, Nonparametric Smoothing and Lack-of-Fit Tests
Yongmiao Hong and Halbert White, "Consistent Specification Testing Via Nonparametric Series Regression", Econometrica 63 (1995): 1133--1159 [JSTOR]
Adel Javanmard, Andrea Montanari, "Confidence Intervals and Hypothesis Testing for High-Dimensional Regression", arxiv:1306.3171
M. Kohler, A. Krzyzak and D. Schafer, "Application of structural risk minimization to multivariate smoothing spline regression estimates", Bernoulli 8 (2002): 475--490
Alexander Korostelev, "A minimaxity criterion in nonparametric regression based on large-deviations probabilities", Annals of Statistics 24 (1996): 1075--1083
Jon Lafferty and Larry Wasserman [To be honest, I haven't checked to see how different these two papers actually are...]
- "Rodeo: Sparse Nonparametric Regression in High Dimensions", math.ST/0506342
- "Rodeo: Sparse, greedy nonparametric regression", Annals of Statistics 36 (2008): 27--63, arxiv:0803.1709
Diane Lambert and Kathryn Roeder, "Overdispersion Diagnostics for Generalized Linear Models", Journal of the American Statistical Association 90 (1995): 1225--1236 [JSTOR]
Abdelkader Mokkadem, Mariane Pelletier, Yousri Slaoui, "Revisiting Révész's stochastic approximation method for the estimation of a regression function", arxiv:0812.3973
Patrick O. Perry, "Fast Moment-Based Estimation for Hierarchical Models", arxiv:1504.04941
Garvesh Raskutti, Martin J. Wainwright, and Bin Yu, "Early stopping and non-parametric regression: An optimal and data-dependent stopping rule", arxiv:1306.3574
B. W. Silverman, "Spline Smoothing: The Equivalent Variable Kernel Method", Annals of Statistics 12 (1984): 898--916
Ryan J. Tibshirani, "Degrees of Freedom and Model Search", arxiv:1402.1920
Gerhard Tutz, Regression for Categorical Data
Sara van de Geer, Empirical Process Theory in M-Estimation
Grace Wahba, Spline Models for Observational Data
Jianming Ye, "On Measuring and Correcting the Effects of Data Mining and Model Selection", Journal of the American Statistical Association 93 (1998): 120--131

Erich L. Lehmann, "On the history and use of some standard statistical models", pp. 114--126 in Deborah Nolan and Terry Speed (eds.), Probability and Statistics: Essays in Honor of David A. Freedman
E. T. Whittaker, "On a New Method of Graduation", Proceedings of the Edinburgh Mathematical Society 41 (1922): 63--75 [Introduces splines, complete with the Bayesian derivation (if you are into that sort of thing), though without the name.]

Advanced Data Analysis from an Elementary Point of View [Presumes at least some acquaintance with linear regression, however]
The Truth About Linear Regression [Draft textbook for a first course in linear regression for undergraduates]

Adrian W. Bowman and Adelchi Azzalini, Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations
Andrew Gelman and Jennifer Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models

Elena Andreou and Bas J. M. Werker, "An Alternative Asymptotic Analysis of Residual-Based Statistics", Review of Economics and Statistics 94 (2012): 88--99
Sylvain Arlot, "Choosing a penalty for model selection in heteroscedastic regression", arxiv:0812.3141
Sylvain Arlot and Pascal Massart, "Data-driven Calibration of Penalties for Least-Squares Regression", Journal of Machine Learning Research 10 (2009): 245--279
Anil Aswani, Peter Bickel, and Claire Tomlin, "Regression on manifolds: Estimation of the exterior derivative", Annals of Statistics 39 (2011): 48--81
Jean-Baptiste Aubin, Samuela Leoni-Aubin, "A Simple Misspecification Test for Regression Models", arxiv:1003.2294
Jean-Yves Audibert and Olivier Catoni, "Robust linear least squares regression", Annals of Statistics 39 (2011): 2766--2794
Alexandre Belloni, Victor Chernozhukov, "High Dimensional Sparse Econometric Models: An Introduction", arxiv:1106.5242
Gilles Blanchard, Nicole Kraemer, "Kernel Conjugate Gradient is Universally Consistent", arxiv:0902.4380 ["approximate solutions are constructed by projections onto a nested set of data-dependent subspaces"]
Borowiak, Model Discrimination for Nonlinear Regression Models
Lawrence D. Brown, T. Tony Cai, and Harrison H. Zhou, "Nonparametric regression in exponential families", Annals of Statistics 38 (2010): 2005--2046
Peter Bühlmann, "Statistical significance in high-dimensional linear models", arxiv:1202.1377 [Not sure if this goes beyond what's in Bühlmann and van de Geer]
Florentina Bunea, Seth Strimas-Mackey, Marten Wegkamp, "Interpolating Predictors in High-Dimensional Factor Regression", Journal of Machine Learning Research 23 (2022): 10
T. Tony Cai, "Minimax and Adaptive Inference in Nonparametric Function Estimation", Statistical Science 27 (2012): 31--50, arxiv:1203.4911
T. Tony Cai, Harrison H. Zhou, "Asymptotic equivalence and adaptive estimation for robust nonparametric regression", Annals of Statistics 37 (2009): 3204--3235, arxiv:0909.0343
Andrew V. Carter, "Asymptotic approximation of nonparametric regression experiments with unknown variances", Annals of Statistics 35 (2007): 1644--1673, arxiv:0710.3647
Ming-Yen Cheng, Hau-tieng Wu, "Local Linear Regression on Manifolds and its Geometric Interpretation", arxiv:1201.0327
Laëtitia Comminges, Arnak Dalalyan, "Tight conditions for consistent variable selection in high dimensional nonparametric regression", arxiv:1102.3616
R. Dennis Cook, Liliana Forzani, and Adam J. Rothman, "Estimating sufficient reductions of the predictors in abundant high-dimensional regressions", Annals of Statistics 40 (2012): 353--384
Arnak Dalalyan and Alexandre B. Tsybakov, "Sparse Regression Learning by Aggregation and Langevin Monte-Carlo", arxiv:0903.1223
Laurie Davies, Lutz Dümbgen, "A Model-free Approach to Linear Least Squares Regression with Exact Probabilities and Applications to Covariate Selection", arxiv:1906.01990
Robert Davies, Christopher Withers, and Saralees Nadarajah, "Confidence intervals in a regression with both linear and non-linear terms", Electronic Journal of Statistics 5 (2011): 603--618
Aurore Delaigle, Peter Hall, Hans-Georg Müller, "Accelerated convergence for nonparametric regression with coarsened predictors", Annals of Statistics 35 (2007): 2639--2653, arxiv:0803.3017
Ruben Dezeure, Peter Bühlmann, Lukas Meier, Nicolai Meinshausen, "High-dimensional Inference: Confidence intervals, p-values and R-Software hdi", arxiv:1408.4026
Charanpal Dhanjal, Nicolas Baskiotis, Stéphan Clémençon and Nicolas Usunier, "An Empirical Comparison of V-fold Penalisation and Cross Validation for Model Selection in Distribution-Free Regression", arxiv:1212.1780
Wei Dou, David Pollard, Harrison H. Zhou, "Functional regression for general exponential families", arxiv:1001.3742
Sam Efromovich
- Nonparametric Curve Estimation
- "Conditional density estimation in a regression setting", Annals of Statistics 35 (2007): 2504--2535, arxiv:0803.2984
P. P. B. Eggermont, V. N. LaRiccia, "Uniform error bounds for smoothing splines", arxiv:math/0612776
P. P. B. Eggermont and V. N. LaRiccia, Maximum Penalized Likelihood Estimation, vol. II: Regression [Enthusiastic review in JASA (104 (2010): 1628), appears self-contained]
Jianqing Fan, Shaojun Guo and Ning Hao, "Variance estimation using refitted cross-validation in ultrahigh dimensional regression", Journal of the Royal Statistical Society B 74 (2012): 37--65
Cheryl J. Flynn, Clifford M. Hurvich, Jeffrey S. Simonoff, "On the Sensitivity of the Lasso to the Number of Predictor Variables", arxiv:1403.4544
Jose M. Gonzalez-Barrios and Silvia Ruiz-Velasco, "Regression analysis and dependence", Metrika 61 (2005): 73--87
Juan M Gorriz, J. Ramirez, F. Segovia, F. J. Martinez-Murcia, C. Jiménez-Mesa, J. Suckling, "Statistical Agnostic Regression: a machine learning method to validate regression models", arxiv:2402.15213 [I am intensely skeptical, but I should read this before dismissing it.]
Marvin H. J. Gruber, Regression Estimators: A Comparative Study
Chong Gu, Smoothing Spline ANOVA Models
Haijie Gu, John Lafferty, "Sequential Nonparametric Regression", arxiv:1206.6408
Emmanuel Guerre and Pascal Lavergne, "Data-driven rate-optimal specification testing in regression models", Annals of Statistics 33 (2005): 840--870, math.ST/0505640
P. Richard Hahn, Sayan Mukherjee, Carlos Carvalho, "Predictor-dependent shrinkage for linear regression via partial factor modeling", arxiv:1011.3725
Peter Hall, Joel L. Horowitz, "Nonparametric methods for inference in the presence of instrumental variables", Annals of Statistics 33 (2005): 2904--2929, arxiv:math/0603130
Bruce E. Hansen
- "Uniform Convergence Rates for Kernel Estimation with Dependent Data", Econometric Theory 24 (2008): 726--748 [abstract with link to free PDF]
- Econometrics
Wolfgang Härdle, Applied Nonparametric Regression
Wolfgang Härdle, Marlene Müller, Stefan Sperlich and Axel Werwatz, Nonparametric and Semiparametric Models: An Introduction
Jeffrey D. Hart, "Smoothing-inspired lack-of-fit tests based on ranks", arxiv:0805.2285
Elad Hazan, Tomer Koren, "Linear Regression with Limited Observation", arxiv:1206.4678
Mohamed Hebiri and Sara A. Van De Geer, "The Smooth-Lasso and other $\ell_1+\ell_2$-penalized methods", arxiv:1003.4885
Nancy Heckman, "The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy", arxiv:1111.1915
Tim Hesterberg, Nam Hee Choi, Lukas Meier, Chris Fraley, "Least angle and $\ell_1$ penalized regression: A review", Statistics Surveys 2 (2008): 61--93, arxiv:0802.0964
Jacob Hinkle, Prasanna Muralidharan, P. Thomas Fletcher, Sarang Joshi, "Polynomial Regression on Riemannian Manifolds", arxiv:1201.2395
Giles Hooker and Saharon Rosset, "Prediction-based regularization using data augmented regression", Statistics and Computing 22 (2011): 237--249
Joel L. Horowitz, Enno Mammen, "Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions", Annals of Statistics 35 (2007): 2589--2619, arxiv:0803.2999
Torsten Hothorn, Thomas Kneib, Peter Bühlmann, "Conditional transformation models", Journal of the Royal Statistical Society B forthcoming
Salvatore Ingrassia, Simona C. Minotti, Giorgio Vittadini, "Local statistical modeling by cluster-weighted" [sic], arxiv:0911.2634 [Revisiting Gershenfeld et al.'s "cluster-weighted modeling" from a more properly statistical perspective]
Sameer M. Jalnapurkar, "Learning a regression function via Tikhonov regularization", math.ST/0509420
Bo Kai, Runze Li and Hui Zou, "Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression", Journal of the Royal Statistical Society B 72 (2010): 49--69
Gerard Kerkyacharian, Mathilde Mougeot, Dominique Picard, Karine Tribouley, "Learning Out of Leaders", arxiv:1001.1919
Estate V. Khmaladze, Hira L. Koul, "Goodness-of-fit problem for errors in nonparametric regression: Distribution free approach", Annals of Statistics 37 (2009): 3165--3185, arxiv:0909.0170
Heeyoung Kim and Xiaoming Huo, "Asymptotic optimality of a multivariate version of the generalized cross validation in adaptive smoothing splines", Electronic Journal of Statistics 8 (2014): 159--183
Hoyt Koepke, Mikhail Bilenko, "Fast Prediction of New Feature Utility", arxiv:1206.4680
Michael R. Kosorok, Introduction to Empirical Processes and Semiparametric Inference [partial PDF preprint]
Nicole Kraemer, Anne-Laure Boulesteix, Gerhard Tutz, "Penalized Partial Least Squares Based on B-Splines Transformations", math.ST/0608576
Tatyana Krivobokova, Thomas Kneib, and Gerda Claeskens, "Simultaneous Confidence Bands for Penalized Spline Estimators", Journal of the American Statistical Association 105 (2010): 852--863
Arne Kovac, Andrew D.A.C. Smith, "Regression on a Graph", Journal of Computational and Graphical Statistics 20 (2011): 432--447, arxiv:0911.1928
Tatyana Krivobokova, "Smoothing parameter selection in two frameworks for penalized splines", Journal of the Royal Statistical Society B 75 (2013): 725--741
Rafal Kulik and Cornelia Wichelhaus, "Nonparametric conditional variance and error density estimation in regression models with dependent errors and predictors", Electronic Journal of Statistics 5 (2011): 856--898
Pascal Lavergne, Samuel Maistre, and Valentin Patilea, "A significance test for covariates in nonparametric regression", Electronic Journal of Statistics 9 (2015): 643--678
Tri M. Le, Bertrand S. Clarke, "Model Averaging Is Asymptotically Better Than Model Selection For Prediction", Journal of Machine Learning Research 23 (2022): 33
Hannes Leeb, "Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the complexity of the data-generating process", Bernoulli 14 (2008): 661--690, arxiv:0802.3364
Qi Li and Jeffrey Scott Racine, Nonparametric Econometrics: Theory and Practice
Yehua Li and Tailen Hsing, "Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data", Annals of Statistics 38 (2010): 3321--3351
Heng Lian, "Convergence of Nonparametric Functional Regression Estimates with Functional Responses", arxiv:1111.6230
Han Liu, Xi Chen, John Lafferty and Larry Wasserman, "Graph-Valued Regression", NIPS 23 (2010) [PDF], arxiv:1006.3972
Oliver Linton and Zhijie Xiao, "A Nonparametric Regression Estimator That Adapts To Error Distribution of Unknown Form", Econometric Theory 23 (2007): 371--413
James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani, "Automatic Construction and Natural-Language Description of Nonparametric Regression Models", arxiv:1402.4304
Po-Ling Loh, Martin J. Wainwright, "High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity", Annals of Statistics 40 (2012): 1637--1664, arxiv:1109.3714
Djamal Louani, Sidi Mohamed Ould Maouloud, "Large Deviation Results for the Nonparametric Regression Function Estimator on Functional Data", arxiv:1111.5989
Enno Mammen, Christoph Rothe, and Melanie Schienle, "Nonparametric regression with nonparametrically generated covariates", Annals of Statistics 40 (2012): 1132--1170
Charles E. McCulloch, John M. Neuhaus, "Misspecifying the Shape of a Random Effects Distribution: Why Getting It Wrong May Not Matter", Statistical Science 26 (2011): 388--402, arxiv:1201.1980
Hugh Miller and Peter Hall, "Local polynomial regression and variable selection", arxiv:1006.3342
Jessica Minnier, Lu Tian and Tianxi Cai, "A Perturbation Method for Inference on Regularized Regression Estimates", Journal of the American Statistical Association 106 (2011): 1371--1382
Ursula U. Müller and Ingrid Van Keilegom, "Efficient parameter estimation in regression with missing responses", Electronic Journal of Statistics 6 (2012): 1200--1219
Richard Nickl and Sara van de Geer, "Confidence sets in sparse regression", Annals of Statistics 41 (2013): 2852--2876, arxiv:1209.1508
Andriy Norets, "Approximation of conditional densities by smooth mixtures of regressions", Annals of Statistics 38 (2010): 1733--1766, arxiv:1010.0581
Philippe Rigollet, "Maximum likelihood aggregation and misspecified generalized linear models", arxiv:0911.2919
Cynthia Rudin, "Stability Analysis for Regularized Least Squares Regression", cs.LG/0502016
Laura M. Sangalli, James O. Ramsay, Timothy O. Ramsay, "Spatial spline regression models", Journal of the Royal Statistical Society B 75 (2013): 681--703
George A. F. Seber and C. J. Wild, Nonlinear Regression
Arnab Sen, Bodhisattva Sen, "On Testing Independence and Goodness-of-fit in Linear Models", arxiv:1302.5831
Zuofeng Shang and Guang Cheng, "Local and global asymptotic inference in smoothing spline models", Annals of Statistics 41 (2013): 2608--2638
David Shilane, Richard H. Liang and Sandrine Dudoit, "Loss-Based Estimation with Evolutionary Algorithms and Cross-Validation", UC Berkeley Biostatistics Working Paper 227 [Abstract, PDF]
Tom A. B. Snijders and Johannes Berkhof, "Diagnostic Checks for Multilevel Models"
Emre Soyer and Robin M. Hogarth, "The illusion of predictability: How regression statistics mislead experts" [PDF preprint]
Aris Spanos, "Revisiting the Omitted Variables Argument: Substantive vs. Statistical Adequacy" [PDF preprint]
Pablo Sprechmann, Ignacio Ramirez, Guillermo Sapiro and Yonina C. Eldar, "C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework", arxiv:1006.1346
Ingo Steinwart and Andreas Christmann, Support Vector Machines
Curtis B. Storlie, Howard D. Bondell, and Brian J. Reich, "A Locally Adaptive Penalty for Estimation of Functions With Varying Roughness", Journal of Computational and Graphical Statistics (2010): forthcoming
Liangjun Su and Aman Ullah, "Local polynomial estimation of nonparametric simultaneous equations models", Journal of Econometrics 144 (2008): 193--218
Ryan J. Tibshirani, "The Lasso Problem and Uniqueness", arxiv:1206.0313
Jo-Anne Ting, Aaron D'Souza, Sethu Vijayakumar and Stefan Schaal, "Efficient Learning and Feature Selection in High-Dimensional Regression", Neural Computation 22 (2010): 831--886
Daniell Toth and John L. Eltinge, "Building Consistent Regression Trees From Complex Sample Data", Journal of the American Statistical Association 106 (2011): 1626--1636
Minh-Ngoc Tran, David Nott, Chenlei Leng, "The Predictive Lasso", arxiv:1009.2302
Gerhard Tutz and Sebastian Petty, "Nonparametric estimation of the link function including variable selection", Statistics and Computing 22 (2011): 545--561
Gerhard Tutz and Jan Ulbricht, "Penalized regression with correlation-based penalty", Statistics and Computing 19 (2008): 239--253
Samuel Vaiter, Mohammad Golbabaee, Jalal Fadili, Gabriel Peyré, "Model Selection with Piecewise Regular Gauges", arxiv:1307.2342
Sara van de Geer, Johannes Lederer, "The Lasso, correlated design, and improved oracle inequalities", arxiv:1107.0189
Daniela M. Witten and Robert Tibshirani, "Covariance-regularized regression and classification for high dimensional problems", Journal of the Royal Statistical Society B 71 (2009): 615--636
Yun Yang and Surya T. Tokdar, "Minimax-optimal nonparametric regression in high dimensions", Annals of Statistics 43 (2015): 652--674
Adriano Zanin Zambom, Michael Akritas, "Nonparametric Model Checking and Variable Selection", arxiv:1205.6761
Hao Helen Zhang, Guang Cheng and Yufeng Liu, "Linear or Nonlinear? Automatic Structure Discovery for Partially Linear Models", Journal of the American Statistical Association 106 (2011): 1099--1112 [Presumably they have a reason for not just using an additive model with an extra strong curvature penalty in each univariate smoother.]
Zhibiao Zhao and Wei Biao Wu, "Confidence bands in nonparametric time series regression", Annals of Statistics 36 (2008): 1854--1878, arxiv:0808.1010
Peng Zhao and Bin Yu, "On Model Selection Consistency of Lasso", Journal of Machine Learning Research 7 (2006): 2541--2563
Hongtu Zhu, Joseph G. Ibrahim, Sikyum Lee, Heping Zhang, "Perturbation selection and influence measures in local influence analysis", Annals of Statistics 35 (2007): 2565--2588, arxiv:0803.2986
Ying Zhu, "Phase transitions in nonparametric regressions: a curse of exploiting higher degree smoothness assumptions in finite samples", arxiv:2112.03626