### Splines (Advanced Data Analysis from an Elementary Point of View)

Kernel regression controls the amount of smoothing indirectly, through the
bandwidth; why not control the irregularity of the smoothed curve directly? The
spline smoothing problem is a penalized least squares problem: minimize mean
squared error, *plus* a penalty term proportional to the average squared
curvature (second derivative) of the function over space. The solution is
always a continuous piecewise cubic polynomial, with continuous first and
second derivatives. Altering the strength of the penalty moves along a
bias-variance trade-off, from pure OLS (an infinitely strong penalty, which
forces a straight line) at one extreme to pure interpolation (no penalty) at
the other; each penalty strength is equivalent to minimizing the mean squared
error under a corresponding constraint on the average curvature. To ensure
consistency, the penalty/constraint should weaken as the sample size grows;
the appropriate size is selected by cross-validation.
An example with the data from homework 4, including confidence bands. Writing
splines as basis functions, and fitting as least squares on transformations of
the data, plus a regularization term. A brief look at splines in multiple
dimensions. Splines versus kernel regression. Appendix: Lagrange multipliers
and the correspondence between constrained and penalized optimization.
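
The trade-off described above can be illustrated with a minimal sketch. It is not the method from the notes (which use R) but a discrete analogue, assuming evenly spaced inputs: the curvature penalty is replaced by the sum of squared second differences of the fitted values, so the fit solves the linear system (I + lam * K) m = y. The function names, the toy data, and the values of `lam` are all hypothetical choices made for illustration.

```python
import math


def second_difference_penalty(n):
    """Return K = D^T D as an n x n list of lists, where each row of D
    takes one second difference (1, -2, 1) of consecutive fitted values."""
    K = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        idx = (r, r + 1, r + 2)
        coef = (1.0, -2.0, 1.0)
        for a, ca in zip(idx, coef):
            for b, cb in zip(idx, coef):
                K[a][b] += ca * cb
    return K


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def smooth(y, lam):
    """Minimize sum_i (y_i - m_i)^2 + lam * sum_i (m_i - 2 m_{i+1} + m_{i+2})^2."""
    n = len(y)
    K = second_difference_penalty(n)
    A = [[(1.0 if i == j else 0.0) + lam * K[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, y)


def roughness(m):
    """Sum of squared second differences: the discrete stand-in for curvature."""
    return sum((m[i] - 2 * m[i + 1] + m[i + 2]) ** 2 for i in range(len(m) - 2))


# Toy data (an assumption): a smooth trend plus a deterministic zig-zag.
y = [math.sin(0.3 * i) + 0.2 * (-1) ** i for i in range(20)]

for lam in (0.0, 0.1, 10.0, 1000.0, 1e8):
    m = smooth(y, lam)
    rss = sum((a - b) ** 2 for a, b in zip(y, m))
    print(f"lam={lam:g}  residual SS={rss:.4f}  roughness={roughness(m):.6f}")
```

With `lam = 0` the fit reproduces the data exactly (interpolation); as `lam` grows, the roughness shrinks and the residual sum of squares rises, and for very large `lam` the fit approaches the least-squares straight line. Choosing `lam` by cross-validation, as the notes do, is not shown here.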

PDF notes, incorporating R examples

Posted at February 16, 2011 01:47