### Additive Models (Advanced Data Analysis from an Elementary Point of View)

The curse of dimensionality limits the usefulness of fully non-parametric
regression in problems with many variables: bias remains under control, but
variance grows rapidly with dimensionality. (The number of points required to
pin down a [hyper-]surface to within a given tolerance grows exponentially in
the number of dimensions.) Parametric models do not have this problem, but
have bias and do not let us *discover* anything about the true function.
Structured or constrained non-parametric regression compromises, by adding some
bias so as to reduce variance. Additive models are an example: each input
variable has its own "partial response function", and these add together to
give the total regression function; the partial response functions are otherwise
arbitrary. Additive models include linear models as a special case, but still
evade the curse of dimensionality. Visualization and interpretation of
additive models by display of the partial response functions. Fitting additive
models is done iteratively, starting with some initial guess about each partial
response function and then doing one-dimensional smoothing, so that the guesses
correct each other until a self-consistent solution is reached. Incorporation
of parametric terms, and interactions by joint smoothing of subsets of
variables. Examples in R using the California house-price data. Conclusion:
there is hardly ever any reason to prefer linear models to additive ones, and
the continued thoughtless use of linear regression is a scandal.
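
The iterative fitting described above (usually called backfitting) can be sketched in a few lines. This is only an illustration under stated assumptions, written in Python rather than the R used in the notes: the Gaussian-kernel smoother, the fixed bandwidth, the synthetic data, and the names `kernel_smooth` and `backfit` are all hypothetical choices made here, not anything taken from the lecture. The model being fit is y = alpha + f_1(x_1) + ... + f_p(x_p) + noise, with each f_j centred to have mean zero over the sample.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth):
    """Nadaraya-Watson estimate of E[r | x], evaluated at the observed x.

    Assumption: a Gaussian kernel with a fixed bandwidth; any
    one-dimensional smoother could be substituted here.
    """
    z = (x[:, None] - x[None, :]) / bandwidth   # pairwise scaled distances
    w = np.exp(-0.5 * z**2)                     # kernel weights, shape (n, n)
    return (w @ r) / w.sum(axis=1)

def backfit(X, y, bandwidth=0.3, max_iter=50, tol=1e-6):
    """Fit an additive model by cycling one-dimensional smooths."""
    n, p = X.shape
    alpha = y.mean()                 # intercept: overall mean response
    f = np.zeros((n, p))             # initial guess: every partial response is 0
    for _ in range(max_iter):
        f_old = f.copy()
        for j in range(p):
            # Partial residuals: what the intercept and the *other*
            # partial response functions leave unexplained.
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = kernel_smooth(X[:, j], partial, bandwidth)
            f[:, j] -= f[:, j].mean()   # centre, so alpha stays identifiable
        if np.max(np.abs(f - f_old)) < tol:   # guesses have stopped changing
            break
    return alpha, f

# Synthetic data (an assumption for illustration): an additive truth
# with one sinusoidal and one quadratic partial response.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

alpha, f = backfit(X, y)
fitted = alpha + f.sum(axis=1)
```

Plotting `f[:, j]` against `X[:, j]` for each `j` gives the partial-response displays used for interpretation. A real analysis would choose the bandwidth by cross-validation, or use an existing additive-model package, rather than fixing it by hand as this sketch does.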

PDF notes, incorporating R examples

Advanced Data Analysis from an Elementary Point of View

Posted at February 17, 2011 21:30