## Basis Selection in Function Decomposition

*27 Feb 2017 16:30*

Fourier analysis, as every schoolchild knows, is a way of taking arbitrary functions of time (or space; but let's stick to time) and re-writing them as sums of trigonometric functions. These functions extend evenly, with constant amplitude, over all time, but they have unique, definite frequencies. The reason physicists and engineers are drilled to death in Fourier analysis is this: it's very easy to figure out how a linear, time-invariant system will respond to an input consisting of a single pure frequency, so to see how it responds to a more complex signal, we resolve that signal into its Fourier components, calculate the response to each of them, and then sum up. (This is important in quantum mechanics, too, because that's a fundamentally linear theory, and time-invariant if we're right about conservation of energy, which we are.)

The trigonometric functions form a *basis* for the space of
functions --- a set of primitives, such that any (reasonable) function is a sum
of functions from the basis. To specify a function, then, we can just give the
weight for each basis component in its decomposition. Fourier components, as I
said, extend with equal amplitude over all time, but have a definite frequency.
Their direct counterparts are Dirac delta functions, which are localized to a
single unique time, but, when Fourier transformed, evenly cover all
frequencies. The Dirac delta functions are another basis, but not usually a
very convenient one, since it just amount to specifying the value of the
function at each point!

The last paragraph is meant to suggest two ideas. First, it might be
possible to find bases which are localized in both time and frequency. (It
turns out that you can't have a basis which is completely localized, down to
points, in *both* time and frequency, and in fact there's a trade-off
between the two, which, in the context of quantum mechanics, basically is the
Heisenberg uncertainty principle.) If you like to think about frequencies at
particular times, or at least particular regions of time, this would be a Nice
Thing. The best known venture in this direction are wavelets, but
there are many, many, many others.

Second, some bases allow for more efficient description of some signals than
others. If your signal is a combination of a few pure tones, you just say what
those tones are (and their amplitudes). If your signal is a few isolated
spikes, you use the Dirac basis and say when the spikes happen. If it's not at
either of these extremes, in general you'll have an infinite number of
components in both the Fourier and Dirac bases. Sometimes, we can get by while
ignoring all but a (reasonably) small number of components --- these are
signals which are well-approximated by finite Fourier representations, or
finite Dirac representations. But there's no general guarantee
that *your* favorite signals can be handled in this way. You might, in
fact, ask yourself whether it's not possible to automatically *find* a
basis in which your signals have a "sparse" representation, i.e., one where all
but a few components have zero weight and can be ignored. This is the question
of basis selection or basis pursuit, which I'd like to understand better,
because I currently don't.

Finally, it's worth noting that a given basis might let us represent our
signal sparsely, but at the same time it might not mesh well with the dynamics
we're interested in. You can always Fourier transform your signal, but if you
then run it through some nonlinear transducer,
that's not going to help you much. What to do about *that*, I have even
less idea. Linear transducers can be characterized by saying how they respond
to Fourier components; what one would like would be some analogous
characterization for nonlinear systems. Norbert
Wiener had a very clever proposal along these lines, which was basically to
say how they respond to white noise. (The fact that he invented the
mathematical theory of white noise probably played a role here.)
Unfortunately, there's no particular reason to think that the resulting "Wiener
kernel expansion" is the most efficient way of represent the system...

- Recommended:
- James P. Crutchfield and James E. Hanson, "Turbulent Pattern Bases
for Cellular Automata", Physica D
**69**(1993): 279--301 [Preprint. The paper which first got me interested in the problem of*discovering*the best basis. But they don't actually find a function basis!] - Neil Gershenfeld, The Nature of Mathematical Modeling [As on many topics, this book gives a very clear, if hyper-compressed, explanation of the basic ideas underlying functional decomposition and (especially) wavelets, with good references.]
- Norbert Wiener, Nonlinear Problems in Random Theory

- To read (many thanks to Chris Genovese and Fazal Majid for recommendations):
- T. Cai, "Adaptive wavelet estimation: a block thresholding and
oracle inequality approach", The Annals of
Statistics
**27**(1999): 898-924 - Chen and Donoho, "Basis Pursuit", tech report (1994)
- Coifman, Meyer, and Wickerhauser, "Wavelet analysis and signal processing", pp. 153--178 in Wavelets and Their Applications, M. B. Ruskai et al. (eds.), 1992
- Coifman and Wickerhauser, "Entropy-based algorithms for best-basis
selection", IEEE Transactions on Information
Theory
**38**(1992): 713. - Donoho (1993), "Unconditional Bases are optimal bases for data compression and for statistical estimation", Applied and Computational Harmonic Analysis
- Donoho (1996), "Unconditional Bases and bit-level compression", Applied and Computational Harmonic Analysis
- Donoho and Johnstone, "Ideal Spatial Adaptation via Wavelet
Shrinkage", Biometrika
**81**(1994): 425 - Donoho and Johnstone, "Minimax Risk Over lp-Balls for
lq-Error", Probability Theory and Related
Fields
**99**(1994): 277 - Donoho and Johnstone, "Neo-Classical Minimax Problems, Thresholding
and Adaptation", Bernoulli
**2**(1996): 39 - Donoho and Johnstone, "Minimax Estimation via Wavelet Shrinkage",
Annals of Statistics
**26**(1998): 879 - Donoho and Johnstone, "Asymptotic minimaxity of wavelet estimators
with sampled data", Statistica Sinica
**9**(1999): 1--32 - Donoho, Johnstone, Kerkyacharian, and Picard, "Wavelet Shrinkage:
Asymptopia", Journal of the Royal Statistical Society
B
**57**(1995): 301 - Johnstone (1994), "Minimax Bayes, asymptotic minimax and sparse wavelet priors", Statistical Decision Theory and Related Topics, V (eds. S.S. Gupta and J.O. Berger, 1994), pp. 303ff
- Johnstone and Silverman "Wavelet threshold estimators for data with
correlated noise", Journal of the Royal Statistical Society
B
**59**(1998): 319. - Johnstone and Silverman, "Needles and straw in haystacks: Empirical
Bayes estimates of possibly sparse sequences", Annals of
Statistics
**32**(2004): 1594. - Johnstone and Silverman, "Empirical Bayes selection of wavelet
thresholds", Annals of Statistics
**33**(2005): to appear. - Sridhar Mahadevan, Representation Discovery Using Harmonic Analysis [blurb]
- Mladen Victor Wickerhauser, Adapted Wavelet Analysis: From Theory to Software [Blurb, author's site]]