Basis Selection in Function Decomposition

Fourier analysis, as every schoolchild knows, is a way of taking arbitrary functions of time (or space; but let's stick to time) and re-writing them as sums of trigonometric functions. These functions extend evenly, with constant amplitude, over all time, but they have unique, definite frequencies. The reason physicists and engineers are drilled to death in Fourier analysis is this: it's very easy to figure out how a linear, time-invariant system will respond to an input consisting of a single pure frequency, so to see how it responds to a more complex signal, we resolve that signal into its Fourier components, calculate the response to each of them, and then sum up. (This is important in quantum mechanics, too, because that's a fundamentally linear theory, and time-invariant if we're right about conservation of energy, which we are.)

The trigonometric functions form a basis for the space of functions --- a set of primitives, such that any (reasonable) function is a sum of functions from the basis. To specify a function, then, we can just give the weight for each basis component in its decomposition. Fourier components, as I said, extend with equal amplitude over all time, but have a definite frequency. Their direct counterparts are Dirac delta functions, which are localized to a single unique time, but, when Fourier transformed, evenly cover all frequencies. The Dirac delta functions are another basis, but not usually a very convenient one, since it just amount to specifying the value of the function at each point!

The last paragraph is meant to suggest two ideas. First, it might be possible to find bases which are localized in both time and frequency. (It turns out that you can't have a basis which is completely localized, down to points, in both time and frequency, and in fact there's a trade-off between the two, which, in the context of quantum mechanics, basically is the Heisenberg uncertainty principle.) If you like to think about frequencies at particular times, or at least particular regions of time, this would be a Nice Thing. The best known venture in this direction are wavelets, but there are many, many, many others.

Second, some bases allow for more efficient description of some signals than others. If your signal is a combination of a few pure tones, you just say what those tones are (and their amplitudes). If your signal is a few isolated spikes, you use the Dirac basis and say when the spikes happen. If it's not at either of these extremes, in general you'll have an infinite number of components in both the Fourier and Dirac bases. Sometimes, we can get by while ignoring all but a (reasonably) small number of components --- these are signals which are well-approximated by finite Fourier representations, or finite Dirac representations. But there's no general guarantee that your favorite signals can be handled in this way. You might, in fact, ask yourself whether it's not possible to automatically find a basis in which your signals have a "sparse" representation, i.e., one where all but a few components have zero weight and can be ignored. This is the question of basis selection or basis pursuit, which I'd like to understand better, because I currently don't.

Finally, it's worth noting that a given basis might let us represent our signal sparsely, but at the same time it might not mesh well with the dynamics we're interested in. You can always Fourier transform your signal, but if you then run it through some nonlinear transducer, that's not going to help you much. What to do about that, I have even less idea. Linear transducers can be characterized by saying how they respond to Fourier components; what one would like would be some analogous characterization for nonlinear systems. Norbert Wiener had a very clever proposal along these lines, which was basically to say how they respond to white noise. (The fact that he invented the mathematical theory of white noise probably played a role here.) Unfortunately, there's no particular reason to think that the resulting "Wiener kernel expansion" is the most efficient way of represent the system...

James P. Crutchfield and James E. Hanson, "Turbulent Pattern Bases for Cellular Automata", Physica D 69 (1993): 279--301 [Preprint. The paper which first got me interested in the problem of discovering the best basis. But they don't actually find a function basis!]
Neil Gershenfeld, The Nature of Mathematical Modeling [As on many topics, this book gives a very clear, if hyper-compressed, explanation of the basic ideas underlying functional decomposition and (especially) wavelets, with good references.]
Norbert Wiener, Nonlinear Problems in Random Theory

T. Cai, "Adaptive wavelet estimation: a block thresholding and oracle inequality approach", The Annals of Statistics 27 (1999): 898-924
Chen and Donoho, "Basis Pursuit", tech report (1994)
Coifman, Meyer, and Wickerhauser, "Wavelet analysis and signal processing", pp. 153--178 in Wavelets and Their Applications, M. B. Ruskai et al. (eds.), 1992
Coifman and Wickerhauser, "Entropy-based algorithms for best-basis selection", IEEE Transactions on Information Theory 38 (1992): 713.
Donoho (1993), "Unconditional Bases are optimal bases for data compression and for statistical estimation", Applied and Computational Harmonic Analysis
Donoho (1996), "Unconditional Bases and bit-level compression", Applied and Computational Harmonic Analysis
Donoho and Johnstone, "Ideal Spatial Adaptation via Wavelet Shrinkage", Biometrika 81 (1994): 425
Donoho and Johnstone, "Minimax Risk Over lp-Balls for lq-Error", Probability Theory and Related Fields 99 (1994): 277
Donoho and Johnstone, "Neo-Classical Minimax Problems, Thresholding and Adaptation", Bernoulli 2 (1996): 39
Donoho and Johnstone, "Minimax Estimation via Wavelet Shrinkage", Annals of Statistics 26 (1998): 879
Donoho and Johnstone, "Asymptotic minimaxity of wavelet estimators with sampled data", Statistica Sinica 9 (1999): 1--32
Donoho, Johnstone, Kerkyacharian, and Picard, "Wavelet Shrinkage: Asymptopia", Journal of the Royal Statistical Society B 57 (1995): 301
Johnstone (1994), "Minimax Bayes, asymptotic minimax and sparse wavelet priors", Statistical Decision Theory and Related Topics, V (eds. S.S. Gupta and J.O. Berger, 1994), pp. 303ff
Johnstone and Silverman "Wavelet threshold estimators for data with correlated noise", Journal of the Royal Statistical Society B 59 (1998): 319.
Johnstone and Silverman, "Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences", Annals of Statistics 32 (2004): 1594.
Johnstone and Silverman, "Empirical Bayes selection of wavelet thresholds", Annals of Statistics 33 (2005): to appear.
Sridhar Mahadevan, Representation Discovery Using Harmonic Analysis [blurb]
Mladen Victor Wickerhauser, Adapted Wavelet Analysis: From Theory to Software [Blurb, author's site]]

Notebooks

Basis Selection in Function Decomposition