November 05, 2004

Friday Cat Blogging (Keeping an Eye on the Mouse Issue of Science Geek Edition)

I never expected to get mentioned by the Times because of a running joke, which only started because I don't have a digital camera... But now that I have, I guess I'd better keep it up. (I was traveling last weekend, without net access.) Herewith this week's venture into a scientific discussion related at least peripherally to cats.

Charles F. Stevens, "Preserving properties of object shape by computations in primary visual cortex", Proceedings of the National Academy of Sciences (USA) 101 (2004): 15524--15529 [Link to full text via Open Access]
Abstract: Although our visual system is extremely good at extracting objects from the visual scene, this process involves complicated computations that are thought to require image processing by many successive cortical areas. Thus, intermediate stages in object extraction should not eliminate essential properties of the objects that are still required by later stages. A particularly important characteristic of an object is its shape, and shape has the property that it is unchanged by translations, rotations, and magnifications of the image. I show that the requirement for this property of shape to be preserved in the image, as represented by the firing of neurons in the primary visual cortex (V1), is equivalent to a particular type of computation, known as a wavelet transform, determining the firing rate of V1 neurons in response to an image on the retina. Experimental data support the conclusion that the neural representation of images in V1 is described by a wavelet transform and, therefore, that the properties of shape are preserved.

In the last installment, I talked about a neat experiment which examined the statistics of moving images as seen by cats under (fairly) natural conditions. One aspect of that experiment which I did not dwell on was that their analysis made heavy use of wavelet transforms. There, this was presented more or less as a convenient mathematical tool, but Stevens is arguing that it actually has a biological basis, at least if the right kind of wavelets are used. So let me say just a little about wavelets. (Whole books do not exhaust the subject.)

Consider a two-dimensional image, like the one projected on your retina (or Fuzzy's). Let's ignore color for the moment and think just about light-intensity. One way of representing this image is as a superposition or sum of elementary, basic patterns, each of which consists of a single, point-size bright dot on a dark background ("Dirac delta functions"). The full image is a weighted combination of all these elementary patterns, where the weights are the light-intensities at the corresponding points. This gains us nothing in itself, but it introduces the idea of representing the image as a combination of some basic patterns, which don't have to be dots. A classical choice is sine waves, in which case the decomposition of the image is called its Fourier transform. The reason why Fourier transforms are classical parts of applied math is that they are extremely useful in solving linear equations, and the reason for that, in turn, is that a sine wave is invariant under translation: shift it left or right by its wavelength and you have the same wave, and shift it by any other amount and you have a sine wave of the same frequency, changed only in phase. (Delta functions are not translation-invariant: a shifted dot is a different dot.)
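
To make the two decompositions concrete, here is a minimal Python sketch on a one-dimensional, eight-pixel "image" (a simplification; the retinal image is two-dimensional). The function names and the sample intensities are invented for illustration, and the Fourier transform is the naive discrete version, not an efficient one. It rebuilds the signal once from dots and once from sinusoids, then shifts the signal so the effect on the Fourier weights can be compared.

```python
import cmath
import math


def delta_basis_sum(signal):
    """Rebuild the signal as a weighted sum of single-pixel 'dots'."""
    n = len(signal)
    total = [0.0] * n
    for position, weight in enumerate(signal):
        # One elementary pattern: a bright dot at `position`, dark elsewhere.
        dot = [1.0 if x == position else 0.0 for x in range(n)]
        total = [t + weight * d for t, d in zip(total, dot)]
    return total


def dft(signal):
    """Naive discrete Fourier transform: the weight on each sinusoid."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Rebuild the signal as a weighted sum of complex sinusoids."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * x / n) for k in range(n)) / n
        for x in range(n)
    ]


def shift(signal, s):
    """Circularly shift the signal s pixels to the right."""
    n = len(signal)
    return [signal[(x - s) % n] for x in range(n)]


# Hypothetical light intensities along one row of pixels.
signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, 0.5, 0.0]

coeffs = dft(signal)
shifted_coeffs = dft(shift(signal, 3))

# Compare the size of each Fourier weight before and after the shift.
for k, (a, b) in enumerate(zip(coeffs, shifted_coeffs)):
    print(k, round(abs(a), 6), round(abs(b), 6))
```

The magnitudes in the two printed columns agree: translating the image changes only the phase of each sinusoid's weight, which is the translation property described above. In the dot basis, by contrast, the same shift moves every weight to a different position.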

The thing is, there's no reason to stick with delta functions and sine waves as the only two function bases. Sine waves spread over all space, but have a single definite frequency; delta functions have a definite location, but their Fourier transform is spread over all frequencies. You can come up with infinitely many other function bases which interpolate between these extremes, ones which are more or less localized in both space and frequency. "Wavelet" is a generic name for a kind of function basis which has certain nice convergence properties, and nice symmetry properties. Specifically, they're invariant under translations, rotations and dilations --- which, as Stevens says, means they preserve shape.
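
The trade-off between localization in space and in frequency can be checked numerically. The sketch below is a rough illustration, not anything from Stevens's paper: it compares a dot, a sine wave, and a Gaussian-windowed sine wave on 64 samples. The measure of spread (`effective_width`, a participation ratio that counts roughly how many samples carry the energy) and all parameter values are assumptions chosen for this example.

```python
import cmath
import math

N = 64  # number of samples; an arbitrary choice for this sketch


def power_spectrum(signal):
    """Squared magnitude of the naive discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(signal[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))) ** 2
        for k in range(n)
    ]


def effective_width(energy):
    """Participation ratio: roughly how many entries hold the energy."""
    total = sum(energy)
    return total * total / sum(e * e for e in energy)


def widths(signal):
    """Return (spread in space, spread in frequency) for a signal."""
    space = effective_width([s * s for s in signal])
    frequency = effective_width(power_spectrum(signal))
    return space, frequency


center = N // 2

# A single bright dot: perfectly localized in space.
delta = [1.0 if x == center else 0.0 for x in range(N)]

# A sine wave with 8 cycles across the window: a single frequency.
sine = [math.sin(2 * math.pi * 8 * x / N) for x in range(N)]

# The same sine wave under a Gaussian window of width 4 samples.
windowed = [
    math.exp(-((x - center) ** 2) / (2 * 4.0 ** 2))
    * math.sin(2 * math.pi * 8 * (x - center) / N)
    for x in range(N)
]

for name, sig in [("delta", delta), ("sine", sine), ("windowed sine", windowed)]:
    space, frequency = widths(sig)
    print(name, round(space, 2), round(frequency, 2))
```

The dot is narrow in space and spread over every frequency, the sine wave is the reverse, and the windowed sine wave sits between the two on both counts.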

(Stevens does not explain wavelets at all. You might try Wikipedia or Mathworld; Gershenfeld's Nature of Mathematical Modeling has a good section on them.)

Stevens's basic argument is nicely summed up in his abstract. The visual areas of the mammalian brain are arranged in a more or less hierarchical fashion, with successive areas taking the activity of their predecessors as inputs, and responding to more and more abstract features of the retinal image. ("More or less hierarchical" because there are also important connections running "backwards", from the higher-level areas to the lower-level ones. They're important for things like gestalt effects, but tangential to the present argument.) If the centers higher in the hierarchy are going to be sensitive to shape, the computations carried out lower in the hierarchy had better not eliminate information about shapes; this is particularly true of the primary visual area, V1. This is biologically important, because the animal needs to be able to do things like pick a mouse or another cat out of the background of the scene, track it as it moves, etc., and these are all shape-related tasks. (The information is in the scene, but it needs to be made accessible.) Making the common assumption that neurons communicate with each other by changing the rate at which they produce electrical spikes ("rate-coding"), Stevens argues that the spiking rate of neurons in V1 has to be a kind of wavelet transform of the retinal image. But his arguments are really generic, and don't commit him to any of the many different kinds of wavelets.

Here, at last, are the cats. Experimentally, neurons in the V1 area of cats have spiking rates which match very nicely one particular kind of wavelet ("Gabor functions", which are products of Gaussians and sine waves). Stevens points out that, if his analysis is correct, the parameters of those functions should have a certain non-trivial linear relationship to each other. On further examination of the data from the cat experiments, the fit looks at least reasonable. (Stevens doesn't report any quantitative measures of fit or significance.)
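
For readers who want to see what a Gabor function looks like as a formula, here is a small Python sketch of a two-dimensional one: a Gaussian envelope multiplied by a sinusoidal grating. This is not Stevens's fitting procedure. The parameter values are invented, and treating a neuron's response as a plain inner product between the filter and the image is a simplifying assumption made for the example.

```python
import math


def gabor(x, y, sigma, wavelength, theta, phase=0.0):
    """Gaussian envelope times a sinusoidal grating oriented at angle theta."""
    # Coordinate along the direction in which the grating oscillates.
    u = x * math.cos(theta) + y * math.sin(theta)
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
    carrier = math.cos(2.0 * math.pi * u / wavelength + phase)
    return envelope * carrier


def grating(x, y, wavelength, theta):
    """A full-field sinusoidal grating, used here as the stimulus."""
    u = x * math.cos(theta) + y * math.sin(theta)
    return math.cos(2.0 * math.pi * u / wavelength)


def response(sigma, wavelength, theta, stim_wavelength, stim_theta, half_width=16):
    """Inner product of the Gabor filter with the stimulus on a pixel grid."""
    total = 0.0
    for x in range(-half_width, half_width + 1):
        for y in range(-half_width, half_width + 1):
            total += gabor(x, y, sigma, wavelength, theta) * grating(
                x, y, stim_wavelength, stim_theta
            )
    return total


# Hypothetical filter: envelope width 4 pixels, wavelength 8 pixels, angle 0.
matched = response(4.0, 8.0, 0.0, 8.0, 0.0)
orthogonal = response(4.0, 8.0, 0.0, 8.0, math.pi / 2)

print("grating at the filter's orientation:", matched)
print("grating rotated by 90 degrees:", orthogonal)
```

The filter responds strongly to a grating at its own orientation and wavelength and hardly at all to one rotated by a quarter turn. Scaling `sigma` and `wavelength` by the same factor gives a magnified copy of the same function, so `gabor(2 * x, 2 * y, 2 * sigma, 2 * wavelength, theta)` equals `gabor(x, y, sigma, wavelength, theta)`; that is the dilation property in miniature.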

There are a whole host of issues here. The one which comes first to my mind is the assumption of rate-coding. There's just too much information in the timing of the spikes for me to believe that rate-coding is the whole story, but it's not clear how much of Stevens's story needs to be changed if we allow for time-coding on the part of neurons in V1, or other visual areas. Still, it's a neat idea, and worth taking seriously, and at least marginally connected to cats.

Friday Cat Blogging; Minds, Brains, and Neurons; Mathematics

Posted at November 05, 2004 20:50

Three-Toed Sloth