I never expected to get mentioned by the Times because of a running joke, which only started because I don't have a digital camera... But now that I have, I guess I'd better keep it up. (I was traveling last weekend, without net access.) Herewith this week's venture into a scientific discussion related at least peripherally to cats.

- Charles F. Stevens, "Preserving properties of object shape by computations in primary visual cortex", Proceedings of the National Academy of Sciences (USA) **101** (2004): 15524--15529 [Link to full text via Open Access]

  *Abstract*: Although our visual system is extremely good at extracting objects from the visual scene, this process involves complicated computations that are thought to require image processing by many successive cortical areas. Thus, intermediate stages in object extraction should not eliminate essential properties of the objects that are still required by later stages. A particularly important characteristic of an object is its shape, and shape has the property that it is unchanged by translations, rotations, and magnifications of the image. I show that the requirement for this property of shape to be preserved in the image, as represented by the firing of neurons in the primary visual cortex (V1), is equivalent to a particular type of computation, known as a wavelet transform, determining the firing rate of V1 neurons in response to an image on the retina. Experimental data support the conclusion that the neural representation of images in V1 is described by a wavelet transform and, therefore, that the properties of shape are preserved.

In the last installment, I talked about a neat experiment which examined the statistics of moving images as seen by cats under (fairly) natural conditions. One aspect of that experiment which I did not dwell on was that their analysis made heavy use of wavelet transforms. There, this was presented more or less as a convenient mathematical tool, but Stevens is arguing that it actually has a biological basis, at least if the right kind of wavelets are used. So let me say just a little about wavelets. (Whole books do not exhaust the subject.)

Consider a two-dimensional image, like the one projected on your retina (or
Fuzzy's). Let's ignore color for the moment and think just about
light-intensity. One way of representing this image is as a superposition or
sum of elementary, basic patterns, which consist of a single, point-size bright
dot on a dark background ("Dirac delta functions"). The full image is
equivalent to adding up what you'd get from taking a weighted combination of
all these elementary patterns, where the weights correspond to the
light-intensities at various points. This gains us nothing in itself, but it
introduces the idea of representing the image as a combination of some basic
patterns, which don't have to be dots. A classical one to use instead is sine
waves, in which case the decomposition of the image is called its Fourier
transform. The reason why Fourier transforms are classical parts of applied
math is that they are extremely useful in solving linear equations, and the
reason for that, in turn, is that a sine wave is *invariant under
translation*: shift it left or right by a whole number of wavelengths and you
have the same wave, and any other shift changes only its phase. (Delta
functions are not translation-invariant.)
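
To make the two decompositions concrete, here is a minimal NumPy sketch. It assumes a small random array standing in for the retinal image and uses circular shifts (`np.roll`) as the translation; the function names are mine, for illustration only. It rebuilds the image once from single-pixel dots and once from Fourier components, then checks that sliding the image sideways leaves the size of every Fourier weight alone.

```python
import numpy as np

def delta_reconstruct(image):
    """Rebuild an image as a weighted sum of single-pixel 'dot' patterns."""
    total = np.zeros_like(image, dtype=float)
    for (i, j), weight in np.ndenumerate(image):
        basis = np.zeros_like(total)
        basis[i, j] = 1.0              # one bright dot on a dark background
        total += weight * basis        # weight = light intensity at that point
    return total

def fourier_reconstruct(image):
    """Rebuild an image from its weights on the sine-wave (Fourier) basis."""
    weights = np.fft.fft2(image)       # the Fourier transform of the image
    return np.fft.ifft2(weights).real

# Assumption: a random 8x8 array stands in for a real image.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
shifted = np.roll(image, shift=3, axis=1)   # circular shift, 3 pixels sideways

delta_ok = np.allclose(delta_reconstruct(image), image)
fourier_ok = np.allclose(fourier_reconstruct(image), image)

# Translating the image changes only the phases of the Fourier weights,
# so their magnitudes should agree before and after the shift.
same_magnitudes = np.allclose(np.abs(np.fft.fft2(image)),
                              np.abs(np.fft.fft2(shifted)))

print(delta_ok, fourier_ok, same_magnitudes)
```

The dot weights, by contrast, simply move to new positions when the image is shifted, which is the sense in which the dot basis is not translation-invariant.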

The thing is, there's no reason to stick with delta functions and sine waves
as the only two function bases. Sine waves spread over all space, but have a
single definite frequency; delta functions have a definite location, but their
Fourier transform is spread over all frequencies. You can come up with
infinitely many other function bases which interpolate between these extremes,
ones which are more or less localized in both space and frequency. "Wavelet"
is a generic name for a kind of function basis which has certain nice
convergence properties, *and* nice symmetry properties. Specifically,
they're invariant under translations, rotations and dilations --- which, as
Stevens says, means they preserve shape.
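
The trade-off between the two extremes can be checked numerically. The sketch below is only an illustration under my own assumptions: one dimension instead of two, a complex sine wave so that the spectrum has a single peak, and a Gaussian-windowed wave standing in for "something wavelet-like" (it is not any particular wavelet family). It measures how spread out each signal is in space and in frequency.

```python
import numpy as np

def spread(values, weights):
    """Standard deviation of `values` under the normalized `weights`."""
    p = weights / weights.sum()
    mean = np.sum(values * p)
    return np.sqrt(np.sum((values - mean) ** 2 * p))

def space_and_frequency_spread(signal, x):
    """Spread of |signal|^2 over position and of its power spectrum over frequency."""
    freqs = np.fft.fftfreq(len(x), d=x[1] - x[0])
    power = np.abs(np.fft.fft(signal)) ** 2
    return spread(x, np.abs(signal) ** 2), spread(freqs, power)

n = 256
x = np.arange(n) - n // 2
f0 = 16 / n                                   # frequency chosen to sit on an FFT bin

delta = np.zeros(n)
delta[n // 2] = 1.0                           # definite location, all frequencies
wave = np.exp(2j * np.pi * f0 * x)            # definite frequency, all of space
packet = np.exp(-x**2 / (2 * 8.0**2)) * wave  # Gaussian window times the wave

results = {name: space_and_frequency_spread(sig, x)
           for name, sig in [("delta", delta), ("wave", wave), ("packet", packet)]}

for name, (dx, df) in results.items():
    print(f"{name:>6}: spread in space = {dx:8.3f}, spread in frequency = {df:.4f}")
```

The windowed wave lands between the other two on both measures, which is what "more or less localized in both space and frequency" means in practice.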

(Stevens does not explain wavelets *at all*. You might try Wikipedia or Mathworld;
Gershenfeld's *Nature of Mathematical Modeling* has a good section on them.)

Stevens's basic argument is nicely summed up in his abstract. The visual areas
of the mammalian brain are arranged in a more or less hierarchical fashion,
with successive areas taking the activity of their predecessors as inputs, and
responding to more and more abstract features of the retinal image. ("More or
less hierarchical" because there are also important connections running
"backwards", from the higher-level areas to the lower-level ones. They're
important for things like gestalt effects, but tangential to the present
argument.) If the centers higher in the hierarchy are going to be sensitive to
shape, the computations carried out lower in the hierarchy had better not
eliminate information about shapes; this is particularly true of the primary
visual area, V1. This is *biologically* important, because the animal
needs to be able to do things like pick a mouse or another cat out of the
background of the scene, track it as it moves, etc., and these are all
shape-related properties. (The information is in the scene, but it needs to be
made accessible.) Making the common assumption that neurons communicate with
each other by changing the rate at which they produce electrical spikes
("rate-coding"), Stevens argues that the spiking rate of neurons in V1 has to
be a kind of wavelet transform of the retinal image. But his arguments are
really generic, and don't commit him to any of the many
different *kinds* of wavelets.

Here, at last, are the cats. Experimentally, neurons in the V1 area of cats have spiking rates which match very nicely one particular kind of wavelet ("Gabor functions", which are products of Gaussians and sine waves). Stevens points out that, if his analysis is correct, the parameters of those functions should have a certain non-trivial linear relationship to each other. On further examination of the data from the cat experiments, the fit looks at least reasonable. (Stevens doesn't report any quantitative measures of fit or significance.)
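
For readers who have not met a Gabor function, here is a minimal sketch of one in two dimensions: a Gaussian envelope multiplied by a sine wave running in a chosen direction. The parameter names and values are hypothetical, picked for illustration, and the "response" computed at the end (a plain sum of patch times image) is a crude stand-in of my own, not the model or the fit in Stevens's paper.

```python
import numpy as np

def grid(size):
    """Pixel coordinates centered on zero; `size` is assumed to be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return x, y

def grating(size, wavelength, theta):
    """A plain sine-wave pattern varying along direction `theta` (radians)."""
    x, y = grid(size)
    u = x * np.cos(theta) + y * np.sin(theta)   # distance along the wave direction
    return np.cos(2 * np.pi * u / wavelength)

def gabor(size, wavelength, theta, sigma):
    """Gabor function: a Gaussian envelope times a sine-wave grating."""
    x, y = grid(size)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * grating(size, wavelength, theta)

# Hypothetical parameter values, chosen only for illustration.
patch = gabor(size=33, wavelength=8.0, theta=0.0, sigma=4.0)

# Crude linear "response": overlap between the patch and a test image.
matched = np.sum(patch * grating(33, 8.0, 0.0))             # same orientation
orthogonal = np.sum(patch * grating(33, 8.0, np.pi / 2))    # rotated 90 degrees

print(f"matched: {matched:.2f}, orthogonal: {orthogonal:.4f}")
```

The patch overlaps strongly with a grating of its own orientation and wavelength and hardly at all with one turned by ninety degrees, which is the kind of orientation tuning that makes Gabor functions a natural description of V1 cells.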

There are a whole host of issues here. The one which comes first to my mind
is the assumption of rate-coding. There's just too much information in the *timing* of
the spikes for me to believe that rate-coding is the whole story, but it's
not clear how much of Stevens's story needs to be changed if we allow for
time-coding on the part of neurons in V1, or other visual areas. Still, it's a
neat idea, and worth taking seriously, and at least marginally connected
to cats.

Friday Cat Blogging; Minds, Brains, and Neurons; Mathematics

Posted at November 05, 2004 20:50 | permanent link