[From Peter Cariani (960509.1030)]
[From Rick Marken (960508.1830)]
You [Peter Cariani]
seem to be arguing that the temporal pattern of neural impulses
can be a "code" for a perception. For example, you suggest that a
temporal pattern of neural impulses can be a code for the perception
of pitch. I know that there is evidence pointing to both "place" and
"time" coding of pitch. But I am curious if you have ever considered
the implications of these theories of neural coding for a control
model of pitch.

One very common example of controlling pitch is tuning a musical
instrument (a guitar in my case). Have you ever tried to build a model
that brings one pitch perception into a match with a reference pitch?
The input to the control model would be a tone frequency that can be
varied by the output of the "tuner" (the control system). The
perceptual representation of this tone would, by your theory of pitch,
be a temporal "code" of the tone; this code would presumably change as
tone frequency changed as a result of the tuner's output (turning a
peg on the guitar). I presume that this temporal code would have to be
continuously compared to a reference signal of some kind (presumably a
temporal code of the desired pitch). The comparator would have to be
able to continuously measure the difference between these codes and
this difference would have to be turned into an output that increases
or decreases tone frequency, as necessary, to bring actual and
reference pitch (codes) into a match.

We know that people can tune guitars (control pitch); I would like to
see a model of tuning that is based on control of pitch codes. If
pitch were represented by signal magnitude (measured as neural firing
rate) such a control system would be trivially easy to build. I would
like to know what it takes to build such a control system on the basis
of temporal patterns of neural firings.
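Rick's rate-coded case really is only a few lines. Here is a minimal sketch of such a negative-feedback tuner, assuming (as he proposes) that the perceptual signal is simply a scalar magnitude that tracks tone frequency; the function names and loop gain are my own hypothetical choices:

```python
def perceive(frequency):
    """Rate-coded pitch: the perceptual signal is just a magnitude
    that varies monotonically with tone frequency (identity here)."""
    return frequency

def tune(f_start, f_reference, gain=0.3, steps=200):
    """Negative-feedback tuner: the comparator's error signal drives
    the output (turning the peg), which changes the tone frequency."""
    f = f_start
    for _ in range(steps):
        error = perceive(f_reference) - perceive(f)  # comparator
        f += gain * error                            # output adjusts frequency
    return f
```

Starting from any frequency, the loop converges on the reference pitch; the interesting question, as Rick notes, is what replaces the one-line comparator when the perceptual signal is a temporal code rather than a magnitude.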
Yes, I've thought about the control issues a little bit, mostly in
terms of the pitch-matching task, which is related to the tuning of
a musical instrument using a reference tone generator. Tuning by
musical interval can also be done simply in a temporally-coded system.
There are two major alternatives:
1) a time-to-place transformation/periodicity "detector" model:
This model is related to that of J.C.R. Licklider,
"A duplex theory of pitch perception," Experientia, 1951;
also in C. Cherry, ed., Information Theory, 1956.
A temporal pattern is "recognized" by a neural assembly
(in effect a time-delay neural network) that has the proper
internal delays to detect a particular pitch; one then "reads"
the output of that particular assembly, so that as the
fundamental frequency F0 (and hence pitch) approaches the
target during tuning, the output of this neural assembly is
optimized in some way (the output could take the form of firing
rate, response latency, synchrony of output, or some other
kind of time-structured signal). As one reduces the difference
between the remembered activation pattern (using whatever signal
vehicle) and the present one, the output of the assembly (however
defined) is maximized, such that one has an error-reducing
perceptual control process. This explanation has the merit
of using neural network explanations that are currently in play,
but it has the drawback of assuming either that individual
neural assemblies are precisely tuned (and one then needs a
whole array of them) or that, one way or another, connectivities
within the population take care of the problem
("distributed encoding"). Since nobody has observed
sharply-tuned F0 (pitch) detectors (except maybe in bats), and
an idiosyncratically-wired, distributed system has lots of problems
in getting all of the perceptual pitch equivalence-classes right,
these explanations are not very satisfying to me.
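The delay-and-coincide idea can be sketched in a few lines. This is my own toy illustration, not Licklider's actual model: autocorrelate an idealized spike train against delayed copies of itself, and the internal delay at which coincidences peak is the period 1/F0, so an assembly wired with that delay responds maximally:

```python
import numpy as np

def pulse_train(period_samples, n_samples):
    """Idealized spike train: one spike per period, binned at one
    sample per bin (all spike timing is exact here, unlike real neurons)."""
    x = np.zeros(n_samples)
    x[::period_samples] = 1.0
    return x

def coincidence_profile(spikes, max_lag):
    """Coincidence count between the train and a delayed copy of itself,
    one count per candidate internal delay (lag in samples)."""
    return np.array([np.dot(spikes[lag:], spikes[:-lag])
                     for lag in range(1, max_lag + 1)])

# A 100 Hz train at a 48 kHz sampling rate has a 480-sample period.
spikes = pulse_train(480, 48000)
profile = coincidence_profile(spikes, 1000)
best_delay = int(np.argmax(profile)) + 1   # recovers the 480-sample period
```

To cover the whole pitch range this way, one needs a bank of such delay-tuned assemblies, one per candidate period, which is exactly the "whole array of them" assumption noted above.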
2) a temporal memory trace/analog cross-correlation model:
If the time pattern of the waveform coming in is reflected in
the time pattern of the resulting spike trains and these
are "stored" in a reverberating circuit of some sort
(in this context one might conceive of the hippocampus as an
organ that receives signals and continuously rebroadcasts
them for some period of time), then all that one needs to do
is to perform a temporal cross-correlation
between the "remembered", reverberating pulse trains and the
new, incoming signal. If the two sounds have the same fundamental,
there will be a high degree of cross-correlation, so that the
firing rate of the cross-correlators is maximized; and/or, if
you broadcast your signal to a bank of cross-correlators, you
get back a temporally-structured signal containing the common
periodicities (the periodicities of the cross-correlations).
If there is no cross-correlation, you get nothing back; if the
pitch is close, you get a return signal whose duration is
related to the difference between the two pitches (see the next
sentence). The longer you listen to the signal and extend
the memory trace and the cross-correlation, the better will be
your temporal resolution (like a vernier). To tune an instrument,
one tries to maximize the cross-correlation between the remembered
sound (a temporal, pulse-coded memory trace) and the incoming
signal. (The general scheme is also related to notions of "adaptive
resonance", top-down facilitation of perception/"perceptual attention").
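The claim that the return signal's duration tracks the pitch difference can be checked with a toy calculation (my own illustration; the tolerance and frequencies are arbitrary assumptions): when two idealized spike trains drift past one another, their spikes coincide in bursts, and the closer the two fundamentals, the longer each burst of coincidences lasts:

```python
import numpy as np

FS = 48000   # samples per second
TOL = 24     # coincidence tolerance: 0.5 ms, in samples

def spike_times(freq, duration_samples):
    """Idealized spike train: one spike per cycle, times in samples."""
    n = int(duration_samples * freq / FS)
    return np.round(np.arange(n) * FS / freq).astype(int)

def longest_coincidence_run(t_ref, t_in):
    """Longest run of consecutive reference spikes each matched (within
    TOL) by an incoming spike -- the 'return signal' burst, in spikes."""
    hits = [np.min(np.abs(t_in - t)) <= TOL for t in t_ref]
    best = run = 0
    for h in hits:
        run = run + 1 if h else 0
        best = max(best, run)
    return best

window = 12000                               # 0.25 s memory trace
ref = spike_times(200.0, window)             # remembered 200 Hz sound
run_2pct = longest_coincidence_run(ref, spike_times(204.0, window))
run_6pct = longest_coincidence_run(ref, spike_times(212.0, window))
# run_2pct > run_6pct: the closer the pitch, the longer the burst,
# and at unison every reference spike is matched.
```

A tuner could therefore control pitch by acting to lengthen the coincidence burst until it fills the whole trace.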
For tuning by musical interval, unison (1:1) gives the highest
cross-correlation, octave (2:1) the next highest, the fifth
(3:2) the next, the fourth (4:3) the next, and so on. Each of these
musical intervals produces a characteristic interference pattern
(that can be seen in the form of the autocorrelation function of
the musical interval or chord.) [Ratios of time intervals are
simply compared in temporally-coded spike trains by using sets of
tapped delay lines having different conduction velocities.]
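The interval ordering itself can be demonstrated with another toy calculation (again my own illustration, not from any published model): the normalized zero-lag cross-correlation of idealized one-spike-per-cycle pulse trains falls off from unison to octave to fifth to fourth, because simpler frequency ratios produce more frequent spike coincidences:

```python
import numpy as np

N = 48000  # one second at 48 kHz; base period 480 samples = 100 Hz

def pulse_train(period, n=N):
    """Idealized spike train: one spike per cycle, binned."""
    x = np.zeros(n)
    x[::period] = 1.0
    return x

def norm_xcorr0(x, y):
    """Normalized coincidence count at zero lag."""
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))

base = pulse_train(480)                               # 100 Hz
scores = {
    "unison (1:1)": norm_xcorr0(base, pulse_train(480)),  # 1.00
    "octave (2:1)": norm_xcorr0(base, pulse_train(240)),  # ~0.71
    "fifth  (3:2)": norm_xcorr0(base, pulse_train(320)),  # ~0.41
    "fourth (4:3)": norm_xcorr0(base, pulse_train(360)),  # ~0.29
}
```

Coincidences occur once per common multiple of the two periods, so the simpler the ratio, the higher the score, which is the ordering given above.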
So, here one would have a system that recognizes "relative pitch"
better than "absolute pitch" (and this is not due to whether one
has attached linguistic labels to particular pitches). I like this
explanation because it does not require sharply-tuned elements or
assemblies, and it elegantly explains why particular frequency
ratios have their particular qualities. If this is what the
brain actually does, then all harmonic relations ("tonality")
are not [completely] the result of cultural conditioning
and/or auditory experience, but are a direct consequence
of the nature of the underlying neural codes and
the neural architectures that process them. The problems
with this scheme have more to do with lack of evidence than
anything else -- cortical neurophysiologists are mostly not
concerned with precise time-patterns, so they don't look for them;
and where precise time patterns have been found in cortical
responses (by Lestienne & Strehler and by Moshe Abeles), they are
embedded in other kinds of spike patterns (so one wouldn't ever
see them unless one were looking specifically for embedded
patterns, i.e. with an autocorrelation or complex-pattern
analysis). As far as I know, Boomsliter & Creel, "The long
pattern hypothesis in harmony and hearing", J. Music Theory,
5:2-31, 1962, is the only explanation of musical intervals
in these terms, and I know of no one else (besides myself)
who has proposed the notion of temporally-structured
memory-traces and their cross-correlations with
temporally-structured inputs. So take this (please!)
as a very tentative idea that can (eventually) be tested,
but has not yet been tested.
I'm sorry this is so long-winded (I get going and find it hard
to stop), but the upshot of it all is that one can do perceptual
control using any kind of signal (temporal or otherwise). I think
temporal coding has the advantage of being a very robust
scheme that is invariant to huge changes in level, auditory
location, onset/offset dynamics, s/n ratio, and competing
sounds. Multiplexing of time-coded signals permits
"broadcast" strategies for the coordination of many
asynchronous processes in a heterarchy, because one
no longer depends on "labelled lines" and precise point-to-point
connectivity. The representations are "sparse" in the time
domain and are therefore inherently "transparent" to each other,
and segmentation and binding can be achieved using commonalities
of time pattern. These kinds of properties have barely begun to be
exploited properly in audition, and I'm very sure they carry
implications for every other sense-modality, including vision.
(I believe that the basic concepts behind radio, radar, and
Fourier analysis will eventually come into the heart of the
theory of neural networks, and when they do, we will think of
these things in very different terms.)
Peter Cariani