[From Peter Cariani (960509.1030)]
[From Rick Marken (960508.1830)]
You [Peter Cariani]
seem to be arguing that the temporal pattern of neural impulses
can be a "code" for a perception. For example, you suggest that a
temporal pattern of neural impulses can be a code for the perception
of pitch. I know that there is evidence pointing to both "place" and
"time" coding of pitch. But I am curious if you have ever considered
the implications of these theories of neural coding for a control
model of pitch.

One very common example of controlling pitch is tuning a musical
instrument (a guitar in my case). Have you ever tried to build a model
that brings one pitch perception into a match with a reference pitch?
The input to the control model would be a tone frequency that can be
varied by the output of the "tuner" (the control system). The
perceptual representation of this tone would, by your theory of pitch,
be a temporal "code" of the tone; this code would presumably change as
tone frequency changed as a result of the tuner's output (turning a
peg on the guitar). I presume that this temporal code would have to be
continuously compared to a reference signal of some kind (presumably a
temporal code of the desired pitch). The comparator would have to be
able to continuously measure the difference between these codes and
this difference would have to be turned into an output that increases
or decreases tone frequency, as necessary, to bring actual and
reference pitch (codes) into a match.

We know that people can tune guitars (control pitch); I would like to
see a model of tuning that is based on control of pitch codes. If
pitch were represented by signal magnitude (measured as neural firing
rate) such a control system would be trivially easy to build. I would
like to know what it takes to build such a control system on the basis
of temporal patterns of neural firings.
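Rick's rate-coded case really is only a few lines. Here is a minimal sketch of such a negative-feedback tuner, assuming (as he proposes) that the perceptual signal is simply a scalar magnitude that tracks tone frequency; the function names and loop gain are my own hypothetical choices:

```python
def perceive(frequency):
    """Rate-coded pitch: the perceptual signal is just a magnitude
    that varies monotonically with tone frequency (identity here)."""
    return frequency

def tune(f_start, f_reference, gain=0.3, steps=200):
    """Negative-feedback tuner: the comparator's error signal drives
    the output (turning the peg), which changes the tone frequency."""
    f = f_start
    for _ in range(steps):
        error = perceive(f_reference) - perceive(f)  # comparator
        f += gain * error                            # output adjusts frequency
    return f
```

Starting from any frequency, the loop converges on the reference pitch; the interesting question, as Rick notes, is what replaces the one-line comparator when the perceptual signal is a temporal code rather than a magnitude.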
Yes, I've thought about the control issues a little bit, mostly in
terms of the pitch-matching task, which is related to the tuning of
a musical instrument using a reference tone generator. Tuning by
musical interval can also be done simply in a temporally-coded system.
There are two major alternatives:
1) a time-to-place transformation/periodicity "detector" model:
This model is related to that of J.C.R. Licklider,
"A duplex theory of pitch perception," Experientia, 1951;
also in C. Cherry, ed., Information Theory, 1956.
A temporal pattern is "recognized" by a neural assembly
(in effect a time-delay neural network) that has the proper
internal delays to detect a particular pitch; one then "reads"
the output of that particular assembly, so that as the
fundamental frequency F0 (and hence pitch) approaches the
target during tuning, the output of this neural assembly is
optimized in some way (the output could take the form of firing
rate, response latency, synchrony of output, or some other
kind of time-structured signal). As one reduces the difference
between the remembered activation pattern (using whatever signal
vehicle) and the present one, the output of the assembly (however
defined) is maximized, such that one has an error-reducing
perceptual control process. This explanation has the merit
of using neural network explanations that are currently in play,
but it has the drawback of assuming either that individual
neural assemblies are precisely tuned (and one then needs a
whole array of them) or that, one way or another, connectivities
within the population take care of the problem
("distributed encoding"). Since nobody has observed
sharply-tuned F0 (pitch) detectors (except maybe in bats), and
an idiosyncratically-wired, distributed system has lots of problems
in getting all of the perceptual pitch equivalence-classes right,
these explanations are not very satisfying to me.
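The delay-and-coincide idea can be sketched in a few lines. This is my own toy illustration, not Licklider's actual model: autocorrelate an idealized spike train against delayed copies of itself, and the internal delay at which coincidences peak is the period 1/F0, so an assembly wired with that delay responds maximally:

```python
import numpy as np

def pulse_train(period_samples, n_samples):
    """Idealized spike train: one spike per period, binned at one
    sample per bin (all spike timing is exact here, unlike real neurons)."""
    x = np.zeros(n_samples)
    x[::period_samples] = 1.0
    return x

def coincidence_profile(spikes, max_lag):
    """Coincidence count between the train and a delayed copy of itself,
    one count per candidate internal delay (lag in samples)."""
    return np.array([np.dot(spikes[lag:], spikes[:-lag])
                     for lag in range(1, max_lag + 1)])

# A 100 Hz train at a 48 kHz sampling rate has a 480-sample period.
spikes = pulse_train(480, 48000)
profile = coincidence_profile(spikes, 1000)
best_delay = int(np.argmax(profile)) + 1   # recovers the 480-sample period
```

To cover the whole pitch range this way, one needs a bank of such delay-tuned assemblies, one per candidate period, which is exactly the "whole array of them" assumption noted above.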
2) a temporal memory trace/analog cross-correlation model:
If the time pattern of the waveform coming in is reflected in
the time pattern of the resulting spike trains and these
are "stored" in a reverberating circuit of some sort
(in this context one might conceive of the hippocampus as an
organ that receives signals and continuously rebroadcasts
them for some period of time), then all that one needs to do
is to perform a temporal cross-correlation
between the "remembered", reverberating pulse trains and the
new, incoming signal. If the two sounds have the same fundamental,
there will be a high degree of cross-correlation, so that the
firing rate of the cross-correlators is maximized; and/or, if
you broadcast your signal to a bank of cross-correlators, you
get back a temporally-structured signal containing the common
periodicities (the periodicities of the cross-correlations).
If there is no cross-correlation, you get nothing back; if the
pitch is close, you get a return signal whose duration is
related to the difference between the two pitches (see the next
sentence). The longer you listen to the signal and extend
the memory trace and the cross-correlation, the better will be
your temporal resolution (like a vernier). To tune an instrument,
one tries to maximize the cross-correlation between the remembered
sound (a temporal, pulse-coded memory trace) and the incoming
signal. (The general scheme is also related to notions of "adaptive
resonance", top-down facilitation of perception/"perceptual attention").
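The claim that the return signal's duration tracks the pitch difference can be checked with a toy calculation (my own illustration; the tolerance and frequencies are arbitrary assumptions): when two idealized spike trains drift past one another, their spikes coincide in bursts, and the closer the two fundamentals, the longer each burst of coincidences lasts:

```python
import numpy as np

FS = 48000   # samples per second
TOL = 24     # coincidence tolerance: 0.5 ms, in samples

def spike_times(freq, duration_samples):
    """Idealized spike train: one spike per cycle, times in samples."""
    n = int(duration_samples * freq / FS)
    return np.round(np.arange(n) * FS / freq).astype(int)

def longest_coincidence_run(t_ref, t_in):
    """Longest run of consecutive reference spikes each matched (within
    TOL) by an incoming spike -- the 'return signal' burst, in spikes."""
    hits = [np.min(np.abs(t_in - t)) <= TOL for t in t_ref]
    best = run = 0
    for h in hits:
        run = run + 1 if h else 0
        best = max(best, run)
    return best

window = 12000                               # 0.25 s memory trace
ref = spike_times(200.0, window)             # remembered 200 Hz sound
run_2pct = longest_coincidence_run(ref, spike_times(204.0, window))
run_6pct = longest_coincidence_run(ref, spike_times(212.0, window))
# run_2pct > run_6pct: the closer the pitch, the longer the burst,
# and at unison every reference spike is matched.
```

A tuner could therefore control pitch by acting to lengthen the coincidence burst until it fills the whole trace.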
For tuning by musical interval, unison (1:1) gives the highest
cross-correlation, octave (2:1) the next highest, the fifth
(3:2) the next, the fourth (4:3) the next, and so on. Each of these
musical intervals produces a characteristic interference pattern
(that can be seen in the form of the autocorrelation function of
the musical interval or chord.) [Ratios of time intervals are
simply compared in temporally-coded spike trains by using sets of
tapped delay lines having different conduction velocities.]
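The interval ordering itself can be demonstrated with another toy calculation (again my own illustration, not from any published model): the normalized zero-lag cross-correlation of idealized one-spike-per-cycle pulse trains falls off from unison to octave to fifth to fourth, because simpler frequency ratios produce more frequent spike coincidences:

```python
import numpy as np

N = 48000  # one second at 48 kHz; base period 480 samples = 100 Hz

def pulse_train(period, n=N):
    """Idealized spike train: one spike per cycle, binned."""
    x = np.zeros(n)
    x[::period] = 1.0
    return x

def norm_xcorr0(x, y):
    """Normalized coincidence count at zero lag."""
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))

base = pulse_train(480)                               # 100 Hz
scores = {
    "unison (1:1)": norm_xcorr0(base, pulse_train(480)),  # 1.00
    "octave (2:1)": norm_xcorr0(base, pulse_train(240)),  # ~0.71
    "fifth  (3:2)": norm_xcorr0(base, pulse_train(320)),  # ~0.41
    "fourth (4:3)": norm_xcorr0(base, pulse_train(360)),  # ~0.29
}
```

Coincidences occur once per common multiple of the two periods, so the simpler the ratio, the higher the score, which is the ordering given above.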
So, here one would have a system that recognizes "relative pitch"
better than "absolute pitch" (and this is not due to whether one
has attached linguistic labels to particular pitches). I like this
explanation because it does not require sharply-tuned elements or
assemblies, and it elegantly explains why particular frequency
ratios have their particular qualities. If this is what the
brain actually does, then all harmonic relations ("tonality")
are not [completely] the result of cultural conditioning
and/or auditory experience, but are a direct consequence
of the nature of the underlying neural codes and
the neural architectures that process them. The problems
with this scheme have more to do with lack of evidence than
anything else -- cortical neurophysiologists are mostly not
concerned with precise time-patterns, so they don't look for them;
and where precise time patterns have been found in cortical
responses (by Lestienne & Strehler and by Moshe Abeles), they are
embedded in other kinds of spike patterns (so one wouldn't ever
see them unless one were looking specifically for embedded
patterns, i.e. with an autocorrelation or complex-pattern
analysis). As far as I know, Boomsliter & Creel, "The long
pattern hypothesis in harmony and hearing", J. Music Theory,
5:2-31, 1962, is the only explanation of musical intervals
in these terms, and I know of no one else (besides myself)
who has proposed the notion of temporally-structured
memory-traces and their cross-correlations with
temporally-structured inputs. So take this (please!)
as a very tentative idea that can (eventually) be tested,
but has not yet been tested.
I'm sorry this is so long-winded (I get going and find it hard
to stop), but the upshot of it all is that one can do perceptual
control using any kind of signal (temporal or otherwise). I think
temporal coding has the advantage of being a very robust
scheme that is invariant to huge changes in level, auditory
location, onset/offset dynamics, s/n ratio, and competing
sounds. Multiplexing of time-coded signals permits
"broadcast" strategies for the coordination of many
asynchronous processes in a heterarchy, because one
no longer depends on "labelled lines" and precise point-to-point
connectivity. The representations are "sparse" in the time
domain and are therefore inherently "transparent" to each other,
and segmentation and binding can be achieved using commonalities
of time pattern. These kinds of properties have barely begun to be
exploited properly in audition, and I'm very sure they carry
implications for every other sense-modality, including vision.
(I believe that the basic concepts behind radio, radar, and
Fourier analysis will eventually come into the heart of the
theory of neural networks, and when they do, we will think of
these things in very different terms.)
Peter Cariani