TEMPORAL VBLS - Absolute pitch, vowels, and auditory representations

Absolute pitch, vowels, and auditory representations (4/13/94)

···

--------------------------------------------------------------------------------

The sense that I am trying to deal with here is that for example, it
seems that there are people that actually do have an absolute sense for
audible pitch (at least over the partial range of hearing). [Bill Leach]

Our ability to distinguish vowel sounds is based on an absolute pitch
discrimination of the formant overtone regions, so it is possible that
people who can do absolute pitch with pure tones have merely learnt how
to extend an ability we all have. [Chris Malcolm, 4/11/94]

I have been working for the last four years on the problem of the coding
of acoustic information in the auditory system, specifically pitch and vowels.
The mechanisms underlying "absolute pitch" (in the few people who possess it)
are related to, but probably not the same as, the mechanisms by which most of
us hear musical or voice pitches.

I apologize that I have not been able to follow the discussion on temporal
variables more closely -- I hope what follows isn't redundant.

Our ability to distinguish vowels relies on a relatively low-resolution
spectral analysis (whether implemented via temporal autocorrelation or
spectral neural representations): formant frequencies can vary over a good
range (say, up to 20%) and we still hear the same vowel. To further cloud the
matter, the spectral peaks which reflect a formant in the power spectrum
change their spacing (which is set by F0), while the formants themselves
(F1, F2, F3, ...) are unchanged, so that the precise frequency location of a
formant may not be simple to estimate even if one has a clean spectral
representation. On the other hand, the perception of pitch is much more
accurate (on the order of 1%), and we readily separate pitches whose
fundamentals differ by a few percent.
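The point about harmonic spacing versus formant location can be sketched in a
few lines. This is a toy model, not anything from the auditory literature: a
single Gaussian "formant" envelope at an assumed 500 Hz (bandwidth and
harmonic count are also illustrative) sampled at harmonics of F0. As F0
changes, the strongest spectral peak jumps to whichever harmonic falls
nearest the formant, while the formant itself stays put:

```python
# Toy sketch (assumptions: one Gaussian formant envelope at 500 Hz; all
# numbers illustrative). Harmonics sit at multiples of F0, so the spectral
# peaks move with F0, while the formant (the envelope maximum) does not.
import numpy as np

def harmonic_amplitudes(f0, formant=500.0, bw=100.0, n=20):
    """Amplitude of each harmonic k*f0 under a fixed Gaussian formant envelope."""
    freqs = f0 * np.arange(1, n + 1)
    amps = np.exp(-0.5 * ((freqs - formant) / bw) ** 2)
    return freqs, amps

for f0 in (100.0, 130.0):
    freqs, amps = harmonic_amplitudes(f0)
    peak = freqs[np.argmax(amps)]
    print(f"F0={f0:.0f} Hz: strongest spectral peak at {peak:.0f} Hz")
```

At F0 = 100 Hz the peak lands exactly on 500 Hz; at F0 = 130 Hz it lands on
520 Hz, so the raw peak location misses the formant by 20 Hz even in this
noiseless case.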

Most pitch perception (of complex tones, as in speech and music) is
notoriously relative, and there is the question of why, if pitch perception
is based on "rate-place" mechanisms (i.e. spatial excitation patterns in
frequency maps), we so readily fuse the many components (which individually
would be heard as pure tones) into one Gestalt. Temporal theories of hearing
explain this very easily: the time structure of neural discharges in each
channel reflects the common fundamental, and those channels with the most
similar time structure (e.g. octave relationships) fuse most readily.
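The temporal account above can be illustrated with a plain autocorrelation
(all parameters here are my own illustrative choices, not anything from the
discussion): a complex of harmonics 2-4 of 200 Hz, with the fundamental
component itself absent, still yields its strongest autocorrelation peak at
the 5 ms period of the common fundamental -- the classic "missing
fundamental" pitch.

```python
# Sketch: autocorrelation of a harmonic complex peaks at the period of the
# common fundamental, even though each component alone would be a pure tone.
# (Assumptions: 16 kHz sample rate, 100 ms tone, harmonics 2-4 of 200 Hz.)
import numpy as np

fs = 16000
t = np.arange(0, 0.1, 1.0 / fs)
f0 = 200.0
# Harmonics 2-4 only: the 200 Hz fundamental component is absent
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in (2, 3, 4))

ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # autocorrelation, lags >= 0
lo, hi = int(fs / 1000), int(fs / 50)              # candidate pitches 50-1000 Hz
lag = lo + np.argmax(ac[lo:hi])
print(f"peak at {1000 * lag / fs:.2f} ms -> pitch ~ {fs / lag:.0f} Hz")
# -> peak at 5.00 ms -> pitch ~ 200 Hz
```

The lag search is restricted to plausible pitch periods so the trivial
near-zero-lag maximum is skipped.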

In the 1950's J.C.R. Licklider proposed a "duplex" neural architecture in
which every frequency channel also had a set of delay lines and coincidence
detectors, so that two kinds of complementary auditory maps were computed
(recently implemented in silicon analog VLSI; Lazzaro & Mead, PNAS,
86:9597-9601). There was a familiar spectral map if one summed the activity
in each frequency band, and an autocorrelation (delay) map if one summed the
activity at each lag position across frequency channels.
Periodicity pitch (the low pitch of complex tones, or musical pitch) is
explained by the temporal autocorrelation mechanism, while place pitch (of
pure tones in quiet) is explained by the frequency-place mechanism. I believe
that when there is a means of registering the two kinds of representations
(e.g. one is able to identify the "place" channel which would usually carry
a given time structure), then one could have a sense of "absolute pitch" for
complex tones. I imagine Licklider had this idea in his "duplex" theory of
hearing, but I haven't ever seen it explicitly stated. This conception would
be in general agreement with Chris Malcolm's suggestion that absolute pitch
is an ability which we all could potentially have with some amount of
practice, since the basic neural architecture itself would be similar in
each of us. I am currently working out more general, temporal
cross-correlation architectures, more or less in the spirit of Licklider
(1951, 1959) and Pitts & McCulloch (1947), which I think are in better
accord with the observed anatomy and physiology.
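The duplex scheme can be caricatured numerically. This is only my sketch of
the idea, with a strong simplifying assumption: the "frequency channels" are
idealized as the resolved harmonics themselves rather than outputs of a
cochlear filterbank. Each channel gets an autocorrelation row; summing within
a channel at lag zero recovers a spectral (place) profile, while summing
across channels at each lag yields a summary autocorrelation whose first
major peak marks the periodicity pitch:

```python
# Caricature of Licklider's duplex representation (assumption: channels are
# idealized single harmonics, not cochlear filter outputs; numpy only).
import numpy as np

fs, f0 = 16000, 200.0
t = np.arange(0, 0.1, 1.0 / fs)
# Idealized "filtered channels": one resolved harmonic (k*f0) per channel
channels = [np.sin(2 * np.pi * k * f0 * t) for k in (2, 3, 4)]

max_lag = int(fs / 50)                        # lags down to a 50 Hz pitch
acf = np.array([[np.dot(ch if lag == 0 else ch[:-lag], ch[lag:])
                 for lag in range(max_lag)] for ch in channels])

place_profile = acf[:, 0]                     # per-channel energy: the spectral map
summary = acf.sum(axis=0)                     # pooled across channels: the pitch map
lo = int(fs / 1000)                           # skip lags shorter than 1 ms
pitch_lag = lo + np.argmax(summary[lo:])
print(f"periodicity pitch ~ {fs / pitch_lag:.0f} Hz")
```

Registering the two maps -- knowing which place channel normally carries a
given time structure -- is the step I suggest above could underlie absolute
pitch for complex tones.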

In any case, don't ever take at face value what you read in the textbooks
(excepting B.C.J. Moore's books and one or two others) concerning the coding
of auditory information -- especially those textbooks which deal with speech
coding; most that I have seen take a very crude, dogmatic, and monolithic
view of neural coding (all rate-place codes and spectral pattern analysis,
generally without any mention of time codes and time-based analysis).
Caveat emptor.

Peter Cariani

_________________________________________________

Peter Cariani
Eaton Peabody Laboratory of Auditory Physiology
Massachusetts Eye & Ear Infirmary
243 Charles St, Boston, MA 02114

tel (617) 573-4243
email eplunix!peter@eddie.mit.edu