[From Bill Powers (980322.0121 MST)]
Bruce Nevin (980321.1457 EST)--
Houde, John F. and Jordan, Michael I.; Sensorimotor adaptation in speech
production. Science, _279_, 20 Feb. 1998, 1213-1216.
Houde's email address is houde@phy.ucsf.edu.
This is very cool! It's just what I've been wishing to be able to do, and
the technical means did not seem to be available.
You might want to get in touch with the authors.
HotBot doesn't turn up a site for Science ...
Science's web page is
www.sciencemag.org
This description is consistent with the model that I have sketched.
Speakers do not immediately adjust to correct the error; they learn. I
would like to know if they gradually perturbed vowels or if they introduced
changes in such a way that control could not have been continuous.
I wouldn't call this "learning," because the adaptation is always in the
right direction. I'd rather see it just as a slow control process.
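The "slow control process" reading can be sketched numerically: a simple integrating controller with a small gain always moves its output in the corrective direction, and over many trials it cancels a gradually applied disturbance without anything resembling trial-and-error learning. This is my own toy construction, not from the paper; the gain, the 100-trial ramp, and the units are all arbitrary assumptions.

```python
# Toy sketch (my construction): a slow integrating controller
# cancelling a formant shift that ramps up over 100 trials.
# Gain and ramp rate are illustrative assumptions.

def simulate(trials=200, slew=1.0 / 100, gain=0.05):
    """Disturbance ramps to 1.0 over the first 100 trials, then
    holds; the controller integrates the error with a small gain,
    so the output always moves in the corrective direction."""
    reference = 0.0   # intended (heard) vowel quality
    output = 0.0      # articulatory compensation
    history = []
    for t in range(trials):
        disturbance = min(t * slew, 1.0)   # slow ramp, then hold
        perceived = output + disturbance   # what the speaker hears
        error = reference - perceived
        output += gain * error             # slow integral control
        history.append(output)
    return history

h = simulate()
# h[-1] ends up close to -1.0: the shift is fully cancelled,
# and h never moves in the wrong direction along the way.
```

The point of the sketch is that "adaptation always in the right direction" falls out of any error-driven loop, however slow; no learning mechanism is needed to explain the directionality.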
The article actually does support your position, in that the new motor
patterns persist when auditory feedback is blocked by noise (the
"adaptation" part of the experiment). I didn't see any explicit data on
how quickly the "compensation" takes place, but they did apply the
disturbance slowly over a period of 17 minutes, and subjects said they
didn't notice the disturbance or their compensating change of articulation.
Oh, yeah, I forgot that you think tactile/kinesthetic control is primary
for controlling speech. I think auditory control is the higher level.
I'm probably being sloppy again. I'm overemphasizing the detachment of the
higher level of control from the lower level.
I agree that auditory control is at the higher level, setting reference
levels for tactile and kinesthetic control. Here's the difference that I'm
asserting: during speaking, reference levels for tactile/kinesthetic
control are fixed. They are not varied in order to keep the sounds of the
current syllable on track during the pronunciation of those sounds.
I don't think you mean that tactile/kinesthetic reference levels are fixed.
If they were, you could produce only one articulation, like saying [i] and
nothing else, forever. What is relatively fixed, I would guess, is the
perceptual input function that detects the configuration of the mouth. As
the organization of this input function slowly changes, the objective mouth
configuration corresponding to a given articulation reference signal
changes. This is the long-term adaptation. Short-term, however, one can
still rapidly alter the reference signal for which articulation is to be
produced, say along the scale from "eee" to "ooo", and the actual sensed
articulation will immediately follow the reference signal. So the auditory
feedback effect is very rapid, as rapid as one's ability to say consecutive
phonemes. The articulatory control must operate as rapidly as speech can
proceed.
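The distinction between a slowly recalibrating input function and a rapidly movable reference signal can be made concrete. In the sketch below (again my construction, with made-up gains), the input function is reduced to a single scalar "calibration" factor on the sensed articulation: the fast loop settles quickly for any reference, but the objective configuration it settles to depends on the calibration, so a slow drift in calibration shifts the articulation produced for the same reference.

```python
# Hypothetical sketch of a fast articulatory loop whose input
# function is a scalar calibration factor. All values are
# illustrative assumptions, not measurements.

def settle(ref, calibration, ticks=50, gain=0.8):
    """Run the fast loop to equilibrium: sensed value is the
    calibrated transform of the actual configuration."""
    actual = 0.0
    for _ in range(ticks):
        sensed = calibration * actual
        actual += gain * (ref - sensed)
    return actual

# Same reference signal, two calibrations of the input function:
print(settle(1.0, 1.0))   # objective configuration settles near 1.0
print(settle(1.0, 0.9))   # settles near 1.11: same reference,
                          # shifted objective articulation
```

At equilibrium the loop drives sensed = ref, so actual = ref / calibration: a drifted input function changes the mouth configuration produced for a fixed reference, which is exactly the long-term "adaptation" part, while short-term reference changes are still followed within a few ticks.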
If you do contact the authors, you might want to ask them how long the
compensation takes. Since they used a very slow disturbance, they might not
know, but they could certainly do a quick test to see how rapidly a
step-disturbance would be compensated. They used the analogy of the prism
goggles, and we know that in that case, compensation is very rapid,
although adaptation takes a long time. If the speech system works in a
similar way, we would expect a person to correct the heard sound quickly if
the initial articulation was in error. After all, if a person said "peg"
and heard "pig", there would be a large auditory error, and one would
expect the reference signal to be shifted quickly toward the articulation
for "peeg" in order to get the sound of "peg." This would apply to all
words, of course (what the authors call "generalizing").
In the experiment, subjects were asked to read from a list of words for a
very long time, to get the adaptation effect. By the end of this time, I
infer that speech was proceeding at a normal speed with the subjects
hearing the correct pronunciation but producing a shifted set of
articulations. I don't know what the effect would be if the subjects were
reading from continuous text or speaking spontaneously. Small disturbances
might be so quickly compensated that subjects wouldn't realize they were
articulating any differently. This is something to ask Houde about.
The adaptation could actually be in the output function of the auditory
level of control. After all, the disturbance is not mechanical, but
acoustical. One would have to think carefully about just where the
adaptation must be taking place. The disturbance is not like a mechanical
one that changes the dependence of the acoustical cavity shapes on muscle
lengths (like Demosthenes' stone in the mouth). This will require some more
thought.
The basis for this is twofold. First, experimentally blocking auditory
feedback produces no degradation of pronunciation;
This certainly tells us something, but it's like Blom's "walking in the
dark" example. People may use auditory feedback when it's available, and
switch to kinesthetic control when it's not. Before I will accept "no
degradation of pronunciation" I would want to see sonograms: human
listeners can tolerate large changes in auditory inputs.
and second, anecdotally
recalling how anaesthesia does result in degraded pronunciation even though
there is no impairment of hearing or of physical mobility of tongue and
lips. (Or would you say that reduction of kinesthetic/tactile feedback has
the effect of overwhelming control?)
The latter -- loop gain is determined by _all_ sensitivity factors around
the loop, multiplied together. However, all these perturbations are
legitimate ways to sort out where in the loop various effects are occurring.
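The multiplied-sensitivities point can be shown with the standard steady-state-error relation for a proportional loop, residual error = d / (1 + G), where G is the product of every sensitivity factor around the loop. The particular factor values below are made up for illustration; the one thing the sketch demonstrates is that numbing any single factor (as anaesthesia would do to the tactile one) collapses the whole loop gain.

```python
# Illustration (assumed factor values): loop gain G is the product
# of all sensitivities, so cutting any one factor degrades control.

def residual_error(disturbance, *sensitivities):
    """Steady-state error of a proportional loop: d / (1 + G)."""
    G = 1.0
    for s in sensitivities:
        G *= s
    return disturbance / (1.0 + G)

full   = residual_error(1.0, 10.0, 2.0, 3.0)   # G = 60, error ~1.6%
numbed = residual_error(1.0, 10.0, 0.1, 3.0)   # middle factor cut
                                               # 20x: G = 3, error 25%
```

So "reduction of kinesthetic/tactile feedback overwhelming control" is just the loop gain falling far enough that disturbances (and sloppy output) are no longer suppressed.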
Suppose you are correct. Suppose that, in real time, in the course of
speaking, we effect our control of auditory configurations and transitions
in the sounds of our speaking by adjusting reference levels for tactile and
kinesthetic perceptions at a lower level of the hierarchy.
Consider the sound-blocking evidence. When auditory feedback is blocked, I
suppose you would have to say that the reference levels for the successive
phonemes are supplied directly to the lower-level control systems from
memory, without being adjusted in real time by the higher-level auditory
control systems. These kinesthetic/tactile reference levels are fixed. Do
you have another account for the fact that pronunciation does not
deteriorate even during an hour or so of speaking without auditory feedback?
This is not an easy problem. The biggest puzzle is why, when auditory
feedback is swamped by noise, the outputs don't wildly exaggerate the
reference signal changes for the articulatory systems. This is what
inclines me to accept your idea of a purely kinesthetic channel (at several
levels) with the acoustic systems being used to trim the calibration of the
kinesthetic control systems. This is where Houde's method will be
invaluable. If it should prove that a step-disturbance is followed by a
very slow compensation that takes place only after many words have been
spoken, your basic concept would be clearly supported, and my idea of rapid
auditory control would be ruled out.
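The two predictions can be contrasted in a toy simulation (my construction, not Houde's method): the same error-driven loop, with a high gain standing in for rapid auditory control and a low gain standing in for slow calibration trimming, responds to a step disturbance on very different time scales. The gains and the 50-step horizon are arbitrary assumptions; only the qualitative contrast matters.

```python
# Sketch of the two predictions for a step-disturbance test:
# a high-gain loop compensates within a syllable or two, a
# low-gain loop only over many words. Gains are assumptions.

def compensation_curve(gain, steps=50):
    """Fraction of a unit step disturbance cancelled at each step."""
    output, curve = 0.0, []
    for _ in range(steps):
        error = 0.0 - (output + 1.0)   # unit step disturbance
        output += gain * error
        curve.append(-output)          # fraction cancelled so far
    return curve

fast = compensation_curve(0.8)    # rapid-auditory-control hypothesis
slow = compensation_curve(0.05)   # slow-calibration-trimming hypothesis
# fast passes 99% cancellation by the third step; slow is still
# under 15% there and needs dozens of steps to finish.
```

A step-disturbance experiment with Houde's apparatus would in effect be measuring which of these two curves real speakers produce.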
A clear answer to this question seems within reach, with Houde's apparatus.
The fundamental question remains, how to test the two hypotheses.
I think it can be done, with Houde's cooperation.
Best,
Bill P.