VOT experiment

[From: Bruce Nevin (Tue 920423 14:07:32)]

(Rick Marken (Tue, 23 Jun 1992 09:46:35 PDT) ) --

By the way, Martin (and other speech/language aficionados) -- any comments
about the sound adaptation study that I mentioned in an earlier post?

You mean this one:

(Rick Marken (920621) ) --

The reviewer pointed me to a book that is, indeed, relevant to
the topic of my paper -- it's called "The organization of
perception and action" (1987) by D. G. MacKay (a UCLA linguist. . . .
One interesting study that MacKay mentioned, which strongly
points to the importance of feedback (though he didn't see it
that way), showed that adapting a person with repeated presentation
of the sounds /pi/ or /ti/ led to a decrease in voice onset
time (VOT) when these perceptually adapted people were asked to
say /pi/ or /ti/. The study was done by Cooper and Nager and reported
in JASA (Journal of the Acoustical Society of America) in 1975.
Any linguists know about this? It sounds like there is a change
in output (VOT) in order to produce an intended perceptual result
(/pi/ or /ti/) via an adapted sensory input system. Is this
interpretation reasonable? It would have been nice if the
study had been done quantitatively -- maybe it was. For example,
the degree of change in VOT might be expected to depend on the
degree of adaptation.

I didn't understand this. If "decrease" in VOT means (as I think it
does) that voicing started sooner than in the corresponding unadapted
syllables /pi/ and /ti/, this sounds like adaptation results in a
relaxation of (reduction in gain for) control of contrast with the
corresponding voiced sounds (less contrast between adapted /pi/ or /ti/
and ordinary /bi/ or /di/, resp.). Alternatively, maybe the voiced
/bi,di/ would shift to relatively earlier VOT too--more "fully
voiced"--preserving contrast with shifted boundary. You ask whether
your interpretation is reasonable or not, but I am not clear what your
interpretation is. How is such a change purposeful "in order to produce
an intended perceptual result (/pi/ or /ti/) via an adapted sensory
input system"? What am I missing about the nature of adaptation?
Martin, can you help here?

I will look for the JASA article when I get the chance. Do you have
access to it?

  Bruce
  bn@bbn.com

[From Rick Marken (920623.1200)]

Bruce Nevin (Tue 920423 14:07:32)

You mean this one:

of the sounds /pi/ or /ti/ led to a decrease in voice onset
time (VOT) when these perceptually adapted people were asked to
say /pi/ or /ti/.

Yep.

I didn't understand this.

I had the same problem.

You ask whether
your interpretation is reasonable or not, but I am not clear what your
interpretation is.

Just that there is an apparent change in output in order to keep a sensory
result constant in the face of disturbance (in this case the disturbance
being the adaptation). I just didn't see how a reduction in VOT could
preserve the /pi/ or /ti/ perception so I was wondering if there was some
esoteric acoustical phonetic reason. I guess I'll have to look at the
original article to get a better description of what really happened. But
whatever it was I can't imagine why some characteristic of speaking (VOT)
would change after sensory adaptation unless it were being done to control
a perception. So I was looking for some acoustic/phonetic evidence that
would make the connection clear. Maybe the VOT change is just a response
to the adapting stimulation? That would be a nice result for open loop fans.

I wonder how the authors of this work explained their results?

I'll try to look at the original JASA article today.

Regards

Rick

**************************************************************

Richard S. Marken USMail: 10459 Holman Ave
The Aerospace Corporation Los Angeles, CA 90024
E-mail: marken@aero.org
(310) 336-6214 (day)
(310) 474-0313 (evening)

[Martin Taylor 920623 21:00]
(Rick Marken 920621 and Bruce Nevin 920423 14:07:32)

I've now looked at the Cooper and Nager (1975) article on adaptation of the
perception of Voice Onset Time (VOT). I don't think it is very relevant to
CSG, though another experiment of a similar kind might be. They had people
listen passively (with tongue clenched between the teeth) to 70 repetitions
of /i/ or of /r@pi/ (@ is a neutral unstressed vowel called a schwa).
Immediately afterward, the subject spoke /r@ti/ in one experiment, /r@pi/
in another. They found that if the subject had listened to /r@pi/ rather
than /i/, the voice onset time was shortened, particularly for /t/.

One problem with the study is that the "adapting" VOT for the (synthetic)
/r@pi/ was 50 msec, and the natural (after /i/) VOT for the subjects' /p/
averaged 57 msec, 7 of the 20 subjects being shorter than 50 msec. So there
wasn't much perceptual adaptation to be expected. And I checked the
individual subject data, which showed no relation between their individual
VOT average and the direction or amount of shift (If their own VOT was being
used as a reference standard that was being affected by the adaptation, one
would expect those with long VOT to reduce, and those with short VOT to
lengthen after adaptation. That didn't happen).

The only condition that resulted in an appreciable shift of VOT was when they
were adapted to /p/ and spoke /t/, which had a "natural" VOT of 75 msec.
Adapting to /b/ (VOT 0) and speaking /t/ showed no effect.

Make of this what you will. I don't see any obvious follow-up in PCT terms.
I do have problems with the whole "adaptation=fatigue" hypothesis on which
the experiment was based. If the synthetic /p/ had had an unnaturally short
VOT, but was still perceptually a /p/ rather than a /b/, I would have found
the study more interesting, and perhaps there might have been an effect on
spoken /p/. But as it is, one has an effect of adapting one phoneme on the
production of another, but not on the production of the adapted one. And
there seems no a priori reason why the adaptation should have done anything,
since the adapting stimulus had a VOT very near the natural VOT for that
phoneme.

But one could do an experiment of this kind, usefully, I think. I do not
propose to do it.

Martin

[From Rick Marken (920623.1830)]

Well, I got the JASA article --

Nothing to get excited about; the data is the typical
psychology junk -- mostly noise (but what would you expect
from psycho-acousticians (hee hee)).

First, thanks to Bruce Nevin for the info about phonemes. Bruce asks:

Is adaptation a disturbance? (Serious question. I don't understand
what is going on in situations to which people apply this term.)

I call it a disturbance because it affects the perception of the
controlled variable in such a way that action is required to maintain
that variable at a reference level. But it could also be seen as
something that affects the form of the feedback function, g(o),
that determines how outputs are related to perceptual inputs.
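
The two readings above can be compared in a minimal sketch, assuming a one-level loop with an integrating output function. The function name `run_loop`, the gains, and all numbers are illustrative assumptions in arbitrary units; nothing here comes from the Cooper/Nager data. In the first reading adaptation adds a term d to the perception, p = g(o) + d; in the second it changes the slope of g.

```python
def run_loop(reference, disturbance=0.0, feedback_slope=1.0,
             loop_gain=50.0, dt=0.01, steps=2000):
    """Simulate p = feedback_slope * o + disturbance with an integrating output.

    All parameter values are hypothetical, chosen only so the loop settles.
    Returns the final (output, perception) pair.
    """
    output = 0.0
    perception = disturbance
    for _ in range(steps):
        perception = feedback_slope * output + disturbance
        error = reference - perception
        output += loop_gain * error * dt  # integrate the error
    return output, perception


baseline = run_loop(reference=10.0)
disturbed = run_loop(reference=10.0, disturbance=-2.0)
reshaped = run_loop(reference=10.0, feedback_slope=0.8)

for label, (o, p) in [("baseline", baseline),
                      ("added disturbance", disturbed),
                      ("changed feedback slope", reshaped)]:
    print(f"{label:>24}: output = {o:6.2f}, perception = {p:6.2f}")
```

In both altered cases the output shifts while the perception returns to the reference, so under these assumptions either reading predicts a change in output with no lasting change in what is perceived.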

Now, to the Cooper/Nager study.

The basic finding is that adapting to a two syllable word
/repi/ leads to an AVERAGE 6 msec DECREASE in VOT. This is
not a dramatic effect. Ten of the 22 subjects show NO difference
in average VOT or an average INCREASE in VOT. The standard
deviation of VOT measures for each subject is bigger (often
by a factor of 2) than the average difference in each subject's
pre and post adaptation VOT scores.

What this means is that we are dealing with junk data. There
is certainly no evidence of a perceptual variable under control.
To go off and start modelling based on this data (as the
authors do) is simply absurd.

This experiment is a perfect example of the typical psychology
experiment; and it shows why, without understanding control,
much of the existing data in psychology is almost totally useless
to PCT, except by the most remote chance. In this experiment
the authors look at the effect of a stimulus variable (the
adapting stimulus) on an output variable (VOT). My
guess, based on the noisiness of the data (and the highly
variable way that the subjects' output relates to the stimulus),
is that VOT is only indirectly related to whatever a subject controls
when speaking the words in the experiment. I have no idea what
might actually be controlled in this experiment. The fact that
there is an average effect of adaptation on VOT suggests that
there is some weak relationship between VOT and a controlled
variable. But this is definitely not the way to go about figuring
out what the controlled variable might be.

The problem is not just the use of adaptation as a "disturbance".
It's that there is no hypothesis about a controlled variable or
measures of the quantitative status of that hypothesized variable.
I can't imagine starting to build a model of speech until I was
able to predict precisely what a person's response would be
to every disturbance to the hypothesized controlled variable.

---------

I just got Martin's post on the VOT study.
(Martin Taylor 920623 21:00)

I've now looked at the Cooper and Nager (1975) article on adaptation of the
perception of Voice Onset Time (VOT). I don't think it is very relevant to
CSG,

I think it does have some relevance. It is a great example of why most
of the existing psychological data is useless for studying control.
This stuff was published in a very prestigious journal. This is the
best that psychology has to offer. Believe me, we pretty much have
to start over.

One problem with the study is that the "adapting" VOT for the (synthetic)
/r@pi/ was 50 msec, and the natural (after /i/) VOT for the subjects' /p/
averaged 57 msec, 7 of the 20 subjects being shorter than 50 msec. So there
wasn't much perceptual adaptation to be expected. And I checked the
individual subject data, which showed no relation between their individual
VOT average and the direction or amount of shift (If their own VOT was being
used as a reference standard that was being affected by the adaptation, one
would expect those with long VOT to reduce, and those with short VOT to
lengthen after adaptation. That didn't happen).

But they said the effect was even bigger in this experiment than
in an earlier study using just syllables for adaptors. I also checked
the individual data -- there was a slight positive relationship
(r = .34) between subjects' own VOT average and the change -- just the
opposite of the result one would expect if the subjects were controlling
perceived VOT by adjusting VOT.
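
The sign argument can be checked with a small sketch. It assumes, hypothetically, that each subject's reference is pulled part of the way toward the 50 msec adapting VOT, and that "change" means post-adaptation VOT minus pre-adaptation VOT. The subjects, the pull fraction, and the noise pattern below are all invented for illustration; they are not the Cooper/Nager data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


ADAPTER_VOT = 50.0   # msec, the adapting value quoted above
PULL = 0.5           # hypothetical fraction of the gap closed by adaptation

# 22 made-up baseline VOTs (msec) and a fixed, alternating "noise" term.
baseline = [40.0 + 2.0 * i for i in range(22)]
noise = [3.0 if i % 2 == 0 else -3.0 for i in range(22)]

# change = post - pre under the assumed pull toward the adapter
change = [-PULL * (b - ADAPTER_VOT) + e for b, e in zip(baseline, noise)]

r = pearson_r(baseline, change)
print(f"correlation of baseline VOT with change: r = {r:.2f}")
```

Under these assumptions r comes out negative, which is the sense in which a positive r runs against the hypothesis that subjects control perceived VOT by adjusting VOT.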

But one could do an experiment of this kind, usefully, I think.

Do you mean an adaptation experiment or a stimulus/response
experiment like this one? I think what they should have done was measure
potentially controlled variables -- like spectrograms of the
words spoken after adaptation. These could have been compared
to spectrograms of words that were not spoken by the subject
but picked by each subject as the best exemplar of the intended word (after
adaptation). At least, that's one possibility; the idea is to try
to see what remains invariant in the input.

I agree that the study of the controlled variables in speech will
not be easy -- but the approach taken by Cooper/Nager (which
reflects no understanding of the concept of a controlled variable--
and is the approach taken in all psychological experiments in
all fields) is not likely to tell you much -- except that stimuli
have statistical effects on responses. Well, it does tell you that
VOT is probably NOT controlled (though there might be better ways
to tease this out). That IS something -- perhaps enough for JASA
but not enough for the Journal of Living Control Systems
(when it exists).

Regards

Rick

[Martin Taylor 920624 10:45]
(Rick Marken 920623.1830)

On the VOT study

I agree that the study of the controlled variables in speech will
not be easy -- but the approach taken by Cooper/Nager (which
reflects no understanding of the concept of a controlled variable--
and is the approach taken in all psychological experiments in
all fields) is not likely to tell you much -- except that stimuli
have statistical effects on responses. Well, it does tell you that
VOT is probably NOT controlled (though there might be better ways
to tease this out).

Leaving aside Rick's obligatory rant--how could one have expected Cooper and
Nager to have understood the concept of a controlled variable?--I don't
understand how Rick could possibly come to the conclusion that VOT is probably
NOT controlled. That conclusion would lead one to believe that people are
unable reliably to distinguish /b/ from /p/ in production (at least the
allophone without the burst). VOT is clearly perceived, even, apparently,
by chinchillas and newborn humans. It is a major contrast in the perception
of language, and I seem to remember even that some languages contrast it in
three levels rather than two (don't quote me on that). Certainly some
languages place the VOT boundaries at different timings from others, so the
discrete VOT timing contrasts are not quantitatively innate. I find it highly
unlikely that VOT is not a controlled percept.

The Cooper/Nager study is very strange, whether you are of the PCT persuasion
or not. I do not understand why any adaptation effects relating to VOT
should have been observed, unless they had to do with variability, since
the adapting VOT was very close to the natural VOT for that phoneme.
Nevertheless, they did observe a change in the VOT for the production of
another phoneme, 18 of the 22 subjects showing a decrease. (I'm not going
to be sidetracked into another "stale and unprofitable" discussion of
statistics at this point--but another day, perhaps, we can heat it up in the
microwave).

As for things being printed in prestigious journals--there's lots of garbage
printed in all journals. Whether something good is rejected or something bad
is printed depends so much on one or two individual reviewers and whether
they had a good breakfast the day they looked at the paper. I always try to
have a first read to get an overall impression, and then set it aside for
a couple of weeks to see whether a detailed look confirms that impression.
But I think many reviewers just skim it and say "OK" or "junk" without checking
the data or the analyses. I would not have recommended publishing Cooper/Nager
because of the mystery about the 50 msec VOT of the synthetic /p/, as well
as for other reasons.

Rick's suggestion about studying the spectrum seems less profitable than
studying the VOT. I am under the impression that the spectrum associated
with a phoneme is not a controlled variable, since the perception of it is
highly context-dependent. One can measure the physical acoustic spectrum,
but that doesn't necessarily tell you about the perceived timbre.

It's all very well to rail on about measuring stimuli and responses, but
they are at least measurable, even if the measurements don't tell you much
that's useful. To measure what is controlled, you have to intuit (empathize,
introspect, anthropomorphize) some complex that can be computed from physical
measurements, disturb it or the perception of it, and see whether the
disturbance is resisted by further measurements on the "same" variable (and
"same" is an issue in itself). If you intuit wrong, you will get partial or
no control, and more or less statistical variability. That wrongness may
be in your intuition as to what is controlled, or in the effect of your
disturbance on what is controlled. My hunch is that in the Cooper/Nager case
the wrongness is in the perceptual disturbance more than in the variable.
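
The procedure described above can be sketched as a variance comparison. This is an illustrative sketch, not the method of any cited study: it assumes the hypothesized variable is simply output plus disturbance, and the function name `stability_ratio` and the synthetic sine and cosine series are invented for the example. If the guess is right, the variable varies far less than it would if output and disturbance were unrelated.

```python
from math import sin, cos, pi
from statistics import pvariance


def stability_ratio(outputs, disturbances):
    """Observed variance of (output + disturbance) over the variance
    expected if output and disturbance were uncorrelated.

    Near 0: the disturbance is resisted. Near 1: no sign of control.
    """
    hypothesized = [o + d for o, d in zip(outputs, disturbances)]
    expected = pvariance(outputs) + pvariance(disturbances)
    return pvariance(hypothesized) / expected


N = 200
disturbance = [sin(2 * pi * i / N) for i in range(N)]

# Good guess: output opposes 95% of the disturbance (hypothetical).
opposing = [-0.95 * d for d in disturbance]
# Bad guess: output varies just as much but is unrelated to the disturbance.
unrelated = [cos(2 * pi * i / N) for i in range(N)]

print(f"good guess: {stability_ratio(opposing, disturbance):.3f}")
print(f"bad guess:  {stability_ratio(unrelated, disturbance):.3f}")
```

Intermediate ratios would correspond to the partial control mentioned above, whether the error lies in the guessed variable or in how the disturbance actually reaches it.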

Martin