contrast; dependence

[From Bruce Nevin (2003.06.05 12:27 EDT)]

Bill Powers (2003.06.04.2000 MDT)–

I have a problem translating Harris’ term “contrast” into quantitative language. At times it seems the word means a transition; at others, a ratio or a difference. Can you illuminate?

Contrast is an optimal differentiation of n terms within a perceptual ‘space’
that is (a) bounded by acoustic and physiological constraints (within the
larger bounds set by the limits of perceptual input functions) and (b)
punctuated by regions within which articulatory error does not produce
acoustic error.
Start with the vowels. Every language has at least the three [i], [a],
[u] at the apices of the ‘vowel triangle’. You can’t open the oral cavity
any farther than for [a], and you can’t move the narrowest constriction
between the tongue and the roof of the mouth any farther up and forward
than for [i] (or any farther up and back than for [u]) without causing
noise due to turbulence in the air stream. For what I have represented
here by [a] the tongue can move forward and back, and in some languages
that difference is used in addition to these three to contrast one
utterance from another. The ‘vowel space’ then is a trapezoid, narrower
at the bottom than at the top, and narrower at the back than at the
front. This is so whether you are plotting tongue positions or F1 vs. F2.
(F1 and F2 are the first and second formant in the acoustic signal. I
won’t define these here, we’ve discussed them before.) Within this space
other differentiations are made, more or fewer in various languages. If a
language has one additional vowel in the front, it is midway between [i]
and [a]; if it has two, they divide the space between [i] and [a] more or
less evenly.
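
As a toy illustration of that even division (a sketch only, in Python; the
corner values are rough ‘textbook’ F1/F2 averages I am assuming, not
measurements from any particular language or speaker):

```python
# Evenly dividing the front edge of the F1/F2 vowel space between
# [i] and [a].  Corner values are assumed round "textbook" averages
# for an adult male voice, not data.

I_CORNER = (280.0, 2250.0)   # [i]: low F1, high F2 (Hz)
A_CORNER = (700.0, 1200.0)   # [a]: high F1, lower F2 (Hz)

def front_vowel_targets(n_extra):
    """F1/F2 targets for [i], n_extra intermediate vowels, and [a],
    spaced evenly along the line joining the two corners."""
    steps = n_extra + 1
    targets = []
    for k in range(steps + 1):
        t = k / steps
        f1 = I_CORNER[0] + t * (A_CORNER[0] - I_CORNER[0])
        f2 = I_CORNER[1] + t * (A_CORNER[1] - I_CORNER[1])
        targets.append((round(f1), round(f2)))
    return targets

print(front_vowel_targets(1))  # one extra vowel: midway between [i] and [a]
print(front_vowel_targets(2))  # two extra vowels: thirds of the interval
```

Real systems space targets in auditory terms rather than raw Hz, so linear
interpolation is only a first approximation of ‘more or less evenly’.
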
We are talking here of ‘target’ positions, which are achieved in careful
pronunciation without conflict from ‘coarticulation’. Typically,
coarticulation effects are due to controlling to reach or approximate a
very different sound immediately before or after, with the speed of
syllable production fast enough that there is little time to move from
controlling one reference to controlling the next. All the vowels are
produced closer to the center of the vowel space under these conditions,
and more so if the syllable is unstressed. (I’ll leave the term ‘stress’
undefined here.)
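
The centralization effect can be caricatured the same way (again a sketch
only; the reduction coefficients are invented for illustration, and the
‘center’ is an assumed schwa-like point):

```python
# Toy model of vowel reduction: under fast or unstressed production,
# each (F1, F2) target drifts toward the center of the vowel space.

CENTER = (500.0, 1500.0)  # assumed schwa-like center of the space (Hz)

def reduced(target, amount):
    """Move an (F1, F2) target toward CENTER by `amount` (0..1)."""
    return tuple(round(v + amount * (c - v))
                 for v, c in zip(target, CENTER))

print(reduced((280, 2250), 0.2))  # careful, stressed: (324, 2100)
print(reduced((280, 2250), 0.5))  # fast, unstressed: (390, 1875)
```
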
The case is similar for consonants. However, instead of three apex vowels
(or ‘quantal’ vowels), there are regions of the oral cavity where a
centimeter of difference in the point of narrowest occlusion has little
or no effect on the trace of the consonant in the contours of formants.
These ‘quantal’ regions (the term is due to Kenneth Stevens of MIT, in a
very important paper of 1972) turn out to be the points of articulation
of consonants in all languages. There is an order of
preference, beginning with lip closure ([p], [b], [m], etc.), then the
gingivo-postdental region at the front of the mouth ([t], etc.), then the
velar region at the back of the mouth ([k], etc.), then the palatal
region between the latter two ([j], etc.), and on from there for the less
common retroflex with the tongue tip turned back (put a dot under the t,
n, etc.), postvelar or uvular [q], occlusion by the epiglottis (with some
constriction in the pharynx as a byproduct), and occlusion at the glottis
itself.
The location of the narrowest occlusion for either vowels or consonants
determines the length ratio of two resonant tubes, the pharyngeal tube
between the larynx and the occlusion, and the oral cavity between the
occlusion and the lips. For the vowel [a], a quarter-wave pressure
waveform between the larynx and the opening of the pharynx into the much
larger oral cavity (10 times the cross-sectional area) matches the
quarter-wave pressure waveform between that opening and the opening of
the lips. Since the lengths of these two tubes are about equal, the two
formants appear close together in the sound spectrum. It
doesn’t matter which of these two tubes is slightly longer: the longer
one generates F1 and the shorter one generates F2. Thus, production of
[a], with maximal opening of the oral cavity, generates an extra large
spectral peak with the first two formants together at a medial frequency.
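
This is easy to check with the quarter-wave formula for a tube closed at
one end and open at the other: its resonances fall at odd multiples of
c/4L. A minimal sketch, assuming a total tract length of about 17 cm split
into two roughly equal tubes (round numbers, not measurements):

```python
# Quarter-wave resonances of the two-tube model sketched above.

C = 35000.0  # speed of sound in warm moist air, cm/s (approximate)

def quarter_wave(length_cm, n_resonances=2):
    """First few resonances (Hz) of a closed-open tube of this length."""
    return [round((2 * k - 1) * C / (4.0 * length_cm))
            for k in range(1, n_resonances + 1)]

# For [a] the pharyngeal and oral tubes are about equal, ~8.5 cm each,
# so their lowest resonances coincide near 1 kHz: the extra-large
# spectral peak with F1 and F2 together at a medial frequency.
print(quarter_wave(8.5), quarter_wave(8.5))   # [1029, 3088] [1029, 3088]

# Perturb the ratio slightly: whichever tube is longer supplies F1.
print(quarter_wave(9.5), quarter_wave(7.5))   # [921, 2763] [1167, 3500]
```
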
Production of [i] produces a spectral peak at a high frequency, where F2
and F3 come together, and production of [u] produces a spectral peak with
F1 and F2 at a low frequency. For other vowels the formants are separated,
producing less concentrated peaks of energy in the acoustic spectrum. We
optimize the acoustic (and articulatory) distance between the vowels of
peen, pin, pain, pen, pan, palm, pun in order to distinguish these (and
other) utterances from one another, as means of controlling the other
person hearing the word that we intended them to hear. This optimization
of difference is what we mean by phonemic contrast.
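
One way to make ‘optimization of difference’ concrete: the quantity being
kept large is the smallest pairwise distance among the vowel targets. A
sketch with assumed American English formant averages (Euclidean distance
in raw Hz stands in for a proper auditory metric):

```python
# Smallest pairwise F1/F2 distance among assumed vowel targets.
from itertools import combinations
from math import hypot

VOWELS = {  # rough assumed averages (Hz), not measurements
    "peen": (280, 2250), "pin": (400, 1990), "pain": (480, 1820),
    "pen": (550, 1770), "pan": (690, 1660), "palm": (710, 1100),
    "pun": (640, 1190),
}

def min_pairwise_distance(targets):
    """Smallest distance between any two (F1, F2) targets, in Hz."""
    return min(hypot(a1 - b1, a2 - b2)
               for (a1, a2), (b1, b2) in combinations(targets.values(), 2))

# A well-dispersed system keeps this minimum large; crowding any two
# vowels together shrinks it and endangers the contrast.
print(round(min_pairwise_distance(VOWELS)))  # 86 with the values above
```
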
While the means for doing it are grounded in and limited by physiology
and acoustics, the actual inventory of phonemes is arbitrary and is
established by historically contingent convention in each language. It
should be clear from the above that there is considerable leeway in the
precise (acoustic and articulatory) location of each phoneme. Details of
pronunciation are subject to control of other perceptions, such as
belonging to this group of people as distinct from that group of people
over there, or claiming similarity to some esteemed person or group, and
so on.
So is contrast a transition, a ratio, a difference? I think the question
is misfocused. What perceptions can be controlled within the articulatory
and acoustic space so as to optimize the differences between phonemically
distinct utterances? There is nothing a priori that demands that these
controlled variables be of a given type, or even that they be all at the
same level. Is there? The first question is what perceptions could
possibly be controlled. Then the references for controlling these are
distributed more or less ‘equidistantly’, while taking advantage of
‘quantal regions’ where error in production is without effect. Contrast
is at root controlling to say (and to hear) this one rather than
that one, where each is made from various combinations of the same
phonemic constituents. Although the acoustic difference between [b] and
[g] lies in only about 30 ms of sound, it makes the contrast between
“Put a bag on that one” and “Put a gag on that one”.

I’m having the same problem with this new term, “dependence.” In
mathematics, y depends on x if y = f(x) (any function). What kind of
dependence are you referring to: dependence on magnitude, logical
dependence, cause-effect dependence? I don’t know how many kinds there
are, but there are lots.

For B to occur, A must already have occurred.

For B to be controlled, A must already have been controlled.

Is this really the same problem?

    /Bruce Nevin
