applying the Test to language

[From: Bruce Nevin (Fri 930217 13:24:53 EST)]

Aim

To determine what perceptions are controlled when native speakers of a
language pronounce words of that language.

    We are concerned here only with perceptions of pronunciation. We are not
    concerned with perceptions of meaning, context, etc. whereby hearers
    may more or less readily fill in gaps due to defective pronunciation,
    environmental noise, etc.

    This has the consequence that some pronunciations encountered in
    running speech may be excluded from our study at this stage, because
    they are not recognizable out of context.

Setup

Requirement: means to make substitutions among the segments of speech
sound heard in utterances.

    This may be by our own pronunciation, perhaps tested by sound
    spectrographic display of the original pronunciation as against our
    own, or it may be by some instrumental substitution such as tape
    splicing or some more sophisticated massaging of the digitized speech
    signal.

    "Segment" may refer to a stretch of the speech signal along the time
    axis, or it may refer to some feature or measured parameter of the
    speech signal measured or detected over a given stretch of the
    utterance.

    "Feature" may refer to analysis by other than acoustic means --
    for example, a feature of the articulatory processes by which we
    produce the speech sounds. Whatever features are used, they
    must be referable to perceptions by the speaker that the speaker
    could (in principle at least) control in the process of speaking.
    Examples could include perceptions of touch, joint angle,
    configuration, the relationship between areas of the tongue that
    touch (e.g.) the gingival ridge and adjacent areas that do not, and
    so on. However, acoustic features are clearly of great importance,
    given the largely social nature of talking.

Procedure

1. We record what informants consistently tell us are repetitions of the
same utterance. (This may take the form of what they consistently tell
us are repetitions of the same word in different utterances.)

"Consistently" means 100% of the time or nearly so.

2. We note features of these instances that are different. Because
native speakers consistently judge these utterances to be repetitions,
these differences are by definition incidental byproducts of control of a
perception of different words.

Alternatively, we may play back pairs of recordings of these
pronunciations and ask a native speaker whether they are pronunciations
of the same word.

    In a careful form, known as the Pair Test, the test is as follows:
    have one native speaker produce the two pronunciations in random
    order by indicating which to say by some previously established means
    (or: play back the recordings in random order). Have a second native
    speaker identify which is said each time. If the hearer is right
    only about 50% of the time (chance), the difference is not
    distinctive; if the hearer is right about 100% of the time, the
    difference is distinctive.
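
The scoring side of the Pair Test can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the two
"hearers" are hypothetical stand-ins for native speakers, the labels "A"
and "B" stand for the two pronunciations, and the binomial tail
probability is an added convenience for judging "about 50%" against
"about 100%", not part of the test as described above.

```python
import random
from math import comb


def run_pair_test(hearer, trials=40, seed=0):
    """Present items "A" and "B" in random order; count correct identifications."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        item = rng.choice(["A", "B"])   # which pronunciation the speaker is cued to say
        if hearer(item, rng) == item:   # the hearer's identification of what was said
            correct += 1
    return correct


def chance_probability(correct, trials):
    """Probability of at least `correct` right answers from pure guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def reliable_hearer(item, rng):
    """Hypothetical hearer for whom the difference is distinctive."""
    return item


def guessing_hearer(item, rng):
    """Hypothetical hearer who cannot hear the difference and guesses."""
    return rng.choice(["A", "B"])


if __name__ == "__main__":
    n = 40
    for name, hearer in [("reliable", reliable_hearer), ("guessing", guessing_hearer)]:
        c = run_pair_test(hearer, trials=n)
        print(name, c, "of", n, "correct; chance probability:", chance_probability(c, n))
```

A score near half the trials leaves a large chance probability (the
difference is not distinctive); a score near all of the trials makes
guessing an implausible explanation (the difference is distinctive).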

3. We produce other pairs of different utterances by substituting one
segment for another. In place of some segment in one of the recorded
utterances we substitute some perceptibly different segment from some
other utterance. We test these with native speakers as above to see if
they are pronunciations of the same word. (One or both may be nonsense
syllables.)

    By this procedure, we identify each inter-utterance difference with a
    choice of one segment vs. other possible segments at a given location
    in the utterance.
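
As a rough symbolic sketch of the substitution step, assume each
utterance has already been cut into a sequence of segments. The letter
labels below are hypothetical placeholders for stretches of a recording;
in the actual procedure the substitution is made in the speech signal
and the judgment of "same" or "different" comes from native speakers,
not from comparing labels.

```python
def substitute(utterance, position, donor, donor_position):
    """Return a copy of `utterance` with one segment replaced by a segment from `donor`."""
    spliced = list(utterance)
    spliced[position] = donor[donor_position]
    return tuple(spliced)


def contrast_locations(a, b):
    """List the positions at which two equal-length segment sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of segments")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hypothetical segment labels standing in for stretches of two recordings.
cup = ("k", "u", "p")
cut = ("k", "u", "t")

# Replace the final segment of "cup" with the final segment of "cut".
spliced = substitute(cup, 2, cut, 2)
print(spliced)
print(contrast_locations(cup, spliced))
```

The spliced form differs from the original at exactly one location,
which is the sense in which each inter-utterance difference is
identified with a choice of segment at a given location.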

4. Result

The segments can be used as a representation for the contrasts between
different utterances.

Different analyses into segments are possible, such that each
inter-utterance difference is in one-one correspondence with a segment,
and vice versa. The different analyses into segments may differ as to
their practical usefulness for different purposes. Some are more
efficient and elegant from a mathematical point of view but unwieldy as
a practical orthography, for example; some other analysis into segments
serves that purpose better.

However, the several possible representations of utterances by segments
are equivalent in just the sort of way that divergent repetitions of an
utterance are equivalent for the native speaker and hearer. The
differences between them are inconsequential as regards repetition and
contrast of utterances. Under each such analysis, the segments provide a
unique representation for each utterance that native speakers judge to be
distinct. Repetitions are represented by the same arrangement of
segments, and the inconsequential differences between them are not
represented by segments. From any given representation as segments one
may recover the original utterance, up to uncertainty as to the
inconsequential, non-contrastive differences between repetitions.

The pair test and substitution tests demonstrate that the speaker and
hearer control perceptions of identifiable points of difference between
words. This may be informally tested by misunderstanding a speaker, or
asking e.g. "Did you say cup or cut?" and hearing them emphasize the
point of difference in their reply.

Further Discussion: Contrast

The fact that there is no unique analysis into segments, and that
alternative analyses are equivalent as representations of these
identifiable points of difference between words (and as representations
of whole words and utterances, of course), suggests that speakers control
perceptions of the contrasts between words.

A claim that speakers control perceptions of words, without controlling
perceptions of contrasts (that is, of precisely those segments or
features that make the contrasts between words), must explain how native
speakers control nonsense syllables just as well as words.

The suggestion that speakers control perceptions of the contrasts between
words is supported by the diversity of means that may be used to effect a
given point of difference between words, and the diversity of perceived
differences that a given choice of means may effect. For example, in
ordinary pronunciation the difference between voiced and voiceless stops
(such as p vs. b) is effected by aspiration at the beginnings of words,
where e.g. b is voiceless unaspirated and p is voiceless aspirated (bin
vs. pin). Intervocalically, b is voiced and p is voiceless unaspirated
(slobber vs. slipper). Syllable-finally, b is voiced and p is voiceless
with a voiceless release, but not aspirated (sob vs. sop). As noted in
another discussion, only one member of the pair occurs after
syllable-initial s, and it is pronounced voiceless unaspirated (spin).

                  Voiced      Voiceless      Voiceless
                              unaspirated    aspirated
                              /released
    =====================================================
    1.                        bin            pin
    2.            slobber     slipper
    3.            sob         sop            sop
    4.                        spin

The differences among voicing, voicelessness, and aspiration are a
function of the voice onset time (VOT) in a following vowel, as well as
of the cessation of voicing at the onset of the segment. In each of the
first three pairs of words (rather, in the pairs of segments that make
the difference between those pairs of words and between many other pairs
of words), speakers appear to control a relationship of VOT being earlier
relative to the other member of the pair vs. later relative to the other
member of the pair. The contrast is controlled not in absolute terms but
in relative terms. The difference may be exaggerated when the
contrastive segment is emphasized. Where there is no "other member of
the pair" (no contrastive pair of segments, no contrast), the segment can
scarcely be emphasized. Try saying "I said spin, not spin!" Compare
this with "I said pin, not bin!"
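
The point about relative rather than absolute control can be made
concrete with a small Python sketch. The VOT numbers are placeholders
chosen only to match the qualitative description above (negative values
meaning that voicing begins before the release); they are not
measurements. The fixed threshold is likewise a hypothetical foil,
included to show what an "absolute" criterion would do.

```python
# Illustrative voice onset times in milliseconds. These numbers are
# placeholders chosen to match the qualitative description in the text;
# they are not measurements.
VOT_MS = {
    "word-initial": {"b": 10, "p": 60},    # bin vs. pin
    "intervocalic": {"b": -80, "p": 10},   # slobber vs. slipper
}


def earlier_member(pair):
    """Return the member of a contrasting pair whose voicing starts earlier."""
    return min(pair, key=pair.get)


def classify_absolute(vot_ms, threshold_ms=30):
    """Classify by a fixed (hypothetical) threshold, ignoring context."""
    return "b" if vot_ms < threshold_ms else "p"


for context, pair in VOT_MS.items():
    print(context,
          "| earlier member:", earlier_member(pair),
          "| fixed-threshold label for p:", classify_absolute(pair["p"]))
```

With these placeholder values the relative comparison picks b as the
earlier member in both contexts, while the fixed threshold labels the
intervocalic p as b, because that p has the same VOT as the
word-initial b.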

The means for effecting contrast shift toward the voiceless aspirated end
of the range of VOT choices syllable-initially (pin/bin) and toward the
voiced end otherwise. This is for contemporary native speakers of
English. These shifts are by no means constrained by acoustic or
physiological necessity. The "same" contrast of voiced vs. voiceless
sounds is effected differently in other languages, and in the non-native
English pronounced by native speakers of those other languages. For
example, in Spanish syllable-initial b/p is much like native English
intervocalic b/p: fully voiced vs. voiceless unaspirated. And the choice
of voiceless unaspirated for the segment after syllable-initial s is not
constrained by acoustic or physiological factors, as may be seen in
combinations like "moss pot". It is, however, the midpoint between the
extremes of the VOT range for English, in a case where there is no
contrast to be maintained.

Further Discussion: sets of features

It has been a fundamental tenet of Generative linguistics that there is a
universal set of features that may be used as means of contrasting
utterances in any given language. Not all problems encountered in this
project have been resolved. Nonetheless, a more or less standard set of
distinctive features is assumed in the literature. It is understood that
there may be a diversity of phonetic means for effecting a given
phonological feature, differing between languages and differing between
phonological contexts (as above) within a given language.

Means are needed for disturbing features of speech while providing means
for native speakers to resist those disturbances.

    Bruce Nevin