[From Bill Powers (931016.0815 MDT)]
Bruce Nevin, Martin Taylor, Avery Andrews, et al.: I should
thank all of you for your patience with me. My pronouncements on
linguistics must sound to you like some of those on PCT that we
have seen from newcomers: full of misplaced confidence and well-
intended misinterpretations. I can only hope that in return my
naivete may accidentally reveal points of view that could be
explored profitably, if not by me. I'm really trying to stick to
PCT-relevant points, but sometimes stray beyond my competence.
Bruce Nevin (931015.1415) --
> Terminal phoneme in an utterance with silence following: why
> ask me? Use your senses.
Well, I sort of did. But you have me a bit cowed with your
finely-honed linguist's perceptions -- who am I to say what I am
really hearing myself saying? And I'm really not in a position to
observe the conversations of others: most of my conversations are
with a computer, and don't involve many phonemes (an occasional
"Damn!").
I realized something this morning in pondering your post. I think
we (I) have some confusion between spelling and pronunciation. In
the vocal world, there is no such thing as spelling: when we try
to write words to show the way they are said, we spell them using
symbols whose sounds we are supposed to know. What brought this
up was "the sound of 'r' in throw, and the sound of 'tt' in
butter, ladder, and better." If these sounds are really the same
(and it does seem to me that they are in casual speech), then
we're simply misspelling the words. If the point were to write
words to indicate exactly how they are pronounced, then all these
words would be spelled with the same visual configuration in the
position of the so-called 'r' and 'tt' and 'dd'.
We would then find that we have different words that mean the
same thing: "butter" and "budder" and "bu[xx]er" are different
words but ALL mean a yellow substance we can spread on bread.
These words also can mean different things: a ram with a
reputation for butting, and a rose bush that buds well. If we can
really successfully indicate all of these meanings by saying
"bu[xx]er," then it's not the subtle differences in pronunciation
that matter, but something at a higher level, such as relations
to other words or context. If I say "I like bu[xx]er on my
toast," you know which meaning of the word I'm using. You also
know what I mean if I say "That damned sheep is a real bu[xx]er."
This would indicate that in the input function of the event-level
system involved in recognizing or producing this word, we would
have this little sub-computation at the position in the event
following detection of the "bu" part and preceding detection of
the "r" part:
                         enabled by prev. stage
                 -------           |
         dd --->|       |<---------+
                |       |
         tt --->|   +   |-----+---> element of event
                |       |     |
         xx --->|       |     v
                 -------   enable next stage
As these three sounds would never occur simultaneously, adding
together the signals representing them would be the equivalent of
a logical OR. Any one of these three sounds at the input would
result in a signal indicating that an acceptable perception had
occurred at that point in the event, and if the sounds before and
after were also acceptable, the event-perceiver would emit its
"my-word-occurred" signal.
This "word" would, however, still be devoid of meaning, so
perhaps Martin Taylor (who wants a morpheme or word to be a
"meaningful" unit of speech) might not yet give it the status of
a word. We can indicate that by calling it a word-event signal,
reserving the unmodified term "word" for the outcome of a higher-
level process that creates a link between the word-event and one
of the possible perceptual meanings, and emits a signal saying
that "my category occurred." That signal, produced equally by a
word-event and its co-categorical meanings, is a "meaningful
word."
When we distinguish butter from budder, we must somehow develop
another perceptual function (or alter the existing one on the
fly) so we now have two of them involving different sub-
computations:
                 -------
                |       |
                |       |
         tt ----|   +   |--------> we(a)
                |       |
         xx ----|       |
                 -------

                 -------
         dd ----|       |
                |       |
                |   +   |--------> we(b)
                |       |
         xx ----|       |
                 -------
Now there are two word-event components, a and b. They would be
parts of different word-event detectors. If the sound is xx, the
meaning must be assigned strictly from context because both word-
event signals will appear. If the sound is dd or tt, only one
word-event signal will appear and the meaning can become less
ambiguous. The meaning can seldom be completely unambiguous,
because few word-events have precisely one perceptual meaning:
even a clearly and crisply hyperarticulated "butter" could still
mean either an animal or an animal product.
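The pair of detectors above can be sketched directly (sound labels follow the diagrams; the binary coding is an illustrative assumption):

```python
def word_events(sound):
    """Return (we_a, we_b) for a given middle sound, following the two
    detectors diagrammed above: we(a) is the 'butter' detector (tt or
    xx), we(b) the 'budder' detector (dd or xx)."""
    tt = 1 if sound == "tt" else 0
    dd = 1 if sound == "dd" else 0
    xx = 1 if sound == "xx" else 0
    return tt + xx, dd + xx
```

With the sound xx, both detectors fire and the meaning must be assigned from context; with tt or dd, only one word-event signal appears.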
In the above, I'm assuming a level of perception (to the left) at
which dd, tt, and xx are always distinctly and separately
perceived when the input sounds permit. At that level, the sounds
are all quite different. I speak, of course, of the adult
organization; the lower discriminations might not be so fine
initially (as I believe you have been trying to tell me, Bruce).
I think this partly solves one of our problems, even though
alternative schemes might work equally well. We now don't have to
worry about how different sounds can be perceived as the same
word-event. At one level, the sounds can be perceived as quite
different (by a trained linguist, or any person who has learned
to attend closely to the sensation level), while at the same time
the resulting word-event signals, at another level, are in fact
identical (not just "similar").
There remains the problem of how different sounds can result in
the same sensations, which I understand is also a fact. This is
the problem of embedded sounds, sounds occurring in the middle of
words. By the cut-and-paste approach (which I agree elegantly
gets around all problems of artificial perceptual functions but
which, unfortunately, I can't implement), it can be established
that the same sound is heard differently when embedded in
different surrounding sounds. This suggests some sort of global
normalizing process that somehow stretches or compresses the
sensation space to bring all sensations under a standard metric,
thus putting signals misplaced relative to each other back into
their proper relationships. I swear that that bit of verbiage
does correspond to an idea in my head, which I will try to
describe.
A close analogy exists in the visual mode. Consider the word
"ill" printed without the dot over the i. If I simply type
l
can you tell whether that is an l or an i? Right now you can, but
suppose I had the ability to type everything at twice the x and y
size. I could now type
* *
* *
* * *
* * *
or
* *
* *
* *
* *
* * *
* * *
* * *
* * *
Now the "i" in the second case is identical to the "l" in the
first case, yet it's perceived as different.
This says that in the visual realm, scaling may be removed before
identification takes place. By "removed" I mean correcting to a
standard size, so tiny things are blown up and huge things are
minified, thus presenting to the next level a set of sensations
that vary within one common framework.
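A toy version of this pre-identification rescaling, assuming glyphs represented as lists of row strings (the representation and the resampling rule are my own illustrative choices):

```python
def normalize_height(glyph, standard=4):
    """Rescale a glyph (a list of row strings) to a standard number of
    rows by integer resampling, so that identification downstream sees
    shape independent of size: tiny glyphs are blown up, huge ones
    minified."""
    n = len(glyph)
    return [glyph[(i * n) // standard] for i in range(standard)]
```

A stroke typed at double size, ["*"] * 8, normalizes to the same 4-row shape as a normal-size stroke ["*"] * 4, so the next level receives sensations varying within one common framework.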
My general concept of levels of perception is that each level
contains computing processes typical of that level, common to ALL
systems of that level. In particular they are common across
modalities, including vision and audition. So if a scaling
process exists for vision, the same type of process would be
available to apply to signals at the same level that happen to
represent sounds. This brings me to your plot of formant 1
against formant 2 (F1 against F2) for different vowels.
Suppose we set the origin of a coordinate system at "schwa." Now
the position of each vowel in this space would be a vector of a
certain length and direction based at schwa (this is probably
what you've been talking about). You indicated on one plot that i
and I might move from their nominal positions toward schwa, so
that i came to be near the position formerly occupied by I, with
I now moving closer to schwa and thus remaining separated from i.
You also indicated that casual speech tends to move ALL vowels
closer to schwa.
With the coordinate system centered on schwa, this amounts to a
change of scale. The whole space contracts around the center at
schwa. If the sensitivity of the perceptual functions (that
distinguish the difference between the coordinates of vowels and
the coordinates of schwa) were ALL adjusted upward, the sensation
space could be expanded to compensate for the contraction, and
the resulting perceptual signals representing all the vowels
would be restored to their standard positions in this space. This
would remove the distortion produced by casual speech (if that's
the only distortion), and keep the signals entering the next
level in a standardized form that relieves the next level of the
task of compensating for scale.
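The schwa-centered expansion can be written in a few lines; all coordinate values here are illustrative stand-ins, not measured formant data:

```python
def rescale_vowels(vowels, schwa, gain):
    """Expand each (F1, F2) point away from schwa by a common gain,
    compensating for the contraction of casual speech."""
    f1s, f2s = schwa
    return {v: (f1s + gain * (f1 - f1s), f2s + gain * (f2 - f2s))
            for v, (f1, f2) in vowels.items()}
```

If casual speech halves every vowel's distance from schwa, a gain of 2 restores the standard positions in F1-F2 space.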
This is not only like the geometric re-scaling that occurs in
perceiving "ill" above, but like the rescaling that occurs in
Land's color vision theory. "Schwa" in the realm of sound
corresponds to "gray" in the realm of color vision, and the
change in scaling of perceived sound-differences corresponds to
the changed weighting of color-differences. The result is to
maintain the perceptual field in a standard condition even under
systematic changes of lighting, or systematic changes in the
patterns of received sounds. Even at the sensation level, the
perceptual field as a whole looks just as it did before, in the
respect that matters. The re-scaling signal, of course, would be
available separately as an indicator of differences in scaling.
This says that if you hear sounds that are physically the same as
being different, you are actually perceiving different sounds:
the re-scaling that restores ALL sounds to a standard position in
F1-F2 space will actually change a sound-perception that would
have been the same without the rescaling. After the re-scaling
the perceptual signals that would have been the same are now
actually different. The higher systems are not required to
imagine a difference: it is actually there.
Implementing this sort of re-scaling is simple in principle. What
you need is a perceptual function that receives copies of all
perceptual signals from the sensation-generators in question and
sums them, and a single control system that controls the sum to
match a fixed reference value. The control is achieved by
simultaneously raising or lowering the sensitivity of all the
input functions at the same time (which alters all the
coordinates in proportion). Each perceptual signal would have to
have the coordinates of "schwa" subtracted from it before
summing. In fact, the center of rescaling could also easily be
under control.
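A minimal one-axis sketch of that control system (all numbers, gains, and the averaging rule are illustrative assumptions, not part of the proposal itself):

```python
def adapt_sensitivity(samples, schwa, reference, steps=50, slowing=0.005):
    """Single control system for one formant axis: perceive the mean
    gain-weighted deviation of incoming samples from schwa, compare it
    with a fixed reference, and slowly adjust the common sensitivity to
    cancel the error. The small slowing factor keeps the loop slow,
    since the 'sum' is built from signals spread out in time."""
    gain = 1.0
    for _ in range(steps):
        perception = sum(gain * abs(s - schwa) for s in samples) / len(samples)
        error = reference - perception
        gain += slowing * error  # raise or lower all input sensitivities together
    return gain
```

Samples whose mean deviation from schwa is 100 units, against a reference of 200, drive the common gain to 2.0. A separate copy of this system per formant axis would rescale each dimension independently.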
This system has to operate relatively slowly, because the "sums"
are derived from signals that don't all occur at the same time.
There are some hints that this re-scaling might occur
independently along the different axes of the perceptual space.
When you sit off the centerline of a TV set, retinal images of
objects have the same height but less width than the centerline
view would give. But after a little while, you "forget" this
difference: the picture looks quite normal. The horizontal
dimension alone has been re-scaled.
In the realm of sound, there may be dialectal or individual
differences that distort the formant axes differently, so the
rescaling required is different on the two axes. That would
simply require one sum-of-coordinates control system for each
axis.
This sounds do-able, doesn't it? Could you send me a table of the
formant coordinates for the vowels so I don't have to try to
derive them from the ASCII plots? I will try to get that cited
article, but as you know that's a ponderous process. I just need
a little information, so I can go on living dangerously.
I am, by the way, accepting your judgment that my apparatus may
simply be too crude to pick up reliable differences in the
spectrograms. But the above proposal would probably apply just as
well to any other model of the way intensities are converted into
sensations, as long as the coordinate axes could be
distinguished. The formant approach is attractive because you say
that there is a reasonable correspondence with articulator
configurations -- that would make control easy. That
correspondence, however, might also be found with the outputs of
other perceptual functions. We might as well keep looking.
---------------------------------------------------------------
Martin Taylor (931015.1415) --
> Bill's comments concern a long-standing open issue, highlighted
> by the language discussion. It is whether it is possible, or
> even normal, that new ECSs can be inserted between pre-existing
> ECSs that have a higher-lower relationship.
Actually I do believe that new control systems can appear at a
given level, even after both higher and lower systems exist. This
isn't the nature of the controversy as I see it. The real problem
is whether a higher system can even interact with a system two
levels below. In my mind, a new intervening system would be
needed precisely because the higher system can't issue reference
signals to the lower level that would have any effects that would
make sense to the higher system. But the higher system couldn't
even exist unless there were some intervening systems; the gap
can never be empty, and if it is empty, all the higher levels
must be missing (or incapable of acting).
This is tied in with my idea that each new level introduces a new
logical type of controlled variable, not just a scaled-up or more
complex version of the types already existing at the lower
levels. The sense of motion is not just a scaled-up or more
complex configuration or sensation. An event is not just a
transition in a new direction or a bigger or more complex
configuration. And a relationship is not just a new kind of
event, transition, or configuration.
Each new level introduces a new dimension of control that doesn't
even exist at the lower levels (except implicitly, for the basic
degrees of freedom have to exist even if they're not explicitly
sensed or controlled). The step from one level to the next
involves translations such that an error at the higher level can
be converted directly into an appropriate reference signal at the
next lower level. I don't believe that this is possible when you
try to skip levels. That is, an error in an event may translate
directly into a change in a transition, but with no transitions
involved at all, I don't believe it could be converted into an
appropriate reference level for configuration. The transition
information is vital for knowing WHICH configuration must be
changed, and WHEN during the event. If you know that "spohled" is
"spoiled" mispronounced (in your opinion), how can you convert
the simple one-dimensional event-error signal into the
placement of an "ee" to create the diphthong-transition that you
want, without the transition-level perception? The event-error
signal just isn't of the right type to translate directly into
the required rate of change of configuration _at the right time_.
The transition-level control is _required_.
So I would claim that it's just not possible to have an event-
control level and a configuration-control level, but no
intervening transition-control level. I'm claiming that the event
level simply can't control except through a transition level: any
larger gap can't be bridged in a control hierarchy.
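A toy cascade (everything here is my own hypothetical illustration, not a claim about the actual neural computations) shows the sense in which each level's output must be the next level's reference:

```python
def run_event(rate_refs, x0=0.0, dt=1.0, gain=0.8):
    """Three-level cascade: the event level issues a timed sequence of
    transition (rate-of-change) references; the transition level turns
    each rate reference into a configuration reference; the
    configuration level closes the remaining gap. Each level's output
    is the reference input of the level just below -- no level is
    skipped."""
    x = x0
    trajectory = [x]
    for r_rate in rate_refs:          # event level: WHICH rate, and WHEN
        x_ref = x + r_rate * dt       # transition level: rate -> config reference
        x += gain * (x_ref - x)       # configuration level: act on the reference
        trajectory.append(x)
    return trajectory
```

A bare event error carries no timing or rate information; it is the intermediate rate references that say which configuration change to make and when.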
In fact this was more or less a constraint on defining the levels
in the first place. I was always looking for what seemed the
_least_ step upward that could be taken. Just why one step seemed
small enough and another seemed too large I can't elucidate. In
some cases, intermediate levels (like relationships) came into
being because I saw suddenly that "you can't get here from
there." Way back in my subconscious there is an excellent and
simple way of stating what the requirement is for the least
permissible step, either upward or downward. Like Fermat, all I
have is the conviction that this proof exists; beyond that, the
subterranean deponent sayeth not, curse him.
There is a vague notion that might give this idea at least an
aura of respectability (or maybe the deponent hath deigned to
drop a hint). Each new level of control operates in a world made
of the degrees of freedom created (made explicit) by the next
lower level, but takes advantage of ways in which that world can
still change that are not under control at the lower level. So
the transition level takes advantage of the fact that a
configuration-level system doesn't perceive or control WHEN or
HOW FAST its configuration changes. The configuration-level
system will make the configuration match the given reference
signal as if there had never been any other reference signal; the
fact that this reference signal is changing in time, and
accelerating or decelerating, in no way affects the configuration
control system. All that is forbidden is that two configuration-
level systems be told to produce mutually-exclusive states of the
sensation world. The dimension of transition is left undefined in
the configuration world.
Introducing the transition level defines that dimension (or
cluster of dimensions) and places it under control, by connecting
the outputs of the new transition control systems to the
reference inputs of the configuration control systems. This is
entirely appropriate because transitions are perceptually derived
from successive states of configurations: the types are exactly
right both upgoing and downgoing. This truly seems to be a
minimum step size.
Now, with transition perceptions in existence, there is a new
world in which the dimensions are all transitions. The event
level then takes advantage of the fact that individual
transitions are not controlled in the dimension of when or in
what patterns they will be brought about. Each transition-level
system treats its own task as though the current reference signal
is the only one that has ever existed; what came before it or
after it, or what other transitions were doing at the same time,
makes no difference. And those are the exact respects in which
the event level can vary the world of transitions without fear of
contradiction. Furthermore, events are perceptually derived from
transitions. Again we seem to have an exactly appropriate
relationship both at input and at output, and what seems a truly
minimum step size.
Perhaps the muse has indeed muttered a grudging word. This had
not occurred to me before. The reason that a system of level n+1
can't interact directly with a system of level n-1 is that it
would have to incorporate all the functions of level n in order
to do so. Level n is required in order to create the dimensions
from which perceptions of level n+1 are derived, and to translate
error signals of level n+1 correctly into changes of reference
signals for level n-1. Those error signals would not be
appropriately related to reference signals of level n-1, nor
could perceptual signals of level n+1 be derived directly (by
simple means) from those of level n-1.
Thus if there is a system of level n+1 that can act directly
through systems of level n-1, then the higher system must already
contain the perceptual and output processes that can bridge the
gap; in effect, level n already exists in the higher system.
I consider this unlikely, because it says that the single higher
system that would first exist is by far more complex than any
later subsystem that will ever exist, being capable of converting
raw intensity signals into variables many layers of dimensions
removed from the lowest level, and performing the reverse
conversion to create specific muscle tensions many layers below
-- and in a way that will relate properly to the highest level of
variables. This is the very reason, I think, that a hierarchy
exists rather than just one huge equivalent transfer function.
Others have conjectured in the same way -- I think it was Newell
who said that a hierarchical structure is the only practical way
of achieving complex behavior. Only by stacking up individually
simple types of systems can complexity be managed.
-----------------------------------
> I'm not convinced by the argument in dense networks, in which
> the perceptual representations are necessarily coarse-coded and
> distributed. In such networks, conflict is endemic, and in
> itself cannot force permanent reorganization. Likewise, since
> neural signals cannot go negative (as Bill pointed out in an
> important posting (920722.0800)), all functioning two-way ECSs
> work because of internal conflict between the "push" one-way
> ECS and its partnered "pull" ECS.
I don't think we can identify "conflict" with every case in which
signals are simply added together. The critical factor is whether
the adding takes place in a way that prevents one or more control
systems from correcting their errors. This is most likely to take
place when the vector of lower-level reference signals created by
one control system is coaxial with and opposed to the vector
created by another one. Even then, if the higher reference
signals are coordinated (in push-pull as you mention), then it
remains possible for both control systems to operate correctly
when called upon: they're simply never called upon to exert large
opposing actions in a way that drives them into saturation. When
properly coordinated, such control systems never experience large
chronic errors. It is the large chronic error that calls forth
reorganization, according to our current assumptions.
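A minimal sketch of the push-pull arrangement (gains and references are hypothetical): two one-way systems whose output signals cannot go negative share one variable, yet with coordinated references neither accumulates a large chronic error.

```python
def push_pull_step(x, ref, gain=0.5):
    """One 'push' system can only raise x, one 'pull' system can only
    lower it (neural signals cannot go negative). Because both share
    the same coordinated reference, at most one acts at a time and
    neither is driven into saturation."""
    push = max(0.0, gain * (ref - x))   # fires only when x is below reference
    pull = max(0.0, gain * (x - ref))   # fires only when x is above reference
    return x + push - pull

def settle(x, ref, steps=60):
    """Iterate the push-pull pair until the shared variable settles."""
    for _ in range(steps):
        x = push_pull_step(x, ref)
    return x
```

From either side of the reference, the shared variable converges with small transient error; neither one-way system is ever called on to exert a large opposing action.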
Also, I should note that conflicts that do create large chronic
errors are not a problem if reorganization commences as it should
and succeeds in eliminating the error. True problems arise when
reorganization fails because the process gets trapped into some
sort of endless loop, or creates other, greater conflicts. The
basic question that should be asked about every client in
psychotherapy is not "What's your conflict," but "Why hasn't your
conflict been resolved by this time?" Conflict is a perfectly
normal state, but so is conflict resolution.
---------------------------------------------------------------
Best to all,
Bill P.