[From Bill Powers (931016.0815 MDT)]
Bruce Nevin, Martin Taylor, Avery Andrews, et al.: I should
thank all of you for your patience with me. My pronouncements on
linguistics must sound to you like some of those on PCT that we
have seen from newcomers: full of misplaced confidence and well-
intended misinterpretations. I can only hope that in return my
naivete may accidentally reveal points of view that could be
explored profitably, if not by me. I'm really trying to stick to
PCT-relevant points, but sometimes stray beyond my competence.
Bruce Nevin (931015.1415) --
> Terminal phoneme in an utterance with silence following: why
> ask me? Use your senses.
Well, I sort of did. But you have me a bit cowed with your
finely-honed linguist's perceptions -- who am I to say what I am
really hearing myself saying? And I'm really not in a position to
observe the conversations of others: most of my conversations are
with a computer, and don't involve many phonemes (an occasional
"Damn!").
I realized something this morning in pondering your post. I think
we (I) have some confusion between spelling and pronunciation. In
the vocal world, there is no such thing as spelling: when we try
to write words to show the way they are said, we spell them using
symbols whose sounds we are supposed to know. What brought this
up was "the sound of 'r' in throw, and the sound of 'tt' in
butter, ladder, and better." If these sounds are really the same
(and it does seem to me that they are in casual speech), then
we're simply misspelling the words. If the point were to write
words to indicate exactly how they are pronounced, then all these
words would be spelled with the same visual configuration in the
position of the so-called 'r' and 'tt' and 'dd'.
We would then find that we have different words that mean the
same thing: "butter" and "budder" and "bu[xx]er" are different
words but ALL mean a yellow substance we can spread on bread.
These words also can mean different things: a ram with a
reputation for butting, and a rose bush that buds well. If we can
really successfully indicate all of these meanings by saying
"bu[xx]er," then it's not the subtle differences in pronunciation
that matter, but something at a higher level, such as relations
to other words or context. If I say "I like bu[xx]er on my
toast," you know which meaning of the word I'm using. You also
know what I mean if I say "That damned sheep is a real bu[xx]er."
This would indicate that in the input function of the event-level
system involved in recognizing or producing this word, we would
have this little sub-computation at the position in the event
following detection of the "bu" part and preceding detection of
the "r" part:
                         enabled by prev. stage
                 -------           |
         dd --->|       |<---------+
                |       |
         tt --->|   +   |-----+---> element of event
                |       |     |
         xx --->|       |     v
                 -------   enable next stage
As these three sounds would never occur simultaneously, adding
together the signals representing them would be the equivalent of
a logical OR. Any one of these three sounds at the input would
result in a signal indicating that an acceptable perception had
occurred at that point in the event, and if the sounds before and
after were also acceptable, the event-perceiver would emit its
"my-word-occurred" signal.
This "word" would, however, still be devoid of meaning, so
perhaps Martin Taylor (who wants a morpheme or word to be a
"meaningful" unit of speech) might not yet give it the status of
a word. We can indicate that by calling it a word-event signal,
reserving the unmodified term "word" for the outcome of a higher-
level process that creates a link between the word-event and one
of the possible perceptual meanings, and emits a signal saying
that "my category occurred." That signal, produced equally by a
word-event and its co-categorical meanings, is a "meaningful
word."
When we distinguish butter from budder, we must somehow develop
another perceptual function (or alter the existing one on the
fly) so we now have two of them involving different sub-
computations:
                 -------
                |       |
                |       |
         tt ----|   +   |--------> we(a)
                |       |
         xx ----|       |
                 -------

                 -------
         dd ----|       |
                |       |
                |   +   |--------> we(b)
                |       |
         xx ----|       |
                 -------
Now there are two word-event components, a and b. They would be
parts of different word-event detectors. If the sound is xx, the
meaning must be assigned strictly from context because both word-
event signals will appear. If the sound is dd or tt, only one
word-event signal will appear and the meaning can become less
ambiguous. The meaning can seldom be completely unambiguous,
because few word-events have precisely one perceptual meaning:
even a clearly and crisply hyperarticulated "butter" could still
mean either an animal or an animal product.
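The pair of detectors above can be sketched directly (sound labels follow the diagrams; the binary coding is an illustrative assumption):

```python
def word_events(sound):
    """Return (we_a, we_b) for a given middle sound, following the two
    detectors diagrammed above: we(a) is the 'butter' detector (tt or
    xx), we(b) the 'budder' detector (dd or xx)."""
    tt = 1 if sound == "tt" else 0
    dd = 1 if sound == "dd" else 0
    xx = 1 if sound == "xx" else 0
    return tt + xx, dd + xx
```

With the sound xx, both detectors fire and the meaning must be assigned from context; with tt or dd, only one word-event signal appears.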
In the above, I'm assuming a level of perception (to the left) at
which dd, tt, and xx are always distinctly and separately
perceived when the input sounds permit. At that level, the sounds
are all quite different. I speak, of course, of the adult
organization; the lower discriminations might not be so fine
initially (as I believe you have been trying to tell me, Bruce).
I think this partly solves one of our problems, even though
alternative schemes might work equally well. We now don't have to
worry about how different sounds can be perceived as the same
word-event. At one level, the sounds can be perceived as quite
different (by a trained linguist, or any person who has learned
to attend closely to the sensation level), while at the same time
the resulting word-event signals, at another level, are in fact
identical (not just "similar").
There remains the problem of how different sounds can result in
the same sensations, which I understand is also a fact. This is
the problem of embedded sounds, sounds occurring in the middle of
words. By the cut-and-paste approach (which I agree elegantly
gets around all problems of artificial perceptual functions but
which, unfortunately, I can't implement), it can be established
that the same sound is heard differently when embedded in
different surrounding sounds. This suggests some sort of global
normalizing process that somehow stretches or compresses the
sensation space to bring all sensations under a standard metric,
thus putting signals misplaced relative to each other back into
their proper relationships. I swear that that bit of verbiage
does correspond to an idea in my head, which I will try to
describe.
A close analogy exists in the visual mode. Consider the word
"ill" printed without the dot over the i. If I simply type
l
can you tell whether that is an l or an i? Right now you can, but
suppose I had the ability to type everything at twice the x and y
size. I could now type
* *
* *
* * *
* * *
or
* *
* *
* *
* *
* * *
* * *
* * *
* * *
Now the "i" in the second case is identical to the "l" in the
first case, yet it's perceived as different.
This says that in the visual realm, scaling may be removed before
identification takes place. By "removed" I mean correcting to a
standard size, so tiny things are blown up and huge things are
minified, thus presenting to the next level a set of sensations
that vary within one common framework.
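A toy version of this pre-identification rescaling, assuming glyphs represented as lists of row strings (the representation and the resampling rule are my own illustrative choices):

```python
def normalize_height(glyph, standard=4):
    """Rescale a glyph (a list of row strings) to a standard number of
    rows by integer resampling, so that identification downstream sees
    shape independent of size: tiny glyphs are blown up, huge ones
    minified."""
    n = len(glyph)
    return [glyph[(i * n) // standard] for i in range(standard)]
```

A stroke typed at double size, ["*"] * 8, normalizes to the same 4-row shape as a normal-size stroke ["*"] * 4, so the next level receives sensations varying within one common framework.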
My general concept of levels of perception is that each level
contains computing processes typical of that level, common to ALL
systems of that level. In particular they are common across
modalities, including vision and audition. So if a scaling
process exists for vision, the same type of process would be
available to apply to signals at the same level that happen to
represent sounds. This brings me to your plot of formant 1
against formant 2 (F1 against F2) for different vowels.
Suppose we set the origin of a coordinate system at "schwa." Now
the position of each vowel in this space would be a vector of a
certain length and direction based at schwa (this is probably
what you've been talking about). You indicated on one plot that i
and I might move from their nominal positions toward schwa, so
that i came to be near the position formerly occupied by I, with
I now moving closer to schwa and thus remaining separated from i.
You also indicated that casual speech tends to move ALL vowels
closer to schwa.
With the coordinate system centered on schwa, this amounts to a
change of scale. The whole space contracts around the center at
schwa. If the sensitivity of the perceptual functions (that
distinguish the difference between the coordinates of vowels and
the coordinates of schwa) were ALL adjusted upward, the sensation
space could be expanded to compensate for the contraction, and
the resulting perceptual signals representing all the vowels
would be restored to their standard positions in this space. This
would remove the distortion produced by casual speech (if that's
the only distortion), and keep the signals entering the next
level in a standardized form that relieves the next level of the
task of compensating for scale.
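The schwa-centered expansion can be written in a few lines; all coordinate values here are illustrative stand-ins, not measured formant data:

```python
def rescale_vowels(vowels, schwa, gain):
    """Expand each (F1, F2) point away from schwa by a common gain,
    compensating for the contraction of casual speech."""
    f1s, f2s = schwa
    return {v: (f1s + gain * (f1 - f1s), f2s + gain * (f2 - f2s))
            for v, (f1, f2) in vowels.items()}
```

If casual speech halves every vowel's distance from schwa, a gain of 2 restores the standard positions in F1-F2 space.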
This is not only like the geometric re-scaling that occurs in
perceiving "ill" above, but like the rescaling that occurs in
Land's color vision theory. "Schwa" in the realm of sound
corresponds to "gray" in the realm of color vision, and the
change in scaling of perceived sound-differences corresponds to
the changed weighting of color-differences. The result is to
maintain the perceptual field in a standard condition even under
systematic changes of lighting, or systematic changes in the
patterns of received sounds. Even at the sensation level, the
perceptual field as a whole looks just as it did before, in the
respect that matters. The re-scaling signal, of course, would be
available separately as an indicator of differences in scaling.
This says that if you hear sounds that are physically the same as
being different, you are actually perceiving different sounds:
the re-scaling that restores ALL sounds to a standard position in
F1-F2 space will actually change a sound-perception that would
have been the same without the rescaling. After the re-scaling
the perceptual signals that would have been the same are now
actually different. The higher systems are not required to
imagine a difference: it is actually there.
Implementing this sort of re-scaling is simple in principle. What
you need is a perceptual function that receives copies of all
perceptual signals from the sensation-generators in question and
sums them, and a single control system that controls the sum to
match a fixed reference value. The control is achieved by
simultaneously raising or lowering the sensitivity of all the
input functions at the same time (which alters all the
coordinates in proportion). Each perceptual signal would have to
have the coordinates of "schwa" subtracted from it before
summing. In fact, the center of rescaling could also easily be
under control.
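A minimal one-axis sketch of that control system (all numbers, gains, and the averaging rule are illustrative assumptions, not part of the proposal itself):

```python
def adapt_sensitivity(samples, schwa, reference, steps=50, slowing=0.005):
    """Single control system for one formant axis: perceive the mean
    gain-weighted deviation of incoming samples from schwa, compare it
    with a fixed reference, and slowly adjust the common sensitivity to
    cancel the error. The small slowing factor keeps the loop slow,
    since the 'sum' is built from signals spread out in time."""
    gain = 1.0
    for _ in range(steps):
        perception = sum(gain * abs(s - schwa) for s in samples) / len(samples)
        error = reference - perception
        gain += slowing * error  # raise or lower all input sensitivities together
    return gain
```

Samples whose mean deviation from schwa is 100 units, against a reference of 200, drive the common gain to 2.0. A separate copy of this system per formant axis would rescale each dimension independently.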
This system has to operate relatively slowly, because the "sums"
are derived from signals that don't all occur at the same time.
There are some hints that this re-scaling might occur
independently along the different axes of the perceptual space.
When you sit off the centerline of a TV set, retinal images of
objects have the same height but less width than the centerline
view would give. But after a little while, you "forget" this
difference: the picture looks quite normal. The horizontal
dimension alone has been re-scaled.
In the realm of sound, there may be dialectal or individual
differences that distort the formant axes differently, so the
rescaling required is different on the two axes. That would
simply require one sum-of-coordinates control system for each
axis.
This sounds do-able, doesn't it? Could you send me a table of the
formant coordinates for the vowels so I don't have to try to
derive them from the ASCII plots? I will try to get that cited
article, but as you know that's a ponderous process. I just need
a little information, so I can go on living dangerously.
I am, by the way, accepting your judgment that my apparatus may
simply be too crude to pick up reliable differences in the
spectrograms. But the above proposal would probably apply just as
well to any other model of the way intensities are converted into
sensations, as long as the coordinate axes could be
distinguished. The formant approach is attractive because you say
that there is a reasonable correspondence with articulator
configurations -- that would make control easy. That
correspondence, however, might also be found with the outputs of
other perceptual functions. We might as well keep looking.
---------------------------------------------------------------
Martin Taylor (931015.1415) --
> Bill's comments concern a long-standing open issue, highlighted
> by the language discussion. It is whether it is possible, or
> even normal, that new ECSs can be inserted between pre-existing
> ECSs that have a higher-lower relationship.
Actually I do believe that new control systems can appear at a
given level, even after both higher and lower systems exist. This
isn't the nature of the controversy as I see it. The real problem
is whether a higher system can even interact with a system two
levels below. In my mind, a new intervening system would be
needed precisely because the higher system can't issue reference
signals to the lower level that would have any effects that would
make sense to the higher system. But the higher system couldn't
even exist unless there were some intervening systems; the gap
can never be empty, and if it is empty, all the higher levels
must be missing (or incapable of acting).
This is tied in with my idea that each new level introduces a new
logical type of controlled variable, not just a scaled-up or more
complex version of the types already existing at the lower
levels. The sense of motion is not just a scaled-up or more
complex configuration or sensation. An event is not just a
transition in a new direction or a bigger or more complex
configuration. And a relationship is not just a new kind of
event, transition, or configuration.
Each new level introduces a new dimension of control that doesn't
even exist at the lower levels (except implicitly, for the basic
degrees of freedom have to exist even if they're not explicitly
sensed or controlled). The step from one level to the next
involves translations such that an error at the higher level can
be converted directly into an appropriate reference signal at the
next lower level. I don't believe that this is possible when you
try to skip levels. That is, an error in an event may translate
directly into a change in a transition, but with no transitions
involved at all, I don't believe it could be converted into an
appropriate reference level for configuration. The transition
information is vital for knowing WHICH configuration must be
changed, and WHEN during the event. If you know that "spohled" is
"spoiled" mispronounced (in your opinion), how can you convert
the simple one-dimensional event-error signal into the
placement of an "ee" to create the diphthong-transition that you
want, without the transition-level perception? The event-error
signal just isn't of the right type to translate directly into
the required rate of change of configuration _at the right time_.
The transition-level control is _required_.
So I would claim that it's just not possible to have an event-
control level and a configuration-control level, but no
intervening transition-control level. I'm claiming that the event
level simply can't control except through a transition level: any
larger gap can't be bridged in a control hierarchy.
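A toy cascade (everything here is my own hypothetical illustration, not a claim about the actual neural computations) shows the sense in which each level's output must be the next level's reference:

```python
def run_event(rate_refs, x0=0.0, dt=1.0, gain=0.8):
    """Three-level cascade: the event level issues a timed sequence of
    transition (rate-of-change) references; the transition level turns
    each rate reference into a configuration reference; the
    configuration level closes the remaining gap. Each level's output
    is the reference input of the level just below -- no level is
    skipped."""
    x = x0
    trajectory = [x]
    for r_rate in rate_refs:          # event level: WHICH rate, and WHEN
        x_ref = x + r_rate * dt       # transition level: rate -> config reference
        x += gain * (x_ref - x)       # configuration level: act on the reference
        trajectory.append(x)
    return trajectory
```

A bare event error carries no timing or rate information; it is the intermediate rate references that say which configuration change to make and when.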
In fact this was more or less a constraint on defining the levels
in the first place. I was always looking for what seemed the
_least_ step upward that could be taken. Just why one step seemed
small enough and another seemed too large I can't elucidate. In
some cases, intermediate levels (like relationships) came into
being because I saw suddenly that "you can't get here from
there." Way back in my subconscious there is an excellent and
simple way of stating what the requirement is for the least
permissible step, either upward or downward. Like Fermat, all I
have is the conviction that this proof exists; beyond that, the
subterranean deponent sayeth not, curse him.
There is a vague notion that might give this idea at least an
aura of respectability (or maybe the deponent hath deigned to
drop a hint). Each new level of control operates in a world made
of the degrees of freedom created (made explicit) by the next
lower level, but takes advantage of ways in which that world can
still change that are not under control at the lower level. So
the transition level takes advantage of the fact that a
configuration-level system doesn't perceive or control WHEN or
HOW FAST its configuration changes. The configuration-level
system will make the configuration match the given reference
signal as if there had never been any other reference signal; the
fact that this reference signal is changing in time, and
accelerating or decelerating, in no way affects the configuration
control system. All that is forbidden is that two configuration-
level systems be told to produce mutually-exclusive states of the
sensation world. The dimension of transition is left undefined in
the configuration world.
Introducing the transition level defines that dimension (or
cluster of dimensions) and places it under control, by connecting
the outputs of the new transition control systems to the
reference inputs of the configuration control systems. This is
entirely appropriate because transitions are perceptually derived
from successive states of configurations: the types are exactly
right both upgoing and downgoing. This truly seems to be a
minimum step size.
Now, with transition perceptions in existence, there is a new
world in which the dimensions are all transitions. The event
level then takes advantage of the fact that individual
transitions are not controlled in the dimension of when or in
what patterns they will be brought about. Each transition-level
system treats its own task as though the current reference signal
is the only one that has ever existed; what came before it or
after it, or what other transitions were doing at the same time,
makes no difference. And those are the exact respects in which
the event level can vary the world of transitions without fear of
contradiction. Furthermore, events are perceptually derived from
transitions. Again we seem to have an exactly appropriate
relationship both at input and at output, and what seems a truly
minimum step size.
Perhaps the muse has indeed muttered a grudging word. This had
not occurred to me before. The reason that a system of level n+1
can't interact directly with a system of level n-1 is that it
would have to incorporate all the functions of level n in order
to do so. Level n is required in order to create the dimensions
from which perceptions of level n+1 are derived, and to translate
error signals of level n+1 correctly into changes of reference
signals for level n-1. Those error signals would not be
appropriately related to reference signals of level n-1, nor
could perceptual signals of level n+1 be derived directly (by
simple means) from those of level n-1.
Thus if there is a system of level n+1 that can act directly
through systems of level n-1, then the higher system must already
contain the perceptual and output processes that can bridge the
gap; in effect, level n already exists in the higher system.
I consider this unlikely, because it says that the single higher
system that would first exist is by far more complex than any
later subsystem that will ever exist, being capable of converting
raw intensity signals into variables many layers of dimensions
removed from the lowest level, and performing the reverse
conversion to create specific muscle tensions many layers below
-- and in a way that will relate properly to the highest level of
variables. This is the very reason, I think, that a hierarchy
exists rather than just one huge equivalent transfer function.
Others have conjectured in the same way -- I think it was Newell
who said that a hierarchical structure is the only practical way
of achieving complex behavior. Only by stacking up individually
simple types of systems can complexity be managed.
-----------------------------------
> I'm not convinced by the argument in dense networks, in which
> the perceptual representations are necessarily coarse-coded and
> distributed. In such networks, conflict is endemic, and in
> itself cannot force permanent reorganization. Likewise, since
> neural signals cannot go negative (as Bill pointed out in an
> important posting (920722.0800)), all functioning two-way ECSs
> work because of internal conflict between the "push" one-way
> ECS and its partnered "pull" ECS.
I don't think we can identify "conflict" with every case in which
signals are simply added together. The critical factor is whether
the adding takes place in a way that prevents one or more control
systems from correcting their errors. This is most likely to take
place when the vector of lower-level reference signals created by
one control system is coaxial with and opposed to the vector
created by another one. Even then, if the higher reference
signals are coordinated (in push-pull as you mention), then it
remains possible for both control systems to operate correctly
when called upon: they're simply never called upon to exert large
opposing actions in a way that drives them into saturation. When
properly coordinated, such control systems never experience large
chronic errors. It is the large chronic error that calls forth
reorganization, according to our current assumptions.
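A minimal sketch of the push-pull arrangement (gains and references are hypothetical): two one-way systems whose output signals cannot go negative share one variable, yet with coordinated references neither accumulates a large chronic error.

```python
def push_pull_step(x, ref, gain=0.5):
    """One 'push' system can only raise x, one 'pull' system can only
    lower it (neural signals cannot go negative). Because both share
    the same coordinated reference, at most one acts at a time and
    neither is driven into saturation."""
    push = max(0.0, gain * (ref - x))   # fires only when x is below reference
    pull = max(0.0, gain * (x - ref))   # fires only when x is above reference
    return x + push - pull

def settle(x, ref, steps=60):
    """Iterate the push-pull pair until the shared variable settles."""
    for _ in range(steps):
        x = push_pull_step(x, ref)
    return x
```

From either side of the reference, the shared variable converges with small transient error; neither one-way system is ever called on to exert a large opposing action.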
Also, I should note that conflicts that do create large chronic
errors are not a problem if reorganization commences as it should
and succeeds in eliminating the error. True problems arise when
reorganization fails because the process gets trapped into some
sort of endless loop, or creates other, greater conflicts. The
basic question that should be asked about every client in
psychotherapy is not "What's your conflict," but "Why hasn't your
conflict been resolved by this time?" Conflict is a perfectly
normal state, but so is conflict resolution.
---------------------------------------------------------------
Best to all,
Bill P.