Randall on information

[From Bill Powers (930618.1930 MDT)]

[Allen Randall (930518.1400 EDT)]

I said:

When we say "disturbance" in equations, we mean the STATE OF
THE DISTURBING VARIABLE:...We do NOT mean the amount by which
the knot (X) is displaced from the target position.

In spite of all the other misunderstandings, I don't think this
is a problem. I do not mean the amount by which the knot is
moved. I mean the force(s) acting on the knot from the external
environment, whether or not the knot actually moves.

It's all right to use force as a measure of the disturbing
quantity in this example, but before it can be used in the system
equations it must be transformed into an equivalent effect on the
controlled quantity, the knot position. The effect of a position-
independent disturbing force depends on the nature of the
connection between the system's output and the controlled
variable or CEV. In the rubber-band experiment, the effect on
position of a force applied to the knot depends on the spring
constant of the other rubber band. If that spring is linear, and
assuming that the subject's hand position is under lower-level
control so it is unaffected by changes in the pull on the rubber
bands, the effect of the disturbing quantity qd as a force is
just qd/s, where s is the spring constant in units of force per
unit stretch. The disturbance function fd is 1/s.

When you use force as the disturbing quantity, you're implying
that it's unaffected by the position of the knot. If you use the
position of the disturbing end of the rubber band instead, as we
do for the output quantity at the other end, then the disturbing
force does depend on knot position, and the equations become
somewhat more complex.

You can't solve the system equations until you've converted the
output quantity and the input quantity into terms of effects on
the controlled quantity in the same units in which it is sensed.
The controlled quantity is a sensed position, not a force. To
deduce the applied disturbing force from the perceptual signal,
other system variables and functions must be known, particularly
(in this case) the nature of the feedback function fe. The
perceptual signal alone gives no clue about the conversion factor
1/s. In dynamic situations it is also necessary to know the mass
of the "knot" which is again not represented in the perceptual
signal. The variations in knot position that are represented in
the perceptual signal are not directly equivalent to variations
in the applied force.
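A small numerical sketch may make the last point concrete. It assumes a linear spring whose constant s is expressed as force per unit stretch, so that the static effect of a force qd on knot position is qd/s. The function name and the numbers are illustrative assumptions, not values from the rubber-band experiment.

```python
def position_effect(force, stiffness):
    """Static knot displacement produced by a force acting through a linear spring.

    Assumes stiffness is force per unit stretch, so displacement = force / stiffness.
    """
    return force / stiffness


# Two hypothetical situations: a weak force on a soft band and a
# strong force on a stiff band.
weak_force_soft_band = position_effect(force=1.0, stiffness=2.0)
strong_force_stiff_band = position_effect(force=2.0, stiffness=4.0)

# Both produce the same displacement, so the sensed position alone
# cannot tell us which force was applied.
print(weak_force_soft_band, strong_force_stiff_band)
```

The same displacement arises from different forces, which is why the conversion factor 1/s has to be known separately from the perceptual signal.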

···

--------------------------------------------------------------

Bill, I have a real problem with this whole log(D/r) thing...

I got it from Martin Taylor... I'm just following orders.

Well, all I can say is that this is not the standard
definition, which is -log(probability), as I've described many
times in the past. I suspect you are using log(D/r) differently
than Martin, but I will let him speak for himself. The way you
are using it, it is not equivalent to information.

And then you say ...

D/r is just the number of possibilities inherent in the
numbering system or the measuring apparatus. If you considered
this to be the inverse of the probability, then I would have
no problem.

Well, then, we have no problem about that, because I did
understand that the probability is simply -log(r/D) or
log(D/r), at least in simple cases where the divisions of the
range into units of r is uniform. The only difference between us
is a logarithm.

I'm surprised at your response to my argument about the effect of
using finite intervals on scaling. If you have a variable with a
range D of 8 units and a minimum increment r of 1 unit, all
values of the variable are expressed as whole numbers between 0
and 7. If you then scale this variable down by a factor of 3, you
get a new variable with a range between 0 and 2, still with a 1-
unit minimum increment. Obviously, if the original variable
contained 3 bits, the scaled-down version contains only 1 bit (0
to 1, or 1 to 2). Scaling back up by a factor of 3, you get not
the original variable, but a variable with a range from 0 to 5 --
2 bits and change.

Your conclusion below is simply wrong:

So if I scale a sequence of these numbers, I do not have to
also scale the resolution at which they were measured nor the
resolution at which they are represented in the computer in
order to retain information.

In the computer, all numbers are represented in binary form. The
mantissa has a finite number of bits, which determines the
resolution for all computations. If you compute x/10, you lose
whatever significant digits are unable to be expressed in the
conversion from decimal to binary notation and as the quotient of
the division. When you scale up again by a factor of 10, in
general you get a number different from the original one. You
happen to have chosen numbers, in your second example, that come
out even in decimal notation. They probably don't come out even
in the computer. But even if they did, that would be
happenstance, and nothing to base a general principle on. The
numbers will come out even only if the scaling division has no
remainder in the number base used -- that is, if the dividend is
an integer multiple of the divisor.

You have to assume that you will ALWAYS lose resolution when you
scale numbers down, given a universe of measurement with a finite
resolution. On the average, you will. And if the size of the
basic increment doesn't change, the scaled-down number will be
divided into fewer intervals of that size than the original
number will be divided. This is ridiculously easy to show:

If I = log(D/r), and we scale down to a new variable G with a
range of D/8 and the same resolution r, the new information
content is I' = log(D/8r) = log(D/r) - 3 = I - 3. The division
is, of course, remainderless, so we can't regain the lost 3 bits
by scaling up again. The scaled-down version loses information
according to the log of the scaling factor.
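The arithmetic can be checked with a short sketch, assuming base-2 logarithms (so that information is measured in bits) and hypothetical values D = 256 and r = 1. The function name is an illustrative choice.

```python
import math


def information_bits(value_range, resolution):
    """Information as log2(D / r): the log of the number of distinguishable values."""
    return math.log2(value_range / resolution)


D, r = 256.0, 1.0                        # hypothetical range and resolution
original = information_bits(D, r)        # log2(256) bits
scaled = information_bits(D / 8, r)      # same resolution, range shrunk by 8

# The difference is log2 of the scaling factor.
print(original, scaled, original - scaled)
```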

This is particularly relevant when you speak of a basic
resolution that is much larger than the resolution of your
computer. It is not the computer's resolution that counts, then;
it is the assumed resolution of measurement of the variable being
represented. If r is 1% of D, then the probability of chance
occurrence of any value of d between 0 and 99r is 1%. If you
represent any particular value with more significant digits than
2, the extra precision is illusory. Scaling such a measure down
by a factor of 3 and then back up by the same factor, in decimal
notation, will lose you nothing if the number is 33, but will
lose you 3 digits -- 10 percent -- if the number is 32. 33 / 3 *
3 = 33, but 32 / 3 * 3 = 30. If you scale down by 30 to one, you
will lose enormous amounts of precision; on scaling back up you
will find just one of four possible numbers: 0, 30r, 60r, and
90r.
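The round trips described above can be reproduced with whole-number arithmetic. This sketch assumes that values are held as integer multiples of r and that scaling down discards the remainder; the function name is illustrative.

```python
def round_trip(value, factor):
    """Scale down by factor with a fixed unit increment, then scale back up."""
    return (value // factor) * factor


print(round_trip(33, 3))   # divides evenly, so nothing is lost
print(round_trip(32, 3))   # the remainder is discarded

# Scaling every value from 0 to 99 down by 30 and back up leaves
# only a handful of distinct results.
survivors = sorted({round_trip(v, 30) for v in range(100)})
print(survivors)
```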

I don't want to make a federal case out of this - it's not a
really important point. I was only trying to illustrate that
your statement about log(D/r), while true about THAT measure,
is not IN GENERAL true about information. D/r is just the
number of possibilities inherent in the numbering system or the
measuring apparatus.

It IS an important point. Martin has been implying that the
resolution of the input function depends on its RMS noise level,
and I completely disagree with this concept of resolution of the
measuring apparatus. If the noise is 10% RMS of the range,
Martin's interpretation would be that there are only 10 possible
values of the perceptual signal with a range of 10 units, 0
through 9. This would make the probability of any one value of
perceptual signal 10 percent. In fact, however, the perceptual
signal can have a magnitude between 0 and 9 with a _precision_
that depends on how long you observe it: if you observe it 10
times as long, it has 3 times the precision if the noise is
Poisson-distributed. And however long you observe, whether for a
short time or a long time, the RMS noise does not predict the
probability of a specific measure, for a specific measure can
have any value in the real number range between 0 and 9. The
_resolution_ is infinite, although the _precision_ and
_repeatability_ are not.
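The "3 times the precision" figure follows from the square-root law for counting noise. The sketch below assumes a Poisson process with a hypothetical mean rate, for which a count accumulated over a duration T has mean rate*T and standard deviation sqrt(rate*T). The names and numbers are illustrative only.

```python
import math


def relative_precision(rate, duration):
    """Standard deviation of a Poisson count divided by its mean."""
    expected_count = rate * duration
    return math.sqrt(expected_count) / expected_count


short = relative_precision(rate=100.0, duration=1.0)
long = relative_precision(rate=100.0, duration=10.0)

# Observing 10 times as long improves precision by sqrt(10), roughly 3.
improvement = short / long
print(improvement)
```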

So the number of possible measures between 0 and 9 is not 10, as
Martin's assumption implies, but infinite. The only way in which
the probabilities could be correctly computed from r/D would be
in the case where D is divided into exactly r intervals with
fixed boundaries, so that no measure can be anything but an
integral multiple of r. This is not true when r represents the
RMS spread of a noise function. In that case, the underlying
resolution is far smaller than r -- it must be, if we can even
talk about a noise "distribution." If the resolution were equal
to the 1-sigma spread in noise amplitude, we couldn't draw the
curve of the noise distribution: it would be maximum for all
deviations from the mean less than 1 sigma, and 0 for all larger
deviations. That is not how real noise behaves.

From what Martin and you have said, and from reading the
materials Martin has sent me and that I have looked up, it seems
clear that the basic logic of this whole approach was settled
many decades ago, and has not been critically examined since.
Much more interesting to forge ahead with the complex
mathematics. And, after all this time, much more dangerous to
fiddle around with the basic assumptions on which so much now
depends.

I certainly can't follow the advanced mathematics that have been
built up on the base, but I can judge simple basic assumptions as
well as the next person. I keep finding sloppiness and
ambiguities in them, and even illogic. I also find great pressure
to get out of that area, leave the assumptions alone, and
concentrate on learning what has been deduced from them. This
simply goes against my nature: I can't whip up any enthusiasm for
pursuing the complex implications of premises that I still think
are probably flawed.

As I said in a previous post, however, none of my objections will
mean anything if you can come up with an analysis of an actual
control system and show what information theory has to say about
it, in numbers. You say

*Given* a language or model to work with, information *is*
well-defined. But, you said yourself that what the numbers mean
depends on the physical situation we take them to represent.
And this *does* depend on what is assumed.

Fine. So specify a language, state your assumptions, and do it.
I'm not going to be impressed by mathematical generalizations.
It's too often the case that the mathematical mountain stirs, and
gives birth to a mouse. It's also quite frequent that when one
gets down to analyzing a specific actual situation, nature pops
up with relationships you hadn't anticipated, and you begin to
see that there is something wrong with your assumptions. There is
no substitute for doing an actual quantitative analysis of real
data. If you just want to do abstract mathematical analyses,
that's fine, but it just means that I'll have to wait for someone
else to demonstrate that the results apply to ANY real cases of
control.

As to the "information in perception" argument, I think you're
pulling back from the original position, which is OK but you
should acknowledge it. Originally, the concept was that the
output of the system almost exactly opposes the effects of the
disturbing variable on the basis of information contained in the
perceptual signal. Now it turns out that your side is admitting
that the perceptual signal doesn't contain ALL of the information
in the disturbing quantity, but only some part of it. You say

... if I can use the percept to do the reconstruction using
*fewer* additional bits than would be required without the
percept, then I have shown that the percept contains
information about the disturbance.

Showing that it contains SOME information is not the same as
showing that it contains SUFFICIENT information from which to
construct the required output. I know you feel that it MUST be
sufficient, but that is what remains to be proven, and can't be
taken as a premise.

I presume you would agree that the better the control, the
smaller the amount of information that gets through to the
perceptual signal. So you are now in the position of having to
show how an output waveform that almost exactly matches the
waveform of the disturbing quantity, and thus presumably contains
nearly the same amount of information, can be generated from a
perceptual signal that contains only a small part of the
information in the disturbing quantity (all in terms of rates).
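The situation described here can be illustrated with a minimal simulation. It is only a sketch under stated assumptions: a single integrating control system with a reference of zero, a slow sinusoidal disturbance, and a loop gain, time step, and frequency chosen for illustration. It says nothing about information measures; it only shows the two waveforms in question.

```python
import math


def simulate(gain=50.0, dt=0.01, steps=2000, freq_hz=0.1):
    """Integrating controller holding a perception near a reference of zero."""
    output = 0.0
    disturbance, perception, outputs = [], [], []
    for n in range(steps):
        d = math.sin(2.0 * math.pi * freq_hz * n * dt)
        p = output + d              # controlled quantity: output effect plus disturbance effect
        error = 0.0 - p             # reference signal is zero
        disturbance.append(d)
        perception.append(p)
        outputs.append(output)
        output += gain * error * dt # integrate the error
    return disturbance, perception, outputs


def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def correlation(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


d, p, o = simulate()
print("rms(p) / rms(d):", rms(p) / rms(d))
print("correlation(o, d):", correlation(o, d))
```

With these assumed parameters the output is almost a mirror image of the disturbance, while the perceptual signal varies by only a small fraction of the disturbance amplitude.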

I will be fascinated to see how you solve this problem and
demonstrate that the information in the disturbing quantity is
sufficient to produce the output actions. Simply insisting that
it MUST be enough isn't acceptable to me: you have to demonstrate
that it is, with numbers, before I'll believe it. Martin's
mystery function didn't prove this; it simply showed that if you
know all the functions and all but one of the variables, you can
solve for the remaining variable. There was no demonstration of
the relationship of this fact to information theory. What Martin
ended up proving was that algebra works.

You say

You posted correlation figures. As far as I can remember that
was about it.

Rick, how about sending Allen that string of figures again? I
also sent some, Allen. Perhaps you just haven't come across them.
I'll regenerate mine and send them, too, if you want them.

As far as I'm concerned, you have all the information about all
variables and functions. At this point, all I want is to see how
you actually do the calculations.
----------------------------------------------------------------
Best,

Bill P.

----------------------

Date: Mon, 21 Jun 1993 09:46:56 -0400
From: bnevin@BBN.COM
Subject: repetition vs. imitation

[From: Bruce Nevin (Mon 930621 09:44:18 EDT)]

( Bill Powers (930613.2200 MDT) ) --

Here's my problem. If I present you with a picture of a grape and
a picture of an elephant, you can distinguish between them; the
perceptual input that allows you to perceive the grape does not
respond to the elephant, and vice versa. So I have established
that you have two perceptual functions, one for each picture
(sort of). Am I then justified in saying that you are perceiving
something called "contrast" between the grape and the elephant?
Or is the notion of contrast an interpretation by the observer, me?

The analogy obscures the point. The image of an elephant or of a grape
is not a behavioral output of another human being that you might likewise
produce among your own behavioral outputs. The hypothesis was that you
have to produce a phoneme or a word (really or in imagination) in order
to perceive one. The phoneme /p/ cannot be said to exist as an object in
Boss Reality, independent of our perceptions, in the way that we say that
an elephant or a grape exists independent of our perceptions.

The notion of contrast is of course an interpretation by the observer;
the phoneme is an interpretation by the observer. The utterance itself,
as utterance rather than as mere sounds, is an interpretation by the
observer. The contrasts are not only "in" the utterance, they constitute
the utterance, they are what make it an utterance rather than mere
sounds. Phonemes, words, utterances do not exist in "Boss Reality" as we
suppose that elephants and grapes do; they only exist in people's
perceptions. Likewise phonemic cues, distinctive features, and so on.
They only exist because people, anticipating speech, are looking for
satisfiers of a previously known relationship of contrast. (I'm only
talking about the perception problem here, not the learning problem.
I'll try to avoid getting them muddled again.)

>The segments are relevant because they represent the contrasts
>between utterances and locate the points of contrast within
>utterances.

But isn't it really that the listener's perceptual functions make
a distinction, rather than that there are objective contrasts in
the sentences?

Yes, it is the case that the listener's perceptual functions make
a distinction, rather than that there are objective contrasts in
the sounds that he hears and the articulatory gestures that he perceives.
It is also the case that these contrasts, products of the listener's (and
speaker's) perceptual functions, are really in the sentences, which are
equally products of the same people's perceptual functions.

But it still is defining "segments" in terms of human perceptual
functions, not the other way around.

Yes. The thrust of virtually all linguists' work, ignoring Harris, has
been to show how segments and contrasts are objective, in Boss Reality,
either as physical entities or attributes or indirectly, by virtue of
"feature detectors" hard-wired in the human genome for features that are
universal across all humanly possible languages. However, these
detectors are not for specific features of sound or gesture, rather, for
dimensions along which sounds/gestures may be categorized into opposing
members of a contrast, with the absolute values used to be determined by
the child's experience. Learning, on this view, is parameter setting.

What grates on my tender
sensibilities is speaking of the contrast as if it existed in the
utterance, or pairs of utterances.

Assuming some sort of correspondence of perceptions to Boss Reality, an
elephant or a grape exists independently of whether someone perceives
them or not, and contrast is a relation between perceptions of an elephant
and perceptions of a grape, but not a relation between elephants and
grapes. However, an utterance does NOT exist independently of whether
someone perceives it or not. In Boss Reality you have only sounds and
articulatory movements. The utterance, as such, exists ONLY as
controlled perceptions of a linguistically competent control system.
An utterance is constituted of contrasts, that is, of differences that
make a difference.

(To clear away a possible misunderstanding: the term "utterance" is used
instead of "word" or "sentence" because much that is said between pauses
in a given language is not a single word, and is more than or less than a
sentence. In ordinary, non-technical usage an utterance might also be a
non-linguistic sound: "Heathcliff uttered a weird cry." However,
non-language sounds are not utterances in the sense intended. If a tree
falls and no one perceives it, the tree (we presume) is still there, in
Boss Reality. If a tape recording of the Gettysburg Address is played
in a forest and no one is there, it is just sounds, no utterances at
all. The sounds are really there; the utterances are not.)

An utterance is a production in a language which is recognizable by other
speakers of the language and repeatable by other speakers such that their
repetitions are likewise recognizable as repetitions. As such,
utterances literally do not exist as utterances unless they are perceived
as such. This is why the contrasts (by which utterances are perceived as
utterances) exist in the utterances. If I played a tape of Tagalog or
Rwanda conversations, a listener who did not know the language on the
recording would not be able to repeat any of it (however well she could
imitate), would not recognize repetitions, and certainly would not
perceive the meanings of the utterances. The listener might suppose that
she heard a word repeated in the recording, when in fact those portions
of the recording differed by some feature that was distinctive in that
language but not in English; or might suppose that she heard different
words when in fact the discriminable differences are not contrastive in
the recorded language, even though they are so in English.

I have tried to keep distinct two aspects of the problem: the learning or
establishing of the phonemes vs. the recognition of words (setting aside
for now the limitations of the terms "phonemes" and "words", and ignoring
also the evolutionary aspect). We must be careful about this, to avoid
getting muddled.

Harris's use of the Test concerns the learning problem; your response
concerns the recognition problem. You say that neural nets can find the
invariants that are phonemes. In fact, as NNs are "trained" they are
given the contrasts, that is, the fact that some inputs are repetition
sets whose differences make no difference, and other inputs are not
repetitions. The result is that they come up with representations of
those contrasts, which they then use (with less than perfect success) to
recognize other repetitions. Coming up with representations of contrasts
is the learning problem; recognizing other repetitions, given a
representation of the contrasts, is the recognition problem. The
representations (the "phonemes" or "phoneme recognizers") that one NN or
one person comes up with in the learning process may be different from
those that another comes up with in a comparable learning process. That
does not matter, so long as they are representations of the same
contrasts between words (between utterances). The contrasts are the
invariants (given a partially shared vocabulary), not the representations
by sounds and gestures. It is this that accounts for our ready
accommodation to differences in pronunciation, so long as the differences
are systemic. If we had a speech synthesis device that achieved
naturalistic speech output in various dialects, we could understand it,
after a little exposure, no matter which of a range of dialects it was
producing. If, however, it was shifting from one dialect to another
unpredictably during the course of sentences and words, we could not.

The learning problem is solvable only if the learner is told which
utterances are the same and which contrast. This often takes the form of
the hearer interpreting utterance XYZ as an instance of utterance XWZ.
In other words, the significant pronunciation-differences cannot be
learned, as opposed to the pronunciation-differences that are not
significant, if the learner has no idea of the differences of meaning
between utterances. The least datum about differences of meaning (to
which all such differences are reducible) is this: are the two
productions repetitions, or do they contrast? Conversely, if pairs of
utterances that differ by pronunciation-difference X are consistently
different from one another in meaning, they contrast, and X is a
significant difference; that is, X is phonemic; that is,
pronunciation-difference X is a representation of a phonemic contrast
between those pairs of utterances. Kids hear lots and lots of evidence
as to what differences make a difference and what differences don't.
People misunderstand them (or one another) and correct the
misunderstandings, people correct themselves in the child's hearing,
people repeat back the words they heard the child say, and words that
they heard one another say, and so on.

If I have two experiences and they actually involve
only one perception, I classify this situation as "same." If
there is more than one perception, I classify it as "different."
But before I can perceive which category of situation it is, I
must know already whether one or two perceptions were involved: I
must know that both experiences came from _this_ perceiver, or
that one came from _this_ and the other from _that_. If this sort
of discrimination hasn't already been made, perceptually, there
is no way to decide on the category "same" or "different."

It's on this basis that I maintain that "contrast" isn't an
explanatory term, but only a descriptive one. We don't perceive
that two things are the same or different because they ARE either
the same or different. We can only make that judgment after the
discrimination has been made at lower levels. If we perceive two
things via two input functions, we conclude that they are
different; if both perceptions come from the same input function
we call them the same.

At the lower level, we discriminate differences that may be either
distinctive or insignificant at the higher level.

The pronunciation of a given phoneme varies from one context to another.
The perception cues for distinguishing between similar phonemes at the
corresponding places in similar words may be produced with variable gain.
For example, the medial consonant in paining/painting/pating (or plating
if we can't accept "to pate" as some kind of nonce word). Or consider
the flapped r in the Spanish pronunciation of Paraguay. This occurs in
American English also. Say "pot of coffee" at an ordinary conversational
rate (pot'ocoffee). It occurs in the middle of "latter" as in "he picked
up the former and put it on the latter". Now, suppose we have a tool
called a former. The same sound/gesture occurs in "ladder", as in "he
picked up the former and put it on the ladder". Same input function. In
one case, we hear d, in the other we hear t. If the contrast is
important, we distinguish t from d more clearly, pronouncing "laTTer" or
"laDDer". (This is the norm in most British and many Canadian dialects.)

With low gain such words may become indistinguishable phonetically.
Semantic and syntactic aspects of the context tell us which word to
expect (as with outright homonyms like beet/beat, or see/sea/[holy]
see/C), and indeed such predictability is a condition for lowering gain
on pronunciation of a given word, with the reduction system as a further
extension of the same principle. In all of this, it is the contrast
between words that is controlled.

When two people converse, they speak more or less different versions of the
language. These may be called different dialects if extreme enough, on a
continuum out to what we call different but related languages if they are
not mutually intelligible (if most of those who speak the two versions
cannot converse, each speaking her own version to the other). Assuming
their talk is mutually intelligible, they interconvert between the phonemic
distinctions made by one and those made by the other. My "kayo" (K.O.)
sounds markedly similar to my brother's "cow". (He has lived for many years
in Georgia, and we grew up in central Florida, where he identified as a
local and I identified as a college-bound kid disaffected with the redneck
community around me. This is in reference to prior discussions of Labov's
work on social dialect.) Does this mean that people construct some sort of
table lookup? Is it not much more parsimonious to suppose that they are
controlling the contrasts between words by variable means at the level of
pronunciation?

Second pass:

Actually, this latter part was an earlier pass, and I just gave up on trying
to integrate into the "first pass" those points that I wanted to keep.

what is perceived is a syllable or a word, not a contrast.

What is a syllable? What is a word? You are begging the question here.
Syllables and words do not exist in the sounds of speech. This is shown
by the fact that speakers of different languages parse the same sounds
differently into syllables and words.

when you speak of contrasts, which contrast do you mean?
Every possible discriminable segment differs from all other
discriminable segments, simultaneously. If you have 3000
discriminable syllables or words in the working set, you have
4,498,500 dyadic contrasts in the set. Does each perceptual
function have to search through 2999 contrasts to decide [etc.]

Before children learn phonemic contrasts as a way of representing
word-contrasts, their vocabulary is severely limited for just this
reason. After they learn to represent word-contrasts by phonemic
contrasts, vocabulary takes off. If there are 50 phonemes the search
space is feasible. It is greatly reduced by other constraints, such as
those we express in terms of what may constitute a syllable. Some of
these constraints differ from one language to another, and are learned
from "the way people do it" here; others are universal, and reflect
physical (physiological, acoustic) properties of the environment.
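The two counts being compared can be checked directly. The sketch assumes that a "dyadic contrast" is an unordered pair of distinct items, so the count is n choose 2; the function name is illustrative.

```python
import math


def dyadic_contrasts(n_items):
    """Number of unordered pairs among n_items distinct items."""
    return math.comb(n_items, 2)


print(dyadic_contrasts(3000))  # pairs among 3000 syllables or words
print(dyadic_contrasts(50))    # pairs among 50 phonemes
```

On this assumption the set of 50 phonemes yields 1,225 pairs against 4,498,500 for 3000 words, which is the sense in which the search space becomes feasible.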

( Tom Bourbon (930529.0310))--

I asked how would one model repeating as opposed to imitating. What I am
interested in, of course, is the difference between repeating a word and
imitating a person's pronunciation of sounds that we did not recognize as
a word, or where we choose to focus on the pronunciation independently of
its recognizability (and repeatability) as a word. We can't model that,
so we start with something simpler. For imitation, Tom says:

There must be something to imitate.

and introduces a triangular waveform for tracking, with a sketch of
how it is generated and how it is tracked.

  /\  /\  /\
 /  \/  \/  \, and so on.

This is a standard, undisturbed pursuit tracking task, which PCT models
with great accuracy. The participant can make the cursor accurately track
the target, which is to say, the positions of the cursor imitate those of the
target (incidentally, so, too, does the waveform of the participant's
hand movements -- the person's actions).

Then for repetition, Tom says:

For Repeating, there must be something to imitate.

Right away, we're in trouble. I've been trying to show how repetition is
different from imitation. Tom is saying repetition is simply imitation
of a memory:

Let that be the
person's remembered positions of the cursor as a function of time during
the Repeating task. (The target is not on the screen.) Now the model step
for the person changes, because the source of the reference signal is
different, but the cursor is still determined as it was before.

For either Repeating or Imitating, the same actions will occur. So, too,
will the same positions of the cursor as a function of time.

To get at what I am looking for, we have to introduce some variation.
Suppose person B's notion of the pattern is this:

  _  _  _
   \| \| \

Person A's notion is roughly as above.

  /\  /\  /\
 /  \/  \/  \

Suppose A and B recognize their
renditions as having the same intention--the same meaning. Let's say
that

  A:            B:
  _  _  _        /\  /\  /\
   \| \| \      /  \/  \/  \

A recognizes B's production as a repetition (but not an imitation) of
A's, and B recognizes A's production as a repetition of B's.

Now, suppose A and B recognize the following pair of productions as
equivalent (as each one's repetition of the other's production):

  A:                   B:
  ____  ____  ____      ---  ---  ---
      \|    \|    \    /   \/   \/   \

And suppose they recognize the following as repetitions:

  A:                B:
  _  _  _  ____      /\  /\  /\  ---
   \| \| \|    \    /  \/  \/  \/   \

Finally, suppose there are two ways of repeating B's production of two
shorts and a long:

  A:                B:
  _  _  ____         /\  /\  ---
   \| \|    \       /  \/  \/   \

  _________  ____
           \|    \

In context of a following long, two shorts may be produced as a
"superlong" element.

This looks like it involves categorization. Phoneme recognition is
commonly taken to be categorial. We have discussed word perceptions as
event perceptions. That would mean that words are short, familiar
sequences of categories which may satisfy the input functions of category
recognizers. Is this sort of inter-level promiscuity OK?

Now we have two elements that are perhaps analogous to phonemes: the
single contrast of short element vs. long element. Abstracting from the
particular forms produced by A vs. those produced by B, we can focus
attention on "words" formed from these elements. Consider the following:

    .
    ..
    _
    ._
    .._
    _.
    _..
    _ _
    ...
    ..._

Relative duration is the means for contrasting . with _ in these "words".
Suppose there is a tendency to produce . at a higher pitch and _ at a
lower pitch. Among individuals in population X, this is just an
incidental byproduct of controlling a perception of relative
duration--something to do with the physiology of producing . elements.
But members of population Y over time begin to exaggerate this
characteristic. They do this to differentiate themselves as a group or
to identify themselves as individuals as members of group Y as opposed
to group X. Conversely, individuals in group X over time begin to
perceive the tendency to produce . at a higher pitch than _ as a
disturbance, and they resist it. It is a disturbance because they don't
want to be identified as members of group Y. At some point in the
succession of generations, infants learning the dialect Y tend to take
relative pitch to be the feature distinguishing . from _. More exactly,
as they learn to say . and _ and ..._ (and to recognize these "words"),
they symbolize the contrasts between these "words" in terms of relative
pitch instead of relative duration. So in dialect Y we have | and _ and
|||_. Now "speakers" of the two dialects can still understand one
another--when an X individual hears |||_ from a Y individual, she
recognizes it as a repetition of what she would say as ..._, and vice
versa. And so on.
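The distinction between repeating and imitating in this toy example can be sketched in a few lines. This is only an illustration of the bookkeeping, not a control model: it assumes each dialect maps its own surface symbols onto the same two contrasting categories, here labelled "short" and "long" for convenience. All names are hypothetical.

```python
# Each dialect realizes the same two-way contrast by different means:
# dialect X by relative duration, dialect Y by relative pitch.
DIALECT_X = {".": "short", "_": "long"}
DIALECT_Y = {"|": "short", "_": "long"}


def contrasts(production, dialect):
    """Map a surface production onto the sequence of contrasting categories."""
    return tuple(dialect[symbol] for symbol in production)


def is_imitation(a, b):
    """Imitation: the surface forms themselves match."""
    return a == b


def is_repetition(a, dialect_a, b, dialect_b):
    """Repetition: the productions match as sequences of contrasts."""
    return contrasts(a, dialect_a) == contrasts(b, dialect_b)


print(is_repetition("..._", DIALECT_X, "|||_", DIALECT_Y))  # same "word"
print(is_imitation("..._", "|||_"))                         # different surface forms
print(is_repetition(".._", DIALECT_X, "|||_", DIALECT_Y))   # a different "word"
```

The surface forms differ throughout, yet the first pair counts as the same "word" because only the contrasts are compared.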

Rudimentary though this example may be, it does capture some
characteristics of language that we must learn to model.

Note that what is important is the words. The phonemes are important
only as means for differentiating one word from another that is not a
repetition of it. This is why they can vary so widely and the words that
they constitute can still be intelligible.

Does this help to clarify my question? It is a central fact about
language that when one person produces the same words as another person
she is not imitating the other person's words, she is repeating them.

Sorry this has been so long delayed. My next spare-time task is to try
to catch up on two weeks' worth of CSG-l.

    Bruce
    bn@bbn.com