Theory; language; levels; blah blah blah ...

[From Bill Powers (931006.2100 MDT)]

Hal Pepinsky (931006) --

My theory says the components must be a surprise to affirm that
we have the power to learn what we do not already know and are
extending the diversity to which we accommodate.

OK, I'll accept this as a statement that follows from your
theory. Can you tell me how you arrived at this statement, so I
could apply the same reasoning independently and verify that I
arrive at the same conclusion?


--------------------------------------------------------------
Martin Taylor (931006.1730 EDT) --

It is easiest to contemplate ECSs in which the PIF is a
monotonic function of some space defined by "attribute signals"
(sensory signals or lower-level perceptual signals); but there
is no requirement that a PIF have this characteristic, and
indeed in some of his tutorial discussions recently (to Hal?)
Bill has been dealing with control in a world in which the
perceptual function is non-monotonic. The perceptual signal is
single-valued, but its value changes non-monotonically as a
function of any path within the space of attribute values.

Perceptual signals are always "single valued" in that they can
have only one value at a time. But they are not necessarily
single-valued in terms of the lower-level set of perceptions from
which they are derived. There is no need for the input function
to be monotonic. The perceptual signal itself is what the system
controls; it doesn't matter at all if that same value of
perceptual signal can be produced by many different arrangements
of the lower-level world. Just consider a perceptual function
that perceives the sine of the angle of rotation of a line
segment. The line might rotate 0.5 radians and produce a signal
of 70 units. Or the line might rotate 2*pi*n+0.5 radians and
produce exactly the same signal. In either case, the control
system could latch onto the line and control it so it produces 70
units of signal, ignoring the number of previous whole rotations.
The control system does not care what the objective situation is.
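A minimal numerical sketch of this (illustrative only: the integrating output function and the scaling of the perceptual signal to 100 units at full amplitude are assumptions, not part of the example above):

```python
import math

def perceive(angle):
    # Non-monotonic perceptual input function: infinitely many angles
    # (0.5, 2*pi + 0.5, 4*pi + 0.5, ...) yield the same signal.
    return 100.0 * math.sin(angle)

def control(angle, reference=70.0, gain=0.01, steps=2000):
    # Integrating output: keep adjusting the angle until the
    # perceptual signal matches the reference.
    for _ in range(steps):
        angle += gain * (reference - perceive(angle))
    return angle

# Start three whole rotations in; the system latches onto the
# nearest angle that yields 70 units, ignoring prior rotations.
final = control(2 * math.pi * 3 + 0.3)
print(round(perceive(final), 1))          # 70.0
print(round(final - 2 * math.pi * 3, 3))  # 0.775 (= asin(0.7))
```

The system never "knows" how many whole rotations preceded its operating point; it only cancels the error in the one signal it controls.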

As long as there is a reachable region of negative feedback, a
control system does not care about local maxima or minima. It can
control quite well in the presence of either. It simply skips
past regions of positive feedback, automatically.

... there are no points in the attribute space that form a
local maximum or minimum. This is the form usually used in a
node of a multilayer perceptron, and is at the base of my
repeated claim that a four-layer HPCT structure could control
any temporal-spatial configuration whatever.

I have never seen a perceptron described that could control
anything. For control you need a comparator and an output
function, and an external feedback loop. The perceptron, at best,
can describe an input function.

Also, all the perceptrons I have seen diagrammed produce only
categorical on-off outputs: this output rather than that one, and
only one value of it: "on". A perceptron couldn't be part of a
pursuit tracking control system, as far as I know, because it
can't represent the continuously-variable state of the
target-cursor separation. It could, if appropriately trained, distinguish
"separated" from "not separated", but it could not produce an
output signal that has a smooth relationship to the degree of
separation. At least I have never seen a perceptron in which a
given output is a continuous function of the state of the inputs,
however many layers intervene.

It [a radial-basis function] represents a single-peaked
mountain whose value is maximum at some point and grades down
to zero at infinity in every direction in the attribute space.
Such a function can be a "template" in a control system.

The meaning of "template" seems to have drifted since I first
heard it applied to perception. A template (the dictionary seems
to prefer "templet"), before the term was adopted into
psychology, was a cutout form or mold that was fitted against a
workpiece to identify points that deviated from the desired form.
The template was literally the negative of the desired shape,
point by point. To determine whether a template fits the
workpiece, it is necessary that every point of contact be somehow
examined. A template has to contain exactly as much detail as the
form being compared with it. So I don't see your analogy.

From your description, I would think that a radial basis function
is simply an input function of the kind I have always imagined,
at least for lower-level systems. It responds maximally to inputs
occurring in a certain relationship to each other, and less for
all other relationships. In a two-variable space, it would be a
mountain, as you say. In a space of more dimensions, it would be
a scalar mountain in a hyperspace. Is there some other aspect of
radial basis functions that makes them different from this
description?
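For concreteness, here is one standard form (a Gaussian basis function is assumed here for illustration; the description above does not depend on that choice):

```python
import math

def rbf(x, center, width=1.0):
    # Gaussian radial basis function: a "mountain" over the attribute
    # space, maximal at the center, grading to zero at infinity.
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))

center = (1.0, 2.0)
print(rbf(center, center))       # 1.0 at the peak
print(rbf((1.0, 3.0), center))   # lower on the slope
print(rbf((9.0, 9.0), center))   # essentially zero far away
```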

A radial basis function cannot be a PIF in a single ECS that
controls its perceptual signal, because there is no direction
in the attribute space that is more likely than another to
increase the value of the perceptual signal.

A radial basis function defines a _kind_ of perception. Control
systems do not control the _kind_ of perception, but the amount.
With a given radial basis function as an input function, the
amount of perception can be varied by altering the magnitudes of
all the inputs while keeping them in a constant relationship as
defined by the input function. Of course if you think of the
inputs to a radial basis function as being on-off variables, the
peak of the mountain has a fixed height. But if the inputs are
continuous variables, then the height of the mountain is also a
continuous variable and can be controlled in the usual way.

With continuous input variables, a radial basis function is
nothing but a projection of an input vector onto a line in the
space defined by the input weights, and that is how my model
works (at the lowest levels). If the input vector is aligned with
that line, variations in the magnitude of the vector are
maximally represented as variations in the projected line length,
the scalar perception, the height of the mountain. If the input
vector is at an angle to the line, then for a constant input
vector length, the projected length falls off in all directions
not lying on that line, just like a radial basis function. See
BCP, p. 106.
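The projection itself is just a dot product; a minimal sketch with hypothetical weights (a unit-length weight vector is chosen for illustration):

```python
def pif(weights, inputs):
    # Weighted-sum input function: the projection of the input
    # vector onto the line defined by the weights (a dot product).
    return sum(w * x for w, x in zip(weights, inputs))

w = (0.6, 0.8)            # unit-length weight vector defining the line
aligned    = (6.0, 8.0)   # along the line, magnitude 10
misaligned = (10.0, 0.0)  # same magnitude, at an angle to the line
print(round(pif(w, aligned), 6))     # 10.0: magnitude fully represented
print(round(pif(w, misaligned), 6))  # 6.0: projection falls off with angle
```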

A control system is concerned only with controlling the height of
the mountain. If the input vector is not aligned so as to create
the maximum perceptual signal, the outputs simply produce a
larger effect until the magnitude of the perceptual signal does
match the reference signal. Barring physical limits, the control
system is not concerned with how much output is needed to produce
the desired amount of perceptual signal. It will produce the
amount that cancels the error as nearly as possible. Of course
the loop gain falls off if the input vector is not reasonably
close to being optimally aimed. But that can be taken care of
with multiple control systems: see Rick's spreadsheet program. If
the maximum loop gain is high enough, the input vector can be
misaligned by a large amount without appreciably affecting the
quality of control.

Don't make the mistake of Fowler et al., objecting that if the
input vector is misaligned, the control system has no way of
knowing which way it is off and realigning it. The control system
is not concerned with the alignment -- with specific values of
individual lower-level perceptions -- but only with getting a
specified amount of signal out of the input function.

Odd: you agree with my conclusion, but for what I see as
unnecessarily complicated reasons.
------------------------------------------

Haven't you always said that sequences are serial lists of
categories? Or is my memory totally screwed up?

Yes (I mean no, it's not screwed up), but don't forget that at
one time I had sequences and events combined at one level. When I
separated them, sequence got bumped to the level above
categories, while events stayed just above transitions.

I thought events were temporal patterns in a continuum space,
whereas sequences were, as you say, lists of discrete items.
Wasn't that always your view?

Yes, but now I have been able to express the unspoken (by me)
understanding that events also have the character of being made
of multiple simultaneous lower-level perceptions, each appearing
and disappearing during the event at the proper time but
appearing simultaneously with other necessary inputs. This is not
a characteristic of a single sequence perception, which is made
of serially-ordered elements.

If I am right, how can you not get to categories when you deal
with sequences?

I'm trying to suggest that word-perceptions can be created at the
event level, without requiring either categories or sequences.

I think you're forgetting that a word all by itself is not a
category, but simply a word, an event. It isn't the name of
anything until it has been used to _indicate_ a category of other
perceptions, perhaps by being included as a member of the same
category.

Also, without categories, how do you handle my "mug"->"bug"
example? Wouldn't you have to acknowledge the intermediate
insect that delivers coffee to your mouth?

No more than I would have to acknowledge a word halfway between
muv and buv. Even if the words muv and buv have not been assigned
to indicate any categories of perceptions, they can still exist
as words. By themselves they don't imply any meanings. Words
themselves do not "have" meanings, and they are not categories.
They point to categories of meanings, usually. Sometimes they
point to specific experiences: my nose. But it is difficult to
use words to indicate specific experiences.
------------------------------

Aside from that, there's another way of looking at Bruce's
parallel streams (not necessarily the right way, but another
way):

[the diagram] can be seen not as parallel streams of events,
but as a single stream of configurations:

Config.   "b|b|i|i|n|n"
========  ==|=|=|=|=|===
Lips       1|1| | | |        lips close, build pressure
Larynx     1|1|1|1|1|1       larynx sounds throughout
Tongue      | |1|1|2|2       tongue forms ih and nn
Velum       | | | |1|1       velum opens for nasal sound

So can a proof of a trigonometric identity: after all, writing it
out is just one configuration of pencil and paper after another.
Does this mean that the configuration level can prove
trigonometric identities? What you see depends on the level from
which you view it. But if you view it from the wrong level,
you'll see either too much or too little: in this case, too
little.

First, if nothing but configurations are present, then there is
no "stream." Streamness is at least a transition-level
perception. Second, this is not a simple stream but one with a
specific internal structure, the elements of which can change
independently of each other to form recognizably different
perceptions. One such structure is the sound we spell "bin." No
structure is perceived at the configuration level, or at the
transition level. Those levels perceive only configurations, or
only transitions. That is why we need the event level, however we
imagine the lower-level perceptions are combined to produce it.
The event-level is where the structure is perceived and
controlled.

All of this gives us a single word, ready to be assigned
meanings. The word is not experienced as a sequence, but as a
single thing: an event. An event can be part of a sequence, but
is not itself perceived as a sequence. If you slow it down
(reduce the magnitudes of the transition-signals), your sequence
level can perhaps recognize sequentiality in it, but that is not
necessary.

Perhaps a better linguistic term for event would be morpheme or
syllable. Certain words are indeed morphemes or syllables which
have no meaning until one is assigned. But the picture gets
confusing when we start constructing words out of parts of other
words, parts which have already been assigned meanings. I'll drop
that there.
-------------------------------------

On parallelism, I have no idea why you say (to me, out of the
blue) that there can be no more than one sequence perception
being controlled at any time.

I never said anything even close to that. Of course there can be
different sequence-controllers operating at once, in parallel.
What I said was that _a_ sequence consists of a series of
elements -- categories or below -- occurring one at a time. If
you analyze any particular sequence-perception into elements
which are not themselves sequences, you find categories (at the
highest), and only one of them at a time. A simple example of a
particular sequence is a sentence. Another is a list. My example
was intended to show that we cannot construct _a sentence_ or _a
list_ using more than one category-word at a time.

And it is nonsense (sorry, Bill, but that's the way it seems to
me) to say

The parallelism is a feature of analog systems;
digital systems work strictly in one-at-a-time sequence;

Only a von Neumann digital computer is so restricted, because
it has at its heart a single, scalar, program address register.

In the digital world, there is no counterpart of, for example,
the analog computation of the sum of two variables, in which the
two variables SIMULTANEOUSLY affect the magnitude of the sum,
neither of them affecting it before the other. In the digital
world, whether you use serial or parallel computation, you can
only load one variable into a register and then add the other
variable to it, finally producing the sum. On the way, you create
a situation that never appears in the analog computation: the
"sum" of one variable. In fact, if you do a running sum of two
changing digital variables in an accumulator, the sum is wrong at
least half of the time, isn't it? When you add variable A to the
register (you can't simultaneously add the value of B to the same
register), the sum is wrong by the missing value of B. Parallel
computing doesn't solve that problem, not to mention other
problems like the way addition actually takes place in a digital
computer, with propagation of carries and so forth.
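The intermediate state in question is easy to exhibit (an illustrative sketch, not a claim about any particular machine):

```python
# A digital accumulator must take its operands one at a time.
# Between the two additions there is a state -- the "sum" of one
# variable -- that never exists in an analog summing junction,
# where both inputs affect the output simultaneously.
a, b = 3.0, 4.0
register = 0.0
register += a
partial = register   # intermediate "sum": wrong by the missing b
register += b        # only now does the register hold a + b
print(partial, register)   # 3.0 7.0
```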

An analog computer can handle parallel inputs literally
simultaneously, and do so in a system of the simplest possible
construction (that is, without having to duplicate the computer).
I think that analog and digital computing are fundamentally
different processes, as different as digitally computing the
trajectory of a ball and throwing a ball into the air and
observing its trajectory. The "degenerate" analog computation of
a physical process is the process itself. There is no
corresponding degenerate case for a digital computation -- except
of a process in another digital computer.
-----------------------------------

If something is impossible, then counterexamples should not
exist, right? Early speech recognizers, for isolated words, did
exactly what you say is impossible. The incoming sound wave,
possibly filtered by a crude filter bank, was compared with a
set of templates, one for each word that might be recognized.

But that wasn't comparing words with words; it was comparing
waveforms with waveforms: detailed magnitudes with detailed
magnitudes. A word is not perceived as a waveform, but as a
single scalar quantity that stands for presence of a particular
waveform. It does not itself have a waveform. So where are you
going to get this signal, if not by some computing process that
derives it from the waveform? By the time a word is perceived,
all that distinguishes it from other words is that the signal
came from THIS input function and not any other. Do you believe
that you can get the signal first, without the computation that
derives it from the more detailed perceptions?
--------------------------------------------------------------

... it is my opinion/impression that exactly the same thing
happens as a child learns to understand and to speak language.
Initially only the large sound patterns are discriminated from
one another and controlled for in production. This is
difficult and inefficient, and soon enough the child learns to
perceive common sub-structures within the various sound
patterns it is exposed to.

My impression is the opposite: first the child learns to perceive
and control just one or two small and simple sounds, which are
used for everything. Gradually more sounds are heard and brought
under control. The ability to hear and say "dot" is not lost;
"dot" does not break up into a lot of smaller patterns. In fact I
can't imagine what you mean by a "large sound pattern." I can
visualize a big blob of something, but I can't translate that
into anything auditory I've ever heard a baby produce -- nothing
that relates to language, that is.

It comes back to a discussion we had long ago, perhaps two
years, in which you asked something like "How can you insert an
ECS between existing levels; I just can't see how you make it
work."

Still can't. It still seems to me that you're building up an
intellectual scheme without asking whether it's physically or
phenomenologically feasible, or observationally justified.

I still think that such insertion is the major way new ECSs are
built into the hierarchy.

I want to know how you can connect sensory nerves (present from
the start) directly to, say, the cortical functions that compute
spatial relationships or logical functions, when at no time do
such direct connections ever exist, to my knowledge. And I want
to know how logical functions and spatial functions can work
before there is anything to be logical about and before there is
any perceived space. What would they do?

And I want to know why babies start out learning to control their
muscles, then their limbs, then their voices, then other objects,
then patterns of movement of objects, then relationships among
objects, and so on, and show no signs of being able to control
the highest-level functions, like logic, until years into their
lives. The evidence seems to be all against your scheme, and all
in favor of mine in which the levels of control develop from the
bottom up, with the highest functions appearing last, not first.

It seems to me that by its very nature, your scheme is
untestable. If the highest level exists first, then, without any
way of converting high-level processes into organized action, we
would never see any evidence of the highest-level functions until
finally the spinal control systems were organized. Then the whole
hierarchy would spring into view at once.

But then, I also still think that the intrinsic variables are
the top-level references, not just in a separate reorganization
hierarchy.

In my scheme, the top-level references are, at any time, those of
the highest level in existence, so we agree about that. However,
in my scheme that level starts out with spinal systems, and as
new levels are added the "highest" level keeps changing until
it's the highest level we ever develop. And as we require some
explanation as to why these references are set as they are,
whatever the stage of development, we need a system that can
relate learning to the physical state of the organism at all
stages of development. But you know my argument.
--------------------------------

Rick said:

For example,
by controlling for the perception "pin", the person is
implicitly contrasting "pin" with "bin" and every other word
for which "pin" might be mistaken.

and you said:

It's hard to know what might usefully help to show you how
wrong this paragraph is, other than to ask you to listen
critically to (preferably to tape) conversations.

I think this is a missed communication. Rick was pointing out the
difficulty in the concept of contrast, not defending it. There is
potentially a contrast between every word and every other word.
If words are distinguished by contrasts, then there is no a
priori way to say that one word should be tested for contrast
only against a small set of other words: those other words can't
even be identified as words except by contrast with the starting
word. So every word that is known has to be tested for contrast
with every other word, before the words can even be distinguished
from each other (according to the contrast hypothesis).

Oh, well. I'm getting tired.
-------------------------------------------------------------

It is true that if a word is recognized as belonging to a
particular category, it thereby is contrasted to all other
words.

Logically, yes. Categorically, no. Category perception simply
reports that a member of a category is present. It carries no
information about what categories are therefore not present. That
kind of deduction is a function of the logic level, not the
category level.
------------------------------------------------------------

How does a person know which word a word might be mistaken for?

I'd say, by mistaking it and discovering the mistake. And whether
a mistake occurs depends on who is listening as much as on what
is said.
----------------------------------------------------------
I give up. Good night.

Best to all,
Bill P.

[Martin Taylor 931007 12:00]
(Bill Powers 931006.2100)

Oh, well. I'm getting tired.

Not surprised, after writing a posting of nearly 450 lines, especially since
most of it seems to be based on mutual misunderstanding rather than on
differences of opinion.

Let's try for some clarification, though I suspect my methods of clarification
are due for reorganization, since they usually seem to have an opposite
effect!

Perceptual signals are always "single valued" in that they can
have only one value at a time. But they are not necessarily
single-valued in terms of the lower-level set of perceptions from
which they are derived. There is no need for the input function
to be monotonic. The perceptual signal itself is what the system
controls; it doesn't matter at all if that same value of
perceptual signal can be produced by many different arrangements
of the lower-level world. ... [and so forth]

Yes, that's essentially what I was trying to remind people, since much
of the tutorial discussion had carried (to me) the implication that
the function was monotonic in the input space. The point was to show
that templates, in the form of radial basis functions, are a perfectly
ordinary way to develop a perceptual signal.

As long as there is a reachable region of negative feedback, a
control system does not care about local maxima or minima. It can
control quite well in the presence of either. It simply skips
past regions of positive feedback, automatically.

There is a problem if the value of the function goes to a constant value
at infinity in all directions. In the case of a template match, that
value is zero. In such a case, if the input has a random (multidimensional)
value, there is an equal probability for all outputs that the feedback
will be positive. If it is, and the function surface has a single peak
as it does with a template match, then the "skip past regions of positive
feedback" goes to infinity. The input space is divided into one region
of negative feedback and one region of positive feedback.

I have never seen a perceptron described that could control
anything. For control you need a comparator and an output
function, and an external feedback loop. The perceptron, at best,
can describe an input function.

Of course. Who would consider saying otherwise? My oft-repeated statement
is that the simplest PIFs, connected in the "classic" Powers manner, form
exactly a multilayer perceptron. The consequence is that a hierarchic
control system of the simplest form can control perceptions of ANY
configuration, no matter how disconnected instances of that configuration
may be within the attribute space. What I mean by "simplest" is that
the PIF simply forms a weighted sum of its inputs, and saturates if the
sum is too large.

To identify the MLP within a control hierarchy is to give a foundation
for the claim of generality for the ability to control. I know you don't
like arguments of plausibility, and quite properly favour demonstrations
that systems really work. But you often make equally valid/invalid claims
of implausibility for things that do work, one of which is the subject
of discussion later in your posting. Personally, I prefer to deal with
systems that have prior claims to plausibility than with those that seem
initially to be implausible. (Aside: perhaps that's one reason why PCT
doesn't catch on faster; one of my reviews said that I should at least
use concepts that are "biologically plausible", to which my comment to
the editor was that PCT was not biologically plausible, but biologically
necessary. Other people don't see it that way.)

To identify PIFs that have been found to be more powerful in what we might
call "passive" perception is to give the control hierarchy a priori
plausibility in claims of even greater power than is developed by the MLP
structure. At the very least, a perceptual signal that has been shown to
be produced by a passive structure is one that is amenable to control if
that passive structure is used as a PIF in an ECS.

Also, all the perceptrons I have seen diagrammed produce only
categorical on-off outputs: this output rather than that one, and
only one value of it: "on". A perceptron couldn't be part of a
pursuit tracking control system, as far as I know, because it
can't represent the continuously-variable state of the
target-cursor separation.

That's a very particular form of perceptron. I don't think they occur
in many theories or demonstration systems. All the ones I ever consider
are continuous throughout. They NEVER give a saturated output from any
node, if they are working well, because a saturated node provides no
information to higher levels about changes in its inputs.

At least I have never seen a perceptron in which a
given output is a continuous function of the state of the inputs,
however many layers intervene.

Check out the "bible." Rumelhart and McClelland (Eds) Parallel Distributed
Processing, A Bradford Book, MIT Press, 1986. In it you will find a few
kinds of network that do use binary nodes, but not many, I think.
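A minimal sketch of such a continuous network (the weights are hypothetical; sigmoid nodes of the kind used throughout the PDP volumes are assumed), showing the output varying smoothly with a continuously-variable input:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp(x, w_hidden, w_out):
    # Two-layer perceptron with continuous (sigmoid) hidden nodes:
    # the output is a smooth function of the inputs, not on-off.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
              for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))

w_h = [(1.0, -1.0), (0.5, 0.5)]   # hypothetical weights
w_o = (1.0, 1.0)
for sep in (0.0, 0.1, 0.2, 0.3):  # e.g. a target-cursor separation
    print(round(mlp((sep, 0.0), w_h, w_o), 3))  # rises smoothly with sep
```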

The meaning of "template" seems to have drifted since I first
heard it applied to perception.

Yes, that's true. Distributed processing has perhaps had something to do
with that drift. A template has come to mean an abstract ideal of a
pattern rather than a pattern described warts and all. It is no longer
true that:

The template was literally the negative of the desired shape,
point by point. To determine whether a template fits the
workpiece, it is necessary that every point of contact be somehow
examined. A template has to contain exactly as much detail as the
form being compared with it.

If you read "template" like that, it's no wonder you reacted strongly
against the use of the term. To make a perceptual signal out of such
a template, the "mountain" in the input space would be an infinitely
thin needle, and control would be impossible.

From your description, I would think that a radial basis function
is simply an input function of the kind I have always imagined,
at least for lower-level systems.

As I started to make this response, I didn't follow the point of your
critique. As I was writing it, I discovered your point, but it is not
dealing with the issue I was raising. It turns out there is a subtlety
I hadn't bargained on.

What you commonly draw for an input function looks something like this:


                       __----------
                     _-
                    -
perception         -
                 _-
             __--
     _____---

                  input attribute

For such a PIF, if the perception is higher than the reference, the output
should lower the input. A radial basis function looks like:

                       __---__
                     _-       -_
                  > -           - <
perception         -             -
                 _-               -_
             __--                   --__
     _____---                           ---_______

                  input attribute

Suppose the reference level for the perception is at the height shown by
the >< marks. If the input attribute is low, to the left of the diagram,
the output should raise it, but if the input attribute is high, the output
should lower it. A fixed connection between output and CEV will not work,
as it will in the upper diagram. Control would be possible only so long
as the disturbance kept the input attribute always on the same side of the
peak. The situation is very like that of the study in which the relation
between the joystick and cursor movement was reversed during the tracking
run. You need either a higher-level control system that can affect the
sign of the output function, or rapid reorganization, if control is to be
maintained for all disturbances.
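The sign-reversal problem can be shown numerically. In this sketch (a Gaussian mountain and a fixed-sign proportional output are assumptions made for illustration), the same fixed connection controls on one side of the peak and runs away on the other:

```python
import math

def perceive(x):
    # Single-peaked "mountain" PIF over one input attribute,
    # scaled to 100 units at the peak.
    return 100.0 * math.exp(-x * x / 2.0)

def control(x, reference=70.0, gain=0.005, steps=4000):
    # Fixed-sign connection from error to the input attribute.
    for _ in range(steps):
        x += gain * (reference - perceive(x))
    return x

left = control(-2.0)   # left of the peak: negative feedback, controls
right = control(2.0)   # right of the peak: positive feedback, runs away
print(round(perceive(left), 1))   # 70.0
print(round(perceive(right), 1))  # 0.0
```

Starting right of the peak, raising the input attribute lowers the perception, so the fixed-sign output drives the attribute away from the mountain and the signal falls toward zero.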

Here's where the subtlety arises. It is perfectly possible to control
the kind of perception--to "make this part match that one"--by monotonic
control of enough individual dimensions, such as the part's length, the
size of the hole, and so forth. That's the power of the MLP based on
discriminant functions of the upper kind. You CAN perceive "part" as
a configuration based on a sufficient number of, and number of layers of,
PIFs of that kind. What I noted in the earlier posting was that in
"passive" perception networks, radial basis functions often permit
equal performance with far fewer nodes. The output of a RBF can be
characterized as "how like this pattern is that one" rather than "is
this bigger or smaller than that." Typically, the total input to an RBF
is normalized, so that the value of the output perceptual signal does not
depend on the absolute magnitudes of the input attributes (as you do with
your speech analysis). It depends only on the likeness between patterns.
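Normalizing the inputs in this way makes the signal a pure likeness-of-pattern measure; a minimal sketch (cosine likeness is used here as one common normalization, not necessarily the one intended):

```python
import math

def likeness(pattern, template):
    # Normalized "how like this pattern is that one": the signal
    # ignores absolute magnitudes and reflects only pattern shape.
    dot = sum(p * t for p, t in zip(pattern, template))
    norm_p = math.sqrt(sum(p * p for p in pattern))
    norm_t = math.sqrt(sum(t * t for t in template))
    return dot / (norm_p * norm_t)

template = (1.0, 2.0, 3.0)
print(round(likeness((2.0, 4.0, 6.0), template), 3))  # 1.0: same shape, doubled
print(round(likeness((3.0, 2.0, 1.0), template), 3))  # 0.714: different shape
```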

A radial basis function defines a _kind_ of perception. Control
systems do not control the _kind_ of perception, but the amount.
With a given radial basis function as an input function, the
amount of perception can be varied by altering the magnitudes of
all the inputs while keeping them in a constant relationship as
defined by the input function.

There seems to be a slight contradiction implied between this and

As long as there is a reachable region of negative feedback, a
control system does not care about local maxima or minima. It can
control quite well in the presence of either. It simply skips
past regions of positive feedback, automatically.

I tend to accept this paragraph rather than the previously quoted one,
which is more restrictive.

Of course if you think of the
inputs to a radial basis function as being on-off variables, the
peak of the mountain has a fixed height. But if the inputs are
continuous variables, then the height of the mountain is also a
continous variable and can be controlled in the usual way.

and

A control system is concerned only with controlling the height of
the mountain.

It's not the height of the mountain that's at issue, it's how to get
to a certain height ON the mountain. The height OF the mountain is a
characteristic of the PIF, not of the input. And we know we can get
to a given height ON the mountain with enough levels of discriminator
functions, so ability is not the issue; efficiency is.

Don't make the mistake of Fowler et. al, objecting that if the
input vector is misaligned, the control system has no way of
knowing which way it is off and realigning it. The control system
is not concerned with the alignment -- with specific values of
individual lower-level perceptions -- but only with getting a
specified amount of signal out of the input function.

If the perception at issue is that of alignment, then THAT control
system is concerned with alignment :-)

Odd: you agree with my conclusion, but for what I see as
unnecessarily complicated reasons.

Well, maybe they are complicated. Much is, in a control hierarchy.
Unnecessarily? Perhaps, but I am not convinced. Maybe my explication
is more complicated than the concepts that underlie it.

What I am trying to do is to focus on PERCEPTION, as the critical item
in perceptual control. Here are some precepts, which I think should not
be controversial:

(1) Any single-valued function of any attributes of the (presumed to exist)
real world that can be described can also be computed. The result of any
such function is, by definition, a "Complex Environmental Variable (CEV)."

(2) A CEV for which a computational procedure executable by a biological
system can be described is potentially available to perception.

(3) Only those CEVs that correlate with perceptual signals as in (2) can
be affected in a manner leading to perceptual control.

(4) Any CEV describable as a configuration in phase space can be the
correlate of a perceptual signal in a multilayer perceptron.

(5) From 3 and 4, any CEV describable as a configuration in phase space
can be affected in a manner leading to control of some correlated perceptual
signal.

(6) It has been found, quite apart from PCT studies, that multilayer networks
in which some of the layers consist of radial basis functions rather than
simple discriminator functions can provide the same perceptual signals more
efficiently than do the discriminator function networks.

(7) Control hierarchies that incorporate radial basis functions may be
expected sometimes to be more efficient than those based simply on
discriminators.
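A single radial basis unit already makes the earlier "mountain" point in miniature. The sketch below is illustrative only -- the Gaussian form, width, gains, and inputs are my own choices, not a claim about any particular network. The unit's output is a continuous, single-valued, non-monotonic function of continuous inputs, and a control loop can drive that output to a reference without knowing anything about the unit's internals:

```python
import math

def rbf(x, center, width=1.0):
    """Gaussian radial basis unit: output peaks at the center and
    falls off with distance -- a 'mountain' over the input space."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2 * width ** 2))

# Control the RBF output (the perceptual signal) to a reference of 0.5
# by adjusting only the first input; the loop uses only the error, not
# the alignment of the individual lower-level inputs.
center = [0.0, 0.0]
x = [3.0, 0.5]            # start far down the mountain's flank
reference, gain = 0.5, 0.5
for _ in range(500):
    error = reference - rbf(x, center)
    x[0] -= gain * error  # moving x[0] toward the center raises the signal
x_final, p_final = x[0], rbf(x, center)
```

The loop ends with the perceptual signal at the reference; which of the many input configurations yielding that signal it lands on depends only on where it started.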

======================

I'm trying to suggest that word-perceptions can be created at the
event level, without requiring either categories or sequences.

Without agreeing to the label "word-perceptions", I can agree with
that. It's more or less what I was trying to propose in my Durango
presentation. A lot of what seems to be based on rules and logic,
founded on categories, is not necessary when controlling for the
production of "wordy-things-in-sensible-seeming-patterns."

I think you're forgetting that a word all by itself is not a
category, but simply a word, an event. It isn't the name of
anything until it has been used to _indicate_ a category of other
perceptions, perhaps by being included as a member of the same
category.

It isn't the NAME of anything, no. The argument is that it isn't even
a word as such until it has been located within the category of that
word. There may possibly be a conceptual issue here, but I rather
think it is more a question of how we understand the word "word."
I do not think that the strings issued by my Syntax Recognizer will
BE words, though an outside observer having the appropriate categories
might perceive them to be such (just as your CROWD program doesn't
generate arcs, though an outside observer perceives it to do so).

I am more inclined to assign words to the category level than I am to
assign the perceptions that they label to the category level. I would
invert your statement, as follows: A perception isn't a member of a
category until it has been labelled by a WORD of a particular category,
which thereby incorporates that perception as a member of a class of
perceptions that are included as members of the same category.

Also, without categories, how do you handle my "mug"->"bug"
example? Wouldn't you have to acknowledge the intermediate
insect that delivers coffee to your mouth?

No more than I would have to acknowledge a word halfway between
muv and buv.

So we come full circle. If you don't have to acknowledge a word
halfway between muv and buv, what does come between them? Nothing,
in my view, because m and b are themselves categories with a common
boundary. There is no continuum at THAT level for there to BE an
"in-between." In-between happens at the lower levels of perception.

So can a proof of a trigonometric identity: after all, writing it
out is just one configuration of pencil and paper after another.
Does this mean that the configuration level can prove
trigonometric identities? What you see depends on the level from
which you view it. But if you view it from the wrong level,
you'll see either too much or too little: in this case, too
little.

Yes, agreed. I explicitly didn't say it was the right way to view
multiple streams, only that it was another way. In fact, I don't think
that the stream of configurations is the right way, "right" being the
way people do it. Parallel streams is much more likely. But there was
nothing in the situation forcing parallelism. That's all I was pointing
out.

On parallelism, I have no idea why you say (to me, out of the
blue) that there can be no more than one sequence perception
being controlled at any time.

I never said anything even close to that.

OK. I read what you didn't write. You do it to me
often enough, too. That happens to any reader and writer
more often than we would like, but it happens. Sorry.

What I said was that _a_ sequence consists of a series of
elements -- categories or below -- occurring one at a time.

If that was your point, who can disagree?

In the digital world, there is no counterpart of, for example,
the analog computation of the sum of two variables, in which the
two variables SIMULTANEOUSLY affect the magnitude of the sum,
neither of them affecting it before the other. In the digital
world, whether you use serial or parallel computation, you can
only load one variable into a register and then add the other
variable to it, finally producing the sum.

That's a description of a von Neumann machine, but it is overly limiting
to apply it to all digital machines. You can make digital adders that
behave in just the way you correctly say analogue adders work.
The distinction is not between digital and analogue, but between systems
with a single degree-of-freedom bottleneck and systems without.
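The point about digital adders can be made concrete. In a combinational
(gate-level) adder there is no register-then-accumulate step: once both
input vectors are present, the sum settles out of the logic with neither
operand "loaded first." A sketch (the Python loop is sequential, but it
simulates logic in which all input bits are applied at once):

```python
def full_adder(a, b, cin):
    """One-bit full adder built from gates: both inputs enter the
    logic symmetrically; neither is 'loaded' before the other."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_add(x, y, bits=8):
    """Combinational n-bit adder: the carry chain is propagation
    delay through gates, not a sequence of register operations."""
    carry, total = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total

result = ripple_add(13, 29)   # 42
```

The only "one at a time" here is gate propagation delay, which has an
exact analogue-circuit counterpart; the single-register bottleneck of
the von Neumann model is absent.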

I'm not going to return to the discussion of inserting ECSs yet. I hope
that before next July we will have a proof-of-concept to show. If it
works, there will be more basis for discussion (also, if it doesn't).
But I do want to make it clear that this issue is not closed, any more
than is the information-theoretic issue.

But:

It seems to me that by its very nature, your scheme is
untestable. If the highest level exists first, then, without any
way of converting high-level processes into organized action, we
would never see any evidence of the highest-level functions until
finally the spinal control systems were organized. Then the whole
hierarchy would spring into view at once.

It's the "without any way" that is the sticker here. My wording would
have "with only ineffective ways," and then the rest would be softened.
The newborn controls its intrinsic chemical variables with very diffuse,
almost random actions. Yes, there are evolved, built-in, low-level
control systems, or at least muscular output functions. When the
intrinsic variables get out of whack, these operate to some extent
randomly, to some extent effectively (crying). You see the intrinsic
variables as not belonging in the main hierarchy. I see them at its top.
Either way, reorganization develops new ECSs and reconnects old ones
near the "waving arms and legs" systems. For you, these are at the (new)
top level. For me they are inserted. Let's leave the "possibility"
argument until there is something on which to base it. Our intuitions
differ, and there really isn't anything but new information or restructuring
of old information that can change that.

I think this is a missed communication. Rick was pointing out the
difficulty in the concept of contrast, not defending it. There is
potentially a contrast between every word and every other word.
If words are distinguished by contrasts, then there is no a
priori way to say that one word should be tested for contrast
only against a small set of other words: those other words can't
even be identified as words except by contrast with the starting
word. So every word that is known has to be tested for contrast
with every other word, before the words can even be distinguished
from each other (according to the contrast hypothesis).

I think I understood Rick, since you seem to be saying the same thing
as he did. But you didn't understand that I was trying to show Rick
the way out of that bind.

It is true that if a word is recognized as belonging to a
particular category, it thereby is contrasted to all other
words.

Logically, yes. Categorically, no. Category perception simply
reports that a member of a category is present. It carries no
information about what categories are therefore not present. That
kind of deduction is a function of the logic level, not the
category level.

Yes, I was just giving Rick a point, here. But if you want to take it
back, I don't mind.

How does a person know which word a word might be mistaken for?

I'd say, by mistaking it and discovering the mistake. And whether
a mistake occurs depends on who is listening as much as on what
is said.

I'd say by mistaking it in imagination and discovering the mistake.
You can't change your production of a word after it has been spoken,
and that's when you discover that the listener mistook it. When that
happens (and it does), you try again at some level of abstraction (maybe
using different words, or a different argument structure). In your
retry, you exaggerate the contrast with what it was mistaken for.
(We do that all the time in these discussions, exaggerating the
distinction between our "true" position and the one we are contrasting
it with). But before you speak, the contrast is tested in imagination.

I give up. Good night.

Hope you slept well. Stress (conflict) is tiring.

Martin