[From Bill Powers (931006.2100 MDT)]
Hal Pepinsky (931006) --
My theory says the components must be a surprise to affirm that
we have the power to learn what we do not already know and are
extending the diversity to which we accommodate.
OK, I'll accept this as a statement that follows from your
theory. Can you tell me how you arrived at this statement, so I
could apply the same reasoning independently and verify that I
arrive at the same conclusion?
--------------------------------------------------------------
Martin Taylor (931006.1730 EDT) --
It is easiest to contemplate ECSs in which the PIF is a
monotonic function of some space defined by "attribute signals"
(sensory signals or lower-level perceptual signals); but there
is no requirement that a PIF have this characteristic, and
indeed in some of his tutorial discussions recently (to Hal?)
Bill has been dealing with control in a world in which the
perceptual function is non-monotonic. The perceptual signal is
single-valued, but its value changes non-monotonically as a
function of any path within the space of attribute values.
Perceptual signals are always "single valued" in that they can
have only one value at a time. But they are not necessarily
single-valued in terms of the lower-level set of perceptions from
which they are derived. There is no need for the input function
to be monotonic. The perceptual signal itself is what the system
controls; it doesn't matter at all if that same value of
perceptual signal can be produced by many different arrangements
of the lower-level world. Just consider a perceptual function
that perceives the sine of the angle of rotation of a line
segment. The line might rotate 0.5 radians and produce a signal
of 70 units. Or the line might rotate 2*pi*n+0.5 radians and
produce exactly the same signal. In either case, the control
system could latch onto the line and control it so it produces 70
units of signal, ignoring the number of previous whole rotations.
The control system does not care what the objective situation is.
As long as there is a reachable region of negative feedback, a
control system does not care about local maxima or minima. It can
control quite well in the presence of either. It simply skips
past regions of positive feedback, automatically.
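A minimal sketch of that loop in Python (the gains and step sizes are my own illustrative assumptions, not from the discussion): the PIF is sin(theta), the reference is 0.7 units, and an integrating output acts on the angle. Started inside a positive-feedback region, and again three whole turns further out, the system settles in both cases where the perceptual signal matches the reference, ignoring the number of prior rotations.

```python
import math

def control(theta, r=0.7, k=2.0, dt=0.01, steps=5000):
    """Integrating control of p = sin(theta) toward reference r."""
    for _ in range(steps):
        p = math.sin(theta)        # perceptual signal: non-monotonic PIF
        theta += k * (r - p) * dt  # output acts on the angle of the line
    return theta

# start at 2.0 rad, where cos(theta) < 0: locally positive feedback
t1 = control(2.0)
# start three whole rotations further out: same perceptual result
t2 = control(3 * 2 * math.pi + 2.0)
# both end with sin(theta) = 0.7; the objective angles differ by 6*pi
```

The loop simply drifts out of the positive-feedback region until it reaches a stable one; nothing in it detects or counts the rotations.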
... there are no points in the attribute space that form a
local maximum or minimum. This is the form usually used in a
node of a multilayer perceptron, and is at the base of my
repeated claim that a four-layer HPCT structure could control
any temporal-spatial configuration whatever.
I have never seen a perceptron described that could control
anything. For control you need a comparator and an output
function, and an external feedback loop. The perceptron, at best,
can describe an input function.
Also, all the perceptrons I have seen diagramed produce only
categorical on-off outputs: this output rather than that one, and
only one value of it: "on". A perceptron couldn't be part of a
pursuit tracking control system, as far as I know, because it
can't represent the continuously variable state of the target-
cursor separation. It could, if appropriately trained, distinguish
"separated" from "not separated", but it could not produce an
output signal that has a smooth relationship to the degree of
separation. At least I have never seen a perceptron in which a
given output is a continuous function of the state of the inputs,
however many layers intervene.
It [a radial-basis function] represents a single-peaked
mountain whose value is maximum at some point and grades down
to zero at infinity in every direction in the attribute space.
Such a function can be a "template" in a control system.
The meaning of "template" seems to have drifted since I first
heard it applied to perception. A template (the dictionary seems
to prefer "templet"), before the term was adopted into
psychology, was a cutout form or mold that was fitted against a
workpiece to identify points that deviated from the desired form.
The template was literally the negative of the desired shape,
point by point. To determine whether a template fits the
workpiece, it is necessary that every point of contact be somehow
examined. A template has to contain exactly as much detail as the
form being compared with it. So I don't see your analogy.
From your description, I would think that a radial basis function
is simply an input function of the kind I have always imagined,
at least for lower-level systems. It responds maximally to inputs
occurring in a certain relationship to each other, and less for
all other relationships. In a two-variable space, it would be a
mountain, as you say. In a space of more dimensions, it would be
a scalar mountain in a hyperspace. Is there some other aspect of
radial basis functions that makes them different from this
description?
A radial basis function cannot be a PIF in a single ECS that
controls its perceptual signal, because there is no direction
in the attribute space that is more likely than another to
increase the value of the perceptual signal.
A radial basis function defines a _kind_ of perception. Control
systems do not control the _kind_ of perception, but the amount.
With a given radial basis function as an input function, the
amount of perception can be varied by altering the magnitudes of
all the inputs while keeping them in a constant relationship as
defined by the input function. Of course if you think of the
inputs to a radial basis function as being on-off variables, the
peak of the mountain has a fixed height. But if the inputs are
continuous variables, then the height of the mountain is also a
continuous variable and can be controlled in the usual way.
With continuous input variables, a radial basis function is
nothing but a projection of an input vector onto a line in the
space defined by the input weights, and that is how my model
works (at the lowest levels). If the input vector is aligned with
that line, variations in the magnitude of the vector are
maximally represented as variations in the projected line length,
the scalar perception, the height of the mountain. If the input
vector is at an angle to the line, then for a constant input
vector length, the projected length falls off in all directions
not lying on that line, just like a radial basis function. See
BCP, p. 106.
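As a sketch of that projection (Python; the particular weights, lengths, and angles are illustrative assumptions, not taken from BCP):

```python
import math

w = (0.6, 0.8)                  # unit-length input weights: define the line
phi = math.atan2(w[1], w[0])    # direction of the weight line

def pif(x):
    """Perceptual signal: projection of the input vector onto the weight line."""
    return w[0] * x[0] + w[1] * x[1]

def input_vec(length, off_deg):
    """Input vector of a given length, rotated off_deg away from the weight line."""
    a = phi + math.radians(off_deg)
    return (length * math.cos(a), length * math.sin(a))

p_aligned = pif(input_vec(5.0, 0.0))   # full vector length represented: 5.0
p_off = pif(input_vec(5.0, 60.0))      # falls off as cos(60 deg): 2.5
```

For a fixed input-vector length, the perceptual signal is just length times the cosine of the misalignment, which is the single-peaked falloff described above.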
A control system is concerned only with controlling the height of
the mountain. If the input vector is not aligned so as to create
the maximum perceptual signal, the outputs simply produce a
larger effect until the magnitude of the perceptual signal does
match the reference signal. Barring physical limits, the control
system is not concerned with how much output is needed to produce
the desired amount of perceptual signal. It will produce the
amount that cancels the error as nearly as possible. Of course
the loop gain falls off if the input vector is not reasonably
close to being optimally aimed. But that can be taken care of
with multiple control systems: see Rick's spreadsheet program. If
the maximum loop gain is high enough, the input vector can be
misaligned by a large amount without appreciably affecting the
quality of control.
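A sketch of that point (Python; the 60-degree misalignment and the gain are my own illustrative assumptions): the output pushes the lower-level inputs along a direction well off the weight line, yet the perceptual signal is still brought to the reference. The loop gain is simply scaled by the cosine of the misalignment, so the system ends up producing more output.

```python
import math

w = (0.6, 0.8)                   # input weights (unit length)
phi = math.atan2(w[1], w[0])
a = phi + math.radians(60.0)     # environment moves inputs 60 deg off the line
e = (math.cos(a), math.sin(a))   # direction in which output affects the inputs

r = 10.0                         # reference signal
o = 0.0                          # output quantity
for _ in range(2000):
    x = (o * e[0], o * e[1])     # lower-level inputs produced by the output
    p = w[0] * x[0] + w[1] * x[1]  # perceptual signal = projection onto w
    o += 0.5 * (r - p)           # integrating output function

# p ends matched to r; o ends near r/cos(60 deg) = 20: more output, same control
```

Nothing in the loop ever computes the alignment; the output grows until the projected signal matches the reference.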
Don't make the mistake of Fowler et al., objecting that if the
input vector is misaligned, the control system has no way of
knowing which way it is off and realigning it. The control system
is not concerned with the alignment -- with specific values of
individual lower-level perceptions -- but only with getting a
specified amount of signal out of the input function.
Odd: you agree with my conclusion, but for what I see as
unnecessarily complicated reasons.
------------------------------------------
Haven't you always said that sequences are serial lists of
categories? Or is my memory totally screwed up?
Yes (I mean no, it's not screwed up), but don't forget that at
one time I had sequences and events combined at one level. When I
separated them, sequence got bumped to the level above
categories, while events stayed just above transitions.
I thought events were temporal patterns in a continuum space,
whereas sequences were, as you say, lists of discrete items.
Wasn't that always your view?
Yes, but now I have been able to express the unspoken (by me)
understanding that events also have the character of being made
of multiple simultaneous lower-level perceptions, each appearing
and disappearing during the event at the proper time but
appearing simultaneously with other necessary inputs. This is not
a characteristic of a single sequence perception, which is made
of serially-ordered elements.
If I am right, how can you not get to categories when you deal
with sequences?
I'm trying to suggest that word-perceptions can be created at the
event level, without requiring either categories or sequences.
I think you're forgetting that a word all by itself is not a
category, but simply a word, an event. It isn't the name of
anything until it has been used to _indicate_ a category of other
perceptions, perhaps by being included as a member of the same
category.
Also, without categories, how do you handle my "mug"->"bug"
example? Wouldn't you have to acknowledge the intermediate
insect that delivers coffee to your mouth?
No more than I would have to acknowledge a word halfway between
muv and buv. Even if the words muv and buv have not been assigned
to indicate any categories of perceptions, they can still exist
as words. By themselves they don't imply any meanings. Words
themselves do not "have" meanings, and they are not categories.
They point to categories of meanings, usually. Sometimes they
point to specific experiences: my nose. But it is difficult to
use words to indicate specific experiences.
------------------------------
Aside from that, there's another way of looking at Bruce's
parallel streams (not necessarily the right way, but another
way):
[the diagram] can be seen not as parallel streams of events,
but as a single stream of configurations:
Config. "b|b|i|i|n|n"
======== ==|=|=|=|=|===
Lips 1|1| | | | lips close, build pressure
Larynx 1|1|1|1|1|1 larynx sounds throughout
Tongue | |1|1|2|2 tongue forms ih and nn
Velum | | | |1|1 velum opens for nasal sound
> > > > >
So can a proof of a trigonometric identity: after all, writing it
out is just one configuration of pencil and paper after another.
Does this mean that the configuration level can prove
trigonometric identities? What you see depends on the level from
which you view it. But if you view it from the wrong level,
you'll see either too much or too little: in this case, too
little.
First, if nothing but configurations are present, then there is
no "stream." Streamness is at least a transition-level
perception. Second, this is not a simple stream but one with a
specific internal structure, the elements of which can change
independently of each other to form recognizably different
perceptions. One such structure is the sound we spell "bin." No
structure is perceived at the configuration level, or at the
transition level. Those levels perceive only configurations, or
only transitions. That is why we need the event level, however we
imagine the lower-level perceptions are combined to produce it.
The event-level is where the structure is perceived and
controlled.
All of this gives us a single word, ready to be assigned
meanings. The word is not experienced as a sequence, but as a
single thing: an event. An event can be part of a sequence, but
is not itself perceived as a sequence. If you slow it down
(reduce the magnitudes of the transition-signals), your sequence
level can perhaps recognize sequentiality in it, but that is not
necessary.
Perhaps a better linguistic term for event would be morpheme or
syllable. Certain words are indeed morphemes or syllables which
have no meaning until one is assigned. But the picture gets
confusing when we start constructing words out of parts of other
words, parts which have already been assigned meanings. I'll drop
that there.
-------------------------------------
On parallelism, I have no idea why you say (to me, out of the
blue) that there can be no more than one sequence perception
being controlled at any time.
I never said anything even close to that. Of course there can be
different sequence-controllers operating at once, in parallel.
What I said was that _a_ sequence consists of a series of
elements -- categories or below -- occurring one at a time. If
you analyze any particular sequence-perception into elements
which are not themselves sequences, you find categories (at the
highest), and only one of them at a time. A simple example of a
particular sequence is a sentence. Another is a list. My example
was intended to show that we cannot construct _a sentence_ or _a
list_ using more than one category-word at a time.
And it is nonsense (sorry, Bill, but that's the way it seems to
me) to say
The parallelism is a feature of analog systems;
digital systems work strictly in one-at-a-time sequence;
Only a von Neumann digital computer is so restricted, because
it has at its heart a single, scalar, program address register.
In the digital world, there is no counterpart of, for example,
the analog computation of the sum of two variables, in which the
two variables SIMULTANEOUSLY affect the magnitude of the sum,
neither of them affecting it before the other. In the digital
world, whether you use serial or parallel computation, you can
only load one variable into a register and then add the other
variable to it, finally producing the sum. On the way, you create
a situation that never appears in the analog computation: the
"sum" of one variable. In fact, if you do a running sum of two
changing digital variables in an accumulator, the sum is wrong at
least half of the time, isn't it? When you add variable A to the
register (you can't simultaneously add the value of B to the same
register), the sum is wrong by the missing value of B. Parallel
computing doesn't solve that problem, not to mention other
problems like the way addition actually takes place in a digital
computer, with propagation of carries and so forth.
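A toy illustration of that point (Python, with made-up values): forming a running sum of two changing streams in a single accumulator, the register necessarily passes through a state, the "sum" of one variable, that has no counterpart in the analog computation, and it is in that state half the time.

```python
a_vals = [1, 2, 3, 4]            # two changing digital variables
b_vals = [10, 20, 30, 40]

snapshots = []                   # (register contents, true sum) at each step
for a, b in zip(a_vals, b_vals):
    reg = a                      # load A: register is wrong by the value of B
    snapshots.append((reg, a + b))
    reg += b                     # add B: register now holds the true sum
    snapshots.append((reg, a + b))

wrong = sum(1 for got, want in snapshots if got != want)
# exactly half the recorded register states lack B's contribution
```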
An analog computer can handle parallel inputs literally
simultaneously, and do so in a system of the simplest possible
construction (that is, without having to duplicate the computer).
I think that analog and digital computing are fundamentally
different processes, as different as digitally computing the
trajectory of a ball and throwing a ball into the air and
observing its trajectory. The "degenerate" analog computation of
a physical process is the process itself. There is no
corresponding degenerate case for a digital computation -- except
of a process in another digital computer.
-----------------------------------
If something is impossible, then counterexamples should not
exist, right? Early speech recognizers, for isolated words, did
exactly what you say is impossible. The incoming sound wave,
possibly filtered by a crude filter bank, was compared with a
set of templates, one for each word that might be recognized.
But that wasn't comparing words with words; it was comparing
waveforms with waveforms: detailed magnitudes with detailed
magnitudes. A word is not perceived as a waveform, but as a
single scalar quantity that stands for presence of a particular
waveform. It does not itself have a waveform. So where are you
going to get this signal, if not by some computing process that
derives it from the waveform? By the time a word is perceived,
all that distinguishes it from other words is that the signal
came from THIS input function and not any other. Do you believe
that you can get the signal first, without the computation that
derives it from the more detailed perceptions?
--------------------------------------------------------------
... it is my opinion/impression that exactly the same thing
happens as a child learns to understand and to speak language.
Initially only the large sound patterns are discriminated from
one another and controlled for in production. This is
difficult and inefficient, and soon enough the child learns to
perceive common sub-structures within the various sound
patterns it is exposed to.
My impression is the opposite: first the child learns to perceive
and control just one or two small and simple sounds, which are
used for everything. Gradually more sounds are heard and brought
under control. The ability to hear and say "dot" is not lost;
"dot" does not break up into a lot of smaller patterns. In fact I
can't imagine what you mean by a "large sound pattern." I can
visualize a big blob of something, but I can't translate that
into anything auditory I've ever heard a baby produce -- nothing
that relates to language, that is.
It comes back to a discussion we had long ago, perhaps two
years, in which you asked something like "How can you insert an
ECS between existing levels; I just can't see how you make it
work."
Still can't. It still seems to me that you're building up an
intellectual scheme without asking whether it's physically or
phenomenologically feasible, or observationally justified.
I still think that such insertion is the major way new ECSs are
built into the hierarchy.
I want to know how you can connect sensory nerves (present from
the start) directly to, say, the cortical functions that compute
spatial relationships or logical functions, when at no time do
such direct connections ever exist, to my knowledge. And I want
to know how logical functions and spatial functions can work
before there is anything to be logical about and before there is
any perceived space. What would they do?
And I want to know why babies start out learning to control their
muscles, then their limbs, then their voices, then other objects,
then patterns of movement of objects, then relationships among
objects, and so on, and show no signs of being able to control
the highest-level functions, like logic, until years into their
lives. The evidence seems to be all against your scheme, and all
in favor of mine in which the levels of control develop from the
bottom up, with the highest functions appearing last, not first.
It seems to me that by its very nature, your scheme is
untestable. If the highest level exists first, then, without any
way of converting high-level processes into organized action, we
would never see any evidence of the highest-level functions until
finally the spinal control systems were organized. Then the whole
hierarchy would spring into view at once.
But then, I also still think that the intrinsic variables are
the top-level references, not just in a separate reorganization
hierarchy.
In my scheme, the top-level references are, at any time, those of
the highest level in existence, so we agree about that. However,
in my scheme that level starts out with spinal systems, and as
new levels are added the "highest" level keeps changing until
it's the highest level we ever develop. And as we require some
explanation as to why these references are set as they are,
whatever the stage of development, we need a system that can
relate learning to the physical state of the organism at all
stages of development. But you know my argument.
--------------------------------
Rick said:
For example,
by controlling for the perception "pin", the person is
implicitly contrasting "pin" with "bin" and every other word
for which "pin" might be mistaken.
and you said:
It's hard to know what might usefully help to show you how
wrong this paragraph is, other than to ask you to listen
critically to (preferably to tape) conversations.
I think this is a missed communication. Rick was pointing out the
difficulty in the concept of contrast, not defending it. There is
potentially a contrast between every word and every other word.
If words are distinguished by contrasts, then there is no a
priori way to say that one word should be tested for contrast
only against a small set of other words: those other words can't
even be identified as words except by contrast with the starting
word. So every word that is known has to be tested for contrast
with every other word, before the words can even be distinguished
from each other (according to the contrast hypothesis).
Oh, well. I'm getting tired.
-------------------------------------------------------------
It is true that if a word is recognized as belonging to a
particular category, it thereby is contrasted to all other
words.
Logically, yes. Categorically, no. Category perception simply
reports that a member of a category is present. It carries no
information about what categories are therefore not present. That
kind of deduction is a function of the logic level, not the
category level.
------------------------------------------------------------
How does a person know which word a word might be mistaken for?
I'd say, by mistaking it and discovering the mistake. And whether
a mistake occurs depends on who is listening as much as on what
is said.
----------------------------------------------------------
I give up. Good night.
Best to all,
Bill P.