Research, PCT-style

Abbott_Bruce · June 26, 1999, 11:58am

[From Bruce Abbott (990626.0800 EST)]

Bill Powers (990625.1103 MDT) --

Bruce Abbott (990625.0755 EST)

In an EAB experiment, the motion of the lever is artificially given an
effect on food production that is always the same (given the schedule). But
in the natural world, that's not how it is.

I'm not convinced, even for the continuous case. To keep your car on the
road, you must turn the steering wheel to compensate for disturbances
produced by changes in the direction of the road, wind, irregularities in
the surface. As you perceive the vehicle to be drifting toward the left you
must turn the wheel to the right to compensate, perhaps even hold it there
if the disturbance persists (e.g., steady wind from the right). What you
learn is a relationship, not between wheel angle and direction, but between
turning the wheel and altering the rate of change of your perception of the
car's position with respect to the road. Increasing the wheel-angle in the
counterclockwise direction reduces the rate of rightward motion of the car
with respect to the road (through zero and into negative rightward motion,
i.e., leftward motion); increasing the wheel-angle in the clockwise direction
has the opposite effect. This is true for all conditions except when the
front wheels lose traction, in which case control is lost. The amount of
wheel-angle required to prevent further drift will vary continuously with
the disturbance, but what the driver extracts from his or her experience at
the wheel is how turning the wheel affects the drift, independent of
disturbances. There is a criterion for success (the car goes where one
wants it to go, or at least the error is altered in the right direction) and
a particular relationship between what to do and what happens as a result
that can be learned on this basis.

As for the sort of relationships studied in the operant chamber, natural
examples abound. Earlier I mentioned the association the lion may discover
between the watering hole and prey. If the lion is looking for prey, then
finding it at the watering hole, the lion is encouraged to make more visits
there in the future. With further experience, the lion is likely to learn
which visitation times are more likely to pay off, and begin to visit more
frequently at those times than at others. Having set a reference for
visiting the watering hole, the lion will of course deal with any
disturbances (within its power to compensate for) such as irregularities in
the terrain. The chimpanzee learns that if it pushes a stick into an
opening in a termite mound and then withdraws it, the stick is likely to
have a few delicious termites clinging to it. When I'm hungry and at home,
I'm likely to check out the pantry or the refrigerator for something to eat.
The reason I look in those places is that I've often found something to eat
there in the past. When I go outside, I reach for the doorknob and turn it
and while holding the knob turned, pull on the knob; the door swings open
and I am able to step through. If the knob won't turn I have to go to plan
B, but I grab the knob and turn it and pull on it because in the past such
behavioral acts have resulted in an open door at a time when an open door is
what I wanted to perceive. I can't think of many occasions in which
disturbances have altered this relationship.

"Sketchy as it is at this point, the proposal is a proposal about how
control systems come to be organized so as to produce perceptions that tend
to be accompanied or closely followed by the delivery of food."

What I am saying is that in a normal environment, there is no action-type
perception that is always accompanied or closely followed by a standard
amount of food. While such regularities can occur by accident or artifice,
the natural case is for the delivery of food to vary in part independently
of variations in acts (actions at the higher level, controlled consequences
of actions at the lower level). In short, disturbances in the real world
see to it that repeating a given act will not result in the same
consequence of that act (using the term act carefully in the sense you mean).

The action-type perception does not have to be accompanied or closely
followed by food on every occasion in order to be learned, as the
acquisition of behavioral acts on partial reinforcement schedules
demonstrates. Nor do these behavioral acts have to be followed by exactly
the same consequence on each occasion when the consequent event does occur,
for the behavioral act to be learned as a means of obtaining the
consequence. It shouldn't matter much to that lion whether an antelope is
found at the watering hole on one visit and a wildebeest or a rabbit on the
next, so long as the lion views any of the above as acceptable prey. When
the lion is hungry, one of the things it will now do with greater likelihood
is check out the watering hole.

And this is my main point. If, in the real world, different acts are
required in order to produce the same consequence, then there is no way the
consequence can be considered as reinforcing. If the consequence _did_ tend
to make the same causal act occur again, it would _decrease_ the
probability of occurrence of the same consequence. Only when disturbances
are prevented from happening could it seem that reinforcements strengthen
the behaviors that produce them.

I'm not advocating a traditional reinforcement view, which unfortunately
does not distinguish between two distinct mechanisms -- reorganization and
control (or perhaps I should say that it employs one mechanism to accomplish
both jobs, when two mechanisms are required). What I'm advocating is a
selection mechanism for building new control systems, one that is much more
efficient than random reorganization, when conditions are such that it can
work. I'm certainly not talking about any mechanism that "makes the same
causal act occur again" -- the animal learns to perform the act as a means
of obtaining the consequent event; it is not driven to produce those acts
willy-nilly.

The reason it seems this way is that reorganization continues until
behavior brings the "reinforcement" rate, or some internal effect of it, as
close as possible to an intrinsic reference level. Of course the behavior,
for example the pattern of lever-pressing, is linked to production of the
reinforcement through the schedule, so when we say "as close as possible"
we mean as close as possible given the nature of the schedule that makes
reinforcements depend on instrumental acts.

I am very skeptical of this proposal. I think that if the system had to
wait until some variable were brought near to an intrinsic reference level,
the animal would rarely if ever learn what to do. The delays involved are
generally of sufficient length that the animal would already be reorganizing
into something else before the error were reduced sufficiently to halt the
reselection process. Consider the hungry rat that bumps into the lever and
immediately receives a food pellet. (The rat has already been magazine
trained so that it approaches the food cup and feeds as soon as it hears the
feeder operate.) One pellet clearly is insufficient to bring the animal's
state of nutrition back near some intrinsic value, yet as I have observed,
sometimes this _single experience_ is sufficient to produce a quick return
to the lever and a second press, and with only a couple more such
experiences the animal's performance becomes organized into a sequence of
approach the lever, press/release, approach the cup, eat, approach the lever
. . . To my mind, random reorganization simply cannot account for this
dramatic change in behavior after the delivery of only a few pellets.

There is also the problem of how the reorganization system "knows" which
systems to reorganize and which to leave be. And why old organizations
often remain intact after they supposedly have been reorganized. But those
are other issues; for now I want to focus on this issue of efficiency.

Bruce

Bruce_Gregory8 · June 26, 1999, 3:11pm

[From Bruce Gregory (990626.1110 EDT)]

Bruce Abbott (990626.0800 EST)

I am very skeptical of this proposal. I think that if the system had to
wait until some variable were brought near to an intrinsic reference level,
the animal would rarely if ever learn what to do. The delays involved are
generally of sufficient length that the animal would already be reorganizing
into something else before the error were reduced sufficiently to halt the
reselection process. Consider the hungry rat that bumps into the lever and
immediately receives a food pellet. (The rat has already been magazine
trained so that it approaches the food cup and feeds as soon as it hears the
feeder operate.) One pellet clearly is insufficient to bring the animal's
state of nutrition back near some intrinsic value, yet as I have observed,
sometimes this _single experience_ is sufficient to produce a quick return
to the lever and a second press, and with only a couple more such
experiences the animal's performance becomes organized into a sequence of
approach the lever, press/release, approach the cup, eat, approach the lever
. . . To my mind, random reorganization simply cannot account for this
dramatic change in behavior after the delivery of only a few pellets.

I too think we call on reorganization a little too readily. The problem of
persisting error seems best addressed by a model in which implementing a
plan (the lion decides to go to the waterhole to look for prey) replaces
reorganization. That is, the system does _not_ reorganize as long as
following a plan leads to error reduction. The superiority of "higher"
organisms consists in their ability to envision and carry out complex plans.
Only when our plans fail must we fall back on the unpredictable
reorganization mechanism--often with very undesirable results.

Bruce Gregory

Bill_Powers1 · June 26, 1999, 4:47pm

[From Bill Powers (990626.0819 MDT)]
I see from the address I'm replying to that we're back on CSGnet. Probably
a good idea.

Bruce Abbott (990626.0800 EST)--

I'm not convinced, even for the continuous case.

Perhaps not, but maybe you're beginning to suspect there's something to be
convinced about.

To keep your car on the
road, you must turn the steering wheel to compensate for disturbances
produced by changes in the direction of the road, wind, irregularities in
the surface. As you perceive the vehicle to be drifting toward the left you
must turn the wheel to the right to compensate, perhaps even hold it there
if the disturbance persists (e.g., steady wind from the right). What you
learn is a relationship, not between wheel angle and direction, but between
turning the wheel and altering the rate of change of your perception of the
car's position with respect to the road.

While it's true that these relationships among rates of change can be seen,
controlling rate of change will not result in long-term control of
position. If you try to analyse this in terms of rates of change, there
will always be some minimum detectable rate of change (because of system
changes and noise), and below that rate, the variables will simply drift
without limit. Furthermore, even without noise and changes in parameters,
control is never perfect; when there's a sustained disturbance, you can
counteract perhaps 99.9% of the effect, but that still leaves 0.1% to cause
a continual position drift. When the disturbance finally is removed, the
drift ceases, but there is an accumulated positional error which a rate
control system can't correct. In a steady crosswind, you'd go off the road
in five minutes, ten minutes, an hour ... eventually. Yet people can drive
indefinitely without going out of the proper lane, even in a very stiff
crosswind.

The right analysis requires a reference _position_ and comparison with the
perceived _position_ -- not rates of change. Now the error is anchored to a
specific perceived position on the road, and corrections will always tend
to restore the car to (toward) the reference position, no matter how long
the driver drives, and whether disturbances are transient or sustained.
There is error, but no _cumulative_ error. The system's characteristics
might wander a bit, but that will only change the position on the road by a
small amount. If your estimate of the car's position acquired a bias of 5%
of the range, the car's position will change by 5% of whatever the measure
of position is. If the perception of a rate of change acquired a bias of 5%
of the range (in the rate kind of control system), however, you'd very soon
be off the road.
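
The contrast between the two analyses can be checked with a small
simulation. The sketch below is an illustration only, not code from this
thread. It assumes a deliberately simplified car in which wheel angle sets
lateral drift rate directly (heading dynamics are ignored), and every name
and number in it (simulate, gain, k, tau, the 1 m/s crosswind, the 0.05
bias) is a hypothetical choice. Under those assumptions the rate model
settles to a residual drift of about d/(1 + k*gain), which accumulates
without limit, while the position model settles to a fixed offset of about
d/(k*gain).

```python
def simulate(mode, disturbance=0.0, bias=0.0, gain=50.0, k=1.0,
             tau=0.1, dt=0.001, duration=60.0):
    """Return the car's final lateral position (lane centre = 0).

    mode "rate":     the controller keeps the perceived drift rate at zero.
    mode "position": the controller keeps the perceived position at zero.
    All parameter values are illustrative assumptions.
    """
    x = 0.0  # lateral position, metres from lane centre
    w = 0.0  # wheel angle (controller output), arbitrary units
    for _ in range(round(duration / dt)):
        v = k * w + disturbance              # actual lateral drift rate
        perceived = (v if mode == "rate" else x) + bias
        error = 0.0 - perceived              # reference is zero in both modes
        w += (gain * error - w) * dt / tau   # output lags toward gain * error
        x += v * dt                          # position integrates the drift
    return x


if __name__ == "__main__":
    for mode in ("rate", "position"):
        wind = simulate(mode, disturbance=1.0)
        biased = simulate(mode, bias=0.05)
        print(f"{mode:8s} crosswind: {wind:7.3f} m   biased perception: {biased:7.3f} m")
```

Lengthening duration makes the rate model's final position grow in
proportion, whereas the position model's final offset does not change.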

The amount of
wheel-angle required to prevent further drift will vary continuously with
the disturbance, but what the driver extracts from his or her experience at
the wheel is how turning the wheel affects the drift, independent of
disturbances.

That's the rate model, which will not work. What keeps the car on the road
are the _proportional_ relationships (or integrals). It's a problem of
cumulative error. For the rate model to work, all amounts and directions of
changes in rate would have to cancel out exactly for an indefinite period
of time. I'm sure you will see that this is impossible if you think about
it a bit.

There is a criterion for success (the car goes where one
wants it to go, or at least the error is altered in the right direction) and
a particular relationship between what to do and what happens as a result
that can be learned on this basis.

Sorry, not true. You can be driving the car straight down the road, keeping
exactly in the center of your lane, with the steering wheel steadily cocked
at easily noticeable angles to the right or left, depending on the
crosswind or the tilt of the road or the flatness of a tire -- or any
combination. There is no particular angle of the steering wheel that will
produce successful steering. Yet for every wind and road condition, there
is ONLY ONE angle of the wheel that will keep the car on the road, at a
given speed. I've already explained why thinking in terms of _changes_ in
wheel angle and position will not work. I suggest simulating some models in
which only rates of change are involved. It will help greatly in
understanding what I'm saying here to investigate what really happens if
you try to handle everything in terms of changes -- especially in the
presence of sustained disturbances.

As for the sort of relationships studied in the operant chamber, natural
examples abound. Earlier I mentioned the association the lion may discover
between the watering hole and prey. If the lion is looking for prey, then
finding it at the watering hole, the lion is encouraged to make more visits
there in the future. With further experience, the lion is likely to learn
which visitation times are more likely to pay off, and begin to visit more
frequently at those times than at others. Having set a reference for
visiting the watering hole, the lion will of course deal with any
disturbances (within its power to compensate for) such as irregularities in
the terrain.

But there are other irregularities it must cope with. If you ask how the
lion gets to the watering hole, you find that each time it does so it must
move in a different direction and for a different distance -- slightly or
greatly different. Clearly "moving toward the hole" can't be reinforced,
because there is no particular movement that will get the lion to the hole
on all occasions. You might be able to reinforce "wanting to go to the
water hole," but that won't get the lion there. When it gets near the hole
it can home in on it by a control process, and ditto for approaching the
prey, which can be anywhere in the vicinity of the hole. But again, there's
no particular behavioral act that can be reinforced because a different act
will be required each time, depending on what the prey does.

I'm not arguing that nothing is learned here. At some level, I'm willing to
assume, the lion learns something _qualitative_: go to the watering hole
when the sky begins to get light. That is learned because doing so gets the
lion fed -- no disagreement about that. But "go to the watering hole" is
not sufficient to get the lion to the watering hole. The direction of
movement implied by that concept depends on where the lion is, relative to
the hole, and which way it is facing when the message comes through, and
those factors can be different every time.

Sooner or later we have to deal with the mystery inherent in the concept of
"the operant." Skinner postponed dealing with it, saying that SOMEHOW the
animal emits just the behavior needed to do what is required. But in a
physical universe, our explanations eventually have to explain that
"somehow." Skinner thought he only had to deal with the fact that different
movements of the animal could have the same effect, that of depressing the
bar. But the REAL problem is that in the wild, as with your lion at the
watering hole, the animal MUST DO SPECIFIC DIFFERENT BEHAVIORS IN ORDER TO
HAVE THE SAME EFFECT on different occasions.

You're now starting to try to explain how this can happen, without using
PCT. It can't be done. You immediately have to introduce rate control,
perception of and response to changes, in order to avoid bringing in a goal
or reference condition, thus making it seem that the environment is still
initiating the action.

But the concept of changes leaves the system unanchored to any particular
state -- any disturbance that moves the system at all will leave it in a
new state, with no way of getting back to the former state. The only way to
explain behavior that leads to a particular end-state time and time again
is to suppose that the end-state itself is intended -- represented as a
reference signal.

The chimpanzee learns that if it pushes a stick into an
opening in a termite mound and then withdraws it, the stick is likely to
have a few delicious termites clinging to it. When I'm hungry and at home,
I'm likely to check out the pantry or the refrigerator for something to eat.
The reason I look in those places is that I've often found something to eat
there in the past. When I go outside, I reach for the doorknob and turn it
and while holding the knob turned, pull on the knob; the door swings open
and I am able to step through. If the knob won't turn I have to go to plan
B, but I grab the knob and turn it and pull on it because in the past such
behavioral acts have resulted in an open door at a time when an open door is
what I wanted to perceive. I can't think of many occasions in which
disturbances have altered this relationship.

You carry your explanations only as far as the qualitative result, but it's
the quantitative result that has to be achieved. It does no good to pull on
the doorknob if you don't pull hard enough to swing the door open.
Qualitatively speaking, a 1-gram pull and a 2000-gram pull both amount to
"pulling on the doorknob" -- but one works and the other doesn't. When you
only speak of the _kind_ of behavior involved, you haven't said enough. We
have to explain not only the kind, but the amount and the direction.

But I don't have any real quarrel with these examples; something is
learned, and it has to do with meeting basic physiological requirements of
the organism. Where I differ with you is in the explanation of how it is
that a basic physiological requirement can lead to learning when it's not
satisfied.

In your scheme, something about the reward (whatever is needed to reduce
the error relative to the basic physiological requirement) makes it more
likely for the behavior that produced the reward to occur again. I'm trying
to tell you to pay attention to what you already know from your years in
PCT: that is not the way to produce the same reward again! Of course
sometimes it will work, if the environment is free of disturbances. But,
outside the laboratory, how often does that happen? The normal case is that
in order to produce another reward, the animal must do something
_different_ from what it did the first time -- and different by the right
amount and in the right direction.

Clearly, reinforcement will not work if what is required is a _different_
action rather than the _same_ action. The only way to handle that, in
reinforcement theory, is to rely on discriminative stimuli, so that for
every disturbance, there is a discriminative stimulus to signal that this
time a specific different action must occur -- an action that is different
by just the amount and direction that will quantitatively cancel the effect
of the disturbance. "Stimulus generalization" can help by filling in the
gaps between specific discriminative stimuli, and "satiation" can keep the
process from running away with itself, but these are just patches to fix
flaws in the basic explanation. The patches are needed because the
explanation is not right for the kind of system that actually exists.

In PCT, we explain what happens by saying that it is not a specific act or
action that is learned, but a whole closed-loop control system that is
acquired. The organization of the control system is such that a deviation
of a perception from a reference signal will activate behavior in
proportion to the amount of error and in a direction corresponding to the
sign of the error. As this system becomes organized through random
variations of its parameters (or in any other way you'd like to suggest),
control of the perceptual signal gradually develops, so that for _any_
disturbance or _no_ disturbance, the system will act to bring the
perception (and what it represents) toward a specific preselected
goal-state. What is learned is a relationship between action and error, not
between specific actions and specific stimuli (although that kind of
relationship can itself be perceived and become a controlled variable).
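
A minimal sketch of such a loop follows, as an illustration rather than
anyone's published model. It assumes the simplest possible environment, in
which the controlled result is the action plus a disturbance, and a
leaky-integrator output; the names settle, gain and slowing and all the
numbers are hypothetical. It compares the closed loop with simply repeating
the act that worked when there was no disturbance.

```python
def settle(disturbance, reference=10.0, gain=100.0, slowing=0.01, steps=2000):
    """Run the loop to steady state; return (action, controlled result)."""
    action = 0.0
    for _ in range(steps):
        perception = action + disturbance            # result = act + disturbance
        error = reference - perception               # signed deviation from goal
        action += slowing * (gain * error - action)  # leaky-integrator output
    return action, action + disturbance


if __name__ == "__main__":
    fixed_act, _ = settle(0.0)  # the act that "worked" with no disturbance
    for d in (-5.0, 0.0, 5.0):
        act, result = settle(d)
        print(f"disturbance {d:+.1f}: closed-loop act {act:6.2f} -> result {result:5.2f}; "
              f"repeated fixed act -> result {fixed_act + d:5.2f}")
```

In this sketch the closed loop produces a different act for each
disturbance and nearly the same result every time, while the repeated act
produces a different result every time.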

I'm not advocating a traditional reinforcement view, which unfortunately
does not distinguish between two distinct mechanisms -- reorganization and
control (or perhaps I should say that it employs one mechanism to accomplish
both jobs, when two mechanisms are required). What I'm advocating is a
selection mechanism for building new control systems, one that is much more
efficient than random reorganization, when conditions are such that it can
work.

That's fine with me -- presumably, at some level what becomes organized are
systems for doing systematically what the reorganizing system can only
accomplish by random changes.

I'm certainly not talking about any mechanism that "makes the same
causal act occur again" -- the animal learns to perform the act as a means
of obtaining the consequent event; it is not driven to produce those acts
willy-nilly.

But here is the same problem again. You're assuming an environment that is
regular enough that performing the same act will produce the same
consequent event. I'm talking about an environment -- the real one outside
the lab -- in which the only way to produce the same consequent event is to
perform a _different_ act. If you perform the same act in this environment,
a different consequent event will happen, because of disturbances.

The reason you don't believe in this variable environment of mine is that
you always speak qualitatively about behavior and consequences. I reach for
a cup; I pick up the cup; I drink the water. Those are all qualitative
statements. But if reaching for the cup were strictly open-loop, it would
take only a few ounces of force, perhaps from stretching my clothing, to
make my hand miss the cup; only a bit of skin dryness for the cup to slip
out of my hand; only a slight misorientation of my head for the water to
dribble out of the corner of my mouth. The real environment is teeming with
small disturbances which would be enough to throw behavior completely off
target if there were no feedback control working all the time. It's ONLY
this feedback control that enables us to accomplish the same end twice in a
row. And because the control is so good, we don't even notice the
disturbances: they have no noticeable effects!

Of course they do have noticeable effects, but you have to be sharp to
notice them.

I am very skeptical of this proposal. I think that if the system had to
wait until some variable were brought near to an intrinsic reference level,
the animal would rarely if ever learn what to do.

Think quantitatively, please. It's not as if changes occur at random until
by luck they bring some intrinsic variable to its reference level, stopping
the changes. I'm specifically proposing the E. coli method, in which the
intrinsic error controls the rate of reorganization, and selects strongly
for consequences that bring the intrinsic variable closer to its reference
level. I'm willing to entertain other ideas, but so far this is the only
one I know of that would actually work.

The delays involved are
generally of sufficient length that the animal would already be reorganizing
into something else before the error were reduced sufficiently to halt the
reselection process.

You've seen the E. coli demo, I hope. I'm sure a skeptic could say the same
about it. But whether it works or not depends on what parameters you assume
for the reorganizing system. If you make the gain too high, exactly what
you describe will occur; you can model it. But why assume that the gain is
too high? Why not assume that it's just right?

On the other hand, if you make the rate of change of the parameters of the
control system depend on the size of the remaining intrinsic error, and let
the direction of parameter changes follow a random walk in hyperspace, you
can get this system to converge very efficiently to a final state. So why
not assume that design, instead of one that doesn't work?
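
One way to read this design is sketched below. It is an interpretation
under stated assumptions, not the E. coli demo itself: the intrinsic error
is stood in for by the distance between a parameter vector and an optimum
the system never sees directly, the step size is proportional to the
remaining error, and a new random direction is chosen (a "tumble") whenever
the error fails to decrease. The names reorganize, speed and target, and
all the numbers, are hypothetical.

```python
import math
import random


def intrinsic_error(params, target):
    """Stand-in for intrinsic error: distance from the (unknown) optimum."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(params, target)))


def random_direction(rng, n):
    """Unit vector pointing in a uniformly random direction in n dimensions."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


def reorganize(start, target, speed=0.05, steps=3000, seed=1):
    """Return (final parameters, error history) of an E. coli-style search."""
    rng = random.Random(seed)
    params = list(start)
    direction = random_direction(rng, len(params))
    error = intrinsic_error(params, target)
    history = [error]
    for _ in range(steps):
        # Rate of parameter change is proportional to the remaining error.
        params = [p + speed * error * c for p, c in zip(params, direction)]
        new_error = intrinsic_error(params, target)
        if new_error >= error:
            # Error did not shrink: tumble to a new random direction.
            direction = random_direction(rng, len(params))
        error = new_error
        history.append(error)
    return params, history


if __name__ == "__main__":
    target = [2.0, -1.0, 0.5]
    _, ok = reorganize([0.0, 0.0, 0.0], target)
    _, runaway = reorganize([0.0, 0.0, 0.0], target, speed=3.0, steps=50)
    print("moderate speed: start", ok[0], "end", ok[-1])
    print("excessive speed: start", runaway[0], "end", runaway[-1])
```

With the moderate speed the error shrinks toward zero even though no step
is ever aimed at the optimum. With speed set to 3.0 every step overshoots
and the error grows, which corresponds to the too-high-gain case described
above.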

Consider the hungry rat that bumps into the lever and
immediately receives a food pellet. (The rat has already been magazine
trained so that it approaches the food cup and feeds as soon as it hears the
feeder operate.) One pellet clearly is insufficient to bring the animal's
state of nutrition back near some intrinsic value, yet as I have observed,
sometimes this _single experience_ is sufficient to produce a quick return
to the lever and a second press, and with only a couple more such
experiences the animal's performance becomes organized into a sequence of
approach the lever, press/release, approach the cup, eat, approach the lever
. . . To my mind, random reorganization simply cannot account for this
dramatic change in behavior after the delivery of only a few pellets.

Again, that depends on the parameters of the model, and the detailed
design. You can make the intrinsic perceptual signal rate-sensitive, so
that each jolt of food causes a momentary large decrease in the error. If
there's not much left by way of needed changes, the next error might be
still smaller, and learning could be very rapid.

On the other hand, perhaps a systematic logical reasoning process, if
plausible in a rat, could be learned or inherited, and would reason as you
suggest. Higher systems can, after all, alter parameters in lower systems,
although we haven't talked much about that in PCT. I have nothing against a
more efficient process, although I'm suspicious about systematic
algorithms, considering how much of the environment is unpredictable
(evolution really couldn't be expected to equip a rat to deal with a
Skinner box).

But whatever the process, what it teaches the organism must at least work
properly. If the process teaches the animal to repeat the action that
produced the food the last time, this will not work in a real environment,
because there are always sufficient natural disturbances to assure that
repeating the same act will not have the same result. Yes, the organism
must learn to make the same result occur again. But the way to do that in a
real environment is to acquire a control system that monitors the
result-variable, compares it with a reference level, and lets the error
drive the action that affects the result-variable. Since disturbances are
always acting, we will, in general, see different acts occurring to keep
the result the same. But that's just what control systems are organized to
do automatically.

Of course in a special environment where the results are protected against
disturbances, we will see the same acts occurring each time, but that's not
indicative that the animal has learned to produce the same acts each time.
This appearance is an artefact of the special disturbance-free environment.

There is also the problem of how the reorganization system "knows" which
systems to reorganize and which to leave be. And why old organizations
often remain intact after they supposedly have been reorganized. But those
are other issues; for now I want to focus on this issue of efficiency.

I agree that these are problems, the more so in more complex organisms. One
possible answer is local reorganization. Another is some mechanism for
directing reorganization to areas of the brain where error signals are
abnormally large. But we must still provide a way for physiological errors
to cause changes in behavioral systems that have nothing directly to do
with them. Without that ability we couldn't explain why a hungry animal
learns to perform arbitrary acts which have no relationship to hunger
except that they happen to produce little pellets that turn out to be
nutritious.

But explaining how those little pellets make the brain systems in control
of the behavior that produces them more likely to act is even harder than
explaining how their lack might induce random changes of organization in
the right places. Reinforcement is harder to explain than reorganization.

Best,

Bill P.

Bill_Powers1 · June 26, 1999, 5:13pm

[From Bill Powers (990626.1107 MDT)]

Bruce Gregory (990626.1110 EDT)--

I too think we call on reorganization a little too readily. The problem of
persisting error seems best addressed by a model in which implementing a
plan (the lion decides to go to the waterhole to look for prey) replaces
reorganization. That is, the system does _not_ reorganize as long as
following a plan leads to error reduction.

Exactly what I think. However, we have to account for the ability of a lion
to perceive and carry out a plan. I can see inheriting the machinery that
is capable of the sort of programmed activity we call planning, but I am
reluctant to say that specific plans are inherited -- after all, plans have
to be formed around the properties of the present-time world. I think that
plans arise through reorganization, and that once they have been acquired
their operation prevents the errors that would produce further
reorganization (as long as they go on working properly).

To go to the other extreme and say that plans are all inherited would be to
call on evolution a little too readily, wouldn't it? And calling on
reinforcement a little too readily brings on even more problems.

Best,

Bill P.

Abbott_Bruce · June 26, 1999, 7:56pm

[From Bruce Abbott (990626.1600 EST)]

Bill Powers (990626.0819 MDT) --

I see from the address I'm replying to that we're back on CSGnet. Probably
a good idea.

It's a surprise to me, too. Force of habit. But now that we're here . . .

Bruce Abbott (990626.0800 EST)--

While it's true that these relationships among rates of change can be seen,
controlling rate of change will not result in long-term control of
position. If you try to analyse this in terms of rates of change, there
will always be some minimum detectable rate of change (because of system
changes and noise), and below that rate, the variables will simply drift
without limit. Furthermore, even without noise and changes in parameters,
control is never perfect; when there's a sustained disturbance, you can
counteract perhaps 99.9% of the effect, but that still leaves 0.1% to cause
a continual position drift. When the disturbance finally is removed, the
drift ceases, but there is an accumulated positional error which a rate
control system can't correct.

Yep. You probably need position, rate and integral control together to do
the job well. But the point of the illustration was to show that certain
regular relationships exist which probably remain detectable even under
continuous disturbance to the CV. The novice driver first learns (under
conditions in which disturbance effects tend to be minor) that holding the
wheel left or right turns the car left or right, respectively, with a
turn-radius proportional to the deviation of the wheel from center position.
The person learns to turn the wheel more if the radius of turn is too wide
and less if the radius is too sharp. Disturbances change the relationship
between wheel angle and turn radius, but they do not change the basic
relationship between turning the wheel more or less and producing more or
less turn radius. The person learns to turn the wheel until the desired
state is achieved, correcting for drift and bringing the car back to the
center of the lane. When the car is in motion, turning radius contributes
to rate of change in direction, jointly with the car's forward rate of motion.

Trial and error learning comes into play as the driver experiences the
effects of too much or too little correction and varies these until good
control is achieved. It's an active calibration process, like zeroing a gun
in on a target, with feedback driving the calibration changes.

As for the sort of relationships studied in the operant chamber, natural
examples abound. Earlier I mentioned the association the lion may discover
between the watering hole and prey. If the lion is looking for prey, then
finding it at the watering hole, the lion is encouraged to make more visits
there in the future. With further experience, the lion is likely to learn
which visitation times are more likely to pay off, and begin to visit more
frequently at those times than at others. Having set a reference for
visiting the watering hole, the lion will of course deal with any
disturbances (within its power to compensate for) such as irregularities in
the terrain.

But there are other irregularities it must cope with. If you ask how the
lion gets to the watering hole, you find that each time it does so it must
move in a different direction and for a different distance -- slightly or
greatly different. Clearly "moving toward the hole" can't be reinforced,
because there is no particular movement that will get the lion to the hole
on all occasions.

Clearly "moving toward the hole" _can_ be selected, because the lion already
knows how to move toward the hole (compensating for any disturbances as
needed); what it is learning is where to go (and probably more, such as when
to go, and how to approach the watering hole without alerting the prey).

You might be able to reinforce "wanting to go to the
water hole," but that won't get the lion there. When it gets near the hole
it can home in on it by a control process, and ditto for approaching the
prey, which can be anywhere in the vicinity of the hole. But again, there's
no particular behavioral act that can be reinforced because a different act
will be required each time, depending on what the prey does.

I am assuming that the lion already possesses a repertoire of behavioral
acts that it can call upon to produce perceptions like going to the watering
hole, being at the watering hole, closing in on the prey, and so on.
Certain of those acts succeed in bringing about a desired state of affairs;
when the lion sets a reference for those affairs in the future, then by
virtue of that experience the lion will select those acts as the means of
achieving that desired state, conditions permitting. It can even learn to
select different means to suit different circumstances.

I'm not arguing that nothing is learned here. At some level, I'm willing to
assume, the lion learns something _qualitative_: go to the watering hole
when the sky begins to get light. That is learned because doing so gets the
lion fed -- no disagreement about that.

I would only add that it could learn something quantitative as well -- like
how fast to approach when attempting to sneak up to within pouncing distance
of the prey. There's nothing here to restrict learning to qualitative
relationships.

But "go to the watering hole" is
not sufficient to get the lion to the watering hole.

It is if the relevant control systems already exist. Once the reference is
set, the system will begin to carry out the actions required to make it so.

The direction of
movement implied by that concept depends on where the lion is, relative to
the hole, and which way it is facing when the message comes through, and
those factors can be different every time.

Right. That's why I'm assuming that certain control capabilities already
exist. What that food delivery teaches the rat is not how to contract
certain muscles, or how to move toward the lever, or how to rear, but that
food follows doing certain things.

Sooner or later we have to deal with the mystery inherent in the concept of
"the operant." Skinner postponed dealing with it, saying that SOMEHOW the
animal emits just the behavior needed to do what is required. But in a
physical universe, our explanations eventually have to explain that
"somehow." Skinner thought he only had to deal with the fact that different
movements of the animal could have the same effect, that of depressing the
bar. But the REAL problem is that in the wild, as with your lion at the
watering hole, the animal MUST DO SPECIFIC DIFFERENT BEHAVIORS IN ORDER TO
HAVE THE SAME EFFECT on different occasions.

I would say sooner, because that's what my proposal is all about. How does
the lion come to use approach to the watering hole as a means of finding
prey? Approach to the watering hole is a controlled result of action, and
it is employed here because in the past its use has been associated with
finding prey. Because approach to the watering hole is already a controlled
behavioral act, it is rather insensitive to the starting position of the
animal and disturbances encountered along the way (these are countered by
the system's actions so that in most cases approach to the watering hole
continues). The same behavioral act is both a controlled _consequence_ of
actions _and_ a (relatively disturbance-protected) _means_ of controlling
another perception (finding prey).

You're now starting to try to explain how this can happen, without using
PCT. It can't be done. You immediately have to introduce rate control,
perception of and response to changes, in order to avoid bringing in a goal
or reference condition, thus making it seem that the environment is still
initiating the action.

Now I'm starting to feel some frustration, because (a) I _am_ "trying to use
PCT" -- what do you think all my talk about behavioral acts is all about?
and (b) nowhere have I made the claim that the environment is initiating any
action. I am not defending traditional reinforcement theory. This is a
different proposal.

The only way to
explain behavior that leads to a particular end-state time and time again
is to suppose that the end-state itself is intended -- represented as a
reference signal.

Yep. And how does setting this particular reference in this particular
control system to this particular value get selected out of all the
possibilities as the means toward attaining the food? An associative
process accomplishes that. It gets selected because, in the experience of
the animal, setting that reference as a means has led to the control (or
improved control) of another variable the animal is attempting to bring
under control. That's my proposal in a nutshell.

You carry your explanations only as far as the qualitative result, but it's
the quantitative result that has to be achieved. It does no good to pull on
the doorknob if you don't pull hard enough to swing the door open.
Qualitatively speaking, a 1-gram pull and a 2000-gram pull both amount to
"pulling on the doorknob" -- but one works and the other doesn't. When you
only speak of the _kind_ of behavior involved, you haven't said enough. We
have to explain not only the kind, but the amount and the direction.

There is no reason why the same argument cannot apply quantitatively. If
turning with more force results in successful opening of the door when
turning with less force failed, then turning harder gets selected as a means
when turning with the usually effective force fails. In fact, if you didn't
think of doing it yourself, I could shape such behavior by an appropriately
designed schedule of approximations.

In your scheme, something about the reward (whatever is needed to reduce
the error relative to the basic physiological requirement) makes it more
likely for the behavior that produced the reward to occur again. I'm trying
to tell you to pay attention to what you already know from your years in
PCT: that is not the way to produce the same reward again! Of course
sometimes it will work, if the environment is free of disturbances. But,
outside the laboratory, how often does that happen? The normal case is that
in order to produce another reward, the animal must do something
_different_ from what it did the first time -- and different by the right
amount and in the right direction.

If, as I proposed, what is selected is a behavioral act (which itself is the
result of a control process and is thus buffered from most disturbances),
then your objection is eliminated.

Bruce

Bruce_Gregory8 · June 26, 1999, 8:39pm

[From Bruce Gregory (990626.1640 EDT)]

Bill Powers (990626.1107 MDT)

Exactly what I think. However, we have to account for the ability of a lion
to perceive and carry out a plan. I can see inheriting the machinery that
is capable of the sort of programmed activity we call planning, but I am
reluctant to say that specific plans are inherited -- after all, plans have
to be formed around the properties of the present-time world.

Agreed. There is probably not enough capacity in DNA to encode the kind of
information required.

I think that
plans arise through reorganization, and that once they have been acquired
their operation prevents the errors that would produce further
reorganization (as long as they go on working properly).

We have many examples of plans that are not acquired through reorganization,
however. Imagination and memory, at least for human beings, seem to provide
reference signals for many of our plans.

To go to the other extreme and say that plans are all inherited would be to
call on evolution a little too readily, wouldn't it?

Definitely.

And calling on
reinforcement a little too readily brings on even more problems.

Like reorganization, reinforcement seems to me greatly overused. I agree
with you that "reinforcers" are _not_ accompanied by an increase in the
frequency of behavior. The opposite is more often the case. The term seems
worse than useless. It _seems_ to explain something when actually it does
nothing of the sort. Reinforcers are only reinforcers because they allow
internal reference levels to be matched by perceptions.

Bruce Gregory

Bill_Powers1 · June 26, 1999, 11:24pm

[From Bill Powers (990626.1659 MDT)]

Bruce Abbott (990626.1600 EST) --

To skip to the nub of the argument:

If, as I proposed, what is selected is a behavioral act (which itself is the
result of a control process and is thus buffered from most disturbances),
then your objection is eliminated.

No it's not, because the connection between the behavioral act and the
consequences of producing it (however well controlled the act is) is _also_
subject to disturbance and is therefore variable. The behavioral act, if it
is to be usable for controlling some other variable, must be carried out
relative to a variable reference level, and the reference level must be
varied so as to produce _different degrees and directions_ of the
behavioral act (which can turn it into a different class of act) if the
consequence is to be maintained the same.

What you're getting into here, Bruce, is simply hierarchical control. The
higher system, as a means of keeping its own perception under control,
varies the reference signals of some set of lower systems, each of which is
a control system controlling some behavioral act in parallel with all the
others. The consequences of these behavioral acts add up to a state of the
higher-level controlled perception (or many perceptions) which is or are
the overriding object of control. No higher control system can work by
simply specifying a single fixed reference level for the lower-level
behavioral acts, because the environment, in general, is changeable and
requires different behavioral acts to be generated if the same higher-level
result is to be preserved. And that higher-level result in turn must in
general also be controllable relative to a variable reference level,
because it has even higher-level consequences which are also under control,
and are also subject to disturbance. As we go to higher levels, the
time-scale stretches out, and disturbances capable of getting past the
lower levels of control become less likely. But there is no level where you
can just "set it and forget it."
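The point about a fixed lower-level reference can be illustrated with a minimal two-level sketch. Everything in it is an assumption made for illustration: the lower system controls an "act" q1 against a disturbance d1, the consequence q2 is the act plus a separate disturbance d2, and both systems use simple integrating outputs with the made-up gains k1 and k2.

```python
# Sketch: a fixed lower-level reference versus a reference that a higher
# system keeps adjusting. Names, gains and disturbances are illustrative.

def run_two_levels(adjust_lower_reference, r2=10.0, d1=2.0, d2_late=3.0,
                   k1=50.0, k2=5.0, dt=0.01, n_steps=2000):
    """Return (q1, q2, r1) at the end of the run.

    q1 is the controlled behavioral act, q2 is its consequence, and r1 is
    the reference the lower system is asked to match.
    """
    o1 = 0.0    # output of the lower system
    r1 = r2     # lower reference, starting at the "set it and forget it" value
    q1 = q2 = 0.0
    for step in range(n_steps):
        # A disturbance to the consequence appears halfway through the run.
        d2 = d2_late if step >= n_steps // 2 else 0.0
        q1 = o1 + d1            # the act, disturbed by d1
        q2 = q1 + d2            # the consequence, disturbed independently
        if adjust_lower_reference:
            # Higher system: vary r1 to keep q2 at its own reference r2.
            r1 += k2 * (r2 - q2) * dt
        # Lower system: vary its output to keep q1 at r1.
        o1 += k1 * (r1 - q1) * dt
    return q1, q2, r1

fixed = run_two_levels(adjust_lower_reference=False)
layered = run_two_levels(adjust_lower_reference=True)
print("fixed reference:    q1=%.3f q2=%.3f r1=%.3f" % fixed)
print("adjusted reference: q1=%.3f q2=%.3f r1=%.3f" % layered)
```

In this toy case the act is controlled equally well in both runs. Only the run in which the higher system keeps changing r1 holds the consequence q2 at its reference once d2 appears.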

The beauty of a control hierarchy is that as it becomes more complete, it
can adjust behavioral acts at many levels quite automatically, keeping the
highest levels of variables under control by varying the lower behavioral
acts precisely as needed to counteract disturbances from the concrete to
the abstract. Couple this hierarchy to a theory of reorganization that
links it to the basic life support systems, and you have a glimmer of the
workings of an enormously capable adaptive organism in very effective
control of what happens to it.

Best,

Bill P.

Bill_Powers1 · June 27, 1999, 9:42am

[From Bill Powers (990627.0136 MDT)]

Bruce Abbott (990626.1600 EST)--

Yep. You probably need position, rate and integral control together to do
the job well.

That's not quite what I meant. I meant that with only rate control, you
can't do it at all. But more than that, the task I was referring to was
that of the modeler, not that of the learner. To explain behavior we need a
model with proportional and/or integral control because rate control alone
can't explain what we see. Proportional and integral control anchor the
controlled quantity to a specific state.
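A minimal numerical sketch of this point follows, under assumed conditions: the environment is taken to be x' = u + d, a sustained disturbance of size 1 lasts ten seconds, and the loop gain of 999 is chosen only so that rate control removes 99.9% of the disturbance's effect. None of these numbers come from a fitted model.

```python
# Sketch: rate-only control versus proportional position control under a
# sustained disturbance. All names and numbers are illustrative assumptions.

def simulate(mode, gain=999.0, dt=0.001, on_step=1000, off_step=11000,
             n_steps=20000, d_mag=1.0):
    """Return the position error left after a disturbance comes and goes.

    The controller's output u and the disturbance d both act as velocities
    on the position x, whose reference value is zero.
    """
    x = 0.0
    for step in range(n_steps):
        d = d_mag if on_step <= step < off_step else 0.0
        if mode == "rate":
            # Only velocity v is perceived and opposed: u = -gain * v with
            # v = u + d, which gives v = d / (1 + gain).
            v = d / (1.0 + gain)
        elif mode == "position":
            # Position is perceived and the position error is opposed.
            u = gain * (0.0 - x)
            v = u + d
        else:
            raise ValueError("mode must be 'rate' or 'position'")
        x += v * dt
    return x

rate_residue = simulate("rate")
position_residue = simulate("position")
print("rate-only control, leftover position error:", rate_residue)
print("position control, leftover position error:", position_residue)
```

With these settings the rate-only system ends about 0.01 units off position and stays there after the disturbance is gone, while the position system returns to its reference.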

You say:

But the point of the illustration was to show that certain
regular relationships exist which probably remain detectable even under
continuous disturbance to the CV. The novice driver first learns (under
conditions in which disturbance effects tend to be minor) that holding the
wheel left or right turns the car left or right, respectively, with a
turn-radius proportional to the deviation of the wheel from center position.
The person learns to turn the wheel more if the radius of turn is too wide
and less if the radius is too sharp. Disturbances change the relationship
between wheel angle and turn radius, but they do not change the basic
relationship between turning the wheel more or less and producing more or
less turn radius. The person learns to turn the wheel until the desired
state is achieved, correcting for drift and bringing the car back to the
center of the lane. When the car is in motion, turning radius contributes
to rate of change in direction, jointly with the car's forward rate of
motion.

I doubt rather strenuously that this intellectual account describes how
novice drivers learn to drive. It sounds more like an after-the-fact,
this-is-what-must-have-happened account. For one thing, I doubt that your
typical 15-year-old student driver is thinking in terms of radii of
curvature and the law of centrifugal force (proportional to v^2/r). You
state a true fact when you say that turning radius and car motion
contribute to rate of change in direction, but there is no reason to think
that this fact is taken into account during learning. And anyway, turning
the wheel (meaning creating a rate of turn of the wheel) only creates a
rate of change of the sideward force exerted on the car because of the
angle of attack of the front tires relative to the direction of travel.
That force adds to other forces acting on the car such as its own inertia,
the effects of wind and road tilt interacting with gravity, and so on. It
is the _net_ force that accelerates the car sideways, and that is what
determines the radius of curvature. At a constant angle of the steering
wheel, the radius of curvature can change quite substantially during the
turn. As I pointed out last time, it can change so much that it requires
reversing the direction of turn of the steering wheel just to keep the car
on the road.

In order to learn to drive, the novice driver must add and alter neural
connections in his or her brain, but speaking for myself I have to admit
that I don't know how to do that. Do you? What I did do was passionately
want to drive a car, a process which became less mysterious as I acquired
knowledge about the controls of a car and the behavior of the car when
someone else was driving. But I only became aware of that knowledge after I
had learned it. My knowing it didn't help me know it for the first time.

The process of learning itself was then and remains now outside my
conscious experience. I can expose myself (or others) to conditions under
which learning might take place, but if it doesn't I can't make it happen
(so much for "teaching"). If the process of learning could be directed by
my consciousness, I wouldn't need to speculate about models of learning;
I'd already know how it works. But I don't. A model is necessary.

Trial and error learning comes into play as the driver experiences the
effects of too much or too little correction and varies these until good
control is achieved. It's an active calibration process, like zeroing a gun
in on a target, with feedback driving the calibration changes.

Well, trial and error learning is called "reorganization" in PCT, isn't it?
It's driven by some goal relating to a perception (not necessarily
conscious) that is affected by behavior, and random adjustments are made in
the control process until the perception is reliably maintained at or
brought to the reference condition. Random changes are required because at
first there is no understanding of what is needed to achieve control: the
goal exists, but the selection of means is missing. And as experience is
accumulated, even the goal may change, because when we see what happens
when the original goal is met, higher-level systems may well experience
errors -- woops, that's not what I wanted to do.

If a novice driver starts out with the idea that steering is just a matter
of "turning" the steering wheel, experience will soon show that this
doesn't work: the car will wander from side to side because the "turns"
always come too late.

But the novice driver will learn to drive even with an incorrect
understanding and verbal description of what is going on. Reorganization
will not stop until the car behaves in a way that doesn't scare the driver.
If rate control is required, it will come into existence; likewise for
proportional and integral control. Afterward, the driver may pontificate
about what a driver has to learn in order to be really good at it, but
that's only theory (and bragging). It has nothing to do with the changes
that actually took place in the driver, and the description may well not
fit what was actually learned. Theorizing well requires more than verbal
description.

I am assuming that the lion already possesses a repertoire of behavioral
acts that it can call upon to produce perceptions like going to the watering
hole, being at the watering hole, closing in on the prey, and so on.
Certain of those acts succeed in bringing about a desired state of affairs;
when the lion sets a reference for those affairs in the future, then by
virtue of that experience the lion will select those acts as the means of
achieving that desired state, conditions permitting. It can even learn to
select different means to suit different circumstances.

You're still not willing to give up the idea that some things can affect
others without any disturbances being present. How fast should a lion sneak
up on a prey that is looking toward it? Looking to one side? Facing away?
Downwind? Upwind? Adult? Infantile? Moving? Stationary? The list of
variable factors is endless, and by your story all of these factors would
have to be taken into account by the lion to allow it to "emit" the right
speed of creeping up on its prey. And each factor is a continuous variable:
the looking direction, age, wind direction, state of motion, and everything
else can change imperceptibly from one state to another, so the list of
discriminative stimuli and the degrees of each action can't be determined
in advance. The ONLY plausible solution is continuous feedback control. The
environment is just too variable to permit learning ANY single way of
acting as the "best" way. Your account of what the lion learns creates an
endless proliferation of "circumstances" and helpful discriminative
stimuli, but why bother with that? A control-system account does away with
the need for this almost completely; all the lion has to do is creep slowly
enough that the prey shows no signs of being alerted. That's the controlled
variable that takes the place of ALL the discriminations and learned rates
of creeping, and it adjusts behavior perfectly automatically without any
reliance on previous experiences.
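As a rough illustration of the control-system account, the sketch below lets a single loop set creeping speed by holding a perceived "alertness" signal at a low reference. The model is an assumption made for the example: alertness is taken to be speed times a sensitivity factor that lumps together wind, looking direction and the rest, and the reference is a small tolerated level rather than zero so that the lion still moves.

```python
# Sketch: one feedback loop replaces a table of learned creeping speeds.
# The alertness model, gain and reference are illustrative assumptions.

def creep_speed(sensitivity, reference=0.2, gain=2.0, dt=0.05, n_steps=1000):
    """Return (speed, alertness) after the loop has settled."""
    speed = 0.0
    alertness = 0.0
    for _ in range(n_steps):
        # Perception: how alerted the prey appears at the current speed.
        alertness = sensitivity * speed
        # Output: speed up if the prey seems calm, slow down if it does not.
        speed += gain * (reference - alertness) * dt
    return speed, alertness

for s in (0.5, 1.0, 4.0):
    speed, alertness = creep_speed(s)
    print("sensitivity %.1f -> speed %.3f, alertness %.3f" % (s, speed, alertness))
```

The loop is never told the sensitivity. A different speed comes out for each condition because the same perception is being held at the same reference.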

In reorganization theory, changes do not happen because certain results
occur. It's just the opposite: changes continue to happen until some
specific result is achieved. It is the lack of the result that creates
change, and the achievement of the result (meaning, acquisition of the
control system that reliably produces that result) that terminates change.

Under reinforcement theory, specific behaviors are selected from a stream
of randomly emitted behaviors by a positive effect of the occurrence of a
critical result, like the appearance of a bit of food. This positive effect
is called reinforcement or reward, and it's supposed to work by increasing
the probability that the same act will be emitted under the same
circumstances in the future.

But reorganization theory requires no such positive effect. It is an error
signal that starts changes occurring, and correction of the error that
makes the changes stop, so the organization in force at the time it stops
is retained.
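The logic of the last two paragraphs can be sketched in a few lines. This is a toy under stated assumptions, not the reorganizing system itself: a single parameter w stands in for "the organization", the intrinsic variable is arbitrarily taken to be 2*w, and making the size of each random change proportional to the error is a choice made here for the example.

```python
# Sketch: error starts random change, and loss of error stops it.
# No result is ever "rewarded"; whatever w is in force when the error
# disappears is simply retained. All names and values are hypothetical.
import random

REFERENCE = 6.0  # hypothetical reference level of an intrinsic variable

def intrinsic_error(w):
    """Toy stand-in: the intrinsic variable is 2*w; return its shortfall."""
    return abs(REFERENCE - 2.0 * w)

def reorganize(w, rate=0.25, tolerance=0.01, max_steps=10000, seed=1):
    """Randomly alter w for as long as the intrinsic error persists."""
    rng = random.Random(seed)
    steps = 0
    while intrinsic_error(w) > tolerance and steps < max_steps:
        # A change in a random direction, sized by the current error.
        w += rng.choice((-1.0, 1.0)) * rate * intrinsic_error(w)
        steps += 1
    return w, steps

w_final, steps_taken = reorganize(-5.0)
print("changes stopped after", steps_taken, "steps with w =", w_final)

# Once the error is gone, nothing further happens to the organization.
w_again, more_steps = reorganize(w_final)
print("further changes:", more_steps)
```

Individual changes are as likely to make things worse as better. The process ends only because change stops when the error does.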

The relationship of reorganization theory to reinforcement theory is much
like the relationship of Newton's laws of motion to older theories of
motion. The older theories assumed that motion had to be caused by
something called "impetus": flying objects travelled until their "impetus"
ran out, then fell to the ground. Newton realized that it was not what
sustained the motion that had to be explained, but what _changed_ the
motion, so motion would simply continue unchanged until a force was applied
to change it.

Reorganization theory is based on the idea that we must explain what
_changes_ the organization of a behaving system, not what makes it stay the
same. The natural state of the system's organization is to continue as it
is, just as the natural state of matter is to continue in its present state
of motion. If nothing special happens, the organism will continue to act
the same way in the same circumstances. Only when something acts to change
that organization will we see anything new in behavior.

Nothing special is required to make behavior continue in its present form.
What requires explanation is what causes changes in behavioral
organization. In reorganization theory, the mechanism that changes
organization is driven by departures of basic life support variables from
their genetically-specified reference states. This gives us a physical
explanation for how a lack of food, for example, can result in a bird's
learning to walk in a figure-eight pattern -- things which have nothing to
do with each other in physical terms.

If you admit to what I say here, you simply have to give up on
reinforcement as a viable concept. I know that and you know that. It's not
a question of whether you will give up on it, but only when. You are
already beginning to rely on the existence of lower-order control systems
to take care of environmental variations without needing discriminative
stimuli and reinforcement to produce the right acts. Why not admit that
such things are not needed at the next level either, or the next, or
anywhere? It's control all the way up.

If, as I proposed, what is selected is a behavioral act (which itself is the
result of a control process and is thus buffered from most disturbances),
then your objection is eliminated.

And if the process of selecting behavioral acts is a control process, with
the selection varying as disturbances come and go, we will never need
anything but control processes to explain the whole thing.

Best,

Bill P.

Abbott_Bruce · June 28, 1999, 12:10pm

[From Bruce Abbott (990628.0815 EST)]

Bill Powers (990627.0136 MDT)

Bruce Abbott (990626.1600 EST)

I'm limited on time, but will say a few things now.

But the point of the illustration was to show that certain
regular relationships exist which probably remain detectable even under
continuous disturbance to the CV. The novice driver first learns (under
conditions in which disturbance effects tend to be minor) that holding the
wheel left or right turns the car left or right, respectively, with a
turn-radius proportional to the deviation of the wheel from center position.
The person learns to turn the wheel more if the radius of turn is too wide
and less if the radius is too sharp. Disturbances change the relationship
between wheel angle and turn radius, but they do not change the basic
relationship between turning the wheel more or less and producing more or
less turn radius. The person learns to turn the wheel until the desired
state is achieved, correcting for drift and bringing the car back to the
center of the lane. When the car is in motion, turning radius contributes
to rate of change in direction, jointly with the car's forward rate of
motion.

I doubt rather strenuously that this intellectual account describes how
novice drivers learn to drive. It sounds more like an after-the-fact,
this-is-what-must-have-happened account.

Most analyses have that character.

For one thing, I doubt that your
typical 15-year-old student driver is thinking in terms of radii of
curvature and the law of centrifugal force (proportional to v^2/r).

Me, too. What I have in mind is more along the lines of the 15-year-old
recognizing that the turn becomes sharper as the wheel is cranked from the
straight-ahead position. I referred to "turning radius" to communicate what
the variable is, not how the 15-year-old would think of it. (In fact, the kid
may not think of it verbally at all.) What is learned is a relationship
between direction of turn and change of direction.

You
state a true fact when you say that turning radius and car motion
contribute to rate of change in direction, but there is no reason to think
that this fact is taken into account during learning.

On the contrary, there is every reason to believe that these relations are
learned, although most of them have already been learned by the average
15-year-old while learning how to walk, run, and drive a tricycle.

And anyway, turning
the wheel (meaning creating a rate of turn of the wheel) only creates a
rate of change of the sideward force exerted on the car because of the
angle of attack of the front tires relative to the direction of travel.
That force adds to other forces acting on the car such as its own inertia,
the effects of wind and road tilt interacting with gravity, and so on. It
is the _net_ force that accelerates the car sideways, and that is what
determines the radius of curvature. At a constant angle of the steering
wheel, the radius of curvature can change quite substantially during the
turn. As I pointed out last time, it can change so much that it requires
reversing the direction of turn of the steering wheel just to keep the car
on the road.

I doubt that these problems come into play as your 15-year-old navigates a
deserted parking lot, learning how changing the wheel-position alters the
direction and rate-of-turn of the car. Once that relationship is learned,
it remains a serviceable one even under the conditions you describe. I once
had the sort of experience you describe while driving on a 65 mph two-lane
highway in a driving rain. I could barely see the road and had to slow down
so much that my speed was not great enough for the banking of the curves, so
the car would start to drift down onto the inside of the curve. This could
only be corrected by turning the wheel right while going around left-hand
turns and left while going around right-hand turns. But -- and this is my
point -- I still had to turn the wheel to the right to correct a leftward
drift and left to correct a rightward drift. The position of the wheel
required to compensate for the drifting varies with the disturbance, but the
direction the wheel needs to be turned does not. The same association
learned in the parking lot still works on the road, disturbance or no
disturbance.

An exception is when the car encounters a slick surface. Then the basic
rule learned on the parking lot no longer applies. The usual result -- at
first, at least -- is that people lose control.

In order to learn to drive, the novice driver must add and alter neural
connections in his or her brain, but speaking for myself I have to admit
that I don't know how to do that. Do you?

No, it isn't required that we do know that. It isn't my position that we must.

What I did do was passionately
want to drive a car, a process which became less mysterious as I acquired
knowledge about the controls of a car and the behavior of the car when
someone else was driving. But I only became aware of that knowledge after I
had learned it. My knowing it didn't help me know it for the first time.

As noted above, you probably had some basic knowledge of what to do before
you ever tried to drive a car. But imagine a person that had no idea, other
than that the steering wheel was used to steer the car. The car starts
rolling forward and the person tries turning the wheel to the left. The car
starts turning left. The person changes the wheel angle. The rate of
turning changes. The wheel is turned to the right by degrees. The
amount of left-turning decreases and then the car starts to turn right.

What is being learned? The person is learning to associate turning the
wheel in given ways with certain predictable changes in the car's direction
of travel. This knowledge _could_ be expressed verbally, but doesn't have
to be.

So if you want the car to move to the left, you learn that turning the wheel
left works and turning it right doesn't. You begin to develop an
association between a certain act and a certain result, and in the future,
you are more likely to turn the wheel left when you want the car to move to
the left, and right when you want the car to move to the right. You repeat
the acts that bring about the desired result and eliminate those that do
not, or have contrary effects. In so doing, you establish control over the
relevant variables.

Trial and error learning comes into play as the driver experiences the
effects of too much or too little correction and varies these until good
control is achieved. It's an active calibration process, like zeroing a gun
in on a target, with feedback driving the calibration changes.

Well, trial and error learning is called "reorganization" in PCT, isn't it?

Both proposals are forms of trial and error learning.

It's driven by some goal relating to a perception (not necessarily
conscious)

I had not claimed that consciousness was necessary, either.

that is affected by behavior, and random adjustments are made in
the control process until the perception is reliably maintained at or
brought to the reference condition.

Random changes cannot be made to a control process that does not yet exist.
How does the driver's reorganization system know what to vary? This could be
very dangerous -- the driver wants to drive, and suddenly the gain in the
system controlling his left biceps muscle goes way up. Not good.

Random changes are required because at
first there is no understanding of what is needed to achieve control: the
goal exists, but the selection of means is missing. And as experience is
accumulated, even the goal may change, because when we see what happens
when the original goal is met, higher-level systems may well experience
errors -- woops, that's not what I wanted to do.

Random changes must be restricted to those that are most likely to result in
assertion of control, or the system will simply flail about, destroying
perfectly good control systems as it blindly varies this or that parameter,
without much chance that the desired system will come into existence. You
have provided no mechanism allowing the reorganizing system to "decide" what
to change, and what to leave alone.

The proposal I am offering is likewise based on trial and error, but the
"trial" consists not of random changes in the system, but of the production
of organized, controlled sequences of behavioral acts already in the
animal's repertoire, when an error exists in some variable that currently
either is not under the animal's control or is relatively poorly controlled.
These behavioral acts produce a stream of relatively reproducible
perceptions. When some of these are followed by a perception (consequence)
that the animal is trying to produce in order to bring a variable under
control, the animal associates the behaviorally-produced perceptions with
the consequence, and attempts to reproduce the perceptions by repeating the
behavioral act it associates with the consequence.

I've got to head off to work at this point, so I'll close here.

Bruce

Bruce_Gregory · June 28, 1999, 3:24pm

[From Bruce Gregory (990628.1123 EDT)]

Bruce Abbott (990628.0815 EST)

>Bill Powers (990627.0136 MDT)
>Random changes are required because at
>first there is no understanding of what is needed to achieve control: the
>goal exists, but the selection of means is missing. And as experience is
>accumulated, even the goal may change, because when we see what happens
>when the original goal is met, higher-level systems may well experience
>errors -- woops, that's not what I wanted to do.

Random changes must be restricted to those that are most likely to result in
assertion of control, or the system will simply flail about, destroying
perfectly good control systems as it blindly varies this or that parameter,
without much chance that the desired system will come into existence. You
have provided no mechanism allowing the reorganizing system to "decide" what
to change, and what to leave alone.

Blind variation and selective retention is certainly _a_ learning
mechanism, but I agree with you that it is unlikely to play a major role
in learning to control complex perceptions. When I discover that my
usual route home is blocked, I do not normally set off in a random
direction. Rather, I pick an alternative route that seems likely, on the
basis of what I already know, to get me where I want to go. Ditto when I
learn to bank a Cessna 150 using rudder pedals rather than ailerons. I
am able to use my legs to alter pressure on one rudder pedal or the
other--I learned to control my legs a long time ago. What I need to do
is to practice this procedure until it no longer requires my attention
to execute this particular form of control under circumstances when I
want to produce a shallow bank. Learning apparently involves discovering
which perception to control under which circumstance. A coach can speed
this process. Random reorganization seems likely to lead to disaster (at
least when learning to fly).

Bruce Gregory

Bill_Powers1 · June 28, 1999, 5:47pm

[From Bill Powers (990628.1047 MDT)]

Bruce Abbott (990628.0815 EST)--

I doubt rather strenuously that this intellectual account describes how
novice drivers learn to drive. It sounds more like an after-the-fact,
this-is-what-must-have-happened account.

Most analyses have that character.

Yes, but if you're proposing a model, you have to make at least some
attempt to say HOW these results that you're describing are brought about.
I completely agree that what you learn is to turn the wheel in the right
relationship to errors between the perceived and desired states of affairs,
making use of existing control systems at lower levels (although you didn't
quite put it that way). That's the OUTCOME of the learning process; that's
a description of the way the control system that results from learning will
work. But how do you propose that we get from not having this control
system to having it?

While the mechanism I propose isn't completely worked out for all
conditions, and indeed might not work under all conditions, it is at least
a start toward a mechanism with defined functions. I've tested this
mechanism under some circumstances and found what is necessary to make it
work properly.

For example, I've found that the parameter changes have to be small enough
during any one iteration of the process so the errors in the reorganizing
system can be sensed and another round of reorganization can be carried out
if necessary before the errors become large. Also, the changes have to take
place in a continuum, not by jumping randomly from one value to a
completely unrelated value. In effect, we can treat the momentary values of
all the parameters being altered as a point in n-dimensional hyperspace.
These parameters are all changing at a slow rate proportional to the total
error, so the point is moving in some direction in hyperspace. A
reorganizing event consists of randomly choosing not new _values_ of the
parameters, but new _rates of change_ of the values of the parameters, the
choice being selected randomly between a positive and a negative limit.
The effect is to randomly alter the direction of movement of the
n-dimensional point. If the result is to make the total error smaller, the
next reorganization is postponed by a time that can be adjusted as a
parameter of the reorganizing process. The same parameter determines how
much sooner the next reorganization occurs if the error gets worse. "The
error", of course, is some measure that is reduced as control improves,
though it doesn't have to be directly related to the error in the forming
control system. This faithfully reproduces the principle by which E. coli
navigates, which is why I call it the "E. coli" method of reorganization.
There are no doubt others (the Extended Kalman Filter method could probably
be classed as a principle of reorganization, although it uses a highly
complex systematic procedure which itself would have to be learned).
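The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used for the tests mentioned in this post, which is not shown in the thread. The names ecoli_reorganize, distance_to_target, rate, limit and patience are all hypothetical, the toy error measure is arbitrary, and the "postponement" rule is simplified to a countdown that is reset whenever the error falls.

```python
import math
import random


def ecoli_reorganize(error_fn, params, steps=5000, rate=0.05, limit=1.0,
                     patience=1, seed=0):
    """Hypothetical sketch of E. coli-style reorganization.

    error_fn -- maps a parameter list to a non-negative total error
    params   -- starting point in n-dimensional parameter space
    rate     -- drift speed per unit of error (assumed value)
    limit    -- bound on each randomly chosen rate of change
    patience -- worsening steps tolerated before the next "tumble"
    """
    rng = random.Random(seed)
    point = list(params)

    def tumble():
        # A reorganizing event picks new RATES OF CHANGE, not new values.
        return [rng.uniform(-limit, limit) for _ in point]

    velocity = tumble()
    prev_err = error_fn(point)
    countdown = patience
    for _ in range(steps):
        # Drift: every parameter moves at a rate proportional to total error.
        point = [p + rate * prev_err * v for p, v in zip(point, velocity)]
        err = error_fn(point)
        if err < prev_err:
            countdown = patience   # improving: postpone the next tumble
        else:
            countdown -= 1         # worsening: bring the next tumble closer
            if countdown <= 0:
                velocity = tumble()
                countdown = patience
        prev_err = err
    return point, prev_err


def distance_to_target(point, target=(2.0, -1.0, 0.5)):
    """Toy error measure: Euclidean distance from an arbitrary target."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(point, target)))


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    final, err = ecoli_reorganize(distance_to_target, start)
    print("start error:", distance_to_target(start))
    print("final error:", err)
```

Because the drift slows as the error shrinks, the point settles near the target instead of wandering off, even though no direction is ever chosen systematically. Raising patience lets the point keep drifting through a few bad steps before it tumbles.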

With this method, I've been able to solve 50 simultaneous equations in 50
unknowns strictly by "trial and error," in a reasonably short time. I've
also been able to make 50 control systems adapt to an environment in which
50 variables are both affected and sensed simultaneously by the 50 systems,
with the perceptual signals each being a different randomly-selected
function of the 50 external variables. So that's at least progress toward
checking out this mechanism of learning.
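As a rough illustration of solving simultaneous equations this way, the sketch below applies the same random-direction scheme to a small linear system. It uses 3 unknowns rather than 50 to keep the run short. The matrix A, the vector b, the function names and the step settings are all assumptions made for the example; none of them comes from the original program.

```python
import math
import random

# Assumed example system A x = b, whose exact solution is x = [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 5.0]]
b = [6.0, 10.0, 17.0]


def residual(x):
    """Total error: Euclidean length of A x - b."""
    rows = (sum(a * xi for a, xi in zip(row, x)) - bi
            for row, bi in zip(A, b))
    return math.sqrt(sum(r * r for r in rows))


def solve_by_trial_and_error(n=3, steps=20000, rate=0.01, seed=1):
    """Hypothetical E. coli-style solver: drift, and tumble when error rises."""
    rng = random.Random(seed)
    x = [0.0] * n
    velocity = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    prev_err = residual(x)
    for _ in range(steps):
        # Move all unknowns along the current direction, scaled by the error.
        x = [xi + rate * prev_err * v for xi, v in zip(x, velocity)]
        err = residual(x)
        if err >= prev_err:
            # No improvement: randomly pick new rates of change.
            velocity = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        prev_err = err
    return x, prev_err


if __name__ == "__main__":
    x, err = solve_by_trial_and_error()
    print("solution estimate:", [round(v, 4) for v in x])
    print("residual:", err)
```

The solver never inverts the matrix or computes a gradient. It only senses whether the total error rose or fell. A poorly conditioned system would probably need many more steps than this well-behaved example.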

My "artificial cerebellum" is also a learning algorithm, a systematic one
but one that I believe has evolved and so doesn't need to be re-learned in
each individual lifetime. Embodied as a specific kind of wiring in the
cerebellar cortex (the climbing fibers, parallel fibers, and Purkinje
cells), it is capable of modifying output signals detoured on their way
from the brainstem to the spinal motor neurons so as to create dynamic
stability over a range of physical properties of the limb. This method has
the great advantage that it can adapt to complex loads without knowing
anything but the error signal in the control system that is trying to use
the limb to control the load. So it needs no model of the world. "Armac", a
downloadable PC program on my web page, shows the Little Man equipped with
this artificial cerebellum gradually learning to move an arm to point to a
randomly-moving target (I call it the "fly-catcher").

Considering what really has to be acquired and adjusted in order to create
successful and stable control systems, explaining learning to drive by
appealing to reasoning and "association" seems pretty inadequate. As
occupants of this machinery, we get a pretty sketchy picture of its
detailed workings, and I think we tend to take credit for having done
things that really happen at much too low a level of detail for us to have
much conscious influence on them. Yes, it is perfectly reasonable that if
turning the wheel left is "associated" with a turn to the left, we should
turn the wheel left if we want the car to turn left. But that is such a
laughably incomplete story of what is really required in terms of neural
reorganization that we must consider it only a just-so story, told to keep
the occupant happy and not really to explain anything.

Best,

Bill P.

Bruce_Gregory · June 28, 1999, 6:15pm

[From Bruce Gregory (990628.1413 EDT)]

Bill Powers (990628.1047 MDT)

Considering what really has to be acquired and adjusted in order to create
successful and stable control systems, explaining learning to drive by
appealing to reasoning and "association" seems pretty inadequate. As
occupants of this machinery, we get a pretty sketchy picture of its
detailed workings, and I think we tend to take credit for having done
things that really happen at much too low a level of detail for us to have
much conscious influence on them.

I take it that in your view introspection has almost nothing to tell us
about how learning takes place--everything that is important is "hidden"
at a level inaccessible to awareness. This is certainly possible. There is
evidence that our "thinking" is largely a commentary on what we
observe--the result of a struggle to make sense of and to "own" our
behavior.

Bruce Gregory

Bill_Powers1 · June 28, 1999, 7:50pm

[From Bill Powers (990628.1338 MDT)]

Bruce Gregory (990628.1413 EDT)--

I take it that in your view introspection has almost nothing to tell us
about how learning takes place--everything that is important is "hidden"
at a level inaccessible to awareness. This is certainly possible. There is
evidence that our "thinking" is largely a commentary on what we
observe--the result of a struggle to make sense of and to "own" our
behavior.

I think introspection might reveal to us some of what is going on in us
when we learn. I think we can voluntarily have some effect on whether and
how and to some extent what we learn. But a lot of this seems to work by
way of putting ourselves into situations and conditions where learning may
happen. I can go to school and participate in a course, but whether I come
away knowing something new is not as predictable as my attendance.

A learning model has to work before we have learned anything. So any
explanation that relies on some skill that we have learned must be at least
viewed with suspicion. There are undoubtedly processes that look like
learning and do rely on logic and experience, but I would prefer to think
of them as simply executing algorithms that we must, at some previous time,
have learned in the deeper sense of reorganization.

Best,

Bill P.

Bruce_Gregory · June 29, 1999, 2:06pm

[From Bruce Gregory (990629.1005 EDT)]

Bill Powers (990628.1338 MDT)

A learning model has to work before we have learned anything. So any
explanation that relies on some skill that we have learned must be at least
viewed with suspicion. There are undoubtedly processes that look like
learning and do rely on logic and experience, but I would prefer to think
of them as simply executing algorithms that we must, at some previous time,
have learned in the deeper sense of reorganization.

Sounds perfectly reasonable to me. Thanks, this exchange has been very
helpful.

Bruce Gregory