[From Bill Powers (990626.0819 MDT)]
I see from the address I'm replying to that we're back on CSGnet. Probably
a good idea.
Bruce Abbott (990626.0800 EST)--
I'm not convinced, even for the continuous case.
Perhaps not, but maybe you're beginning to suspect there's something to be
convinced about.
To keep your car on the
road, you must turn the steering wheel to compensate for disturbances
produced by changes in the direction of the road, wind, irregularities in
the surface. As you perceive the vehicle to be drifting toward the left you
must turn the wheel to the right to compensate, perhaps even hold it there
if the disturbance persists (e.g., steady wind from the right). What you
learn is a relationship, not between wheel angle and direction, but between
turning the wheel and altering the rate of change of your perception of the
car's position with respect to the road.
While it's true that these relationships among rates of change can be seen,
controlling rate of change will not result in long-term control of
position. If you try to analyse this in terms of rates of change, there
will always be some minimum detectable rate of change (because of system
changes and noise), and below that rate, the variables will simply drift
without limit. Furthermore, even without noise and changes in parameters,
control is never perfect; when there's a sustained disturbance, you can
counteract perhaps 99.9% of the effect, but that still leaves 0.1% to cause
a continual position drift. When the disturbance finally is removed, the
drift ceases, but there is an accumulated positional error which a rate
control system can't correct. In a steady crosswind, you'd go off the road
in five minutes, ten minutes, an hour ... eventually. Yet people can drive
indefinitely without going out of the proper lane, even in a very stiff
crosswind.
The right analysis requires a reference _position_ and comparison with the
perceived _position_ -- not rates of change. Now the error is anchored to a
specific perceived position on the road, and corrections will always tend
to restore the car to (toward) the reference position, no matter how long
the driver drives, and whether disturbances are transient or sustained.
There is error, but no _cumulative_ error. The system's characteristics
might wander a bit, but that will only change the position on the road by a
small amount. If your estimate of the car's position acquired a bias of 5%
of the range, the car's position would shift by 5% of that range and then
stay there. If the perception of a rate of change acquired a bias of 5%
of the range (in the rate kind of control system), however, you'd very soon
be off the road.
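
A minimal sketch of the comparison just described, under assumptions that are not in the original post: the car is reduced to one lateral position, the driver's output acts directly as a lateral velocity, and the gain, wind, and bias values are arbitrary illustrative numbers. The function name lateral_position is hypothetical. Both models get the same steady crosswind and the same small perceptual bias.

```python
def lateral_position(mode, steps=6000, dt=0.01, wind=1.0, bias=0.02, gain=20.0):
    """Return the car's offset from lane centre after steps*dt seconds.

    mode="position": output is proportional to the (biased) position error.
    mode="rate":     output integrates the (biased) rate-of-change error.
    All numbers are illustrative assumptions, not measurements.
    """
    x = 0.0  # lateral position; the reference position is 0
    u = 0.0  # driver's output, expressed as a lateral velocity
    for _ in range(steps):
        if mode == "position":
            u = gain * (0.0 - (x + bias))
        rate = u + wind  # true rate of change of position
        if mode == "rate":
            # Reference rate is 0; the perceived rate carries the bias.
            u += gain * (0.0 - (rate + bias)) * dt
        x += rate * dt
    return x


for seconds in (60, 120):
    n = seconds * 100  # dt = 0.01, so 100 steps per simulated second
    print(seconds, "s  rate model:", round(lateral_position("rate", n), 3),
          " position model:", round(lateral_position("position", n), 3))
```

In this sketch the rate model brings the perceived rate to zero, but the offset built up during the transient is never removed, and the bias keeps the true position drifting. The position model settles at a small fixed offset and stays there however long the run lasts.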
The amount of
wheel-angle required to prevent further drift will vary continuously with
the disturbance, but what the driver extracts from his or her experience at
the wheel is how turning the wheel affects the drift, independent of
disturbances.
That's the rate model, which will not work. What keeps the car on the road
are the _proportional_ relationships (or integrals). It's a problem of
cumulative error. For the rate model to work, all amounts and directions of
changes in rate would have to cancel out exactly for an indefinite period
of time. I'm sure you will see that this is impossible if you think about
it a bit.
There is a criterion for success (the car goes where one
wants it to go, or at least the error is altered in the right direction) and
a particular relationship between what to do and what happens as a result
that can be learned on this basis.
Sorry, not true. You can be driving the car straight down the road, keeping
exactly in the center of your lane, with the steering wheel steadily cocked
at easily noticeable angles to the right or left, depending on the
crosswind or the tilt of the road or the flatness of a tire -- or any
combination. There is no particular angle of the steering wheel that will
produce successful steering. Yet for every wind and road condition, there
is ONLY ONE angle of the wheel that will keep the car on the road, at a
given speed. I've already explained why thinking in terms of _changes_ in
wheel angle and position will not work. I suggest simulating some models in
which only rates of change are involved. It will help greatly in
understanding what I'm saying here to investigate what really happens if
you try to handle everything in terms of changes -- especially in the
presence of sustained disturbances.
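
The point about wheel angle can be sketched in a few lines. This is an illustration under stated assumptions, not a model of a real car: lateral velocity is taken to be k_steer times the wheel angle plus the crosswind, the driver's wheel angle is proportional to the position error, and the names hold_lane and fixed_wheel and all parameter values are hypothetical.

```python
def hold_lane(wind, gain=50.0, k_steer=1.0, dt=0.01, steps=2000):
    """Position control: return (final wheel angle, final lane offset)."""
    x = 0.0
    angle = 0.0
    for _ in range(steps):
        angle = gain * (0.0 - x)             # reference position is lane centre
        x += (k_steer * angle + wind) * dt   # assumed vehicle response
    return angle, x


def fixed_wheel(angle, wind, k_steer=1.0, dt=0.01, steps=2000):
    """Hold one wheel angle regardless of what the car does."""
    x = 0.0
    for _ in range(steps):
        x += (k_steer * angle + wind) * dt
    return x


for wind in (-2.0, -1.0, 0.0, 1.0, 2.0):
    angle, x = hold_lane(wind)
    print(f"wind {wind:+.1f}: wheel angle {angle:+.3f}, lane offset {x:+.3f}")

# The angle that was right for wind = 1.0, reused when the wind is 2.0:
print("fixed wheel, new wind:", fixed_wheel(-1.0, 2.0))
```

Each crosswind ends with a different steady wheel angle and nearly the same lane position, while a wheel angle carried over from a different wind lets the car drift without limit.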
As for the sort of relationships studied in the operant chamber, natural
examples abound. Earlier I mentioned the association the lion may discover
between the watering hole and prey. If the lion is looking for prey and
finds it at the watering hole, the lion is encouraged to make more visits
there in the future. With further experience, the lion is likely to learn
which visitation times are more likely to pay off, and begin to visit more
frequently at those times than at others. Having set a reference for
visiting the watering hole, the lion will of course deal with any
disturbances (within its power to compensate for) such as irregularities in
the terrain.
But there are other irregularities it must cope with. If you ask how the
lion gets to the watering hole, you find that each time it does so it must
move in a different direction and for a different distance -- slightly or
greatly different. Clearly "moving toward the hole" can't be reinforced,
because there is no particular movement that will get the lion to the hole
on all occasions. You might be able to reinforce "wanting to go to the
water hole," but that won't get the lion there. When it gets near the hole
it can home in on it by a control process, and ditto for approaching the
prey, which can be anywhere in the vicinity of the hole. But again, there's
no particular behavioral act that can be reinforced because a different act
will be required each time, depending on what the prey does.
I'm not arguing that nothing is learned here. At some level, I'm willing to
assume, the lion learns something _qualitative_: go to the watering hole
when the sky begins to get light. That is learned because doing so gets the
lion fed -- no disagreement about that. But "go to the watering hole" is
not sufficient to get the lion to the watering hole. The direction of
movement implied by that concept depends on where the lion is, relative to
the hole, and which way it is facing when the message comes through, and
those factors can be different every time.
Sooner or later we have to deal with the mystery inherent in the concept of
"the operant." Skinner postponed dealing with it, saying that SOMEHOW the
animal emits just the behavior needed to do what is required. But in a
physical universe, our explanations eventually have to explain that
"somehow." Skinner thought he only had to deal with the fact that different
movements of the animal could have the same effect, that of depressing the
bar. But the REAL problem is that in the wild, as with your lion at the
watering hole, the animal MUST DO SPECIFIC DIFFERENT BEHAVIORS IN ORDER TO
HAVE THE SAME EFFECT on different occasions.
You're now starting to try to explain how this can happen, without using
PCT. It can't be done. You immediately have to introduce rate control,
perception of and response to changes, in order to avoid bringing in a goal
or reference condition, thus making it seem that the environment is still
initiating the action.
But the concept of changes leaves the system unanchored to any particular
state -- any disturbance that moves the system at all will leave it in a
new state, with no way of getting back to the former state. The only way to
explain behavior that leads to a particular end-state time and time again
is to suppose that the end-state itself is intended -- represented as a
reference signal.
The chimpanzee learns that if it pushes a stick into an
opening in a termite mound and then withdraws it, the stick is likely to
have a few delicious termites clinging to it. When I'm hungry and at home,
I'm likely to check out the pantry or the refrigerator for something to eat.
The reason I look in those places is that I've often found something to eat
there in the past. When I go outside, I reach for the doorknob and turn it
and while holding the knob turned, pull on the knob; the door swings open
and I am able to step through. If the knob won't turn I have to go to plan
B, but I grab the knob and turn it and pull on it because in the past such
behavioral acts have resulted in an open door at a time when an open door is
what I wanted to perceive. I can't think of many occasions in which
disturbances have altered this relationship.
You carry your explanations only as far as the qualitative result, but it's
the quantitative result that has to be achieved. It does no good to pull on
the doorknob if you don't pull hard enough to swing the door open.
Qualitatively speaking, a 1-gram pull and a 2000-gram pull both amount to
"pulling on the doorknob" -- but one works and the other doesn't. When you
only speak of the _kind_ of behavior involved, you haven't said enough. We
have to explain not only the kind, but the amount and the direction.
But I don't have any real quarrel with these examples; something is
learned, and it has to do with meeting basic physiological requirements of
the organism. Where I differ with you is in the explanation of how it is
that a basic physiological requirement can lead to learning when it's not
satisfied.
In your scheme, something about the reward (whatever is needed to reduce
the error relative to the basic physiological requirement) makes it more
likely for the behavior that produced the reward to occur again. I'm trying
to tell you to pay attention to what you already know from your years in
PCT: that is not the way to produce the same reward again! Of course
sometimes it will work, if the environment is free of disturbances. But,
outside the laboratory, how often does that happen? The normal case is that
in order to produce another reward, the animal must do something
_different_ from what it did the first time -- and different by the right
amount and in the right direction.
Clearly, reinforcement will not work if what is required is a _different_
action rather than the _same_ action. The only way to handle that, in
reinforcement theory, is to rely on discriminative stimuli, so that for
every disturbance, there is a discriminative stimulus to signal that this
time a specific different action must occur -- an action that is different
by just the amount and direction that will quantitatively cancel the effect
of the disturbance. "Stimulus generalization" can help by filling in the
gaps between specific discriminative stimuli, and "satiation" can keep the
process from running away with itself, but these are just patches to fix
flaws in the basic explanation. The patches are needed because the
explanation is not right for the kind of system that actually exists.
In PCT, we explain what happens by saying that it is not a specific act or
action that is learned, but a whole closed-loop control system that is
acquired. The organization of the control system is such that a deviation
of a perception from a reference signal will activate behavior in
proportion to the amount of error and in a direction corresponding to the
sign of the error. As this system becomes organized through random
variations of its parameters (or in any other way you'd like to suggest),
control of the perceptual signal gradually develops, so that for _any_
disturbance or _no_ disturbance, the system will act to bring the
perception (and what it represents) toward a specific preselected
goal-state. What is learned is a relationship between action and error, not
between specific actions and specific stimuli (although that kind of
relationship can itself be perceived and become a controlled variable).
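
A minimal sketch of such a loop, with assumptions labeled: the environment simply adds the output and the disturbance to form the perception, the output stage is a leaky integrator chosen to keep the discrete simulation stable, and the disturbance is an arbitrary slow sine wave. The name run_loop and all parameter values are hypothetical.

```python
import math


def run_loop(reference=10.0, gain=100.0, slowing=0.01, steps=1000):
    """Simulate one control loop; return a list of (disturbance, perception, output)."""
    output = 0.0
    trace = []
    for k in range(steps):
        disturbance = 5.0 * math.sin(0.01 * k)   # assumed slow disturbance
        perception = output + disturbance        # assumed environment: simple sum
        error = reference - perception
        # Leaky integrator: output moves toward gain * error, so its size and
        # sign follow the size and sign of the error.
        output += slowing * (gain * error - output)
        trace.append((disturbance, perception, output))
    return trace


trace = run_loop()
settled = trace[100:]  # skip the start-up transient
print("largest perceptual error:", max(abs(10.0 - p) for _, p, _ in settled))
print("output ranged from", min(o for _, _, o in settled),
      "to", max(o for _, _, o in settled))
```

Nothing in the loop ties a particular output to a particular stimulus. The output varies widely as the disturbance changes, and the perception stays close to the reference throughout.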
I'm not advocating a traditional reinforcement view, which unfortunately
does not distinguish between two distinct mechanisms -- reorganization and
control (or perhaps I should say that it employs one mechanism to accomplish
both jobs, when two mechanisms are required). What I'm advocating is a
selection mechanism for building new control systems, one that is much more
efficient than random reorganization, when conditions are such that it can
work.
That's fine with me -- presumably, at some level what becomes organized are
systems for doing systematically what the reorganizing system can only
accomplish by random changes.
I'm certainly not talking about any mechanism that "makes the same
causal act occur again" -- the animal learns to perform the act as a means
of obtaining the consequent event, it is not driven to produce those acts
willy-nilly.
But here is the same problem again. You're assuming an environment that is
regular enough that performing the same act will produce the same
consequent event. I'm talking about an environment -- the real one outside
the lab -- in which the only way to produce the same consequent event is to
perform a _different_ act. If you perform the same act in this environment,
a different consequent event will happen, because of disturbances.
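
To make the contrast concrete, here is a small sketch under one loudly stated assumption: the consequence is just the act plus the disturbance. The names consequence, learned_act, and controlled_act, the target value, and the gain are all hypothetical. One function repeats the act that worked when there was no disturbance; the other adjusts the act until the consequence matches the target.

```python
TARGET = 10.0  # the consequence the organism wants to perceive (arbitrary units)


def consequence(act, disturbance):
    # Assumed environment: the result is the act plus whatever else is pushing.
    return act + disturbance


def learned_act():
    # The act that produced TARGET on the one occasion with no disturbance.
    return TARGET


def controlled_act(disturbance, gain=0.5, iterations=60):
    # Keep changing the act in proportion to the remaining error.
    act = 0.0
    for _ in range(iterations):
        error = TARGET - consequence(act, disturbance)
        act += gain * error
    return act


for d in (-3.0, 0.0, 4.0):
    same = learned_act()
    varied = controlled_act(d)
    print(f"disturbance {d:+.1f}: same act -> {consequence(same, d):.3f}, "
          f"controlled act {varied:.3f} -> {consequence(varied, d):.3f}")
```

Repeating the same act reproduces the consequence only when the disturbance is zero. The control version ends with a different act for each disturbance and the same consequence every time.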
The reason you don't believe in this variable environment of mine is that
you always speak qualitatively about behavior and consequences. I reach for
a cup; I pick up the cup; I drink the water. Those are all qualitative
statements. But if reaching for the cup were strictly open-loop, it would
take only a few ounces of force, perhaps from stretching my clothing, to
make my hand miss the cup; only a bit of skin dryness for the cup to slip
out of my hand; only a slight misorientation of my head for the water to
dribble out of the corner of my mouth. The real environment is teeming with
small disturbances which would be enough to throw behavior completely off
target if there were no feedback control working all the time. It's ONLY
this feedback control that enables us to accomplish the same end twice in a
row. And because the control is so good, we don't even notice the
disturbances: they have no noticeable effects!
Of course they do have noticeable effects, but you have to be sharp to
notice them.
I am very skeptical of this proposal. I think that if the system had to
wait until some variable were brought near to an intrinsic reference level,
the animal would rarely if ever learn what to do.
Think quantitatively, please. It's not as if changes occur at random until
by luck they bring some intrinsic variable to its reference level, stopping
the changes. I'm specifically proposing the E. coli method, in which the
intrinsic error controls the rate of reorganization, and selects strongly
for consequences that bring the intrinsic variable closer to its reference
level. I'm willing to entertain other ideas, but so far this is the only
one I know of that would actually work.
The delays involved are
generally of sufficient length that the animal would already be reorganizing
into something else before the error were reduced sufficiently to halt the
reselection process.
You've seen the E. coli demo, I hope. I'm sure a skeptic could say the same
about it. But whether it works or not depends on what parameters you assume
for the reorganizing system. If you make the gain too high, exactly what
you describe will occur; you can model it. But why assume that the gain is
too high? Why not assume that it's just right?
On the other hand, if you make the rate of change of the parameters of the
control system depend on the size of the remaining intrinsic error, and let
the direction of parameter changes follow a random walk in hyperspace, you
can get this system to converge very efficiently to a final state. So why
not assume that design, instead of one that doesn't work?
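
A rough sketch of this design, with several assumptions that are not in the original post: the intrinsic error is stood in for by the distance between three control-system parameters and a hypothetical ideal setting, a "tumble" (new random direction) happens whenever the error fails to decrease, and the rate constants are arbitrary. The function names are hypothetical.

```python
import math
import random


def intrinsic_error(params, ideal):
    # Stand-in for the intrinsic error: distance from a hypothetical ideal.
    return math.sqrt(sum((p - q) * (p - q) for p, q in zip(params, ideal)))


def random_direction(rng, n):
    # A random unit vector in n-dimensional parameter space.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def reorganize(rate=0.05, steps=5000, seed=0):
    """Return (initial error, final error) after E. coli style reorganization."""
    rng = random.Random(seed)
    ideal = [2.0, -1.0, 0.5]   # hypothetical; unknown to the reorganizing system
    params = [0.0, 0.0, 0.0]
    direction = random_direction(rng, len(params))
    error = intrinsic_error(params, ideal)
    initial = error
    for _ in range(steps):
        # The size of each parameter change depends on the remaining error.
        params = [p + rate * error * d for p, d in zip(params, direction)]
        new_error = intrinsic_error(params, ideal)
        if new_error >= error:
            # Things got no better: tumble to a new random direction.
            direction = random_direction(rng, len(params))
        error = new_error
    return initial, error


print("moderate gain:", reorganize())
print("gain too high:", reorganize(rate=3.0, steps=50))
```

With a moderate rate constant the error in this sketch shrinks toward zero, even though no individual change is aimed at the ideal. With the rate constant set to 3.0, every step overshoots and the error grows, which is the too-high-gain case mentioned above.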
Consider the hungry rat that bumps into the lever and
immediately receives a food pellet. (The rat has already been magazine
trained so that it approaches the food cup and feeds as soon as it hears the
feeder operate). One pellet clearly is insufficient to bring the animal's
state of nutrition back near some intrinsic value, yet as I have observed,
sometimes this _single experience_ is sufficient to produce a quick return
to the lever and a second press, and with only a couple more such
experiences the animal's performance becomes organized into a sequence of
approach the lever, press/release, approach the cup, eat, approach the lever
. . . To my mind, random reorganization simply cannot account for this
dramatic change in behavior after the delivery of only a few pellets.
Again, that depends on the parameters of the model, and the detailed
design. You can make the intrinsic perceptual signal rate-sensitive, so
that each jolt of food causes a momentary large decrease in the error. If
there's not much left by way of needed changes, the next error might be
still smaller, and learning could be very rapid.
On the other hand, perhaps a systematic logical reasoning process, if
plausible in a rat, could be learned or inherited, and would reason as you
suggest. Higher systems can, after all, alter parameters in lower systems,
although we haven't talked much about that in PCT. I have nothing against a
more efficient process, although I'm suspicious about systematic
algorithms, considering how much of the environment is unpredictable
(evolution really couldn't be expected to equip a rat to deal with a
Skinner box).
But whatever the process, what it teaches the organism must at least work
properly. If the process teaches the animal to repeat the action that
produced the food the last time, this will not work in a real environment,
because there are always sufficient natural disturbances to assure that
repeating the same act will not have the same result. Yes, the organism
must learn to make the same result occur again. But the way to do that in a
real environment is to acquire a control system that monitors the
result-variable, compares it with a reference level, and lets the error
drive the action that affects the result-variable. Since disturbances are
always acting, we will, in general, see different acts occurring to keep
the result the same. But that's just what control systems are organized to
do automatically.
Of course in a special environment where the results are protected against
disturbances, we will see the same acts occurring each time, but that's not
indicative that the animal has learned to produce the same acts each time.
This appearance is an artefact of the special disturbance-free environment.
There is also the problem of how the reorganization system "knows" which
systems to reorganize and which to leave be. And why old organizations
often remain intact after they supposedly have been reorganized. But those
are other issues; for now I want to focus on this issue of efficiency.
I agree that these are problems, the more so in more complex organisms. One
possible answer is local reorganization. Another is some mechanism for
directing reorganization to areas of the brain where error signals are
abnormally large. But we must still provide a way for physiological errors
to cause changes in behavioral systems that have nothing directly to do
with them. Without that ability we couldn't explain why a hungry animal
learns to perform arbitrary acts which have no relationship to hunger
except that they happen to produce little pellets that turn out to be
nutritious.
But explaining how those little pellets make the brain systems in control
of the behavior that produces them more likely to act is even harder than
explaining how their lack might induce random changes of organization in
the right places. Reinforcement is harder to explain than reorganization.
Best,
Bill P.