Reorganization, Killeen Machine, etc.

[From Bill Powers (960124.1000 MST)]

Bruce Abbott (960123.1305 EST) --

     I thought that the preferred descriptor for behavioral output was
     _action_, not act.

My goal is always to pick words that indicate exactly what part of the
control process I am talking about. As Rick Marken pointed out, "Act" is
defined in the dictionary so it refers both to the output action and to
its result. "Behavior" has a similar problem. These ambiguities lead to
difficulties when we notice that actions have more than one consequence
at a time, and that the same event can be a consequence of different
prior events. If I back my car through the closed garage door, breaking
the door would qualify as an "act" or a "behavior" under many
definitions of these terms. Yet if the same consequence occurred for
some reason other than my operation of the car, it would no longer be
called an "act" or a "behavior" of mine -- for example, if the car had
been left in reverse by someone else or the accelerator pedal stuck. PCT
gives us a way to speak of _intended_ consequences and to separate them
from unintended ones; it also distinguishes actions from the
consequences they affect. Even more important, it shows us how
disturbances, or independent causes, get into the picture and it unlinks
specific acts from specific consequences. We can see that learning to
produce a specific consequence is not the same as learning to produce a
specific action.

Language developed before control theory; it's not surprising that
normal usage of common terms fails to make distinctions that we now know
are important.


RE: incentives

The common view of an incentive is that it has some special power over
behavior. When a manager offers an incentive of higher pay for more
work, the naive expectation might be that more work should result. This
misconception is maintained by the fact that many people already want
more money than they are getting, so when the manager supplies a means
for getting it (doing more work) the people will take advantage of this
opportunity and work more, thus increasing their pay. Farmers, on the
other hand, are sometimes offered incentives for doing less work; in
this case, many of them take advantage of the opportunity to get the
payments by producing less work. But not all wage-earners or farmers
will behave this way.

Samuelson, in one edition of his big text on economics, footnoted the
"anomalous" fact that certain primitive people, when given a raise in
pay in the effort to get them to build roads faster, worked even fewer
hours than before. They were already getting as much pay as they wanted.
What makes something an incentive is that (a) you want it, (b) you are
getting less of it than you want, (c) a means of acting is available
(and within your abilities) which will increase the amount you are
getting, and (d) the means itself does not make your life worse in some
other regard, or overall.

It is always the organism that determines whether any given input will
give the appearance of being an incentive -- of causing a behavior that
the giver wants to see. But this appearance can be maintained only as
long as it is the organism's own actions that produce the incentive; the
incentive-giver does not actually provide any incentives directly, but
arranges the environment so that certain acts will automatically produce
the incentives. If, indeed, the incentive-giver does directly provide
the incentive, the result will be a decrease in any behavior that was
formerly producing it, not the desired increase. Welfare reformers,
please note.

Killeen and others are quite mistaken in thinking that for each
incentive given, there are so many seconds of responding that result. To
disprove this notion, all you have to do is start giving extra
incentives by arbitrarily adding them to the ones being produced by
behavior. If the incentives themselves were the cause of behavior, more
behavior should result. In fact, less behavior will result. This can't
be predicted from any organism function in which output rises when input
rises -- but it can be predicted from a PCT model in which an increase
in input reduces error (or creates an opposite error) and thus reduces output.

Try this with any control model you have running. Just add more of the
controlled variable periodically and independently of the behavior. The
behavior will decrease.
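The prediction is easy to check in simulation. Below is a minimal sketch of my own (the model structure, names, and constants are illustrative, not code from any running demo): the controlled variable is the system's own output plus any "free" amount supplied independently of behavior, and the output is an integrating function of error. Adding free input lowers the output needed to hold the variable at its reference.

```python
def run(free=0.0, steps=5000, dt=0.001):
    """Simple one-level control loop. `free` is the amount of the
    controlled variable delivered independently of behavior."""
    reference = 10.0     # how much of the variable the system wants
    gain = 100.0         # output function gain
    output = 0.0         # the system's action ("behavior")
    for _ in range(steps):
        cv = output + free            # controlled variable: earned + free
        error = reference - cv
        output += gain * error * dt   # integrating output function
    return output

earned_only = run(free=0.0)   # output settles at the reference, 10.0
with_free = run(free=4.0)     # free input: output drops to 6.0
```

With no free input the output settles at the reference level; with 4 units given away free, the output settles 4 units lower. The behavior decreases by exactly the amount provided independently, as the text claims.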

(Bruce to Rick):
     Or in words you would rather not hear used, by reducing intrinsic
     error it leads to the cessation of reorganization, which leaves the
     current candidate reference in place or "established." Hmmm, so
     the occurrence of the incentive _does_ lead to the establishment of
     a reference for striking the key, in your view. For a moment there
     I thought you were disagreeing with me. (;->

It also leads to not pressing the key, or to pressing a key with a red
light on over it, or to turning in a circle, or to going through a
particular door -- in other words, it leads to any behavior whatsoever,
depending on the effect that behavior has on producing the incentive. So
there is no general way to link an incentive to any particular effect on
any particular behavior.

When you are dealing with a specific case, it appears that an incentive
can cause a particular act like pecking on a key. So you are misled into
thinking that this particular incentive must have something innate to do
with making animals peck on keys. You begin to think that food given
under certain circumstances must be hooked up to the pecking muscles of
this species of animal. You begin to wonder what is in the food that
gives it this special effect. You start putting adulterants into the
food, to see if you can alter its effect on key-pecking. You start
changing the external circumstances to see if that will alter the
connection between food intake and key pecking.

And all the while, you are forgetting that in a different situation,
exactly the same food intake would seem to be hooked up to some
completely different behavior. Having fastened on the food itself as
causing the behavior, you are simply looking in the wrong place for an
explanation. The food has no way of knowing what action will produce
more of itself. What makes the food seem to have a special effect is the
fact that the organism has less food intake than it wants, and what
hooks the food intake to any particular action is a system in the animal
that is actively trying different actions, looking for the one that will
affect the food intake. When it finds an action that will bring the food
intake closer to the desired amount, it continues to produce that
action. If possible, it produces just enough of that action to bring the
food intake (or its own food content) to the desired level -- and no more.

This is a very simple and clear picture of operant conditioning. But
when you start looking at the same phenomena with the assumption that an
incentive is something more than a variable that the organism desires to
be in a particular state, all sorts of difficulties arise that require
elaborate and fanciful adjustments of the model, as well as arm-waving
arguments that avoid precision. Just look at all the background
assumptions that Killeen has to make in English, relying on word-
associations rather than mathematics to make his point. He has a
compelling intuition about how the system is working, but this intuition
is based on an unsupportable assumption.

     This is where I have trouble with this explanation. Reorganization
     is said to come into play when "intrinsic" variables are in
     "persistent" error, and to stop when this error is "corrected."
     That's a fairly slow process. I don't see how one brief grain-
     delivery (or even several) is going to have much impact on the
     level of deprivation (intrinsic error), yet the pigeon goes back
     and repeats what it just did (in terms of result of action).

You exaggerate. It takes more than a few grain deliveries to get the
pigeon pecking consistently in the right place. I seem to recall a
discussion about shaping, a couple of months ago, in which it was
remarked how hard it is to get pigeons to learn to peck on a key to get
food -- how one has to start by rewarding any move in the right
direction, and then gradually and patiently narrow the requirements, so
at any given time only a small change in the pigeon's behavior is
required. The time scale seems quite appropriate to the operation of a
reorganizing system.

The best reorganization model I have been able to come up with uses both
the magnitude and rate of change of intrinsic error to determine the
speed of reorganization. A random reorganization of the deltas added to
system parameters occurs when the rate of change of squared (or
absolute) intrinsic error is positive. The _amount_ of change of the
parameters is proportional to the squared or absolute error, so change
slows as error decreases. When intrinsic error is decreasing, no matter
how large it is, there is no reorganization: the parameters continue to
change at a rate determined by the current set of deltas, and by an
amount proportional to the remaining squared or absolute error.
Eventually, of course, the parameters will change past their closest
approach to the optimum combination, the error will begin to increase
again, and a reorganization will occur -- the deltas will all be
reselected at random between positive and negative limits. If the error
still increases, another random selection will occur, and so on until
once again the parameters are approaching the minimum-error settings. As
long as not too many parameters are being adjusted at once (see Martin
Taylor's post of yesterday), the intrinsic error will continually
decrease until the least achievable error is found.

[Footnote: "deltas added to system parameters" might be likened to the
rate at which synaptic connections are formed and strengthened (positive
delta) or weakened and removed (negative delta). A reorganization
would alter whatever it is that determines whether a connection is
becoming more effective with time or is decaying.]
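The scheme above can be sketched in a few lines of Python. This is my own illustrative reduction, not Powers's actual simulation code: the "intrinsic error" is a squared distance from an arbitrary target parameter set, the 0.01 rate constant and the [-1, 1] delta range are assumptions, and real intrinsic variables would of course not be a simple distance.

```python
import random

def reorganize(target, steps=20000, n=3, seed=1):
    """Biased random walk over n parameters toward `target`.
    Parameters drift continuously under the current deltas, by an
    amount proportional to the squared 'intrinsic error'; whenever
    that error increases, the deltas are reselected at random
    (a reorganization)."""
    rng = random.Random(seed)
    params = [0.0] * n
    deltas = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    sq_err = lambda p: sum((pi - ti) ** 2 for pi, ti in zip(p, target))
    prev = sq_err(params)
    for _ in range(steps):
        rate = 0.01 * prev                   # change amount ~ squared error
        params = [p + d * rate for p, d in zip(params, deltas)]
        cur = sq_err(params)
        if cur > prev:                       # error rising: reorganize
            deltas = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        prev = cur
    return prev
```

Starting from zero with a target of (2, -1, 3), the squared error falls from 14 to near zero, even though the system never uses any gradient information beyond "better or worse," and the changes slow automatically as the error shrinks.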

So you can see, reorganization is not just a matter of "error or no
error." It is a quantitative process, and operates in a continuum of
change. Increases in error lead to immediate reorganizations, but
decreases in error may continue for a long time without any
reorganizations occurring. The error may simply be diminished by this
process, if circumstances prevent complete erasure. But the tendency
will always be toward a state of minimum error.

That's just one reorganization model, relying only on random changes.
There are probably other models that would work at least as well. But
this is a worst-case model, requiring an absolute minimum of built-in
intelligence and skill. The adaptive Extended Kalman Filter approach of
Hans Blom can find an optimum parameter set much faster, but to do so it
requires advanced computational abilities and a great deal of
_understood_ information about its environment. These computational
abilities (if they actually exist) must have come from somewhere -- and
heredity is a dubious possibility. Random reorganization, to my taste,
is a better explanation for how higher brain functions come into being.

     The basic idea -- that you stop looking for a solution to a problem
     the moment you find it -- seems reasonable enough, but the
     mechanism involved in identifying the problem and the solution
     seems too sluggish and too dependent on error at the highest level
     in the hierarchy.

There are some who seem to disagree with me, but I don't consider the
reorganizing system to be part of the hierarchy, much less its highest
level. Its inputs are not perceptual signals in the hierarchy, and its
output is not a reference signal entering any system in the hierarchy.
The variables monitored by the reorganizing system are intrinsic to the
organism and few of them are available in the form of sensory
perceptions. The action of the reorganizing system is strictly on the
parameters of the hierarchical systems -- the connections among them and
the internal forms of the functions, neither of which features of the
hierarchy is represented perceptually. It does not produce signals in
the hierarchy directly.

I think of the reorganizing system as operating throughout, and
separately from, the hierarchy of behavioral systems. Furthermore, the
reorganizing system must be assumed present and functioning from at
least birth and possibly from conception, in order to explain how the
hierarchy develops. The same reorganizing system, operating the same
way, functions from birth to death. It functions even before the nervous
system has become organized at the level of spinal reflexes. There is no
time during the growth of the organism when we can do without it. So it
has to operate simply, in a way that could be inherited, in a way that
requires zero intelligence or skill -- particularly higher levels of
skill, like mathematical computation.

I sympathize with your hunch that random reorganization will be too
sluggish to accomplish much very fast. I felt the same way too, and did
not make much of the reorganizing model until I read about E. coli in
the early 1980s and did some experiments in simulation. I was astonished
at how efficient this process is -- it worked orders of magnitude faster
than I thought it could. Only then did I start pushing the idea of a
random reorganizing system more seriously.

In reorganization, E. coli style, we do not just wait for a succession
of random changes to finally hit on a workable solution. There is a
measure of how far from a solution we are, and there is a bias on how
rapidly the random changes are made. When the result of a random change
is an increase in this basic measure of error, there is another
reorganization right away, as soon as the increase can be detected. But
if the system is changing in the right direction, reorganization is
postponed; a very high weight is given to movement in the right
direction, and a very low weight -- duration -- to changes in the wrong
direction. This biases the random walk VERY STRONGLY. As I've remarked
before, E. coli progresses up the gradient at least half as fast as it
would if it could simply turn in the right direction and swim directly
upstream. To me, this fact put an entirely different light on a random
reorganizing system.
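The strength of the bias shows up in a toy simulation (again my own sketch; the source location, speed, and step counts are arbitrary choices): a walker heads toward a "nutrient" source, keeping its current heading while the sensed distance improves and tumbling to a random new heading the moment it worsens.

```python
import math, random

def ecoli_run(steps=5000, speed=0.01, rng=None):
    """One E. coli-style biased random walk: swim straight while the
    distance to the source is decreasing, tumble (pick a random new
    heading) as soon as it increases. Returns the final distance."""
    rng = rng or random.Random()
    tx, ty = 5.0, 0.0                     # the source ("top of the gradient")
    x = y = 0.0
    heading = rng.uniform(0, 2 * math.pi)
    dist = math.hypot(tx - x, ty - y)
    for _ in range(steps):
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        d = math.hypot(tx - x, ty - y)
        if d >= dist:                     # no better: tumble immediately
            heading = rng.uniform(0, 2 * math.pi)
        dist = d
    return dist

# average final distance over several runs
avg = sum(ecoli_run(rng=random.Random(s)) for s in range(30)) / 30
```

A straight swim would cover the 5 units to the source in 500 of the 5000 steps; the biased walk, choosing every heading at random, still reliably ends up close to the source well within the allotted time.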

Bruce Abbott (960124.1055 EST) --

     Schedules of reinforcement were attractive because they produced
     interesting, stable patterns of behavior that, it was thought,
     would soon yield to detailed analysis of the factors that appeared
     to be at work, such as temporal discrimination. This has not
     proved to be the case, and some researchers have been arguing that
     simpler situations may provide a more productive field for

A similar situation existed when behaviorism was born. Watson thought
that simply by observing the antecedent conditions under which behaviors
took place, we could eventually build up a cause-effect catalogue and
thus achieve complete prediction and control of behavior. But it didn't
work out that way: this nasty old "variability" showed up, and
psychology has been struggling with the consequences ever since.

There has to come a time when a scientist finally admits that his
difficulties are arising not from lack of cooperation by nature, but
from having picked the wrong model at the start.

     Let's try running after we've learned to walk.

Best to all,

Bill P.