Artifactbusters II

[From Rick Marken (941124.1430)]

My E. coli demo was designed to show that consequences cannot select
the actions that produce intended behavioral results (such as keeping a
spot near a target). Subjects were able to keep a spot near a target by
pressing a bar despite the fact that the consequence of this action (the
direction of spot movement after the press) was random.
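
To give a concrete picture of the task, here is a bare-bones sketch in
HyperTalk (my own stripped-down version, reduced to one dimension like
the variant described later in this post, not the actual demo script):
the spot drifts steadily in its current direction, and each press gives
it a new, randomly chosen direction.

  on idle
    global y, inc
    put y + inc into y       -- the spot keeps moving in its current direction
    -- (redraw the spot at its new y position here)
  end idle

  on keyDown theKey
    global inc
    if theKey is space then  -- the "bar press"
      -- the consequence of the press: a random new direction
      if random(2) = 1 then put 1 into inc else put -1 into inc
    end if
  end keyDown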

Bruce Abbott developed a model that seems to show that consequences
can select the actions that produce an intended behavioral result. Bruce's
model is based on the "law of effect": the probability of an action
("tumbling") is changed depending on the consequence of the previous
action. If the previous action (a tumble) produced an improved result (a
larger gradient of attractant), then the probability of doing this action
again under the same circumstances (the circumstances being the value
of the gradient before the tumble) is increased; if the previous action
produced a worse result (the same or a smaller gradient of attractant),
then the probability of doing this action again under the same
circumstances is decreased.
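
For concreteness, here is a rough sketch in HyperTalk of what such an
update might look like (this is my own guess at the logic, not Bruce's
actual code, and all the names are mine). I assume the probabilities
are kept as percentages, that dNutBefore is the change in attractant
just before the tumble and dNut the change just after it, and that the
handler is called once after each tumble:

  on updateProbabilities
    global pTumbleUp, pTumbleDown, dNutBefore, dNut
    put 2 into stepSize                -- learning step, in percentage points
    if dNut > dNutBefore then
      -- the tumble improved the gradient: strengthen tumbling in the
      -- circumstance that held before the tumble
      if dNutBefore > 0 then
        add stepSize to pTumbleUp
      else
        add stepSize to pTumbleDown
      end if
    else
      -- the same or a smaller gradient: weaken tumbling in that circumstance
      if dNutBefore > 0 then
        subtract stepSize from pTumbleUp
      else
        subtract stepSize from pTumbleDown
      end if
    end if
    -- keep both probabilities between 0 and 100 percent
    put max(0, min(100, pTumbleUp)) into pTumbleUp
    put max(0, min(100, pTumbleDown)) into pTumbleDown
  end updateProbabilities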

Bruce's model learns how to generate the actions that produce the
intended result (keep the spot near the target). The learning is
represented by the change in two probabilities: the probability of a
tumble when going up the gradient, p(t|up), and the probability of a
tumble when going down the gradient, p(t|down). The learning
process results in p(t|down) being around .67 and p(t|up) around .33.
So the model is most likely to respond when the spot is moving down
the gradient and least likely when the spot is going up the gradient.

The result of the "law of effect" learning process in Bruce's model is a
control system model. This model keeps the perceived gradient
"increasing". It is not a reinforcement model because it's behavior does
not depend on the results of previous trials. This can be seen by simply
setting p(t|up) and p(t|down) to appropriate constant values. This
"cuts off" the model from the consequences of past actions. The
behavior of this model depends only on the current input gradient.
The actions that produce the intended result are not selected by their
consequences because the consequences of actions have no effect on the
actions that produced them (at least, this is true when p(t|up) and
p(t|down) are set by the modeller rather than learned by the law of
effect system).
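
To make the "cutting off" concrete, here is a sketch of the
fixed-probability version (again my own names, not anyone's actual
script). The constants mirror the values the learning run converges to,
but they are simply set by the modeller, and nothing in the handler
looks back at what earlier tumbles produced:

  on maybeTumble
    global dNut, inc
    if dNut > 0 then
      put 33 into pTumble    -- going up the gradient: tumble rarely
    else
      put 67 into pTumble    -- going down the gradient: tumble often
    end if
    if random(100) <= pTumble then
      -- a plain tumble: pick a new direction at random
      if random(2) = 1 then put 1 into inc else put -1 into inc
    end if
  end maybeTumble

Called on every iteration, this behaves like the learned model after
convergence: a (crude) control system whose behavior depends only on
the current input gradient.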

Although control behavior is clearly not selected by its consequences, it
still may be the case that the parameters that produce control behavior
are selected by their consequences. I take this to be the point of Bruce's
"law of effect" model; the parameters that produce control, p(t|up) and
p(t|down), do seem to be selected by their consequences. The
environmental consequences of tumbling (the improvement or worsening
of the gradient relative to what it was before a tumble) seem to change
p(t|up) and p(t|down) in just the right way so that the result
is an organism that controls. If the consequences of action (tumbling)
can be counted on to shape up an organism that controls, then it is
certainly true (contrary to the basic theorems of PCT) that behavior (the
production of intended results, such as keeping a spot near a target) is
selected BY its consequences.

Both Bill Powers and I have suggested that Bruce's "law of effect" model
might work (build a control system) due to an artifact in the E. coli
situation; it is simply a fact of geometry that the gradient after a tumble
is highly likely to be worse than it was before the tumble if you were
moving up the gradient before the tumble; and the gradient after the
tumble is highly likely to be better than it was before the tumble if
you were moving down the gradient before the tumble. Bruce proved that
this is the case and admitted that his model takes advantage of this fact;
the result is that p(t|up) and p(t|down) are increased and decreased
appropriately based on the result of a tumble (when the result of a tumble
is taken to be the change from pre- to post-tumble gradient).
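
The bias is easy to see by brute force. Here is a rough Monte Carlo
sketch (my own code and names, using Bruce's gradient formula from
later in this post, in one dimension with the target at zero and a
plain random tumble, i.e., no "de-artifacting"); it tallies how often
the post-tumble change in attractant is the same as or smaller than
the pre-tumble change:

  function tumbleBias startInc
    -- startInc = -1 starts the spot moving toward the target (up the
    -- gradient); startInc = 1 starts it moving away (down the gradient)
    put 0 into yt                          -- target position
    put 0 into worseCount
    repeat with i = 1 to 1000
      put 50 into y                        -- spot starts 50 units from the target
      put startInc into inc
      -- one step before the tumble
      put abs(y - yt) into dist
      put 100/(1 + .001*(dist+dist)) into Nut
      put y + inc into y
      put abs(y - yt) into dist
      put 100/(1 + .001*(dist+dist)) into NewNut
      put NewNut - Nut into dNutBefore
      -- a plain tumble: pick a new direction at random
      if random(2) = 1 then put 1 into inc else put -1 into inc
      -- one step after the tumble
      put NewNut into Nut
      put y + inc into y
      put abs(y - yt) into dist
      put 100/(1 + .001*(dist+dist)) into NewNut
      put NewNut - Nut into dNutAfter
      -- "worse" = the same or a smaller gradient than before the tumble
      if dNutAfter <= dNutBefore then add 1 to worseCount
    end repeat
    return worseCount / 1000
  end tumbleBias

In this one-dimensional version tumbleBias(-1) comes out near .5 and
tumbleBias(1) comes out near 0; the exact numbers are not the point,
only the asymmetry, which is exactly the contingency that the "law of
effect" updates feed on.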

I tried to eliminate this artifact in various ways that were not
successful. I have finally discovered a relatively simple way to do it, in
one dimension anyway. In order to eliminate the artifact -- the fact that
the gradient is more likely to get worse after a tumble when moving up
the gradient and better after a tumble when moving down it -- two
things happen when there is a tumble: 1) the direction of spot movement
is changed, but with a strong bias toward keeping the direction it had
before the tumble, and 2) the position of the spot is changed at random. For
example, suppose that the spot is moving in the y dimension only
(with the target position at zero). When an action occurs (a tumble) the
following events occur:

  -- move the spot to a random position within 9 units of where it is now
  put (random(19) - 10) + y into y
  -- reverse the direction of movement only 20 percent of the time;
  -- otherwise keep moving the same way as before the tumble
  if random(100) > 80 then
    put -inc into inc
  else
    put inc into inc      -- direction unchanged
  end if

This is my "de-artifacting" code. It's written in HyperTalk but I'm sure
that you hyper smart PC types can figure it out. I will tell you that
random(n) returns an integer between 1 and n, inclusive; y is the
position of the spot and inc is the increment (1 or -1) made to y on each
iteration of the program. This "de-artifacting" process seems to work,
but I'm not sure why.

I compute the gradient on each iteration using Bruce's original
approach:

  -- distance of the spot (x,y) from the target (xt,yt)
  put sqrt((x-xt)*(x-xt) + (y-yt)*(y-yt)) into dist
  -- nutrient "concentration" falls off with distance from the target
  put (100/(1 +.001*(dist+dist))) into NewNut
  -- dNut is the change in concentration since the last iteration
  put NewNut-Nut into dNut
  put NewNut into Nut

I have a HyperCard stack that lets a human subject, Bruce's "law of
effect" model, or a control model try to keep a spot near a target. The
spot moves only in the Y dimension, up and down. You can choose to
run the task with or without the "de-artifacting" code above. Without
the "de-artifacting" code, the "law of effect" model works just fine;
p(t|down) converges to about .67 and p(t|up) converges to about .33.
Once the convergence is complete the "law of effect" model behaves
just like the control model (it controls), keeping the spot near the target
despite disturbances (which can be present during the "training" period
too).
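
To show how the pieces fit together, here is a rough sketch of one
iteration of the "law of effect" model as I have set it up (my own
names and organization, building on the sketches earlier in this post,
not the actual stack script; disturbances and screen drawing are left
out). The deArtifact parameter just switches the "de-artifacting" step
in or out:

  on runOneStep deArtifact
    global y, yt, inc, Nut, dNut, dNutBefore, pTumbleUp, pTumbleDown
    put false into tumbled
    -- decide whether to tumble, given the current direction of change
    if dNut > 0 then
      put pTumbleUp into pTumble
    else
      put pTumbleDown into pTumble
    end if
    if random(100) <= pTumble then
      put true into tumbled
      put dNut into dNutBefore        -- remember the pre-tumble gradient
      if deArtifact then
        -- the "de-artifacting" version of a tumble (the code above)
        put (random(19) - 10) + y into y
        if random(100) > 80 then put -inc into inc
      else
        -- a plain tumble: pick a new direction at random
        if random(2) = 1 then put 1 into inc else put -1 into inc
      end if
    end if
    -- move the spot and recompute the gradient with Bruce's formula
    put y + inc into y
    put abs(y - yt) into dist
    put 100/(1 + .001*(dist+dist)) into NewNut
    put NewNut - Nut into dNut
    put NewNut into Nut
    -- "law of effect" learning: adjust a probability after each tumble
    if tumbled then updateProbabilities
  end runOneStep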

With the "de-artifacting" code the "law of effect" model fails; it never
becomes a control model; p(t|up) and p(t|down) both converge to
values near .5. The "de-artifacting" code has no effect on the
performance of the human subject or control model; both continue to
be able to keep the spot near the target; they maintain control.

The "de-artifacting" code shows that the "law of effect" results in a
system that controls only when the "effects" of actions happen to be
systematic (the results of tumbles, in terms of the change from pre- to
post-tumble gradient, are relatively systematic). A modeller can then take
advantage of this systematicity, writing code that uses it to produce a
control system. By doing this, the modeller makes it appear that
purposeful behavior is selected by its consequences (the effects of
action). In fact, purposeful behavior is only "selected" by its
consequences when these consequences are systematic; when they are
not systematic (or, at least, not systematic in the way required by the
clever model that is "shaped" by them) then purposeful behavior does
not occur; purposeful behavior was never really selected by
consequences in the first place.

Operant (purposeful) behavior (according to PCT) is the control of
perceptual input. The inputs that are controlled in operant
conditioning experiments are perceived consequences of both actions
and independent environmental disturbances; these inputs
(consequences) are kept in reference states determined by the organism
itself; consequences select neither the actions that keep them in a
reference state nor the reference state in which they remain. Behavior
is not selected by its consequences; behavior is the selection of the
consequences that actions will have. The "law of effect" has been
repealed by control theory; it has been replaced by the "law of control of
effect".

The correctness of this new law will be rigorously tested in the operant
conditioning experiments to be conducted by the CSG's most vigorous
advocate of the old law, Bruce Abbott. The best way to see the
difference between the old and new laws is to compare the predictions
of each. Bill Powers has provided some of the predictions of the new
law for behavior in different operant conditioning situations; I am still
eagerly awaiting Bruce Abbott's catalog of the predictions of the old law
for behavior in the same situations.

Happy Turkeyday

Rick

PS. The HyperCard stack mentioned in this post is available to anyone who
wants it.