IJHMS, No selection by consequences

[From Rick Marken (941217.1415)]

Just back from traveling again. Glad to see everyone was busy on
the net.

Martin Taylor (941212 14:15) --

Do you think that such a combined paper is a possibility?

Never mind my earlier suggestions. I will try to write a paper on PCT
methodology, with a few comments about how this methodology can
be applied to HCI engineering. I'll try to have a draft to you by Feb 1.

CHUCK TUCKER (941214) --

I have read the posts on this topic with my eye towards some
resolution... I do not understand what was being done with the
various simulations of ecoli.

Bill Powers (941216.1410 MST) gave his take on it. Here's mine.

In my E. coli demo, people are able to produce a consistent result
(move a dot to a target) despite the fact that the consequences of actions
are random; actions are bar presses and the consequences of these
actions (reinforcements) are random changes in the direction of the dot
on the screen.

I developed this demo to show that purposeful behavior (keeping a
spot on a target) cannot be viewed as selection by consequences;
random consequences (the random directions of spot movement after
each bar press) cannot select a non-random result (spot on target).

Bruce Abbott did not believe that this demo showed anything of the
kind and he tried to build a reinforcement model to explain the
controlling that is actually observed in the E. coli demo. His first try
failed because it was not a reinforcement model; it was a control model.
His second try was, indeed, a reinforcement model (reinforcement was
proportional to the direction of spot movement after a bar press; he
called this the "gradient") and the model failed as expected; the behavior
of this model was a random walk rather than purposeful behavior.

Finally, Bruce developed a reinforcement model that seemed to
produce purposeful behavior via selection by consequences. In this
model, reinforcement was proportional to a _change_ in the direction
of the spot before and after a press; if the direction after the press was
"better" (more towards the target) than before, then this was a
"reward"; if the direction after the press was "worse", this was a
"punishment".

These rewards and punishments "selected" the values of two
probabilities; the probability that the model would press the bar given
that the spot was moving toward target and the probability that the
model would press the bar given that the spot was moving away from
the target.

This selection by consequences model worked because the consequence
of a bar press is most likely to be a reward if there is a press when the
spot is moving away from the target and to be a punishment if there is a
press when the spot is moving toward the target. The result is that the
probability of pressing the bar when the spot is moving toward the target
gets small and the probability of pressing when the spot is moving
away from the target gets large. The model works, then, because it ends
up pressing the bar when the spot is moving away and not pressing when the
spot is moving toward the target.
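
The mechanism just described can be sketched in a few lines of Python.
This is only an illustration, not Bruce's actual program: the update
rule (a fixed learning rate times the change in direction, clamped to
the range 0.01 to 0.99), the starting position, the step size and all
names such as toward_score and run_reinforcement_model are assumptions
made for the sketch.

```python
import math
import random


def toward_score(x, y, heading):
    """Cosine of the angle between the heading and the bearing to the
    target at (0, 0): +1 is straight toward it, -1 is straight away."""
    dist = math.hypot(x, y)
    if dist == 0.0:
        return 0.0
    return -(x * math.cos(heading) + y * math.sin(heading)) / dist


def run_reinforcement_model(steps=20000, speed=1.0, rate=0.01, seed=1):
    """Hypothetical selection-by-consequences model with two press
    probabilities, one for each direction of travel."""
    rng = random.Random(seed)
    x, y = 100.0, 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    p_press = {"toward": 0.5, "away": 0.5}
    for _ in range(steps):
        before = toward_score(x, y, heading)
        state = "toward" if before > 0.0 else "away"
        if rng.random() < p_press[state]:
            # The consequence of a press is a random new direction.
            heading = rng.uniform(0.0, 2.0 * math.pi)
            after = toward_score(x, y, heading)
            # Reward if the direction improved, punishment if it got
            # worse, proportional to the change (assumed update rule).
            updated = p_press[state] + rate * (after - before)
            p_press[state] = min(0.99, max(0.01, updated))
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
    return p_press, math.hypot(x, y)


p_press, final_distance = run_reinforcement_model()
print(p_press, round(final_distance, 1))
```

Under these assumptions a press made while moving away is, on average,
followed by a better direction, and a press made while moving toward
the target by a worse one, so the "away" probability drifts up and the
"toward" probability drifts down.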

I then changed the conditions of the experiment so that it would no longer
be true that the consequence of a bar press is most likely a reward if
there is a press when the spot is moving away from the target and a
punishment if there is a press when the spot is moving toward the
target. When I made this change (which I called "getting rid of an
artifact" from the experiment) Bruce's selection by consequences
model no longer worked because consequences (rewards and punishments) no
longer selected probabilities of pressing that achieved the goal. The
selection by consequences model did what Bill Powers said it did. It:

failed under a fairly simple change of conditions

The control model (a selection OF consequences model) and human subjects
did NOT fail (to control) under this change of circumstances.

The failure of Bruce's selection by consequences model under this
"fairly simple change of conditions" is very important, not only
because the control model did not fail under these conditions but also
because people did not fail under these conditions either.

Bruce's selection by consequences model fails to capture a very important
characteristic of REAL control behavior: the ability to maintain control of a
variable (the position of the spot) despite changes in the feedback function
relating actions (bar presses) to consequences (direction of movement after
the press). This change in the feedback function is the "fairly simple change
of conditions" that Bill alludes to in his post.

The result of this set of demos should have been the realization that
selection by consequences is incompatible with the control exhibited
by people in the E. coli demo. The subject cannot vary actions (the interval
between bar presses or the probability of a bar press) as necessary in order
to produce a consistent result (the spot remaining near the target) if those
actions are selected by their consequences. In order to control, actions
must be free to vary, as necessary (that is, as determined by the error
signal in the control loop).
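
A minimal sketch of a control model for this task, for comparison, is
given here. It is not the model used in the demos: the perceptual
variable (the cosine of the angle between the direction of movement and
the bearing to the target), the reference value of 1.0, the gain of 0.5
and names such as control_policy and mean_distance are assumptions
chosen for illustration. The probability of a press is set on every
step by the error signal; a baseline that presses at a fixed rate,
regardless of error, is included for contrast.

```python
import math
import random


def toward_score(x, y, heading):
    """Perceptual signal: +1 when heading straight at the target at
    (0, 0), -1 when heading straight away from it."""
    dist = math.hypot(x, y)
    if dist == 0.0:
        return 0.0
    return -(x * math.cos(heading) + y * math.sin(heading)) / dist


def control_policy(perception, reference=1.0, gain=0.5):
    """Press probability determined by the error signal (assumed form)."""
    error = reference - perception
    return min(1.0, gain * error)


def fixed_policy(perception):
    """Baseline: press half the time, whatever the perception is."""
    return 0.5


def mean_distance(press_probability, steps=5000, speed=1.0, seed=2):
    """Run the task and return the mean distance to the target over
    the second half of the run."""
    rng = random.Random(seed)
    x, y = 100.0, 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    total, count = 0.0, 0
    for step in range(steps):
        perception = toward_score(x, y, heading)
        if rng.random() < press_probability(perception):
            # A press still has a random consequence.
            heading = rng.uniform(0.0, 2.0 * math.pi)
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        if step >= steps // 2:
            total += math.hypot(x, y)
            count += 1
    return total / count


controlled = mean_distance(control_policy)
uncontrolled = mean_distance(fixed_policy)
print(f"control: {controlled:.1f}  fixed-rate pressing: {uncontrolled:.1f}")
```

In this sketch nothing is learned or selected: the press probability is
recomputed from the current error on every step, so the same random
consequences end up keeping the spot near the target.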

Despite these demonstrations, I suspect that Bruce Abbott does not yet see
how fundamentally incompatible the notion of selection by consequences is
with the fact of purposeful behavior. His intuitions about behavior come
from years of thinking about behavior from a behaviorist perspective. As Bill
Powers (941216.1330 MST) says:

If you have trained it [your intuition] to look for environmental
causes of behavior, you will interpret the loosening of the bottle-cap
[the consequence] as acting on the organism somehow to make it
produce the action that creates the loosening.

If one is dedicated to (controlling for) this point of view (system
concept) then demonstrations that are not consistent with it are
disturbances to be dealt with in any way possible. That's why Bruce has
not yet said: "Geez. You're right. Control cannot be viewed as selection
by consequences. What psychologists have been doing for the last 100 years
is based on a misconception about the nature of behavior; environmental
consequences of actions don't select actions; organisms control the
consequences of their actions. I'd better start my studies of behavior all
over again and try to discover the consequences (variables) of their actions
that organisms control, how they control them and why."

Soon, perhaps.

Best

Rick