600 pound E. coli

Richard_Marken · June 26, 1995, 4:02pm

[From Rick Marken (950626.0900)]

Bill Powers (950624.0620 MDT) --

It occurs to me that we've been missing our best bet for a task to
compare PCT and reinforcement theory: a simple compensatory tracking
task.

This task sounds fine but I think it is unlikely to have any more impact on
reinforcement theorists than my E. coli experiment.

I was cut off from the net this weekend so I was unable to post the following
analysis in a timely manner. But, as you will see, I have concluded that the
results of the original E. coli experiment are as clear a rejection of
reinforcement theory as you are likely to find. The fact that reinforcement
theorists don't see this as a rejection of their model shows that there is
nothing -- nothing at all -- that will convince reinforcement theorists that
their model of behavior is wrong.

Here's what I wrote Saturday but was unable to post until today.


----------------
[From Rick Marken (950624...)]

Bill Powers (950624.1005 MDT) --

In a real science, when a prediction fails you don't excuse
it by saying that something unobserved must have happened in
just the right way to keep the theory true. You investigate
why it failed and produce the data that explains the failure.

The basic question is whether behavior really works as each
side says it works. Resolving that question requires thinking
up tests that could give preference to one theory over the other.
It requires taking failures seriously and using only observed
data to explain them.

All sentiments which I applaud.

If reinforcement theorists were, indeed, playing science rather than
defending the faith they would have seen that the results of the E. coli
experiment gave a clear preference to PCT because they demonstrate the
failure of a prediction of reinforcement theory.

I have re-read the original E. coli paper (pp. 79-85 in _Mind Readings_) and
see that the results show quite clearly that a prediction of reinforcement
theory (selection by consequences) fails. The mistake I made on the net was
to invite reinforcement theorists to explain the results of that experiment.
This was a mistake because I understand reinforcement theory better than the
reinforcement theorists. What the reinforcement theorists did is just what
Bill describes above; they excused their failure to predict the E. coli
results by saying that something unobserved must have happened in just the
right way to keep their theory true.

What I did in the E. coli paper (which I didn't do on the net) was actually
test reinforcement theory. I did this by measuring the reinforcement value of
each of the consequences of responding. The reinforcement value of a
consequence was measured as every textbook on reinforcement theory says it
should be measured: as the tendency of a consequence to increase or decrease
the probability of the response that produced it.

I measured the probability of a response as the inverse of the time to repeat
the response that produced the consequence; the smaller the time until
response repetition the higher the probability of the response. The
reinforcement value of a consequence is then measured as the probability of
the response that produces that consequence. The results looked like this for
most subjects:

[Figure 1: P(R) plotted against the consequence of R (angle relative to
target, from 0 to 180 degrees)]

If the consequence of a response (bar press) is movement toward the target (0
degree angle relative to the direction of the target) then there is a very
low probability of repeating the response; if the consequence of a response
is movement away from the target (180 degree angle relative to the direction
of the target) then there is a very high probability of repeating the
response. So movement toward the target is the least reinforcing consequence
of a response, movement away from the target is the most reinforcing
consequence of a response, with reinforcement value varying linearly between
these extremes.
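
The measurement described above (probability of a response taken as the inverse of the time until the response is repeated) can be sketched as follows. This is only an illustration, not the analysis code from the paper: the function name `response_probability` and the delay values are hypothetical, chosen to show the shape of the calculation rather than to reproduce any data.

```python
def response_probability(delays_by_angle):
    """Return 1 / (mean delay until the next press) for each consequence.

    delays_by_angle maps a consequence (angle of movement relative to the
    target, in degrees) to the observed delays, in seconds, before the
    press was repeated. A shorter delay is read as a higher probability.
    """
    result = {}
    for angle, delays in delays_by_angle.items():
        mean_delay = sum(delays) / len(delays)
        result[angle] = 1.0 / mean_delay
    return result


# Hypothetical delays, invented for illustration only (not data from the paper).
example_delays = {
    0: [4.0, 5.0, 6.0],    # moving toward the target: long wait before pressing
    90: [2.0, 2.0, 2.0],
    180: [1.0, 1.0, 1.0],  # moving away from the target: quick press
}

p = response_probability(example_delays)
for angle in sorted(p):
    print(angle, p[angle])
```

With numbers like these, the measure is lowest at 0 degrees and highest at 180 degrees, which is the pattern Figure 1 reports.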

Reinforcement theory says that the different directions of movement that
follow a press will differentially strengthen the responses that produce
them; since all directions of movement are equally probable, responding will
be randomly reinforced.

So reinforcement theory predicts that the responding in this experiment will
be random (this prediction was confirmed by running a reinforcement model --
including discriminative stimuli -- of the task; the reinforcement model
responds randomly; the result is that the dot does a random walk around --
and off -- the screen).

The actual results of the E. coli experiment are not those predicted by
reinforcement theory: responding is not random. Responding is actually quite
systematic: the result of responding is NOT a random walk; it is control. The
dot moves toward and remains near one of the three targets on the screen.

Reinforcement theorists (the reviewers of the E. coli paper) simply would not
accept the notion that the results of this experiment are not what are
predicted by reinforcement theory; they claimed that the results are
predicted by reinforcement theory, but they never explained why.

Finally, Bruce Abbott came up with an ad hoc explanation of the E. coli
results, ostensibly based on reinforcement theory. The explanation was based
on the assumption that the reinforcing consequence is not the direction of
dot movement after a response; it is _change_ (from before to after the
response) in direction of dot movement.

What I didn't notice when Bruce proposed this explanation was that the choice
of change in direction as the reinforcer was completely "extra-theoretical";
change in direction was selected as the reinforcer because it was just what
was needed to keep reinforcement theory true.

In fact, change in direction of dot movement following a press would be
REJECTED as a reinforcer by reinforcement theory because there is no
differential reinforcement value of this consequence. The best way to see
this is by noting that the probability of a response following a particular
direction relative to the target (Figure 1) is the same regardless of the
direction of movement prior to a response. The reinforcement value of
moving, say, toward the target after a press is the same regardless of the
direction of movement before the press; there is no differential
reinforcement (measured as specified by reinforcement theory) based on change
in direction of movement; only the direction of movement after the response
affects the probability of response.

So reinforcement theory would have to predict that using _change_ in
direction of movement as the reinforcer would produce the same result as
using direction of movement: the prediction is a random walk IF _change_ in
direction, like direction, is a random consequence of responding.

But using change in direction as the reinforcer does not produce a random
walk! Why? The answer, of course, is "because change in direction after a
response is NOT random". In fact, a change away from the target is more
likely when moving toward it and a change toward the target is more likely
when moving away from it.
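
The non-randomness of change in direction can be checked with a short simulation. This is a sketch under one assumption taken from the description of the task: after a press the new angle relative to the target is uniformly distributed between 0 and 180 degrees. The function name `prob_turn_away` and the trial count are illustrative choices, not part of the original experiment.

```python
import random


def prob_turn_away(angle_before, trials=20000, seed=0):
    """Estimate how often a press produces a change away from the target.

    angle_before is the angle (degrees) relative to the target before the
    press. The new angle is drawn uniformly from 0..180 (an assumption).
    A "change away" means the new angle is larger than the old one.
    """
    rng = random.Random(seed)
    away = 0
    for _ in range(trials):
        angle_after = rng.uniform(0.0, 180.0)
        if angle_after > angle_before:
            away += 1
    return away / trials


# Nearly toward the target before the press: most changes are away from it.
print(prob_turn_away(10.0))
# Nearly away from the target before the press: most changes are toward it.
print(prob_turn_away(170.0))
```

Under the uniform assumption the exact value is 1 - angle_before/180, so the direction of the change depends heavily on the direction before the press even though the new direction itself is random.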

So, by changing the definition of the reinforcer to a change in direction,
reinforcement theorists played a trick and made it seem like their theory
predicted a result that it does not predict. The trick was to call a
consequence of responses a reinforcer when it was NOT a reinforcer; since
this consequence is also non-random, the result of using this consequence as
a reinforcer is non-random responding.

The E. coli demo was aimed at showing that people can respond systematically
even though the reinforcing consequences of their responses are random.
Reinforcement theory, which says that responding is selected by consequences,
predicts that people will respond randomly if the reinforcing consequences of
their responses are random. The results of this experiment clearly reject
the predictions of reinforcement theory.

But the reinforcement theorists would not accept this failure of prediction;
their theory MUST be true. So they used a trick to make it seem like
reinforcement theory could produce systematic results when consequences are
random. They did this by identifying a consequence of responding that is NOT
randomly related to responses (change in direction after a response) and
declaring that this consequence is the "real reinforcer" when, in fact, their
own theory would REJECT it as a reinforcer.

Unfortunately, I fell for this trick and accepted the reinforcement
theorists' pseudo-reinforcer (change in direction after a response) as a
reinforcer. So I then showed that if change in direction after a response
is made to be randomly related to responses (by biasing the probability of
the direction of movement after a response) the reinforcement model again
predicts random dot movement. Again, the actual result, with a human
subject, is systematic responding, not the random responding predicted by
reinforcement theory.

So even using the pseudo-reinforcer as the reinforcing consequence of
responses, the prediction of reinforcement theory (random responding with
random reinforcement) fails. To my knowledge, this failure was simply ignored
by the reinforcement theorists.

The fact of the matter, however, is that the original E. coli experiment
rejected reinforcement theory clearly and decisively.

The results of the E. coli experiment cannot be explained by reinforcement
theory (without going outside the theory). However, these results can be
explained by a control model that acts in order to make a sensed
representation of the results of its responses match its own specification
for what those sensed results should be. This model can also explain all the
results that reinforcement theory is supposed to be able to handle (such as
the basic operant control task as well as control in the face of feedback
function -- schedule -- changes).

The bottom line is that reinforcement theory says that responses are selected
by their consequences; some consequences strengthen responses; other
consequences weaken them. If this is true, then random presentation of
strengthening and weakening consequences should produce random responding.
Control theory says that organisms select consequences, not vice versa;
responses are made in order to keep consequences in the reference state
selected by the organism. Random changes in consequences are just a
disturbance that will be resisted by responses.
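
The contrast between the two outcomes can be sketched in a toy simulation. It is a deliberately simplified stand-in, not the models run for the paper: the dot moves one unit per step, a press replaces its heading with a uniformly random one, and the target sits at the origin. The "control" policy presses whenever the sensed distance to the target has just increased (an assumed, minimal form of comparing a perception with a reference), while the "random" policy presses with a fixed probability regardless of consequences. All names and parameter values here are hypothetical.

```python
import math
import random


def run(policy, steps=2000, start=(100.0, 0.0), seed=0):
    """Simulate the dot and return its final distance from the target at (0, 0)."""
    rng = random.Random(seed)
    x, y = start
    heading = rng.uniform(0.0, 2.0 * math.pi)
    prev_dist = math.hypot(x, y)
    for _ in range(steps):
        x += math.cos(heading)
        y += math.sin(heading)
        dist = math.hypot(x, y)
        if policy(dist, prev_dist, rng):
            # A press: the new direction of movement is random.
            heading = rng.uniform(0.0, 2.0 * math.pi)
        prev_dist = dist
    return math.hypot(x, y)


def control_policy(dist, prev_dist, rng):
    """Press only when the perceived distance to the target is increasing."""
    return dist > prev_dist


def random_policy(dist, prev_dist, rng):
    """Press with probability 0.1 per step, ignoring the consequences."""
    return rng.random() < 0.1


control_runs = [run(control_policy, seed=s) for s in range(20)]
random_runs = [run(random_policy, seed=s) for s in range(20)]
control_mean = sum(control_runs) / len(control_runs)
random_mean = sum(random_runs) / len(random_runs)
print("control:", control_mean, "random:", random_mean)
```

In this sketch every press has a random consequence for both policies, yet the control policy brings the dot to the target and keeps it there, while the random responder produces a random walk.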

When the reinforcement and control models of behavior were pitted against
each other in the E. coli experiment, the reinforcement model failed: real
organisms control consequences; they are not controlled by them.

The appropriate response to the E. coli experiment would be the abandonment
of reinforcement theory in favor of testing the appropriate model of control
-- perceptual control theory. Instead, behaviorists continue to defend a
theory that has been decisively rejected. This is what PCT is up against. I
don't know if there is any way to deal with people who simply will NOT accept
demonstrations that their theory has failed.

Best

Rick