Whither E. coli

[From Rick Marken (941101.1410)]

Bill Powers (941101.0740 MST) --

I think we need to take a brief break from the E. coli modeling and
spend some time with the basic model of a control system.

Great idea. But let me first make some comments about why I developed the
E. coli demo in the first place.

Bill Powers learned about the E. coli method of navigation years ago when he
read Koshland's book on the subject. He realized that this "random walk"
based control process was exactly what he had proposed as the means by which
control systems might reorganize; he wrote up a simulation and found that
the E. coli process was remarkably efficient. He showed this simulation to me
in 1980 or so.

Sometime later I had the idea of seeing if a person could navigate the way
E. coli does. So I set up a "game" where a dot moves at a constant rate
in some direction on a computer screen until the person presses the space
bar, generating a "tumble", which is a new, randomly selected direction for
the dot. The person is to move the dot to some target position on the screen
by pressing the space bar. People have no trouble doing this right off the
bat -- getting the dot to the target position of their choice by pressing
the space bar appropriately. This is the E. coli effect.

Then I realized that this E. coli effect seems to present a challenge to
what I understood to be the reinforcement view of behavior. B. F. Skinner had
recently written a paper in Science called "Selection by consequences" in
which he argued that behavior is selected BY its consequences. Well, in the
E. coli effect, the consequences of behavior (pressing the space bar) are
RANDOM; each of the 360 possible dot movement directions that might occur
after pressing the bar (including the direction in which the dot had been
moving before the press) has a probability of 1/360. Taking Skinner at his
word, it seemed that, if behavior is selected BY its consequences, random
consequences should select random behavior. So one might expect that the
probability of a bar press in the E. coli situation would be random -- the
person should be as likely to press the bar at one point in time as
at any other. In fact, however, the bar pressing is not random; it is
systematic; the subject consistently presses the bar in a way that moves the
dot to some target position on the screen and keeps it there.
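
The contrast between random and systematic pressing can be sketched in a few
lines of Python. This is only an illustration under stated assumptions, not
the program used in the studies: the function names, the step size, the
starting point, and both press rules are hypothetical. One rule presses
whenever the dot has just moved farther from the target; the other presses
with a fixed probability on every step, whatever the dot is doing. In both
cases the result of a press is the same: a new direction drawn uniformly from
the 360 possibilities.

```python
import math
import random

def run_trial(press_rule, target=(0.0, 0.0), start=(100.0, 0.0),
              speed=1.0, steps=5000, seed=0):
    """Move the dot and return its mean distance to the target
    over the last 1000 steps of the run."""
    rng = random.Random(seed)
    x, y = start
    heading = rng.randrange(360)           # one of 360 directions, in degrees
    prev_dist = math.hypot(x - target[0], y - target[1])
    tail = []
    for t in range(steps):
        x += speed * math.cos(math.radians(heading))
        y += speed * math.sin(math.radians(heading))
        dist = math.hypot(x - target[0], y - target[1])
        if press_rule(dist, prev_dist, rng):
            # A press produces a "tumble": the new direction is random,
            # each direction having probability 1/360.
            heading = rng.randrange(360)
        prev_dist = dist
        if t >= steps - 1000:
            tail.append(dist)
    return sum(tail) / len(tail)

def press_when_receding(dist, prev_dist, rng):
    # Assumed systematic rule: press when the dot just moved away from the target.
    return dist > prev_dist

def press_at_random(dist, prev_dist, rng):
    # Assumed random rule: press with probability 0.1 on every step.
    return rng.random() < 0.1

controlled = run_trial(press_when_receding)
randomly = run_trial(press_at_random)
print("mean final distance, systematic presses:", round(controlled, 1))
print("mean final distance, random presses:    ", round(randomly, 1))
```

With the systematic rule the dot ends up hovering near the target; with
random presses it simply wanders. The consequences of a press are equally
random in both runs; only the timing of the presses differs.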

I tried to publish the results of my studies of the E. coli effect in
Science, as an answer to Skinner's article. The paper went through at least
two harrowing reviews and revisions but, ultimately, it left the reviewers
nonplussed. I, of course, have saved all the reviews; when PCT finally makes
it they should make very amusing reading, if they can still be read on the
decaying paper.

It is in these reviews that I learned of all the different ways that
reinforcement theory can handle the E. coli effect. The first approach was to
say that the results of a bar press were not really random. A press would
tend to occur when the dot was moving away from the target so the chances
were that it would be moving in a "better" direction after the press. So the
result of a press after a "bad" direction was likely to be a "reinforcement"
(movement in a good direction) and, in the future, it would be more likely
that the subject would press when the dot was moving in a bad direction. In
fact, the results of a press are random -- that's it. And, when I implemented
this verbal explanation as a model it produced a random walk, as expected.
The reviewers were not interested.

The second approach was to claim that reinforcement wasn't operating in this
task (behavior is selected by its consequences only when Skinner says it is,
apparently); the E. coli effect is a simple case of the operation of
"discriminative stimuli"; movements away from the target are more strongly
associated with bar pressing than movements toward the target. The problem
with this approach, of course, is that the subject can voluntarily change the
target to which the dot is moving; when this happens, "discriminative
stimuli" that were once strongly associated with pressing suddenly become
"unassociated" with it. Clearly, it is the subject, not the stimuli, that
determines the "discriminative" value of stimuli.

A third approach invoked the "three term contingency" -- which is just a
combination of the discriminative stimulus and differential strengthening
approaches mentioned above. This explanation has the combined shortcomings of
both approaches.

A fourth approach was the control theory approach; some reviewers described a
process that worked like a control system, with a fixed reference (for
perception of movement directly toward the target). Of course, the reviewers
still called it a reinforcement model. I tried to show that the behavior
of such a model involves selection OF consequences (the particular
consequences selected being secularly determined by the setting of a
reference signal inside the control system) not selection BY consequences.
But to no avail.
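
A control system of this kind, with a reference that can be reset, might be
sketched as follows. This is a minimal Python illustration under assumptions
of my own choosing, not the model from the paper or the reviews: the function
name `simulate`, the target coordinates, the step counts, and the use of
"change in distance to the target" as the controlled perception are all
hypothetical. The target position plays the role of the reference setting and
is changed halfway through the run; nothing else about the system changes.

```python
import math
import random

def simulate(targets, steps_per_target=3000, speed=1.0, seed=1):
    """Run the dot while the reference (target position) is reset
    between phases. Returns the mean distance to the active target
    over the last 500 steps of each phase."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    heading = rng.randrange(360)
    results = []
    for tx, ty in targets:                  # each target is a new reference setting
        prev = math.hypot(x - tx, y - ty)
        tail = []
        for t in range(steps_per_target):
            x += speed * math.cos(math.radians(heading))
            y += speed * math.sin(math.radians(heading))
            dist = math.hypot(x - tx, y - ty)
            perception = dist - prev        # perceived change in distance to target
            if perception > 0:              # error: dot is moving away from target
                heading = rng.randrange(360)  # output: press, giving a random tumble
            prev = dist
            if t >= steps_per_target - 500:
                tail.append(dist)
        results.append(sum(tail) / len(tail))
    return results

means = simulate([(150.0, 0.0), (-150.0, 80.0)])
print("mean distance to first target: ", round(means[0], 1))
print("mean distance to second target:", round(means[1], 1))
```

In this sketch the dot settles near whichever target is currently set. A
movement that called for no press while the first target was in effect can
call for a press once the second is in effect, so what a given movement
"means" depends on the reference inside the system, not on the movement
itself.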

I thought the E. coli demo would pose a real challenge to reinforcement
theory and the notion that behavior is selected BY its consequences. I
thought it would lead reinforcement theorists to reorganize and understand
behavior as the control of perception. But nooooo. In fact, reinforcement
theorists were completely and utterly unfazed; they never broke stride. I did
this work about 10 years ago, and I was shocked at the (lack of) response.
But I was so much older then; I'm younger than that now.

Best

Rick