More on E. Coli; ECOLI2

[From Bruce Abbott (941031.1500 EST)]

Bill Powers (941030.1845, 941031.0700 MST)

Bill, I'm delighted that you're having fun with my e. coli simulation, and
that you now have Turbo Pascal 7.0 to run it on. That will eliminate any
compatibility problems we might have had. Tom and Rick: Have you tried it?
You've been strangely silent. Bill, I have a few comments to make on your
analysis; these are followed by a second version of e. coli called ECOLI2.PAS.

OK, what is the "consequence" and what is the "behavior" here? I think
we can probably agree that dNut is the consequence; it's a perceptual
representation of the time rate of change of nutrient concentration just
outside the thingie. What is the behavior? I have a little problem with
this because it seems to be the execution of a tumble the instant that
dNut falls below its reference level of zero. It's hard to see how the
behavior could be affected by the consequence -- how could the behavior
change if it is always the execution of a tumble? About the only
dimension in which this behavior could change would be in rate, or in
number of tumbles needed to get within some distance of the goal.

In Traditional Reinforcement Theory (I'll call it TRT), reinforcers not only
alter the probability of behavior, they also serve to MAINTAIN it.
Thorndike's Law of Effect described the effect of "satisfiers" and
"dissatisfiers" on the future probability of behavior within the situation in
which the behavior occurred, but what was satisfying and dissatisfying had to
be determined by observing whether the organism would approach or withdraw
from the putative "states of affairs" that were to serve these functions. In
this model, satisfiers and dissatisfiers apply "forces" of attraction and
repulsion. The "strength" of these forces depends on factors such as
deprivation and intensity (e.g., magnitude of reinforcement, shock current
level).

My e. coli model is intended to illustrate control over behavior by
consequences as envisioned by TRT but is NOT a learning model. Thus your
search for evidence of learning in the model is misplaced. (Show me where the
learning takes place in YOUR e. coli simulation and I'll show you where it
takes place in mine.)

This system is a one-way control system with very high gain at one point
in its output response curve. The control region includes only values of
dNut less than the reference signal. The system will act to prevent dNut
from becoming less than the given reference value. For the most part it
succeeds, although because the output effect is in the wrong direction
after half of the tumbles, there are brief moments when dNut goes
considerably negative relative to the reference signal. But it is
quickly brought to the region equal to or greater than the reference
signal, where it is no longer controlled.

Precisely! So, my TRT model is REALLY a cleverly disguised control system!
That's hard to see, because there is no mention of a reference level, feedback
loop, gain, output function, or perceptual signal. These components lurk
hidden in the alien vocabulary of "reinforcers," "punishers," "contingencies
of reinforcement," "strengthening," and so on. This is NOT to say that TRT
theorists recognize it as such (they usually don't), but I think we have made
important progress here in understanding why these folks do not immediately
see why TRT is wrong...it seems to work!
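
To make the disguise explicit, the tumbling rule could be rewritten with the
control-system components named out loud. This is only a sketch (RefdNut is a
name invented here for the reference level that the TRT version leaves
implicit at zero; Tumble is the procedure from the listings):

const
  RefdNut = 0.0;                 { reference signal: the wanted value of dNut }

procedure OneWayControl(dNut: real; var Angle: real);
begin
  if (RefdNut - dNut) >= 0 then  { comparator: error = reference - perception }
    Tumble(Angle);               { one-way, high-gain output function: acts   }
end;                             { only when dNut falls below the reference   }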

The flaw in the concept that consequences govern, determine, affect,
influence, control, or select the behavior that causes them is in the
failure to distinguish between the reference-consequence and the actual
consequence. Only the actual consequences of behavior can influence
future behavior. There is nothing to say that any _particular_
consequence will be brought about. Implicit in the concept behind
control by consequences is that a _particular_ consequence is
controlling behavior so as to produce it, a consequence which has not
yet occurred, but will occur if behavior goes on long enough.

Under TRT, reinforcement can act to maintain behavior as well as strengthen it
(up to some limit). Thus, with e. coli, the fact that moving forward produces
an increase in nutrient level will serve to maintain that behavior. In my
simulation I assumed that e. coli was going to move forward anyway, so I did
not explicitly include this maintenance function in the model. For simplicity
I chose only to explicitly model the effect of "punishment" as tumbling.
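
In code, this "punishment" effect in ECOLI.PAS amounts to something like the
following single line (a paraphrase, not the exact listing):

  if dNut <= 0 then Tumble(Angle);  { falling concentration "punishes"    }
                                    { forward motion; tumbling takes over }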

But why _that_ final rate and not another? Why not zero? Why not any
observed rate? The problem is that there is nothing to determine what
the final consequence will be, and therefore nothing to determine what
the final behavior rate will be. There is a missing relationship in this
conception.

What is missing, of course, is the reference setting inside the
organism: how much "consequence" the organism wants. What actually
happens is that the organism varies its behavior rate until the
consequence matches the amount of that consequence specified by a
reference signal inside the organism. That is why both the consequence-
measure and the behavior-measure come to a final equilibrium condition.

But of course! And that is why reinforcers and punishers are viewed as having
these mystical attractive and repulsive properties. It is why my poor e. coli
will continue moving up the concentration gradient even if the concentration
becomes high enough to kill it. But THIS is where TRT fails, not in
selection-by-consequences. In its crude, inept, and misguided way, TRT IS a
model of a control system, as I think my e. coli simulation demonstrates. But
by failing to recognize the reference, by ignoring in purely verbal
descriptions the role of feedback, and by mistakenly placing the emphasis on
the disturbance-behavior relationship, TRT commits a set of fundamental
errors. So close, and yet so far!

The final state of the consequence does not reach back through time to
guide behavior toward creating itself.

Nor, as you pointed out (quoting Skinner), does TRT assume so.

Instead, a present-time setting
of a reference signal (with the associated control system) continuously
determines how both behaviors and consequences will change from their
current states toward a final state. Seeing consequences as determinants
of behavior is a misinterpretation.

Yes, but I have a question. In PCT, how is a control system learned? Trial
and error? If so, is not trial and error selection-by-consequences?

In E. coli, the transition from maximum tumbling rate to no tumbling is
smooth and occurs over a range of dNut. Individuals can be observed
which show all possible settings of the reference signal. A few always
tumble, although a sufficiently large positive transient in
concentration (artificially induced) can slow their tumbling or stop it.
A few others never tumble, although a large negative transient can
induce a nonzero tumbling rate. Koshland didn't investigate the
conditions under which the apparent reference level in an individual
might change, but we can guess that a sufficiently close approach to a
food source (near contact) might cause tumbling to cease. There would be
a sensor for the actual concentration, Nut, and a higher level of
control that would vary the reference level for dNut.

Bear in mind that in my e. coli simulation I was not trying to reproduce
real e. coli behavior in detail but to illustrate the selection-by-
consequences principle. I went for clarity (simple model) over detailed
accuracy. In the same spirit, below is the promised ECOLI2.PAS, which
includes honest-to-gosh reinforcement and punishment effects, but does not
behave entirely like real e. coli either.

E. coli's speed is now varied (between 0.3 and 2.0) depending on whether its
"moving forward" behavior is being reinforced or punished. When punishment
suppresses the speed to its minimum, there is "time" for "other" behavior, so
tumbling occurs (the only "other" thing e. coli does). Reinforcement and
punishment thus act on the rate of movement.
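
Bill's suggested higher level of control -- one that senses the absolute
concentration Nut and varies the reference for dNut -- is NOT in ECOLI2, but a
rough sketch of it might look like this (AdjustReference, RefdNut, and HiGain
are names invented for the illustration):

procedure AdjustReference(Nut: real; var RefdNut: real);
const
  HiGain = 0.1;  { invented gain for this sketch }
begin
  { the higher the sensed concentration, the lower (more negative) the }
  { reference for dNut, so tumbling ceases near the food source        }
  RefdNut := -HiGain * Nut;
end;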

Also illustrated in ECOLI2 is a correction that should be copied into
ECOLI.PAS: I goofed in initializing the direction (in InitSim): I should have
used 3*PI/2 rather than TwoPi/3; the replacement accomplishes the same thing
in a simpler way. Another "goof": the name DecayRate is something of a
misnomer. High values (near 1.0) yield a SLOW loss of dNut value, whereas low
values yield a RAPID loss. Because dNut changes slowly when DecayRate is high,
values near 1.0 cause it to overshoot.
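
For anyone patching ECOLI.PAS by hand, the lines in question are roughly these
(a sketch, not a full diff):

  { in InitSim -- the corrected initial direction: }
  Angle := 3*PI/2;

  { the dNut computation; despite its name, DecayRate sets persistence: }
  { near 1.0 the old dNut lingers (slow loss, overshoot), near 0.0 it   }
  { vanishes almost at once                                             }
  dNut := DecayRate * dNut + (NewNut - NutCon);
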
Also, if anyone would like to chat with me via phone, please try the number
given in the header of ECOLI2.PAS first (it's my office phone; the one on
ECOLI.PAS is my home phone). These days I seem to be living in my office.

ECOLI2.PAS uses the same GrUtils TPU unit as ECOLI.PAS, so I have not reproduced
it here.

Oh, and one more thing: I'd be interested to hear from Sam Saunders. Sam, do
my descriptions of TRT agree with your understanding of it?

Happy Halloween!

Bruce

{***********************************************************************
* E. COLI "REINFORCEMENT" SIMULATION 2 *
* *
* Programmer: Dr. Bruce B. Abbott *
* Psychological Sciences *
* Indiana U. - Purdue U. *
* Fort Wayne, IN 46805-1499 *
* (219) 841-6399 *
* *
* Created: 10/31/94 *
* *
* This program implements a "selection-by-consequences" reinforcement *
* simulation of the "tumble-and-swim" behavior of e. coli. It *
* includes no explicit reference levels for nutrients but merely *
* ASSUMES that e. coli will continue behaviors that lead to increased *
* nutrient concentration ("reinforcement") and abandon behaviors that *
* lead to decreased nutrient concentration ("punishment"). In this *
* version, e. coli's swimming rate is increased ("reinforced") by *
* increasing nutrient concentrations and suppressed ("punished") by *
* decreasing nutrient concentrations. When swimming rate reaches a *
* minimum value, this allows "other" behavior to emerge: e. coli *
* tumbles to select a new swimming direction. NOTE: this is NOT *
* intended to model real e. coli. *
* *
***********************************************************************}

program Ecoli2;

uses
  CRT, Graph, GrUtils;

const
  TWOPI = PI * 2;
  ENDSESSION = 10000;
var
  MaxX, MaxY: integer;
  NutrX, NutrY, X, Y: integer;
  NutMag, NutCon, dNut: real;
  EcoliX, EcoliY,
  Speed, Angle, DecayRate, LearnRate: real;
  Ch: char;
  Clock: longint;

procedure InitScreen;
begin
  ClrScr;
  InitGraphics;
  MaxX := GetMaxX; MaxY := GetMaxY;
  Rectangle(0, 0, MaxX, MaxY);
  OutTextXY(MaxX div 2 - 170, 5,  { title across the top of the screen }
    'E. COLI SIMULATION: REINFORCEMENT MODEL');
  OutTextXY(20, MaxY-50, 'Press ESC to Quit...');
end;

procedure Tumble(var Angle: real);
{ pick a new swimming direction at random: the "tumble" }
begin
  Angle := TwoPi * Random;
end;

procedure InitSim;
begin
  Randomize;
  NutrX := MaxX div 2;  { nutrient source at screen center }
  NutrY := MaxY div 2;
  EcoliX := 50.0;       { e. coli starts near the upper left }
  EcoliY := 50.0;
  X := Round(EcoliX);
  Y := Round(EcoliY);
  Rectangle(NutrX-2, NutrY-2, NutrX+2, NutrY+2);  { mark the source }
  Speed := 0.2;
  DecayRate := 0.25;
  LearnRate := 0.02;
  NutMag := 100.0; { max concentration }
  NutCon := 0.0;   { last sensed concentration; updated on first step }
  dNut := 0.0;     { no change sensed yet }
  Ch := #0;
  { initial heading: down-right, toward the nutrient source }
  repeat Tumble(Angle) until (Angle < PI/2);
  Clock := 0;
end;

function NutConcen(X, Y: real): real;
{ Nutrient concentration at point X, Y: environment function. }
{ Concentration falls off with the square of the distance     }
{ from the source, peaking at NutMag at the source itself.    }
var
  Dist: real;
begin
  Dist := Sqrt(Sqr(X - NutrX) + Sqr(Y - NutrY));
  NutConcen := NutMag / (1 + 0.001*(Sqr(Dist)));
end;

procedure StepEColi;
var
  NewNut: real;
begin
  { swim one step in the current direction and plot the new position }
  EcoliX := EcoliX + Speed * cos(Angle);
  EcoliY := EcoliY + Speed * sin(Angle);
  X := Round(EcoliX);
  Y := Round(EcoliY);
  PutPixel(X, Y, white);
  { sense the change in nutrient concentration; DecayRate near 1.0 }
  { makes the old dNut persist (slow loss), near 0.0 it fades fast }
  NewNut := NutConcen(EcoliX, EcoliY);
  dNut := DecayRate * dNut + (NewNut - NutCon);
  if dNut <= 0 then
    begin  { "punishment": suppress swimming; at minimum speed, tumble }
      if Speed > 0.3 then Speed := Speed - LearnRate
       else Tumble(Angle);
    end
  else  { "reinforcement": strengthen swimming, up to the maximum }
    if Speed < 2.0 then Speed := Speed + LearnRate;
  NutCon := NewNut;
end;

begin
  InitScreen;
  InitSim;
  repeat
    inc(Clock);
    StepEColi;
    Delay(10);  { pace the animation }
    if Keypressed then Ch := ReadKey;
  until (Ch = #27) or (Clock >= ENDSESSION);
  if Ch <> #27 then Ch := ReadKey;  { pause at session's end }
  RestoreCRTMode;
end.
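
(To build it: Turbo Pascal 7.0, with the GrUtils unit from the earlier ECOLI
post compiled to a TPU on the unit path; InitGraphics presumably lives there,
while the rest comes from the standard CRT and Graph units.)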

Tom Bourbon [941102.1630]

[From Bruce Abbott (941031.1500 EST)]

Bill Powers (941030.1845, 941031.0700 MST)

Bill, I'm delighted that you're having fun with my e. coli simulation, and
that you now have Turbo Pascal 7.0 to run it on. That will eliminate any
compatibility problems we might have had. Tom and Rick: Have you tried it?
You've been strangely silent.

Bruce, don't interpret delayed replies as evidence that we lack interest.
The listserver for csg-l seems to be running on super glue again, and I have
no access to the net on weekends. I first saw your programs on Monday, and
this post of yours from Monday reached me on Tuesday afternoon.

In reply to Bill, you said:

In Traditional Reinforcement Theory (I'll call it TRT), reinforcers not only
alter the probability of behavior, they also serve to MAINTAIN it.

Ouch! I know you are presenting a summary of TRT, but it hurts to see
"scientists" savage the language that way. TRT writers portray inanimate
objects as agents, imbued with the functional properties of living control
systems. I know that way of writing is part of the TRT tradition, but by
that practice the TRT community looks downright animistic.
. . .

My e. coli model is intended to illustrate control over behavior by
consequences as envisioned by TRT but is NOT a learning model.
. . . So, my TRT model is REALLY a cleverly disguised control system!

It's a nice model, in its way, but the reference signal is not disguised.
It sits there, shining brightly in your code. Like all good little
reference signals, it specifies the level of a perception. Maybe I'm just a
hopelessly corrupted perceptual control theorist, but I can't see how your
model shows consequences controlling behavior. That idea is present only in
the verbal descriptions that accompany your model. As you say:

That's hard to see, because there is no mention of a reference level, feedback
loop, gain, output function, or perceptual signal. These components lurk
hidden in the alien vocabulary of "reinforcers," "punishers," "contingencies
of reinforcement," "strengthening," and so on.

I don't see that the PCT components lurk in those words. I see no evidence
in standard TRT language that TRT people (a few on this net excepted) have
realized there is a phenomenon of control by living things. Knowing PCT, we
see how someone who _does not_ know it might believe TRT jargon explains
control by consequences, but those believers are in the position of one of
the blind men in Rick's paper about blind men and control systems. As you
said:

This is NOT to say that TRT
theorists recognize it as such (they usually don't), but I think we have made
important progress here in understanding why these folks do not immediately
see why TRT is wrong...it seems to work!

TRT words don't "work." Words are not working models. As for your model, in
my experience with TRT, it is an exception. Wouldn't verbal TRT people have
as much trouble with your working model as they do with ours? If they
understand your model, they will see that it is a model of a perceptual
control system producing control _of_ consequences, not control _by_
consequences. Your _language_ was the language of TRT, but your model
looks like pure TRT heresy. :-)
. . .

Bill:

But why _that_ final rate and not another? Why not zero? Why not any
observed rate? The problem is that there is nothing to determine what
the final consequence will be, and therefore nothing to determine what
the final behavior rate will be. There is a missing relationship in this
conception.

What is missing, of course, is the reference setting inside the
organism: how much "consequence" the organism wants. What actually
happens is that the organism varies its behavior rate until the
consequence matches the amount of that consequence specified by a
reference signal inside the organism. That is why both the consequence-
measure and the behavior-measure come to a final equilibrium condition.

Bruce:

But of course! And that is why reinforcers and punishers are viewed as having
these mystical attractive and repulsive properties. It is why my poor e. coli
will continue moving up the concentration gradient even if the concentration
becomes high enough to kill it. But THIS is where TRT fails, not in
selection-by-consequences.

I follow you, down to your final sentence. Can you elaborate on it a bit?
Are you saying one reason TRT fails is that it attributes magical, animistic
properties to the environment, but TRT is right when it asserts the
importance of selection-by-consequences?

Maybe I can play with both of your E. coli models tonight.

Later,

Tom