ECOLI4A: Does This One Work?

[From Bruce Abbott (941114.1630 EST)]

Bill Powers (941113.0530 MST)

Bruce Abbott (941111.2100 EST)--

Enough of these verbal arguments! ... Run ECOLI4 as often, for as long
as you like. E. coli will occasionally find the goal, and sometimes it
will stay there a long time, but it will not consistently go there and
it will not consistently stay there when it does. If it does, I'll buy
you a steak dinner. If it doesn't, you buy me one. Deal?

No deal, you're right. I was fooled by a chance period in which the spot
stayed near the target.

Darn, and just when I was starting to look forward to a nice, juicy steak!

Regarding the learning model of e. coli, I've had some thoughts of my
own concerning how to handle this; when I get some "play" time I'll try
to model both approaches and see which works best.

I think that such a learning model is possible, but that to make it work
we would have to make the learning EXTREMELY slow. . . .
Of course you might prove me wrong, but if I'm right a lot of time could
be wasted trying to make this learning work.

Yes, I did a little playing around with your suggested learning model myself
and came to the same conclusion. On the other hand, at the risk of annoying
everyone concerned, I MAY have a learning model that works, which operates on
a different principle. I was on my way to Bloomington, Indiana (Indiana
University) Sunday to visit my son (he's in Music Education there). It's a
three-plus hour drive from here, most of it on the Interstate. I guess in the
absence of stimulation some dark corner of my brain went back to work on
ECOLI4, because an apparent solution suddenly popped into consciousness and
begged me to try it. I programmed it this morning and it seems to work; not
only that, but it appears to have eliminated the logical inconsistency of
ECOLI4 in that now "reinforcement" INCREASES the probability of a tumble and
"punishment" decreases it.

In PCT finding a successful solution depends on finding the "right" perceptual
variable; in TRT the equivalent is finding the "right" definition of
reinforcement. It has been recognized since at least the 1970s that
reinforcement is "relative" (cf. Baum, "Reinforcement as Situation
Transition"). For example, given that a rat is receiving 20 food pellets per
hour, a response that would increase this rate to 30 pellets per hour would be
reinforced, whereas one that cut this rate to 10 would be punished. It
occurred to me that for e. coli, a response that made the rate of nutrient
change more positive would be "reinforced," whereas one that made the rate of
nutrient change less positive would be "punished." Determining the effect of
a tumble in these terms is easily done by subtracting dNut before the tumble
from dNut after the tumble. As in ECOLI4, reinforcement would increase the
probability of a tumble in the presence of the discriminative stimulus present
before the tumble and punishment would decrease it.
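In case the criterion is easier to see in code than in prose, here is a minimal sketch (Python, with names of my own invention; the actual program below works directly in probabilities, not labels):

```python
def consequence(dnut_before: float, dnut_after: float) -> str:
    """Classify a tumble by its effect on the rate of nutrient change."""
    delta = dnut_after - dnut_before  # + = improvement, - = deterioration
    if delta > 0:
        return "reinforcement"  # rate of nutrient change became more positive
    if delta < 0:
        return "punishment"     # rate of nutrient change became less positive
    return "neutral"            # no change either way

# The rat example: 20 -> 30 pellets/hr is reinforcement; 20 -> 10 is punishment.
print(consequence(20, 30))  # reinforcement
print(consequence(20, 10))  # punishment
```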

I got this idea from thinking about what your participants were doing when
learning how to control "their" e. coli with the space bar. They would
quickly learn that pressing the space bar when the bug was approaching the
nutrient usually made things worse. (In fact, when the bug was on a direct
line to the nutrient ANY tumble would either make things worse or, rarely, no
better.) Similarly, they would learn that, when the bug was receding from the
nutrient, most tumbles would improve the situation. (In the worst case when
the bug was moving directly away from the nutrient, ANY tumble would make
things no worse and would very likely make them better.)
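The worst-case claims in the paragraph above are easy to verify numerically. In this throwaway sketch (mine, not part of ECOLI4a) the rate of nutrient change is taken to be proportional to the cosine of the angle between the heading and the direction to the nutrient:

```python
import math
import random

def dnut(heading_error: float) -> float:
    # Rate of nutrient change ~ cos(angle between heading and goal direction)
    return math.cos(heading_error)

random.seed(1)
trials = [random.uniform(0, 2 * math.pi) for _ in range(10000)]

# Heading straight at the nutrient (error 0): no random tumble can improve it.
none_better = all(dnut(a) <= dnut(0.0) for a in trials)
# Heading straight away (error pi): no random tumble can make it worse.
none_worse = all(dnut(a) >= dnut(math.pi) for a in trials)
print(none_better, none_worse)  # True True
```

Since cos never exceeds 1 or falls below -1, both checks hold for every possible tumble, which is exactly what the participants discover.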

There is thus information about the consequences of a tumble in the comparison
between pre- and post-tumble rates of nutrient change. This information can
be used to alter the rate of tumbling that occurs as a function of the change
in nutrients, which in ECOLI4a is done by altering p(Tumble|S+) and
p(Tumble|S-). (I do this because reinforcement/punishment is supposed to act
on response probability.)
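As a sketch of the update itself (Python; `update_p`, the argument names, and the defaults are mine, though the defaults mirror LearnRate, pMin, and pMax in the program below):

```python
def update_p(p: float, delta_nut_rate: float, learn_rate: float = 0.01,
             p_min: float = 0.005, p_max: float = 1.0) -> float:
    """Adjust the tumble probability attached to whichever discriminative
    stimulus (S+ or S-) was present when the tumble occurred."""
    if delta_nut_rate > 0:    # reinforcement: the tumble paid off
        p += learn_rate
    elif delta_nut_rate < 0:  # punishment: the tumble made things worse
        p -= learn_rate
    return min(max(p, p_min), p_max)  # keep p within [pMin, pMax]
```

One such call is made on the step after each tumble, applied to p(Tumble|S+) or p(Tumble|S-) according to which stimulus held when the tumble was made.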

Although I have "explained" the model in traditional reinforcement theory
terms, I'm sure that an equivalent PCT model could be formulated that would do
essentially the same thing, in effect "finding" the right function to produce
the correct relationship between dNut and the interval between tumbles. I'll
have to think about it. Meanwhile, here's ECOLI4a. In contrast to your
statistical model, this one appears to learn FAST, as people do.
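For anyone who wants to watch the learning without firing up Turbo Pascal, here is a compressed Python rendition of the same contingencies (my own names throughout; it omits the graphics and is only an approximation of the program below):

```python
import math
import random

def run(steps: int = 5000, learn: float = 0.01,
        p_min: float = 0.005, p_max: float = 1.0) -> dict:
    random.seed(42)
    gx, gy = 320.0, 240.0                      # nutrient location

    def nut(x: float, y: float) -> float:      # same environment function
        return 100.0 / (1 + 0.001 * ((x - gx) ** 2 + (y - gy) ** 2))

    x, y = 50.0, 50.0
    angle = random.uniform(0, math.pi / 2)
    p = {True: 0.5, False: 0.5}                # p(Tumble|S+), p(Tumble|S-)
    prev = nut(x, y)
    saved, tumbled = 0.0, False
    for _ in range(steps):
        x += math.cos(angle)
        y += math.sin(angle)
        new = nut(x, y)
        dnut, prev = new - prev, new
        if tumbled:                            # reinforce or punish the
            s = saved > 0                      # stimulus present at the tumble
            if dnut - saved > 0:
                p[s] = min(p[s] + learn, p_max)
            elif dnut - saved < 0:
                p[s] = max(p[s] - learn, p_min)
        if random.random() < p[dnut > 0]:      # tumble rate set by S+/S-
            angle = random.uniform(0, 2 * math.pi)
            saved, tumbled = dnut, True
        else:
            tumbled = False
    return p

p = run()
print(p[True], p[False])
```

On a run of a few thousand steps p(Tumble|S+) should drift down toward pMin while p(Tumble|S-) drifts up: swim while things improve, tumble when they don't.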

The learning-rate parameter, LearnRate, has been set to a fast but not too
fast rate. Set it too high and e. coli's behavioral output function becomes
too unstable to be useful. The maximum and minimum tumble probabilities are
not critical; these just set the maximum and minimum tumble rates. I start
the simulation with p(Tumble|S+) = p(Tumble|S-) = 0.50, but this is not
critical either; it just makes e. coli tumble too much initially. The 0.50
value was chosen simply because it is half-way between 0 and 1.0 and thus does
not appear to bias the simulation in either direction.

The code for ReinforceOrPunish has been changed to use the new criteria for
reinforcement and punishment, give the proper change in probability for
reinforcement and punishment, and make sure that the probabilities stay at or
between pMin and pMax (the old code was intended to do this, but didn't).

Again, I must insist that this exercise is NOT intended to show that TRT is
PCT; I am merely trying to determine whether a model that can be accurately
described in TRT terms can produce an e. coli that "learns" in this situation.

I wouldn't want to be going around telling the EAB guys that their model CAN'T
produce this kind of behavior if it can. And if it can, I'd like to be able
to show how PCT improves on this account.

Regards,

Bruce

{***********************************************************************
* E. COLI "REINFORCEMENT" SIMULATION 4a                                *
*                                                                      *
* Programmer: Dr. Bruce B. Abbott                                      *
*             Psychological Sciences                                   *
*             Indiana U. - Purdue U.                                   *
*             Fort Wayne, IN 46805-1499                                *
*             (219) 481-6399                                           *
* Language:   Borland/Turbo Pascal 7.0                                 *
* Created:    11/14/94                                                 *
*                                                                      *
* This program implements a discriminated operant simulation of the    *
* "tumble-and-swim" behavior of an "e. coli" capable of learning from  *
* experience. If a tumble in the presence of S+ (rising nutrient       *
* level) results in a more positive rate of nutrient change,           *
* p(Tumble|S+) increases, otherwise it decreases. If a tumble in the   *
* presence of S- (decreasing or steady nutrient level) results in a    *
* more positive rate of nutrient change, then p(Tumble|S-) increases,  *
* otherwise it decreases. Thus the effect of the change in the rate    *
* of nutrient change following a tumble on p(Tumble|S+) and            *
* p(Tumble|S-) is symmetrical. Experience with the consequences of     *
* tumbling gradually "shapes" e. coli's behavior so as to maximize     *
* nutrient levels.                                                     *
*                                                                      *
***********************************************************************}

program Ecoli4a;

uses
  CRT, Graph, GrUtils;

const
  TWOPI = PI * 2;
  ENDSESSION = 50000;
var
  MaxX, MaxY: integer;
  NutrX, NutrY, X, Y: integer;
  NutMag, NutCon, dNut, NutSave: real;
  EcoliX, EcoliY,
  Speed, Angle, LearnRate,
  pTumbleGivenSplus, pTumbleGivenSminus,
  pMax, pMin: real;
  JustTumbled: boolean;
  Ch: char;
  Clock: longint;

procedure InitScreen;
begin
  ClrScr;
  InitGraphics;
  MaxX := GetMaxX; MaxY := GetMaxY;
  Rectangle(0, 0, MaxX, MaxY);
  OutTextXY(MaxX div 2 - 220, 5, { was Y+5; Y is not yet initialized here }
    'E. COLI SIMULATION # 4A: DISCRIMINATED OPERANT MODEL 2');
  OutTextXY(MaxX - 200, 50, ' dNutrient');
  OutTextXY(MaxX - 200, 60, 'p(Tumble|S+)');
  OutTextXY(MaxX - 200, 70, 'p(Tumble|S-)');
  OutTextXY(20, MaxY-50, 'Press ESC to Quit...');
end;

procedure ShowReal(x, y: integer; v: real);
var
  s: string;
begin
  Str(v:8:4, s);
  SetFillStyle(0, 0);
  Bar(x, y, x + TextWidth(s), y + TextHeight(s));
  OutTextXY(x, y, s);
end;

procedure Tumble(var Angle: real);
begin
  Angle := TwoPi * Random;
end;

procedure InitSim;
begin
  Randomize;
  NutrX := MaxX div 2;
  NutrY := MaxY div 2;
  EcoliX := 50.0;
  EcoliY := 50.0;
  X := Round(EcoliX);
  Y := Round(EcoliY);
  Rectangle(NutrX-2, NutrY-2, NutrX+2, NutrY+2);
  Speed := 1.0;
  LearnRate := 0.01;
  JustTumbled := false;
  NutSave := 0;
  pMax := 1.00; { Maximum tumble rate }
  pMin := 0.005; { Minimum tumble rate }
  pTumbleGivenSplus := 0.50; { initial tumble rates }
  pTumbleGivenSminus := 0.50;
  NutMag := 100.0; { max concentration }
  repeat Tumble(Angle) until (Angle < PI/2);
  Clock := 0;
end;

function NutConcen(X, Y: real): real;
{ Nutrient concentration at point X, Y: environment function }
var
  Dist: real;
begin
  Dist := Sqrt(Sqr(X - NutrX) + Sqr(Y - NutrY));
  NutConcen := NutMag / (1 + 0.001*(Sqr(Dist)));
end;

procedure StepEColi;
var
  NewNut: real;

  procedure ReinforceOrPunish;
  var
    DeltaNutRate: real;
  begin
    DeltaNutRate := dNut - NutSave; { Change in the rate of change in }
                                    { nutrient following a tumble. }
                                    { + = improvement = reinforcement }
                                    { - = deterioration = punishment }
    If DeltaNutRate > 0 then { Nutrient rate increased by tumble: reinforce }
      begin { tumbling }
        If NutSave > 0 then { If S+ present during tumble then }
          begin { increase probability of tumble given S+ }
            pTumbleGivenSplus := pTumbleGivenSplus + LearnRate;
            if pTumbleGivenSplus > pMax then pTumbleGivenSplus := pMax;
          end
        else { S- present when last tumbled then }
          begin { increase probability of tumble given S- }
            pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;
            if pTumbleGivenSminus > pMax then pTumbleGivenSminus := pMax;
          end
      end
    else
      if DeltaNutRate < 0 then { Nutrient rate decreased by tumble: punish }
        If NutSave > 0 then { If S+ present when last tumbled then }
          begin { decrease probability of tumble given S+ }
            pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;
            if pTumbleGivenSplus < pMin then pTumbleGivenSplus := pMin;
          end
        else { If S- present when last tumbled then }
          begin { decrease probability of tumble given S- }
            pTumbleGivenSminus := pTumbleGivenSminus - LearnRate;
            if pTumbleGivenSminus < pMin then pTumbleGivenSminus := pMin;
          end;
  end;
  procedure DoTumble;
  begin
    Tumble(Angle);
    JustTumbled := true;
    NutSave := dNut; { save the rate of nutrient change in effect }
  end;               { just before this tumble, for later comparison }
begin
  EcoliX := EcoliX + Speed * cos(Angle);
  EcoliY := EcoliY + Speed * sin(Angle);
  X := Round(EcoliX);
  Y := Round(EcoliY);
  PutPixel(X, Y, white);
  NewNut := NutConcen(EcoliX, EcoliY);
  dNut := (NewNut - NutCon);
  if JustTumbled then ReinforceOrPunish;

  if dNut > 0 then { S+ present; tumble probability determined by S+ }
    begin
      if (Random < pTumbleGivenSplus) then DoTumble
      else JustTumbled := false;
    end
  else { S- present; tumble probability determined by S- }
    begin
      if (Random < pTumbleGivenSminus) then DoTumble
      else JustTumbled := false;
    end;
  NutCon := NewNut;
  ShowReal(MaxX - 100, 50, dNut);
  ShowReal(MaxX - 100, 60, pTumbleGivenSplus);
  ShowReal(MaxX - 100, 70, pTumbleGivenSminus);
end;

begin
  InitScreen;
  InitSim;
  NutCon := NutConcen(EcoliX, EcoliY); { so the first dNut is valid }
  Ch := #0;                            { no key pressed yet }
  repeat
    inc(Clock);
    StepEcoli;
    if Keypressed then Ch := ReadKey;
  until (Ch = #27) or (Clock >= ENDSESSION);
  if Ch <> #27 then Ch := ReadKey;
  RestoreCRTMode;
  CloseGraph;
end.