"Reinforcement" Based E. Coli Simulation

[From Bruce Abbott (941030.1700 EST)]

Bill Powers (941029.1730 MDT) --

Me (941029.1600 EST)

The "future" in this case means a time _after_ the reinforcement has
occurred. Obviously there can be no effect on the current behavior by
its consequence, the reinforcement that has not occurred yet and may or
may not occur. Only after the behavior has taken place (swim for some
length of time and then tumble) and the consequence occurs can the
consequence be known (end up swimming up or down the gradient, and thus
experiencing a positive or negative time rate of change of
concentration).

I disagree with you here. Not swim AND THEN tumble, but merely swim. The
behavior on which reinforcement would act in this context is forward motion
along a given trajectory. Reinforcement would occur as long as forward motion
produced increasing concentrations of the nutrient. When forward motion began
to produce decreasing concentrations of nutrients, it would be
"punished," leading to tumbling and the selection (usually) of a new
trajectory.

I think you are trying to force the Skinnerian model to operate only on
discrete responses (tumbles), thus forcing it into an S-R framework. Yet
Skinner explicitly recognized elements of behavior other than discrete
responses: e.g., moving TOWARD the lever, RATE of responding, and so on, which
could be subject to the effects of reinforcement. Furthermore, he recognized
that reinforcement and punishment effects often involve CHANGES (e.g.,
increased concentration of food) and not just presentation of discrete
stimuli.

Now before I get myself completely in trouble, I want to indicate that I agree
entirely with you that this model fails to adequately handle all those
elements you pointed out, e.g., attributing reinforcement to the stimulus
rather than to its relationship to internal conditions. Some accommodation is
attempted by introducing the concept of "establishing conditions," such as
deprivation, and by appealing to Darwinian natural selection to explain why
some stimuli act as reinforcers or punishers. These ad hoc concepts are
precisely what PCT handles so well as a natural consequence of the model. It is here
that modern reinforcement theory fails, not in selection-by-consequences. I'm
not out to defend this view, but I think you've got to attack where it fails.
The reason the e. coli simulation didn't upset your reinforcement theorists is
that they had no difficulty explaining it in reinforcement-theory terms. The
problem is that, within the limited context of e. coli behavior, the two
models yield similar results.

You've neatly changed the challenge:

repeat
  inc(Clock);
  If Going_up_gradient then continue;
  If Going_down_gradient then tumble;
until Clock > EndSimulation

Turning this sketch into a runnable model requires filling in some
details.

O.K., you asked for it. Again, I am NOT proposing that this is an adequate
account of e. coli behavior, but the model below is, given MY understanding of
modern reinforcement theory, a perfectly acceptable implementation of it. A
few things to note:

I have separated the graphics setup routine and constant from the main program
so that I don't have to keep copying it into new programs. I will be adding
other graphics utilities (procedures and functions) to this GrUtils unit as
needed (e.g., a graphics text procedure based on OutTextXY). You should
separate GRUTILS.PAS from ECOLI.PAS and compile the former to create the Turbo
Pascal Unit. BE SURE TO CHANGE THE DRIVE\DIRECTORY IN BGIDIR TO THE LOCATION
OF YOUR GRAPHICS FILES. Then run ECOLI.PAS.

The program assumes that e. coli moves at a constant rate, during which it
keeps track of the direction of change in nutrient concentration via a leaky
integrator whose "leakiness" is set by the parameter DecayRate, currently set
at 0.25. When dNut, the integrated change in nutrient level, falls to zero or
below, the organism tumbles. The nutrient gradient is centered on the small
square drawn at mid-screen and is modeled as falling off with the square of
the distance from that center: NutMag / (1 + 0.001 * distance squared). There
is no EXPLICIT reference level in the model, only "good" (keep going) and
"bad" (tumble).
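
To make the leaky integrator concrete, here is a minimal headless sketch (no
graphics and no actual tumbling) that applies the same update rule along a
straight path running through the nutrient source. The constants are the
values quoted above; the program name LeakyDemo, the helper LeakyStep, the
one-dimensional path, and the variable XPos are illustrative assumptions and
are not part of ECOLI.PAS.

```pascal
program LeakyDemo;
{ Headless sketch of the decision rule only: no graphics, no tumbling.
  LeakyStep, XPos and the straight-line path are illustrative assumptions. }

const
  NUTMAG    = 100.0;   { peak concentration at the source }
  DECAYRATE = 0.25;    { "leakiness" of the integrator }

function NutConcen(Dist: real): real;
{ Nutrient concentration at distance Dist from the source }
begin
  NutConcen := NUTMAG / (1.0 + 0.001 * Sqr(Dist));
end;

function LeakyStep(dNut, OldNut, NewNut: real): real;
{ One update of the leaky integrator of concentration change }
begin
  LeakyStep := DECAYRATE * dNut + (NewNut - OldNut);
end;

var
  Step: integer;
  XPos, OldNut, NewNut, dNut: real;
begin
  { Swim along a line through the source: start 5 units short of it
    and take 10 unit steps, ending 5 units past it }
  XPos := -5.0;
  OldNut := NutConcen(Abs(XPos));
  dNut := 0.0;
  for Step := 1 to 10 do
  begin
    XPos := XPos + 1.0;
    NewNut := NutConcen(Abs(XPos));
    dNut := LeakyStep(dNut, OldNut, NewNut);
    OldNut := NewNut;
    { Report what the rule would decide; the path itself never changes }
    if dNut <= 0 then
      WriteLn('x = ', XPos:5:1, '  dNut = ', dNut:9:5, '  tumble')
    else
      WriteLn('x = ', XPos:5:1, '  dNut = ', dNut:9:5, '  keep going');
  end;
end.
```

On this path dNut should stay positive while the source is being approached
and turn negative soon after it has been passed. Because DecayRate is only
0.25, a change from k steps back is weighted by 0.25 to the power k, so the
integrator's memory is short.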

The organism starts out at location 50,50 on the screen with an initial
direction of travel between 270 and 360 degrees. If its movement carries it
off-screen, just wait--it will come back eventually. The program quits after
10,000 clock ticks (set in ENDSESSION) or when the ESCape key is pressed.

Enjoy!

Bruce

{***********************************************************************
* E. COLI "REINFORCEMENT" SIMULATION *
* *
* Programmer: Dr. Bruce B. Abbott *
* Psychological Sciences *
* Indiana U. - Purdue U. *
* Fort Wayne, IN 46805-1499 *
* (219) 493-4335 *
* *
* Created: 10/30/94 *
* *
* This program implements a "selection-by-consequences" reinforcement *
* simulation of the "tumble-and-swim" behavior of e. coli. It *
* includes no explicit reference levels for nutrients but merely *
* ASSUMES that e. coli will continue behaviors that lead to increased *
* nutrient concentration ("reinforcement") and abandon behaviors that *
* lead to decreased nutrient concentration ("punishment"). *
* *
***********************************************************************}

program Ecoli;

uses
  CRT, Graph, GrUtils;

const
  TWOPI = PI * 2;
  ENDSESSION = 10000;
var
  MaxX, MaxY: integer;
  NutrX, NutrY, X, Y: integer;
  NutMag, NutCon, dNut: real;
  EcoliX, EcoliY,
  Speed, Angle, DecayRate: real;
  Ch: char;
  Clock: longint;

procedure InitScreen;
begin
  ClrScr;
  InitGraphics;
  MaxX := GetMaxX; MaxY := GetMaxY;
  Rectangle(0, 0, MaxX, MaxY);
  OutTextXY(MaxX div 2 - 170, 5, { fixed offset: Y is not yet assigned here }
    'E. COLI SIMULATION: REINFORCEMENT MODEL');
  OutTextXY(20, MaxY-50, 'Press ESC to Quit...');
end;

procedure Tumble(var Angle: real);
begin
  Angle := TwoPi * Random;
end;

procedure InitSim;
begin
  Randomize;
  NutrX := MaxX div 2;
  NutrY := MaxY div 2;
  EcoliX := 50.0;
  EcoliY := 50.0;
  X := Round(EcoliX);
  Y := Round(EcoliY);
  Rectangle(NutrX-2, NutrY-2, NutrX+2, NutrY+2);
  Speed := 1.0;
  DecayRate := 0.25;
  NutMag := 100.0; { max concentration }
  { initial heading between 270 and 360 degrees, as described in the text }
  repeat Tumble(Angle) until (Angle > 0.75 * TwoPi);
  Clock := 0;
end;

function NutConcen(X, Y: real): real;
{ Nutrient concentration at point X, Y: environment function }
var
  Dist: real;
begin
  Dist := Sqrt(Sqr(X - NutrX) + Sqr(Y - NutrY));
  NutConcen := NutMag / (1 + 0.001*(Sqr(Dist)));
end;

procedure StepEColi;
var
  NewNut: real;
begin
  EcoliX := EcoliX + Speed * cos(Angle);
  EcoliY := EcoliY + Speed * sin(Angle);
  X := Round(EcoliX);
  Y := Round(EcoliY);
  PutPixel(X, Y, white);
  NewNut := NutConcen(EcoliX, EcoliY);
  dNut := DecayRate * dNut + (NewNut - NutCon);
  if dNut <= 0 then Tumble(Angle);
  NutCon := NewNut;
end;

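begin
  InitScreen;
  InitSim;
  { Give StepEColi defined starting values: it reads NutCon and dNut,
    and the loop below reads Ch, before any of them is assigned }
  NutCon := NutConcen(EcoliX, EcoliY);
  dNut := 0.0;
  Ch := #0;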
  repeat
    inc(Clock);
    StepEcoli;
    Delay(50);
    if Keypressed then Ch := ReadKey;
  until (Ch = #27) or (Clock >= ENDSESSION);
  RestoreCRTMode;
end.

{ GRUTILS.PAS: save this unit in its own file and compile it first }
unit GrUtils;
{ Graphics Utilities Unit }

interface

uses
  Graph;

const
  BGIDIR = 'c:\bp\bgi'; { set this to your bgi drive\directory }

var
  GraphDriver, GraphMode: integer;

procedure InitGraphics;

implementation

procedure InitGraphics; {ADAPTS TO HARDWARE}
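begin
  GraphDriver := 0; GraphMode := 0;   { 0 = autodetect }
  DetectGraph(GraphDriver, GraphMode);
  InitGraph(GraphDriver, GraphMode, BGIDIR);
  { Stop with a message if the driver files were not found in BGIDIR }
  if GraphResult <> grOk then
  begin
    WriteLn('Graphics init failed; check BGIDIR in GRUTILS.PAS');
    Halt(1);
  end;
  GraphMode := GetMaxMode;            { use the highest available mode }
  SetGraphMode(GraphMode);
end;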

begin
end.