ECOLI4a: Back to School

Abbott_Bruce_Indiana · November 22, 1994, 9:38pm

[From Bruce Abbott (941122.1615 EST)]

Bill Powers (941122.0730 MST) --

Let's cut to the chase:

Your program says

If NutSave > 0 then { If S+ present during tumble then ...

Here you are defining NutSave as S+.

I have already pointed out that in DoTumble, NutSave is actually set to
the value of dNut _prior_ to the tumble, even though your comment says
it is dNut _after_ the tumble.

Yes, I must be getting dyslexic. It is the value of dNut PRIOR to a tumble.
When going back later to add comments, I was confused by the statement's
position AFTER the tumble, but of course dNut is still its pre-tumble value.
I apologize for any confusion this may have caused. However, the logic
implemented in StepEColi is correct, even if my description of the identity of
NutSave was not.

Below is the relevant code from ECOLI4a. Let's step through StepEColi to see
what happens:

procedure StepEColi;
begin

[ This section steps e. coli forward along its current trajectory ]
  EcoliX := EcoliX + Speed * cos(Angle);
  EcoliY := EcoliY + Speed * sin(Angle);
  X := Round(EcoliX);
  Y := Round(EcoliY);
  PutPixel(X, Y, white);

[ Now we determine the nutrient level at e. coli's new position. ]
NewNut := NutConcen(EcoliX, EcoliY);

[ Next we compare the new nutrient concentration against its previous value
and save the change in dNut.]

dNut := (NewNut - NutCon);

[ If e. coli just tumbled (on the previous iteration) then JustTumbled will
  be true, otherwise it will be false. Call ReinforceOrPunish if JustTumbled
  is true. JustTumbled is initialized to false, so this will not happen on
  the first iteration.]

if JustTumbled then ReinforceOrPunish;

[ If the change in dNut on THIS iteration is positive, then use pTS+ to
determine the probability of a tumble, otherwise use pTS-. Determine
whether or not to tumble on this iteration.]

  if dNut > 0 then { S+ present; tumble probability determined by current S+}
    begin
      if (Random < pTumbleGivenSplus) then DoTumble
      else JustTumbled := false;
    end
  else { S- present; tumble probability determined by current S-}
    begin
      if (Random < pTumbleGivenSminus) then DoTumble
      else JustTumbled := false;
    end;

[ Save the current value of NewNut in NutCon for use next iteration.]
NutCon := NewNut;
end;

Now let's assume that a tumble DID occur in the above iteration. The
following happens via the call to DoTumble:

  procedure DoTumble;
  begin
    { NutSave is the nutrient rate of change immediately PRIOR to a tumble }
    NutSave := dNut;
    Tumble(Angle);
    JustTumbled := true;
  end;

(I moved the NutSave statement to make its relationship to a tumble clearer.)
The current value of dNut is saved, a new angle is selected, and JustTumbled
is set to true.
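
The timing can be checked by replaying the bookkeeping on a fixed list of nutrient readings. What follows is a minimal sketch in Python, not code from ECOLI4a. The function name `trace`, the `tumble_at` argument, and the sample readings are assumptions made for illustration, and the random tumble decision is replaced by a set of iterations on which a tumble is forced.

```python
def trace(nut_readings, tumble_at):
    """Replay the StepEColi bookkeeping on a fixed list of nutrient readings.

    nut_readings[0] is the starting concentration (NutCon).
    tumble_at is the set of 1-based iterations on which a tumble is forced
    (an assumption: the original draws tumbles at random).
    Returns (iteration, dNut, NutSave, DeltaNutRate) for every iteration on
    which ReinforceOrPunish would be called.
    """
    nut_con = nut_readings[0]
    nut_save = 0
    just_tumbled = False
    evaluations = []
    for i, new_nut in enumerate(nut_readings[1:], start=1):
        d_nut = new_nut - nut_con            # change on THIS iteration
        if just_tumbled:                     # a tumble happened last iteration
            evaluations.append((i, d_nut, nut_save, d_nut - nut_save))
        if i in tumble_at:                   # DoTumble
            nut_save = d_nut                 # dNut PRIOR to the tumble
            just_tumbled = True
        else:
            just_tumbled = False
        nut_con = new_nut                    # saved for the next iteration
    return evaluations


print(trace([10, 12, 11, 15], {1}))  # [(2, -1, 2, -3)]
```

With a tumble forced on iteration 1, NutSave holds the pre-tumble change (+2), the change on iteration 2 is -1, and their difference (-3) is what ReinforceOrPunish evaluates.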

On the NEXT iteration following the tumble, the following happens:

[ We move e. coli forward along its new, post-tumble trajectory ]
  EcoliX := EcoliX + Speed * cos(Angle);
  EcoliY := EcoliY + Speed * sin(Angle);
  X := Round(EcoliX);
  Y := Round(EcoliY);
  PutPixel(X, Y, white);

[ Now we determine the nutrient level at e. coli's new position. ]
NewNut := NutConcen(EcoliX, EcoliY);

[ Next we compare the new nutrient concentration against its previous value
and save the change in dNut.]

dNut := (NewNut - NutCon);

[ dNut is now the change in nutrient concentration AFTER the tumble AND the
current change in nutrient concentration.]

[ Next we check to see if e. coli tumbled on the previous iteration.
It did, so we call ReinforceOrPunish.]

if JustTumbled then ReinforceOrPunish;

  procedure ReinforceOrPunish;
  var
    DeltaNutRate: real;
  begin

[ We compare the value of dNut immediately after the tumble with its value
  immediately before the tumble. DeltaNutRate will be positive if the new
  value is greater than the old (an improvement) and negative if the new value
  is less than the old (a deterioration).]

    DeltaNutRate := dNut - NutSave; { Change in the rate of change in }
                                    { nutrient following a tumble. }
                                    { + = improvement = reinforcement }
                                    { - = deterioration = punishment }

[ If things have gotten better, then this is reinforcement. Increase the
probability of a tumble given the condition present prior to the tumble (i.e.,
positive or negative NutSave).]
    If DeltaNutRate > 0 then { Nutrient rate increased by tumble: reinforce }
      begin { tumbling }
        If NutSave > 0 then { If S+ present during tumble then }
          begin { increase probability of tumble given S+ }
            pTumbleGivenSplus := pTumbleGivenSplus + LearnRate;
            if pTumbleGivenSplus > pMax then pTumbleGivenSplus := pMax;
          end
        else { S- present when last tumbled then }
          begin { increase probability of tumble given S- }
            pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;
            if pTumbleGivenSminus > pMax then pTumbleGivenSminus := pMax;
          end
      end
    else

[If things have gotten worse, then this is punishment. Decrease the
probability of a tumble given the condition present prior to the tumble
(positive or negative NutSave).]
      if DeltaNutRate < 0 then { Nutrient rate decreased by tumble: punish }
        If NutSave > 0 then { If S+ present when last tumbled then }
          begin { decrease probability of tumble given S+ }
            pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;
            if pTumbleGivenSplus < pMin then pTumbleGivenSplus := pMin;
          end
        else { If S- present when last tumbled then }
          begin { decrease probability of tumble given S- }
            pTumbleGivenSminus := pTumbleGivenSminus - LearnRate;
            if pTumbleGivenSminus < pMin then pTumbleGivenSminus := pMin;
          end;
  end;

Thus, other than my gaffe in describing what NutSave is, everything works as
advertised.
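
The same update rule can be restated compactly as a pure function. This is a sketch in Python rather than the ECOLI4a source. The function name and the default values for `learn_rate`, `p_min`, and `p_max` are assumptions, since the listing above does not show the values of LearnRate, pMin, or pMax.

```python
def reinforce_or_punish(d_nut, nut_save, p_splus, p_sminus,
                        learn_rate=0.01, p_min=0.0, p_max=1.0):
    """Return the updated (p(Tumble|S+), p(Tumble|S-)) after one tumble.

    d_nut    -- nutrient change on the iteration AFTER the tumble
    nut_save -- nutrient change on the iteration BEFORE the tumble
    Default learn_rate, p_min and p_max are illustrative assumptions.
    """
    delta_nut_rate = d_nut - nut_save
    if delta_nut_rate == 0:
        return p_splus, p_sminus             # neither reinforced nor punished
    # Positive change in rate = reinforcement, negative = punishment.
    step = learn_rate if delta_nut_rate > 0 else -learn_rate
    if nut_save > 0:                         # S+ was present at the tumble
        p_splus = min(p_max, max(p_min, p_splus + step))
    else:                                    # S- was present at the tumble
        p_sminus = min(p_max, max(p_min, p_sminus + step))
    return p_splus, p_sminus
```

As in the Pascal, a NutSave of exactly zero is treated as S-, and only the probability tied to the stimulus present at the time of the tumble is changed.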

In the section that actually implements the tumbles, you say

if dNut > 0 then { S+ present; tumble probability determined by S+ }

Note that now the _present_ value of dNut is said to be S+. If you had
not changed the definition of S+ within the same iteration, you would
have had to say

if NutSave > 0 then {S+ present ...

Not so. A positive nutrient change is always S+; a negative nutrient change
is always S-. dNut, the current value of nutrient change, establishes which
discriminative stimulus is currently present, and thus which tumbling
probability should be in effect. NutSave, the value immediately before the
tumble, saves the previous value of dNut in order to allow the consequences of
a tumble to be referred to the discriminative stimulus present at the time the
tumble took place. So I have not changed the definition of S+ and S-, I am
using different variables to represent the discriminative stimuli that are
present before and after a tumble. The table below summarizes what happens in
ReinforceOrPunish:

S Before Tumble*      Consequence of Tumble     Result
-----------------------------------------------------------------------
positive dNut (S+)    increased dNut (reinf)    increase p(Tumble|S+)
positive dNut (S+)    decreased dNut (punis)    decrease p(Tumble|S+)
negative dNut (S-)    increased dNut (reinf)    increase p(Tumble|S-)
negative dNut (S-)    decreased dNut (punis)    decrease p(Tumble|S-)
-----------------------------------------------------------------------
*Stored in NutSave

In a currently positive nutrient gradient (S+ present, stored in dNut),
p(Tumble|S+) controls, er, sets the probability of a tumble. In a currently
negative nutrient gradient (S- present), p(Tumble|S-) sets the probability of a
tumble. Thus the program implements the proper three-term contingency as
specified by the reinforcement model.
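
The selection of the operative probability by the current stimulus can be sketched as follows. This is an illustrative Python fragment, not the ECOLI4a code; the function name `tumbles_now` and the optional `rng` argument are assumptions.

```python
import random


def tumbles_now(d_nut, p_tumble_given_splus, p_tumble_given_sminus, rng=random):
    """Decide whether a tumble occurs on the current iteration.

    The CURRENT nutrient change (d_nut) picks which probability applies:
    S+ if d_nut > 0, otherwise S-. A uniform draw in [0, 1) is then
    compared against that probability, as in StepEColi.
    """
    p = p_tumble_given_splus if d_nut > 0 else p_tumble_given_sminus
    return rng.random() < p
```

Only the current dNut is consulted here; NutSave plays no part in deciding whether to tumble and is used only later, when the consequences of a tumble are evaluated.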

If you define S+ and S- as two "situations" in which responses (tumbles)
occur, you will see that, in a given situation, those responses which are
followed by a satisfying state of affairs (more positive nutrient rate) have
their "connections" to the situation "strengthened" (probability of a tumble
increased) so that, when the situation recurs, the response (tumbling) will be
more likely to occur (p(Tumble|S) increased). In a given situation, those
responses which are followed by an annoying state of affairs (less positive
nutrient rate) will have their connection to the situation weakened (reduced
p(Tumble|S)), so that, when the situation recurs, the response (tumbling) will
be less likely to occur. This is just a restatement of the law of effect.

You have offered a program which does in fact produce the right
behavior. But you have not considered the question of whether your
explanation is unique, or if not unique then best. There are several
other models that will create the same results, only faster. The
simplest one is

if dNut > 0 then decrease(PTS+); (toward 0)
if dNut <= 0 then increase(PTS-); (toward 1)

This modification will produce "learning" just as your program does, but
much faster and without using any knowledge about previous values of
dNut. Aside from the goal of making the model fit the language of
operant conditioning, what motive could we have for preferring your
model over the alternate one, which works even better? Yours is
computationally more complex, and results in slower learning with at
least some probability of failure. The modification is very simple and
also infallible.

Your simplified model does not learn anything: it is just a pair of one-way
control systems that force the two probabilities to opposite extremes.
ECOLI4a actually does learn, and fairly rapidly, I might add. To learn,
ECOLI4a must first try out tumbling under different conditions and discover
what happens. If the geometry of the situation were changed so that tumbles
during S+ tended to make things better more often than worse, and tumbles
during S- tended to make things worse more often than better, ECOLI4a would
learn to tumble in a positive gradient and not to tumble in a negative
gradient. The simulation is completely unbiased in the sense that ECOLI4a
will learn the correct behavior regardless of what the "correct" behavior is,
and will thus find its way up the nutrient gradient. Your "improved" model
will not. This is the basis for my preferring the more complex model.
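
The difference can be illustrated with a deliberately idealized sketch in Python. It is not either program: the environment is reduced to a fixed list of events, each giving the stimulus present and whether a tumble made things better, and outcomes are deterministic rather than probabilistic. All names, the starting value of 0.5, and the step size are assumptions.

```python
def clamp(p, lo=0.0, hi=1.0):
    return max(lo, min(hi, p))


def consequence_driven(events, rate=0.125):
    """ECOLI4a-style rule: adjust p(Tumble|S) by the OUTCOME of a tumble."""
    p = {"S+": 0.5, "S-": 0.5}
    for stimulus, improved in events:
        p[stimulus] = clamp(p[stimulus] + (rate if improved else -rate))
    return p


def stimulus_driven(events, rate=0.125):
    """Simplified rule: lower p(Tumble|S+) whenever S+ is present and raise
    p(Tumble|S-) whenever S- is present. Outcomes are ignored."""
    p = {"S+": 0.5, "S-": 0.5}
    for stimulus, _improved in events:
        if stimulus == "S+":
            p["S+"] = clamp(p["S+"] - rate)
        else:
            p["S-"] = clamp(p["S-"] + rate)
    return p


# Usual geometry: tumbling in S+ makes things worse, in S- makes them better.
usual = [("S+", False), ("S-", True)] * 4
# Hypothetical reversed geometry: the opposite consequences.
reversed_world = [("S+", True), ("S-", False)] * 4

print(consequence_driven(usual))           # {'S+': 0.0, 'S-': 1.0}
print(stimulus_driven(usual))              # {'S+': 0.0, 'S-': 1.0}
print(consequence_driven(reversed_world))  # {'S+': 1.0, 'S-': 0.0}
print(stimulus_driven(reversed_world))     # {'S+': 0.0, 'S-': 1.0}
```

Under the usual geometry the two rules end in the same place. Under the reversed one, only the consequence-driven rule changes its answer.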

I have shown a modification of the model which WILL "learn" appropriate
behavior in the same sense that yours does, and which does not use the
change in probability across a tumble, DeltaNutRate.

[Minor correction: DeltaNutRate is not the change in probability across a
tumble, it is the change in the nutrient rate across a tumble.]

I'm sorry, but I just can't see any learning going on in your model.
Learning, as envisioned in PCT, involves a reorganization of the system in
such a way as to establish or improve negative-feedback control over some
perceptual variable. All I see is a fairly ordinary two-level control system,
one for regulating rate of change in nutrients and one, operating through the
first, for regulating nutrient concentration. My ECOLI6 works in a similar
way to control stored nutrient levels. It's entertaining, it may represent
the actual way some organism regulates its nutrient levels, but it does not
model learning. In contrast, ECOLI4a alters its response probabilities in the
presence of increasing or decreasing nutrient gradients as a result of
experience with the consequences of its behavior. Its program code does not
tell it how to behave in order to climb the nutrient gradient; instead it must
learn what works, under what conditions.

It proposes ONE possible mechanism, but it is a very elaborate mechanism
and others would work equally well or better.

O.K., then here's a challenge: start with an e. coli model that wants to
perceive increasing nutrient levels but (and this is crucial) does NOT know
how to get them. That is, it does not know when to tumble or not tumble; it
does not know how its rate of tumble should vary with the change in nutrient
levels. In the absence of learning it should just do a random walk. It
should then attempt to "find" the right relationship by (1) varying its
behavior and (2) using the results of its "experiments" to develop an
appropriate (negative feedback) relationship between its behavior and error.

When you have such a model, we will compare its complexity to the complexity
of ECOLI4a; otherwise we are comparing apples and oranges.

And I am not convinced
that your model actually does embody the law of effect as you state it.
You will have to be much more precise about your definitions before I
will believe that.

I hope I have been able to do that. (:->

Sincerely,

Bruce

Tom_BOURBON3 · November 22, 1994, 11:54pm

Tom Bourbon [941122.1743]

[From Bruce Abbott (941122.1615 EST)]

Bill Powers (941122.0730 MST) --

Let's cut to the chase:

And in reply to Bill, Bruce delivered a few cuts of his own.

I only have time for a quick question, Bruce. What is a "probability" that
it can be altered deliberately, as in the following quote from you?

[ If things have gotten better, then this is reinforcement. Increase the
probability of a tumble given the condition present prior to the tumble (i.e.,
positive or negative NutSave).]
   If DeltaNutRate > 0 then { Nutrient rate increased by tumble: reinforce }
     begin { tumbling }
       If NutSave > 0 then { If S+ present during tumble then }
         begin { increase probability of tumble given S+ }
           pTumbleGivenSplus := pTumbleGivenSplus + LearnRate;
           if pTumbleGivenSplus > pMax then pTumbleGivenSplus := pMax;
         end
       else { S- present when last tumbled then }
         begin { increase probability of tumble given S- }
           pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;
           if pTumbleGivenSminus > pMax then pTumbleGivenSminus := pMax;
         end
     end
   else

[If things have gotten worse, then this is punishment. Decrease the
probability of a tumble given the condition present prior to the tumble
(positive or negative NutSave).]
     if DeltaNutRate < 0 then { Nutrient rate decreased by tumble: punish }
       If NutSave > 0 then { If S+ present when last tumbled then }
         begin { decrease probability of tumble given S+ }
           pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;
           if pTumbleGivenSplus < pMin then pTumbleGivenSplus := pMin;
         end
       else { If S- present when last tumbled then }
         begin { decrease probability of tumble given S- }
           pTumbleGivenSminus := pTumbleGivenSminus - LearnRate;
           if pTumbleGivenSminus < pMin then pTumbleGivenSminus := pMin;
         end;
end;

In your model, "probability" seems to be a controllable entity. It looks to
me as though you include in the model a feature that guarantees the outward
appearances of behavior will be right, but the added feature implies some
(to my thinking) unlikely properties of the system (E. coli).

In your model, what is this thing called "probability?"

Later,

Tom