[From Bruce Abbott (941122.1615 EST)]

Bill Powers (941122.0730 MST) --

Let's cut to the chase:

Your program says

If NutSave > 0 then { If S+ present during tumble then ...

Here you are defining Nutsave as S+.

I have already pointed out that in DoTumble, NutSave is actually set to

the value of dNut _prior_ to the tumble, even though your comment says

it is dNut _after_ the tumble.

Yes, I must be getting dyslexic. It is the value of dNut PRIOR to a tumble.

When going back later to add comments, I was confused by the statement's

position AFTER the tumble, but of course dNut is still its pre-tumble value.

I apologize for any confusion this may have caused. However, the logic

implemented in StepEColi is correct, even if my description of the identity of

NutSave was not.

Below is the relevant code from ECOLI4a. Let's step through StepEColi to see

what happens:

procedure StepEColi;
begin
  [ This section steps e. coli forward along its current trajectory ]
  EcoliX := EcoliX + Speed * cos(Angle);
  EcoliY := EcoliY + Speed * sin(Angle);
  X := Round(EcoliX);
  Y := Round(EcoliY);
  PutPixel(X, Y, white);

  [ Now we determine the nutrient level at e. coli's new position. ]
  NewNut := NutConcen(EcoliX, EcoliY);

  [ Next we compare the new nutrient concentration against its previous value
    and save the change in dNut.]
  dNut := (NewNut - NutCon);

  [ If e. coli just tumbled (on the previous iteration) then JustTumbled will
    be true, otherwise it will be false. Call ReinforceOrPunish if JustTumbled
    is true. JustTumbled is initialized to false, so this will not happen on
    the first iteration.]
  if JustTumbled then ReinforceOrPunish;

  [ If the change in dNut on THIS iteration is positive, then use pTS+ to
    determine the probability of a tumble, otherwise use pTS-. Determine
    whether or not to tumble on this iteration.]
  if dNut > 0 then  { S+ present; tumble probability determined by current S+}
    begin
      if (Random < pTumbleGivenSplus) then DoTumble
      else JustTumbled := false;
    end
  else              { S- present; tumble probability determined by current S-}
    begin
      if (Random < pTumbleGivenSminus) then DoTumble
      else JustTumbled := false;
    end;

  [ Save the current value of NewNut in NutCon for use next iteration.]
  NutCon := NewNut;
end;

Now let's assume that a tumble DID occur in the above iteration. The

following happens via the call to DoTumble:

procedure DoTumble;
begin
  NutSave := dNut;      { NutSave is nutrient rate of change immediately }
                        { PRIOR to a tumble }
  Tumble(Angle);
  JustTumbled := true;
end;

(I moved the NutSave statement to make its relationship to a tumble clearer.)

The current value of dNut is saved, a new angle is selected, and JustTumbled

is set to true.

On the NEXT iteration following the tumble, the following happens:

[ We move e. coli forward along its new, post-tumble trajectory ]
EcoliX := EcoliX + Speed * cos(Angle);
EcoliY := EcoliY + Speed * sin(Angle);
X := Round(EcoliX);
Y := Round(EcoliY);
PutPixel(X, Y, white);

[ Now we determine the nutrient level at e. coli's new position. ]
NewNut := NutConcen(EcoliX, EcoliY);

[ Next we compare the new nutrient concentration against its previous value
  and save the change in dNut.]
dNut := (NewNut - NutCon);

[ dNut is now the change in nutrient concentration AFTER the tumble AND the
  current change in nutrient concentration.]

[ Next we check to see if e. coli tumbled on the previous iteration.
  It did, so we call ReinforceOrPunish.]
if JustTumbled then ReinforceOrPunish;

procedure ReinforceOrPunish;
var
  DeltaNutRate: real;
begin
  [ We compare the value of dNut immediately after the tumble with its value
    immediately before the tumble. DeltaNutRate will be positive if the new
    value is greater than the old (an improvement) and negative if the new
    value is less than the old (a deterioration).]
  DeltaNutRate := dNut - NutSave;   { Change in the rate of change in }
                                    { nutrient following a tumble. }
                                    { + = improvement = reinforcement }
                                    { - = deterioration = punishment }

  [ If things have gotten better, then this is reinforcement. Increase the
    probability of a tumble given the condition present prior to the tumble
    (i.e., positive or negative NutSave).]
  If DeltaNutRate > 0 then  { Nutrient rate increased by tumble: reinforce }
    begin                   { tumbling }
      If NutSave > 0 then   { If S+ present during tumble then }
        begin               { increase probability of tumble given S+ }
          pTumbleGivenSplus := pTumbleGivenSplus + LearnRate;
          if pTumbleGivenSplus > pMax then pTumbleGivenSplus := pMax;
        end
      else                  { S- present when last tumbled then }
        begin               { increase probability of tumble given S- }
          pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;
          if pTumbleGivenSminus > pMax then pTumbleGivenSminus := pMax;
        end
    end
  else
    [ If things have gotten worse, then this is punishment. Decrease the
      probability of a tumble given the condition present prior to the tumble
      (positive or negative NutSave).]
    if DeltaNutRate < 0 then  { Nutrient rate decreased by tumble: punish }
      If NutSave > 0 then     { If S+ present when last tumbled then }
        begin                 { decrease probability of tumble given S+ }
          pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;
          if pTumbleGivenSplus < pMin then pTumbleGivenSplus := pMin;
        end
      else                    { If S- present when last tumbled then }
        begin                 { decrease probability of tumble given S- }
          pTumbleGivenSminus := pTumbleGivenSminus - LearnRate;
          if pTumbleGivenSminus < pMin then pTumbleGivenSminus := pMin;
        end;
end;

Thus, other than my gaffe in describing what NutSave is, everything works as

advertised.
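
[Illustrative sketch, not part of ECOLI4a.] The timing of the bookkeeping across a tumble can be checked with a few lines of Python. This is only a rough model of the order of operations in StepEColi and DoTumble: movement and graphics are left out, the nutrient readings are made-up numbers, and the `tumble_now` argument is a hypothetical stand-in for the random draw against the tumble probability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EcoliState:
    nut_con: float = 0.0        # nutrient level on the previous step (NutCon)
    d_nut: float = 0.0          # latest change in nutrient level (dNut)
    nut_save: float = 0.0       # dNut saved when a tumble occurs (NutSave)
    just_tumbled: bool = False  # True if a tumble occurred on the previous step


def step(state: EcoliState, new_nut: float, tumble_now: bool) -> Optional[float]:
    """Run one iteration; return DeltaNutRate if a tumble is being evaluated.

    tumble_now replaces the random draw (an assumption made for testing).
    """
    state.d_nut = new_nut - state.nut_con
    delta_nut_rate = None
    if state.just_tumbled:
        # Post-tumble dNut minus the pre-tumble dNut held in nut_save.
        delta_nut_rate = state.d_nut - state.nut_save
    if tumble_now:
        # d_nut was computed before the tumble, so this is the pre-tumble value.
        state.nut_save = state.d_nut
        state.just_tumbled = True
    else:
        state.just_tumbled = False
    state.nut_con = new_nut
    return delta_nut_rate


state = EcoliState(nut_con=10.0)
first = step(state, new_nut=12.0, tumble_now=True)    # tumble while dNut = +2
saved = state.nut_save
second = step(state, new_nut=11.0, tumble_now=False)  # first post-tumble step
```

In this made-up run the tumble happens while dNut is +2, the next step yields dNut = -1, and the value handed to the reinforce-or-punish decision is the difference between the two.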

In the section that actually implements the tumbles, you say

if dNut > 0 then { S+ present; tumble probability determined by S+ }

Note that now the _present_ value of dNut is said to be S+. If you had

not changed the definition of S+ within the same iteration, you would

have had to say

if NutSave > 0 then {S+ present ...

Not so. A positive nutrient change is always S+; a negative nutrient change

is always S-. dNut, the current value of nutrient change, establishes which

discriminative stimulus is currently present, and thus which tumbling

probability should be in effect. NutSave, the value immediately before the

tumble, saves the previous value of dNut in order to allow the consequences of

a tumble to be referred to the discriminative stimulus present at the time the

tumble took place. So I have not changed the definition of S+ and S-, I am

using different variables to represent the discriminative stimuli that are

present before and after a tumble. The table below summarizes what happens in

ReinforceOrPunish:

S Before Tumble*      Consequence of Tumble     Result
-----------------------------------------------------------------
positive dNut (S+)    increased dNut (reinf)    increase p(Tumble|S+)
positive dNut (S+)    decreased dNut (punis)    decrease p(Tumble|S+)
negative dNut (S-)    increased dNut (reinf)    increase p(Tumble|S-)
negative dNut (S-)    decreased dNut (punis)    decrease p(Tumble|S-)
-----------------------------------------------------------------
*Stored in NutSave
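
[Illustrative sketch, not part of ECOLI4a.] The four rows of the table can be written as a single Python function. This is a minimal restatement of ReinforceOrPunish, assuming default bounds of 0 and 1 for pMin and pMax and an arbitrary learning rate; the actual values used in ECOLI4a are not given here, so the numbers in the example calls are placeholders.

```python
def reinforce_or_punish(d_nut, nut_save, p_splus, p_sminus,
                        learn_rate=0.125, p_min=0.0, p_max=1.0):
    """Return updated (p(Tumble|S+), p(Tumble|S-)) after a tumble.

    d_nut is the nutrient change just after the tumble, nut_save the change
    just before it. learn_rate, p_min and p_max are assumed values.
    """
    delta_nut_rate = d_nut - nut_save
    if delta_nut_rate == 0:
        return p_splus, p_sminus            # no change: neither row applies
    change = learn_rate if delta_nut_rate > 0 else -learn_rate
    if nut_save > 0:                        # S+ was present at the tumble
        p_splus = min(p_max, max(p_min, p_splus + change))
    else:                                   # S- was present at the tumble
        p_sminus = min(p_max, max(p_min, p_sminus + change))
    return p_splus, p_sminus


# One call per table row, starting from 0.5 for both probabilities.
row1 = reinforce_or_punish(d_nut=3.0, nut_save=1.0, p_splus=0.5, p_sminus=0.5)
row2 = reinforce_or_punish(d_nut=0.5, nut_save=1.0, p_splus=0.5, p_sminus=0.5)
row3 = reinforce_or_punish(d_nut=1.0, nut_save=-1.0, p_splus=0.5, p_sminus=0.5)
row4 = reinforce_or_punish(d_nut=-2.0, nut_save=-1.0, p_splus=0.5, p_sminus=0.5)
```

As in the Pascal code, a NutSave of exactly zero is treated as S-, and only the probability tied to the stimulus present before the tumble is changed.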

In a currently positive nutrient gradient (S+ present, stored in dNut),

p(Tumble|S+) controls, er, sets the probability of a tumble. In a currently

negative nutrient gradient (S- present), p(Tumble|S-) sets the probability of a

tumble. Thus the program implements the proper three-term contingency as

specified by the reinforcement model.
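
[Illustrative sketch, not part of ECOLI4a.] The selection step on its own is small enough to show in Python. The function name and the explicit `rng` argument are assumptions made so the draw can be reproduced; the Pascal code simply calls Random.

```python
import random


def should_tumble(d_nut, p_splus, p_sminus, rng):
    """Pick the tumble probability from the CURRENT dNut and draw against it."""
    p = p_splus if d_nut > 0 else p_sminus   # S+ if dNut > 0, otherwise S-
    return rng.random() < p                  # rng.random() lies in [0.0, 1.0)


rng = random.Random(0)
# With p(Tumble|S+) = 0 and p(Tumble|S-) = 1 the outcome is fixed by the stimulus.
tumbles_in_splus = [should_tumble(1.0, 0.0, 1.0, rng) for _ in range(100)]
tumbles_in_sminus = [should_tumble(-1.0, 0.0, 1.0, rng) for _ in range(100)]
```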

If you define S+ and S- as two "situations" in which responses (tumbles)

occur, you will see that, in a given situation, those responses which are

followed by a satisfying state of affairs (more positive nutrient rate) have

their "connections" to the situation "strengthened" (probability of a tumble

increased) so that, when the situation recurs, the response (tumbling) will be

more likely to occur (p(Tumble|S) increased). In a given situation, those

responses which are followed by an annoying state of affairs (less positive

nutrient rate) will have their connection to the situation weakened (reduced

p(Tumble|S)), so that, when the situation recurs, the response (tumbling) will

be less likely to occur. This is just a restatement of the law of effect.

You have offered a program which does in fact produce the right

behavior. But you have not considered the question of whether your

explanation is unique, or if not unique then best. There are several

other models that will create the same results, only faster. The

simplest one is

if dNut > 0 then decrease(PTS+); (toward 0)

if dNut <= 0 then increase(PTS-); (toward 1)

This modification will produce "learning" just as your program does, but

much faster and without using any knowledge about previous values of

dNut. Aside from the goal of making the model fit the language of

operant conditioning, what motive could we have for preferring your

model over the alternate one, which works even better? Yours is

computationally more complex, and results in slower learning with at

least some probability of failure. The modification is very simple and

also infallible.

Your simplified model does not learn anything: it is just a pair of one-way

control systems that force the two probabilities to opposite extremes.

ECOLI4a actually does learn, and fairly rapidly, I might add. To learn,

ECOLI4a must first try out tumbling under different conditions and discover

what happens. If the geometry of the situation were changed so that tumbles

during S+ tended to make things better more often than worse, and tumbles

during S- tended to make things worse more often than better, ECOLI4a would

learn to tumble in a positive gradient and not to tumble in a negative

gradient. The simulation is completely unbiased in the sense that ECOLI4a

will learn the correct behavior regardless of what the "correct" behavior is,

and will thus find its way up the nutrient gradient. Your "improved" model

will not. This is the basis for my preferring the more complex model.
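
[Illustrative sketch, not part of either program.] The difference can be made concrete with a toy comparison in Python. It assumes a hypothetical "reversed" world in which a tumble during S+ is always followed by a higher nutrient rate and a tumble during S- by a lower one; the step size, the starting probabilities, and the function names are all assumptions. `fixed_rule` is the two-line rule quoted above, and `consequence_rule` is a compact restatement of the ReinforceOrPunish logic.

```python
def clamp(p):
    return min(1.0, max(0.0, p))


def fixed_rule(d_nut, p_splus, p_sminus, rate):
    """Quoted alternative: adjusts on the sign of dNut alone."""
    if d_nut > 0:
        p_splus = clamp(p_splus - rate)     # toward 0
    else:
        p_sminus = clamp(p_sminus + rate)   # toward 1
    return p_splus, p_sminus


def consequence_rule(nut_save, d_nut, p_splus, p_sminus, rate):
    """ECOLI4a-style: adjusts on what followed a tumble in S+ or S-."""
    delta = d_nut - nut_save
    change = rate if delta > 0 else (-rate if delta < 0 else 0.0)
    if nut_save > 0:
        p_splus = clamp(p_splus + change)
    else:
        p_sminus = clamp(p_sminus + change)
    return p_splus, p_sminus


fixed = (0.5, 0.5)
learned = (0.5, 0.5)
for _ in range(10):
    # Tumble during S+ (dNut = +1); in the reversed world dNut then rises to +2.
    fixed = fixed_rule(1.0, *fixed, rate=0.125)
    learned = consequence_rule(1.0, 2.0, *learned, rate=0.125)
    # Tumble during S- (dNut = -1); in the reversed world dNut then falls to -2.
    fixed = fixed_rule(-1.0, *fixed, rate=0.125)
    learned = consequence_rule(-1.0, -2.0, *learned, rate=0.125)
```

Under these assumptions the fixed rule ends at the same pair of extremes it would reach in the ordinary world, while the consequence-based rule ends at the opposite pair, which is the behavior that pays off in the reversed world.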

I have shown a modification of the model which WILL "learn" appropriate

behavior in the same sense that yours does, and which does not use the

change in probability across a tumble, DeltaNutRate.

[Minor correction: DeltaNutRate is not the change in probability across a

tumble, it is the change in the nutrient rate across a tumble.]

I'm sorry, but I just can't see any learning going on in your model.

Learning, as envisioned in PCT, involves a reorganization of the system in

such a way as to establish or improve negative-feedback control over some

perceptual variable. All I see is a fairly ordinary two-level control system,

one for regulating rate of change in nutrients and one, operating through the

first, for regulating nutrient concentration. My ECOLI6 works in a similar

way to control stored nutrient levels. It's entertaining, it may represent

the actual way some organism regulates its nutrient levels, but it does not

model learning. In contrast, ECOLI4a alters its response probabilities in the

presence of increasing or decreasing nutrient gradients as a result of

experience with the consequences of its behavior. Its program code does not

tell it how to behave in order to climb the nutrient gradient; instead it must

learn what works, under what conditions.

It proposes ONE possible mechanism, but it is a very elaborate mechanism

and others would work equally well or better.

O.K., then here's a challenge: start with an e. coli model that wants to

perceive increasing nutrient levels but (and this is crucial) does NOT know

how to get them. That is, it does not know when to tumble or not tumble; it

does not know how its rate of tumble should vary with the change in nutrient

levels. In the absence of learning it should just do a random walk. It

should then attempt to "find" the right relationship by (1) varying its

behavior and (2) using the results of its "experiments" to develop an

appropriate (negative feedback) relationship between its behavior and error.

When you have such a model, we will compare its complexity to the complexity

of ECOLI4a; otherwise we are comparing apples and oranges.

And I am not convinced

that your model actually does embody the law of effect as you state it.

You will have to be much more precise about your definitions before I

will believe that.

I hope I have been able to do that. (:->

Sincerely,

Bruce