···
--------------------------------------------------------------------------
Got your new code for ecoli4a, compiled it, and ran it. An interesting
feature that you may have noticed: ps+ (pardon my shorthand for
pTumbleGivenSplus) approaches zero (actually 0.005, but let's call it zero)
and ps- (pTumbleGivenSminus) approaches 1. This
happens relatively quickly.
Suppose we start with these probabilities at their limits. Then we can
understand the following code segment very easily:
if dNut > 0 then   { S+ present; tumble probability determined by S+ }
  begin
    if (Random < pTumbleGivenSplus) then DoTumble
    else JustTumbled := false;
  end
else               { S- present; tumble probability determined by S- }
  begin
    if (Random < pTumbleGivenSminus) then DoTumble
    else JustTumbled := false;
  end;
NutCon := NewNut;
Note that in the limit, pTumbleGivenSplus goes essentially to zero, and
pTumbleGivenSminus goes to 1. This means that in the first clause of the
overall "if" statement,
if dNut > 0 then   { S+ present; tumble probability determined by S+ }
  begin
    if (Random < pTumbleGivenSplus) then DoTumble
    else JustTumbled := false;
  end
... there will never be a tumble (that is, Random will never be less than
zero). Thus, as long as dNut > 0, there will never be a tumble -- until the
path passes a right angle to the target and dNut becomes negative.
In the second clause
else               { S- present; tumble probability determined by S- }
  begin
    if (Random < pTumbleGivenSminus) then DoTumble
    else JustTumbled := false;
  end;
... pTumbleGivenSminus is 1 in the limit, so when dNut <= 0, there will
always be a tumble immediately.
Therefore, when the probabilities reach their limits, the above code
segment is closely equivalent to
if dNut <= 0 then DoTumble.
Now we can understand why the final approach to the target is so rapid and
the final position stays so close to the target: the model has approached
the condition in which there is a tumble for any movement down the gradient
and none for any movement up the gradient, as in the simplest PCT model.
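In case it helps to see it spelled out, here is the saturated version of the
whole segment (same identifiers as above; this is only the limiting case, not
the actual code in ecoli4a):

{ limiting case: pTumbleGivenSplus = 0, pTumbleGivenSminus = 1 }
if dNut > 0 then
  JustTumbled := false      { moving up the gradient: never tumble  }
else
  DoTumble;                 { moving down the gradient: always tumble }
NutCon := NewNut;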
------------------------------------
The question is now why the two probabilities tend so rapidly and
systematically toward 0 and 1. The implication is that NutSave, the
previous value of dNut, predicts the next value of dNut. But we know that
this is not true. Random means random: whatever the current value of dNut,
the next value can be anything from the maximum positive to the maximum
negative, and the most probable value is zero.
The logic here is extremely complex, but there is a simple way to see
whether the previous value of dNut is really acting as a reinforcer. Change
the sign of dNut that is saved as NutSave. That is, in
procedure DoTumble;
begin
  Tumble(Angle);
  JustTumbled := true;
  NutSave := dNut;    { NutSave is nutrient rate of change
                        immediately after a tumble }
end;
change the last statement to NutSave := -dNut.
We see the probabilities change in the same directions as before, although
now they do not come as close to the limits as before. The model still
progresses toward the target.
Going even further, we can write
NutSave := dNut * 2.0 * (Random - 0.5);
... with the same result, only now the probabilities do eventually reach
the limits of 0.005 and 1.0.
The quickest results of all come from simply randomizing NutSave:
NutSave := Random - 0.5;
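To keep these experiments straight, here is DoTumble with the alternative
assignments gathered in one place (a summary sketch, not the actual patched
source; only one assignment is active in a given run, the rest are commented
out):

procedure DoTumble;
begin
  Tumble(Angle);
  JustTumbled := true;
  NutSave := dNut;                           { original: reinforcing value }
  { NutSave := -dNut;                          variant 1: sign reversed    }
  { NutSave := dNut * 2.0 * (Random - 0.5);    variant 2: sign and size random }
  { NutSave := Random - 0.5;                   variant 3: fully random     }
end;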
So what is making the probabilities change is not any systematic effect of
NutSave, but something else about the nonlinear and circular geometry of
this situation, combined with the complex logic. I really don't have the
faintest idea why the net effect is swimming up the gradient. The situation
is too complex for me to see how to apply control theory. But it is clear
that the reason is NOT an effect of the previous value of dNut, or of the
change in value across a tumble.
---------------------------------------------------------------------------
[Me writing now]
As you can see, I accepted that your model ran and produced the right
results (qualitatively). At the time I still didn't understand the logic
fully, but even then, by experimenting with the program, I was able to
demonstrate that the "reinforcement" was just getting in the way. The model
converged more quickly when the reinforcing effect was randomized out. At
that time I didn't see that eliminating the reinforcing effect altogether
would work best of all.
Why was I doing this? You had a model that ran and produced the E. coli
effect. Wasn't that enough to validate the reinforcement model? Obviously,
for me it wasn't enough. I had to understand how the model worked. In an
empirical way, I played around with the parts of the model to see which
parts were essential and which weren't. I came close to the right answer,
but still didn't see exactly what the reason was. Now I can say that the
reinforcement aspect of the model is superfluous; the model works even
better without it (including learning).
This leads me to venture a generalization: in _all_ applications of
reinforcement theory, the concept of reinforcement is excess baggage. Given
a behavior that produces a consequence of a certain type, we may observe
that the frequency of that behavior increases, and therefore that the
frequency of the consequence increases. What we do not observe is WHY the
frequency of the behavior and the consequence increases. All we really
observe is that the probability density of the behavior increases, and as a
result the probability density of all its consequences increases. That, and
not reinforcement, is the empirical observation. Reinforcement is a
theoretical notion offered to explain WHY these probabilities increase, and
it is not supported by any evidence.
In the Ecoli model I cited yesterday (which seems to be a still later
version of the one mentioned above) it is clear that if you leave out the
so-called reinforcing condition (dNut - NutSave positive, etc.), convergence
will be faster. This suggests that a different model of learning, in which
the "wrong" cases are minimized or eliminated, would work even better. Once
we open the door to other theories of learning, reinforcement theory can be
seen for what it is: an attempt to keep control of behavior in the
environment where, according to certain philosophical positions, it belongs.
Best,
Bill P.