Ecoli3 model

[From Bill Powers (941104.0740 MST)]

Rick Marken (941103.2100),
Bruce Abbott (941103.2000)--

I see that it took Rick about three hours to pin down the correct
analysis of Bruce's program. I've just looked at it, and I agree with
Rick. It's a nice program, it works, and it is a control-system program
-- but it doesn't have any reinforcement in it. Furthermore, the attempt
to introduce reinforcement by letting past experience carry over into
the present causes a deterioration in performance. I will, of course,
elaborate.

In a repetitive situation, "probability" translates into frequency or
inverse delay. On every iteration there is some probability Pr that a
tumble will occur. The delay t by which a tumble has occurred with
probability 0.5 -- the median delay to the next tumble -- satisfies
(1 - Pr)^t = 0.5, so

(1) t = ln(0.5)/ln(1 - Pr), where t is measured in iterations.
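
For example, at Pr = 0.5 the median delay is ln(0.5)/ln(0.5) = 1
iteration, while at Pr = 0.001 it is ln(0.5)/ln(0.999), or about 693
iterations.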

The two steps

      if pTumbleGivenSplus > pMin then
          pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;
and
      if pTumbleGivenSminus < pMax then
          pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;

have the effect of keeping the probability of a tumble on any given
iteration between the limits pMin and pMax. pMin is 0.001, and pMax is
0.5. If the gradient is positive, pTumbleGivenSplus is gradually
decreased toward pMin, iteration by iteration. From Eq. (1) we can see
that this decreases Pr, and thus increases t -- the next tumble will be
delayed. By
the same reasoning, if the gradient is negative, the next tumble will
occur sooner. The effect is nonlinear, but as with most control systems,
the system works anyway.
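
For concreteness, here is a sketch of how those two steps might sit in
the per-iteration loop. The variable names are taken from the quoted
fragments; the gradient test, the call to Random, and the DoTumble
procedure are my guesses at the surrounding structure, not a copy of
Ecoli3:

      if dNut > 0 then                 { sensed gradient favorable }
        begin
          if pTumbleGivenSplus > pMin then
            pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;
          if Random < pTumbleGivenSplus then DoTumble;
        end
      else                             { gradient zero or negative }
        begin
          if pTumbleGivenSminus < pMax then
            pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;
          if Random < pTumbleGivenSminus then DoTumble;
        end;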

The "learning" that takes place occurs in the interval between one
tumble and the next one. Its only effect is to increase and decrease the
delay to the next tumble based on the _current_ sensed gradient. In my
implementation the same thing occurs, but much more simply:

DeltaT := gain*(ref - dNut);
T := T + T0 + DeltaT;
if T > Tmax then tumble;

T0 is a baseline tumbling rate. Of course T has to be kept from going
negative. If you make gain very large, so a single DeltaT is enough to
switch from a very long delay to zero delay, this reduces to your first
program.
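
Filled out with that non-negativity clamp, the rule might look like
the following. The reset of T after a tumble is my assumption about
the bookkeeping, not something stated above:

      DeltaT := gain*(ref - dNut);     { error in sensed gradient }
      T := T + T0 + DeltaT;            { accumulate toward threshold }
      if T < 0 then T := 0;            { keep T from going negative }
      if T > Tmax then
        begin
          DoTumble;                    { new random direction }
          T := 0;                      { assumed: restart the count }
        end;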

Your model will work just as mine does, although not as efficiently,
because it adjusts the delay so slowly that many iterations are
required to make any substantial change in it.

You use a very small value for LearnRate, presumably so the effect will
carry over from tumble to tumble. This, however, greatly reduces the
efficiency of goal-seeking. If you change LearnRate from 0.00005 to 0.05
(larger by a factor of 1000), you will find the approach to the goal
very much faster, with a far smaller range of random movement around the
goal once it is reached. This is because you _remove_ the carryover from
one tumble to the next. Since any given probability of a tumble is
followed, at random, by either better or worse results, allowing a
carryover of values of the probabilities means that half the time the
next delay will be wrong. Speeding up the "learning" so it is complete
within a fraction of the tumble delay means that you remove this
carryover from one tumble to the next, and that makes the delay
appropriate to the _current_ conditions, not the past conditions. The
dramatic improvement in speed of reaching the target shows the removal
of incorrect delays.
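
To put rough numbers on the carryover: the probabilities move by
LearnRate on each iteration, so traversing the range from pMax = 0.5
down to pMin = 0.001 takes about (0.5 - 0.001)/0.05, or 10, iterations
at LearnRate = 0.05, but roughly 10,000 iterations at LearnRate =
0.00005 -- far longer than the delays given by Eq. (1), so each tumble
decision is being made mostly with probabilities left over from earlier
conditions.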

The approach could be made even more efficient by using my systematic
method (or yours) rather than a probabilistic way of adjusting the
delay. With a probabilistic method, if dNut is positive, which should
always call for a long delay, sometimes a short delay will occur, with
about a 50% chance that the next direction of movement will be
unfavorable. Going to a systematic method of adjusting the delay, as
above, or as in your first simulation, will remove those wrong delay
lengths and further improve the performance.

If you make the learning rate even smaller than the value in your
initialization, so that the adjustment of delay is influenced more and
more by past experience relative to current experience, the performance
will simply get worse and worse, until you approach a random walk.

I also agree with Rick's approval of your modeling approach. When you
lay out your hypothesis in the form of a working model, you make
evaluation of the hypothesis orders of magnitude easier than it would
be if presented only in words.


----------------------------------------------------------------------
Best,

Bill P.