[From Bill Powers (941104.0740 MST)]

Rick Marken (941103.2100),

Bruce Abbott (941103.2000)--

I see that it took Rick about three hours to pin down the correct analysis of Bruce's program. I've just looked at it, and I agree with Rick. It's a nice program, it works, and it is a control-system program -- but it doesn't have any reinforcement in it. Furthermore, the attempt to introduce reinforcement by letting past experience carry over into the present causes a deterioration in performance. I will, of course, elaborate.

In a repetitive situation, "probability" translates into frequency or inverse delay. On every iteration there is a chance, Pr, that a tumble will occur. The median delay to the next tumble is the number of iterations t at which (1 - Pr), the probability of no tumble on a given iteration, raised to the power t falls to 0.5, so

(1) t = ln(0.5)/ln(1 - Pr), where t is measured in iterations.
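As a numerical sanity check (not part of either program), the delay relation is easy to verify in Python: with a per-iteration tumble probability p, the chance of no tumble on one iteration is 1 - p, and Eq. (1) then predicts a median delay of ln(0.5)/ln(1 - p) iterations.

```python
import math
import random

def empirical_median_delay(p_tumble, trials=100_000, seed=1):
    """Median number of iterations until the first tumble, when each
    iteration tumbles independently with probability p_tumble."""
    rng = random.Random(seed)
    delays = []
    for _ in range(trials):
        t = 1
        while rng.random() >= p_tumble:
            t += 1
        delays.append(t)
    delays.sort()
    return delays[trials // 2]

p = 0.05
predicted = math.log(0.5) / math.log(1.0 - p)  # Eq. (1); 1 - p = Pr(no tumble)
print(round(predicted, 2), empirical_median_delay(p))
```

With p = 0.05 the formula gives about 13.5 iterations; the empirical median lands on a neighboring integer, since an actual delay is a whole number of iterations.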

The two steps

    if pTumbleGivenSplus > pMin then
      pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;

and

    if pTumbleGivenSminus < pMax then
      pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;

have the effect of maintaining the total probability of a tumble between the limits pMin and pMax. pMin is 0.001, and pMax is 0.5. If the gradient is positive, pTumbleGivenSplus is gradually decreased toward pMin, iteration by iteration. From Eq. (1) we can see that this will decrease Pr, and thus increase t -- the next tumble will be delayed. By the same reasoning, if the gradient is negative, the next tumble will occur sooner. The effect is nonlinear, but as with most control systems, the system works anyway.
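A minimal Python paraphrase of those two steps makes the clamping behavior explicit. The surrounding control flow and starting values here are assumptions for illustration, not Bruce's actual code:

```python
P_MIN, P_MAX = 0.001, 0.5   # limits quoted in the text
LEARN_RATE = 0.00005        # Bruce's initialization value

def adjust(p_splus, p_sminus, gradient_positive):
    """One iteration of the two adjustment steps: a favorable gradient
    walks pTumbleGivenSplus down toward P_MIN; an unfavorable one walks
    pTumbleGivenSminus up toward P_MAX."""
    if gradient_positive and p_splus > P_MIN:
        p_splus -= LEARN_RATE
    if not gradient_positive and p_sminus < P_MAX:
        p_sminus += LEARN_RATE
    return p_splus, p_sminus

# Run long enough under a steady favorable gradient and the
# probability pins at its lower limit.
ps, pm = 0.25, 0.25
for _ in range(10_000):
    ps, pm = adjust(ps, pm, gradient_positive=True)
print(ps)   # has drifted from 0.25 down to (approximately) P_MIN
```

At 0.00005 per iteration it takes roughly 5000 iterations to traverse the full range, which is why many iterations are needed before the tumble delay changes substantially.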

The "learning" that takes place occurs in the interval between one tumble and the next one. Its only effect is to increase and decrease the delay to the next tumble based on the _current_ sensed gradient. In my implementation the same thing occurs, but much more simply:

    DeltaT := gain*(ref - dNut);
    T := T + T0 + DeltaT;
    if T > Tmax then tumble;

T0 is a baseline tumbling rate. Of course T has to be kept from going negative. If you make gain very large, so a single DeltaT is enough to switch from a very long delay to zero delay, this reduces to your first program.
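The same scheme can be sketched in Python. Tmax, the reset after a tumble, and the particular constants are filled in here as assumptions; the fragment above leaves them implicit:

```python
GAIN, T0, TMAX, REF = 50.0, 1.0, 100.0, 0.0   # illustrative values only

def step(T, dNut):
    """One iteration of the delay-accumulation scheme: T grows by the
    baseline T0 plus an error-driven correction, is kept non-negative,
    and a tumble fires when T exceeds TMAX."""
    DeltaT = GAIN * (REF - dNut)
    T = max(0.0, T + T0 + DeltaT)   # keep T from going negative
    if T > TMAX:
        return 0.0, True            # tumble, then restart the count
    return T, False

def iterations_to_tumble(dNut, cap=10_000):
    """Count iterations until the first tumble under a constant gradient."""
    T = 0.0
    for n in range(1, cap + 1):
        T, tumbled = step(T, dNut)
        if tumbled:
            return n
    return cap   # at this gain a favorable gradient postpones the tumble indefinitely

print(iterations_to_tumble(-0.5), iterations_to_tumble(0.0), iterations_to_tumble(0.5))
```

An unfavorable gradient (dNut negative) brings the tumble within a few iterations; no gradient gives the baseline delay TMAX/T0; and at this large gain a favorable gradient holds T at zero, the limit in which the scheme reduces to the first program's all-or-nothing behavior.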

Your model will work just as mine does, although not as efficiently, because it is set to slow down the adjustment of delay so that a number of iterations is required to make any substantial change in the delay.

You use a very small value for LearnRate, presumably so the effect will carry over from tumble to tumble. This, however, greatly reduces the efficiency of goal-seeking. If you change LearnRate from 0.00005 to 0.05 (larger by a factor of 1000), you will find the approach to the goal very much faster, with a far smaller range of random movement around the goal once it is reached. This is because you _remove_ the carryover from one tumble to the next. Since any given probability of a tumble is followed, at random, by either better or worse results, allowing a carryover of values of the probabilities means that half the time the next delay will be wrong. Speeding up the "learning" so it is complete within a fraction of the tumble delay means that you remove this carryover from one tumble to the next, and that makes the delay appropriate to the _current_ conditions, not the past conditions. The dramatic improvement in speed of reaching the target shows the removal of incorrect delays.
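That prediction is easy to check in a toy one-dimensional version of the experiment. Everything below (the geometry, the single shared tumble probability, and the constants) is a simplification invented for illustration, not Bruce's program:

```python
import random

def run(learn_rate, p_min=0.001, p_max=0.5, seed=3, max_iter=200_000):
    """Toy 1-D chemotaxis run: position starts at -100, the 'nutrient' is
    -abs(x), and the goal is |x| < 1.  A single tumble probability p is
    nudged down on a favorable gradient and up on an unfavorable one
    (a simplification of the two conditional probabilities in the text).
    Returns the iteration at which the goal is reached (or max_iter)."""
    rng = random.Random(seed)
    x, d, p = -100.0, rng.choice((-1.0, 1.0)), 0.25
    for i in range(max_iter):
        x_new = x + 0.1 * d
        d_nut = abs(x) - abs(x_new)        # positive when moving toward the goal
        x = x_new
        if abs(x) < 1.0:
            return i
        if d_nut > 0:
            p = max(p_min, p - learn_rate)
        else:
            p = min(p_max, p + learn_rate)
        if rng.random() < p:
            d = rng.choice((-1.0, 1.0))    # tumble: pick a new random direction
    return max_iter

fast = run(0.05)       # learning completes within one tumble delay
slow = run(0.00005)    # heavy carryover from past tumbles
print(fast, slow)
```

With the slow learning rate the probability barely responds to the current gradient, so motion degenerates toward a random walk; raising LearnRate by a factor of 1000 lets the adjustment complete within one tumble delay, and the goal is reached far sooner.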

The approach could be made even more efficient by using my systematic method (or yours) rather than a probabilistic way of adjusting the delay. With a probabilistic method, if dNut is positive, which should always call for a long delay, sometimes a short delay will occur, with about a 50% chance that the next direction of movement will be unfavorable. Going to a systematic method of adjusting the delay, as above, or as in your first simulation, will remove those wrong delay lengths, and further improve the performance.

If you make the learning rate even smaller than in your initialization, so that the adjustment of delay is more and more influenced by past experience in relation to current experience, the performance will simply get worse and worse, until you approach a random walk.

I also agree with Rick's approval of your modeling approach. When you lay out your hypothesis in the form of a working model, you make evaluation of the hypothesis orders of magnitude easier than it would be if presented only in words.

----------------------------------------------------------------------

Best,

Bill P.