Ecoli4 reasoning; World models

[From Bill Powers (941111.0530 MST)]

Bruce Abbott (941110.1650 EST) --

Just to set the record straight, Bill, your analysis is incorrect;
Rick, you were right the first time. The flaw in ECOLI4 is in the
overlooked dNut carryover, as I pointed out in an earlier post, NOT in
the above code. Translated CORRECTLY, the code says the following:

IF nutrients are increasing AFTER a tumble, i.e., reinforcement, THEN
BEGIN
  If S+ was present then increase the probability of a tumble given S+
  else { S- was present } increase the probability of a tumble given S-
END
ELSE { nutrients decreasing/stable AFTER a tumble, i.e., punishment/ext. }
BEGIN
  If S+ was present then decrease the probability of a tumble given S+
  else { S- was present } decrease the probability of a tumble given S-
END

Here is an exact copy of the code you sent:

  procedure ReinforceOrPunish;
  begin
    If dNut > 0 then { nutrients rising: reinforcement }
      begin
        If NutSave > 0 then { S+ present when last tumbled }
          begin
            if pTumbleGivenSplus > pMin then
              pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;
          end
        else { S- present when last tumbled }
          begin
            if pTumbleGivenSminus > pMin then
              pTumbleGivenSminus := pTumbleGivenSminus - LearnRate;
          end
      end
    else { nutrients not rising: punishment }
      begin
        If NutSave > 0 then { S+ present when last tumbled }
          begin
            if pTumbleGivenSplus < pMax then
              pTumbleGivenSplus := pTumbleGivenSplus + LearnRate;
          end
        else { S- present when last tumbled }
          begin
            if pTumbleGivenSminus < pMax then
              pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;
          end;
      end;
  end;

The cases we do NOT see are if dNut > 0 then INCREASE the probability of
a tumble, and if it is < 0 then DECREASE the probability of a tumble.
That is what makes it work. Regardless of the state of NutSave, the
relationship between dNut and the change in the probability of a
tumble remains correct, although because of using NutSave, half of the
time the wrong probability is changed. That gets corrected the next time
dNut changes sign. The wrong probability, however, is never changed in
the wrong direction for the sign of dNut. Whichever probability you are
using, it should decrease while dNut is positive, and that does happen.
So Ecoli4 eventually gets to the goal, if you wait long enough.
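The point here — that the sign of dNut alone fixes the direction of the probability change, whichever probability NutSave happens to select — can be checked with a sketch. This is a Python translation of the Pascal logic above, not the original program; the values of LEARN_RATE, P_MIN, and P_MAX are illustrative assumptions.

```python
# Python sketch of ReinforceOrPunish; parameter values are assumptions,
# not taken from ECOLI4.
LEARN_RATE = 0.05
P_MIN, P_MAX = 0.05, 0.95

def reinforce_or_punish(d_nut, nut_save, p_splus, p_sminus):
    """Return updated (p_splus, p_sminus) tumble probabilities."""
    if d_nut > 0:                      # nutrients rising: "reinforcement"
        if nut_save > 0:               # S+ present when last tumbled
            if p_splus > P_MIN:
                p_splus -= LEARN_RATE  # probability DECREASES
        else:                          # S- present when last tumbled
            if p_sminus > P_MIN:
                p_sminus -= LEARN_RATE
    else:                              # nutrients not rising: "punishment"
        if nut_save > 0:
            if p_splus < P_MAX:
                p_splus += LEARN_RATE  # probability INCREASES
        else:
            if p_sminus < P_MAX:
                p_sminus += LEARN_RATE
    return p_splus, p_sminus

# NutSave selects WHICH probability changes; the sign of dNut alone
# fixes the DIRECTION of the change.
print(reinforce_or_punish(+1.0, +1.0, 0.5, 0.5))  # p(S+) goes down
print(reinforce_or_punish(+1.0, -1.0, 0.5, 0.5))  # p(S-) goes down
print(reinforce_or_punish(-1.0, +1.0, 0.5, 0.5))  # p(S+) goes up
```

Whichever branch NutSave routes the update through, dNut > 0 always lowers a tumble probability and dNut <= 0 always raises one, which is why the direction of learning is never wrong for the sign of dNut.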


----------------------------------------------
RE: Ecoli2:

You correct my analysis this way:

if Speed > 0.3 then Speed := Speed - LearnRate;
implements a little control system which maintains
speed at or less than 0.3.

              ^greater

This system will act only if speed is greater than 0.3. When it acts, it
reduces speed on each iteration by LearnRate. It will cease to act only
if speed is <= 0.3. So I say this loop maintains the speed at or below
0.3, treating any speed above 0.3 as an error and correcting it. This
loop operates on every iteration during which dNut is positive and speed
is greater than 0.3.

To see what condition a control loop maintains (or tries to maintain),
you have to look at the zero-error condition, the condition under which
the system will cease to act. The error is zero in this loop only for
speed <= 0.3. So I say that is the condition that this "control system"
tries to maintain, and will maintain if there are no external
disturbances tending to increase the speed.

check?

I have indicated above where corrections should be made. [If it's any
consolation, I sometimes confuse the meanings of ">" and "<" myself.]
These corrections, to borrow a phrase, make all the difference in the
world. The only effect of the above code is to increase speed with
each iteration so long as dNut is positive and to decrease speed with
each iteration so long as dNut is negative.

As you can see, confusion of > with < wasn't my problem.

The whole code segment was

if dNut <= 0 then
   begin
     if Speed > 0.3 then Speed := Speed - LearnRate;
     {else} Tumble(Angle);
   end
else
   if Speed < 2.0 then Speed := Speed + LearnRate;

While dNut is <= zero, a one-way control system:

      if Speed > 0.3 then Speed := Speed - LearnRate;

acts to decrease the speed toward 0.3. If the speed drops below 0.3,
that control system ceases to act. Also, a tumble occurs, but that has
to be considered a different subsystem: if speed < 0.3 then tumble.

While dNut is > 0, a second control system comes into play, which acts
to keep speed greater than or equal to 2.0. That is, every time the
speed is found to be less than 2.0, it is increased by LearnRate. If
speed is >= 2.0, the system does nothing, showing that this is its
reference or zero-error condition.

The values 0.3 and 2.0 are the respective lower and upper limits of
this adjustment. The lower limit keeps speed positive and above zero,
so that there is always some "strength" of the "operant" (moving
forward). The upper limit just keeps speed from increasing to an
unreasonable amount. (I could have done the same thing without the
limit by using a nonlinear function.) I don't see any additional
"one-way" control systems here.

Now who's confused? If speed is greater than the lower limit of 0.3, it
is not kept positive but is decreased. On successive iterations, it will
continuously decrease until it reaches 0.3 or less. The expression using
the upper limit does not keep speed from increasing; rather, as long as
speed is less than 2.0, it will increase the speed. The effect of the
upper limit is to set the point where the speed will no longer be
increased by the expression "if Speed < 2.0 then Speed := Speed +
LearnRate;". If an external disturbance increased the speed to 4.0, this
expression could do nothing to decrease it. The effect of this
expression is to assure that over successive iterations the speed will
become at least 2.0.

Part of the problem here is in your attributing an effect to the limits
rather than to the expressions that use the limits. A limit of 0.3 does
not "keep the speed positive and above zero." A limit by itself does
nothing to the variable that is compared to it. The active agent here is
the expression that uses the limit: if speed > limit then DECREASE
SPEED. The effect of repetitions of this expression, if it were the only
influence on speed, would be to reduce speed to the limit or somewhat
below it. If the action had been "increase speed" instead, the result
would have been to make the speed become indefinitely larger than the
limit as soon as speed became at all larger than the limit. What makes
the difference in the final result is not the limit, but the operation
that uses the limit as an argument.
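The claim that the operation, not the limit, determines the final result can be sketched directly. Both functions below use the same limit of 0.3; the rate and step counts are illustrative values, not taken from either simulation.

```python
# Same limit, two different operations on it; values are illustrative.
LIMIT = 0.3
RATE = 0.05

def decrease_toward(speed, steps):
    # "if speed > limit then DECREASE speed": settles at or just below
    # the limit and then ceases to act.
    for _ in range(steps):
        if speed > LIMIT:
            speed -= RATE
    return speed

def increase_past(speed, steps):
    # Same limit, opposite operation: once speed is at all above the
    # limit, it grows without bound.
    for _ in range(steps):
        if speed > LIMIT:
            speed += RATE
    return speed

print(decrease_toward(1.0, 100))   # ends near 0.3
print(increase_past(0.31, 100))    # ends far above 0.3
```

The limit is just an argument; it is the comparison-plus-operation around it that makes one version settle and the other run away.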

Another difficulty is in your concept of "forward progress". As long as
E. coli is swimming, it is making forward progress, but it is not
necessarily progressing up the gradient. Your model could have worked on
the basis of varying swimming speed alone: speed := speed + gain *
dNut. If dNut is positive speed will increase; if negative, it will
decrease. Presumably, tumbles would occur at regular intervals. So on
the average the bacterium would move farther up the gradient than down
it. But of course E. coli does not vary its swimming speed, only the
interval between tumbles.
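The speed-only variant described above can be sketched in one dimension. This is a hypothetical illustration, not Abbott's program: the gradient Nut(x) = x, the gain, the tumble schedule, and the speed limits are all assumed values (the lower limit of 0.3 echoes the one in Ecoli2).

```python
# 1-D sketch of the speed-varying scheme: speed := speed + gain * dNut,
# with tumbles on a fixed schedule that merely pick a random direction.
# All parameter values are assumptions for illustration.
import random

random.seed(1)
GAIN = 0.2
TUMBLE_EVERY = 20            # iterations between scheduled tumbles
SPEED_MIN, SPEED_MAX = 0.3, 2.0

x = 0.0                      # position; nutrient concentration Nut(x) = x
speed, direction = 1.0, 1
prev_nut = x
for step in range(4000):
    if step % TUMBLE_EVERY == 0:
        direction = random.choice((-1, 1))   # tumble: new random heading
    x += direction * speed
    d_nut = x - prev_nut                     # dNut for this iteration
    prev_nut = x
    speed = min(max(speed + GAIN * d_nut, SPEED_MIN), SPEED_MAX)

# Up-gradient runs speed up toward the cap; down-gradient runs slow
# toward the floor, so distance covered is asymmetric and the average
# drift is up the gradient.
print(x)
```

Up-gradient runs cover much more ground than down-gradient runs of the same duration, so even with tumbles at regular random intervals the net drift is up the gradient, as the paragraph above argues.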

Whether it succeeds or not, the simulation was intended to model each
step forward as an "operant" response, similar to a keypeck. The
"consequence" of the operant is either increased nutrient level or
decreased/stable nutrient level.

You have to be more careful in defining your variables here. A
consequence of an operant cannot affect the operant that leads to that
consequence, but only the subsequent operant. There are two kinds of
consequence of swimming faster: the _immediate_ effect on the current
value of dNut, and the effect on the value dNut will have after the next
tumble. The first effect is simply proportional; the second averages out
to zero.

Beside the effect of speed on dNut, which is a matter of geometry and
the gradient of Nut, we must also consider the effect of dNut on speed,
which you bring in by saying

Increased nutrient level
(reinforcement) "strengthens" moving forward (faster speed);
decreased/stable nutrient level "weakens" moving forward
(punishment/extinction). When moving forward becomes weak enough,
"other behavior" has time to occur (tumbling).

You have defined a positive feedback loop here:

Increased speed --> increased dNut (environment relationship)
Increased dNut --> increased speed (organism relationship)

The same rule implies, of course, that we can substitute "decreased"
for "increased" in this pair of expressions.

There are only two ways to keep this system from running away toward
either zero or infinite behavior. One is to weaken the connections so
that the net positive feedback is less than 1. The other is to bring in
some arbitrary means of limiting response -- a nonlinearity or a
physical limit, some means of creating "satiation." The basic model will
simply run away without some such extraneous fudge factor, if the loop
gain is greater than 1.
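A minimal sketch of that runaway, assuming the barest possible loop in which each pass multiplies the signal by the loop gain (the gains and step count are illustrative):

```python
# Bare positive-feedback iteration: each pass around the loop multiplies
# the signal by the loop gain. Numbers are illustrative.
def iterate(gain, x0, steps):
    x = x0
    for _ in range(steps):
        x = gain * x
    return x

print(iterate(1.2, 1.0, 50))   # gain > 1: runs away toward infinity
print(iterate(0.8, 1.0, 50))   # gain < 1: collapses toward zero
```

With gain greater than 1 the signal grows without bound; with gain less than 1 it dies out; only some extraneous nonlinearity or physical limit can hold it anywhere in between.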

If the loop gain is 1 or less, then we have a system that either
responds very weakly, or is on the verge of instability. I don't believe
you would be able to fit this model to any real data.

In saying that a tumble "has time to occur" when moving forward becomes
"weak enough," you are implying a model that you don't describe. Is some
subsystem always watching the "strength" of the forward movement (in the
direction of swimming or up the gradient, two different things?) and
instigating a tumble when the "forward movement" is "weak" enough? If by
forward movement you mean dNut, then yes, there should be a system that
institutes a tumble when dNut is below some reference value close to
zero. In fact that system alone will do the trick. But if the system is
only watching swimming speed, then there will be no progress up the
gradient.

The trouble with verbal descriptions of how a system works is that they
are too vague to allow deducing what will happen.
-----------------------------------------

E. coli as modeled here accelerates in a positive nutrient gradient and
decelerates in a negative nutrient gradient. Because these effects
require several iterations, Ecoli2 tends to overshoot the mark, slow,
reverse, accelerate, overshoot, ad infinitum, thus spending most of its
time within a circular band around the nutrient source where the
reversals occur, for the same reason that a pendulum spends most of its
time near the top of its swing.

The oscillations, as you say, were caused by the gradual changes in
speed. If you increased LearnRate in the attempt to make the changes
faster, you would just get a larger circular band of overshoots.
However, if you set LearnRate to zero, you would never get a tumble if
speed happened to be greater than 0.3. So the only way to get efficient
behavior is to eliminate the speed control entirely, and just let the
tumbles occur when dNut is less than zero.
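That bare scheme — constant swimming speed, and a tumble whenever dNut is not positive — can be sketched in two dimensions. The speed, step count, starting point, and gradient shape below are assumptions for illustration, not values from either program.

```python
# 2-D sketch: constant speed, tumble (new random heading) whenever dNut
# is not positive. All parameter values are illustrative assumptions.
import math
import random

random.seed(2)
SPEED = 1.0
x, y = 50.0, 50.0            # start well away from the source
angle = 0.0

def nut(px, py):
    # Nutrient concentration rises toward the source at the origin.
    return -math.hypot(px, py)

prev = nut(x, y)
for _ in range(2000):
    x += SPEED * math.cos(angle)
    y += SPEED * math.sin(angle)
    d_nut = nut(x, y) - prev
    prev = nut(x, y)
    if d_nut <= 0:
        angle = random.uniform(0.0, 2.0 * math.pi)   # tumble

print(math.hypot(x, y))   # much closer to the source than at the start
```

Runs that carry the model up the gradient are kept; any run that does not is ended immediately by a tumble, so the model homes in on the source with no speed adjustment at all.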

In retrospect, I should have modeled reinforcement as acting on the
"probability" of moving forward (pMove) rather than on speed. Because
there is only one other possible behavior (Tumbling), increases in
pMove would imply decreases in pTumble.

Again, loose definitions. "Moving forward" is a qualitative description;
it is also ambiguous, depending on whether "forward" means "in the
direction of swimming" or "up the gradient." Tumbling, likewise, is a
qualitative event: it occurs or it doesn't occur. There is nothing about
the tumbling act itself that can vary. Only some attribute of it can
vary, such as the time at which it occurs, or the interval between
tumbles.

All you can profitably change is the probability of a tumble per unit
time. This is (inversely) equivalent to changing the mean interval
between tumbles. However, doing this probabilistically will mean that
sometimes the interval will be too long for an unfavorable dNut, and
sometimes too short for a favorable one. The optimal approach, I
believe, is simply to let the error signal set the interval directly,
without a probabilistic intervening calculation.
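The difference can be sketched by comparing probabilistically drawn intervals with directly set ones. The per-step tumble probability and the sample count below are illustrative assumptions.

```python
# Probabilistic tumbling: a tumble with probability p on each iteration
# gives geometrically distributed intervals with mean 1/p but large
# scatter. Setting the interval directly from the error removes the
# scatter. Numbers are illustrative.
import random

random.seed(3)

def probabilistic_intervals(p, n):
    """Draw n inter-tumble intervals, tumbling with probability p per step."""
    intervals = []
    for _ in range(n):
        t = 1
        while random.random() >= p:
            t += 1
        intervals.append(t)
    return intervals

p = 0.1                            # mean interval 1/p = 10 iterations
samples = probabilistic_intervals(p, 1000)
mean = sum(samples) / len(samples)
print(mean)                        # near 10 on average
print(min(samples), max(samples))  # but individual intervals scatter widely

direct = [10] * 1000               # error signal sets the interval directly
print(min(direct), max(direct))    # no scatter at all
```

Both schemes give the same mean interval, but the probabilistic one produces intervals that are sometimes far too short and sometimes far too long for the current dNut, which is the cost the paragraph above describes.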

These two cases in which my code was misread bring to mind something I
read years ago about the behavior of reviewers of scientific papers.

I don't believe I misread your code. I just saw something in it that you
hadn't seen. You did, after all, get progress toward the goal, albeit
very slow. I was looking for the explanation of how this could happen
despite the fact that previous values of dNut, before a tumble, have no
relationship to the best current interval between tumbles. Perhaps you
and I are simply looking at different sides of the same explanation.
-----------------------------------------------------------------------
Eric (archie) Harnden (941110) --

I am, of course, in enthusiastic support of your proposal to apply HPCT
to a world model. I always worry, however, that such proposals may be
far more complex in the implementation than in their original
conception. Of course that's my own fear of failure speaking -- I don't
mean to discourage you from boldly going where no Powers has gone
before.

One thing I hope you will be able to work into your model: when control
systems are involved, prediction becomes less important than
understanding intentions. Most large-scale models try to extrapolate
from current conditions into the future. The assumption is always that
nothing important will change during the extrapolation period. But with
control systems involved, there are many kinds of changes that simply
don't matter: the intended end result will occur anyway, although the
_behavior_ that produces it will change. For example, you can predict
that if current trends continue, billions of people will starve to
death. But billions of people do not intend to starve to death: they
intend to survive. While many will succumb, others will change their
behavior until they are able to feed themselves as they intend to do.
They will, for example, form marauding bands that steal food, thus
keeping themselves alive at the expense of those who haven't been able
to get guns and ammunition. Scarcity implies conflict, and conflict
between control systems favors those who can find greater resources.

You can't tell how behavior will change when reorganization is involved,
but you can look over the situation and see what changes would be
effective in restoring the goal-condition. You can say that those who do
the changing, and survive, will end up with one of the effective
changes, not any of the ineffective ones. You can even do some guessing
as to how many people will select each of the possible routes to
restoring control.

What control does is to make behaviors in a model more variable, but
outcomes less variable. This might prove interesting to the
world-modelers.
-----------------------------------------------------------------------
Best to all,

Bill P.