E. coli model revisited

[From Bill Powers (970813.2015 MDT)]

Bruce Abbott(970813) --

More browsing through the archives. I found this from just before you
arrived at the Ecoli4 program:


[From Bruce Abbott (941115.1400 EST)]

Bill Powers (941114.2200 MST)

The logic here is extremely complex, but there is a simple way to see
whether the previous value of dNut is really acting as a reinforcer.
. . . . .

So who ever said that the previous value of dNut is acting as the reinforcer
here? I recall explicitly stating otherwise! (See below for a recap.)

The quickest results of all come from simply randomizing NutSave:

NutSave := random - 0.5;

So what is making the probabilities change is not any systematic effect
of NutSave, but something else about the nonlinear and circular geometry
of this situation, combined with the complex logic. I really don't have
the faintest idea why the net effect is swimming up the gradient.

[Enterprise makes a run for the Mutara Nebula on impulse power, closely
pursued by Reliant, which has been pirated by Khan and his crew . . .]

Reliant helmsman, speaking to Khan: If they go in there we'll lose them.

Khan: Explain it to them.

(a) NutSave is the dNut value resulting from movement in the new direction
     selected by the last tumble. Its value is a function of the angle
     selected at random by Tumble(Angle), and thus is itself random. Your
     substitutions simply replace one random number with another, which is
     why they have little effect on the model.

(b) NutSave is NOT the reinforcer here, as I took pains to explain. To
     repeat: Certain consequences of tumbling are NOT random. Tumbling
     while moving up the nutrient gradient usually makes things worse;
     tumbling while moving down the nutrient gradient usually makes things
     better. Better = reinforcement, worse = punishment.

(c) A naive e. coli does not know what the consequences of tumbling are.
     All it knows is that it LIKES to go up-gradient, and DISLIKES going
     down-gradient.

(d) To gain control over nutrient change (dNut), e. coli has only one
     response it can try: tumbling. But it can try this response under
     different conditions and see what results. So it tries tumbling when
     the nutrients are increasing. Because (unknown to e. coli) tumbling
     produces a random change in dNut, the usual result is that nutrient rate
     gets worse. So e. coli learns not to tumble when nutrients are
     increasing. It tries tumbling when nutrients are decreasing. Because
     this usually improves the nutrient rate, e. coli learns to tumble
     immediately when nutrients start decreasing.

The result is that e. coli learns a very efficient control system:

if dNut < 0 then tumble else don't tumble

So what is making the probabilities change is not any systematic effect
of NutSave, but something else about the nonlinear and circular geometry
of this situation, combined with the complex logic.

It is not any systematic effect of NutSave, nor is it a consequence of the
nonlinear and circular geometry of the situation. It is the systematic effect
of the change between pre- and post-tumble nutrient rates, as observed
separately under positive and negative nutrient gradients. The logic is
really not all that complex, once you get to know it--certainly no more
complex than yer average perceptual control system. I hope I've been able to
make that logic clear.

The model is informative, for it tells us what conditions are necessary for
learning in this situation:

(a) e. coli must be able to sense the rate of change in nutrients that
     results from its forward motion.

(b) e. coli must be able to compare the rate of change before and after a
     tumble in order to determine the effect of tumbling.

(c) e. coli must be able to discriminate the results of tumbling separately
     for tumbles that take place while dNut is increasing and while dNut is
     decreasing.

(d) the selection mechanism must work so as to favor responses that tend to
     increase dNut and to suppress those that decrease dNut.

I'm still thinking about how to implement this explicitly as a learning
control system. Somehow, experience with the change in nutrient rate
following a tumble in positive and in negative gradients should determine the
form of the function controlling the tumble interval in the lower-level
system. The model would start without a systematic relationship between dNut
and tumble interval; experience would then change the parameters until the
correct function emerged. How about you or Tom or Rick giving it a shot?


This was a very clear statement of the logic of your model; I don't know
why I had any problem with grasping it. Perhaps it might have been because
I didn't believe you were actually attributing this complex reasoning
process to E. coli.

The reasoning process you proposed is certainly one way to figure out that
E. coli should tumble when going down the gradient and not tumble when
going up it. We seem to have a system that can perceive dnut before a
tumble, remember it, and compare it with the new dnut after a tumble. This
gives it the ability to perceive the tumble as an event associated with the
two values of dnut, the previous and current values. Then it can observe,
over many trials, that if dnut is negative before the tumble, a tumble
produces, on the average, an increase in dnut, while if dnut is positive
before the tumble, a tumble produces a decrease in dnut, again on the
average. The system has not only a dnut sensor, but a tumble sensor, as
well as the ability to sense "before" and "after."
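
The statistical fact described above can be checked with a minimal sketch.
It is not the Ecoli4 program; it assumes a unit swimming speed and a uniform
linear gradient, so that dnut is simply the cosine of the heading angle, and
it assumes that a tumble picks a new heading uniformly at random. The function
name and parameters are hypothetical.

```python
import math
import random

def mean_change_after_tumble(n_trials=100_000, seed=0):
    """Average change in dnut caused by a tumble, split by the sign of dnut before it.

    Assumption: dnut = cos(heading), and a tumble draws a fresh uniform heading.
    """
    rng = random.Random(seed)
    sums = {"down": 0.0, "up": 0.0}
    counts = {"down": 0, "up": 0}
    for _ in range(n_trials):
        before = math.cos(rng.uniform(0.0, 2.0 * math.pi))  # dnut before the tumble
        after = math.cos(rng.uniform(0.0, 2.0 * math.pi))   # dnut after the tumble
        key = "up" if before > 0 else "down"
        sums[key] += after - before
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

result = mean_change_after_tumble()
print("mean change when heading down-gradient:", result["down"])
print("mean change when heading up-gradient:  ", result["up"])
```

Under these assumptions the average change is positive when dnut was negative
before the tumble and negative when it was positive, even though every
individual tumble is random.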

Given those empirical facts about the relationship between dnut and
tumbles, the system now has to decide whether it wants dnut to increase or
decrease. If dnut happened to be "drep", a rate of change of concentration
of a repellent, it would want it to decrease; since we're talking about a
nutrient, the system obviously wants it to increase. So it knows what dnut
_should_ be doing, and it can perceive what it _is_ doing; all that is left
is to figure out the action that will produce the right result.

The only action that is available is to change the delay before the next
tumble. The remaining question is which way to change it. The necessary
facts are available: if dnut is negative, a tumble is likely to improve
matters, so the delay should be made short, or even zero. By similar
reasoning, if dnut is positive, a tumble is likely to reduce it, so the
tumble should be postponed, preferably forever. The relationship between
dnut and the delay, since it is evident only over many tumbles, will change
slowly as observations accumulate, so that eventually we arrive at the
following rules:

if dnut <= 0, tumble right away;
if dnut > 0, delay the next tumble as much as possible.

The second rule is logically redundant; the only necessary rule is

if dnut <= 0, tumble.
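
As a rough illustration of what this single rule accomplishes, here is a
sketch that compares it with tumbling at random. The setup is an assumption,
not the original simulation: the nutrient concentration rises linearly along
the x axis, the swimmer moves one unit per step, and the random-tumbling
baseline uses an arbitrary tumble probability of 0.1 per step.

```python
import math
import random

def net_climb(steps=500, use_rule=True, seed=0):
    """Net distance travelled up the gradient (x axis) after `steps` moves."""
    rng = random.Random(seed)
    heading = rng.uniform(0.0, 2.0 * math.pi)
    x = 0.0
    for _ in range(steps):
        dnut = math.cos(heading)       # assumed: concentration equals x, unit speed
        x += dnut
        if use_rule:
            tumble = dnut <= 0         # the single rule: if dnut <= 0, tumble
        else:
            tumble = rng.random() < 0.1  # baseline: tumble at random (assumed rate)
        if tumble:
            heading = rng.uniform(0.0, 2.0 * math.pi)
    return x

def average_climb(use_rule, runs=200):
    """Average net climb over several independently seeded runs."""
    return sum(net_climb(use_rule=use_rule, seed=s) for s in range(runs)) / runs

with_rule = average_climb(True)
without_rule = average_climb(False)
print("average climb with the rule:     ", with_rule)
print("average climb with random tumbles:", without_rule)
```

In a uniform linear gradient the rule stops tumbling as soon as any
up-gradient heading is found; in a circular gradient around a point source,
dnut would eventually turn negative again and tumbling would resume.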

You said, "The logic is really not all that complex, once you get to know
it--certainly no more complex than yer average perceptual control system."
That may seem true to you, but when you analyse the operations that have to
be carried out, it seems to me that the system you propose is a great deal
more complex than a control system that does the same thing. You are
proposing a system with memory of previous events, that can form
associations between changes in dnut and the value of dnut, and that can
reason out logically the action it must take in order to make dnut keep
increasing. A lot of perceptual and computing machinery is required to
carry out the process you describe. In your program, you supplied all that
machinery. You had to supply it; otherwise the program would not have
worked by itself.

The problem we have here is the essence of the difference between a
descriptive model and a generative model. In a descriptive model, one
doesn't worry about the underlying complexities that it implies. You can
say that the "probability of a response" increases, without having to spell
out the physical processes that produce the change in behavior rate that is
abstractly represented as a change in probability density. You can describe
the apparent rules that apply, without having to imagine the physical
devices that would be necessary to implement those rules. If you can
describe two systems in about the same number of words, it seems that the
two systems must be about equally complex.

The true comparison, however, can be made only when the descriptive model
is unpacked into the processes necessary to make it operate in a physical
system. Then what seems simple at the descriptive level can, and usually
does, prove to be very complex to implement. Just consider the popular
notion that an open-loop system is simpler than a control system. In fact,
when you try to design an open-loop system that will actually accomplish
the same results that the control system accomplishes, you run into all
kinds of complications -- for starters, how do you get the open-loop system
to produce repeatable results in a variable environment? Before you know
it, you have festooned your "simple" open-loop system with precision
sensors and equipped it with an entire physical simulation system that any
real organism would have to tow around behind it in a cart. The open-loop
system turns out to be by far, by orders of magnitude, the more complex one.

The other aspect of this problem is that the descriptive model can seem
simple and adequate when in fact it is physically improbable. This is the
problem with the concept of reinforcement. It's easy to say, operationally,
that reinforcement increases the probability of a behavior, but when you
ask HOW it could do so, the answer turns out to be so complex that it casts
suspicion on this very way of describing the situation. If we need a whole
human brain to comprehend the logic of reinforcement, how can we imagine
that a bacterium, or even a rat, is carrying out that same logic? How, in
fact, can we imagine that human beings carry it out so impeccably, when
most of the time they get their logic wrong, especially if they haven't
been trained to use it? The descriptive model, unpacked, is not consistent
with what we can expect organisms actually to be able to do.

The model you proposed works. It gets E. coli up the gradient. It even
"learns" to go up the gradient. But the mechanism that's required to do
this, I would say, is completely impossible as a generative model of E.
coli. There is no way that E. coli, or even creatures much more complex,
could carry out those operations in the computer simulation. The machinery
just isn't there. So if such creatures do accomplish this result, they
can't do it in the way that fits the descriptions of reinforcement theory.
Something else entirely, something much simpler, must be going on.

What we have to remember is that when learning takes place, we do not
actually observe any reinforcing effect of a consequence on an organism or
its behavior. We see that under certain circumstances behavior changes, and
as a result its consequences change. If we do the right experiments, we
find that these consequences are resistant to disturbance. And that is all
we observe. The idea of reinforcement is a proposed _explanation_ of these
observations, one that invokes an unobservable effect of the consequence
back on the organism, and more specifically on its behavior. I do not
believe there is any such reinforcing effect of a consequence on behavior.
That is simply a wrong conception of what is happening.

So what _is_ going on in E. coli? At best, if E. coli could "learn," all
that needs to be going on is a change in the magnitude (and sign) of the
constant k in some relationship describable, for example, as

delay = k0 - k*dnut

This change in k could be brought about by random reorganization, in a way
I have described and simulated. Neither logic nor complex sensing is
required. And there is certainly nothing in this process that could be
described as reinforcement.

I'm not claiming that this last expression is the right model or even a
feasible one. But it's simple enough to qualify as a _possible_ model.
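
One way to picture a change in k brought about by random reorganization is
sketched below. It is only an illustration under stated assumptions, not the
simulation referred to above. The values of K0 and MIN_DELAY are arbitrary,
dnut is taken to be the cosine of a uniformly random heading, and the long-run
average of dnut is computed by summing over headings rather than by a noisy
simulation, purely to keep the sketch short and repeatable. The parameter k
drifts in one direction at a fixed rate and picks a new random direction
whenever the average dnut gets worse.

```python
import math
import random

K0 = 5.0         # assumed baseline delay between tumbles (arbitrary time units)
MIN_DELAY = 1.0  # assumed shortest possible delay

def long_run_dnut(k, n_angles=720):
    """Time-averaged dnut for delay = K0 - k*dnut, clipped below at MIN_DELAY.

    Each run after a tumble has a uniformly random heading, dnut = cos(heading),
    and lasts `delay` time units, so the average is sum(dnut*delay)/sum(delay).
    """
    gained = 0.0
    elapsed = 0.0
    for i in range(n_angles):
        dnut = math.cos(2.0 * math.pi * i / n_angles)
        delay = max(MIN_DELAY, K0 - k * dnut)
        gained += dnut * delay
        elapsed += delay
    return gained / elapsed

def reorganize(episodes=40, step=1.0, seed=0):
    """Drift k steadily; choose a new random direction whenever things get worse."""
    rng = random.Random(seed)
    k = 0.0
    direction = rng.choice([-1.0, 1.0])
    previous = long_run_dnut(k)
    for _ in range(episodes):
        k += direction * step
        current = long_run_dnut(k)
        if current < previous:  # error grew: "tumble" in parameter space
            direction = rng.choice([-1.0, 1.0])
        previous = current
    return k

final_k = reorganize()
print("final k:", final_k, "average dnut:", long_run_dnut(final_k))
```

With the expression written as delay = k0 - k*dnut, the useful sign of k is
negative (long delays when dnut is positive), and that is the sign this
sketch ends up with. Nothing in the loop compares dnut before and after a
tumble or applies any rule about when to tumble.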


Bill P.