[From Bill Powers (970813.2015 MDT)]

Bruce Abbott(970813) --

More browsing through the archives. I found this from just before you

arrived at the Ecoli4 program:


----------------------------------------------------------------

[From Bruce Abbott (941115.1400 EST)]

Bill Powers (941114.2200 MST)

The logic here is extremely complex, but there is a simple way to see

whether the previous value of dNut is really acting as a reinforcer.

. . . . .

So who ever said that the previous value of dNut is acting as the reinforcer

here? I recall explicitly stating otherwise! (See below for a recap.)

The quickest results of all come from simply randomizing NutSave:

NutSave := random - 0.5;

So what is making the probabilities change is not any systematic effect

of NutSave, but something else about the nonlinear and circular geometry

of this situation, combined with the complex logic. I really don't have

the faintest idea why the net effect is swimming up the gradient.

[Enterprise makes a run for the Mutara Nebula on impulse power, closely

pursued by Reliant, which has been pirated by Khan and his crew . . .]

Reliant helmsman, speaking to Khan: If they go in there we'll lose them.

Khan: Explain it to them.

(a) NutSave is the dNut value resulting from movement in the new direction

selected by the last tumble. Its value is a function of the angle

selected at random by Tumble(Angle), and thus is itself random. Your

substitutions simply replace one random number with another, which is

why they have little effect on the model.

(b) NutSave is NOT the reinforcer here, as I took pains to explain. To

repeat: Certain consequences of tumbling are NOT random. Tumbling

while moving up the nutrient gradient usually makes things worse;

tumbling while moving down the nutrient gradient usually makes things

better. Better = reinforcement, worse = punishment.

(c) A naive e. coli does not know what the consequences of tumbling are.

All it knows is that it LIKES to go up-gradient, and DISLIKES going

down-gradient.

(d) To gain control over nutrient change (dNut), e. coli has only one

response it can try: tumbling. But it can try this response under

different conditions and see what results. So it tries tumbling when

the nutrients are increasing. Because (unknown to e. coli) tumbling

produces a random change in dNut, the usual result is that nutrient rate

gets worse. So e. coli learns not to tumble when nutrients are

increasing. It tries tumbling when nutrients are decreasing. Because

this usually improves the nutrient rate, e. coli learns to tumble

immediately when nutrients start decreasing.

The result is that e. coli learns a very efficient control system:

if dNut < 0 then tumble else don't tumble

So what is making the probabilities change is not any systematic effect

of NutSave, but something else about the nonlinear and circular geometry

of this situation, combined with the complex logic.

It is not any systematic effect of NutSave, nor is it a consequence of the

nonlinear and circular geometry of the situation. It is the systematic effect

of the change between pre- and post-tumble nutrient rates, as observed

separately under positive and negative nutrient gradients. The logic is

really not all that complex, once you get to know it--certainly no more

complex than yer average perceptual control system. I hope I've been able to

make that logic clear.

The model is informative, for it tells us what conditions are necessary for

learning in this situation:

(a) e. coli must be able to sense the rate of change in nutrients that

results from its forward motion.

(b) e. coli must be able to compare the rate of change before and after a

tumble in order to determine the effect of tumbling.

(c) e. coli must be able to discriminate the results of tumbling separately

for tumbles that take place while dNut is increasing and while dNut is

decreasing.

(d) the selection mechanism must work so as to favor responses that tend to

increase dNut and to suppress those that decrease dNut.

I'm still thinking about how to implement this explicitly as a learning

control system. Somehow, experience with the change in nutrient rate

following a tumble in positive and in negative gradients should determine the

form of the function controlling the tumble interval in the lower-level

system. The model would start without a systematic relationship between dNut

and tumble interval; experience would then change the parameters until the

correct function emerged. How about you or Tom or Rick giving it a shot?

Regards,

Bruce

--------------------------------------------------------------------------

This was a very clear statement of the logic of your model; I don't know

why I had any problem grasping it. Perhaps it was because I didn't

believe you were actually attributing this complex reasoning process to

E. coli.

The reasoning process you proposed is certainly one way to figure out that

E. coli should tumble when going down the gradient and not tumble when

going up it. We seem to have a system that can perceive dnut before a

tumble, remember it, and compare it with the new dnut after a tumble. This

gives it the ability to perceive the tumble as an event associated with the

two values of dnut, the previous and current values. Then it can observe,

over many trials, that if dnut is negative before the tumble, a tumble

produces, on the average, an increase in dnut, while if dnut is positive

before the tumble, a tumble produces a decrease in dnut, again on the

average. The system has not only a dnut sensor, but a tumble sensor, as

well as the ability to sense "before" and "after."
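That statistical regularity is easy to demonstrate numerically. The sketch below (Python, not the Pascal of the Ecoli4 program; the radial nutrient field, tumble probability, and seed are all invented for illustration) tumbles at random moments, independent of dnut, and tallies the change in dnut across each tumble separately for negative and positive pre-tumble dnut:

```python
import math
import random

def nutrient(x, y):
    # Assumed field for illustration: concentration peaks at the origin.
    return -math.hypot(x, y)

rng = random.Random(5)
x, y = 30.0, 0.0
heading = rng.uniform(0.0, 2.0 * math.pi)
prev = nutrient(x, y)
gains_neg = []   # change in dnut across a tumble when pre-tumble dnut < 0
gains_pos = []   # change in dnut across a tumble when pre-tumble dnut >= 0

for _ in range(20000):
    x += math.cos(heading)
    y += math.sin(heading)
    dnut = nutrient(x, y) - prev
    prev = nutrient(x, y)
    if rng.random() < 0.3:                    # tumble at a random moment
        before = dnut
        heading = rng.uniform(0.0, 2.0 * math.pi)
        x += math.cos(heading)                # one step on the new heading
        y += math.sin(heading)
        after = nutrient(x, y) - prev
        prev = nutrient(x, y)
        (gains_neg if before < 0 else gains_pos).append(after - before)

avg_neg = sum(gains_neg) / len(gains_neg)
avg_pos = sum(gains_pos) / len(gains_pos)
print(avg_neg, avg_pos)
```

On average, tumbling improves dnut when it was negative and worsens it when it was positive, even though every individual tumble is random.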

Given those empirical facts about the relationship between dnut and

tumbles, the system now has to decide whether it wants dnut to increase or

decrease. If dnut happened to be "drep", a rate of change of concentration

of a repellent, it would want it to decrease; since we're talking about a

nutrient, the system obviously wants it to increase. So it knows what dnut

_should_ be doing, and it can perceive what it _is_ doing; all that is left

is to figure out the action that will produce the right result.

The only action that is available is to change the delay before the next

tumble. The remaining question is which way to change it. The necessary

facts are available: if dnut is negative, a tumble is likely to improve

matters, so the delay should be made short, or even zero. By similar

reasoning, if dnut is positive, a tumble is likely to reduce it, so the

tumble should be postponed, preferably forever. The relationship between

dnut and the delay, since it is evident only over many tumbles, will change

slowly as observations accumulate, so that eventually we arrive at the

following rules:

if dnut <= 0, tumble right away;

if dnut > 0, delay the next tumble as much as possible.

The second rule is logically redundant; the only necessary rule is

if dnut <= 0, tumble.
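That single rule can be checked directly in simulation. Below is a minimal sketch (Python rather than the Pascal of the Ecoli4 program; the radial nutrient field, unit step size, and seed are invented for illustration) comparing a swimmer that follows the rule with one that tumbles blindly:

```python
import math
import random

def nutrient(x, y):
    # Assumed field for illustration: concentration peaks at the origin.
    return -math.hypot(x, y)

def swim(tumble_rule, steps=2000, seed=1):
    """Move one unit per step along the current heading; tumble (pick a
    fresh random heading) whenever tumble_rule(dnut) says to."""
    rng = random.Random(seed)
    x, y = 50.0, 0.0                          # start 50 units below the peak
    heading = rng.uniform(0.0, 2.0 * math.pi)
    prev = nutrient(x, y)
    for _ in range(steps):
        x += math.cos(heading)
        y += math.sin(heading)
        dnut = nutrient(x, y) - prev          # rate of change of nutrient
        prev += dnut
        if tumble_rule(dnut):
            heading = rng.uniform(0.0, 2.0 * math.pi)
    return nutrient(x, y)

final_with_rule = swim(lambda dnut: dnut <= 0)  # the single rule above
final_blind = swim(lambda dnut: True)           # tumble every step
print(final_with_rule, final_blind)
```

With the rule, downhill headings are abandoned immediately and uphill headings persist, so the swimmer should end up near the peak on a typical run; the blind tumbler merely diffuses.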

You said, "The logic is really not all that complex, once you get to know

it--certainly no more complex than yer average perceptual control system."

That may seem true to you, but when you analyse the operations that have to

be carried out, it seems to me that the system you propose is a great deal

more complex than a control system that does the same thing. You are

proposing a system with memory of previous events, that can form

associations between changes in dnut and the value of dnut, and that can

reason out logically the action it must take in order to make dnut keep

increasing. A lot of perceptual and computing machinery is required to

carry out the process you describe. In your program, you supplied all that

machinery. You had to supply it; otherwise the program would not have

worked by itself.

------------------------------------

The problem we have here is the essence of the difference between a

descriptive model and a generative model. In a descriptive model, one

doesn't worry about the underlying complexities that it implies. You can

say that the "probability of a response" increases, without having to spell

out the physical processes that produce the change in behavior rate that is

abstractly represented as a change in probability density. You can describe

the apparent rules that apply, without having to imagine the physical

devices that would be necessary to implement those rules. If you can

describe two systems in about the same number of words, it seems that the

two systems must be about equally complex.

The true comparison, however, can be made only when the descriptive model

is unpacked into the processes necessary to make it operate in a physical

system. Then what seems simple at the descriptive level can, and usually

does, prove to be very complex to implement. Just consider the popular

notion that an open-loop system is simpler than a control system. In fact,

when you try to design an open-loop system that will actually accomplish

the same results that the control system accomplishes, you run into all

kinds of complications -- for starters, how do you get the open-loop system

to produce repeatable results in a variable environment? Before you know

it, you have festooned your "simple" open-loop system with precision

sensors and equipped it with an entire physical simulation system that any

real organism would have to tow around behind it in a cart. The open-loop

system turns out to be by far, by orders of magnitude, the more complex one.

------------------------------------

The other aspect of this problem is that the descriptive model can seem

simple and adequate when in fact it is physically improbable. This is the

problem with the concept of reinforcement. It's easy to say, operationally,

that reinforcement increases the probability of a behavior, but when you

ask HOW it could do so, the answer turns out to be so complex that it casts

suspicion on this very way of describing the situation. If we need a whole

human brain to comprehend the logic of reinforcement, how can we imagine

that a bacterium, or even a rat, is carrying out that same logic? How, in

fact, can we imagine that human beings carry it out so impeccably, when

most of the time they get their logic wrong, especially if they haven't

been trained to use it? The descriptive model, unpacked, is not consistent

with what we can expect organisms actually to be able to do.

-----------------------------------

The model you proposed works. It gets E. coli up the gradient. It even

"learns" to go up the gradient. But the mechanism that's required to do

this, I would say, is completely impossible as a generative model of E.

coli. There is no way that E. coli, or even creatures much more complex,

could carry out those operations in the computer simulation. The machinery

just isn't there. So if such creatures do accomplish this result, they

can't do it in the way that fits the descriptions of reinforcement theory.

Something else entirely, something much simpler, must be going on.

What we have to remember is that when learning takes place, we do not

actually observe any reinforcing effect of a consequence on an organism or

its behavior. We see that under certain circumstances behavior changes, and

as a result its consequences change. If we do the right experiments, we

find that these consequences are resistant to disturbance. And that is all

we observe. The idea of reinforcement is a proposed _explanation_ of these

observations, one that invokes an unobservable effect of the consequence

back on the organism, and more specifically on its behavior. I do not

believe there is any such reinforcing effect of a consequence on behavior.

That is simply a wrong conception of what is happening.

----------------------------------------

So what _is_ going on in E. coli? At best, if E. coli could "learn," all

that needs to be going on is a change in the magnitude (and sign) of the

constant k in some relationship describable, for example, as

delay = k0 - k*dnut

This change in k could be brought about by random reorganization, in a way

I have described and simulated. Neither logic nor complex sensing is

required. And there is certainly nothing in this process that could be

described as reinforcement.

I'm not claiming that this last expression is the right model or even a

feasible one. But it's simple enough to qualify as a _possible_ model.
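To make the idea concrete, here is one possible sketch (Python; the nutrient field, the discretized delay rule, and every parameter are my assumptions, not a reconstruction of any existing demo). One caveat: true reorganization makes random parameter changes at a rate driven by intrinsic error, whereas the keep-if-better loop below is a deliberate simplification so the run is reproducible:

```python
import math
import random

def nutrient(x, y):
    # Assumed field for illustration: concentration peaks at the origin.
    return -math.hypot(x, y)

def performance(k, k0=4.0, steps=1500, seed=7):
    """Final nutrient level reached by a swimmer whose time to the next
    tumble follows delay = k0 - k*dnut (clipped at zero). The seed is
    fixed so the score is a deterministic function of k."""
    rng = random.Random(seed)
    x, y = 40.0, 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    prev = nutrient(x, y)
    since = 0.0                               # steps since the last tumble
    for _ in range(steps):
        x += math.cos(heading)
        y += math.sin(heading)
        dnut = nutrient(x, y) - prev
        prev = nutrient(x, y)
        since += 1.0
        if since >= max(0.0, k0 - k * dnut):
            heading = rng.uniform(0.0, 2.0 * math.pi)
            since = 0.0
    return nutrient(x, y)

# Random changes to k, kept only when the score improves.
rng = random.Random(3)
k = 0.0
start = performance(k)
best = start
for _ in range(60):
    trial = k + rng.gauss(0.0, 2.0)
    score = performance(trial)
    if score > best:
        k, best = trial, score
print(k, start, best)
```

Given the sign convention delay = k0 - k*dnut, the search should settle on a negative k: tumble quickly when dnut is negative, wait when it is positive.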

Best,

Bill P.