[From Bill Powers (941116.0610 MST)]
Rick Marken (941114.1630) --
Let's try this again:
1) reinforcement theory is NOT the same as control theory.
2) reinforcement theory is NOT about what control theory is about --
control.
3) reinforcement theory is a completely and utterly incorrect
explanation of the behavior of organisms.
This is what we're trying to prove, isn't it? Bruce is offering
explanations in terms of reinforcement theory, trying to prove that
reinforcement theory works to explain the E. coli model of Ecoli4a. His
thesis is that yes, the PCT explanation works, but the reinforcement
approach works, too. So it does us no good to keep showing that the PCT
model works -- that has already been accepted.
What Bruce is doing in Ecoli4a is trying to model the _learning_ of the
control system on the basis of experience with the effects that
particular time delays have on dNut. When the learning is
complete, we have an E. coli model that works in the usual way, as a
control system. The desired consequence is a positive time rate of
change of concentration; the means of varying this consequence is
varying the delay before the next tumble. I think this model could work,
although the S+ and S- business strikes me as overelaborate. But I also think
the program does work as Bruce says it does, so we have to get that
straightened out. When we are sure that the verbal description fits the
model, or vice versa, then we can argue about reinforcement theory.
-------------------------------
After doing this demo, let me know if you think that reinforcement
theory merits anything other than a hearty belly laugh?
I think that's a pretty insulting thing to say to a person who honestly
considers reinforcement theory to be viable and has not yet agreed that
it is not. If we have to use emotional pressure (fear of being the only
one who doesn't get the joke) to make our point, then we're on shaky
ground. And we would be doing exactly the same thing that we complain
about: dismissing someone else's model without understanding it simply
because it seems to do something we consider impossible.
Bruce's faith in reinforcement theory does not impress me as evidence;
neither should the intensity of our belief in control theory sway him.
This controversy has to be settled on scientific grounds, not on grounds
of who can defend his faith the most vehemently.
------------------------------------------------------------------------
Bruce Abbott (941115.1400 EST) --
Kahn: Explain it to them.
(a) NutSave is the dNut value resulting from movement in the new
direction selected by the last tumble. Its value is a function of
the angle selected at random by Tumble(Angle), and thus is itself
random.
Not so. Look at DoTumble. First it calculates a new Angle. Then it sets
JustTumbled := true. Then it sets NutSave := dNut. Notice that dNut is
not recalculated; it is the value remaining after the last calculation
(dNut := NewNut - NutCon), which was done prior to the tumble. You would
get the same value of dNut if you did this assignment statement before
the Tumble(Angle) statement, because the new angle hasn't been used yet.
So NutSave is the value that dNut had just before the tumble, not just
after it.
If you wanted dNut to be the value after the tumble, you would have to
calculate NewNut using NutConcen(EcoliX,EcoliY) and the new Angle, then
get a new value of dNut from dNut := NewNut - NutCon, then say NutSave
:= dNut. But then your program wouldn't work.
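To pin the ordering down, here it is boiled to a sketch. This is my
paraphrase of the listing, not a verbatim quote:

   { inside DoTumble, in this order: }
   Tumble(Angle);          { 1. select a new direction at random }
   JustTumbled := true;    { 2. note that a tumble just occurred }
   NutSave := dNut;        { 3. save dNut -- but dNut was last computed }
                           {    BEFORE the tumble (dNut := NewNut - NutCon), }
                           {    so NutSave holds the pre-tumble rate }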
--------------------------------------
(b) NutSave is NOT the reinforcer here, as I took pains to explain.
To repeat: Certain consequences of tumbling are NOT random.
Tumbling while moving up the nutrient gradient usually makes things
worse; tumbling while moving down the nutrient gradient usually
makes things better. Better = reinforcement, worse = punishment.
OK, see below.
---------------------------------
(c) A naive e. coli does not know what the consequences of tumbling
are. All it knows is that it LIKES to go up-gradient, and DISLIKES
going down-gradient.
Right. So if it senses a negative time rate of change of concentration,
it shortens the delay by speeding up the countdown to the trigger point
(using my method), and if it senses a positive rate, it lengthens the
delay. There is nothing it can do about the tumbling itself: that is a
stereotyped episode during which some of the flagellar motors are
briefly reversed and then begin spinning forward again.
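Concretely, the countdown I have in mind looks something like this. The
names DelayTimer, TriggerLevel, Gain, and InitialCount are mine for the
sketch, not variables from Ecoli4a, and Gain is assumed small enough
that the decrement stays positive:

   { Each iteration the timer runs down toward the trigger point.  A }
   { negative dNut makes it run down faster (a shorter delay); a     }
   { positive dNut makes it run down more slowly (a longer delay).   }
   DelayTimer := DelayTimer - (1.0 - Gain * dNut);
   if DelayTimer <= TriggerLevel then
     begin
       DoTumble;
       DelayTimer := InitialCount;   { restart the countdown }
     end;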
----------------------------------
(d) To gain control over nutrient change (dNut), e. coli has only one
response it can try: tumbling.
But it has no way of varying the tumbling itself. The only output
variable it can profitably affect is the parameter in its own output
apparatus that determines the length of the delay before the next
tumble. Let's keep this straight: the "behavior" that is being varied by
the system is the delay-time, not the tumble. The tumble itself is
unchangeable: it isn't a system variable.
But it can try this response under different conditions and see
what results. So it tries tumbling when the nutrients are
increasing. Because (unknown to e. coli) tumbling produces a
random change in dNut, the usual result is that nutrient rate
gets worse. So e. coli learns not to tumble when nutrients are
increasing. It tries tumbling when nutrients are decreasing.
Because this usually improves the nutrient rate, e. coli learns to
tumble immediately when nutrients start decreasing.
Right, this is how a learning process would have to work; it is actually
what you programmed, although not how you described the program in
words. And what is learned is not "when to tumble" but the value of a
parameter that affects the length of the delay before the next tumble.
As I understand your approach, you're saying that if dNut is positive,
then a tumble is more likely to decrease dNut than to increase it
further. Similarly, the more negative dNut is, the more likely it is
that the next value of dNut will be less negative. I agree with this.
There are more possible post-tumble values of dNut below a given
positive value than above it, and more above a given negative value
than below it.
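As a rough illustration (the uniform distribution is my simplification,
not anything in the model): if a tumble leaves the new dNut equally
likely to fall anywhere between -m and +m, then starting from dNut = +d
the chance that the next value is lower is (m + d)/(2m), which is
greater than one-half for every positive d and grows as d grows; the
mirror-image argument applies to negative starting values.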
So we should write the program, as you did, to remember dNut before the
tumble and compare it with dNut immediately after the tumble (which
would require waiting one iteration's worth of swimming, then
calculating a new value of dNut to compare with the saved value). If the
old value of dNut was at or greater than the reference value (we can
assume r = zero), then a positive change in dNut should result in
increasing the factor relating delay to dNut. Otherwise it should result
in decreasing that factor.
This moves me to rearrange part of your program:
ORIGINAL:

  If DeltaNutRate > 0 then       { Nutrient rate increased by tumble: reinforce }
    begin                        { tumbling }
      If NutSave > 0 then        { If S+ present during tumble then }
        begin                    { increase probability of tumble given S+ }
          pTumbleGivenSplus := pTumbleGivenSplus + LearnRate;
          if pTumbleGivenSplus > pMax then pTumbleGivenSplus := pMax;
        end
      else                       { S- present when last tumbled then }
        begin                    { increase probability of tumble given S- }
          pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;
          if pTumbleGivenSminus > pMax then pTumbleGivenSminus := pMax;
        end
    end
  else
    if DeltaNutRate <= 0 then    { Nutrient rate decreased by tumble: punish }
      If NutSave > 0 then        { If S+ present when last tumbled then }
        begin                    { decrease probability of tumble given S+ }
          pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;
          if pTumbleGivenSplus < pMin then pTumbleGivenSplus := pMin;
        end
      else                       { If S- present when last tumbled then }
        begin                    { increase probability of tumble given S- }
          pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;
          if pTumbleGivenSminus > pMax then pTumbleGivenSminus := pMax;
        end
REARRANGEMENT:

  If NutSave > 0 then            { If S+ present when last tumbled then }
    begin
      If DeltaNutRate > 0 then   { Nutrient rate increased by tumble: reinforce }
        begin                    { increase probability of tumble given S+ }
          pTumbleGivenSplus := pTumbleGivenSplus + LearnRate;
          if pTumbleGivenSplus > pMax then pTumbleGivenSplus := pMax;
        end
      else                       { Nutrient rate decreased by tumble: punish }
        begin                    { decrease probability of tumble given S+ }
          pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;
          if pTumbleGivenSplus < pMin then pTumbleGivenSplus := pMin;
        end
    end
  else
    if NutSave <= 0 then         { If S- present when last tumbled then }
      begin
        if DeltaNutRate > 0 then { Nutrient rate increased by tumble: reinforce }
          begin                  { increase probability of tumble given S- }
            pTumbleGivenSminus := pTumbleGivenSminus + LearnRate;
            if pTumbleGivenSminus > pMax then pTumbleGivenSminus := pMax;
          end
        else                     { Nutrient rate decreased by tumble: punish }
          begin                  { decrease probability of tumble given S- }
            pTumbleGivenSminus := pTumbleGivenSminus - LearnRate;
            if pTumbleGivenSminus < pMin then pTumbleGivenSminus := pMin;
          end;
      end;
----------------------------------------------------------------------
I hope I have done the transformation correctly.
Now at last I think I can see what's going on. Consider the
major case in which NutSave > 0. This means that the previous
dNut was positive, so the most likely change in dNut,
DeltaNutRate, will be negative. The most likely change in
pTumbleGivenSplus is therefore a decrease; decreases will
outnumber increases, as I found out by counting them. A
decrease in pTumbleGivenSplus is equivalent to an increase in
the delay used when dNut is positive.
The same reasoning shows why the most likely change in
pTumbleGivenSminus is positive, meaning that the delays when
dNut is negative will be shorter.
I couldn't see this before when the logic was organized with
DeltaNutRate as the major "if" statement: the probabilities
were split between the two major cases.
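For anyone who wants to repeat the count, here is a toy tally of the
kind I mean. It is not part of Ecoli4a, and the assumption that dNut
after a tumble is the gradient magnitude times the cosine of a
uniformly-random heading is mine, purely for illustration; run it and
the decreases should clearly outnumber the increases:

  program CountChanges;
  const
    Trials = 10000;
    Grad   = 1.0;                 { local gradient magnitude }
  var
    i, SplusCases, Decreases: integer;
    OlddNut, NewdNut: real;
  begin
    Randomize;
    SplusCases := 0;
    Decreases  := 0;
    for i := 1 to Trials do
      begin
        OlddNut := Grad * cos(2.0 * Pi * Random);  { dNut just before the tumble }
        NewdNut := Grad * cos(2.0 * Pi * Random);  { dNut just after it }
        if OlddNut > 0 then                        { count only the S+ cases }
          begin
            SplusCases := SplusCases + 1;
            if NewdNut < OlddNut then Decreases := Decreases + 1;
          end;
      end;
    writeln(Decreases, ' decreases in ', SplusCases,
            ' tumbles made while dNut > 0');
  end.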
---------------------------------------------------------------
I'm still thinking about how to implement this explicitly as a
learning control system. Somehow, experience with the change
in nutrient rate following a tumble in positive and in
negative gradients should determine the form of the function
controlling the tumble interval in the lower-level system.
The model would start without a systematic relationship
between dNut and tumble interval; experience would then change
the parameters until the correct function emerged. How about
you or Tom or Rick giving it a shot?
There are several things we need to do to make the model more
realistic. The logic hides the action of the comparator because
there is either a positive error or a negative error and never
zero error. Also, as it stands we have two different delay times, which
simply trend inevitably to a limit of maximum or minimum.
The "learning" system has usurped the role of the control
system, because it is the learning system that now determines
the length of delay, rather than the error signal in the
control system. In the PCT model, the error signal varies the
delay time, making it long when dNut is positive and short when
dNut is negative. In yours, the delay time is either maximum
(positive error) or zero (negative error) after the learning is
finished, and all the control system can do is select the long
or the short delay.
What I would like to see is a model in which the error signal
continues to adjust the delay proportionally (because that is
what is observed in the real E. coli), and in which learning
adjusts the gain factor between the error signal and the changes
in the delay. Furthermore, I would like to see the learning
system vary this gain factor in a way that would enable the
model to distinguish between an attractant and a repellent,
reversing the gain for a repellent. And I would like to see
this work with just a single delay mechanism, not a different
delay for each logical condition.
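In the meantime, here is a toy sketch of the shape such a model might
take, for this paragraph and for the suggestion a few paragraphs back.
Everything in it -- the names, the constants, the cosine world, and
especially the particular rule that nudges Gain from the before-and-
after comparison at each tumble -- is my assumption, offered as a
starting point and not as the answer:

  program LearnGain;
  { dNut is the gradient magnitude times the cosine of the current    }
  { heading; a tumble picks a new heading at random.  The lower level }
  { makes the delay a proportional function of dNut through Gain; the }
  { learning step adjusts Gain after each tumble.                     }
  const
    Steps     = 20000;
    Grad      = 1.0;
    BaseDelay = 10.0;   { delay when dNut = 0 }
    LearnRate = 0.01;
    MaxGain   = 20.0;
    ValueSign = 1.0;    { +1 for an attractant; -1 stands in for whatever }
                        { intrinsic signal marks a repellent              }
  var
    Step: integer;
    Heading, dNut, NutSave, Delay, Timer, Gain, GoodDelta: real;
  begin
    Randomize;
    Gain := 0.0;                      { start with no systematic relation }
    Heading := 2.0 * Pi * Random;
    dNut := Grad * cos(Heading);
    Timer := BaseDelay;
    for Step := 1 to Steps do
      begin
        Timer := Timer - 1.0;         { count down to the next tumble }
        if Timer <= 0.0 then
          begin
            NutSave := dNut;              { dNut just before the tumble }
            Heading := 2.0 * Pi * Random; { the tumble: a new random heading }
            dNut := Grad * cos(Heading);  { dNut just after the tumble }
            { Learning: was the change good or bad, and under what     }
            { condition (sign of the pre-tumble dNut) did it occur?    }
            { For an attractant this drives Gain positive; reversing   }
            { ValueSign drives it negative, reversing the control.     }
            GoodDelta := ValueSign * (dNut - NutSave);
            Gain := Gain - LearnRate * GoodDelta * NutSave;
            if Gain > MaxGain then Gain := MaxGain;
            if Gain < -MaxGain then Gain := -MaxGain;
            { Lower level: the next delay is proportional to dNut --   }
            { long when dNut is positive, short when it is negative.   }
            Delay := BaseDelay + Gain * dNut;
            if Delay < 1.0 then Delay := 1.0;
            Timer := Delay;
          end;
      end;
    writeln('Final Gain = ', Gain:8:3);
  end.

As written this rule has no equilibrium of its own -- Gain simply climbs
until it hits the clamp, which is the same trend-to-a-limit problem
noted above -- so a real reorganizing system would need something that
stops the change once control is good enough.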
I will work on this. E. coli, as far as I know, does not learn
these mechanisms, but perhaps what we learn by doing this model
will apply to systems that do learn.
---------------------------------------------------------------
Best,
Bill P.