[From Bill Powers (941123.1415 MST)]
Bruce Abbott --
I didn't represent your logic correctly. Here is a handy diagram of it:
Prev beh (tumble)
   |
   |--- S(new) > S(old) -> (reward)
   |       |
   |       |--- S(old) > 0   p = 0.37  -->  + Prob  [S(new) > 0 --> next beh]
   |       |
   |       |--- S(old) < 0   p = 0.63  -->  + Prob  [S(new) < 0 --> next beh]
   |
   |--- S(new) < S(old) -> (punish)
           |
           |--- S(old) > 0   p = 0.63  -->  - Prob  [S(new) > 0 --> next beh]
           |
           |--- S(old) < 0   p = 0.37  -->  - Prob  [S(new) < 0 --> next beh]
I'm feeling even more stupid than I am, so forgive me if I fumble around
with these ideas some more. The main point is that your model works and
it works for the reasons you give. I haven't disputed that point, but I
still have this nagging feeling that there's something the matter here:
it shouldn't be possible to come up with a valid general law strictly on
the basis of describing external observables. Any general law MUST
contain a model of the organism, however well-concealed, because you
can't create a determinate model of a two-dependent-variable system
without two equations. And in all cases where output affects input via
the environment, and input affects output via the organism, you have two
equations in two unknowns: the input and the output.
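To make that concrete with a deliberately oversimplified linear pair of
equations (the forms and the constants k1, k2, r, and d are mine, purely
for illustration, and come from neither model): let the organism side be

     o = k1*(r - i)          (input i determines output o)

and the environment side be

     i = k2*o + d            (output o, plus a disturbance d, determines input i)

Neither equation alone fixes anything; solving the two simultaneously gives

     i = (k1*k2*r + d)/(1 + k1*k2)
     o = k1*(r - d)/(1 + k1*k2)

Any "law" stated only in terms of the observables i and o has already
committed itself, openly or not, to some version of the organism equation.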
-------------------------------
You say:
Discriminative = what condition was present at the time a tumble
took place (increasing or decreasing nutrient levels).
Reinforcing = what CHANGE in dNut took place following a tumble
(increased nutrient rate = reinforcement, decreased = punishment).
So, if the previous action (Tumble) made S(new) better than S(old)
then
If S(old) was favorable, INCREASE the probability of the action
during future encounters with a favorable (positive) gradient, else
If S(old) was unfavorable, INCREASE the probability of the action
during future encounters with an unfavorable (negative) nutrient
gradient.
Thus improvement in dNut following a tumble reinforces (increases
the probability of) a tumble associated with the discriminative
stimulus present at the time the tumble took place, i.e., S(old).
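Translating that rule into program form may make the bookkeeping easier
to follow. This is only a minimal sketch of the rule as I read it -- the
names, the adjustment step, and the probability bounds are mine, not
anything taken from your program:

    import random

    # Law-of-Effect rule as stated above: p_tumble holds the probability
    # of tumbling, kept separately for the two discriminative conditions
    # "gradient favorable" (True) and "gradient unfavorable" (False).
    p_tumble = {True: 0.5, False: 0.5}
    STEP = 0.05                      # size of one reinforcement adjustment

    def update_after_tumble(s_old, s_new):
        # s_old = dNut just before the tumble (the discriminative stimulus)
        # s_new = dNut just after the tumble
        key = (s_old > 0)            # which condition gets the credit or blame
        if s_new > s_old:            # improvement -> reinforcement
            p_tumble[key] += STEP
        else:                        # worsening -> punishment
            p_tumble[key] -= STEP
        p_tumble[key] = min(1.0, max(0.0, p_tumble[key]))

    def tumble_now(s_current):
        # Decide whether to tumble, given the current value of dNut.
        return random.random() < p_tumble[s_current > 0]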
I don't know what you will make of this, but in fact the logical path
you describe above is NOT the effective path -- it works the wrong way,
with a probability of 0.37. The effective path (probability 0.63) is
If the previous action made S(new) WORSE than S(old) then
If S(old) was favorable, DECREASE the probability of the action
that takes place when S(new) is favorable.
We have a peculiar situation here because an increase in the probability
of the behavior in question actually cuts short a move in the favorable
direction while it is going on. What is needed is for something to
_decrease_ the probability of the behavior (tumble) when the current
situation is favorable. In fact, in the diagram above, the most
favorable-appearing situation, in which the previous value of dNut was
favorable and a favorable change has just taken place (so the present
value is even more favorable), the probability of a tumble is
_increased_, which works in the wrong direction. By a chance property of
the situation, however, this turns out to be the least probable path, so
it is most likely that this probability will be decreased in the long
run via a different path.
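To put the same point in arithmetic, using the path probabilities from
the diagram and the adjustment step (STEP) from the sketch above: every
tumble that occurs while the gradient is favorable is followed, with
probability 0.37, by an increase of STEP in the probability of tumbling
under favorable conditions, and with probability 0.63 by a decrease of
STEP. The expected change per such tumble is about 0.37*STEP - 0.63*STEP
= -0.26*STEP -- a net decrease, arrived at mainly through the punishment
path rather than through the reinforcement path as described in words.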
-----------------------------------
The basic fact that Rick and I overlooked is that the previous value of
dNut does predict the value after a tumble. We had constructed a control
model that works strictly on the basis of the current dNut, and makes no
use either of the previous value or of the change in dNut. So when you
introduced a model that does take these variables into account, and it
worked, we were both very surprised. Your descriptions of this model, as
above, referred to the logical path that was actually least probable, so
the model did not seem to match your description. But you had found a
loophole in the argument that the control model worked where the
operant-conditioning model could not work. That was a very clever, and
annoying, thing for you to do. None of the referees who rejected Rick's
paper found this argument, as far as I know.
So in this situation we have to admit that the Law of Effect model as
you represent it does provide an explanation of the behavior of E. coli,
although perhaps not exactly the short-form explanation you give in
words. This explanation rests on a number of assumptions:
That E. coli's behavior is affected by the current rate of change of
nutrient concentration.
That E. coli's behavior is affected by the previous rate of change
of nutrient concentration, prior to the last tumble.
That E. coli's behavior is affected by the difference between dNut
before and after a tumble.
That the aspect of E. coli's behavior that is affected in each
case is the probability per unit time that a tumble will occur.
In our model, we assume the following:
That E. coli's behavior is affected by the current rate of change of
nutrient concentration.
That the aspect of E. coli's behavior affected by the current rate
of change is the delay before the next tumble.
Since your model embodies "learning", the control model would have to be
modified to include the same thing, roughly as follows:
                              Ref
                               |
                             + v
   Nut --> dNut -----------> Comp --> Gain --> Delay --> TUMBLE
                          -            ^
                                       |
     Effect of Nut good ----> increment or
     Effect of Nut bad  ----> decrement gain
Here the delay depends on the error signal, which in turn depends on the current
value of dNut; if the gain is positive, dNut will generally be kept
positive. In fact this model uses only present-time values of all
variables. This is essentially Ecol4bwp.pas. It will act properly for
both attractants and repellants (a repellant has a "bad" effect). It
will lead to a path that simultaneously avoids repellants and seeks a
nearby attractant.
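In program form, the heart of this arrangement looks something like the
following. This is only my sketch -- the names and constants are invented
for illustration, and it is not a transcription of Ecol4bwp.pas:

    # Control model: the delay before the next tumble is computed from
    # the present value of dNut only; the "learning" branch adjusts the
    # gain.  REF, BASE_DELAY, and the step size are illustrative values.
    REF = 0.5          # internal reference level for dNut
    BASE_DELAY = 10.0  # baseline interval between tumbles
    gain = 1.0         # raised or lowered by the learning branch

    def tumble_delay(dnut):
        # Error = reference minus perceived rate of change of nutrient.
        # A large error (dNut well below REF) shortens the delay, so the
        # next tumble comes sooner; a small or negative error lengthens it.
        error = REF - dnut
        return max(0.1, BASE_DELAY - gain * error)

    def adjust_gain(effect_of_nut_good, step=0.1):
        # The learning addition: raise the gain when the effect of Nut
        # is good, lower it when the effect is bad.
        global gain
        gain += step if effect_of_nut_good else -step

Whether the adjustment acts on the gain additively, as here, or in some
other way is a detail of my sketch, not of the argument.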
------------------------------------
Clearly the control model assumes less and is simpler. We have two
models to explain the same phenomenon, and neither one can be
transformed into the other: they are not equivalent. To accept the
control model is therefore to say that the behavior of E. coli is _not_
governed by reward and punishment, and that it does NOT depend on
previous values of dNut or on the change of dNut across a tumble.
It is still true that IF reward and punishment worked as stated, and
depended on the variables identified in your model, we would see the
same behavior. But it is also true that this behavior can be explained
at least equally well without assuming that reinforcement is taking
place or that previous values of dNut figure into the behavior at all.
---------------------------
Perhaps this needs to be amplified. If it were true that a pellet of
food could act on an organism in such a way as to increase the
probability of the behavior that produced the pellet of food, then it
would be proper to describe operant behavior in that way. This
description rests on certain assumptions about cause and effect and
about how behavior is created. If those assumptions were factually true,
the discriminative-stimulus-reinforcement explanation would be not just
plausible, but correct.
It is possible for logical and other relationships to be observable in
behavior without their having any causal relationship to the behavior.
If E. coli works as the control model describes it, it is still true
that the previous value of dNut predicts the next value, but this fact
does not enter into the operation of the model. It is just a fact.
Similarly, it may be true that if dNut decreases across a tumble and the
result of the previous tumble was negative, increasing the probability
of a tumble when dNut > 0 would have the right effect -- but the right
effect is actually being achieved by a different and far simpler means.
If two models are not reducible to each other, yet both explain the same
behavior satisfactorily, there are two reactions we could have. One is
to say that you can use whichever model strikes your fancy -- it makes
no difference. The other is to say that only one model can be correct,
and to look for evidence other than the predicted behavior that would
enable us to choose between the models.
In the case of modeling the real E. coli, such a discriminative
experiment has been done. The relationship of tumble delay to dNut was
measured by Koshland with the bacteria tethered (in a gel) so their
attempts to tumble had no effect on the time rate of change of repellant
or attractant that they experienced. It was found that the delay between
tumble-attempts was continuously variable and depended on the present
value of dNut, with a baseline rate of tumbling peculiar to each
bacterium (to me, evidence of an internal reference signal). So clearly
there was no change in the value of dNut from before to after a
tumble, and DeltaNutRate is ruled out as a factor in determining the
delay between tumbles. I haven't seen the raw data, but the graphs in
Koshland's book of delay versus change in nutrient concentration were
quite smooth, showing no sign that they were generated by a
probabilistic process. The curves suggest an analogue process with only
a small random component.
So given the choice between your model and the control model of E. coli,
I would have no hesitation in choosing the control model, not because
the reinforcement model fails to predict correctly, but because the
critical assumptions behind the reinforcement model have been
experimentally ruled out.
When two models predict the same behavior equally well but are not
equivalent to each other, they are making different claims about the
internal organization of the behaving system. I am of the school that
considers such a situation to be unacceptable. The only thing to do
before either model is actually used is to find some way of testing the
different claims to see which are verifiable. All of my reactions to the
operant conditioning or reinforcement model, now that I think of it,
have been along those lines. If a piece of food has a reinforcing
property today and with this organism, why does it not always reinforce
behavior tomorrow, or with a different organism? This is my way of
testing the idea that there is something about the food pellet that is
inherently reinforcing, that gives the food a deterministic effect on
the organism.
Something similar, done by someone else, accounts for the fact that I
always use random disturbances in tracking experiments, and when
experimenting for real, change to a new disturbance pattern with every
trial. I once submitted a paper on tracking using sine-wave
disturbances, and was criticized because the subjects might be
memorizing the "predictable" disturbance patterns (over a one-minute
run, when the actual pattern of disturbances was never visible). To rule
out memorization, however implausible such an explanation, I switched to
random disturbances where nobody could memorize the pattern, and began
changing the pattern on every run so that even if they did somehow
memorize it, it wouldn't do them any good.
-----------------------------------------------------------------------
Best (dNut > 0),
Bill P.