ECOLI4 Misanalysis

[From Bruce Abbott (941104.2020 EST)]

Rick Marken (941104.1400)

Tell the aliens that ECOLI4 still contains no reinforcement. The stimulus
effects just go back one extra trial. That is, the change in response
probability depends on the current input (dNut) and the prior input
(NutSave) but there is no differential change in the probability of the
response depending on what response was actually made after the prior
stimulus; i.e., there is no reinforcement. Because there is no
reinforcement, E. coli still does its biased random walk to the target; that
is, it controls.

I have consulted with the aliens, and they refuse to accept your analysis.
They say that the response being targeted for "reinforcement" or "punishment"
is Tumbling, and that Tumbling takes place in a Stimulus Context defined by
the sign of dNut, which is, after all, the only thing that E. coli can sense.
A positive dNut is S+ and a negative dNut is S-. They further note that the
Consequences of tumbling in the presence of these Discriminative Stimuli serve
to alter the probability of tumbling (i.e., reinforce or punish the behavior).
These Consequences which follow a given Tumble are a positive dNut
(reinforcement) or a negative dNut (punishment). Consequences of Tumbling-in-
the-presence-of-S+ change the probability of Tumbling in the presence of S+.
Consequences of Tumbling-in-the-presence-of-S- change the future probability
of Tumbling in the presence of S-. This is what is called in the aliens'
language the "three-term contingency."

So, if, while experiencing a positive nutrient gradient, a Tumbling response
is followed by reinforcement, its probability of occurrence in the (future)
presence of a positive gradient goes down (for the evolutionary reasons given
earlier). If the response produces a negative gradient, this punishes the
response, making it more likely to occur when a negative gradient is
experienced in the future. Parallel reasoning holds for the case when a
Tumbling response occurs in the presence of a decreasing nutrient gradient.

Thus the immediate Consequences of Tumbling (one time-cycle later) in the
Presence of a particular gradient alter the future probability of Tumbling
when that gradient is again experienced.
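
In outline, the rule the aliens describe comes to something like the sketch
below. The ECOLI4 code itself is not reproduced in this post, so this is only
an illustration, written in Python rather than the Pascal of the actual
program: the names update_after_tumble and STEP, the p_tumble dictionary
(standing in for pTumbleGivenSplus and pTumbleGivenSminus), and the sign of
the adjustment in the S- context are my assumptions, not the program's own
statements.

STEP = 0.05   # assumed size of each probability adjustment

def update_after_tumble(p_tumble, context_at_tumble, dnut_after):
    """Adjust P(Tumble | context) for whichever stimulus context (S+ or S-)
    was in effect when the Tumble occurred, using its Consequence: the sign
    of dNut one time-cycle later."""
    if context_at_tumble == 'S+':
        # As described above: a positive dNut after Tumbling-in-S+ lowers
        # the future probability of Tumbling in S+; a negative dNut raises it.
        delta = -STEP if dnut_after > 0 else STEP
    else:
        # One reading of the "parallel reasoning" for Tumbling-in-S-;
        # the direction here is an assumption.
        delta = STEP if dnut_after > 0 else -STEP
    p = p_tumble[context_at_tumble] + delta
    p_tumble[context_at_tumble] = min(max(p, 0.0), 1.0)   # keep it in [0, 1]
    return p_tumble

If the two probabilities start at the same value, Tumbling is initially
equally likely in either context, which is the random-walk baseline referred
to below.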

The aliens suggest that you re-read their revered Law of Effect to see if this
simulation does not meet its requirements, keeping in mind the following
definitions:

     Situation in which the response occurs: positive dNut, negative dNut
     Response: Tumbling in the Situation
     Satisfying State of Affairs: positive dNut immediately AFTER Tumbling
     Annoying State of Affairs: negative dNut immediately AFTER Tumbling

You might wish to confirm that the probabilities of Tumbling|S+ and of
Tumbling|S- begin at equal values, which would lead to a random walk if
Reinforcement and Punishment of Tumbling in the two contexts had no systematic
effects. In addition, in both contexts, tumbling CAN produce either increased
or decreased future tumbling, depending on its immediate consequences in the
next time interval. This E. coli can and will learn to move the wrong way if
by chance the wrong responses are rewarded or punished. In the long run,
however, it learns to make the responses that tend to move it up the nutrient
gradient.

Show the aliens my post (941103.2100) and tell them that they will have a
reinforcement model if they implement the method of updating
pTumbleGivenSplus and pTumbleGivenSminus (I didn't notice the "p"'s before)
that I described in that post. [Hint: in order to do this, the program will
have to keep track, not only of the prior stimulus, but of the response to
that stimulus].

I regret to say that this post to which you refer has yet to emerge from
hyperspace; I am beginning to fear that it has been swallowed by a black hole.
For this reason I am unable to show it to the aliens at this stardate. Yet
strangely, I have received Bill Powers's comment on your post and your comment
on Bill's comment on your post.

Be that as it may, the aliens say that the model DOES keep track, "not only of
the prior stimulus, but of the response to that stimulus." It is the boolean
variable JustTumbled. [The aliens are assuming that by "prior stimulus" you
mean S+ or S- and that by "response" you mean Tumbled or not-Tumbled.] Yet
the model strangely refuses to do what you claim it should do (random walk)
and instead gradually learns to stay close to the source of nutrient.
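
To make concrete what this bookkeeping amounts to, here is one full time
cycle in the same illustrative Python (again, not the actual ECOLI4 code):
the state entries play the roles of the sign of NutSave and the boolean
JustTumbled, sense_dnut and tumble stand in for the simulation's sensing and
tumbling routines, and update_after_tumble is the sketch given earlier.

import random

def one_time_cycle(state, p_tumble, sense_dnut, tumble):
    """One cycle of the loop the aliens describe; 'state' carries the
    prior stimulus context and whether a Tumble just occurred."""
    dnut = sense_dnut()                    # current change in nutrient
    context = 'S+' if dnut > 0 else 'S-'   # the discriminative stimulus now

    # If a Tumble occurred on the previous cycle, the current dNut is its
    # Consequence; adjust P(Tumble | the context the Tumble occurred in).
    if state['just_tumbled']:
        update_after_tumble(p_tumble, state['prior_context'], dnut)

    # Decide whether to Tumble now, using the probability for this context.
    state['just_tumbled'] = random.random() < p_tumble[context]
    if state['just_tumbled']:
        tumble()                           # pick a new random direction

    state['prior_context'] = context
    return state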

Tell the aliens that they'll know that they've got the reinforcement
model right when E. coli just does a random walk around the screen, sometimes
getting to the target by chance but usually not.

I relayed your message, but the aliens only smirked (or at least it looked
like a smirk) at the circularity of your statement. Despite their
technological backwardness, they immediately saw that this assertion denies
the validity of any empirical test which fails to come out as you predict.
If there is no empirical test that will lead you to abandon your belief, then
your belief is untestable and hence unscientific.

At this point I would like to speak for myself rather than for the aliens. I
would agree that a control-systems approach to this particular problem is far
simpler and, as Spock would say, logical. When I get another free moment, I
will try a control-systems simulation of a LEARNING E. coli so that we can
show the aliens just how superior our technology is.

Animistically,

Bruce