[From Bill Powers (941101.1150 MST)]
Rick Marken (941101.1100) --
Your explanations and discussions of the E. coli effect are clear; they
would work if the listener knew something about control theory (as Bruce
Abbott and Sam Saunders do). Where they do not work is with people who
have not yet understood control as a phenomenon. Bruce finds it easy to
translate from TRT (traditional reinforcement theory) to PCT -- even
though we're not quite satisfied yet with the translation -- where
someone else, like your referees, would not see the control process as
controlling perception, but only as controlling behavior.
People who don't understand control tend to set up experimental
situations that make control extremely difficult or even impossible. As
I've mentioned, when an experimenter maintains a rat at 80% of free-
feeding body weight, the result is to make the weight control system
ineffective. This means not only that this control system can't be
studied under these experimental conditions, but that the control system
is going to be in a continuous state of extreme error, having powerful
but unknown effects on the control systems that are subordinate to it,
such as the food-getting systems. And of course, reorganization is
likely to be occurring, which probably explains the 80% criterion: it
would speed learning. However, since the learning does not remove the
intrinsic error, the reorganization would continue and add noise to the
data.
One reason that reinforcement theorists were unfazed by the E. coli
paper was that they didn't see what was controlled: the sensed time rate
of change of distance to the goal (in the human case). The E. coli
situation makes this hard to see, because the random tumbling puts a
very large amount of noise into the system, and because the control
system is a one-way system (you can't have a negative tumbling rate).
The control phenomenon is much easier to see on relatively easy FR
schedules -- but there the TRT people have recourse to
"satiation" to explain the effects, and they don't apply disturbances to
reveal that the rate of reinforcement is actually under control.
I've heard it said that there really aren't any "crucial experiments"
that decide the issue between rival theories. Actually, there are in the
case of control theory: it's easy to show that the input, not the
output, is under the organism's control. However, to understand the
demonstration you have to understand control theory just as well as the
theory you've been using, and it's very unusual for a person who
believes in a different theory (and believes that it explains behavior
correctly) to stick with control theory long enough to become equally
expert at using it. Once that stage is reached, no persuasion is
necessary. But that takes a few years of trying. Some stay the course
and some don't.
···
----------------------------------------------------------------------
Tom Bourbon (941101.1620) --
Right, Janet Spence it was. She was a nice lady, until I sent that first
paper to Science (12 years after I'd left her classes). She was one
reviewer of it, and she said that the paper on the face of it didn't
belong in a scientific journal; it was an insult to science. Her review
was so nasty that the editors sent me a copy of her comments (not then
customary) with a remark that this represented a rather extreme reaction
which they were discounting.
I second your seconding of the proposal for analog research with
animals. Actually, TRT researchers have done lots of work with analog
control systems in animals, but those efforts have been more like
parlor-game demonstrations or commercial applications; instrumenting
behaviors like
walking in a figure-eight or playing ping-pong is pretty difficult. The
existing physical apparatus is best suited for measuring rates of
contact closures, so it pretty much dictates the kinds of behavior that
will be explored. The only thing you can vary about a contact closure is
when it closes, so rates or reciprocal rates are about the only usable
measure. I guess some researchers have put force-transducers under the
keys, but I don't recall that they ever had the animals control the
force.
-----------------------------------------------------------------------
Bruce Abbott (941101.1720 EST) --
I'm disappointed at your judgement that "I'll note that the model as it
now stands produces behavior unlike that actually observed." I hope you
meant "unlike in some respects." Actually, it's very like a certain kind
of behavior called "scalloping." See p. 104 in _Cumulative Record_, Fig.
2. After each reinforcement, there is a period of no responding, and
then the response rate starts low and rises toward a high value just at
the next reinforcement. This isn't self-evident on the plots in my model
because I just show a vertical line at each response; you have to look
closely to see that the lines get closer and closer together just before
the next reinforcement. If you presented them as a cumulative record
they would look just like Fig. 2, with the gain, reward size, and decay
rate set appropriately. By adjusting parameters you can get all degrees
of scalloping from none at all to extreme and any length of PRP.
Performance of well-practiced pigeons under typical deprivation
conditions generally consists of a brief acceleration to a high
terminal rate of about 4 to 6 responses per second, which is then
interrupted by the delivery of the reinforcer.
You can make the model's limiting rate anything you please by putting a
limit on the size of the error signal or on the maximum rate of
pressing. I haven't done that in Operant2, but it would be easy to add.
Right now the limiting rate is one press per iteration, which, with dt
set to 0.1 sec, is 10 per second. You'd have to run the plot faster to
see the individual responses.
Ignoring the time required to visit the food magazine and ingest the
grain reinforcer, there is usually little post-reinforcement pause as
long as the fixed ratio is relatively low. At higher ratios (e.g., FR-
40) a brief post-reinforcement pause (PRP) of a few seconds develops.
As the ratio requirement is increased, the length of the PRP
increases.
This would depend greatly on the experimental conditions: how much food
the animal gets on each reinforcement would be the main factor. You can
see this by imagining that the animal gets as much food from one
reinforcement as it normally gets in a bout of feeding. Then even at
high ratios there would be a long PRP. My model assumes that the animal
gets all the reinforcer (a fixed amount each time) in a single instant
and that consuming it and registering its effects takes no time. If the
reward size is small enough, there is never any PRP.
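To make this concrete, here is a rough sketch of the kind of loop I'm
describing -- in Python rather than the Pascal of Operant2, and with
made-up parameter names and values, so don't take it as the actual
Operant2 source. With a small reward size there is never any pause;
make reward_size large enough to push q1 above the reference and a
post-reinforcement pause and scalloping appear.

# Rough sketch of a one-way control system on a fixed-ratio schedule.
# Not the Operant2 source; parameter names and values are illustrative.

def run(gain=0.05, decay1=0.05, reward_size=2.0, ratio=10,
        ref=10.0, dt=0.1, steps=5000):
    q1 = 0.0                 # sensed effect of the ingested reinforcer
    presses = 0              # presses since the last reinforcement
    press_times = []         # time of each press
    reinf_times = []         # time of each reinforcement
    accum = 0.0              # accumulated "press tendency"
    for i in range(steps):
        t = i * dt
        q1 -= decay1 * q1 * dt           # effect of reinforcer decays away
        error = max(0.0, ref - q1)       # one-way error: zero when q1 > ref
        accum += gain * error * dt       # press rate proportional to error
        if accum >= 1.0:                 # at most one press per iteration
            accum -= 1.0
            presses += 1
            press_times.append(t)
            if presses >= ratio:         # ratio completed
                presses = 0
                q1 += reward_size        # whole reward delivered instantly
                reinf_times.append(t)
    return press_times, reinf_times

presses, reinfs = run()
print(len(presses), "presses,", len(reinfs), "reinforcements")

Plotting press_times as a cumulative record should show the scalloping
I described: after a large enough reward the error sits at zero for a
while, then grows as q1 decays, so the presses bunch up just before the
next reinforcement.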
----------------------------------------
The point of the model is to see how its parameters have to be set to
reproduce a given behavior under given circumstances. If the same model
can fit a variety of behaviors, as this one can, but the parameters have
to be changed when the circumstances change, this leads to looking for a
simple addition to the model that will change the parameters
appropriately. Also, I'm not claiming that this model is final -- there
may be better forms that will work over a wider range of conditions.
----------------------------------------
One very important factor needs to be known: which regime the animal was
in. Did behavior decrease as the ratio decreased, or increase? From the
Staddon/Motherall data, it's clear that the control-system relationship
is seen for the easier schedules, but the reverse relationship is seen
for the most demanding ones. What is "demanding" will depend a great
deal on the reward size for a given schedule. Given the standard idea
that an increase in reinforcement increases behavior, wouldn't you
expect experimenters to select conditions that produce the relationship
on the left in Motherall's data? If the experimenters reported data from
which obtained reinforcement rate can be found, then we could tell
whether the animal was in the range where we expect normal control, or
in the other range.
What you describe sounds like behavior "behind the power curve", where a
decrease in amount of reinforcer leads to a decrease in behavior. I
haven't tried the more general model used for the Motherall data in
combination with the Operant2 program, using an error curve with a
maximum in it and a falloff of behavior at high error levels.
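By "an error curve with a maximum" I mean an output function along
these lines; the exact form here is just an assumption for
illustration, not something taken from Operant1 or the Motherall fit.

import math

def output_rate(error, gain=1.0, e_peak=5.0):
    # Output grows roughly linearly with error at first, peaks near
    # e_peak, and falls off again at very large errors -- the
    # "behind the power curve" region.  The form is only illustrative.
    return gain * error * math.exp(-error / e_peak)

for e in (1, 5, 10, 20, 40):
    print(e, round(output_rate(e), 2))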
------------------------------------------
Also, we REALLY need to know what proportion of the time the animal
actually spends in front of the bar or key, pressing.
------------------------------------------
I'm surprised that blockage of the food delivery resulted in more
scalloping. The lore I had learned was that scalloping is primarily an
effect of too large a reward size.
------------------------------------------
I'm delighted with the quote from Ferster: reference level! Of course he
meant a reference level that HE used in evaluating the weight. But it's
a neat coincidence.
------------------------------------------
Overall response (and reinforcement) rates thus decline with increases
in ratio requirement
Ah, I missed this the first time through. This says we are definitely in
the range on the left side of the Motherall data, and far outside the
range where we would expect the normal control relationships to be seen.
In the control range, reinforcement rates do decline with increases in
ratio requirement, but behavior rate RISES. You see the positive
relationship between reinforcement rate and behavior rate only under
extreme schedules where the payoff in frequency (or I presume amount) of
reinforcement per behavior is very poor.
I really don't want to start out by modeling behavior over such a wide
range. In the Motherall data, 6 of the 8 points are in the normal
control range and a linear control model fits them quite well, as near
as I can tell. The other two points occur only under the most extreme
conditions, and that doesn't strike me as a good way to start the
modeling.
In checking the predictions of Operant2 against data, I think we should
select data in which the correct relationship is seen -- where behavior
decreases as the ratio decreases and reinforcement increases. As you can
see from the Operant1 model, accounting for the whole range requires a
more complicated nonlinear model, which I would like to leave for later.
The Mechner data are interesting, but would require a much more complex
model to explain them.
--------------------------------------------
In your final paragraph:
As I'm sure you know, the reason is that the disturbance is acting to
increase the error signal on the same time-scale on which the responses
are occurring on the ratio.
There is no disturbance in this model, except what is implied by the
decay rate of q1. Is that what you meant?
In the real rat what probably happens is that the error builds
gradually, finally crossing a threshold that then triggers food-seeking
behaviors.
You're thinking of a second level of control that senses a longer-term
effect of the ingested reinforcer. That's why I mentioned a distinction
between a short-term "appetite" control system and a longer-term
"nutrition" control system. You can make the error build up as gradually
as you like by making decay1 smaller (like 1e-6) and reducing the reward
size accordingly. The "threshold" here is simply the fact that when the
perceptual signal is greater than the reference signal by any amount,
the error signal remains at zero. This is a one-way control system;
errors are positive only for inputs less than the reference level. You
could also try an actual error threshold: no behavior below a certain
amount of error.
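In code, the one-way comparator with an optional error threshold is
just this (a sketch; the threshold is the addition you could try):

def one_way_error(ref, p, threshold=0.0):
    # One-way control: the error is positive only when the perceptual
    # signal is below the reference; otherwise it is zero.  With a
    # nonzero threshold there is a dead zone: no output until the
    # error exceeds the threshold.
    e = ref - p
    return e if e > threshold else 0.0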
One reinforcer is not sufficient to cancel this error. This would lead
to the rat's emitting a series of ratio runs, finally canceling the
perceptual error, allowing the rat to go back to "other" behavior.
Yes. A second level of control would accomplish this. But we still need
to account for the scalloping effect that occurs after every
reinforcement, which implies (when it is seen) that individual
reinforcements are more than sufficient to cancel whatever error exists,
until the decay rate brings the sensed input down below the reference
signal again. The higher-level slower system will account for the
"bouts" of feeding, but not for the scalloping during a bout.
------------------------------------
Let me know when you have the new demo program running. Even if you
understand every nuance of it, discussing it will help people who are
less familiar with PCT. I'll post the executable code on Bill Silvert's
server so others can download it and run it (on PCs, of course -- sorry,
Mac users).
-----------------------------------------------------------------------
Sam Saunders (941101.1715 EST) --

An example of a state-change as reinforcer would be from some of the
work of Lattal and others, using a change in the rate of non-contingent
food ("VT schedule") to support key pecking.
The direction of this effect will depend on whether you're in the
control region (the right side of Motherall's plot) or the other region
(see above). If you're in the control region, non-contingent
reinforcements will depress the key pecking rate. In the other region
they will increase it. A non-contingent reinforcement is an example of a
disturbance in the control model.
One thing that seems to me to be missing in the considerations of
both Bill and Rick, on the one hand, and Bruce, on the other, is
the stimulus side of the "three term contingency". In the E-coli,
and in many other situations, there are both stimulus and
"reinforcing" aspects to the same event. An alternative model
might be the following:
Two 'stimulus contexts':
A. Increasing gradient
B. Decreasing gradient
In the E. coli model that behaves like the real organism, we consider
only stimulus effects. The time rate of change of concentration
experienced by the bacterium is represented as a continuous perceptual
signal proportional to that rate (the signal is constant if the rate of
change is constant). Daniel Koshland determined by perfusion experiments
with tethered bacteria that it is in fact the time rate of change of
concentration on which the interval between tumbles systematically and
reproducibly depends.
In the model, the perceptual signal is compared with a fixed reference
signal, and the error signal is converted through a gain factor into the
size of an increment of a timing variable; negative errors reduce the
increment, positive errors increase it (the increment is always
positive). When the timing variable reaches a fixed limit, it is reset
to zero and a tumble occurs. That is the totality of the mechanisms in
the model. Perhaps you can find "reinforcement effects" in here in
addition to the stimulus effects, but we did not put them in on purpose.
Two parameters determine the performance: the reference signal setting,
and the gain applied to the error signal to determine changes in the
size of the timer increment. It's not hard to fit the model to human
behavior in Rick's E. Sapiens experiment, so the model approaches the
target in about the same time that the person does, and by a roughly
similar-looking path with about the same number of tumbles. Of course
with the random element we can't reproduce the _exact_ path.
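For anyone who wants to see the whole mechanism in one place, here is a
bare-bones sketch of that model in Python, with "concentration" stood
in for by closeness to a target point; the parameter values are
illustrative, not the ones fitted to Rick's data.

import math, random

def e_coli(gain=20.0, ref=0.5, limit=20.0, base_inc=1.0,
           speed=1.0, dt=0.1, steps=20000):
    x, y = -50.0, 0.0                      # start away from the target at (0, 0)
    angle = random.uniform(0, 2 * math.pi)
    timer = 0.0
    last_dist = math.hypot(x, y)
    tumbles = 0
    for _ in range(steps):
        x += speed * math.cos(angle) * dt
        y += speed * math.sin(angle) * dt
        dist = math.hypot(x, y)
        p = -(dist - last_dist) / dt       # rate of change of "concentration"
        last_dist = dist
        error = ref - p                    # favorable motion -> small error
        inc = max(0.0, base_inc + gain * error)   # increment never negative
        timer += inc * dt                  # error sets the timer increment
        if timer >= limit:                 # timer reaches its limit: tumble
            timer = 0.0
            angle = random.uniform(0, 2 * math.pi)
            tumbles += 1
        if dist < 2.0:                     # reached the neighborhood of the target
            break
    return dist, tumbles

print(e_coli())

When the current direction carries the model toward the target the
increment is small, so tumbles are postponed; when the gradient turns
unfavorable the increment grows and a tumble comes quickly. That is the
totality of it.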
In A, a 'flip' will be followed by either an equal (increasing
gradient) or worse (decreasing gradient) consequence, and so would
be expected to decrease in probability.
In B, a 'flip' will be followed by either equal (decreasing
gradient) or better (increasing gradient) consequence, and so
should increase in probability.
Our model alters the timer increment continuously before the next tumble
on the basis of the present error; this modification continues to occur
all during the delay, so the increment can decrease and then increase
again if the gradient reaches a maximum and starts to go negative before
the next tumble. This gives the same effect as varying the probability
of a tumble, except that the process is systematic, not stochastic.
Actually, I don't know how you would physically create a change in
probability except by some such mechanism.
If a flip is FOLLOWED by an equal or worse consequence, this can have no
effect on the delay before THAT flip, which has already occurred. You
could only use that information to affect the NEXT delay. But a
favorable result is just as likely to occur after the flip following a
short delay as after a long one, so over the long term there would be no
selection for any particular delay. The flips produce truly random
directions of swimming; the result after the flip is no indication of
what the result was before it.
Here's a little thought-problem. It is true that the longer delays go
with the most favorable directions of travel, on the average. So if you
gradually adjust the probabilities of delays of various lengths
according to the gradient existing during those delays, you will
eventually end up with the longest delays being the most probable. But
carry this to an extreme: let us say that the probability of the longest
delay becomes 1, and that of all other delays becomes 0. With any fixed
delay, however, you will get a random walk. So this method fails _a
priori_. End of thought-problem.
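A quick numerical check of the thought-problem (again just a sketch,
with arbitrary values): hold the delay between tumbles at some fixed
constant and measure net progress toward the target over many runs. The
mean progress is not reliably positive, whatever fixed delay you pick.

import math, random

def net_progress(fixed_delay=5.0, speed=1.0, dt=0.1, steps=5000):
    x, y = -50.0, 0.0
    angle = random.uniform(0, 2 * math.pi)
    since_tumble = 0.0
    start = math.hypot(x, y)
    for _ in range(steps):
        x += speed * math.cos(angle) * dt
        y += speed * math.sin(angle) * dt
        since_tumble += dt
        if since_tumble >= fixed_delay:    # fixed delay, long or short
            since_tumble = 0.0
            angle = random.uniform(0, 2 * math.pi)
    return start - math.hypot(x, y)        # positive = net approach to target

runs = [net_progress() for _ in range(200)]
print("mean net progress:", sum(runs) / len(runs))   # no drift toward the target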
If a long delay ended with a flip that produced an improvement in the
situation, basing the next delay on the outcome of that previous delay
or any number of previous delays and consequences will get you nowhere;
that is one of the reinforcement models I tried, and all you get is a
random walk. The delay before the next flip can't take anything before
and including the previous flip into account. The delay has to be based
STRICTLY on the PRESENT-TIME difference between the sensed rate of
change of concentration and the reference level (which you can set to
zero as Bruce did; in E. coli and human subjects it is not zero -- that
is, setting it to zero does not give the best fit to observed
performance).
I suppose it is incumbent on me to render the above in a working
simulation.
Yes. Go ahead and try it, by all means. At least this will mean getting
your graphics going so you can see what is happening. Send us a copy of
your source so we can run it, too. You will find that there is no way to
use information about the previous delay in computing the present delay
-- no way, that is, that will have the slightest effect on the results.
One thing to beware of in modeling (in my vicinity, anyway) is putting
complex functions into the system just because they will create the
required result. An example would be saying that if the previous delay
resulted in a favorable outcome after the tumble, increase the
probability of that delay's occurring again. This won't work, of course,
but the point is that by putting such a computation into the model, you
are claiming that the _bacterium_ is doing this computation. Whatever
you put in the model, you are asserting is done by the bacterium. So it
is incumbent on the modeler to pick computations representing processes
that we can believe actually go on inside E. coli.
Looking forward to your results.
----------------------------------------------------------------------
Best,
Bill P.