[From Bill Powers (950714.1205 MDT)]
Bruce Abbott (950714.1215 EST) --
So we agree that the mechanism involved in behavior maintenance is
different from the mechanism involved in "selection" of behavior, and
that the term "reinforcement," if used at all, should not be used for
both. That's a big step forward.
This latter situation is one in which I have always felt that a
"regulatory" model (as I used to call control-system models)
provided a far simpler account of the observations.
What PCT has to offer is the idea that it is perception not behavior
that is regulated, and that behavior (visible action) is simply the
means by which perception is controlled. The behavior varies with every
disturbance, whereas the perception (known to us as an input to the
organism) remains under regulation -- control.
In the Staddon-Timberlake approach, this issue gets confused by their
use of "contingent behavior" in place of "reinforcement." It is true
that without eating behavior, the reinforcers produced by the
instrumental behavior would not be consumed, but the goal of the
instrumental behavior is not to produce the contingent behavior; it is
to produce the ingestion of food pellets that the contingent behavior
produces. If the instrumental behavior caused the food to be injected
directly into the rat's mouth, no contingent behavior would have to
interrupt the instrumental behavior.
Getting back to the issue of "devil's advocacy," yes, I'm just
trying to convey a general notion of how reinforcement theorists
would approach the problem posed by data such as those relating
rate of responding to rate of reinforcement on various schedules.
Staddon, Timberlake, Allison, Mazur, and others have proposed
specific approaches, and there is currently much debate about the
merits of each relative to the others as well as empirical research
designed to test implications of these models.
The problem with the models I have seen is that they aren't physical
models. Staddon approaches the problem as one of curve-fitting on a
Cartesian plot -- he actually deduces something resembling a reference
level, but as a geometric distance between a point and a curve.
Timberlake does nearly the same thing.
The result of this way of modeling is that the parameters of the model
have no physical significance, not even a proposed physical
significance. The main aim seems to be to find a mathematical form that
will have the same shape as the data plots. This is not modeling, at
least not the kind that goes on in engineering and PCT.
In PCT modeling, every function, every connection, every variable, and
every parameter has at least a proposed physical meaning of its own. The
system equations come out of putting the equations describing each
component of the system into a complete system description. Each
component of the model (such as input function, comparator, and output
function) is intended to represent some actual physical process in the
real system, carried out by physical means.
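To make the contrast concrete, here is a minimal sketch of that component-by-component style (my own illustration, not code from any published model; the function names, the leaky-integrator output, and all parameter values are assumptions). Each function stands for a proposed physical process, and each parameter is a proposed physical property:

```python
def input_function(qi):
    # Sensor: transduces the input quantity into a perceptual signal.
    # Identity here; physically, a receptor with some calibration.
    return qi

def comparator(ref, perc):
    # Neural subtraction of the perceptual signal from the reference.
    return ref - perc

def output_function(err, out, gain=50.0, leak=0.01):
    # Leaky integrator: error drives the output, which relaxes toward
    # gain * err.  Both parameters are proposed properties of the
    # output stage, not curve-fitting constants.
    return out + leak * (gain * err - out)

def simulate(steps=400, ref=10.0):
    out, trace = 0.0, []
    for t in range(steps):
        dist = -5.0 if t < steps // 2 else 5.0   # external disturbance
        qi = out + dist                          # environmental feedback link
        perc = input_function(qi)
        err = comparator(ref, perc)
        out = output_function(err, out)
        trace.append((perc, out))
    return trace

trace = simulate()
p1, o1 = trace[199]   # settled, disturbance = -5
p2, o2 = trace[-1]    # settled, disturbance = +5
```

Run it and the point about perception versus behavior falls out directly: the perceptual signal sits near the reference in both halves, while the output changes by roughly the negative of the disturbance change.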
This is one reason I had reservations about your Ecoli4 model. The logic
circuits I could accept as being physically realizable, but the
reinforcement process, which changed a probability, simply didn't look
physical to me. A probability isn't a physical entity, and changing a
probability isn't a physical operation. I would have much preferred to
see some output device described in a way that _behaved_ as if a
probability were being changed, but which actually operated in ways that
ordinary neural circuits could carry out in a simple way. The
description in terms of probability changes doesn't propose any physical
means of implementing those changes.
In the Hanson-Timberlake model, we have a reference level s for a time-
allocation being set, and an error signal is described as s - t, where t
is the actual time allocation (equations (3)). A pair of equations
involving such error signals is solved simultaneously to predict the
time allocations between two behaviors (instrumental and contingent).
But this is not a model at all. How is a reference level for a time
allocation compared with an actual time allocation? Somehow the actual
time allocation must be sensed, and compared against the reference
allocation, and the difference must be turned into some effect on action
that results in changing time allocation. But that whole part of the
Hanson-Timberlake model is simply omitted; it is as though enough
information is contained in the comparison process alone to allow
predicting behavior! There is clearly no sense of a physical model here,
one in which all processes that lead to behavior must be accounted for.
The schedule of reinforcement that gives the instrumental behavior an
effect on reinforcement rate is not even used in this model!
All that is really going on is fitting of an arbitrarily-selected
mathematical form to the data. There is no physical reason for choosing
this model, and its parameters have no physical meaning, no meaning in
terms of properties of a behaving system's components. The authors have
set up two arbitrary differential equations and found a solution for
them, with no reason to think that these equations have anything to do
with processes in the animal (other than the fact that the resulting
equations can be fit to the data using 6 parameters).
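For contrast, here is what even a minimal physical version of the omitted machinery might look like (my own sketch, emphatically not the Hanson-Timberlake model; the leaky-average sensing, the bang-bang output rule, and all parameter values are assumptions). The point is that the actual time allocation must be sensed somehow, compared with the reference s, and the error turned into action:

```python
def allocate(steps=4000, s=0.3, alpha=0.05):
    # s: reference fraction of time to spend on activity A.
    # t_hat: the *sensed* time allocation -- a leaky average of recent
    # engagement in A, a plausible neural running estimate.
    t_hat = 0.0
    choices = []
    for _ in range(steps):
        err = s - t_hat                   # comparator
        x = 1.0 if err > 0 else 0.0       # output: engage in A (1) or B (0);
                                          # bang-bang rule for simplicity
        t_hat += alpha * (x - t_hat)      # sensing the actual allocation
        choices.append(x)
    return t_hat, choices

t_hat, choices = allocate()
frac_A = sum(choices[-2000:]) / 2000.0   # realized time allocation to A
```

The realized allocation hovers near the reference because every part of the loop is specified: sensing, comparison, and the conversion of error into action.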
And now that I think of it, the cost
should really go as the square of the behavior rate, not the error --
I'll try that when I'm finished here.
Good, I'm anxious to hear of the result.
It didn't work. Making the cost depend on the square of the behavior
rate and using it to adjust the gain of the system produced a flattening
of the curve on the left, but no downturn no matter what parameter
values I set.
We have to be concerned not only with getting a downturn where it
actually appears, but with making the entire plot scale up and down with
the reference level (as, apparently, it does). My original model will do
this if the threshold for the downturn equation is a fixed fraction of
the reference signal. But I can think of no simple mechanism that would
have such a result.
In the data cited by Timberlake, the authors were taking a
regulatory view of feeding and drinking and so offered food (or
water) under nondeprivation conditions within what was essentially
the home cage, 24 hrs/day.
The problem with the experiment for simple modeling is that the "offer"
of food does not determine how much food will be ingested per
reinforcement. So we can't simply say that the input is the rate of
reinforcement times the reward size. At high rates of reinforcement, the
reward size is small because the animal doesn't eat very fast during the
access. As the schedule becomes harder, the frequency of reinforcement
falls, but the animal eats faster, consuming more per access; the net
amount of reinforcer per unit time does not fall in proportion to the
fall in reinforcement rate. As I said, this calls for a two-level model
even to account for the basic behavior. Why do these people insist on
investigating such COMPLEX behavior, before they can even model simple
behavior?
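A bit of arithmetic, with numbers invented purely for illustration, shows why rate times reward size fails as the input measure:

```python
# Invented illustrative numbers: access frequency and grams ingested
# per access under a rich (easy) versus a lean (hard) schedule.
easy = {"accesses_per_hr": 12, "grams_per_access": 0.5}
hard = {"accesses_per_hr": 4,  "grams_per_access": 1.2}

intake_easy = easy["accesses_per_hr"] * easy["grams_per_access"]
intake_hard = hard["accesses_per_hr"] * hard["grams_per_access"]
# Access rate falls to one third, yet net intake falls only to 80% of
# its former value, because consumption per access partly compensates.
```

So "reinforcement rate" and "net reinforcer per unit time" come apart, and the amount ingested per access is itself a controlled outcome, which is what forces the two-level model.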
In one experiment they could earn access to the food or water by
lever-pressing and could then eat/drink as much as they desired in
that bout of access. Failure to access the food/water for 10
minutes ended the bout, after which the rats had to respond on the
lever again to gain access... The authors suggested that this
situation represents a laboratory analog to what the rats would
have to do in the wild. Lever-pressing for access is equivalent to
"foraging" for food or water; once a source was obtained the rat
would then consume a "meal" or "drink" of a size of its own
choosing.
According to our new understanding, this isn't a good analogy. Foraging
is like casting about at random until some behavior is found that
produces food, and then spending more and more time at that behavior.
That's like the "selection" phase. The earning of access by producing a
single kind of well-learned behavior is not like foraging.
The authors' analysis would suggest that meal size or amount drunk
(per bout) and access frequency are both controlled perceptions in
the service of a higher-level system controlling overall
consumption. Constraint of one system (e.g., meal frequency) leads
to error in the higher system which alters the reference level for
the other lower-level system (meal size; drinking bout size) so as
to correct that error; the result is a "compensatory" increase in
meal size or drink size.
Excellent. Good possibility. Actually, I think we will eventually be
able to include the "selection" aspect of reinforcement in this same
picture, and thus replace reinforcement by control entirely.
-----------------------------------
O.K., so you're suggesting that on interval schedules, results such
as those reported by Catania and Reynolds indicate that at the
smallest intervals tested, the pigeons were already operating on
the left side of the curve.
Yes, that's what I suspect. If the reward size was large enough, then
reducing the interval requirement even more would have resulted in even
more reinforcement, and eventually we would be on the side of the curve
where the obtained reinforcement rate is approaching the reference
level. In that region, behavior rate falls as reinforcement rate
increases.
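The algebra of the left limb can be sketched in a few lines (my own illustration, using a ratio schedule for simplicity, which is an assumption -- the Catania and Reynolds data come from interval schedules -- and a linear organism, which is another):

```python
def steady_rates(ratio, ref=1.0, gain=100.0):
    # Organism: behavior rate b = gain * (ref - r)   (assumed linear control)
    # Schedule: reinforcement rate r = b / ratio     (ratio feedback path)
    # Solving the two equations simultaneously for the steady state:
    b = gain * ref * ratio / (ratio + gain)
    r = b / ratio
    return b, r

# Sweep from a hard schedule to an easy one.
rows = [(m,) + steady_rates(m) for m in (100, 50, 20, 10, 5, 2)]
```

As the ratio requirement shrinks, the obtained reinforcement rate climbs toward the reference while the behavior rate falls: exactly the region where behavior rate and reinforcement rate move in opposite directions.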
The problem with this is that you are using the term "S-R" to refer
to all systems of lineal causality. This is guaranteed to confuse
behaviorists, especially those in EAB, and leads to your being
accused of being about 50 years behind the times. S-R theory as
used in psychology refers to a reinforcement theory (especially the
one developed by Clark Hull and Kenneth Spence) which assumed that
all behavior is essentially a series of reflexive responses to
eliciting stimuli. A specific stimulus was supposed to produce a
specific response just as in the doctor's knee-jerk reflex. This
doctrine was repudiated long ago, although lineal causality is, as
you note, still very much with us.
Yes, this is the situation. When S-R theory was "repudiated," the basic
model of lineal causality was retained. The main change Skinner made was
to say that a _class_ of stimuli leads to a _class_ of responses. But
that didn't change his basic understanding that stimuli enter the
nervous system, pass through it and modify it, and cause behavior. When
I say that S-R theory is alive and well, this is what I am talking
about. The squabble between Skinner and reflexologists looks to me like
the squabbles between Methodists and Baptists; the similarities make the
differences look trivial. But it's hard to see them as trivial without
some very strongly contrasting view with which to compare them -- like
comparing Methodists, Baptists, or Catholics with Buddhists.
-----------------------------------
Models such as Allison's response deprivation model are equilibrium
models but do not control because they do not include this active
opposition element. It should be relatively easy to show which is
correct (equilibrium versus control) by determining whether there
is active resistance to disturbances.
Right. We have to get disturbances into the act.
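The discriminating test can be sketched directly (my own illustration; both toy models and all parameter values are assumptions). Apply the same disturbance to each and watch the variable of interest:

```python
def control_model(dist, steps=400, ref=10.0, gain=50.0, leak=0.01):
    # Active control: the output changes so as to cancel the disturbance.
    out = 0.0
    for _ in range(steps):
        qi = out + dist
        out += leak * (gain * (ref - qi) - out)
    return out + dist            # final value of the controlled quantity

def equilibrium_model(dist, steps=400, k=0.5, rest=10.0):
    # Passive equilibrium: the variable relaxes to a balance point that
    # simply shifts with the disturbance; nothing opposes the shift.
    q = rest
    for _ in range(steps):
        q += k * ((rest + dist) - q)
    return q

shift_control = control_model(5.0) - control_model(0.0)
shift_equilib = equilibrium_model(5.0) - equilibrium_model(0.0)
```

The control system nearly cancels the disturbance; the equilibrium system passes it through in full. That asymmetry is what a disturbance-based experiment would measure.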
-----------------------------------------------------------------------
Best,
Bill P.