ratio model; comparing PCT and EAB

[From Bill Powers (950706.0905 MDT)]

Bruce Abbott (950705.1815 EST) --

     Looks like I need to clarify this satiation concept. I'm viewing
     the control system involved as comprising two levels. The top
     level involves something like nutrient level and/or stomach
     loading; let's just call it "hunger" perception. This perception
     varies from less than zero to some maximum positive value. Zero
     represents the absence of hunger and the beginning of satiation
     (as satiation increases the values go negative); it is also the
     presumed set point of the hunger system. [Note: This is a
     considerable simplification of the real system/systems involved.]

     Food deprivation increases the hunger level. As hunger rises above
     its reference level this produces a rising error signal, which
     represents "hunger motivation." This signal serves as the
     reference for the lower-level system, which sets the desired rate
     of food consumption (which is controlled via rate of lever pressing
     on the ratio schedule).

     If rate of food consumption is low enough relative to the rate at
     which the food is "burned," deprivation level will increase over
     time and hunger perception will increase toward its maximum (if not
     already there). At typical schedule parameters, however, the rate
     of food consumption is large enough to reduce deprivation level
     over time. Thus lever-pressing serves to reduce error in the
     "hunger" system (but not rapidly) over the course of a session.

So far so good, but I would make some small modifications. The argument
is simplified if we just identify the state of hunger as being
proportional to the deprivation -- i.e., to the error signal. So the only
reference signal needed is for the stomach loading or whatever -- a non-
zero reference signal specifying a non-zero level of the controlled
variable. The relationship of the consciously-experienced state we call
hunger to the variables in these low-level control systems is not
necessarily straightforward -- the subjective experience of hunger may
be associated with sensed efforts in lower-level systems to do something
about error signals (such as churning in the stomach, etc.).

If we just say that there is a reference signal for average stomach
loading, then "satiation" remains identical to the state of zero error,
and corresponds to a nonzero perceptual signal matching a nonzero
reference signal. You don't need a second reference signal, set to zero,
for "hunger." We eliminate the "hunger" control system and use just a
stomach-loading control system. It amounts to the same thing, but is
simpler and more physiological.
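The single-loop version can be put in numbers. The following is a minimal sketch, not the actual model: the gain, "burn" rate, time step, and starting state are all illustrative assumptions; the only features taken from the text are a nonzero reference for stomach loading and hunger identified with the error signal.

```python
# Minimal sketch of the single stomach-loading control loop described
# above. Gain, burn rate, and time step are illustrative assumptions.

def simulate(steps=2000, dt=0.1,
             loading_ref=1.0,   # nonzero reference for stomach loading
             gain=0.5,          # loop gain: error -> ingestion rate
             burn_rate=0.05):   # rate at which food is "burned"
    loading = 0.0              # start fully deprived (empty stomach)
    history = []
    for _ in range(steps):
        error = loading_ref - loading       # "hunger" = deprivation = error
        ingestion = max(0.0, gain * error)  # can't eat at a negative rate
        loading += (ingestion - burn_rate * loading) * dt
        history.append(loading)
    return history

h = simulate()
# Loading settles just below the reference; the small residual error
# is what sustains the steady rate of ingestion.
```

Note that "satiation" here is simply the approach of loading to its reference level; no second, zero-set reference signal is needed.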

So this gives us two levels of satiation: one associated with control of
rate of ingestion, and the second associated with control of stomach
loading. I would like to use a uniform definition of satiation that is
consistent with our operational definition of reference level. The
reference level of a variable is that level of the variable at which
behavior just goes to zero. In the same way, we can say that the
satiation level of a variable is that level of the variable at which
behavior just goes to zero. The degree of deprivation, which is the
error signal, is then just the reciprocal of the degree of satiation.
When error is exactly zero, we have infinite satiation.

     FOOTNOTE: The problem with the concept of satiation is the same as
     with the concept of utility. Both are awkward measures when
     converted into control-system terms. The most satiation you can get
     is complete satiation, just as the most utility you can get is 100%
     of the maximum possible utility. It's hard to conceive of "too much
     utility", and it's hard to conceive of "too much satiation." The
     problem is that these terms contain no provision for algebraic
     signs, whereas an error signal is inherently signed, passing
     through zero where the sign changes. Both utility and satiation
     vary more or less as the reciprocal of absolute error, or perhaps
     as k/(1 + |error|), so there is a finite maximum of k units.
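The footnote's point can be seen directly by evaluating the suggested form k/(1 + |error|) (the particular k and error values below are arbitrary):

```python
# Error is signed, but satiation measured as k / (1 + |error|) is
# unsigned and tops out at a finite maximum of k.
k = 1.0

def satiation(error):
    return k / (1.0 + abs(error))

for e in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(e, satiation(e))
# satiation(0.0) == k, the finite maximum; errors of opposite sign
# are indistinguishable on this scale, which is exactly the problem.
```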

Back to text:

     What does happen from a reinforcement point of view is that as the
     ratio requirement increases the net reinforcement (benefit - cost)
     maintaining behavior on a given ratio schedule decreases. At some
     point the net reinforcement reaches zero and the response
     extinguishes.

But think about this. As the ratio requirement increases, the net
reinforcement decreases, and at some point the response extinguishes.
But what we observe in the Motheral curves is that as the ratio
increases and the reinforcement decreases, the behavior rate
_increases_. We observe the decrease in reinforcement, and can
reasonably deduce that the _net_ reinforcement is decreasing even faster
than what we observe because of the increased costs, but we do not
observe a decrease in the behavior rate. Instead, we observe an
increase. Only when the schedule ratio exceeds a certain amount, and the
reinforcement rate falls below a certain small percentage of the
reference level, do we start to see a decline in the behavior rate (and
eventually, we presume, extinction). What you are talking about is only
the region to the left of the peak in the Motheral curve. To the right of
the peak, we do NOT observe the relationship you describe above.

     This may come as a surprise, but the amount of basic research
     Skinner published is rather minuscule. Most of the early results
     in this field were described mostly in qualitative terms (e.g.,
     patterns of behavior that develop on a given schedule; whether one
     schedule supports higher rates of responding than another when
     reinforcement rates are equated, etc.) Almost no work was done
     with ratio schedules, and most of this did not focus on the effect
     of varying the ratio requirement.

That is indeed a surprise! So where DID Skinner get the idea that an
increment in reinforcement results in an increment in behavior? Was it
simply from the commonsense notion that if you offer a reward for
behavior, you will get more behavior? Or was it from Thorndike, who was
talking about selection effects rather than maintenance effects?

     On interval schedules, shorter intervals (higher reinforcement
     rates) do indeed produce higher rates of responding (up to an
     asymptote).

Wait a minute. I can see that higher rates of responding would produce
higher rates of reinforcement (up to an asymptote); that is merely the
nature of the feedback function in an interval schedule. But are there
the equivalents of the Motheral data for interval schedules where
obtained reinforcement rates are plotted against behavior rates for a
wide range of schedules? I very strongly suspect that we would see the
same general kinds of curves as for ratio schedules, although the shapes
might be different due to the nonlinearity in the schedule function.
After all, if each successful behavior produced a generous amount of
reinforcer, the animal would probably not press at a high rate, unless
the schedule were set so that reinforcements came only rarely.
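The nonlinearity of the interval-schedule feedback function can be sketched with a commonly used approximation for a variable-interval schedule with mean interval T: obtained reinforcement rate R = 1/(T + 1/B), where B is the response rate. This form is an assumption for illustration, not data from the studies under discussion.

```python
# Sketch of an interval-schedule feedback function, using the common
# approximation R = 1 / (T + 1/B) for a VI-T schedule (assumed form).

def obtained_rate(B, T):
    """Obtained reinforcement rate at response rate B on a VI-T schedule."""
    return 1.0 / (T + 1.0 / B)

T = 60.0  # VI 60-s schedule; rates in events per second
for B in (0.01, 0.1, 1.0, 10.0):
    print(B, obtained_rate(B, T))
# R rises with B but saturates near 1/T: beyond a modest response
# rate, responding faster buys almost no additional reinforcement.
```

This is only the feedback function, of course; where on that curve the animal actually operates is what the Motheral-style plot would reveal.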

The Motheral approach is the only way to measure the organism's input-
output function. All other approaches, which simply rely on the schedule
function to predict behavior, lack any ability to say where on the
schedule curve the actual behavior-point will be. Only by plotting the
actual reinforcement and behavior rates to the same scales using a range
of schedule parameters can the organism function itself be seen. The
Motheral curve is an actual plot of the steady-state organism function.

----------------------------------------------
RE: region of Motheral curve left of peak:

     Well, it's a nice model, but it isn't consistent with my intuition.
     And I still hold that this region is not where operant studies are
     typically run. I'm not sure what evidence I need to present to
     support this.

The required evidence would involve data that not only reproduce the
Motheral curves, but do so over a wide range of reinforcement sizes.
There is no absolute scale of reinforcement size; about the only
comparison that can be made is in terms of amount received in one
reinforcement against amount consumed while free-feeding, both per unit
time. Reinforcement sizes are probably adjusted to produce reasonable-
looking data, and what is reasonable depends on what you believe should
be seen. If you think it reasonable that more frequent reinforcement
should produce more behavior, you will adjust reinforcement size until
that is what you see.

I investigated my model a bit, and found that when you reduce
reinforcement size (kv), the result is only a mild shift of the peak to
the right. The main effect, unless I've made some mistake, is (to my
surprise) to raise the slope on the right toward the horizontal, until
it looks like an asymptote of behavior rate. So for small enough
reinforcer value, the whole curve looks as though increasing the rate of
reinforcement increases the rate of behavior until some asymptotic
behavior rate is reached, after which further increases in reinforcement
rate have no further effect on increasing behavior.
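A steady-state sketch shows the same qualitative effect. The loop below assumes a simple proportional control system, B = G*(r - R), with the fixed-ratio feedback function R = kv*B/m; the gain G, reference r, and the linearity are my illustrative assumptions, not the actual model being investigated.

```python
# Steady-state sketch: proportional controller B = G * (r - R) on a
# fixed-ratio schedule with feedback R = kv * B / m. G, r, and the
# linearity are illustrative assumptions.

def steady_state(G, r, kv, m):
    # Solve B = G * (r - kv * B / m) for B, then compute R.
    B = G * r / (1.0 + G * kv / m)
    R = kv * B / m
    return B, R

G, r = 100.0, 1.0
for kv in (0.1, 0.01):          # large vs. small reinforcer value
    pts = [steady_state(G, r, kv, m) for m in (1, 3, 10, 30, 100)]
    print(kv, [(round(B, 1), round(R, 3)) for B, R in pts])
# With small kv, the obtained R stays tiny at every ratio, so the
# B-vs-R curve flattens near B = G * r -- it looks like an asymptotic
# maximum behavior rate even though no such maximum was built in.
```

Note also that the loop equation itself, B = G*(r - R), says behavior rate falls linearly as reinforcement rate rises -- the declining right limb of the Motheral curve.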

This asymptote would not be interpreted as satiation, but simply as the
animal's having reached the maximum behavior rate it can produce. Since
my model postulates no maximum behavior rate, this is obviously not the
only explanation. But the point here is that this apparent asymptote,
this apparent maximum behavior rate, would form a natural upper bound
for the range of schedules investigated. The only region left where
behavior varies as a function of the ratio is to the left of the place
where the peak would be for larger reinforcer values. And in this
region, behavior rate increases with reinforcement rate -- the
relationship I have assumed to be taken for granted as the basic law of
reinforcement.

I'm not completely sure I've got my model set up correctly, so all this
is provisional.
-----------------------------------

So we have two behaviors, bar-pressing and eating, which when
performed in a fixed alternation can control something in the nervous
system of the organism that is affected by this process.

     Now that's just what _I_ was thinking. You must have peeked. (;->

That is a remark that bodes well for our project.
-----------------------------------------------------------------------
Rick Marken (950705.2200) --

     But Bill Powers likes your approach so maybe you guys can come up
     with what you can both agree is an acceptable test to distinguish
     reinforcement from control theory. I'll try to work on something
     else.

Well, anybody can play here; this isn't an exclusive game. The players
are self-selected.

What we're doing isn't exactly developing an acceptable test to
distinguish reinforcement theory from control theory. It's more along
the line of trying to figure out what reinforcement theory IS. We have
to do that before any test is possible.

Bruce, of course, is hopelessly contaminated by PCT, so when he starts
pushing the elements of reinforcement theory around to get them into
some kind of consistent form, he's inevitably introducing PCT concepts.
By the time we have finished, his reinforcement model will probably be
totally unacceptable to his colleagues in EAB. :-) He will find himself
ostracized by his former friends, and he will have to join the ranks of
the institutionally homeless like you and Tom Bourbon and all the others
who have contracted this disease.
-----------------------------------------------------------------------
best to all,

Bill P.