Bill's model of operant behavior in EAB terms

[From Bill Powers (950704.0100 MDT)]

Bruce Abbott --

Here is the operant conditioning model I developed and posted last year,
or perhaps it was two years ago, to fit the Motherall data (and, I hope,
other data as well). The following development expresses the
relationships entirely in terms of observable variables in EAB terms,
without any model of the internal organization of the animal. I have
added a "value" parameter and a "relative time" parameter; in the model
I posted last fall, I lumped all the gain parameters into one, but the
effect is the same.


The satiation level of reinforcement rs can be defined as that level of
average obtained reinforcement ra at which the average behavior ba will
just go to zero. When ra is less than rs, average behavior ba will occur
at a rate proportional by km to the difference (this is the
"motivation due to deprivation"):

ba = km*(rs - ra)

The average obtained reinforcement is the average of the feedback
function ff of the behavior measure plus the noncontingent reinforcement
rn (in my model, rn was zero):

ra = average(ff(ba) + rn)

The "value" kv of the reinforcer to the organism is measured by an
increase in behavior due to increasing some aspect of the reinforcer
such as its size. This requires kv to be used in a position where
increasing kv will increase average behavior:

ba = kv*km*(rs - ra)

The measured average amount of behavior depends on the fraction of the
time kt that the animal actually spends performing the behavior b, as
opposed to some other behavior, where kt varies from 1 (full-time
behavior) to 0 (none of the time spent on the behavior). As kt decreases
from 1, the apparent average behavior rate also decreases. Thus kt must
also appear in the expression for ba:

ba = kt*kv*km*(rs - ra)

     If the average behavior ba consists of continuous behavior bc for a
     time tc alternating with zero behavior for a time tz, the
     continuous behavior bc can be obtained from ba by

     bc = [(tc + tz)/tc]*ba, tc > 0.
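As a quick arithmetic check of the conversion above, here is a minimal Python sketch. The function name and the sample numbers are made up for illustration. If the animal works only during tc, then tc/(tc + tz) plays the role of kt, so the formula amounts to bc = ba/kt; that reading is an inference from the definitions, not something stated in the post.

```python
def continuous_rate(ba, tc, tz):
    """Convert an average rate ba into the rate bc during active periods.

    tc: time spent behaving continuously, tz: time spent at zero behavior.
    (Hypothetical helper; the name is not from the original post.)
    """
    if tc <= 0:
        raise ValueError("tc must be positive")
    return (tc + tz) / tc * ba


# Illustrative numbers only: 20 s of responding alternating with 40 s of pause.
print(continuous_rate(ba=30.0, tc=20.0, tz=40.0))
```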

This gives us the two main system equations:

(1) ba = kt*kv*km*(rs - ra)

(2) ra = average(ff(ba) + rn)
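When the feedback function is linear, equations (1) and (2) can be solved in closed form. The sketch below assumes a fixed-ratio schedule with ff(ba) = ba/N (one reinforcer per N responses) and treats the averages as steady-state values. That form of ff, the function name, and all the numbers are illustrative assumptions, not part of the posted model. Substituting (2) into (1), with K = kt*kv*km, gives ba*(1 + K/N) = K*(rs - rn).

```python
def steady_state_fr(N, rs, rn=0.0, kt=1.0, kv=1.0, km=1.0):
    """Solve ba = K*(rs - ra), ra = ba/N + rn for constant K = kt*kv*km.

    Assumes a fixed-ratio feedback function ff(ba) = ba / N (an assumption
    made for this example). Returns (ba, ra).
    """
    K = kt * kv * km
    ba = K * (rs - rn) / (1.0 + K / N)
    ba = max(ba, 0.0)  # behavior rate cannot be negative (rn above rs)
    ra = ba / N + rn
    return ba, ra


# Illustrative values in arbitrary units.
for rn in (0.0, 20.0, 40.0):
    ba, ra = steady_state_fr(N=10, rs=100.0, rn=rn, km=5.0)
    print(f"rn={rn:5.1f}  ba={ba:7.2f}  ra={ra:6.2f}")
```

With these assumed numbers, ba falls as rn is raised.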

Note that rn will tend to reduce behavior, by increasing ra and thus
reducing the motivation.

These equations apply to the right-hand side of the Motherall curve. As
we move left on that curve, the observed behavior rate begins to fall
below the straight line predicted by the above equations, with the curve
turning downward. One possible explanation is that as the average
motivation ma increases beyond some critical value, the animal begins to
spend more time on other behaviors, thus reducing the value of kt.

Average motivation ma is defined as

ma = (rs - ra)

A linear point-slope model for the effect of ma on kt can be defined as

(3) kt = kt0 - k1*(ma - mc), ma >= mc

where mc is a critical amount of motivation.

When this expression is used for kt in equation (1) and the constants
are properly adjusted, the model fits the Motherall data (for body
weight equal to 80% of normal) over the entire range of ratios. Other
mathematical forms may give nearly the same results.

The loop gain of this control system is kt*kv*km*(partial of ra with
respect to ba). The output sensitivity is kt*kv*km, which I treated as a
single gain constant in the posted model.
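Equations (1) through (3) can also be solved numerically. The sketch below is a rough illustration under several assumptions that are not in the post: a fixed-ratio feedback function ff(ba) = ba/N, kt held at kt0 when ma is below mc, kt floored at zero, and parameter values chosen only to give a visible peak (they are not fitted to the Motherall data). G stands for the product kv*km. The motivation ma is found by bisection, which returns one equilibrium; the loop gain is then kt*kv*km/N, since the partial of ra with respect to ba is 1/N under the assumed ff.

```python
def kt_of(ma, kt0, k1, mc):
    """Equation (3): fraction of time spent on the behavior."""
    if ma < mc:
        return kt0  # assumption: kt stays at kt0 below the critical motivation
    return max(kt0 - k1 * (ma - mc), 0.0)  # assumption: kt cannot go negative


def solve_fr(N, rs=100.0, G=10.0, kt0=1.0, k1=0.018, mc=50.0, rn=0.0):
    """Return (ba, ra, kt) for an assumed fixed-ratio schedule ff(ba) = ba / N.

    G stands for kv*km. All default values are illustrative.
    """
    target = rs - rn
    if target <= 0.0:
        return 0.0, rn, kt0

    def f(ma):
        # Zero where ma = rs - ra, with ra = ba/N + rn and ba = kt*G*ma.
        return ma + kt_of(ma, kt0, k1, mc) * G * ma / N - target

    lo, hi = 0.0, target  # f(lo) < 0 and f(hi) >= 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    ma = 0.5 * (lo + hi)
    kt = kt_of(ma, kt0, k1, mc)
    return kt * G * ma, rs - ma, kt


G = 10.0
for N in (1, 2, 5, 10, 20, 50):
    ba, ra, kt = solve_fr(N, G=G)
    loop_gain = kt * G / N  # kt*kv*km times d(ra)/d(ba) = 1/N
    print(f"FR-{N:<3d} ba={ba:8.2f} ra={ra:6.2f} kt={kt:5.3f} loop gain={loop_gain:6.3f}")
```

With these assumed parameters, behavior is highest at an intermediate ratio and lower at both the easiest and the hardest schedules.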

This model makes average behavior a two-valued function of average
reinforcement, with a maximum point. Left of the maximum point, average
behavior increases as average reinforcement increases. Right of the
maximum point, average behavior decreases as average reinforcement
increases, reaching zero when the average reinforcement reaches the
satiation level. The effects of costs of behavior can be absorbed into
km if they are assumed linear.

The position of the maximum of the curve depends on the output
sensitivity kt*kv*km, the critical motivation value mc, and the
parameter k1.

For any range of schedules of reinforcement and constant values of k1
and mc, the output sensitivity can be adjusted so that behavior rises
with reinforcement over the whole range, decreases with reinforcement
over the whole range, or first rises and then decreases with
reinforcement as in the Motherall data. In general, decreasing any
output sensitivity factor (kt, kv, or km) will move the operating region
toward the condition of increased reinforcement going with increased
behavior; the peak of the curve will move to the right.
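The three regimes described above can be checked with a small numerical experiment. This sketch rests on the same kind of assumptions, none of which come from the post: a fixed-ratio feedback function ff(ba) = ba/N, rn = 0, kt held at kt0 below mc and floored at zero, and made-up parameter values. G stands for kv*km. For one fixed set of ratios it solves the model at three sensitivities and reports how behavior changes as obtained reinforcement increases.

```python
def solve_fr(N, G, rs=100.0, kt0=1.0, k1=0.018, mc=50.0):
    """Return (ba, ra) for an assumed schedule ff(ba) = ba / N with rn = 0."""

    def kt_of(ma):
        # Equation (3); constant below mc and floored at 0 (both assumptions).
        return kt0 if ma < mc else max(kt0 - k1 * (ma - mc), 0.0)

    def f(ma):
        return ma + kt_of(ma) * G * ma / N - rs

    lo, hi = 0.0, rs
    for _ in range(200):  # bisection on the motivation ma = rs - ra
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    ma = 0.5 * (lo + hi)
    return kt_of(ma) * G * ma, rs - ma


def trend(G, ratios=(1, 2, 5, 10, 20, 50)):
    """Describe how ba changes as ra increases across the given ratios."""
    points = sorted((ra, ba) for ba, ra in (solve_fr(N, G) for N in ratios))
    rising = [b2 > b1 for (_, b1), (_, b2) in zip(points, points[1:])]
    if all(rising):
        return "rises"
    if not any(rising):
        return "falls"
    first_fall = rising.index(False)
    if all(rising[:first_fall]) and not any(rising[first_fall:]):
        return "rises then falls"
    return "other"


for G in (0.5, 10.0, 200.0):  # low, medium, high output sensitivity
    print(f"kv*km = {G:6.1f}: behavior {trend(G)} with reinforcement")
```

In this sketch the sensitivity determines which part of the curve the fixed set of schedules samples: a low value keeps the whole range in the region where more reinforcement goes with more behavior.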

So, for example, by decreasing the "value" of the reinforcer (as by
decreasing its size), the behavior can be made to rise monotonically
with increases in reinforcement over the whole range of schedules. This,
I propose, is how the general rule of "more reinforcement, more
behavior" was initially established, and how, in fact, the concept of
"reinforcement" gained credence. As long as the product kt*kv*km is kept
low enough, this rule will apply. It is always possible, therefore, to
set up an experiment to prove that an increment of reinforcement will
cause an increment of behavior. All that is required is to keep the
product kt*kv*km small enough, which can be done by manipulating kv as
by reducing the size of the reinforcer.

When this relationship is not the primary subject, a large reinforcer
can be used (large value of kv), moving the peak of the curve far to the
left. Then as the schedule of reinforcements, starting with an easy
schedule such as FR-1, is changed to produce less and less average
reinforcement per unit behavior, the behavior will rise as the
reinforcement decreases, leading to very large values of the behavior at
low rates of obtained reinforcement. This is how Skinner demonstrated
shaping the pecking behavior of pigeons to very high rates. He did not
seem to notice that the "normal" relationship of reinforcement rate to
behavior rate had reversed.

In this way of developing the model, we accept certain relationships
without explaining why they hold -- for example, why excessive levels of
motivation lead to spending less of the total time on the behavior in
question. We accept the satiation level as given and fixed. In a more
complete model, we would try to relate a variable satiation level to,
for example, weight gain and loss, and to relate excessive motivation to
the commencement of a trial-and-error search for other sources of
reinforcements. The latter consideration, starting with the search in
progress, would come to explain the apparent "selective" effects of

All of this, as you know, is pure PCT. Several phenomena that are new
may be explained by this model, particularly the shift of the peak of
the Motherall curve with changes in certain parameters. The apparent
effect of reinforcement on behavior is explained in terms of the model,
with the "standard" effect appearing only over a certain range of
parameters. The reinforcement variable does not play any special role in
behavior other than the apparent one; it does no "maintaining" of
behavior, although the observed relationships can be interpreted in that
way. Causation is completely circular, with the only independent
variables being rs and rn: the satiation level of reinforcement and the
noncontingent reinforcers.

Does this bear any resemblance to the model you're working on?

Bill P.