[From Bill Powers (941103.1410 MST)]
Bruce Abbott (941103.1030)--
RE: FR schedules
I don't have _Cumulative Record_ to consult at the moment, but I can
tell you that the performance you describe is typical on fixed
INTERVAL, not fixed RATIO schedules. In fact, the pattern is referred
to as the "fixed interval scallop." On FR there is usually a rather
sudden transition from not responding to responding at a high rate.
In _Schedules of reinforcement_ opposite page 56 is an example of what I
call scalloping. There are many others in this chapter, "Fixed Ratio."
From context in your post, I deduce that you call this phenomenon a
"pause" when the resumption of pressing goes abruptly to a high rate,
and "scalloping" when the resumption follows a smooth rising curve to
the limiting rate. I wasn't making that distinction. Both kinds of
phenomena are seen under FR as well as VR, if you look at all the
examples.
In terms of Operant2, the abruptness of the resumption of pressing is
determined by g, the output gain. For high gain, even a small amount of
error signal causes maximum pressing rate. The maximum pressing rate can
be set by introducing an errorlimit variable and adding a line to the
model of the control system right after the line
   if e < 0.0 then e := 0.0;
The new line is
   if e > errorlimit then e := errorlimit;
Include errorlimit in the initialization for the control system, and
declare it globally, real.
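In case the placement isn't obvious, here is roughly how that stretch of the loop would read with the new line in place (e, g, and errorlimit are as above; r, p, and rate are just stand-ins for whatever Operant2 actually calls the reference, the perceptual signal, and the output):

   e := r - p;                                { error = reference - perception   }
   if e < 0.0 then e := 0.0;                  { no negative pressing rates       }
   if e > errorlimit then e := errorlimit;    { new: cap the error signal        }
   rate := g*e;                               { pressing rate = gain times error }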
By playing with g and errorlimit, you can reproduce all the forms of
"scalloping" or "pauses" that appear in _Schedules of reinforcement_,
provided that the reward size and input time constant are large enough
to keep q near the reference level r. That's necessary to be sure that a
single reinforcement substantially corrects or overcorrects the error.
One explanation offered for this pattern is that the first response
after reinforcement is necessarily at some temporal distance from the
next reinforcer delivery (i.e., the time required to complete the
ratio). Delay of reinforcement weakens its effect on responding, so the
ability of reinforcement to increase the probability of the first
response in the ratio run will decrease with the size of the ratio.
Because this first response must compete with other behavior [animism
again?--no, it's just easier to say this than to say that "the
probability of the first response is lower than the probability of
other behaviors at that point in time," and everyone knows what you
mean], it generally occurs after some delay which tends to become
larger with larger ratios. However, once the first response on the
ratio is emitted, the others rapidly follow. One reason for this is
that high rates on FR schedules minimize the time to reinforcement;
thus there is differential reinforcement of high rates. This
characteristic of FR schedules tends to drive rates during the ratio
runs to high values. The resulting pattern of behavior is basically
two-valued: pause following reinforcement, then quick transition to
high rate until the completion of the ratio.
This is a pretty free-wheeling explanation, involving half a dozen
critical assumptions and a couple of dubious proposals, such as the
ability of a reinforcer that hasn't happened yet to influence behaviors
that come before it. I think that any hard questioning about the details
of this explanation would land you in deep trouble. This is a VERY
complicated argument, far more complicated than my model, as you would
find if you tried to program a runnable model of what you just said.
Compare the number of words in the paragraph just cited with the number
in my program for the control system, and the rigor of the definitions,
too. Modeling not only represents what you mean more succinctly, but it
quickly exposes places where you're just making wild guesses.
--------------------------------------------------------
Speaking of Motherall's data, if you look on p. 210 in Staddon's book
(Figure 7.6) you will see FR and VR response functions for two
individual subjects rather than for groups as in Figure 7.18, to
which you refer. The portion of the curves to the left represents a
condition called "ratio strain," a condition under which control over
behavior (there I go again) by the ratio schedule begins to break down.
I think you're missing the main point here. In the region of "ratio
strain," where an increase of reinforcement goes with an increase in
behavior and you say that the control of reinforcements over behavior
begins to "break down," we are seeing exactly the relationship between
reinforcement and behavior that is usually cited as normal. The more
reinforcement there is, the more behavior there is. Look at that Fig.
7.6, for RR2 (solid triangles). On FR160, behavior is occurring at a
rate of about 800 presses per 1-hour session, and reinforcement is
occurring at about 5 or 10 dippers per hour (hard to estimate from the
plot). For the same rat on the next easier schedule, behavior is
occurring at about 2800 presses per hour, and reinforcement at about 20-
25 dippers per hour. So more reinforcement goes with more behavior, as
per most definitions of the effects of reinforcement that I have seen.
For the same rat, compare the peak behavior rate with the FR1 behavior
rate, and look at the corresponding reinforcement rates. In this region,
as reinforcement increases from about 100 to about 290 per hour, the
behavior rate declines from about 4200 to about 290 presses per hour, the
same as the reinforcement rate on FR1. Over the greater part of the
range of schedules, an increase in reinforcement rate is associated with
a sharp and large decrease in behavior rate.
So we have to conclude that in experiments where increasing the
reinforcement rate leads to an increase in the behavior rate, the animal
is being tested under conditions of "ratio strain," and is not showing
normal behavior. What's your comment on this?
---------------------------------------
What is happening is that our simple model is not sensitive to the
costs of responding, whereas the real bird is. In all portions of the
graph, increasing the ratio decreases the rate of reinforcement. In
the descending portion, rate of responding goes up with ratio ALMOST
enough to compensate for the increasing response requirements, but not
quite.
I believe you have the code for Operant1, in which the model behaves
almost exactly as the average of four rats in Fig. 7.18 does, over the
whole range of schedules. This model began, five or ten years ago, as a
model in which another system senses cost and benefit (amount of food
obtained and rate of pressing required to obtain it, weighted),
perceives cost minus benefit, compares that number with an adjustable
threshold, and reduces the gain of the operant control system as a
constant times the excess over the threshold. This model, too,
reproduces the data of Fig. 7.18 quite closely. But I decided that this
was overkill, so I just assumed that the gain of the control system
began to fall when the error passed a limit, being reduced by a factor
times the excess of error over the limit. That produces the same fit, but
without so many unwarranted elaborations of the model. Since the rate of
bar-pressing is proportional to the error signal, the cost is
proportional to the error signal. The perceptual signal is of course
proportional to the benefit. So we are already taking cost and benefit
into account in the basic model, leaving only the nonlinear effect on
output gain to be accounted for. In Operant1 I did that as simply as
possible, and got a very good fit with only two added parameters.
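For what it's worth, the whole nonlinearity amounts to only a few lines. Something like the following is all I mean (g0, k, and elimit are stand-in names for the nominal gain, the reduction factor, and the error limit; the last two are the added parameters):

   e := r - p;
   if e < 0.0 then e := 0.0;
   if e > elimit then
     g := g0 - k*(e - elimit)     { gain falls by a factor times the excess of error }
   else
     g := g0;
   if g < 0.0 then g := 0.0;      { don't let the gain go negative }
   rate := g*e;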
Staddon DID propose a control-system model, on p. 176, but despite
considerable correspondence between us he never went on to look at the
implications, and he preferred pencil-and-paper calculations to
simulations. To him, R0 was not a reference level, but simply a
constant.
---------------------------------
RE: what happens during pauses
I could estimate fairly accurately from published cumulative records--
given that the running rate is fairly constant over a wide range of
ratios, variation in reinforcement rate would be a simple function of
ratio size and average pause length. Pause-length distribution is also
given in Palya's report, which I described in an earlier post. Of
course, we'll also get that when we run our own studies.
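That relationship is easy to make concrete: with a constant running rate, the time per reinforcement is the pause plus the time needed to run off the ratio, so a little function like this (the names are mine, not from any of our programs) says it all:

   { Reinforcements per minute on FR(ratio), assuming a constant running }
   { rate in presses per minute and an average pause in minutes.         }
   function ReinfRate(ratio, runRate, pause: real): real;
   begin
     ReinfRate := 1.0/(pause + ratio/runRate)
   end;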
What I'm concerned about is what happens during the so-called "pauses."
In "Schedules", opposite p. 120, Fig. 102-A, we see "pauses" that last
anywhere from 3 or 4 minutes to nearly 10 minutes, followed by a solid
string of pecks (as near as I can deduce from the text, 110 pecks) at
somewhere around 5 to 10 pecks per second. What was the bird doing
during those very long periods of time before it started pecking the
key? Was it standing in front of the key working up to another series of
key-pecks, or was it wandering around the cage, pecking in the reward
dish or in other corners looking for food? In short, are we looking at a
simple behavior that could be modeled by a single operant control
system, or at a VERY VERY COMPLICATED behavior that would require models
we don't even know how to design yet?
In the bimodal curves in Staddon, this gets to be a very important
question. Suppose that for ratios up to about 30 and high deprivation,
the animals were completely focused on getting food and never did
anything else but work the bar and eat the food. A simple control model
fits this region of the data very well. But suppose that when the ratio
became 40, 80, or 160, the animal started to give up on the key and look
around for some other source of food. In that case, we would be modeling
behavior at the key during a time when the animal wasn't even at the
key; when it was engaged in some other kind of behavior. This would make
all the cost-benefit stuff and the nonlinear output function ridiculous,
because the model shouldn't even be applied except when full-time bar-
pressing is going on. If we average the rate of pressing and
reinforcement during active control with the rate of pressing and
reinforcement when the animal is doing something else -- zero for both
reinforcement and behavior rates -- our final numbers will be
meaningless, and so will any model that fits them.
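A couple of made-up numbers show how bad the distortion can be. Suppose the rat presses at 4000 per hour and collects 100 reinforcers per hour while it is actually working the lever, but spends half the session doing something else:

   var engagedFrac, sessionPresses, sessionReinfs: real;
   begin
     engagedFrac    := 0.5;                  { half the session at the lever           }
     sessionPresses := engagedFrac*4000.0;   { 2000 presses/hr averaged over session   }
     sessionReinfs  := engagedFrac*100.0     { 50 reinforcers/hr averaged over session }
   end.

A model fitted to 2000 and 50 describes neither what the animal does at the lever nor what it does away from it.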
Just by looking at a record of bar-pressing, you can't tell from a pause
in behavior whether the model should still be applied or whether data-
taking should be suspended. Is the animal not pressing because for the
moment it is experiencing a transient zero error, as in the model, or
because it is experiencing a large error and has gone elsewhere in an
attempt to reduce it? The only way to tell is to record what the animal
is doing, or at least where it is NOT, at all times. If the animal is
NOT at the lever or key, poised to press or peck, then the measures of
reinforcement rate, behavior rate, and elapsed time must be skipped
until the animal gets back to work. If we did this with the Motherall
experiments, we might find that the actual rate of behavior simply
continued upward to the left, perhaps to a maximum rate but with no
downturn.
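In program terms the bookkeeping I have in mind is just this (all the names are hypothetical; the at-the-lever signal could come from something like the resistance measurement suggested next):

   if AtLever then                             { animal poised at the lever          }
   begin
     engagedTime := engagedTime + dt;          { accumulate only engaged time        }
     if PressedNow then presses := presses + 1;
     if RewardedNow then reinfs := reinfs + 1
   end;
   { at the end of the session (engagedTime > 0): }
   pressRate := presses/engagedTime;           { presses per unit engaged time        }
   reinfRate := reinfs/engagedTime;            { reinforcements per unit engaged time }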
When a rat is pressing a bar at fairly high rates, does it ever let go?
If not, you could sense the resistance between the floor grid and the
lever, and record data and elapsed time only when a low resistance
indicated that the rat was engaged in bar-pressing. This could also tell
you when the rat was away eating the reward. The current needed to
measure the resistance could be in the low microamps, too small to be
sensed by the rat.
-----------------------------
Careful--there is evidence to suggest that pauses may be as much a
function of temporal distance TO the next reinforcer as they are of time
since the previous one.
I wouldn't touch that one. The next reinforcer hasn't happened yet, so
at best it's the time to the rat's estimate of when the next reinforcer
will occur. I don't think we need to consider that hypothesis until
we've ruled out a whole lot of simpler ones.
------------------------------
It's up and running. Nice display! By the way, Tom says he has
trouble with the initialization of graphics when trying to run one of
my E. Coli programs but not with this one of yours. Yet I copied your
initialization routine.
Let's settle on using your grUtils Unit and your initialization, and if
anyone has problems, fix them.
I'm working on a generalization of the display that can go into a Unit.
You will be able to specify the screen size of the rectangle, the
scaling in x and y, the zero point in x and y, what variables are to be
plotted in what colors (up to 3 on the y axis, 1 on the x-axis), legends
for each axis (three on the y axis, oriented vertically), and the grid
size in the background. The x variable can be any variable in the
program, or the clock, so you can do time plots or plots of one variable
against another. Should be done in a few days. This will be a handy way
of viewing what's going on in a simulation, like having an oscilloscope
to apply to different variables.
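To give you an idea of what the calling program will see, the interface will look roughly like this; the names are provisional and the bodies are still being written, so don't hold me to any of it:

   unit PlotGen;   { provisional name }
   interface
   uses Graph, grUtils;

   procedure InitPlot(left, top, right, bottom: integer;  { screen rectangle          }
                      xZero, yZero: real;                 { zero point in x and y     }
                      xScale, yScale: real;               { scaling in x and y        }
                      gridSize: real);                    { background grid spacing   }
   procedure SetTrace(n: integer;                         { which of up to 3 y traces }
                      color: integer; legend: string);    { color and vertical legend }
   procedure SetXLegend(legend: string);                  { legend for the x axis     }
   procedure PlotPoint(x, y1, y2, y3: real);              { x = any variable or clock }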
I am going to assume a color EGA or VGA screen, 640 x 480 x 16 colors.
Anybody have a problem with that? I could cut back to 4 colors.
------------------------------------------------------------------------
Best,
Bill P.