Cyclic ratio data

[From Bruce Abbott (950726.1230 EST)]

Bill Powers (950726.0545 MDT) --
    Bruce Abbott (950725.2110 EST)

    I still get an uncomfortable feeling that you do not quite follow
    my line of thinking yet on these cyclic-ratio data.

I follow it, all right, but I believe you are making some conceptual
errors. Maybe I am -- but maybe, too, you are.

    If we take the inverse of the reinforcement rates and convert, we
    get seconds per reinforcement.

That is certainly true.

    This is the average time required to complete one reinforcement
    "cycle": complete each response, collect the pellet, and return to
    the lever.

Again, true and understood.

    This number is NOT, repeat NOT b/m.

Right: it is m/b. The reinforcement rate is always exactly the behavior
rate divided by the ratio; the time per reinforcement is always exactly
the ratio divided by the behavior rate, which gives 1/r.

Sorry, I was thinking about the original conversion you insist on making.
Because, on the ratio schedule, b/m = reinforcement rate, you insist, for
some reason that baffles me, that we should substitute 1/(b/m) for 1/r,
which gives m/b. This implies that at FR-0 (m = 0, no responses required), b
should be zero, which is true if b is lever-pressing. But then, by r = b/m,
r should also be zero. In fact, under these conditions the rat is happily
collecting reinforcers at the rate of about one every 5.5 seconds.

As I managed to demonstrate above, I can get confused, and as you noted, so
can you; we're only human. But about the logic of substituting m/b for 1/r in
my graph, it's your turn to be confused. The problem with your substitution
is the equation r = b/m: it does not apply here, as I demonstrate
mathematically below.

Let s = seconds per reinforcement,
    m = responses required per reinforcement (the ratio),
    L = seconds per lever-press,
    c = seconds to collect food,
    r = reinforcement rate in rft/sec, and
    b = response rate in rsp/sec.

Now, the following is apparent:

(1) s = mL + c

That is, seconds required to collect a reinforcer equals the number of
required responses times the number of seconds per response plus the number
of seconds to collect the reinforcer. The inverse of this is the
reinforcement rate:

(2) r = 1/(mL + c)

But

(3) b = 1/L

thus

(4) r = 1/(m/b + c) = b/(m + bc)

Yet you wish to represent

(5) r = b/m

Setting (5) equal to (4) leads to a contradiction, since it implies that

(6) m + bc = m, that is, bc = 0,

which is decidedly untrue: neither the response rate b nor the collection
time c is zero.

The problem is that you have confused average behavior rate, which includes
collection time, with lever-pressing rate, which does not.
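
To make the point concrete, here is a quick numeric check in Python; the
press time L and the collection time c are made-up values chosen only for
illustration, not fitted ones:

# Compare the environment equation (1), s = m*L + c, with what the
# substitution r = b/m (i.e., 1/r = m/b) would predict.  L and c are
# assumed, illustrative values.
L = 0.5          # seconds per lever press (assumed)
c = 5.5          # seconds to collect the pellet and return (assumed)
b = 1.0 / L      # lever-pressing rate while pressing, presses/sec

for m in [2, 4, 8, 16, 32, 64]:
    t_env = m * L + c   # seconds per reinforcement from equation (1)
    t_sub = m / b       # seconds per reinforcement if 1/r were m/b
    print(f"m={m:3d}  actual={t_env:5.1f} s  m/b version={t_sub:5.1f} s  difference={t_env - t_sub:.1f} s")
# The difference is always exactly c: the substitution simply drops the
# collection time.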

(Average) reinforcement rate equals (average) behavior rate divided by
the ratio. That is the physical fact. The reward rate is zero only if
the behavior rate is zero, which can happen. But for the reward rate to
equal the reference level, the behavior rate would have to be infinite
(by r = b/m), which is impossible. Between a ratio of 1 and 0 there
would have to be ratios of 0.5, 0.05, 0.0005, and so on to the limit,
with behavior approaching infinity as a limit. But the apparatus works
only with integer values.

An infinite behavior rate implies zero time spent responding, which is
effectively what obtains when no behavior is required (FR 0) to produce the
pellet. But this does not imply an infinite reinforcement rate, because it
takes time to collect the reinforcer once it becomes available. If we assume
that the rat is standing at the lever when we deliver the pellet (as would be
the case at ratio completion for non-zero ratios), that time is about 5.5
sec, which implies a reinforcement rate of about 654.5 rft/hour.

    As to the x-axis, the ratio value is the number of responses
    required to complete one reinforcement cycle. So we are plotting
    the average time required to complete one cycle as a function of
    the number of responses required to complete one cycle.

Yes, you CAN plot this if reinforcement rate is the independent
variable, but it is not. Of course you can plot it anyway, but then your
mathematics ceases to have any connection to the physical situation:
you're just pushing numbers around. When you divorce mathematical
manipulations from their physical meaning, you're doing numerology, not
science.

Absurd! The number of responses required to complete the ratio and collect
a reinforcer is the independent variable here, not the rate of
reinforcement. As to the connection of the mathematics to the physical
situation, it is clear and simple, as I described to Rick:

By plotting the data in the way I suggest, you get a relationship with a
simple and obvious interpretation. The fixed part of the time to complete a
ratio and collect the pellet is the time required to leave the lever, pick up
the food, consume it, and return to the lever; this value is given by the
y-intercept of the line for a given animal. The slope of the line indicates
the time penalty exacted per additional required response.

If this penalty is 1/2 second per additional response, then requiring,
say, 16 responses, will add 1/2 * 16 = 8 seconds to the time required to
complete the ratio, collect the pellet, and return to the lever. If the
collection-time (y-intercept) is 5.5 seconds, it will require 5.5 + 8 =
13.5 seconds on average to complete the ratio, collect the food, and
return to the lever, ready to begin a new ratio. Thus the total time
between reinforcements will be 13.5 seconds.
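
In code form, for concreteness (the intercept and slope here are just the
example values used above, not fitted numbers):

# Predicted time between reinforcements, and the implied reinforcement
# rate, from the linear description s = intercept + slope*m.
intercept = 5.5   # seconds to collect food and return to the lever (example)
slope = 0.5       # seconds added per required response (example)

for m in [2, 4, 8, 16, 32, 64]:
    s = intercept + slope * m     # seconds per reinforcement
    rate = 3600.0 / s             # reinforcements per hour
    print(f"FR {m:2d}: {s:5.1f} sec/reinforcement  ({rate:5.1f} rft/hour)")
# FR 16 gives 5.5 + 0.5*16 = 13.5 sec, i.e. about 267 rft/hour.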

Once you push through this conceptual block, I think you will be amazed how
simple it all becomes. Perfectly sensible.

When Herrnstein "generalized" his matching law to

B1/(B1 + B2 + ... + Bn) = R1/(R1 + R2 + ... + Rn), and so forth,

he didn't, apparently, realize that the whole series is algebraically
identical to

B1/R1 = B2/R2 = ... = Bn/Rn

which merely says that ALL the schedules are identical.
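
(The algebra itself is easy to check with arbitrary numbers; the response and
reinforcement rates in this little Python sketch are purely illustrative:)

# The generalized matching equations Bi/(B1+...+Bn) = Ri/(R1+...+Rn)
# hold for every i exactly when all the ratios Bi/Ri are equal.
R = [10.0, 25.0, 40.0]           # arbitrary reinforcement rates
B_match = [3.0 * r for r in R]   # response rates with all Bi/Ri equal (= 3)
B_other = [30.0, 50.0, 90.0]     # response rates not proportional to R

def matching_holds(B, R, tol=1e-9):
    return all(abs(b / sum(B) - r / sum(R)) < tol for b, r in zip(B, R))

print(matching_holds(B_match, R))   # True:  B1/R1 = B2/R2 = B3/R3
print(matching_holds(B_other, R))   # False: the Bi/Ri are not all equal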

We discussed this issue a long time ago on the net. At that time I thought
we ultimately agreed that this was not the problem it appears to be. The
matching law applies to simultaneously-available alternative schedules, not
to schedules in isolation. More responses directed to one schedule mean
fewer responses directed to the others. On ratio schedules what emerges is
exclusive responding on the lower-ratio schedule, so obtained response rates
match obtained reinforcement rates, as expected under the law. On interval
schedules, responses are allocated across schedules in such a way that the
implied equivalence is obtained, at least to a first approximation (later
research revealed systematic deviations under certain conditions). There
are other problems with the matching law, but "numerology" is not one of
them. Nor is my analysis of the cyclic-ratio data another example of the
same, as I think I've shown pretty clearly above.

   Rat   Intercept   Slope     r       r-sq
   C1      5.35      0.679     1.000   1.000
   C2      5.77      0.528     0.999   0.997
   C3      5.47      0.458     0.999   0.999
   C4      5.21      0.397     0.999   0.998

    You will note that the correlations are extremely high, indicating
    an excellent linear fit to the data.

This is not just an excellent linear fit to the data: it is (when
rounding errors are removed) a _perfect_ fit. There is no way you could
have estimated, or even measured, the data values with the implied
accuracy. What you have here is the result of computing an algebraic
identity. Just as Rick said, you have computed one function of a
variable, and then the inverse function of the result, ending up, within
computational limits, with the original values of the variable. If you
write out all the equations you used and solve them simultaneously, you
will find that you have proven that 0 = 0.

This is just the fit that your own program generated when you ran the
analysis. So if I'm guilty, we're both guilty. Here is your result:

Calculate seconds per reinforcement
Ratio    Rat 1    Rat 2    Rat 3    Rat 4
    2     7.15     6.86     6.62     6.40
    4     8.15     7.89     7.63     7.12
    8    10.62     9.52     9.23     8.18
   16    15.74    13.89    12.20    10.99
   32    27.07    23.90    19.79    17.76
   64    48.91    39.10    35.04    30.80

Calculate intercept of [sec/reinf vs ratio]
Rat 1 intercept = 5.36 slope = 0.68
Rat 2 intercept = 5.77 slope = 0.53
Rat 3 intercept = 5.47 slope = 0.46
Rat 4 intercept = 5.21 slope = 0.40

This is no trivial identity; it is the actual fit of four straight lines to
empirical data. Minitab confirms your analysis and adds the correlations
and r-squares. I'm hoping that at this point you are now in agreement, and
we can get back to discussing what these functions imply rather than whether
they mean anything at all.
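
For anyone who wants to verify the arithmetic, the fit can be reproduced from
the table above with an ordinary least-squares line. This Python sketch is
not your program or Minitab, just an independent check; within rounding it
should return the same intercepts and slopes quoted above:

# Least-squares fit of seconds-per-reinforcement against ratio value,
# using the seconds-per-reinforcement table quoted above.
ratios = [2, 4, 8, 16, 32, 64]
sec_per_rft = {
    "Rat 1": [7.15, 8.15, 10.62, 15.74, 27.07, 48.91],
    "Rat 2": [6.86, 7.89,  9.52, 13.89, 23.90, 39.10],
    "Rat 3": [6.62, 7.63,  9.23, 12.20, 19.79, 35.04],
    "Rat 4": [6.40, 7.12,  8.18, 10.99, 17.76, 30.80],
}

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return my - (sxy / sxx) * mx, sxy / sxx

for rat, s in sec_per_rft.items():
    a, b = fit_line(ratios, s)
    print(f"{rat}: intercept = {a:.2f}   slope = {b:.3f}")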

To make you feel better, I initially did the same thing in my paper on
experimental measurement of purpose in Wayne Hershberger's book.

I will feel much better, thank you, when you finally see that the approach
I've taken with these data is correct. There's much more to do, but first
we have to get past this problem.

Regards,

Bruce

[From Bruce Abbott (950727.1220 EST)]

Bill Powers (950726.1600, 950727.0600 MDT) --

Today looks like a busy one and I've already gotten off to a late start, so
I'll just make a brief comment or two at present. First, I'm glad to see
that we're now on the same wavelength concerning my _descriptive_ account of
behavior on the cyclic-ratio schedule, which says basically that the time
between reinforcers is, to a close approximation, a linear function of the
number of responses required. An exactly linear function (neglecting
experimental error) would be expected simply from analysis of the task
(press lever to completion of ratio, collect food, return to lever) _if_ the
rates of both lever-pressing and food-collection remain constant across the
ratio.

What you have shown empirically, therefore, is that

1/R = c + m/b, or

R = 1/(m/b + c),

which is equation (1).
------------------------------------
Equation (1) gives us R as a function of b and the ratio. It describes
how the apparatus will react to behavior at a rate b alternating with
nonbehaving periods of duration c. As b increases, R will increase
nonlinearly to a limit 1/c. Therefore the maximum possible reinforcement
rate for any behavior rate and any ratio is 1/c.

Yes, the maximum reinforcement rate is limited by the time c required to
complete the second link of the behavioral "chain": collect the pellet and
return to the lever.
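
For illustration, here is that saturation in numbers; the values of m and c
in this Python snippet are assumed, not fitted:

# The environment equation R = 1/(m/b + c): as the pressing rate b grows,
# R approaches the ceiling 1/c.
c = 5.5    # seconds for the collection link (assumed)
m = 16     # ratio requirement (assumed)

for b in [0.5, 1.0, 2.0, 5.0, 10.0, 100.0]:   # presses per second
    R = 1.0 / (m / b + c)
    print(f"b = {b:6.1f} presses/sec   R = {R:.4f} rft/sec")
print(f"ceiling 1/c = {1.0 / c:.4f} rft/sec")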

This, however, gives us only a description of the environment. To solve
for the actual behavior rate and reinforcement rate, we must have a
second equation describing the operation of the organism.

Exactly so. This is the "next step" I said would have to wait until we
agreed on the environment function. We have seen how a negatively-sloped
Motheral-type function (right limb) arises naturally from the change in
ratio requirement even in the absence of any change in the rates of the two
component actions; the question then becomes, what happens when a control
system for reinforcement rate is subjected to these changes in schedule
function? Does the system then generate a "response" function that fits the
data? What would be really interesting is if the function also makes sense
of what looks like a slight curvature in the otherwise linear
reinforcement-rate vs. ratio functions for three of the four animals.

In your analysis, you used only the environment equation showing how R
depends on b. No manipulation of this equation, with or without
empirical backing, can do any more than tell you how R depends on b.
What you proved using the data was that equation (1) is a very good
representation of the apparatus.

Yes, and also that behavior rates across ratio requirements are, to a first
approximation, constant. (Note: they may in fact vary, but in a way that
makes the variations counteract one another exactly, producing the
_appearance_ of constancy in both rates. We'll have to take a more detailed
look at what is going on moment by moment before we can tell which is the
case.)

What you did not prove, however, is that there is no control. To do
that, you would have to propose an equation representing the way b
depends on R via a different path, the path through the organism rather
than the apparatus. Then you would have to show that this equation does
not result in solutions that fit the data.

I agree. I have some thoughts about what may be going on that I'd like to
discuss with you, but will do so when I have a bit more time to spare. It's
clear to me that in nature the rat controls its rate of food procurement and
ingestion; if we don't see that here, that result must be a consequence of
our experimental arrangement. Meanwhile, I'll take a look at OPCOND5. In
the interim, here's something to think about. Wouldn't it be proper to view
lever-pressing (procurement) and food-collection (ingestion) as two control
systems under sequence control? The goal of the first is to perceive a food
pellet in the cup; the goal of the second is to perceive a food pellet in
the stomach. Achieving the second goal disturbs the first perception (there
is no longer a pellet in the cup), leading back to lever-pressing. Both
lever-pressing and food-collection rates would be expected to depend on the
level of error in the stomach-loading system (deprivation level, etc.).
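
Just to make the idea tangible, here is a bare-bones Python sketch of that
two-system arrangement. Every number and rule in it (press time, collection
time, decay rate, satiation level) is invented for illustration; it shows
only the structure of the proposal, not the actual rat:

# Lever-pressing (procurement) and food-collection (ingestion) as two
# control processes operating in sequence.
PRESS_TIME   = 0.5     # seconds per lever press (invented)
COLLECT_TIME = 5.5     # seconds to collect, eat, return to lever (invented)
DECAY        = 0.001   # "stomach contents" lost per second (invented)
SATIATED     = 5.0     # stomach level at which ingestion error is zero (invented)

def run(ratio, duration=3600.0):
    t, stomach, pellets = 0.0, 0.0, 0
    pellet_in_cup = False
    while t < duration:
        if stomach >= SATIATED:
            dt = 1.0                     # no stomach error: do nothing for a second
        elif not pellet_in_cup:
            dt = ratio * PRESS_TIME      # procurement: complete the ratio,
            pellet_in_cup = True         # which puts a pellet in the cup
        else:
            dt = COLLECT_TIME            # ingestion: eat the pellet, which
            stomach += 1.0               # loads the stomach, empties the cup,
            pellet_in_cup = False        # and so disturbs the first system
            pellets += 1
        stomach = max(0.0, stomach - DECAY * dt)
        t += dt
    return pellets

for m in [2, 4, 8, 16, 32, 64]:
    print(f"FR {m:2d}: {run(m)} pellets collected in one hour")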

Hey, not a word in there about reinforcement. I'm starting to sound like a
control theorist! (About time, huh?) (;->

Regards,

Bruce

[From Bruce Abbott (950809.0950 EST)]

Bill Powers (950808.1145 MDT) --
   Bruce Abbott (950807.1050 EST)

Looks as though my reading skills are deteriorating. I said 45 sec when
I should have said 45 min. However, I do still have a leg or two to
stand on.

After two decades of reading research reports I still often have difficulty
decoding what the authors actually did from what they said they did.
Sometimes it takes several readings to get it straight.

RE: dynamic effects.

It makes no difference how many times the cycles were repeated if each
ratio within each cycle lasted only a small fraction of the system's
time constant for changing from one pressing rate to another. Suppose
the system's time constant for changing the rate of pressing is two days
(one time constant is the time required to reach 1-1/e or 63% of the
final value, for those who don't know). In 10 days (Collier et.al), the
pressing rate would be 0.993 of the final value. If you now start
changing the ratio every 45/12 = 4 minutes, you will see a pressing rate
that changes only 0.0014 of the way to asymptote by the time the ratio
is completed. This would create the appearance of a constant pressing
rate appropriate to the 24-day average ratio. And that is what is seen:
a constant pressing rate (see below).

Yes, this is what I was alluding to. However, there is both internal and
external evidence that rats can rapidly adapt to changes in schedule
parameters (as indicated by altered response rates) if given (a) some way to
tell when the change has occurred, and to what, and (b) plenty of practice
with the changing schedules. The internal evidence is the consistent, rapid
change in pause length and the emergence of pausing within the ratio run at
the highest ratio values. The external evidence comes from performance on
multiple and chain schedules of reinforcement. In a multiple schedule,
different schedules of reinforcement ("components") are identified by
different discriminative stimuli (e.g., for pigeons, green keylight during
FR 10 and red keylight during FR 30). After sufficient practice, pigeons
alter their response rates immediately with the change in component; in
addition, if the pattern of responding differs between components (as in FR
versus VR), the pattern also changes immediately to reflect the new
component. In a chain schedule, different discriminative stimuli identify
each "link" in the chain. Completion of the requirement for one link
produces the next link; the last link in the chain produces the reinforcer.
For example, the chain schedule may require FR 10 during the first link, VI
30 during the second link, and FI 10 during the final ("terminal") link.
The discriminative stimuli may be explicit or may simply be events that
indicate the completion of the response requirement of a given link. In
chain schedules, the observed behavior changes immediately in correspondence
with the schedule requirements, once the animal has had sufficient practice.

All this is not to deny that there may be slow dynamic processes at work in
the cyclic-ratio schedule which never reach stable values within a given
ratio presentation; in fact, I'm guessing that these are involved in the
absence, in the cyclic-ratio data, of the curvature of the response functions
at the highest ratios that is seen in the single-ratio schedules (B animals).
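
To make the time-constant arithmetic explicit (the two-day time constant is,
of course, just the value assumed in your example), here it is in Python:

import math

# Fraction of the distance to asymptote covered after time t by a
# first-order system with time constant tau: 1 - exp(-t/tau).
def fraction_to_asymptote(t, tau):
    return 1.0 - math.exp(-t / tau)

tau = 2 * 24 * 60.0   # two-day time constant, in minutes (assumed)
print(fraction_to_asymptote(10 * 24 * 60.0, tau))  # ten days: about 0.993
print(fraction_to_asymptote(45.0 / 12.0, tau))     # one ~4-minute ratio: ~0.0013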

Note that E&S don't seem to use "running rate" the same way that you
have defined it.

I'm not sure what you mean here. Running rate is the rate of responding
computed from the first response following food delivery (which ends the
post-reinforcement pause) to the delivery of the next pellet. At lower ratio
values the animals have a strong tendency to complete the ratio, once they
have begun the run, rather than taking breaks within the run. I'm still
looking for data showing the pausing that emerges at very high ratio values;
there is some suggestion that at these values the animals may be disposed to
break off during a run, although I suspect this may be more likely early in
the ratio than late. Such within-run pausing would appear as reduced
running response rate in the overall calculations.

    Again, the rats were exposed to the cyclic-ratio schedule for 24
    days. The single session's performances were representative of a
    typical run; apparently there wasn't much changing going on across
    sessions.

And again, this tells us nothing about what the pressing rate would have
done with long series of identical ratios.

The between-groups comparison was intended to answer that question.
Ettinger and Staddon noted that the slopes of the functions for the C and B
rats were in the same range and thereby concluded that the two procedures
gave essentially the same results. However, this ignores the "slump" in
rates of the B animals at the highest ratio, which seems rather consistent
here and in previous studies. A simple (if somewhat vague) interpretation
notes that high ratios on the cyclic-ratio schedule are surrounded by lower
ratios (higher reinforcement rates), which may help to sustain performance
at the higher ratios (the average reinforcement rate across the cycle is
higher) relative to the same ratios on the single-ratio schedules (where the
average reinforcement rate is lower).

The sequence of ratios should have been varied in the sense of
randomizing it.

There are two problems with doing this. First, if started on the higher
ratios, the animals would never have acquired the behavior. You have to get
the animals responding at a reasonable rate at a lower value before switching to
a higher value or you get extinction. Only after extensive training can the
animals be given the various ratio values in random orders. Second, in the
cyclic-ratio schedule there has to be some way for the rats to identify
which ratio value is current. In the cyclic-ratio schedule they learn to do
this from the sequence, which repeats. Randomizing the sequence could be
done, but this would require providing a different discriminative stimulus
to associate with each ratio value; otherwise your FR schedule becomes VR.

In fact, there was no regulatory behavior, but the authors proceeded on
the assumption that their results did show regulation. They even looked
for the effects of drugs on the gain and reference level of the control
system -- even after they had shown that there was no control occurring.

Well, we were fooled, too. I'm going to hate having to disabuse them of
this notion . . .

And I can't see why the authors offer the unlikely proposition
that the increase in other activities anticipated the ratio, when a much
more plausible hypothesis would be that it anticipated the reinforcement
rate.

I agree, but in these data a given ratio is associated both with a given
reinforcement rate AND a given response effort. Stating it in terms of
ratio requirement simply presents the facts without speculating on which of
these ratio-related factors (or both) is relevant.

I am also full of questions about how that relationship between pre-
ratio pause and ratio was plotted. As I understand it, the ratio went
2, 4, 8, 16, 32, 64, 32, 16, 8, 4, 2, 4, 8, ... . If that's so, where do the first
and last points come from, and how come there are two points at the
highest and lowest ratios when those ratios occur only once per cycle?
If you just go through the points sequentially, you find that the second
top point on the descending arm should go with the ratio of 32, not with
64, so the figure should have a loop in it. I think this figure is
nonsense -- either that, or the so-called cyclic ratio has two
duplications of ratios in it.

The ratio went 2, 4, 8, 16, 32, 64, 64, 32, 16, 8, 4, 2, 2, 4, 8, . . .

    As for why pauses were plotted against the upcoming ratio rather
    than the previous one, I tried replotting the data as a function of
    previous ratio, and they were not nearly as systematic. When you
    do it their way you get functions for ascending and descending
    ratios that show only a small hysteresis and essentially replicate
    each other, especially for C1.

How did you replot the data? I'm curious about how you handled the
duplication of points for ratios of 2 and 64.

I replotted each point at a given ratio by assigning it to the previous
ratio in the cycle:

  Original:    2    4    8   16   32   64   64   32   16    8    4    2   2*
    Replot:   2*    2    4    8   16   32   64   64   32   16    8    4    2

The last 2 in the original plot (2*) becomes the first 2 in the replot.
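
In code, the reassignment is just a one-position shift around the repeating
cycle; the pause values in this Python snippet are hypothetical placeholders,
included only to show the bookkeeping:

# Reassign each pause to the ratio that preceded it in the repeating cycle.
cycle  = [2, 4, 8, 16, 32, 64, 64, 32, 16, 8, 4, 2]
pauses = [1.0, 1.5, 2.0, 3.0, 5.0, 9.0, 9.5, 5.5, 3.2, 2.1, 1.6, 1.1]  # hypothetical

# cycle[i - 1] wraps around for i = 0, so the pause before the first 2
# is assigned to the last 2 of the previous cycle (the 2* above).
replotted = [(cycle[i - 1], pauses[i]) for i in range(len(cycle))]
for prev_ratio, pause in replotted:
    print(f"pause {pause:4.1f} sec -> plotted against previous ratio {prev_ratio}")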

How do you split the total pause into a pre-ratio and a post-ratio part?
There is only one pause between ratios. You would need the actual record
of individual presses and food consumption times to make the distinction
you're proposing.

I'm basing it on the authors' statement, supported by their IRT
distributions, that the apparent running rate changes are due to pausing
that emerged during the ratio runs at the highest ratios. Thus there is
pausing associated with the PRP and pausing after the first response in the
ratio run. My linear analysis can only partition the time between pellets
into a component that is fixed regardless of ratio and a component that
varies with the ratio. We know that the PRP varies with the ratio, as do
the _average_ running rates (which include pauses at the highest ratios).
Both of these varying times would be absorbed into the slope constant of the
line on my plot, leaving the part that does not vary with ratio value in the
intercept.

I agree, however, with your explanation. Lever-pressing behavior is not
a function of ratio in this experiment. Pause time evidently is,
although I would prefer to suppose it is a function of the rate of
reinforcement received per unit time, not of the ratio, knowing which
depends on dividing the behavior rate by the reinforcement rate.

Nice to hear. This analysis has opened up a whole series of interesting
questions for us to address empirically. Meanwhile, we do have a nice set
of data from Collier et al. to analyze. It will be interesting to compare
the latter to the Ettinger-Staddon data to see whether the results support
the same analysis or follow a different pattern.

Regards,

Bruce