cyclic ratio data

[From Bill Powers (950725.0950 MDT)]

Bruce Abbott (950724.2135 EST) --

     Staddon's "regulatory" model predicted that certain manipulations
     should affect the slopes of the functions without changing the
     intercepts and others should change both. The manipulations were
     designed to test these predictions.

I was referring mostly to the drug runs. When you administer a drug with
unknown effects on the nervous system and measure its effects on non-
physical arbitrary parameters of a model, I don't think you're learning
much. When you're also changing two or three other experimental
parameters at the same time, I think you're losing ground.

     7.15 s/rft - 5.77 s/rft = 1.38 sec/rft;

     3600 sec     rft
     -------- * -------- = 2608.70 rft/hr * 2 rsp/rft = 5217.39 rsp/hr
        hr      1.38 sec

     Alternatively, 0.68 sec/rsp = 1.471 rsp/sec * 3600 = 5294.118
     rsp/hr; the difference between this result and the previous one is
     due to rounding error. Your figure is 4013.36.

This is basically doing what Rick said: applying the inverse of the
function that you used to get the intermediate result. The difference is
that you removed the intercept first, but other than that you haven't
really got any new result.

As you figured out later, my approach was to calculate the time
available within which the actions occurred, and convert that to an
equivalent peak rate of acting according to the proportion of total time
between reinforcements occupied by actions.
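For concreteness, that conversion can be sketched in a few lines. This is only an illustration of the arithmetic described above; the function name is mine, and the 7.15 and 5.77 s/rft figures are the ones quoted earlier in this post.

```python
# Sketch of the peak-rate calculation: remove the collection time from each
# cycle, then express the remaining (active) time as an equivalent rate.

def peak_rate(sec_per_rft, collection_sec, ratio):
    """Responses per hour during the active part of the cycle."""
    active_sec = sec_per_rft - collection_sec   # time actually spent pressing
    return (3600.0 / active_sec) * ratio        # rft/hr in active time * rsp/rft

# FR-2 cycle of 7.15 s with a 5.77 s collection time, as in the quote:
print(round(peak_rate(7.15, 5.77, 2), 2))   # 5217.39 rsp/hr
```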

[Can we use "actions" when we see "responses" without any corresponding
stimuli? We can see actions, but we can't tell whether they are
responses unless we can see the specific stimulus to which each action
is a response.]

     The "peak" rates do not include collection time (or so you
     indicated above) so it can have no influence on the numbers at all.
     The actual "peak" rates for a given animal do not change _at all_
     over the 32:1 range of ratios; for Rat 1 it is about 0.68
     sec/response, or something in the 5200+ rsp/hr range as calculated
     above, regardless of the ratio requirement.

The "peak" rates are simply the rate of action that would produce the
same number of acts per reinforcement, but within a shortened time
interval during which the actions were occurring. I did not see any
perfect constancy over ratios -- the range was about 50% relative to the
lowest rates, which we have to assume to be real unless further
investigations show it to be a statistical fluctuation.

     By the way, not to open another can of worms, but you can estimate
     the collection-time from the Motheral-type graph as the point where
     a straight line fitted to the right portion of the curve meets the
     x-axis. This gives the rate of reinforcement at FR-0. Invert and
     convert to seconds for the collection-time. You can estimate the
     (constant) response rate by noting where the line crosses the y-
     intercept. This gives the rate for FR-infinity, when no
     reinforcement would ever be delivered (thus excluding collection-
     time from the figure). Invert and convert to get the slope of the
     line in the "sec/rft vs. ratio" plot.

I'm not going to do this now, but what is needed is to see whether there
is an assumed collection time that would make an equation fit the right
side of the curve under the assumption of a constant rate of acting
during the remaining time. While I didn't expect to get the relatively
constant rate of acting that you have brought out, this is what I was
after in recommending that we measure the time the rat actually spent in
front of the lever. But there could be additional pauses, so we really
need the detailed action-by-action record as well. I knew that
collection time would have an effect, but I never suspected that it
would be so large!


-----------------------------------------------------------------------
Bruce Abbott (950725.0930 EST) --

     What you did was to get the intercept from the regression equation
     and subtract it from the seconds per reinforcement number at each
     ratio value, leaving the time required to complete the ratio. You
     then divided this into the seconds per reinforcement to get the
     ratio of total time to ratio-completion time, and used this as a
     multiplier of the original response rate figures. You could have
     done the same by dividing the time required to complete the ratio
     into 3600 and multiplying the result by the ratio value.

Yes, that is what I did. The time that is calculated in this way is in
units of sec/reinforcement, because you had already divided the behavior
rate by the ratio to get reinforcement rate. So the time per
reinforcement has to be changed first to reinforcements per unit time
(which is meaningless because the reinforcements do not occur during the
action time) and then has to be multiplied by the ratio to get back to
behavior rate.

     The resulting function has a slight curvature in three of the four
     rats, with a maximum value at FR-16. The other rat's data are less
     regular but show a peak at FR-8. It is possible that the observed
     nonlinearity is real, but to a first approximation the data are well
     fit by straight lines. (This is especially apparent when you plot
     overall reinforcement rates rather than response rates.)

Right about the fourth rat. I think you will agree that nothing is
really "constant" in these curves -- only approximately so. I prefer to
leave approximations to the very end, because introducing them too early
can conceal important relationships.

     A problem for interpretation of any curvature is that the values
     for the lowest ratios are highly sensitive to the intercept of the
     fitted line (as you noted). For example, if the intercept for Rat
     1 were Rat 2's 5.77 instead of 5.36 (a difference of 0.41 s), the
     computed response rate excluding collection time increases by over
     1200 rsp/hr.

That is indeed a problem. It's especially a problem because of using the
ratio as an independent variable and extrapolating b/m vs m to b = 0.
Near zero there can be all sorts of departures from straight lines. If
you did a second-order fit to the data, you'd get very different
intercepts for each rat. And because of the extreme sensitivity to
apparent collection time, the results could be very strongly affected.
Also, as I pointed out, when you use the corrected peak action rates
(the apparent mean rate corrected for the fact that actions take place
only part of the time), and THEN do the extrapolation to zero behavior
rate, the intercept would move to much higher reinforcement rates.

     The data I presented were measured from graphs showing response
     rate versus reinforcement rate. I used the response rate because
     it provided a higher resolution and computed reinforcement rate
     from this. But I could have used reinforcement rate directly.
     Since one is just a multiple of the other, there can be no
     "independent" data on reinforcement rate.

I agree. In ratio experiments, reinforcement rate is strictly a
dependent variable: it is completely determined by the behavior rate
when there are no disturbances.

     So how would the actual position of the reference level (if there
     is one) be determined? Would this be the rate at which the rat
     consumed food pellets if given free access to them?

Yes, pretty near (strictly speaking we should probably equalize
exercise). Actually, since reference levels apply to inputs, not
actions, we do get a pretty good estimate of the reference level simply
by extrapolating the mean curves to zero. That is the state of the
(increasing) input variable at which the output just falls to zero, by
definition. What is causing us problems here is the fact that the output
is not simply proportional to the error: the output occurs in bursts
with pauses between them. So we're dealing with a non-simple output
function, which makes the required model more complex.

I find no simple proportional relationship between rate
of responding and reward rate. In fact, the rate of responding remains
essentially constant (plus-minus 25%) while the reward rate varies by
a factor of 5 to 6. Your generalizations above don't seem to fit the
data.

     You seem to be misreading me. You just stated what I stated, and
     then concluded that what I said was wrong.

What you said was

      What does stay the same across all ratio requirements are (a) the
      time required to collect the reinforcer and return to the lever
      (about the same for all subjects) and (b) the average rate of
      responding (which differs across subjects).

The average rate of responding does NOT remain constant: it varies +/-
25% across ratios. As I said, this is too early in the argument to be
making approximations. Your amplification was

     For a given animal, the rate of responding between collections is
     essentially constant regardless of the ratio requirement (I am
     assuming the deviations are experimental error, to a first
     approximation).

This is the approximation I don't want to make. It is NOT "essentially
constant." It varies. That variation, although not large, may prove to
be essential in constructing a workable model. The same happens with the
control equations: if you assume that the error is "approximately zero"
too early in the argument, you'll end up being unable to explain why
there is any action from the output function.

     So this leads to the question: where is there any evidence for
     control of reinforcement rate? It would appear instead that a
     given level of deprivation, size of reward, etc. as provided in
     these experiments sustains a particular rate of responding. I'm
     not really comfortable with that conclusion, but it seems to be
     implied by the data.

The proof is in the fact that if it were not for the behavior, the
reinforcement rate would be zero. The behavior brings the reinforcement
rate up to some non-zero value, and when the loop gain is high enough
(low ratios) we can see about where the reference level for
reinforcement is (even though it can't be reached exactly). Applying
disturbances would settle the matter.

     The reliable effect apparent in the graphs (of rsp rate vs rft
     rate) is a reduction in slope. The 80% and 95% lines tend to
     converge as the ratio decreases toward FR-2. The implied
     "collection rate" is either constant or, oddly enough, somewhat
     higher at 95% than at 80%. (I'm just estimating visually from the
     plots.)

To say they "tend to converge" doesn't tell me which curve is higher. I
assume that under 95% body weight the apparent reference level for
reinforcement is lower.

If you maintain the same rate of responding, the reward rate will
decrease as the ratio increases. That's just arithmetic. So it seems
that the reinforcement rate has no effect of its own on the behavior
rate; what does affect behavior rate is the error signal, the level of
deprivation.

     Yep. But does this make sense?

Not under our simple model. There are conditions under which the
behavior rate is quite uniform (_Schedules of reinforcement_) but these
are not those conditions.
--------------------------------------------
If you will look at my operant conditioning model in which the temporal
details of behavior are brought out (I'll post it again soon for those
who didn't save it or lost it), you'll see that the behavior in these
Staddon experiments is much more like what that model predicts, or can
predict with the proper adjustment of parameters.

In this model, reinforcements are accumulated in a leaky integrator that
generates the perceptual signal. A single reinforcement causes an abrupt
rise in the perceptual signal which then decays exponentially back
toward zero. When reinforcers are occurring at a given rate, the
perceptual signal becomes a sawtooth wave with abrupt rises and slow
declines, oscillating above and below a mean value set by the mean
reinforcement rate. My latest model offered for the Staddon data uses
only mean values, so the oscillating character of the perceptual signal
is omitted. Note that the size of the upward step in the perceptual
signal depends on the size of the reinforcer.

When we include those oscillations, we can see that with the right value
of decay constant the mean perceptual signal will be below the reference
signal, creating some average rate of action. The peaks may rise above
the reference signal, depending on the decay constant and the reward
size. While the perceptual signal is greater than the reference signal,
the error signal is zero (this is a one-way control system). The result
is that the output rate of action will fall to zero immediately after
each reinforcement, and remain there until the perceptual signal decays
to a value smaller than the reference signal: in short, a pause will be
generated. The output function is a relaxation oscillator which produces
actions at a frequency proportional to the error signal.

When the perceptual signal declines below the reference signal, an error
signal will start to grow and the action rate will start to rise from
zero. How fast it rises, and how far, depends on the characteristics of
the output function set by parameters in the model. With a very
sensitive output function, the rate of action will rise very rapidly as
the error signal begins to rise; it can be made to rise to the maximum
rate (another parameter) as rapidly as you please, in the limit creating
the appearance of an output behavior rate that is either zero or some
nonzero constant value. A nonlinear output function could be used to
produce any desired relationship between on-off and some smooth
relationship.

This model, therefore, automatically generates pauses after each
reinforcement. The actions will occur at some high rate after the pause,
with some rate of rise in action frequency and some maximum value of
frequency. We have been assuming a constant behavior rate during the
active phase, but in fact we couldn't tell (from the mean data values) a
constant rate from a rate that changed during the active phase. This is
one reason we need the actual record of actions and reinforcements -- to
test the assumed form of the output function in this model.
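Under assumed parameters, the model described above can be sketched roughly as follows. Every constant here (reference, decay, reward size, gain, maximum rate) is illustrative, not fitted; the point is only to show the mechanism: a leaky integrator fed by reinforcements, a one-way comparator, and an output whose pressing frequency is proportional to error.

```python
# Minimal sketch of the one-way control model with a leaky-integrator
# perceptual function and a relaxation-oscillator output. Illustrative only.

def simulate(ratio, steps=20000, dt=0.01,
             ref=1.0, decay=0.05, reward=0.4, gain=20.0, max_rate=1.5):
    p = 0.0                     # perceptual signal
    presses = 0                 # presses counted toward the current ratio
    phase = 0.0                 # accumulator for the relaxation oscillator
    total_presses = total_rft = 0
    for _ in range(steps):
        error = max(0.0, ref - p)              # one-way: no negative error
        rate = min(max_rate, gain * error)     # presses/sec, saturating
        phase += rate * dt
        if phase >= 1.0:                       # one press completed
            phase -= 1.0
            presses += 1
            total_presses += 1
            if presses >= ratio:               # ratio met: deliver reinforcer
                presses = 0
                total_rft += 1
                p += reward                    # abrupt rise in perception
        p -= decay * p * dt                    # exponential decay toward zero
    return total_presses, total_rft

presses, rfts = simulate(ratio=8)
print(presses, rfts)
```

With these particular constants the reward step carries the perceptual signal above the reference, so the model generates a pause after each reinforcement followed by a burst, which is the qualitative pattern described above.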
--------------------------------------
It's obvious that the effect of the collection time will vary greatly
depending on the behavior rates and reinforcement rates that occur at
the lower ratios. In the Motheral data, the maximum rate of
reinforcement was about 400 per session. If a session lasted one hour,
that is one reinforcement per 9 seconds. With a collection time of 5.5
seconds, the burst behavior rate would be 9/3.5 or 2.6 times the
apparent mean rate. In the Staddon data, the minimum time between
reinforcements was only 6.4 seconds and the collection time was
apparently 5.2 sec, leaving only 1.2 seconds for the active phase: the
burst behavior rate is then 6.4/1.2 or 5.3 times the mean behavior rate.
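The arithmetic can be checked directly; the factor is just interval / (interval - collection time):

```python
# Ratio of burst (active-phase) rate to apparent mean rate, given the
# time between reinforcements and the collection time.

def burst_factor(interval_sec, collection_sec):
    return interval_sec / (interval_sec - collection_sec)

print(round(burst_factor(9.0, 5.5), 1))   # 2.6  (Motheral figures above)
print(round(burst_factor(6.4, 5.2), 1))   # 5.3  (Staddon figures above)
```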

I hesitate to predict what my model will do when an assumed collection
time is included in it. I'll put it in as a parameter before posting the
new code.
---------------------------------------
The main impression I'm getting from all these investigations is how
terribly complex the behavior is that is being explored. When you look
at all the details that have to be accounted for in a working model, it
becomes apparent that a simple empirical approach to this kind of
behavior can't possibly sort it all out. It seems to me that EABers have
jumped into the middle of a huge complex system without having looked
into the simplest behaviors that it produces.

One reason I think this is the very nature of the experiments. The
production of repetitive acts would require, in the HPCT model, at least
five levels of control (events). To vary these behaviors in accordance
with a complicated logical condition so as to produce food would require
even higher levels of organization. I don't see how you can do any sort
of orderly investigation of behavioral organization in this way.

What we can hope for is that all these complex relationships seen under
various complex environmental conditions will prove to be the outcomes
of putting a system with a relatively simple internal organization into
different environments. I suspect that once we have a good working model
for ratio schedules, it will continue to work for all other single-
schedule experiments of all types. But we'll see.
-----------------------------------------------------------------------
Best,

Bill P.

[From Bill Powers (950726.0545 MDT)]

Bruce Abbott (950725.2110 EST) --

     I still get an uncomfortable feeling that you do not quite follow
     my line of thinking yet on these cyclic-ratio data.

I follow it, all right, but I believe you are making some conceptual
errors. Maybe I am -- but maybe, too, you are.

     If we take the inverse of the reinforcement rates and convert, we
     get seconds per reinforcement.

That is certainly true.

     This is the average time required to complete one reinforcement
     "cycle": complete each response, collect the pellet, and return to
     the lever.

Again, true and understood.

     This number is NOT, repeat NOT b/m.

Right: it is m/b. The reinforcement rate is always exactly the behavior
rate divided by the ratio; the time per reinforcement is always exactly
the ratio divided by the behavior rate, which gives 1/r.

     If it were, m = 0 would be undefined, yet we can get a perfectly
     reasonable number there.

No, m = 0 can't actually occur. There is no ratio of zero actions per
reward; this is not possible in the experimental situation, because
reward depends strictly on behavior and the ratio. The only way to get
zero actions is to supply sufficient reward independently of behavior --
in other words, to include a disturbance in the experiment. You are
reasoning as though the reward rate were the independent variable, in
which case you _could_ set the reward rate to zero and get the resulting
behavior rate. But that is not the case, and the mathematics behaves
accordingly. The only legitimate domain of the mathematical
relationships in the experiment as done is from m = 1 upward. At m = 0,
any rewards would have to be supplied independently of the apparatus.

When we extrapolate rightward to b = 0, we are going outside the
conditions of the experiment. As mentioned, the only possible physical
meaning of b = 0 and r > 0 is that somehow the reward is being
externally supplied. Since there is no independent supply of reward in
this experiment (no additive disturbance), the reference level we deduce
in this way is hypothetical; it can't be demonstrated under the
conditions of the experiment. It is physically impossible to reach,
given that the ratio is an integer.

     Furthermore, it would reflect a different quantity at each ratio
     value rather than the same old seconds to complete one cycle. It
     is NOT appropriate to substitute "behavior rate" for reinforcement
     rate on this graph, as you seem to insist on doing.

(Average) reinforcement rate equals (average) behavior rate divided by
the ratio. That is the physical fact. The reward rate is zero only if
the behavior rate is zero, which can happen. But for the reward rate to
equal the reference level, the behavior rate would have to be infinite
(by r = b/m), which is impossible. Between a ratio of 1 and 0 there
would have to be ratios of 0.5, 0.05, 0.0005, and so on to the limit,
with behavior approaching infinity as a limit. But the apparatus works
only with integer values.

     As to the x-axis, the ratio value is the number of responses
     required to complete one reinforcement cycle. So we are plotting
     the average time required to complete one cycle as a function of
     the number of responses required to complete one cycle.

Yes, you CAN plot this if reinforcement rate is the independent
variable, but it is not. Of course you can plot it anyway, but then your
mathematics ceases to have any connection to the physical situation:
you're just pushing numbers around. When you divorce mathematical
manipulations from their physical meaning, you're doing numerology, not
science.

Herrnstein's original "matching law" for choice experiments is an
example of numerology. If you say that

B1/B2 = R1/R2,

you are saying that B1/R1 = B2/R2. In other words, you are saying that
the ratio of behaviors to reinforcements on the two keys is the same. This
can be true only if the schedules are in fact the same. But Herrnstein
then went on to apply this "law" to schedules which were NOT the same,
and of course the law did not fit the observations, since it can be true
only for schedules that ARE the same.

When Herrnstein "generalized" his matching law to

B1/(B1 + B2 + ... + Bn) = R1/(R1 + R2 + ... + Rn) and so forth,

he didn't, apparently, realize that the whole series is algebraically
identical to

B1/R1 = B2/R2 = ... = Bn/Rn

which merely says that ALL the schedules are identical.
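The algebraic point can be verified numerically with arbitrary illustrative values: the matching equality holds exactly when the B/R ratios are all equal, and fails otherwise.

```python
# Check: B1/(B1+...+Bn) = R1/(R1+...+Rn) holds iff the Bi/Ri are equal.

def matches(B, R):
    return abs(B[0] / sum(B) - R[0] / sum(R)) < 1e-12

# All Bi/Ri equal to 2 (identical schedules): matching holds.
print(matches([10, 6, 4], [5, 3, 2]))    # True

# Unequal Bi/Ri (different schedules): matching fails.
print(matches([10, 6, 4], [5, 3, 1]))    # False
```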

I have seen a lot of this in the EAB literature -- manipulating
algebraic expressions without regard to their physical meaning and even
more important, without regard to the distinction between independent
and dependent variables.

For example, suppose we have d = distance, g = gallons of gasoline, and
m = miles per gallon. The amount of gasoline used is

g = d/m

This is equivalent to

m = d/g.

If we set the gallons used to zero, we find that the miles per gallon,
for any finite distance traveled, is infinite! Obviously, in this
situation, it is not legitimate to specify the gallons used
independently of the miles traveled, even though for values of g other
than zero, we can deduce the miles per gallon from knowing the miles
traveled and the number of gallons. Miles per gallon is a physical
characteristic of the car, just as rewards per behavior is a physical
characteristic of the experimental apparatus.

Similarly, when r = b/m, we find that for a ratio of zero, and any
finite reward rate, the behavior rate is infinite. So obviously the same
consideration applies: the problem is that r is a dependent variable,
not one that can be arbitrarily set to any value including zero
regardless of the values of the other variables. Setting m to zero
implies a physically impossible situation. If rewards occur without
behavior, that represents, physically, an independent source of
"noncontingent" reinforcement -- a disturbance.

On the other hand, if we introduced a disturbance d in the form of added
or subtracted reinforcements we would have

r = b/m + d

Now we can see that b/m can be zero for r = d, a physically realizable
condition. This would occur not at m = 0, but at any realizable value
of m from 1 on up such that r = d.
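A one-line sketch of the disturbed relationship, with arbitrary illustrative numbers:

```python
# With a noncontingent disturbance d, zero behavior at a finite ratio
# becomes physically realizable: r = b/m + d, so b = 0 gives r = d.

def reinforcement_rate(b, m, d=0.0):
    return b / m + d        # contingent rate plus disturbance

# Free reinforcers at rate 2.5, no pressing, ordinary ratio m = 4:
print(reinforcement_rate(0.0, 4, d=2.5))   # 2.5
```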


----------------------------------------
     If you plot the seconds/cycle as a function of the number of
     responses per cycle, you get four essentially straight lines
     (within experimental error). The lines all converge to nearly the
     same intercept (collection time) but have different slopes (rates
     of increase in cycle time per additional response). Note that the
     ratio value should be plotted on a linear, not log, scale. Minitab
     gives the following regression results for these four lines:

     Rat   intercept   slope     r      r-sq
     C1      5.35      0.679   1.000   1.000
     C2      5.77      0.528   0.999   0.997
     C3      5.47      0.458   0.999   0.999
     C4      5.21      0.397   0.999   0.998

     You will note that the correlations are extremely high, indicating
     an excellent linear fit to the data.

This is not just an excellent linear fit to the data: it is (when
rounding errors are removed) a _perfect_ fit. There is no way you could
have estimated, or even measured, the data values with the implied
accuracy. What you have here is the result of computing an algebraic
identity. Just as Rick said, you have computed one function of a
variable, and then the inverse function of the result, ending up, within
computational limits, with the original values of the variable. If you
write out all the equations you used and solve them simultaneously, you
will find that you have proven that 0 = 0.
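The circularity is easy to make explicit: build the cycle times from an assumed constant rate and collection time, then "recover" exactly those numbers by regression, with r = 1.000. (Rat 1's fitted values are used as the assumed inputs; nothing here is measured data.)

```python
import numpy as np

# Construct cycle times from an assumed collection time and sec/response,
# then fit a line to them. The fit is perfect by construction: 0 = 0.
c, sec_per_rsp = 5.35, 0.679                 # assumed, not measured
ratios = np.array([2, 4, 8, 16, 32])
cycle = c + sec_per_rsp * ratios             # derived, not observed

slope, intercept = np.polyfit(ratios, cycle, 1)
r = np.corrcoef(ratios, cycle)[0, 1]
print(round(intercept, 3), round(slope, 3), round(r, 6))  # 5.35 0.679 1.0
```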

To make you feel better, I initially did the same thing in my paper on
experimental measurement of purpose in Wayne Hershberger's book. I fit a
model to behavior, and then used its parameters to deduce the behavior
of the reference signal. Using those deduced values, I then ran the
model again, and obtained correlations of model behavior with real
behavior of 0.999999 and up. I thought "Oh, my God, this is incredible."
And it was incredible, as Greg Williams pointed out. I had solved for
the reference signal values that would explain the behavior, and of
course when I used those values, they exactly explained the behavior. If
not for rounding errors, that correlation of 0.99999... would have been
exactly 1. You will notice that in the published paper, I said that
these correlations proved that I had not made any mistake in the
calculations -- not that I had correctly deduced the reference signal.
----------------------------------------
     Yes, we have a "chain" in which bringing one variable to its
     reference value (pellet present in food cup) must be completed
     using one set of actions before a second variable can be brought to
     its reference value (food in mouth) by means of a different set of
     actions. The two control systems must alternate.

This says that ideally we should have a higher-level control system that
turns the two systems on and off, perhaps by using their reference
signals or by gating the output functions. In OPCOND4 I simply assume a
constant collection time, and suspend output operations in the main
control system during the collection time (with the program supplying
the action of the hypothetical higher-level system). The higher-level
system would be one that controls a longer-term average value of input.
---------------------------------------
     Ettinger and Staddon present data on interresponse times (IRTs)
     which indicate that the peak of the IRT distribution is not
     affected by the ratio requirement; essentially, when the animal
     responds to knock off the ratio, it does so at the same relatively
     steady rate regardless of the length of the ratio. The peak IRT is
     about 0.2 seconds. This rate is too high to account for the
     average time per response as reflected in the slope of the function
     relating cycle time to ratio value.

I wish to hell that people would stick to one scale of measurement
throughout an analysis. IRT is just 1/rate; it's not a separate
phenomenon.

The _peak_ of the IRT distribution can remain the same while the actual
IRTs vary from one to the next according to any waveform. And of course
since there is a dead time (collection time) following the start of
every collection period after a reinforcement, the peak rates relate to
the average rates according to the time division.

I would like to know what the authors mean by a "relatively" steady
rate. I suspect that the rate at the end of the pause rises along some
curve from zero to a maximum. The relative steadiness of the rate
depends on how quickly the maximum rate is reached.

     The length of this pause increases in proportion to the number of
     responses required in the UPCOMING ratio.

What's this, time travel?

     At zero responses per cycle the "collection time" includes whatever
     postreinforcement pause may follow consumption of the food pellet.

I remind you that we have no data for this condition, nor can it be
obtained without an independent supply of reinforcer.

     It is all so very seductive. It SEEMS so simple on the surface.
     What could be simpler than rewarding every, say, 8th response? Not
     only that, but you can get clear, reproducible functions as you
     manipulate such things as the ratio requirement, level of
     deprivation, and effort required to press the lever.

Yes, each manipulation affects a different controlled variable in the
middle of a system of interacting control systems. There are observable
effects, and it is encouraging that they are reproducible at least for a
given animal. But this reproducibility is only the first step; it does
not substitute for an orderly investigation in which we try to isolate
the variables and understand why they behave as they do.

     I'm still hopeful. But I think we will be well served by paying
     attention to the details revealed by studies like Ettinger and
     Staddon's and using that information to improve our guesses about
     what variables may be under control or may affect the parameters of
     the control systems we specify in the model.

I agree that we should pay attention to data, and keep in mind the
conditions that have been set up. However, many conditions (such as
administering drugs) have effects that we can't hope to understand
without a working model. The normal progression in model-making is first
to do simple experiments and find one or more simple working models
(simulations) that will fit the results. Then when more complex
conditions are brought in, either the model will still handle them, or
there will be differences that require changes or additions to the
model. If the steps of added complexity are small enough, we can hope to
find the required revisions quickly. What makes model-building difficult
is doing experiments that just throw a lot of manipulations into the
hopper at random, with no notion of whether their effects will be simple
or complex.

     By the way, have you received the material I sent you last week via
     snail mail?

Yes, and many thanks. I haven't had much chance to look at them, but the
transfer-function paper looks more or less OK, even though the authors
have made it almost impossible to relate these transfer functions
(impulse response is what they calculated) to physical time. Using
sessions of variable duration as the independent variable is rather
weird. What's interesting in a qualitative way is that they were
measuring some _extremely slow_ system, with a time constant on the
order of days. Unless I have misunderstood. These people are clearly
neither physicists nor engineers.
-----------------------------------------------------------------------
Best,

Bill P.

[From Bill Powers (950726.1600 MDT)]

Bruce Abbott (950726.1230 EST) --

You have me mostly convinced that the behavior rate within ratio periods
(for the Staddon data) is nearly constant and independent of the ratio.
I did my own fitting of the straight lines to the data, and came out
with proportional errors of at most 6.5% of the time per behavior (at
the lowest rates). My mistake in interpreting your table was in casually
assuming that high correlations implied actual fits of the data that
were 99.9% perfect. I should have known better. Sorry for impugning your
mathematical integrity.

I still would like to get rid of the verbal arguments and see this whole
thing laid out mathematically.

Some of our differences are only a difference in which variable we are
talking about:

     (2) r = 1/(mL + c)

     But

     (3) b = 1/L

     thus

     (4) r = 1/(m/b + c) = b/(m + cb)

     Yet you wish to represent

     (5) r = b/m

The "b" I was talking about was the observed value, the apparent
behavior rate. Calling that B, it is related to your b by

          m  press/reinf
B = ----------------------- press/sec
     (m/b + c)  sec/reinf
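A numeric illustration of this relation, using Rat 1's fitted values as assumed inputs: the observed (apparent) rate B is the burst rate b diluted by the collection time c, and the dilution shrinks as the ratio grows.

```python
# Apparent mean press rate B = m / (m/b + c): m presses spread over the
# whole cycle, including the collection time c. Values assumed.

def apparent_rate(b, m, c):
    return m / (m / b + c)     # presses per second, averaged over the cycle

b = 1 / 0.679                  # burst rate, rsp/sec (Rat 1's slope inverted)
for m in (2, 8, 32):
    print(m, round(apparent_rate(b, m, 5.35) * 3600))  # rsp/hr rises with m
```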


------------------------------------------------
     An infinite behavior rate implies a zero delay, which is clearly
     not impossible if there is no behavior required (FR = 0) to produce
     the pellet.

If there is no behavior, there will be no pellet. What you are talking
about is a case in which the apparatus will emit a new pellet as soon as
the old one is eaten (but not before). You aren't sticking to the
equations that describe the actual apparatus. Your argument above is
sophistry: by that argument, no behavior is equivalent to zero delay
which is equivalent to an infinite behavior rate. You found the word for
it ...

     Absurd! The number of responses required to complete the ratio and
     collect a reinforcer is the independent variable here, not the rate
     of reinforcement.

I thought it was absurd, too, to treat the reinforcer as an independent
variable.

The ratio is not the only independent variable. The other one is the
reference level. A different reference level would result in a different
curve for the same ratios.
-----------------------------------------------------------------------
Let's pause and think about the implications of your discovery. If it
holds up, what we have is an animal that always presses at some steady
rate regardless of the ratio and the consequent rate of reinforcement.
The only thing that creates the appearance of an inverse relationship
between amount of reinforcement and behavior is the necessity for
interrupting the pressing for a fixed time to collect each
reinforcement. Basically, the animals are pressing the bar as fast as
they can, or care to.

I believe you can set up OPCOND5 to imitate this condition. What is
required is a high gain and an appropriate limit on the error signal, so
that in effect if there is any error, behavior becomes maximum, and if
there is no error, behavior is zero. Also, it will be necessary to
reduce the reward size until the perceptual signal never rises above the
reference signal for any ratio from 2 to 64; the only "pause" that then
exists is due to the collection time. With this arrangement of
parameters, the animal will either be pressing at the limiting rate or,
during the collection period, not pressing at all. That should be
sufficient to recreate the Staddon data.

I don't think that this result is general. If reward size is large
enough, we should begin to see pauses after each reinforcement that are
longer than the collection time. In that case, we will see that as the
ratio decreases, the actual within-ratio behavior rate also decreases. I
know that such pauses are seen in many experiments.

In the following table, I plot as a proportional percentage the
difference between calculated time per reinforcement and actual time --
that is, calculated minus actual. A negative sign means that the actual
time is longer than the calculated time.

                Percent deviation from fit

                            RATIO
             2      4      8     16     32     64

Rat 1     -6.5   -1.0    1.5    2.9   -0.0   -0.2
Rat 2     -0.5   -0.1    4.7    2.3   -5.4    1.2
Rat 3     -3.6   -4.5   -1.1    4.7    1.6   -0.8
Rat 4     -6.5   -4.7    2.5    4.9    0.8   -0.7

This shows that at FR-2 and FR-4, the actual behavior rate is starting
to slow down. If the data were available for FR-1, I would expect the
slowing to be even greater. If this were so, it would indicate that the
error signal is coming down below its limit, so that active control is
becoming possible. Of course active control is going on anyway -- it is
only the behavior rate that is keeping reinforcement rates as high as
they are. But we are apparently seeing a control system that is up
against the stops.
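
The deviation computation itself is simple; here is a sketch using
hypothetical numbers rather than the actual record (the sign convention
matches the table: calculated minus actual, taken as a proportion of the
actual time):

```python
def percent_deviation(ratios, times, slope, intercept):
    """Proportional deviation (%) of the fitted time per reinforcement
    from the actual time: (fit - actual) / actual * 100.
    A negative value means the actual time is longer than the fit."""
    return [(slope * m + intercept - t) / t * 100.0
            for m, t in zip(ratios, times)]

# Hypothetical illustration (not the Staddon data): slope 0.68 s/press,
# collection time 5.77 s.
ratios = [2, 4, 8]
times = [7.5, 8.6, 11.0]
print([round(d, 1) for d in percent_deviation(ratios, times, 0.68, 5.77)])
```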

The implication is that something about the Staddon experiment is
creating a very large, nearly maximum, error signal in the animals. I
haven't worked this out yet, but if we extrapolate to the reference
level using a second-order fit, we will get an apparent reference level
that is much higher than the one computed from the straight-line fits.
You can probably do this with your statistics package. Having the FR-1
data would have helped.

The Motheral data will, as you say, also be subject to the collection-
time effect. I haven't tried that calculation yet. If we get something
similar, the suggestion will be that actual reference levels are much
higher than what we have been deducing, and errors are approaching or at
a limiting amount. The reason I'm so sure about this is that if reward
size is increased enough, behavior rates will obviously slow down as the
animal begins to get food at or above the rate specified by its
reference signal. However, the range of good control may be much
narrower than what I have been deducing from my models.
--------------------------------------
There is another possible explanation. It is that the animals are
incapable of learning graded control for this kind of task. Once the
selection phase is over, the animals have either learned to press at
some fast rate, or they haven't.

And another possibility is that these animals aren't control systems!
-----------------------------------------------------------------------
Best,

Bill P.

[From Bill Powers (950730.1230 MDT)]

Bruce Abbott (950730.1130 EST) --

I'm sorry to keep nibbling at you like a duck, but something is nagging
at me and I can't let go of it until the error is resolved one way or
the other.

I do understand how your reasoning went. The time per reinforcement is
the sum of two times: (1) the time needed to execute the ratio at a
pressing rate p and (2) a constant collection time. You fitted a
straight line to the data and for each rat came up with a value of a
constant and a slope. The constant is the collection time and the slope
is the time per press. As you present the result, there seems to be no
escaping your conclusion (that's half of what keeps nagging at me).

I suppose the bottom line is whether a straight line is actually a
better fit than a curve, and whether the deduced value of c is the best
value. Since the rate of pressing is something that can be measured, and
so is the value of c, there's no point in continuing to argue about them
as theoretical values. Do you suppose that Staddon might have made a
record of every bar press on disc, and saved it, and might be willing to
let us use the data?

You can adjust c in CYCLIC.PAS to make the behavior approximately
constant, and fit a rat-function to it by raising the reference level
and fiddling with the limit and gain. This would show a control system
that is essentially saturated and far outside its control range -- a
unverifiable model, since there can be no data for reinforcement near
the reference level. Or we can adjust the four parameters for the
smallest least-squares error (the program is initialized that way), and
see what a plot of m versus m/b+c looks like. But I'd rather have real
data.

Best,

Bill P.

[From Bruce Abbott (950730.1930 EST)]

Bill Powers (950730.1230 MDT) --

I do understand how your reasoning went. The time per reinforcement is
the sum of two times: (1) the time needed to execute the ratio at a
pressing rate p and (2) a constant collection time. You fitted a
straight line to the data and for each rat came up with a value of a
constant and a slope. The constant is the collection time and the slope
is the time per press. As you present the result, there seems to be no
escaping your conclusion (that's half of what keeps nagging at me).

I have to admit that I'm not terribly happy with this result either, but it
seems to be what the data imply.

I suppose the bottom line is whether a straight line is actually a
better fit than a curve, and whether the deduced value of c is the best
value. Since the rate of pressing is something that can be measured, and
so is the value of c, there's no point in continuing to argue about them
as theoretical values. Do you suppose that Staddon might have made a
record of every bar press on disc, and saved it, and might be willing to
let us use the data?

We seem to be thinking along the same lines. I've already enquired--Friday.
No reply as yet but it's still early. The data were originally collected by
computer, so there's a good chance that a disk exists.

Regards,

Bruce

[From Bill Powers (950731.0815 MDT)]

Bruce Abbott (950730.1130 EST) --

Well, I think my long confusion is ending. We have been forgetting
something about control systems: they control their inputs, not their
outputs. And what is the input in an operant conditioning experiment?
The reinforcement rate.

The reinforcement rate depends on the mean behavior rate.

r = b/m

For the rat equation we can fit a straight line to the data of the form

b = g*(r0 - r)

where r0 is the reference level for reinforcement rate and g is the gain
from error to b. This line intersects the x-axis at the reference level
for reinforcement rate.

If we assume a pressing rate p and a collection time c, we can say

p = m/(m/b - c)

Since p is a function of m, b and c, and we know b and m, the only
remaining unknown is c. Your method solves for c. Once c is determined,
p is determined and we can calculate it.
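
A sketch of this solution in Python (the data here are synthetic, generated
from an assumed line T = 0.68*m + 5.8, just to show that the fit recovers p
and c):

```python
def fit_collection_time(ratios, times_per_rft):
    """Least-squares fit of T = s*m + c, where T = m/b is the observed
    time per reinforcement. The slope s is seconds per press (p = 1/s)
    and the intercept c is the collection time."""
    n = len(ratios)
    mx = sum(ratios) / n
    my = sum(times_per_rft) / n
    sxx = sum((x - mx) ** 2 for x in ratios)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(ratios, times_per_rft))
    s = sxy / sxx
    return 1.0 / s, my - s * mx   # pressing rate p, collection time c

# Synthetic data from an assumed line, not the real record:
ratios = [2, 4, 8, 16, 32, 64]
times = [0.68 * m + 5.8 for m in ratios]
p, c = fit_collection_time(ratios, times)
print(round(p, 3), round(c, 2))   # recovers p ~ 1.471 press/s, c ~ 5.8 s
```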

The pressing rate p is the mean rate of pressing between collections;
the control action adjusts it on the basis of the error r0 - r. The gain
from error to p is reduced by the interruptions for collections, leading
to a lower gain from error to b. The net gain is nearly constant.

As m increases, the error increases but the loop gain of the control
system (g/m) decreases. As it happens, the two effects on p are nearly
offsetting under the conditions of this experiment. Estimating from the
plots in CYCLIC2.PAS, the control system output gain is about 8.8 for
Rat 1. With a ratio of m = 2, the _loop_ gain is 4.4.

If you adjust the collection times in cyclic2.pas to 5.35, 5.77, 5.47,
and 5.21 for rats 1 through 4 (the values obtained by your method), you
will see that the values of p shown by the green circles are not
constant as the ratio changes. Since they represent the outcome of
opposing nonlinear effects, it is not surprising that the values show
some curvature. The general horizontal trend is a coincidence.

I will produce cyclic3.pas today, using the strategy outlined here. From
this we will get a control system model with a reference signal and a
gain characterizing the rat.

···

---------------------------------------
Sam Spence, I have been ignoring your propositions about the rat
behavior until this modeling business gets straightened out. When it's
finished, I think you will see that elaborate hypotheses about what the
rat is perceiving will not be necessary.
----------------------------------------------------------------------
Best to all,

Bill P.

From Tom Bourbon [950731.11:42]

[From Bill Powers (950731.0815 MDT)]

Bruce Abbott (950730.1130 EST) --

Bill brought this thread back to a crucial point:

Well, I think my long confusion is ending. We have been forgetting
something about control systems: they control their inputs, not their
outputs. And what is the input in an operant conditioning experiment?
The reinforcement rate.

The reinforcement rate depends on the mean behavior rate.

r = b/m

For the rat equation we can fit a straight line to the data of the form

b = g*(r0 - r)

where r0 is the reference level for reinforcement rate and g is the gain
from error to b. This line intersects the x-axis at the reference level
for reinforcement rate.

This discussion is starting to look a little more like it is about a model
of a living control system that controls its own perceptions. In my attempt
to catch up with this thread, I've been handicapped by not being in at the
beginning, or until only about a week ago. Even with that constraint, one
thing that struck me, right away, was what seemed to be an attempt to model
organisms as controllers of their own behavior. (I commented briefly on
that impression, yesterday.)

Bill's suggestions start to bring the focus of the modeling back to control
of perceptions, but they don't go all the way. For one thing, the
corrections described above still assume the organism controls its _inputs_,
not its perceptions. That's one of the classic blunders made by many
control-system engineers, when they start to study living control systems.
I'm not saying Bill made that blunder, just that it is hard to avoid
something like it when we try to understand what it is that is "reinforcing"
about a "reinforcer."

I'm sure many things have changed in the operant literature since I looked
closely at parts of it a few years ago, but back then, no one seemed to know
exactly what it was about a food pellet that was "reinforcing." Something
to do with "getting," chewing, swallowing, stomach loading -- probably.
Something to do with blood chemistry, or metabolic rates, or energy stores
-- probably. There are many other possibilities.

Does an animal really control the rate of input (ingestion?) of pellets?
What happens if the size of the pellets varies? The mix of nutrients vs
inert ingredients in each pellet? The texture of a pellet? And so on. I
know things like that have been studied in the operant community, but all of
the studies I recall (vaguely) treated those putative reinforcing properties
of the reinforcer as though they were in control of the rat's behavior, not
the other way around. Is the situation any different today, Bruce? Do you
know of any data that might help us better understand what the rat controls
in an operant procedure? Which perceptions are the most likely to be "under
control"? Might it be a constellation or ensemble or series of them -- some
to do with perceptions of pellets at a distance; others with pellets in the
mouth; in the gut; dissolved and circulating, stored or combusted?

Those are some of the variables I think someone might want to study, as
potential controlled variables/controlled perceptions. I suspect that if
someone were to do those studies very carefully, we would end up with a much
better idea of that which the animal controls. Then we would be better able
to know what to make of traditional data on "rate of reinforcement."

Good solid data about the phenomenon of control. There is no substitute!
(In my admittedly biased opinion.)

Later,

Tom

[From Bill Powers (950801.1130 MDT)]

Bruce Abbott (950801.1220 EST) --

The so-called reference level turns out to be just the maximum rate at
which food deliveries can be obtained.

     It may be more accurate to state that it's the rate at which food
     deliveries _are being_ obtained, rather than the maximum rate at
     which they _can be_ obtained.

No, the so-called reference level turns out to be 1/c, which is the
maximum rate at which food could be obtained with an infinite rate of
pressing. The actual rate of obtaining food is always less than 1/c.
This isn't a matter of the apparatus; it's the effect of collection
time, same as you computed. The finding that the "reference level" is
1/c when the pressing rate is assumed constant was a complete surprise
to me.

Of course I should say "the maximum rate given the observed collection
time."
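
The 1/c result follows in one line from the constant-pressing assumption:
r = b/m and the time per reinforcement is m/p + c, so r = 1/(m/p + c), which
approaches 1/c as m goes to zero. A quick numeric check in Python (the p and
c values are assumed for illustration):

```python
def reinf_rate(m, p, c):
    """Reinforcement rate with m presses at constant rate p before
    each collection of duration c: r = 1 / (m/p + c)."""
    return 1.0 / (m / p + c)

# Assumed p = 1.47 press/s, c = 5.77 s: r climbs toward 1/c as m shrinks
for m in (64, 8, 1, 0.01):
    print(m, round(reinf_rate(m, 1.47, 5.77), 4))
print("limit 1/c =", round(1 / 5.77, 4))
```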

In this experiment each rat is either collecting food or pressing the
bar at a constant rate. This is true for all schedules from FR-2 to
FR-64. The schedule has absolutely no effect on the behavior.

     This interpretation says that the rat has reached some upper
     physical limit in its ability to generate a high rate of pellet
     delivery. This would explain the constancy in response rates and
     the lack of actual control over rate of reinforcer delivery.
     However, this appears inconsistent with the fact that the rates are
     also constant _but lower_ at a lower level of food deprivation
     (95%). It is _as if_ a given level of error in the "stomach
     loading" system is determining the observed response rates, rather
     than determining the reference level for rate of pellet collection.

This looks more like an observation than an interpretation. The detailed
record will tell the story.

The upper limit needn't be physical; it could simply be that the
reference level for rate of pressing (which is also, presumably, a
controlled perceptual variable) is set to some value as a result of
other considerations, such as degree of fatigue, cost-benefit ratio, or
what have you. Actually, that's more or less what you said. But since we
apparently have a constant value of rate of pressing, it's tempting to
think that some signal has reached a limit.

     The question is, does this make sense? In nature the rat must vary
     its rate of food discovery and injestion [ingestion? WTP] as
     required to maintain itself in a proper nutritional state.

Sure, it makes sense. It's just that the experiment is set up so the rat
is unable to maintain itself in a proper nutritional state -- and at the
same time meet all its other internal requirements -- under the
conditions of this experiment. It's like setting a small room air
conditioner to a temperature of 60 degrees when the outside air
temperature is 120 degrees. It tries, but it can't do it. It just pumps
heat at the maximum rate it can, given the line voltage, internal
friction, obstacles in the air path, and so forth.

I think that Staddon and Ettinger just had the bad luck of setting up
the experiment so that the conditions were outside the range of the
animals' control systems.

     In the operant chamber, however, the state of the nutrient system
     is carefully maintained at a nearly constant level of error. If
     the rate of food-gathering activity depends on this error, that
     rate will remain roughly constant so long as the error remains
     roughly constant.

That's closer, I think.

     Whatever happens to it, the change will be corrected between
     sessions through the experimenter's intervention. However, if the
     rat had to earn _all_ its food by responding on a given ratio
     schedule, response rates that failed to produce enough food would
     lead to increasing deprivation level (error) and thus to higher
     response output. In other words, under this condition we should
     see control.

Yes, this makes sense. There's enough evidence that rats have weight-
control systems that are very accurate (weight control better than 1% in
the presence of largish disturbances). But these control systems are
very slow, so to see the control we'd have to sample over a long time. A
1-hr session is essentially an instantaneous sample.

We might or might not see control. That depends on whether the balance
between available energy and pressing rate has already been reached.
Higher deprivation could simply lead to less behavior, because what the
animal is being deprived of is the very energy it needs to produce
behavior. Once stored energy is depleted, the animal simply can't
produce any more behavior.

     In the typical operant study, by holding error constant, at best
     we are able to measure only the gain of the output function at the
     particular level of error we have induced in the nutrient control
     system.

At best. The problem is that with protracted error, we can expect
reorganization to have an effect, which means that the system we're
trying to measure is changing while we're measuring it.

     Furthermore, there is the problem of competing control systems.
     Given a certain level of error in the nutrient control system and
     in, say, a system controlling whatever perception is involved in
     exercising, and given that the rat can't both lever-press and run
     in a running wheel at the same time, the availability of the
     running wheel means that part of the time that might be spent
     obtaining food pellets will now be spent reducing the error in the
     latter system instead.

Once we have a working model of these control systems under simple
conditions (running and eating separated so they don't compete), we
should be able to predict what happens when there is conflict. The
natural solution to conflict, as you have mentioned, is to control a
sequence: alternate running and eating. Whatever is controlled by each
must change on a long time-scale, so the alternation would give almost
the same effect as simultaneous control.

     Questions: (1) How will the rat "allocate" its time? (2) Does the
     answer to this question fall naturally from a PCT analysis of the
     situation? (3) If so, how?

The "choice" experiments have to be looked at very carefully. One thing
we can try in simulation is to check out the "matching law" using random
switching of bar-pressing. That's really the baseline, and I don't
recall its being investigated. If you have two ratio schedules, you will
always get exactly b1/r1 = b2/r2 even if we arrange for a simulated
organism to switch randomly from one key to the other.

When other patterns appear, they have to be compared with the pattern
when there is random switching. Only the difference will tell you
anything about switching strategies.

I think that "allocation of time" is not an actual control process. It's
a side-effect of the actual processes. If you have an animal casting
about at random looking for a food source, the "selection" effect of
reinforcement (i.e., reorganization) will gradually increase the dwell-
time on behaviors that produce the most food, leading to a bias in favor
of those activities that produce the most food. It would take a more
advanced cognitive system to deliberately allocate time on the basis of
a sampling of experience with consequences of different behaviors. At
least that's the approach I would take to this problem when working with
rats or birds, rather than trying to design a system that controls the
fraction of time spent on each of several tasks.

I do think that some simulation experiments should be done using random
switching. The results could prove illuminating.

···

---------------------------------------
I hope we're not done with the Staddon-Ettinger experiment. Our results
are well worth a paper and could have far-reaching effects. It might be
good to run the results past Staddon and Ettinger to see if they concur.
If they do, they might even want to sign on as co-authors.
-----------------------------------------------------------------------
Best,

Bill P.

[From Bruce Abbott (950803.2115 EST)]

Bill Powers (950803.1355 MDT) --

Before you invest too much work in the collection-time analysis, there
are still some considerations that need to be looked at. First is the
fact that the straight-line fit you're doing has most of the points
bunched up near the origin, so small variations actually represent
larger proportional departures from the straight line fit.

I've been concerned about that, too--the strong influence of the high-ratio
values on the position of the best-fitting line and the bunch-up of the
values at low ratios. However, in my last couple of posts, the data I've
presented have been based on the "response function" (Motheral-type graph)
rather than my replot. I've either used an eyeball fit to the right limb of
the curve or the straight lines provided in the figure. Presented in this
form, the different ratio values tend to be fairly evenly spaced and so
avoid the bunch-up problem.

By projecting these fitted lines to the two axes, I get the food rate
(reciprocal of the collection time, x-axis) and response rate excluding
collection time (reciprocal of time per response). Except for differences
which may occur because of the way the line is fit, the reciprocals of these
two values, converted to seconds, are the same values derived from plotting
inter-food interval as a function of ratio value.

And I think the fit you need to compute is the
_proportional_ error, because the actual variable we're interested in is
the reciprocal of the one you're plotting: pressing rate, not time per
press.

Fitting a straight line to the plot of food rate versus response rate should
avoid this problem, yes? One problem to be aware of is that the precision
of the points is often not as good as one would like for theory testing. My
original derivation of the straight-line relationship between ratio value
and inter-food interval was based on averaging the data from each animal
across four replications of the same set of conditions. When we replicate
these studies I plan to collect data over a larger number of sessions so as
to improve the precision of our estimates.

Second, you are assuming a constant collection time at all ratios. Since
you have already found that collection time can vary a great deal under
different conditions of deprivation, the assumption of constancy is not
necessarily a good one. Only a _very_ small change in collection time (1
or 2 sec) can make a huge difference in the pressing rate that is
calculated at the lower ratios.

One or two seconds is a fairly large change if the base is 6 seconds. I
agree that the relationship may not always be a linear one. However, the
fact that a straight-line fit is such a good one is certainly consistent
with a constant collection time over ratio values. This does not mean that
collection time must be constant over other manipulations (e.g., deprivation
level); indeed, we see such changes where they make theoretical sense.

Third, in your latest analyses of Staddon's data, you show collection
times that vary from 5.6 sec to 14.4 sec for one animal. That is a very
striking range! Fourteen seconds is a very long time to collect the
food, and raises the question of how rewards were given. In other
experiments, a three-second access to food is talked about. Did Staddon
allow a far longer time? And if so, did he allow a _different amount of
time_ at different ratios or under different conditions? It would seem
very strange that (in your data of 950802.1130) an animal that took 17.2
seconds to collect the reward when at 95% of free body weight would
return from the collection dish after only 8.8 seconds when it was at
80% free weight. Did it leave the dish while there was food still
accessible?

In the cyclic-ratio studies the food (45 mg pellet) was delivered into a
food cup, where it remained until recovered by the animal. At higher levels
of deprivation or lower values of food palatability, the rate of approach to
the food cup and rate of consumption of the pellet are probably reduced, and
the animal probably becomes less interested in returning to the lever
immediately after the food has been consumed. Instead, it may groom or
investigate the chamber. The time taken by the rat appears to be rather
constant across ratios when everything else but the ratio is held constant,
but it will clearly vary as factors such as the error-level in the nutrient
control system vary.

Whether the rat rapidly collects the food and quickly returns to the lever
or takes its sweet time may depend on something like the relative values of
alternatives. At low deprivation, both the error and gain of the nutrient
control system may be small and so produce low reference levels for rate of
food intake, allowing other control systems to compete more successfully for
"attention." I view the situation as comparable to one in which you have a
choice of fixing dinner or watching TV. If you're starving you'll probably
fix dinner, but if not and the show is good, you may stay glued to the set
and ignore your mild state of hunger. In other words, you will be willing
to tolerate a certain error in one system if that allows you the time to
control another variable and bring it to a preferred level.

Evidence for this is found in Figure 7.17 in Staddon (1983), where two
curves are presented showing the usual Motheral-type curve at 85% ad libitum
weight where a running wheel either is or is not present in the chamber
during lever-pressing for dippers of liquid diet. The two functions appear
almost identical to the two curves shown for 95% and 80% deprivation,
respectively. Extrapolating Staddon's fitted lines to the x- and y-axis, I
get the following rates (per session):

              Dippers   Lever-Presses
   No Wheel    381.2        9918.4
   Wheel       285.2        3777.1

Converting to seconds per activity, I get:

              Coll. Time   Sec/Press
   No Wheel     9.4 s        0.36 s
   Wheel       12.6 s        0.95 s
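
The conversion above implies a session length of 3600 s (1 hr); that
assumption, and the arithmetic, can be checked with a few lines of Python:

```python
SESSION_S = 3600.0   # assumed 1-hr session; this reproduces the times above

def per_activity_seconds(dippers, presses, session=SESSION_S):
    """Convert per-session counts to collection time (s per dipper)
    and time per lever press (s per press)."""
    return session / dippers, session / presses

print([round(v, 2) for v in per_activity_seconds(381.2, 9918.4)])  # no wheel
print([round(v, 2) for v in per_activity_seconds(285.2, 3777.1)])  # wheel
```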

Adding the wheel thus increased collection time by a factor of 1.34 (a 34%
increase) and time per lever press by a factor of 2.64 (a 164% increase) --
that is, it suppressed lever pressing about twice as much as it suppressed
pellet collection. Most likely the
rats were collecting the food pellet fairly quickly once it became available
but then were likely to take a spin in the wheel on some occasions rather
than returning immediately to the lever. The larger the ratio requirement
(the more work and time required to get another food pellet), the longer it
tended to be before the rats would return to the lever to complete another
ratio. This portion of the time following reinforcement which varied with
response requirement would show up in the average time between responses and
not in the estimated collection time.

What I'm getting at is that my picture of the rats pressing as fast as
they can except for the time needed to collect the food may be quite
wrong.

I've been trying to dispel that notion for some time now, as I'm sure
you're aware.

Of course to collection time we must also add any time that is spent not
collecting the reward but wandering around looking for food.

Or grooming, or whatever else the rat finds more interesting at the moment.

This time
is likely to increase at lower ratios, giving us a _decrease_ in reward
rate with increasing ratio. But Staddon's experiments don't seem to
include those conditions.

Yes, the detailed pattern within a reinforcement "cycle" probably differs
from the picture painted by the estimated averages for press-rate and
collection time.

I think the conclusion is inevitable. We have to do these experiments
with your rats. There are just too many unknowns behind the published
data. And I hope there will be visual observations of the rats at least
at intervals, so we can answer some of the questions that are left
dangling.

I agree, and come to think of it, we got the same advice about repeating
these studies from John Staddon. As for visual observations, I'm looking
into borrowing a video camera and VCR so we can get a complete visual record
for subsequent analysis. That way if we decide later that we should have
quantified some aspect of the rat's performance we can probably recover it
from the tapes.

If you want the real significance of our computed collection times and
response rates, consider the following data (Teitelbaum, 1956) from
_non_deprived rats earning their entire daily food intake in 12-hr sessions
(no make-up between sessions), by pressing a lever on various fixed-ratio
schedules. The pellets (90 mg) were double the size of those used in
Ettinger & Staddon's cyclic-ratio studies (45 mg).

   Ratio   Pellets/sess   sec/pellet
       1       132.1          327
       4       118.4          365
      16        88.4          489
      64        58.9          733
     256        16.3         2650

Linear regression of sec/pellet vs. ratio gives the following values:

    collection time: 293.4 s
   seconds/response: 9.1 s
               r-sq: 0.993

Regressing response rate on pellet rate (equivalent to fitting a line to the
Motheral-type curve) gives these results:

        y-intercept: 5188 presses/session
        x-intercept: 134.4 pellets/session
               r-sq: 0.926

Converting these to times gives:

    collection time: 321.4 s
   seconds/response: 8.3 s

As you can see, the line fitted this way yields slightly different numbers.
The results are not as strongly affected by the value at the highest ratio.
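
Both fits can be reproduced from the tabled values with a few lines of
Python; the small differences from the figures quoted above reflect rounding
in the tabled data:

```python
def linreg(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s = sxy / sxx
    return s, my - s * mx

ratios  = [1, 4, 16, 64, 256]
pellets = [132.1, 118.4, 88.4, 58.9, 16.3]   # pellets/session
sec_per = [327, 365, 489, 733, 2650]         # sec/pellet

# Fit 1: sec/pellet vs ratio
s1, c1 = linreg(ratios, sec_per)
print(round(s1, 1), round(c1, 1))            # ~ 9.1 s/response, ~ 293.4 s

# Fit 2: responses/session vs pellets/session (Motheral-type line)
responses = [m * p for m, p in zip(ratios, pellets)]
s2, b2 = linreg(pellets, responses)
y_int, x_int = b2, -b2 / s2
print(round(y_int), round(x_int, 1))         # ~ 5188 presses, ~ 134.4 pellets

# Converting the intercepts to times over the 12-hr (43200 s) session:
print(round(43200 / x_int, 1), round(43200 / y_int, 1))   # ~ 321 s, ~ 8.3 s
```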

Certainly the rats did not require around 5 minutes to collect a pellet!
Nor did they stand at the lever and emit a press once per 9.1 (or 8.3)
seconds. Because all food was earned within a long session, the more likely
scenario is that the rats ate their earnings in several meals, with lots of
time between meals during which neither lever-pressing nor food-collection
was going on. That time ended up in the averages, however. Still, the same
linear relationship appears in these data that we find in the cyclic-ratio data.

Regards,

Bruce

[From Bill Powers (950803.1355 MDT)]

Bruce Abbott (950802.2120 EST) --

Before you invest too much work in the collection-time analysis, there
are still some considerations that need to be looked at. First is the
fact that the straight-line fit you're doing has most of the points
bunched up near the origin, so small variations actually represent
larger proportional departures from the straight line fit. Even in your
original data there were clear indications of a droop of pressing rate
near m = 2 and 4. So the "constant" rate of pressing is only an
approximation. This has to be checked for each new data set. If you only
compute the best straight-line fits without checking how good the fit
is, you could get fooled. And I think the fit you need to compute is the
_proportional_ error, because the actual variable we're interested in is
the reciprocal of the one you're plotting: pressing rate, not time per
press.

Second, you are assuming a constant collection time at all ratios. Since
you have already found that collection time can vary a great deal under
different conditions of deprivation, the assumption of constancy is not
necessarily a good one. Only a _very_ small change in collection time (1
or 2 sec) can make a huge difference in the pressing rate that is
calculated at the lower ratios.

Third, in your latest analyses of Staddon's data, you show collection
times that vary from 5.6 sec to 14.4 sec for one animal. That is a very
striking range! Fourteen seconds is a very long time to collect the
food, and raises the question of how rewards were given. In other
experiments, a three-second access to food is talked about. Did Staddon
allow a far longer time? And if so, did he allow a _different amount of
time_ at different ratios or under different conditions? It would seem
very strange that (in your data of 950802.1130) an animal that took 17.2
seconds to collect the reward when at 95% of free body weight would
return from the collection dish after only 8.8 seconds when it was at
80% free weight. Did it leave the dish while there was food still
accessible?

You reported that Staddon said they did not watch the rats. So I suppose
it will be useless to ask him why such long collection times were seen,
and what was different between the short and long collection times.

What I'm getting at is that my picture of the rats pressing as fast as
they can except for the time needed to collect the food may be quite
wrong. If, for example, the conditions are such as to produce
"scalloping" in the response traces, then at least part of the time we
are treating as collection time could simply be a post-reinforcement
pause that would take place even if actual collection time were zero.
In OPCOND5, the typical perceptual signal can look like this:

   [Sketch: the perceptual signal plotted against time. Each reward
   drives the signal above the reference level; with no behavior it
   decays back down, and once it falls below the reference level the
   error drives behavior until the next reward, giving a repeating
   sawtooth about the reference level.]

After the reward occurs the perceptual signal goes above the reference
level and starts to decay. There is no behavior during this period.
Since the reward is not actually received completely until the end of
the collection time, the apparent collection time will have at least
part of the post-reward overshoot time added to it.

As the ratio increases, one effect on the perceptual signal above is to
move the sawtooth curve lower relative to the zero line. This produces
an increase in the total time spent pressing per cycle (as well as an
increase in the rate) and a decrease in the length of the post-reward
overshoot time. Even if the actual collection time were constant, the
apparent collection time would shorten.

If conditions are such that the post-reward overshoot occupies most of
the cycle, increasing the ratio would cause a rapid increase in time
spent behaving and a rapid shortening of the collection time.
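This effect can be sketched with a toy first-order model. All parameter values below are invented for illustration, not the OPCOND5 constants: the perceptual signal decays exponentially, pressing occurs at a fixed rate whenever the signal is below the reference level, and each reward adds a fixed increment to the signal.

```python
import math

def apparent_collection_time(m, ref=1.0, reward=0.5, tau=60.0, press_rate=4.0):
    """Post-reward pause (sec) at ratio m in a toy first-order model.

    While the m presses are emitted at press_rate (presses/sec), the
    perceptual signal decays from the reference level ref with time
    constant tau (sec); the reward then adds a fixed increment, and the
    "apparent collection time" is the decay back down to ref.  All
    parameter values here are assumptions for illustration only.
    """
    t_press = m / press_rate                 # time spent working off the ratio
    p_low = ref * math.exp(-t_press / tau)   # signal level when the reward arrives
    p_peak = p_low + reward                  # post-reward overshoot peak
    return tau * math.log(p_peak / ref)      # decay time from peak back to ref

for m in (2, 4, 8, 16, 32, 64):
    print(m, round(apparent_collection_time(m), 1))
# The pause shortens steadily as the ratio grows, even though the real
# collection time in this model is exactly zero.
```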

Of course to collection time we must also add any time that is spent not
collecting the reward but wandering around looking for food. This time
is likely to increase at lower ratios, giving us a _decrease_ in reward
rate with increasing ratio. But Staddon's experiments don't seem to
include those conditions.

I think the conclusion is inevitable. We have to do these experiments
with your rats. There are just too many unknowns behind the published
data. And I hope there will be visual observations of the rats at least
at intervals, so we can answer some of the questions that are left
dangling.
-----------------------------------------------------------------------
Best,

Bill P.

[From Bill Powers (950808.1145 MDT)]

Bruce Abbott (950807.1050 EST) --

Looks as though my reading skills are deteriorating. I said 45 sec when
I should have said 45 min. However, I do still have a leg or two to
stand on.

RE: dynamic effects.

It makes no difference how many times the cycles were repeated if each
ratio within each cycle lasted only a small fraction of the system's
time constant for changing from one pressing rate to another. Suppose
the system's time constant for changing the rate of pressing is two days
(one time constant is the time required to reach 1-1/e or 63% of the
final value, for those who don't know). In 10 days (Collier et al.), the
pressing rate would be at 0.993 of the final value. If you now start
changing the ratio every 45/12 = 3.75 minutes, you will see a pressing rate
that changes only about 0.0013 of the way to asymptote by the time the ratio
is completed. This would create the appearance of a constant pressing
rate appropriate to the 24-day average ratio. And that is what is seen:
a constant pressing rate (see below).
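To make the arithmetic explicit (the two-day time constant is, as above, a supposition, not a measurement):

```python
import math

def fraction_to_asymptote(t, tau):
    # Fraction of the way to the final value after time t, for a
    # first-order system with time constant tau (same units as t).
    return 1.0 - math.exp(-t / tau)

tau = 2 * 24 * 60.0   # supposed time constant: 2 days, in minutes

print(fraction_to_asymptote(tau, tau))           # one time constant: ~0.632
print(fraction_to_asymptote(10 * 24 * 60, tau))  # 10 days: ~0.993
print(fraction_to_asymptote(45 / 12, tau))       # one 3.75-min ratio: ~0.0013
```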

Note that E&S don't seem to use "running rate" the same way that you
have defined it. When I looked at Fig.4 I did misread it, but that's
cancelled because it doesn't actually show the running rate. The authors
point this out themselves on p. 642:

     Figure 4 shows that running response rates (i.e., response rate
     after the first post-food response of each ratio) also tracked
     successive ratio values throughout the session. [This is what
     misled me, but note the following] The lower running rates at
     larger ratios are a result of periodic interruptions in A STEADY
     RATE OF RESPONDING [emphasis added], not of continuous variation in
     the steady rate.

Fig. 5 is cited to illustrate the _constant_ interresponse times. Fig. 4
probably shows obtained reinforcement rate times ratio. Unfortunately
the interresponse times are not given as a function of time during a
ratio, so we can't see any dynamic effects. Dynamic effects, however, may
have been too small to see over series of up-down ratio cycles where
each cycle lasts at most 45 minutes and any uptrend in pressing rate is
immediately followed by an equal down-trend for an equal time.

     Again, the rats were exposed to the cyclic-ratio schedule for 24
     days. The single session's performances were representative of a
     typical run; apparently there wasn't much changing going on across
     sessions.

And again, this tells us nothing about what the pressing rate would have
done with long series of identical ratios.

     Sequence of ratios was manipulated only in Experiment 1, and only
     in the sense of comparing the cyclic-ratio and single-ratio
     procedure results between groups. I don't know to what you are
     referring when you claim that "number of repetitions within the
     sequence" was manipulated: rats were always exposed to two complete
     repetitions of the cycle. [etc.]

Some of the variables I mentioned weren't manipulated, but they should
have been. Changing the sex of the animals might have made a great
difference in the forms of the curves if in fact the situation was set
up to make it impossible for large animals to achieve control, but
possible for smaller animals. Number of repetitions within the sequence
was manipulated between the B and C animals: for the C animals, the
sequence of ratios was 2, 4, 8, 16..., while for the B animals it was
2,2,2,2,4,4,4,4,8,8,8,8 ... etc.: "each schedule value was presented for
four consecutive sessions" (bottom p.640). Note that in the right panel
of Fig. 1, the slopes for all animals are close to vertical (in
individual pairs of points, they even lean to the right), and even in
the left panel, three of the rats show an increase in slope on the
right side (no fitting of straight lines to the whole curves here!).

The sequence of ratios should have been varied in the sense of
randomizing it.

     In separate experiments, deprivation level, taste of food
     (quinine), and drug level (amphetamine) were manipulated; I don't
     see anything "hodge-podge" about the way in which these systematic
     manipulations were carried out.

What was hodge-podge was throwing together a set of very different
experiments without carrying any of them to completion. The first
experiment left many questions unanswered -- for example, the
significance of the fact that pressing rate did not seem to vary with
ratio. That is a direct contradiction of the control-system equation
that was introduced at the start -- unless the behavior rate x in that
equation is interpreted to be the _mean_ behavior rate, or ratio times
rate of reinforcement. But that would put the same variable on both
sides of the equation, and solving it would produce a very different
relationship, showing the reinforcement rate to depend only on system
constants and not on mean behavior rate.

In fact, there was no regulatory behavior, but the authors proceeded on
the assumption that their results did show regulation. They even looked
for the effects of drugs on the gain and reference level of the control
system -- even after they had shown that there was no control occurring.

The only functional dependence on ratio found was that the pre-ratio
pause increased with ratio. Since this pause varied from 10 to 30
seconds, it was clearly reflecting other activities of the rats, the
other activities taking more time at the higher ratios (at least for 2
rats). And I can't see why the authors offer the unlikely proposition
that the increase in other activities anticipated the ratio, when a much
more plausible hypothesis would be that it anticipated the reinforcement
rate. Are animals more capable of sensing the ratio than the
reinforcement rate? I am also unconvinced by the very "anticipation"
hypothesis. If that's true it should be true of all the rats, and it
isn't.

I am also full of questions about how that relationship between pre-
ratio pause and ratio was plotted. As I understand it, the ratio went
2,4,8,16,32,64,32,16,8,4,2,4,8,... . If that's so, where do the first
and last points come from, and how come there are two points at the
highest and lowest ratios when those ratios occur only once per cycle?
If you just go through the points sequentially, you find that the second
top point on the descending arm should go with the ratio of 32, not with
64, so the figure should have a loop in it. I think this figure is
nonsense -- either that, or the so-called cyclic ratio has two
duplications of ratios in it.

     As for why pauses were plotted against the upcoming ratio rather
     than the previous one, I tried replotting the data as a function of
     previous ratio, and they were not nearly as systematic. When you
     do it their way you get functions for ascending and descending
     ratios that show only a small hysteresis and essentially replicate
     each other, especially for C1.

How did you replot the data? I'm curious about how you handled the
duplication of points for ratios of 2 and 64.


-------------------------------
     The data are presented for the same four individual rats, and each
     separate manipulation included a replication of the baseline
     cyclic-ratio conditions for direct comparison within that
     experiment.

The data for individual rats were shown; I stand corrected. And in fact,
now that I go back over the data, I can't see where the results for the
B group were used. It looks as though all conclusions were based on the
C group, with no conclusions being drawn from the difference found
between single and multiple presentations of the same ratio during a
cycle, either in terms of the direct behavioral differences or the drug
effects.

     If response rates were being used by the animals to control
     reinforcement rates, at higher ratios, in which responding at the
     same rate as in lower ratios would produce lower reinforcement
     rates, the animals should have _increased_ their response rates to
     compensate. The observed change is exactly opposite.

No, that was a mistake by the authors. The actual response rates did not
vary with ratio.

     My explanation for this pattern was given in previous posts (twice,
     I believe): the rats are responding at about 0.15-0.20
     seconds/response while working off the ratio (peak IRT value),
     _regardless of the ratio_. Larger ratios, however, are associated
     with increased pausing, both prior to the beginning of the ratio run
     (thus contributing to the length of the pre-ratio pause) and after
     (thus diluting the running response rate).

How do you split the total pause into a pre-ratio and a post-ratio part?
There is only one pause between ratios. You would need the actual record
of individual presses and food consumption times to make the distinction
you're proposing.

I agree, however, with your explanation. Lever-pressing behavior is not
a function of ratio in this experiment. Pause time evidently is,
although I would prefer to suppose it is a function of the rate of
reinforcement received per unit time, not of the ratio; knowing the
ratio depends on dividing the behavior rate by the reinforcement rate.
-----------------------------------------------
More later. I'm tired, and my keyboard now requires the space bar to be
struck exactlyinthecenter or that happens. Supposedly getting a new one
tomorrow.
-----------------------------------------------------------------------
Best,

Bill P.

[From Bill Powers (950809.1240 MDT)]

Bruce Abbott (950809.0950 EST) --

     ... there is both internal and external evidence that rats can
     rapidly adapt to changes in schedule parameters (as indicated by
     altered response rates) if given (a) some way to tell when the
     change has occurred, and to what, and (b) plenty of practice with
     the changing schedules.

In the present experiment, however, both you and the authors showed that
there is no change in actual response rates as the schedule parameters
are changed. So if any changes in response rates actually occurred, they
must have been very slow, too slow to detect under these conditions.
What did change was the _apparent behavior rate_ which is an artifact,
and would be expected to change immediately since it depends only on the
schedule and the reinforcement rate at constant pressing rate.

     The internal evidence is the consistent, rapid change in pause
     length and the emergence of pausing within the ratio run at the
     highest ratio values.

Pausing is a different sort of behavior. It involves unknown, unnamed
other activities that were not observed or reported. I see no way of
studying this phenomenon; all we can note is that one variable seems to
covary with another, and put that aside for later study. Right now I
thought we were trying to develop a control-system analysis.

     The external evidence comes from performance on multiple and chain
     schedules of reinforcement. In a multiple schedule, different
     schedules of reinforcement ("components") are identified by
     different discriminative stimuli (e.g., for pigeons, green keylight
     during FR 10 and red keylight during FR 30). After sufficient
     practice, pigeons alter their response rates immediately with the
     change in component.

This would suggest that in these experiments, pecking rate _does_ change
with the schedule, so there is the possibility of control. Have you
examined the data using your method of analysis to see if the pecking
rate actually did change, and that this was not just another case of an
illusory change in behavior rate? Perhaps these are the studies we
should be looking at.

Note that E&S don't seem to use "running rate" the same way that you
have defined it.

     I'm not sure what you mean here. Running rate is the rate of
     responding computed from the first response following food (which
     ends the post-reinforcement pause) to the delivery of the food. At
     lower ratio values the animals have a strong tendency to complete
     the ratio, once they have begun the run, rather than taking breaks
     within the run.

But Staddon & Ettinger couldn't have been talking about breaks within
the run. They said flatly that the pressing rate was constant, not
dependent on ratio. The IRT data confirm this. This can only mean that
they did NOT plot the running rate; they plotted b = ratio*reinforcement
rate, which would vary with the ratio in the manner shown.
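The artifact is easy to exhibit numerically. With a constant true pressing rate and a fixed pause per reinforcement (both parameter values below are made up for illustration), the mean rate still varies strongly with the ratio:

```python
def mean_rate(m, press_rate=240.0, pause_s=5.5):
    """Mean response rate (presses/min) at ratio m, given a constant true
    pressing rate (presses/min) and a fixed pause (sec) per reward.
    Both parameter values are illustrative assumptions."""
    cycle_min = m / press_rate + pause_s / 60.0   # time per reinforcement cycle
    return m / cycle_min

for m in (2, 4, 8, 16, 32, 64):
    print(m, round(mean_rate(m)))
# The mean rate climbs steeply with the ratio although the true pressing
# rate never changes -- an apparent "behavioral effect" that is pure
# arithmetic.
```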

And again, this tells us nothing about what the pressing rate would
have done with long series of identical ratios.

     The between-groups comparison was intended to answer that question.
     Ettinger and Staddon noted that the slopes of the functions for the
     C and B rats were in the same range and thereby concluded that the
     two procedures gave essentially the same results.

I really hate this "essentially the same" crap. When I look at the two
panels of Fig. 1, I see big and striking differences. What I see tells
me that the pressing rate would be FAR from constant for the B animals.
The B animals might even have been controlling reinforcement rate, while
the C animals clearly were not. But all the data are about the C animals
which were not controlling. If the B animals really are controlling and
the C animals are not, then these results could tell us what kept the C
animals from controlling, so we could avoid making the same mistake.

You know, you EAB types really go pretty easy on each other. These
articles are full of vague and sloppy statements as well as assumptions
with no supporting evidence, apparently springing from thin air. That
sort of stuff would never get past referees and editors in the physical
sciences.

     Randomizing the sequence could be done, but this would require
     providing a different discriminative stimulus to associate with
     each ratio value; otherwise your FR schedule becomes VR.

Good point. We're still faced with the problem that there was zero main
effect in this experiment, rendering all these details moot.

     Well, we were fooled, too. I'm going to hate having to disabuse
     them of this notion . . .

I was fooled worse than you; I actually constructed a working model of
nothing.

     The ratio went 2, 4, 8, 16, 32, 64, 64, 32, 16, 8, 4, 2, 2, 4, 8, .
     . .

So it was cyclic with two hitches in it. OK.


-------------------------------------
[How to split total pause up]:

     I'm basing it on the authors' statement, supported by their IRT
     distributions, that the apparent running rate changes are due to
     pausing that emerged during the ratio runs at the highest ratios.

But that isn't how I read the authors. I think they just didn't realize
that the pressing rate and the observed behavior rate were different
things, with the pressing rate being constant no matter what the ratio.
I think they thought they were seeing a behavioral effect of the ratio
changes. The effect on pauses, it seems to me, was simply an attempt to
salvage something, however uninterpretable, from what was basically a
failed experiment.

     My linear analysis can only partition the time between pellets into
     a component that is fixed regardless of ratio and a component that
     varies with the ratio. We know that the PRP varies with the ratio,
     as do the _average_ running rates (which include pauses at the
     highest ratios).

I must have been missing something here. In the equations I've been
using, there is only one pause called c, the "collection time." Your
calculation of the intercept assumed that there is just one fixed value
of c. If you start postulating a variable delay, then you have opened
the door to also postulating a variable pressing rate as an alternative
assumption. But as you pointed out yourself, there is only one slope and
intercept that fit the plot of time per reinforcement versus ratio, and
that fit shows a constant rate of pressing and a single intercept, the
post-reinforcement (or at least total) pause.

     Nice to hear. This analysis has opened up a whole series of
     interesting questions for us to address empirically. Meanwhile, we
     do have a nice set of data from Collier et al. to analyze. It will
     be interesting to compare the latter to the Ettinger-Staddon data
     to see whether the results support the same analysis or follow a
     different pattern.

I'm still frustrated by the lack of access to the detailed data, which
would answer many of our questions and make most conjectures
unnecessary. Could there be a chance of getting the Collier et al. raw
data? I'd even be willing to throw in some bucks to defray costs of
putting the data onto disk, if it didn't come to more than a hundred or
two. We're constantly coming up against this barrier, where the
information we want is probably there somewhere, but went unreported or
unnoticed in preparing the published paper. I know that raw data are
sometimes jealously guarded (for reasons at which I can only guess), but
it's worth a try, anyway. Got any clout with Collier?
-----------------------------------------------------------------------
Best,

Bill P.

[From Bill Powers (950810.0700 MDT)]

Bruce Abbott (950807.1050 EST) --

From the E&S data, Fig. 3, we discover that the mean pauses between
ratios vary from a minimum of about 3 seconds to a maximum of about 30
seconds, with a trend toward the higher limit at the higher ratios for
two rats.

Consider rat C1, where Fig. 3 shows the pause at FR-64 as 30 sec. When
we look at Fig. 1 for rat C1, we see that the mean response rate at FR-
64 is about 82 per minute, so completing the ratio requires 0.78
minutes. Subtracting the 30-second (0.5 min) pause, we find that the
time left for pressing is 0.28 minutes, giving a pressing rate of
64/0.28 = 228 per minute. This is the running rate at FR-64.

Looking at Fig. 4, we find that the minima for this rat, which
supposedly correspond to FR-64, are at something like 100 or 125
responses per minute, while the maxima would average between 250 and
possibly 350. This clearly does not agree with the running rates
calculated from Fig. 1 and Fig. 3. One could only guess that the cycles
actually began with FR-64 or that the data were misplotted.

At FR-2, we find from Fig. 3 that rat C1 pauses about 10 seconds between
ratios (0.167 min). At that ratio, from Fig. 1, we find that the mean
response rate is about 18 per minute. Completing the ratio of 2 requires
2/18 minute or 0.111 minute. Subtracting the pause time of 0.167 minute,
we get a time for responding of -0.056 min. To get this number as high
as zero, we would have to assume that the actual behavior rate was as
low as 12 per minute instead of 18, or that the actual pause time was as
small as 6.7 seconds instead of 10. And that would leave zero time for
responding. These are enormous discrepancies for results supposedly
obtained from the same data set for the same rat.
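The checks above are mechanical, so here is a sketch of the arithmetic (my own code, using the values read off E&S Figs. 1 and 3 as quoted in the text):

```python
def press_time_left(ratio, mean_rate_per_min, pause_min):
    """Minutes per cycle left for actual pressing: the total cycle time
    implied by the mean response rate, minus the inter-ratio pause."""
    return ratio / mean_rate_per_min - pause_min

# Rat C1 at FR-64: ~82 presses/min mean rate (Fig. 1), 0.5-min pause (Fig. 3)
t64 = press_time_left(64, 82.0, 0.5)
print(round(t64, 2), round(64 / t64))   # ~0.28 min pressing, ~228/min running rate

# Rat C1 at FR-2: ~18 presses/min mean rate, 10-s (0.167-min) pause
t2 = press_time_left(2, 18.0, 0.167)
print(round(t2, 3))   # -0.056: the pause alone exceeds the whole cycle time
```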

I hope I have not made any errors here, because the conclusions I am
drawing are pretty severe. In fact, the data for Experiment One as
presented look completely inconsistent with each other, implying either
extremely large uncertainty in the measurements or basic mistakes in
reducing or plotting the data. All the data used here are said to be
from a "single session" on the cyclic-ratio schedule, meaning 6
repetitions of the cycle. I presume they are all from the _same_ single
session, although that is not mentioned in the text (just another little
example of the sloppiness).

None of these results is anywhere near consistent with your finding that
the inter-ratio pause averages 5.5 seconds and that the total time
available for pressing is a linear function of the ratio with time per
press being a constant for each rat.

Bruce, this is what Richard Feynman once called "Cargo-Cult science."
The authors have gone through the motions of a data analysis, but have
done none of the things that a real data analysis demands, such as
cross-checking the internal consistency of the results. They have
described none of their methods of calculating results so a reader can
check them out both for logic and for accuracy. They generalize from
subjective appraisals of the curves and fail to mention any alternative
interpretations where more than one is clearly possible. They present
conclusions without any explanation of how they reached them ("The lower
running rates at larger ratios are a result of periodic
interruptions..."). They give no indication of the accuracy or
reproducibility of the data. Their text contradicts itself and the
figures ("The constancy of the running rate across ratios is
demonstrated by the interresponse times in Figure 5" vs "Figure 4 shows
that running response rates ... also tracked successive ratio values
..." -- and both statements are in the same paragraph).

God only knows what we would find if we started with the raw data.


--------------------------------
Let's just do our own experiments. Pretty please? We need to run some
rats, find out if they're controlling, and if they aren't why not, so we
can either adjust the conditions to permit control or conclude that rats
can't control by means of bar-pressing.
-----------------------------------------------------------------------
Best,

Bill P.

[From Bruce Abbott (950810.1805 EST)]

Bill Powers (950810.0700 MDT) --

     Looking at Fig. 4, we find that the minima for this rat, which
     supposedly correspond to FR-64, are at something like 100 or 125
     responses per minute, while the maxima would average between 250 and
     possibly 350. This clearly does not agree with the running rates
     calculated from Fig. 1 and Fig. 3. One could only guess that the
     cycles actually began with FR-64 or that the data were misplotted.

Bill, this is nothing new; I already pointed this discrepancy out both to
you and to John Staddon, who agreed that the figures do not add up. I've
asked Chip Ettinger to look into the matter, but he is currently ill and
unlikely to respond soon. I did compare the data in Figures 2 and 3 (from
the same session) for rat C1 and found that the average delays from Figure 2
do agree well with those given in Figure 3 (allowing for inaccuracies in
measuring off Figure 2, where it is hard to tell exactly where to measure a
given ratio). I also measured the "running rates" (if that's what they actually
are) from Figure 4, converted them to sec/response, then multiplied to get
the time to complete the ratio, once the first response had occurred. The
total time to complete the session, if only "running rates" were considered,
was 35.41 minutes. From Figure 1's pellets/sec, C1 completed the session in
22.75 minutes, which of course includes the ratio runs AND the
pre-reinforcement pauses. The total amount of time spent pausing (from
Figure 2) is about 19.44 minutes, for a total (ratio run + pause time) of
54.85 minutes, which is about double the time actually required to complete
the session, according to Figure 1, and I might add, greater than the 45 min
maximum limit to the session length. Other than that, Ettinger & Staddon's
figures look great. [Stated in the same sense as "Other than that, Mrs.
Lincoln, how did you like the play?"]

Perhaps Chip Ettinger will be able to shed some light on all this.

     Let's just do our own experiments. Pretty please? We need to run some
     rats, find out if they're controlling, and if they aren't why not, so we
     can either adjust the conditions to permit control or conclude that rats
     can't control by means of bar-pressing.

I'm itching to do so, but I have too many other things to get out of the way
first. I'm going to try to get things lined up so that we can start up
right after the physical plant people finish re-doing the air-handling
system in the animal care building, which is scheduled to be done early this
coming semester. Meanwhile, I think the Collier et al. data may provide
enough information to at least do a bit of modeling. I also have some
additional findings to discuss which may bear on the problem of the rat's
apparent failure to control food rate via lever-pressing on ratio schedules.
I'll bring that in after we've had a chance to see what may be going on in
the Collier study.

Regards,

Bruce