PCT and EAB experiment

[From Bill Powers (951208.1730 MST)]

RE: comparison of reinforcement theory and EAB.

Gentlemen, it seems to me that we're still complicating matters
unnecessarily (Rick's model, too). What is the VERY SIMPLEST example of
behavior under reinforcement? I suggest it is simply an organism
pressing a key at an asymptotic rate while getting reinforcers at a
corresponding rate. This bypasses the acquisition phase, and quite
rightly because this phase is almost never studied in EAB, either. It's
too transient and variable, and it can't be repeated with the same
organism. What we need to look at is the model that explains steady-
state behavior, not the process by which the correct behavior is
stumbled across nor the final trimming that leads to the steady state.
In other words, we need to investigate the simple claim that
"reinforcement maintains behavior."

I don't care what the actual experiment is, but it should include a
reinforcer that is provided according to a schedule when a person
presses a key at some rate. After the final rates are observed, I
propose adding to the reinforcer a slowly time-varying amount that is
some fraction of the asymptotic rate, as a disturbance.

To make a control-theoretic prediction, we have to fit a model to the
data. We can make a _general_ prediction without needing the
disturbance; this prediction will be a model of the usual form, with one
parameter and a value of the reference level to be determined from the
data. We would claim that when the disturbance is applied, we can find
fixed values of the parameters that will generate predictions of actual
pressing rate over time for any waveform of the disturbance. We can't
predict the detailed behavior in advance, because we need a varying
disturbance to make a one-time evaluation of the parameter and the
reference signal. However, we can state in advance the computational
procedure to be used in evaluating the parameters, so we can in some
sense claim to be making a prediction. Of course with a new waveform of
disturbances, we could then do a true prediction.
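
As a rough illustration of the computational procedure described above,
here is a minimal sketch in Python. Everything specific in it is an
assumption made for the example, not something fixed by the proposal:
the integrating form of the model, the fixed-ratio schedule, the names
simulate, fit, RATIO and BASELINE, the numbers, and the "observed" record,
which is synthetic and generated from the model itself. It evaluates one
parameter k and a reference level by grid search against a record taken
with a slow disturbance, then reuses those fixed values on a new
waveform.

```python
import math

def simulate(k, ref, disturbance, ratio, b0, dt=1.0):
    """Assumed integrating control model (the text does not specify one).

    At each step the perceived reinforcement rate is the schedule payoff
    (pressing rate / ratio) plus the disturbance; the pressing rate then
    changes in proportion to the error between reference and perception.
    """
    b = b0
    trace = []
    for d in disturbance:
        perceived = b / ratio + d
        error = ref - perceived
        b = max(0.0, b + k * error * dt)  # a pressing rate cannot be negative
        trace.append(b)
    return trace

def rms_difference(xs, ys):
    """Root-mean-square difference between two equal-length records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def fit(observed, disturbance, k_grid, ref_grid, ratio, b0):
    """Grid search for the (k, ref) pair with the smallest RMS misfit."""
    best = None
    for k in k_grid:
        for ref in ref_grid:
            model = simulate(k, ref, disturbance, ratio, b0)
            err = rms_difference(model, observed)
            if best is None or err < best[0]:
                best = (err, k, ref)
    return best

# Hypothetical setup: FR-10 schedule, asymptotic rate of 30 presses/min,
# hence 3 reinforcers/min before the disturbance is switched on.
RATIO = 10.0
BASELINE = 30.0

# Slow sinusoidal disturbance, amplitude 30% of the asymptotic
# reinforcement rate.
slow_wave = [0.9 * math.sin(2 * math.pi * t / 200.0) for t in range(400)]

# Synthetic stand-in for the observed pressing record, generated from the
# model with k = 2.0 and ref = 3.0. Real data would replace this line.
observed = simulate(2.0, 3.0, slow_wave, RATIO, BASELINE)

k_grid = [0.5 * i for i in range(1, 11)]    # 0.5 .. 5.0
ref_grid = [0.5 * i for i in range(1, 13)]  # 0.5 .. 6.0
fit_err, fit_k, fit_ref = fit(observed, slow_wave, k_grid, ref_grid,
                              RATIO, BASELINE)
print("fitted k =", fit_k, " fitted reference =", fit_ref)

# The "true prediction": same fixed parameters, a new disturbance waveform.
new_wave = [0.6 * math.sin(2 * math.pi * t / 130.0)
            + 0.3 * math.sin(2 * math.pi * t / 57.0) for t in range(400)]
predicted = simulate(fit_k, fit_ref, new_wave, RATIO, BASELINE)
```

With real data the misfit would not go to zero, and the size of the
remaining RMS difference on the new waveform would be the measure of how
good the prediction is. A grid search is used only because it is easy to
state in advance; any other fitting procedure would serve as long as it
is fixed before the data are seen.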

How a reinforcement model could make predictions about this situation I
do not know. But the burden of providing that model is on the EAB side,
if we go the comparative route.


Bruce Abbott has raised the distinct possibility that fully-trained
animals do not actually vary their pressing rates; either they press at
a constant rate or they don't press at all. This is what he found when
taking the collection time into account, assuming a constant collection
time over a wide range of ratios. While there might be some variation in
pressing rate near zero error, the conditions used in standard
experiments may be making any region of continuous rate variation hard
to observe: with the standard reinforcer sizes used, that region could
be between FR-1 and FR-0. Since there's no way to set up a ratio of less
than 1, the critical region can't be plotted.

It would be relatively easy to set up an experiment for a human being in
which varying the rate is necessary to maintain a specific rate of
reinforcement or value of reinforcer. But it is not likely that any
animal experiments have been done under similar conditions, so
reinforcement theory would have to come up with a prediction from basic
principles (if this is possible).

The method is to convert the rate of pressing into a continuous variable
representing that rate, provide an indicator of the resulting average
value, and make the reinforcement contingent on the average value being
at some specific level. The position of a cursor, for example, could be
determined by the smoothed rate of pressing. Scoring rate would depend
on how close to a target the cursor is kept.
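
One way this could be set up is sketched below in Python. It is only an
assumption-laden illustration: the leaky-integrator smoothing, the time
constant tau, and the linear scoring rule are my choices for the example,
and the names smoothed_rate and score_increment are hypothetical. Each
press adds 1/tau to a running estimate that decays exponentially between
presses, so steady pressing at B presses per second gives an estimate
that hovers around B.

```python
import math

def smoothed_rate(press_times, tau, sample_times):
    """Leaky-integrator estimate of pressing rate at each sample time.

    press_times: times of key presses, in seconds.
    tau: smoothing time constant in seconds (an assumed value).
    sample_times: ascending times at which the cursor is redrawn.
    """
    presses = sorted(press_times)
    estimates = []
    rate = 0.0
    last_t = 0.0
    i = 0
    for t in sample_times:
        # Fold in every press that has occurred up to this sample time.
        while i < len(presses) and presses[i] <= t:
            rate *= math.exp(-(presses[i] - last_t) / tau)
            rate += 1.0 / tau
            last_t = presses[i]
            i += 1
        # Let the estimate decay from the last event to the sample time.
        rate *= math.exp(-(t - last_t) / tau)
        last_t = t
        estimates.append(rate)
    return estimates

def score_increment(cursor, target, tolerance):
    """Hypothetical scoring rule: full credit on target, falling
    linearly to zero at a distance of `tolerance` from it."""
    return max(0.0, 1.0 - abs(cursor - target) / tolerance)

# Example: two presses per second for one minute, cursor sampled once
# per second, target set at 2 presses per second.
presses = [0.5 * n for n in range(1, 121)]
samples = [float(s) for s in range(1, 61)]
cursor_positions = smoothed_rate(presses, tau=5.0, sample_times=samples)
total_score = sum(score_increment(c, target=2.0, tolerance=1.0)
                  for c in cursor_positions)
print("final cursor position:", round(cursor_positions[-1], 3))
print("total score:", round(total_score, 2))
```

A longer tau gives a steadier cursor but a slower response to changes in
pressing rate, so the choice of tau would itself affect how hard the task
is.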

I don't know if a rat could learn to do this.

Rick Marken sent me the Killeen paper. If this paper has anything to do
with PCT, or with science, somebody else is going to have to prove it. I
read as much of it as I could. As I suspected, the equation relating
behavior to reinforcement is the same equation that relates
reinforcement to behavior, backward. The algebraic forms are, in fact,
the first two terms in the series expansion of the equation for the
variable-interval schedule. There is no independent source for the
organism equation that I could find. All the permutations are just the
same equation for the apparatus, served up plain, with chocolate sauce,
tied with a ribbon, or viewed in a mirror.

The algebraic form for the variable-interval schedule that Killeen uses
shows reinforcement rate R as a function of behavior rate B and
"scheduled reinforcement rate" R'. It is

R = B(1 - exp(-R'/B)).

As B goes to zero, R goes to zero. But to see what's required to get R =
R', we have to use the series expansion of exp(-R'/B):

R = B[1 - (1 - R'/B + (R'/B)^2/2! - ...)]
  = R' - R'^2/(2B) + ...

To get R = R', B has to go to infinity. The "scheduled rate of
reinforcement" is the reinforcement rate you would see for an infinite
behavior rate. That seems to me to be a very strange definition of the
scheduled rate of reinforcement. In fact, it seems strange to me to
speak of a scheduled rate at all, since you can't predict what the
actual reinforcement rate will be; it's determined by the behavior rate.
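
The limiting behavior is easy to check numerically. The short Python
sketch below evaluates the schedule equation quoted above for increasing
behavior rates, next to the two-term series approximation
R' - R'^2/(2B). The value of R' and the function names are arbitrary
choices for the illustration.

```python
import math

def vi_reinforcement_rate(b, r_sched):
    """Schedule equation as quoted: R = B(1 - exp(-R'/B))."""
    if b <= 0:
        return 0.0  # no behavior, no reinforcement
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return b * -math.expm1(-r_sched / b)

def two_term_approx(b, r_sched):
    """First two terms of the series: R' - R'^2 / (2B)."""
    return r_sched - r_sched ** 2 / (2.0 * b)

R_SCHED = 2.0  # hypothetical "scheduled" rate, reinforcers per minute

for b in (1.0, 10.0, 100.0, 1000.0):
    exact = vi_reinforcement_rate(b, R_SCHED)
    approx = two_term_approx(b, R_SCHED)
    print(f"B = {b:7.1f}  R = {exact:.6f}  two-term = {approx:.6f}")
```

For every finite B the computed R stays below R', and the gap shrinks
roughly as R'^2/(2B), which is the point made above: R reaches the
"scheduled" rate only in the limit of an infinite behavior rate.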

I don't want to see any more papers like the Killeen paper. Reading
them, I feel as if I'm being sucked into some strange nightmare. How are
things in Saraneb? How goes it with the Christopeds?

Best to all,

Bill P.