What's a reinforcer?

[From Bruce Abbott (970908.2000 EST)]

Me:

Richard Marken insists that the event termed the "reinforcer" in
EAB is the controlled variable of PCT. I agree that it is a CV in the
system that behaves so as to produce it, but have proposed that the relevant
property is not that it is a CV in this system, but that at some higher
level each reinforcer delivery serves to diminish the error between the CV
at that level and its reference.

Something about this didn't feel right, leading me to give it more thought
and finally, to reject once again the idea that the reinforcer of EAB is the
CV of control theory, even in the case of the system that produces it. The
operant produces the pellet, so it would _seem_ that the pellet is what is
being controlled. But on closer analysis, I see that this is not true; what
is being controlled is whether or not the pellet is available for
consumption. If the reference of this system is to have the pellet in the
food cup, then delivering the pellet reduces the error between this
reference and the current state of pellet not-present in the food cup. The
situation is a bit different than in the case of the pellet's effect on
nutrient concentration (once the pellet has been swallowed) because the
delivery of the pellet eliminates the error in the pellet availability
system at once, whereas consumption of the pellet has only a small effect on
the error in nutrient level and usually will not cancel it. In both,
however, the function of the pellet is that it serves to reduce error
between the CV and its reference.

So the pellet is not a CV, but it does qualify as a reinforcer under my
control-system definition of same.

Richard Marken _should_ be happy about this change of heart, even though I
am rejecting his definition. Under his definition, the Test for the
controlled variable (PCT) and the Test for the reinforcer (EAB) would be
equivalent tests, a consequence that I rather doubt Richard would have
liked, had it occurred to him. My reanalysis eliminates that problem.

I had asserted previously that they are not equivalent tests, although they
are closely related. I now reassert that claim. Identifying the reinforcer
suggests what the CV may be (something delivery of the reinforcer affects in
a particular direction); identifying what the CV is tells one what events
will serve as reinforcers: those that reduce error between that CV and its
reference.

Regards,

Bruce

[From Rick Marken (970908.1845)]

Bruce Abbott said:

Richard Marken insists that the event termed the "reinforcer" in
EAB is the controlled variable of PCT. I agree that it is a CV
in the system that behaves so as to produce it...

Now Bruce Abbott (970908.2000 EST) says

Something about this didn't feel right, leading me to give it
more thought and finally, to reject once again the idea that
the reinforcer of EAB is the CV of control theory

I thought you might find it VERY unpleasant to agree that a
reinforcer is a CV. I'm sorry you came to what I consider to
be a ludicrous conclusion but I am glad that you have finally
managed to reduce an error signal for yourself.

Richard Marken _should_ be happy about this change of heart

I'm not happy at all. But it certainly confirms my belief that
you are completely hopeless. Just do what you will do; publish
what you want about PCT; say what you want; you're not interested
in what I have to say so I'm through with you; wish I could have
said it was nice knowing you.

Best

Rick


--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bruce Abbott (970908.2300 EST)]

Rick Marken (970908.2000) --

Rick Marken (970908.1845)

I thought you might find it VERY unpleasant to agree that a
reinforcer is a CV. I'm sorry you came to what I consider to
be a ludicrous conclusion but I am glad that you have finally
managed to reduce an error signal for yourself.

The only error I was experiencing was an error in logic. I was willing to
agree that a reinforcer was a CV as long as that _seemed_ the logical
conclusion, and in fact it would be no great problem for me if it were true.
But it just doesn't make sense that it would be the CV. Sorry.

I should add that this is a "ludicrous conclusion" because the
versions of The Test already performed in EAB reveal that food
pellet arrival rate (or some very similar variable) is controlled.
You can't use reason to decide whether or not reinforcers are a
controlled variable. You have to Test it. And the Test shows
that reinforcers are a CV -- like it or not.

(a) A food pellet is not a variable. Therefore it cannot be a controlled
    variable.

(b) Rate of food pellet delivery is a variable, but rate of food delivery
    is not the same as a food pellet.

(c) Rate of food pellet delivery is not a controlled variable; the
    appearance of being controlled turned out to be an artifact, as
    Bill Powers will confirm.

Regards,

Bruce

[From Rick Marken (970908.2000)]

One last point. I said:

I thought you might find it VERY unpleasant to agree that a
reinforcer is a CV. I'm sorry you came to what I consider to
be a ludicrous conclusion but I am glad that you have finally
managed to reduce an error signal for yourself.

I should add that this is a "ludicrous conclusion" because the
versions of The Test already performed in EAB reveal that food
pellet arrival rate (or some very similar variable) is controlled.
You can't use reason to decide whether or not reinforcers are a
controlled variable. You have to Test it. And the Test shows
that reinforcers are a CV -- like it or not.

Best

Rick


--

Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bruce Gregory (970909.1020 EDT)]

Rick Marken (970908.2000)

I should add that this is a "ludicrous conclusion" because the
versions of The Test already performed in EAB reveal that food
pellet arrival rate (or some very similar variable) is controlled.
You can't use reason to decide whether or not reinforcers are a
controlled variable. You have to Test it. And the Test shows
that reinforcers are a CV -- like it or not.

I'm puzzled. Bruce A. has stated that a "reinforcer" is a
component of the external portion of a control loop. As he says,
a food pellet is not a controlled variable. A food pellet can,
however, serve as part of the loop by which an organism controls
a perception. Where is this argument flawed?

Bruce

[From Bill Powers (970909.0847 MDT)]

Bruce Abbott (970908.2000 EST)--

The operant produces the pellet, so it would _seem_ that the pellet is
what is being controlled.

That is not how we detect control. The effect of the operant on the pellet
delivery rate is certainly necessary, but it's not sufficient. To test for
control we would have to disturb the pellet delivery rate and observe an
opposing change in behavior rate, so the net delivery rate changes less
than what we would predict under the hypothesis of no control. If we could
find a situation in which behavior rate is actually variable, I predict
that this is what we would observe. So far in our personal investigations
of these phenomena, we have not found a situation in which behavior rate is
a variable, or in which it is affected by reinforcement rate, however
defined. So we have no data, yet, on which to base ANY conclusions. We're
talking thought-experiments here.
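
A minimal simulation sketch of such a thought-experiment (the gain, the
reference rate, and the ratio values are assumptions for illustration, not
measurements):

# One-level rate controller on a ratio schedule. The environment computes
# delivery_rate = press_rate / ratio, so raising the ratio is a disturbance
# that, absent control, would cut the delivery rate proportionally.
def run(ratio_values, reference=10.0, gain=10.0, steps=2000, dt=0.01):
    press_rate = 0.0
    for ratio in ratio_values:
        for _ in range(steps):
            delivery_rate = press_rate / ratio            # feedback function
            error = reference - delivery_rate             # comparator
            press_rate = max(0.0, press_rate + gain * error * dt)  # output
        print(f"ratio={ratio:3d}  press rate={press_rate:6.1f}  "
              f"delivery rate={press_rate / ratio:5.2f}")

# Under "no control," doubling the ratio would halve the delivery rate.
# Instead the press rate roughly doubles and the delivery rate barely moves.
run([10, 20, 40])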

But on closer analysis, I see that this is not true; what is being
controlled is whether or not the pellet is available for
consumption. If the reference of this system is to have the pellet in the
food cup, then delivering the pellet reduces the error between this
reference and the current state of pellet not-present in the food cup. The
situation is a bit different than in the case of the pellet's effect on
nutrient concentration (once the pellet has been swallowed) because the
delivery of the pellet eliminates the error in the pellet availability
system at once, whereas consumption of the pellet has only a small effect on
the error in nutrient level and usually will not cancel it. In both,
however, the function of the pellet is that it serves to reduce error
between the CV and its reference.

So the pellet is not a CV, but it does qualify as a reinforcer under my
control-system definition of same.

What you're saying is that if you can show that nutrient level is a CV,
then you have proven that pellet delivery (rate) is NOT a controlled
variable. In a one-level model this would be true, but not in a
hierarchical model. In a one-level model, the output of the nutrient-level
control system would be a _behavior_. But in a multiple-level model, it
would be a reference signal not for a behavior, but for the state of a
lower-level input variable. Only the lowest level of control system would
actually produce a behavior affecting the outside world.

One method for detecting multiple levels of control is to demonstrate that
a lower-level system acts to correct its error before a higher-level system
can alter its reference signal. If we apply disturbances to the
pellet-delivery rate, we will see immediate changes in behavior rate:
"bursting" is an example. This occurs before there can be any appreciable
change in the nutrient level. So there IS control of pellet delivery rate
in such cases, and pellet delivery rate IS a CV.
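
A toy two-level sketch of this arrangement (the gains, time constants, and
nutrient dynamics are assumptions chosen only to make the two time scales
visible):

# The slow outer loop controls nutrient level; its output is not a behavior
# but the reference for a fast inner loop controlling delivery rate.
nutrient, nutrient_ref = 50.0, 100.0
delivery_ref, press_rate, dt = 0.0, 0.0, 0.01

for step in range(3000):
    ratio = 2.0 if step >= 1500 else 1.0                   # schedule leaned
    delivery_ref += 0.5 * (nutrient_ref - nutrient) * dt   # slow outer loop
    delivery_rate = press_rate / ratio                     # environment
    press_rate += 50.0 * (delivery_ref - delivery_rate) * dt  # fast inner loop
    press_rate = max(0.0, press_rate)
    nutrient += (0.2 * delivery_rate - 0.1 * nutrient) * dt   # slow physiology
    if step in (1499, 1520, 1600):                         # around the change
        print(step, round(press_rate, 1), round(nutrient, 1))

# The press rate roughly doubles within a few dozen steps of the ratio
# change, long before the slowly integrating outer loop has moved its
# reference (or the nutrient level) appreciably.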

I had asserted previously that they are not equivalent tests, although they
are closely related. I now reassert that claim. Identifying the reinforcer
suggests what the CV may be (something delivery of the reinforcer affects in
a particular direction); identifying what the CV is tells one what events
will serve as reinforcers: those that reduce error between that CV and its
reference.

What we need now is some actual data showing that behavior is a function of
ANYTHING.

Best,

Bill P.

[From Bill Powers (970909.0911 MDT)]

Bruce Abbott (970908.2300 EST)--

(c) Rate of food pellet delivery is not a controlled variable; the
   appearance of being controlled turned out to be an artifact, as
   Bill Powers will confirm.

In those cases, the appearance that behavior rate was related to
reinforcement rate was likewise an artifact. The experiments, when re-analyzed, did not
show the effects their authors claimed, or that we expected to see. In the
Staddon cyclical-ratio experiments, for example, the behavior rate was NOT
a function of the changes in reinforcement rate brought on by changes in
the ratio requirement, contrary to Staddon's published conclusions. So this
sort of experimental "data" is not an appropriate vehicle for comparing PCT
to reinforcement theory, or even for showing that reinforcement exists.

Best,

Bill P.

[From Rick Marken (970909.0850)]

Bruce Gregory (970909.1020 EDT) --

I'm puzzled. Bruce A. has stated that a "reinforcer" is a
component of the external portion of a control loop. As he says,
a food pellet is not a controlled variable. A food pellet can,
however, serve as part of the loop by which an organism controls
a perception. Where is this argument flawed?

I am getting out of this conversation because I've found some
far more useful and satisfying ways to spend my time.

Arguments about controlled variables are not interesting to me
any more. What I want to see are Tests to determine what variables
are controlled. Such Tests have not yet been done systematically
because conventional psychologists have no idea what controlled
variables are. Bruce A. has argued that existing psychological
data allow us to test control theory. I have argued that this is
not true because there have been no systematic Tests for controlled
variables. Bruce A. has countered that this doesn't matter; that
there is existing data than can be used to Test for controlled
variables. I finally agreed that there is some data that suggests
that variables like pellet delivery rate are controlled; schedule
data and non-contingent pellet delivery data, for example. But
these data are limited -- and, indeed, Bruce A. showed that the
schedule data is probably useless due to an artifact. So there
really is no existing data that clearly shows what variables an
animal controls in an operant situation -- which is what I had
claimed in the first place. But, since I was willing to try to
see _some_ of this data as a Test for the controlled variable,
I am told that I am being duplicitous because the data don't
warrant this conclusion.

While I stuck to my guns and maintained that all existing conventional
data was useless for testing control theory, I was deemed an extremist
and ignored. When I tried to moderate my position and accept some
conventional data as evidence that pellet rate is a controlled
variable, I was deemed duplicitous because the data (as per my
"extremist" position) don't warrant that conclusion. I think
we'd all be a lot happier if I just stay out of these lofty
discussions.

Best

Rick


--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken

[From Bruce Abbott (970909.1315 EST)]

Bill Powers (970909.0911 MDT) --

Bruce Abbott (970908.2300 EST)

(c) Rate of food pellet delivery is not a controlled variable; the
   appearance of being controlled turned out to be an artifact, as
   Bill Powers will confirm.

In those cases, the appearance that behavior rate was related to
reinforcement rate was likewise an artifact. The experiments, when re-analyzed, did not
show the effects their authors claimed, or that we expected to see. In the
Staddon cyclical-ratio experiments, for example, the behavior rate was NOT
a function of the changes in reinforcement rate brought on by changes in
the ratio requirement, contrary to Staddon's published conclusions. So this
sort of experimental "data" is not an appropriate vehicle for comparing PCT
to reinforcement theory, or even for showing that reinforcement exists.

The same experiment showed that the rate at which the rats responded on the
lever was low during initial training (before they had learned the
relationship between lever pressing and pellet delivery) and high
thereafter. Although we didn't do it, we easily could have shown that the
observed increase depended materially on the fact that pressing the lever
causes food pellet delivery. An increase in response rate that results from
making an event contingent on the response is called reinforcement; we
certainly observed that result.

I suggest that the reason why the rate of pellet delivery turned out not to
vary across the different values of the ratio schedule we tested is that the
rats were not controlling for rate of pellet delivery, but rather for access
to food pellets in the cup as soon as the rat desired another pellet for
consumption. Because completing the ratio imposed a delay (owing to the
fact that lever presses require time to execute), the best that the rat
could do was respond as rapidly as possible at all ratio values.
Reinforcement theory would account for the same constancy in rate by noting
that, on ratio schedules, higher rates of responding yield shorter times to
reinforcement, which in turn would be expected to generate yet higher rates
of responding. This is a positive feedback loop that pushes response rate
up to the maximum, or at least to an equilibrium value established by the
conflicting effects of reinforcement and response effort. (Effort increases
with rate; a response rate is reached where the decrease in delay to
reinforcement is balanced by the increase in effort.)

Both explanations are proposals that must be tested in future experiments.
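
For concreteness, here is a toy numerical version of the reinforcement-theory
equilibrium just described; the delay and effort functions and their
coefficients are assumptions, not measurements:

def net_value(rate, ratio=10, effort_cost=0.002):
    delay = ratio / rate               # higher rates -> shorter time to food
    effort = effort_cost * rate ** 2   # effort grows with response rate
    return -delay - effort             # both delay and effort count as costs

# The rate that balances the two opposing effects is an interior value,
# not necessarily the physical maximum.
equilibrium = max(range(1, 301), key=net_value)
print(equilibrium)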

Bill Powers (970909.0847 MDT) --

Bruce Abbott (970908.2000 EST)

The operant produces the pellet, so it would _seem_ that the pellet is what
is being controlled.

That is not how we detect control. The effect of the operant on the pellet
delivery rate is certainly necessary, but it's not sufficient.

I should have phrased my sentence more carefully: "The operant produces the
pellet, so it would _seem_ that what is being controlled is the pellet." I
did not mean that the mere fact that the operant produces the pellet is
sufficient to demonstrate that the pellet is controlled. I meant that,
given that something is being controlled by the operant, it would be easy to
conclude (mistakenly) that what is being controlled is the pellet.

Regards,

Bruce

[From Bruce Abbott (970909.1925 EST)]

Bill Powers (970909.0847 MDT) --

Bruce Abbott (970908.2000 EST)

But on closer analysis, I see that this is not true; what is being
controlled is whether or not the pellet is available for
consumption. If the reference of this system is to have the pellet in the
food cup, then delivering the pellet reduces the error between this
reference and the current state of pellet not-present in the food cup. The
situation is a bit different than in the case of the pellet's effect on
nutrient concentration (once the pellet has been swallowed) because the
delivery of the pellet eliminates the error in the pellet availability
system at once, whereas consumption of the pellet has only a small effect on
the error in nutrient level and usually will not cancel it. In both,
however, the function of the pellet is that it serves to reduce error
between the CV and its reference.

So the pellet is not a CV, but it does qualify as a reinforcer under my
control-system definition of same.

What you're saying is that if you can show that nutrient level is a CV,
then you have proven that pellet delivery (rate) is NOT a controlled
variable.

No. I said that, in the system controlling the state of pellet access,
delivery of the pellet is the event that changes the CV from
pellet-inaccessible to pellet-accessible. If the reference state is
pellet-accessible and the current state is pellet-inaccessible, then
delivery of the pellet reduces (in this case to zero) the error between the
CV and its reference. Under my definition of "reinforcer" (something
that reduces error in a control system), delivery of the pellet qualifies as
the reinforcer under the conditions just indicated.

This has nothing to do with showing that nutrient level is a CV. But if
nutrient level _is_ a CV, then the pellet would serve as a reinforcer in
that system, too, when nutrient level is below its reference level. This is
because the effect of the pellet (when consumed) is to reduce this error.

Please note also that, whereas I keep referring to the effect of the delivery
of a pellet (a constant) on the CV, you keep switching to pellet delivery
_rate_, which is a variable, and possibly a controlled variable. We aren't
going to be communicating very well if we are talking about two different
things, so let's try to keep focused on the effect of delivery (or
consumption) of a _single_ pellet on the particular CV involved.

Regards,

Bruce

[From Bill Powers (970910.0138 MDT)]

Bruce Abbott (970909.1315 EST)--

In those cases, the appearance that behavior rate was related to
reinforcement rate was likewise an artifact.

The same experiment showed that the rate at which the rats responded on the
lever was low during initial training (before they had learned the
relationship between lever pressing and pellet delivery) and high
thereafter. Although we didn't do it, we easily could have shown that the
observed increase depended materially on the fact that pressing the lever
causes food pellet delivery. An increase in response rate that results from
making an event contingent on the response is called reinforcement; we
certainly observed that result.

Allow me to offer a different description.

Before the rats were able efficiently to generate the action that would produce
the food, they produced the relevant action only in passing. Then, as their
behavior patterns changed, their actions came to produce the food more and
more often, until the actions were producing food at a limiting rate. From
then on, the rate of food delivery was maintained by the maximum rate of
action that the rats could sustain.

When much greater amounts of food are produced by a given behavior rate,
the rats do not change their behavior rate appreciably. They behave in a
stereotyped way, producing and eating food as quickly as possible until a
certain amount of food has been ingested, producing shorter and shorter
bursts of behavior as the normal meal size is approached and then doing
other things for an hour or more. Then another session of eating and
ingesting takes place (if possible) until a similar meal size is generated,
and so on during a normal day, maintaining the amount of daily intake that
the rats require, or an individual rat requires.

What we observe is the behavior maintaining the food delivery rate to the
extent physically possible, in the dictionary sense of the transitive verb,
"to maintain." The subject is the rat, the object is the rate of food
delivery, and the means is the production of behavior in such a way as to
produce the food.

The role of the experimenter is limited to setting up a means by which the
rat's behavior, should the particular act occur, can generate food which
the rat can eat. Since the experimenter is the more powerful of the two
participants in the situation, the experimenter can establish any relation
between any action and the appearance of food, and prevent any other action
from producing food. The rat's requirements for food determine that the rat
will search for and find the key behavior or something sufficiently like it
to generate food, provided that the requirement is not beyond the rat's
ability to discover it, and that the action needed is not beyond the rat's
ability to sustain it.

If the experimenter changes the contingency, the expected effect with no
change in the rat's pattern of behavior would be an increase or a decrease
in the rate of food delivery. If physically possible, the rat's pattern of
behavior always changes so as to restore the rate of food delivery to its
original level, or as close to it as possible. The only exception is when
the rat is already producing as much behavior of the right kind as it can,
and the change of contingency reduces the rate of food delivery. Then we
observe no counteracting change in the rat's behavior, because no more
behavior can be generated.

What we have here is an animal presented with a problem which it proceeds
to solve to the best of its ability, unaided except for any hints provided
by the experimenter by way of shaping. The experimenter sets the problem by
creating a contingency. The rat solves it by varying its patterns of action
until it is producing as near to the amount of food it needs as it can
under the given circumstances -- neither more nor less.

This, I submit, is a fairly complete and accurate account of what is
observed, at least as complete and accurate as any behaviorist account of
similar length and detail. All references to agency and direction of
causation are expressed correctly in terms of conventional interpretations
and direct observations; the behavior is produced by the rat, the
requirements for food are the rat's, and the rate of food delivery is
completely dependent on the rat's actions. The experimenter's action
consists entirely of setting up the passive environmental link between some
action and food delivery rate, and this link runs in one direction only,
from action to food delivery.


----------------------------------

I suggest that the reason why the rate of pellet delivery turned out not to
vary across the different values of the ratio schedule we tested is that the
rats were not controlling for rate of pellet delivery, but rather for access
to food pellets in the cup as soon as the rat desired another pellet for
consumption. Because completing the ratio imposed a delay (owing to the
fact that lever presses require time to execute), the best that the rat
could do was respond as rapidly as possible at all ratio values.

This implies that under other conditions, the rats would press at lower
than maximum rates. I say this is false: the rats are incapable of
producing a systematically variable rhythm of lever pressing. All they can
do is produce a rapid repetitive action on the lever, or cease pressing
altogether. They are either producing this stereotyped action, or they are
doing something else. The only reason that "rate of pressing" caught on as
a measure of behavior was that the measure was obtained by dividing total
presses by session duration. As a result, the rat's changing the _kind_ or
_location_ of behavior was indistinguishable from its changing the _rate_
of behavior of a single kind at a single location. In fact, the latter,
which is assumed to occur, does not occur. So it does not need an explanation.
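
A small numerical illustration of the confound (all numbers assumed):

session_minutes = 60
full_tempo = 30   # presses per minute during the stereotyped action

rat_a = (full_tempo // 2) * session_minutes   # half tempo, the whole hour
rat_b = full_tempo * (session_minutes // 2)   # full tempo, half the hour

print(rat_a / session_minutes, rat_b / session_minutes)   # 15.0 15.0
# The summary measure cannot tell a slower rhythm of one action from time
# spent on a different kind of behavior at a different location.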

Reinforcement theory would account for the same constancy in rate by noting
that, on ratio schedules, higher rates of responding yield shorter times to
reinforcement, which in turn would be expected to generate yet higher rates
of responding. This is a positive feedback loop that pushes response rate
up to the maximum, or at least to an equilibrium value established by the
conflicting effects of reinforcement and response effort. (Effort increases
with rate; a response rate is reached where the decrease in delay to
reinforcement is balanced by the increase in effort.)

There is no need for this explanation, because rates of responding do not
actually change. If they did change, the explanation might be appropriate,
but as they do not, the explanation is empty of meaning. It explains
something that does not happen. I should also point out that whatever we
observe about behavior, it is not "responses." To observe a response one
must also observe the stimulus; otherwise all that one is observing is an
action.
-------------------------------------------------------
What I am saying here, Bruce, is that the behaviorist account, which you
present as a simple factual description of observations, is nothing of the
sort. It is a biased account slanted toward encouraging the listener to
conclude that the environment is controlling behavior -- that behavior is
controlled by its consequences, which is exactly the opposite of the truth.

In my alternative account above, there is no way to conclude that the
environment is in control. From this account, it is obvious that the rat is
an active agent working through a passive environment. It is clear that the
delivery of food pellets is the rat's doing, not the environment's, and
that the pellets are doing nothing to the rat's behavior; they are simply
being eaten.

All the strange, contorted, backward reasoning in behaviorist accounts, all
the special terminology, all the special definitions of auxiliary terms
that ignore the primary usages of the words, all the insistence on a
particular set of terms in which to express observations -- all this points
in only one direction. It points to a concerted attempt to put the
environment into the causal role and remove purposiveness from organisms.

I don't see how you can deny this: this has been the avowed purpose of
behaviorists like Skinner and his followers from the very beginning.
Skinner said you must always choose the way of speaking that attributes
the initiation of behavior to the environment; that science itself demands
this overt bias; that nobody who speaks otherwise can really be a
scientist. And his words have been echoed again and again; they are part of
the behaviorist credo. The term "radical behaviorism" is well chosen; this
movement is an extremist one based not on science but on ideology.
----------------------------------
One last point. Even though I am a control theorist and think that PCT is
basically correct, and wish to persuade others to my point of view, I can
still describe the facts on which PCT is based without using a single
special term. I can do it without mentioning control, controlled variables,
reference levels or signals, errors or error signals, disturbances (in any
special sense), input functions, comparator functions, or output functions.
I can describe these phenomena in such a way that nobody who doesn't know
me (or control theory) would ever guess what explanation I might offer.

Can you do the same for EAB? I very much doubt it. The language of EAB uses
special terms and auxiliary words each one of which is carefully defined
(mostly in unusual ways) to support the theoretical position that is
asserted with every breath. Theory and observation are so intertwined that
there is no way to separate them; take away the special terms, and the
observations can't even be described. If you're not allowed to use the word
"reinforcement," what do you say instead? Any other terms you might use
would lay out for all to see what the theoretical bias is. Try it and see.

Best,

Bill Powers

[From Bruce Abbott (970911.1005 EST)]

Here is the promised theory-free description (to the extent that any
description can be theory-free, and to the extent that I have succeeded) of
our rat's behavior in the operant chamber.

An adult laboratory rat has been placed on a restricted diet until its
weight has fallen to 80% of its free-feeding weight. It is then placed into
an operant chamber, a small box about 25 cm on a side. The front wall is
made of clear acrylic plastic and serves as the door to the chamber; the
remaining walls are made of sheet aluminum. The floor consists of a set of
parallel stainless steel rods spaced about 1 cm apart, which provide support
for the rat but allow feces and urine to pass through to a tray below.
Protruding from one wall of the chamber, about 3 cm above the floor, is a
pivoted metal bar or "lever." The lever can be depressed downward about 1
cm, thus closing an electrical switch. The spring of the switch and a
counterweight provide a force that restores the lever to the raised position
when the lever is released. Along the same wall there protrudes a small
recepticle, the "food cup," into which can be delivered a small round 45 mg
food pellet. A feeder device located behind the wall delivers one pellet to
the cup each time it is sent an electrical impulse. A computer interface
informs an IBM PC clone computer of the status of the switch on the lever
and allows the computer to send the required electrical impulse to the
feeder when required. At this time the computer's program is recording
switch-closures from the lever but has arranged no contingency between these
switch-closures and feeder operation.

On being placed in the operant chamber, the rat immediately begins to move
about, approaching various parts of the chamber, sniffing here and there,
rearing up on its hind feet and pressing its nose into the upper corners of
the chamber, touching the walls with its front paws, approaching and
sniffing at the lever, the food cup, the floor where the rods are anchored
to the walls. In the course of all this activity, the rat's body
occasionally contacts the lever, sometimes with enough force to depress the
lever to the point of switch-closure. The part of the body involved varies
(although most often one or both paws are involved), and the movements being
executed at the time also vary, depending on what the animal was doing just
prior (e.g., sniffing at the upper portion of the wall directly above the
lever). The record of switch-closures shows that these are occurring at low
frequency and with no apparent system (i.e., more or less at random). This
record is saved to provide a baseline of switch-closure activity against
which to compare the same activity in the next phase.

Now the computer program establishes a contingency between switch-closure
and feeder operation, such that each switch-closure delivers one food pellet
into the food cup. From the rat's point of view, nothing has changed. The
rat continues to move about the chamber as before, engaging in similar
activities, and eventually trips the switch on the lever. The feeder clicks
and a food pellet rolls into the food cup. The rat jerks its body slightly
at the sound of the feeder and hesitates, then resumes its activity,
releasing the lever in the process. After a short while, it passes the food
cup, turns toward it, sticks its snout into the food cup, retrieves the
pellet, changes to a seated position while raising its front paws off the
floor, transfers the pellet from its mouth to its paws, takes a bite out of
the pellet, then consumes the remainder. Its paws now free, it stands on
all fours again, places its snout into the food cup and, working its snout
at various angles, licks the inner surface of the food cup. When it is
unable to retrieve any more food powder in this way, the rat resumes its
former activity, but now it is observed to spend more of its time near the
food cup, occasionally sticking its snout into the cup and sniffing or
licking it.

With the rat's activities now taking place more often in the vicinity of the
food cup, the number of times that contact is made with the lever increases;
consequently there is a small increase in the frequency of switch-closures.
With further pairings of lever-contact and pellet delivery, the rat now is
observed to spend more of its time near the lever (when it is not retrieving
food from the cup). A marked tendency emerges for the rat to repeat the
activity it was performing at the time the feeder clicked. If the rat had
been touching the lever with its left paw, it tends to approach the lever
and touch it with its left paw again. However, if in so doing it does not
exert sufficient downward force to trip the switch, the rat may soon abandon
this activity. If the activity succeeds, it continues to be repeated,
particularly those elements of it that are consistently followed by feeder
operation. These common elements are retained. With continued successful
repetitions, behavior converges on a relatively efficient set of movements
with respect to the lever, and the rate at which the animal presses the
lever, collects and consumes the food pellet, and returns to the lever again
increases dramatically. The behavior involved in pressing the lever may
continue to vary somewhat, but after a time the whole performance takes on a
stereotyped appearance. Lever-pressing (and food consumption) are now
taking place at a high rate.

After quite a number of food pellets have been consumed, the chain of
activity involved in obtaining and consuming the pellets changes. Rather
than consuming the pellets almost whole and returning to the lever while
still chewing on the pellet, the rat now transfers the pellet to its
forepaws and nibbles at it. This consumes more time; the rate at which the
loop of behavior repeats declines to about 60% of its former value. With
further consumption of pellets, the rat now begins to take short breaks from
the cycle of lever-pressing and food consumption immediately after consuming
the pellet. It may groom its face and body or wander about sniffing at
various portions of the chamber before returning to the lever. With yet
more pellets consumed, these breaks become more frequent and, on average,
longer. The rate of recorded switch-closures becomes more erratic and, on
average, lower. Eventually, having consumed perhaps 300 pellets, the rat
may curl up in a corner of the chamber, close its eyes, and take a nap.

Comments?

Regards,

Bruce

[From Rick Marken (970911.0910)]

Bruce Abbott (970911.1005 EST) --

Here is the promised theory-free description

...

Comments?

Excellent!

The only nit I noticed is where you say:

If the activity succeeds...

The term "succeeds" suggests that the rat had the purpose
of producing a particular result (the pellet); this is a
description that implies a particular theory: PCT. So it
would probably be better to say "if the activity causes
a pellet to fall into the cup...".

Nice job.

Best

Rick


--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken

[From Bruce Abbott (970911.1225 EST)]

Bruce Gregory (970911.1025 EDT) --

Bruce Abbott (970910.2045 EST)

I have explained in what sense the "reinforcer" can be said to strengthen an
operant on which it is contingent: When the experimenter arranges a
contingency between the operant and delivery of the food pellet, the rate at
which the operant is observed to occur rises above its former level. If an
operant can be said to be "stronger" when it occurs with greater frequency,
then what has been observed is a strengthening of the operant that is
(indirectly) due to the fact that the operant now produces the pellet. A
synonym for "strengthen" is "reinforce" -- in the ordinary meaning of the
term "reinforce."

I suspect that the problem arises because of the words "due to"
which imply that a component of the control loop causes a
behavior. Strictly speaking, what is observed is that the
behavior is strengthened, i.e., occurs more frequently. We
observe no "due to". Connecting a furnace to a thermostat makes
it possible for the thermostat to control its perception of the
temperature. The establishment of the connection, however, does
not make the furnace "cause" the thermostat to turn the
switch on and off.

I agree. But remember, this is a descriptive explanation, not an
explanation in terms of mechanism. Descriptive explanations are appropriate
when the mechanism is unknown; they are not intended as competitors of or
replacements for theories of mechanism. Descriptively, one can show that
the crucial element in "strengthening" the behavior is the contingent
delivery of the reinforcer, given that the reinforcer will function as such
under the given conditions. Eliminate the contingency and the behavior
reverts to its former low rate. Keep the contingency but replace the food
pellet with a mild beep from a speaker, and the behavior reverts to its
former low rate. The continued elevation of the rate of the behavior when
the behavior produces the reinforcer is "due to" the production of the
reinforcer in the same sense that the light being on in my office is "due
to" my having flipped the light-switch to the on position. In neither
statement is there any assertion of mechanism.

What is asserted instead is (a) there is an observed relation between A and
B, such that B is in state B1 when A is in state A1 and B is in state B2
when A is in state A2, and (b), given this relationship, the reason that we
observe B to be in state B1 (or B2) is that A is in state A1 (or A2). The
mechanism through which the observed relation between A and B emerges is not
described, because it is unknown.

It is the job of a theory of mechanism to account for the observed relation.
Knowing the mechanism is far better than relying on a purely descriptive
account, but a descriptive account serves to organize the observations until
the mechanism is discovered. Descriptive explanations thus represent an
intermediate stage in the development of a scientific account. They are
less powerful than a theory of mechanism, but they are no less scientific in
that they are grounded on systematic observation and the experimental
testing of alternative hypotheses.

B.F. Skinner developed his descriptive approach to behavior at a time when
experimental psychologists were proposing all sorts of fanciful mechanisms
which could neither be confirmed nor denied given the available anatomical
and physiological knowledge. The only evidence for the proposed mechanisms
was that they yielded the functional relations they had been invented to
account for. Skinner saw (correctly, I think) that this sort of
unrestrained speculation about mechanism was leading nowhere; his antidote
was to opt instead for a systematic program of experimental manipulation
that would establish the existence of systematic empirical relations among
the relevant observable variables.

Where this program went wrong is that its practitioners seem to have
forgotten that the ultimate goal of science is the discovery of the
underlying mechanisms responsible for those observed relations. Skinner
himself fostered this attitude by claiming that the discovery of such
mechanisms was to be a job left to the physiologist. However, the knowledge
base and procedures needed to begin exploring the physical substrate of
behavior have improved greatly since Skinner formulated his program, making
theories of mechanism much more testable than they formerly were.
Furthermore, as PCT demonstrates, it is possible to test for the presence of
certain mechanisms by simulation -- comparing the behavior of a generative
model with respect to certain variables against observation. It is a
mistake to insist on merely descriptive explanations when the discovery of
mechanism is within reach.

Regards,

Bruce

[From Bill Powers (970911.1521 MDT)]

Bruce Abbott (970911.1005 EST)]

Here is the promised theory-free description (to the extent that any
description can be theory-free, and to the extent that I have succeeded) of
our rat's behavior in the operant chamber.

Very good! I could quibble about a word here or there, but no need.

What you describe really makes me think of reorganization, or something
between a systematic search strategy and a random one (don't ask me what
that would be). Of course this is the appropriate aspect of PCT to be
comparing with the concept of reinforcement, a thought that hadn't hit me
before. It would be interesting to have a detailed study of the rat's
behavior, via videos, prior to enabling the contingency. Does the rat
follow any fixed sequence of behaviors, or establish a path around the
cage, or are the various elements juggled around more or less at random?

The most useful aspect of this description is that I think we can use it to
track down the point where PCT (or actually reorganization theory) diverges
from reinforcement theory. My hunch about the difference between behaving
in different ways (and locations) vs changing the amount of behavior of a
given kind at a specific location is borne out by your description.

When it is
unable to retrieve any more food powder in this way, the rat resumes its
former activity, but now it is observed to spend more of its time near the
food cup, occasionally sticking its snout into the cup and sniffing or
licking it.

This is one way of looking at it, which if continued will lead to the
impression that the appearance of the food has _strengthened_ the tendency
to stay near the food cup. But let's introduce the other viewpoint right
away: the idea that prior to the first appearance of the food, the rat has
been searching/reorganizing, and that when the food appears, the rate of
switching from one behavior to another is depressed -- apparently, sharply
depressed upon the appearance of even a small amount of food. The rat stays
nearer to the cup not because that tendency has been increased, but because
the tendency to switch to other behaviors has been decreased. This is just
looking at the other side of the same coin: giving a different description
of the same phenomenon. We see the rat remaining near the lever because
that is what it was doing when the behavior-switching behavior became much
less marked. It stays near the lever, as it were, by default, simply
because it has ceased ranging around the cage so widely.

This part of the post concerns what I have called Phase 1, and brings us to
the point where we make a transition into Phase 2.

A marked tendency emerges for the rat to repeat the
activity it was performing at the time the feeder clicked. If the rat had
been touching the lever with its left paw, it tends to approach the lever
and touch it with its left paw again. However, if in so doing it does not
exert sufficient downward force to trip the switch, the rat may soon abandon
this activity.

Again, this way of describing it encourages the conclusion that the food
reward has strengthened the tendency to perform the activity that was going
on when the feeder clicked. The other viewpoint says that the tendency to
switch to a _different_ behavior is markedly reduced, which means of course
that the behavior that was going on simply continues. If food is not
forthcoming immediately, the switching to other behaviors picks up again.

At this point we see the rat depressing the bar more often, but it is still
performing unnecessary and ineffective actions, with variations still
visible but now on a smaller scale.

With continued successful
repetitions, behavior converges on a relatively efficient set of movements
with respect to the lever, and the rate at which the animal presses the
lever, collects and consumes the food pellet, and returns to the lever again
increases dramatically. The behavior involved in pressing the lever may
continue to vary somewhat, but after a time the whole performance takes on a
stereotyped appearance.

This "convergence to a relatively efficient set of movements" can again be
seen as the strengthening of the tendency to perform the more effective
movements, leading to the concept of reinforcement. However, it can also be
seen as slowing the rate at which behavior switches among the now-smaller
set of variations, dwelling longer on behaviors that produce more food as a
result of switching less often (or over an ever-decreasing range of
alternatives?). The end result is to leave, by default, just the one
behavior that is most effective, when food is being delivered regularly. If
the switching ceases entirely, that is the behavior that will be left.

This takes us through what I have called Phase 2. The remainder of the post
describes Phase 3.

If we think of a range of behaviors and plot the food intake against type
or location of behavior, we will get some kind of distribution curve.
Behaviors that take place near the bar will produce more food, on the
average, than behaviors that take the rat away from the bar part of the
time. In terms of probability distributions, the changes we see increase
the probability of behaviors near the maximum of the distribution, and
therefore decrease the probability of those farther out on the flanks of
the curve. We can also say, however, that effects that reduce the
probability of switching to behaviors farther from the central maximum
necessarily increase the probabilities of behaviors near the center, for
the same reason: the total probability has to add up to one.
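
In miniature (the distribution and the suppression factors are assumed):

probs = [0.05, 0.15, 0.60, 0.15, 0.05]   # behaviors ordered by distance from the bar
keep = [0.2, 0.5, 1.0, 0.5, 0.2]         # food cuts switching to the flanks

raw = [p * k for p, k in zip(probs, keep)]
new_probs = [r / sum(raw) for r in raw]   # total probability must still be 1
print([round(p, 2) for p in new_probs])   # the center rises from 0.60 to about 0.78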

Reinforcement theory is based on the interpretation that says the
appearance of food increases the probability of behaviors near the maximum
of the curve. Reorganization theory says that appearance of food decreases
the probability that an effective behavior will change to a behavior
farther from the central maximum. In either case the net effect is the
same: a relative increase in the probability of effective behaviors.

So whether we pay attention to the production of effective behaviors or to
the elimination of switches to less effective behaviors, the observed
result remains the same. This would appear to be the bifurcation separating
reinforcement theory from reorganization theory, and thus the causal
picture of behaviorism and the control-theoretic picture of PCT.

Best,

Bill P.

[From Bruce Abbott (970912.1220 EST)]

Bill Powers (970911.1521 MDT) --

Bruce Abbott (970911.1005 EST)

Here is the promised theory-free description (to the extent that any
description can be theory-free, and to the extent that I have succeeded) of
our rat's behavior in the operant chamber.

Very good! I could quibble about a word here or there, but no need.

What you describe really makes me think of reorganization, or something
between a systematic search strategy and a random one (don't ask me what
that would be). Of course this is the appropriate aspect of PCT to be
comparing with the concept of reinforcement, a thought that hadn't hit me
before.

Yes! In another recent post I alluded to an "alternative proposal" for
reorganization (instead of the e-coli principle). Basically, it holds that
whatever behavior is on-going at the time the pellet is delivered tends to
be repeated. The simple criterion for repetition is success in obtaining
what the rat is trying to get. That idea needs considerable elaboration (I
won't do that now), but that is the core notion. This is somewhat different
from the notion that the output has been selected at random and that success
(defined in reorganization theory as reduction of error in an "intrinsic"
variable) yields retention of this selection by halting reorganization.
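
For contrast, here is a bare-bones sketch of the e-coli principle itself
(the step size, number of trials, and error function are arbitrary
assumptions):

import random

target = 7.3                        # reference for an "intrinsic" variable
param, direction = 0.0, random.choice([-1, 1])
last_error = abs(target - param)

for _ in range(500):
    param += 0.1 * direction        # keep going while things improve
    error = abs(target - param)
    if error >= last_error:         # worse: "tumble" to a random new direction
        direction = random.choice([-1, 1])
    last_error = error

print(round(param, 1))   # ends near the target without ever sensing direction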

It would be interesting to have a detailed study of the rat's
behavior, via videos, prior to enabling the contingency. Does the rat
follow any fixed sequence of behaviors, or establish a path around the
cage, or are the various elements juggled around more or less at random?

I have never noticed any particular pattern to this activity, although the
range of things the rat is observed to be doing is rather limited, probably
because the environmental support for many of the rat's species-typical
behaviors is lacking in the operant chamber (e.g., burrowing, nesting,
mating). What we see seems to fall mostly into the category of
exploration/investigation. These activities tend to diminish with time in
the chamber; if given enough time there the rat may even take a nap. Any
sudden changes will bring the rat to attention and further investigation of
the environment may ensue. An important event in this regard is the
discovery of food in the food cup. This event will be followed (after
ingestion of the pellet) by much activity directed at the food cup, and for
a time, at least, investigation will tend to be concentrated in the vicinity
of the food cup. There will also be a general increase in exploratory
activity, relative to what would be observed at that stage had no food been
discovered.

The most useful aspect of this description is that I think we can use it to
track down the point where PCT (or actually reorganization theory) diverges
from reinforcement theory. My hunch about the difference between behaving
in different ways (and locations) vs changing the amount of behavior of a
given kind at a specific location is borne out by your description.

When it is
unable to retrieve any more food powder in this way, the rat resumes its
former activity, but now it is observed to spend more of its time near the
food cup, occasionally sticking its snout into the cup and sniffing or
licking it.

This is one way of looking at it, which if continued will lead to the
impression that the appearance of the food has _strengthened_ the tendency
to stay near the food cup. But let's introduce the other viewpoint right
away: the idea that prior to the first appearance of the food, the rat has
been searching/reorganizing, and that when the food appears, the rate of
switching from one behavior to another is depressed -- apparently, sharply
depressed upon the appearance of even a small amount of food. The rat stays
nearer to the cup not because that tendency has been increased, but because
the tendency to switch to other behaviors has been decreased. This is just
looking at the other side of the same coin: giving a different description
of the same phenomenon. We see the rat remaining near the lever because
that is what it was doing when the behavior-switching behavior became much
less marked. It stays near the lever, as it were, by default, simply
because it has ceased ranging around the cage so widely.

Yes, I agree. Reinforcement theory holds that the linkage to output is
forged by success, whereas HPCT proposes that the linkage is randomly formed
and only retained by success (via stopping reorganization).

So whether we pay attention to the production of effective behaviors or to
the elimination of switches to less effective behaviors, the observed
result remains the same. This would appear to be the bifurcation separating
reinforcement theory from reorganization theory, and thus the causal
picture of behaviorism and the control-theoretic picture of PCT.

Good summary. (I've skipped over most of the details in your post because I
agree with your descriptions.)

Where do we go from here? And by the way, I should be getting the new panel
for the operant chamber back today from having the holes for the lights
drilled in it. The panel accepts the new Coulbourn lever and food
receptacle. When I get it assembled and wired, we'll be ready to start
taking data on VI schedules -- with videotape of the rats, as usual.

Regards,

Bruce

[From Bruce Gregory (970912.1430 EDT)]

Bruce Abbott (970912.1220 EST)

Yes! In another recent post I alluded to an "alternative proposal" for
reorganization (instead of the e-coli principle). Basically, it holds that
whatever behavior is on-going at the time the pellet is delivered tends to
be repeated. The simple criterion for repetition is success in obtaining
what the rat is trying to get. That idea needs considerable elaboration (I
won't do that now), but that is the core notion. This is somewhat different
from the notion that the output has been selected at random and that success
(defined in reorganization theory as reduction of error in an "intrinsic"
variable) yields retention of this selection by halting reorganization.

It would seem to me that any system with "memory" can improve
on a randomly selected output in an attempt to counter
persisting error. Only when remembered solutions fail to reduce
error must the system resort to random outputs.

Bruce

[From Bill Powers (970912.1200 MDT)]

Bruce Abbott (970912.1220 EST)--

Yes! In another recent post I alluded to an "alternative proposal" for
reorganization (instead of the e-coli principle). Basically, it holds that
whatever behavior is on-going at the time the pellet is delivered tends to
be repeated. The simple criterion for repetition is success in obtaining
what the rat is trying to get.

As you read further into my post, I trust that you saw that this effect is
in fact achieved by reorganization, although by a different route. A
behavior can "tend to be repeated" more often simply because it doesn't
tend to be changed as soon.

I have never liked the concept of "strengthening behavior." It falls apart
when you try to apply it to any kind of model. Behavior needs to be
stronger or weaker only in the sense that it must be of the right amount
and in the right direction to have the right effect. If you're lifting a
box to put it on a shelf, learning to do this skilfully requires weakening
the behavior if you lift the box too high, and strengthening it if you don't
lift it far enough. And once you get it right, if the result were to
"strengthen" the behavior, you'd just lift it too high the next time.

The idea of "increasing the probability" of a behavior assumes that either
you have the behavior or you don't. There are very few real circumstances,
however, in which just producing the right qualitative kind of behavior
would be enough to accomplish a particular result. Unfortunately, closing a
switch contact is one of those circumstances: it doesn't matter how you get
the contact to close, and the contact can be only open or closed. What is
really a continuously variable sort of action gets converted by the
conditions in the environment into a simple binary event. This leads to
treating the whole situation as if it were a logical proposition:
contingency on, contact closure produces food. Contingency off, contact
closure produces no food. So you reach a logical conclusion: it's the
appearance of food that leads to closing the contact.
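
In miniature (the forces and the threshold are assumed):

presses = [0.02, 0.15, 0.40, 0.08, 0.33]   # downward forces on the lever
THRESHOLD = 0.25                            # force needed to close the contact

closures = [force >= THRESHOLD for force in presses]
print(closures)   # [False, False, True, False, True] -- all gradation is lost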

That idea needs considerable elaboration (I
won't do that now), but that is the core notion. This is somewhat different
from the notion that the output has been selected at random and that success
(defined in reorganization theory as reduction of error in an "intrinsic"
variable) yeilds retention of this selection by halting reorganization.

I hope you will go slow on this new insight. "Repeating" behavior is not
the only thing we have to account for. Behavior isn't just a chunk that
gets repeated or not repeated. It has to be adjusted _quantitatively_,
under most real circumstances.

It would be interesting to have a detailed study of the rat's
behavior, via videos, prior to enabling the contingency. Does the rat
follow any fixed sequence of behaviors, or establish a path around the
cage, or are the various elements juggled around more or less at random?

I have never noticed any particular pattern to this activity, although the
range of things the rat is observed to be doing is rather limited, probably
because the environmental support for many of the rat's species-typical
behaviors is lacking in the operant chamber (e.g., burrowing, nesting,
mating). What we see seems to fall mostly into the category of
exploration/investigation. These activities tend to diminish with time in
the chamber; if given enough time there the rat may even take a nap. Any
sudden changes will bring the rat to attention and further investigation of
the environment may ensue. An important event in this regard is the
discovery of food in the food cup. This event will be followed (after
ingestion of the pellet) by much activity directed at the food cup, and for
a time, at least, investigation will tend to be concentrated in the vicinity
of the food cup. There will also be a general increase in exploratory
activity, relative to what would be observed at that stage had no food been
discovered.

I thought I saw, in some of our video tapes, that the rats would follow a
particular sequence of explorations, moving roughly counterclockwise around
the cage (in one case I think I remember -- I'll have to look at the tape
again). I wasn't paying very close attention -- this really should be
looked at systematically. You can get an impression of general exploratory
activity without necessarily noticing the pattern.

Reinforcement theory holds that the linkage to output is
forged by success, whereas HPCT proposes that the linkage is randomly formed
and only retained by success (via stopping reorganization).

Yes. This does not, however, _necessarily_ involve E. coli type
reorganization. There could be a perfectly systematic higher-order control
system that runs through a search sequence, slowing down when there is
success and speeding up when there is failure. There could also be a search
pattern with a radius that decreases as error decreases. That's why I asked
about patterns in exploratory behavior during Phase 1. By "search pattern"
I don't mean literally a spatial pattern, but only a succession of trials
of different lower-order behaviors (which existing control systems can
already carry out) in some sequence. As error decreases, we would see at
first a wide variety of very different behaviors, then a less wide
variety, and finally explorations among a small set of behaviors with very
similar effects.
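
One way such a systematic search might be sketched (the repertoire, the
switching rule, and all numbers are assumed, not observed):

import random

repertoire = ["groom", "rear", "sniff cup", "nose lever", "press lever"]

def next_choice(current, error):
    if random.random() > error:           # low error: switch less often, so
        return current                    # success slows the search down
    radius = max(1, round(error * len(repertoire)))  # and narrows its range
    i = repertoire.index(current)
    return repertoire[(i + random.randint(1, radius)) % len(repertoire)]

choice, error = "groom", 1.0
for trial in range(40):
    if choice == "press lever":           # the one act that produces food here
        error = max(0.0, error - 0.25)    # so error falls while it continues
    choice = next_choice(choice, error)

At high error this jumps anywhere in the repertoire; as error falls it
switches rarely and only among near neighbors -- entirely systematic, yet
from outside it looks like "repetition of reinforced behavior."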

So whether we pay attention to the production of effective behaviors or to
the elimination of switches to less effective behaviors, the observed
result remains the same. This would appear to be the bifurcation separating
reinforcement theory from reorganization theory, and thus the causal
picture of behaviorism and the control-theoretic picture of PCT.

Good summary. (I've skipped over most of the details in your post because I
agree with your descriptions.)

Where do we go from here? And by the way, I should be getting the new panel
for the operant chamber back today from having the holes for the lights
drilled in it. The panel accepts the new Coulbourn lever and food
receptacle. When I get it assembled and wired, we'll be ready to start
taking data on VI schedules -- with videotape of the rats, as usual.

This will be interesting. I do hope, however, that we will some day be able
to try experiments with continuous control. My offer of a four-channel A/D
converter board for your PC still holds, any time you want it.

Best,

Bill P.

[From Bruce Abbott (970922.2000 EST)]

Fred Nickols (970921.1325 ET) --

Rick Marken (970920.1750)

8. A reinforcer is a controlled perception.

I happen to agree with you, although I don't know that BruceA
will.

A little thought about the matter will show that this cannot be correct. A
controlled perception is a variable, like the position of a cursor. It
would not be correct to call the cursor itself the controlled variable if
what is being controlled about the cursor is its position relative to a
target. The cursor is only an object, not a variable. By the same logic,
it would not be correct to call a food pellet -- an object -- the controlled
variable when what is being controlled about the pellet is its rate of
presentation, or its current state of availability (available/not
available). To do so only invites confusion.

Perhaps what you have in mind is that the reinforcer is some particular
state of the controlled variable, e.g., cursor on-target or pellet present.
That is, the reinforcer is "what is wanted." But a particular state of a
controlled variable is not the same as a controlled variable, so in this
definition too, a reinforcer (a particular state of a CV) cannot be a
controlled variable. Dr. Marken loses either way.

So what is it that a person or animal will respond in order to
receive? The answer is -- anything that will reduce the gap between what it
wants and what it has. If I want to have at least $100, I will take action
to obtain any amount of money that will move me toward that state (so long
as this does not conflict with any other goals, such as not wanting to work
too hard for too little). And any amount of money received by way of that
action will reduce the discrepancy between what I have and what I want.
Thus, a reinforcer must be that which reduces error between what I have and
what I want, or in other words, between the current state of a CV and its
reference value.
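
In these terms the example is a three-line calculation (the starting
figure is assumed):

reference = 100           # what I want: at least $100
have = 60                 # what I have, for illustration
error = reference - have  # a $40 gap
have += 15                # any receipt moves me toward the reference...
print(reference - have)   # ...25: error reduced, so the $15 "reinforces"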

Rick Marken (970921.0940) --

But I think it's important to understand
that a reinforcer is, indeed, a controlled perception. Otherwise
one might be tempted to think that behavior modification
approaches to dealing with people actually have some merit.

I don't see the connection, but at least I now understand why the good Dr.
Marken -- who apparently does see a connection between these two ideas --
wishes to deny the obvious.

Regards,

Bruce

[From Bill Powers (970923.1043 MDT)]

Bruce Abbott (970922.2000 EST)--

A controlled perception is a variable, like the position of a cursor. It
would not be correct to call the cursor itself the controlled variable if
what is being controlled about the cursor is its position relative to a
target. The cursor is only an object, not a variable. By the same logic,
it would not be correct to call a food pellet -- an object -- the
controlled variable when what is being controlled about the pellet is its
rate of presentation, or its current state of availability (available/not
available). To do so only invites confusion.

The confusion is in the idea that a food pellet is not a variable. All
perceptions are variables in particular states. What makes the food pellet
what it is is the set of states of all variable attributes associated with
it: size, color, odor, position, shape, weight, and so forth. When you have
specified all the variable attributes and their values, you have defined the
food pellet as it is perceived. Some of these attributes can be changed by
behavior; some, perceptually, interact (the distance of the food pellet
from the rat affects the intensity of its odor, for example).
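
One way to render that as a data structure (the attribute list and the
inverse-square odor law are illustrative, not claims about rat perception):

pellet = {"size_mm": 4.0, "color": "brown", "weight_mg": 45.0,
          "distance_cm": 30.0}           # a bundle of variable attributes

def odor_intensity(strength, distance_cm):
    return strength / distance_cm ** 2   # attributes interact perceptually

print(odor_intensity(1.0, pellet["distance_cm"]))  # faint at 30 cm
pellet["distance_cm"] = 2.0              # behavior changes some attributes
print(odor_intensity(1.0, pellet["distance_cm"]))  # strong up close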

Perhaps what you have in mind is that the reinforcer is some particular
state of the controlled variable, e.g., cursor on-target or pellet
present. That is, the reinforcer is "what is wanted." But a particular
state of a controlled variable is not the same as a controlled variable,
so in this definition too, a reinforcer (a particular state of a CV)
cannot be a controlled variable. Dr. Marken loses either way.

What Rick has in mind, and what I have in mind, is that there is nothing
reinforcing about the food pellet. The so-called reinforcer is JUST some
attribute of the food pellet, such as its perceived position or taste or
smell, that is under control by the organism.

Thus, a reinforcer must be that which reduces error between what I have
and what I want, or in other words, between the current state of a CV and
its reference value.

This is not a general definition. If anything that reduces the difference
between a CV and its reference value is a reinforcer, then a change in the
reference value toward the perceived value of the CV is a reinforcer. Under
this definition, it makes no difference what reduces the error; there is
nothing in this definition to say that noncontingent food pellets would not
be reinforcing -- they would be just as effective in reducing error as
pellets produced by behavior. Obviously, the reduction in error per se
can't be the only criterion for defining a reinforcer.
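
The point can be shown in two lines: the error reduction is computed from
the pellet's effect alone, so it comes out the same either way (nutrient
numbers invented):

reference, nutrient = 10.0, 4.0           # reference and current nutrient level

def error_after_pellet(level, pellet_value=1.0):
    return reference - (level + pellet_value)

print(error_after_pellet(nutrient))       # 5.0 if the rat pressed for it
print(error_after_pellet(nutrient))       # 5.0 if it fell in the cup for free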

The food pellets consumed change from being merely food pellets to being
reinforcers only when they are produced as a consequence of behavior. Their
error-reducing effects are not what make the difference. If we want to find
the difference, we have to try to see what is different about the situation
when the food pellets are and are not produced by behavior. That is the
only difference that makes any difference.

If you're trying to define a reinforcer in terms of PCT, you have to find
some unique role that the reinforcer plays that fits your definition under
all circumstances. It is necessary, but not sufficient, that the reinforcer
reduce error. It is necessary, but not sufficient, that the reinforcer be
produced by behavior. So is it necessary AND sufficient that the reinforcer
be produced by behavior AND reduce error? No, not yet, because there are
other conditions that have to be satisfied.

When you're speaking of Phase 1, food that appears because of behavior will
be followed by more of the behavior that produced it. We will observe a
concurrent increase in the number of behaviors of the required kind and the
number of deliveries of food pellets. So the necessary and sufficient
conditions are met and reinforcement does take place.

During phases 2 and 3, however, we find that while food produced by
behavior does reduce error, any change in the contingency that results in
_more_ food being produced leads to _less_ behavior, and vice versa. So now
the purported reinforcing effect of the food has reversed. Where in Phase 1
an increase in food production went with an increase in behavior, now an
increase in food production goes with a decrease in behavior.
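
A sketch of why the reversal falls out of a control loop (the reference
intake and schedule values are assumed):

def presses_per_session(reference_intake, food_per_press):
    # At equilibrium the loop emits just enough presses that intake
    # matches the reference, so richer schedules need fewer presses.
    return reference_intake / food_per_press

print(presses_per_session(60.0, 0.5))     # lean schedule: 120 presses
print(presses_per_session(60.0, 2.0))     # rich schedule: 30 presses --
                                          # more food per press, less behavior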

Thus we have to conclude that food is reinforcing only under phase 1.

However, there are other explanations, such as that when food appears
during phase 1 it does not increase the frequency with which more of the
same behavior will occur; it simply stops the continuing search for food in
different places.

Best,

Bill P.