The Operant

[From Bill Powers (951211.1945 MST)]

Bruce Abbott (951211.1830 EST) --

Replying to Mary, you say

     Reinforcement as defined in EAB is an empirical fact, not a theory.
     An operant is a member of a class of activities having a common
     consequence (e.g., depressing a lever to the point of switch-
     closure). When this consequence is linked to another (the latter is
     made "contingent" on the former), the frequency (or some other
     aspect, depending on the contingency) of the operant is sometimes
     observed to increase as a result. If so, the contingent
     consequence of the operant has been demonstrated to reinforce the
     operant.

What you seem to be claiming is that depressing a lever to the point of
switch closure starts out as an operant -- that is, this result is
naturally created by a variety of behaviors, and while the detailed
actions may differ, the outcome remains the same, a closure of the
contacts. And THEN, when this operant is linked to a special
consequence, reinforcement is observed to occur, as an increase in the
rate of occurrence of the operant act and hence the contact closures.

In order for this description to hold up as an empirical fact, you would
have to see a rat repeatedly depressing a lever by a variety of means,
like using one paw or another, or sitting on it, or nosing it. Then you
could say that you have observed an operant: a class of actions having a
common outcome. It isn't enough to say that in principle there are many
different actions that _could_ produce the same result. You have to
demonstrate that such a variety of behaviors with a common consequence
actually takes place in the behavior of a single rat, and does so prior
to any reinforcements.

I think that what is actually observed during shaping is that normal
behavior seldom produces a lever press. When shaping is done, the
reinforcer is given for any move toward the lever, until finally the rat
accidentally depresses the lever -- upon which, the reinforcer is
immediately given. So we do not first observe the operant, and then make
a reinforcement contingent on it. We do not observe the operant unless
reinforcement is contingent upon it from the start. And in fact, what
usually happens, as I understand it, is that the _first_ action that
depresses the lever is normally used from then on ("superstitious
behavior"). So an operant is not observed at all.

I know you are trying to link the operant with controlled variables. But
this way of doing it misses a principal aspect of a controlled variable:
that when something interferes with the action that normally controls
it, behavior will _change_, it will change _immediately_, and it will
change specifically in the way needed to keep the value of the
controlled variable the same. A lever depression would be a controlled
variable if we did something to interfere with depressing it by one
means, and a different means was immediately used to depress it anyway.
This fits the definition of an operant, but it also brings in the
purposiveness of behavior. The behavior changes to a new method in
precisely the way needed to achieve the same outcome as before.
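
A minimal sketch of the point (in Python; the loop, the names, and the
numbers are illustrative assumptions, not a worked-out model from this
discussion). When the disturbance appears, the output changes at once,
and by just the amount needed to keep the controlled variable at its
reference:

# Illustrative sketch: a one-loop control system whose output is the
# accumulated error. All names and constants are assumptions made for
# this example only.
reference = 10.0   # intended value of the controlled variable
gain = 0.5         # how strongly the error drives the output each step

output = 0.0
for step in range(300):
    disturbance = 0.0 if step < 150 else 6.0   # disturbance appears halfway through
    cv = output + disturbance                  # environment: cv depends on output and disturbance
    error = reference - cv                     # comparison with the reference
    output += gain * error                     # integrating output function
    if step in (0, 149, 151, 299):
        print(f"step {step:3d}  disturbance {disturbance:3.1f}  "
              f"output {output:6.2f}  cv {cv:6.2f}")

The output falls from about 10 to about 4 as soon as the 6-unit
disturbance appears, so cv stays near 10; nothing selects the new output
except the error itself.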

As I understand the concept of the operant, this is not its intent. The
main idea is that the animal is producing a lot of effects in its
environment, and some of the different acts happen, more or less at
random, to have a similar consequence. So when the common consequence is
made to produce a reinforcer, the operant is reinforced; it is made to
be more likely to happen. However, if something prevented that
particular act from succeeding, the animal would NOT immediately switch
to another degree, direction, or kind of behavior that has the same
consequence. If that happened, we would see the operant as creating an
_intended_ consequence, a concept that has long been rejected by
behaviorists. We would see the lever depression as a controlled
variable.

The concept of the operant was an attempt to get around the appearance
of purpose in behavior. If an animal switched from one means of pressing
a lever to another means, this was not to be taken as showing that the
animal intended that the lever be depressed; that the depression of the
lever was a goal to be achieved by variable means. What Skinner proposed
was that each different means simply happened to have a similar effect,
and that the animal did not _choose_ the means _in order to_ have the
effect. This was an explicitly-stated goal of Skinner's -- to eliminate
language that implies an active role by the organism in selecting the
consequences of its own behavior.

Why would Skinner have specifically wanted to eliminate the concept of
purpose from explanations of behavior? Because he thought it was a
mystical idea, a mentalism. Because he knew of no mechanism by which
purposive behavior could be created. The whole thrust of his concept of
the operant was to do away with the idea of constant ends being achieved
by variable means. He thought up a way in which different actions (the
variable means) could just by chance have a constant end, and saw that
as the answer to James' proposal. Because he thought that this took care
of the problem, he never recognized the cases in which the "variable
means" were _systematically_ varied, in exactly the way required to
achieve the constant end. Under the concept of the operant, if one
action fails to achieve the result, a new trial-and-error period would
have to follow, in which _any_ other action would be tried, not just
actions that have the same consequence. The animal is not permitted to
know that the consequence is what needs to be repeated and to select a
new action that will have the same consequence. That selection must be
left up to the reinforcer, with the animal emitting different responses
at random, not in systematic relationship to one particular consequence.

So if as you say reinforcement is an empirical fact, it is an empirical
fact seen through a narrow-band philosophical filter. It isn't just the
apparent fact of reinforcement that's involved here; it's the
interpretation of the supporting observations that is critical. Skinner
interpreted the variations in action in just the way needed to support
the idea that the reinforcer, not the organism, was producing the
constant consequence. If in fact the organism itself is systematically
selecting whatever action is needed to press the key and get the
reinforcer, then the idea that the reinforcer is a cause rather than an
effect is simply wrong. The airtight case springs a leak.

-----------------------------------------------------------------------
Best,

Bill P.

[From Bruce Abbott (951212.1220 EST)]

Bill Powers (951211.1945 MST)

    Bruce Abbott (951211.1830 EST)

    Replying to Mary, you say

        Reinforcement as defined in EAB is an empirical fact, not a theory.
        An operant is a member of a class of activities having a common
        consequence (e.g., depressing a lever to the point of switch-
        closure). When this consequence is linked to another (the latter is
        made "contingent" on the former), the frequency (or some other
        aspect, depending on the contingency) of the operant is sometimes
        observed to increase as a result. If so, the contingent
        consequence of the operant has been demonstrated to reinforce the
        operant.

    What you seem to be claiming is that depressing a lever to the point of
    switch closure starts out as an operant -- that is, this result is
    naturally created by a variety of behaviors, and while the detailed
    actions may differ, the outcome remains the same, a closure of the
    contacts. And THEN, when this operant is linked to a special
    consequence, reinforcement is observed to occur, as an increase in the
    rate of occurrence of the operant act and hence the contact closures.

    In order for this description to hold up as an empirical fact, you would
    have to see a rat repeatedly depressing a lever by a variety of means,
    like using one paw or another, or sitting on it, or nosing it. Then you
    could say that you have observed an operant: a class of actions having a
    common outcome. It isn't enough to say that in principle there are many
    different actions that _could_ produce the same result. You have to
    demonstrate that such a variety of behaviors with a common consequence
    actually takes place in the behavior of a single rat, and does so prior
    to any reinforcements.

So, to use an analogy, if I claim that a "person" is a class of objects
having certain properties in common, then you are claiming that if I
encounter you and find you to have those properties, I cannot thereby
conclude that you are a member of the class "person." This is obviously wrong.

    I think that what is actually observed during shaping is that normal
    behavior seldom produces a lever press. When shaping is done, the
    reinforcer is given for any move toward the lever, until finally the rat
    accidentally depresses the lever -- upon which, the reinforcer is
    immediately given.

In shaping, the experimenter defines a progressive series of different
operants. To get a pigeon to peck at a key, I might start by defining the
operant as pecking at _anything_ on the side of the chamber that holds the
response panel. Any peck there would be followed immediately by
presentation of grain. What will be observed (if the grain is an effective
reinforcer) is that the bird will come to spend more and more of its time
pecking at things on that side of the chamber. I now select a different
operant for reinforcement: pecking on or near the base of the response
panel. Because the bird is already pretty much restricting its pecks to
that side of the chamber, there is now an increased likelihood that this new
operant will occur in a reasonable time and can be followed by grain
presentation. Or to put it in terms of direct observation, the frequency of
pecks on or near the base of the response panel is now higher. By
reinforcing pecking on the panel-side of the chamber, I have elevated the
frequency of this second operant as well.

The process involved is a process of variation and selection. Initially,
pecks occasionally occurred, but they were "all over the place." Those that
met a positional criterion (panel-side of chamber) were selected by the
experimenter for reinforcement. This contingency resulted in moving the
center of the peck-position distribution nearer the response panel;
variations around this new center then resulted in more pecks being directed
on or near the base of the panel. Restricting the contingency to those
pecks then brings about another change in the center of the peck-location
distribution, and now many pecks are being directed onto the response panel,
with variation as to height. At this point, only pecks above a certain
height on the panel would be selected for reinforcement, and this height
criterion would be raised until the pigeon is pecking the panel at
key-level. Some of these pecks will land on the key; at this point the
criterion is again changed so that only keypecks will be followed by grain
presentation. Variation, selection, new variation, new selection -- this is
shaping. Each new criterion for reinforcement defines a new operant,
although all involve pecking at something.
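
A minimal sketch of this variation-and-selection description (in Python;
the distribution, numbers, and update rule are arbitrary assumptions
chosen only to make the effect visible, not data from real pigeons):

# Toy sketch of shaping as variation and selection. Peck positions vary
# around a current center; pecks meeting the current criterion are
# "reinforced," and the center drifts toward the reinforced positions.
# The criterion is then tightened in stages. All numbers are assumptions.
import random

random.seed(1)
center = 0.0          # current center of the peck-position distribution
spread = 5.0          # variation of individual pecks around that center
key_position = 20.0   # position of the key on the response panel

# Progressively stricter criteria: a peck is reinforced only if it lands
# at or beyond the stage's minimum position.
for stage, criterion in enumerate([5.0, 10.0, 15.0, 19.0], start=1):
    for _ in range(200):
        peck = random.gauss(center, spread)
        if peck >= criterion:
            # Selection: the distribution shifts a little toward reinforced pecks.
            center += 0.2 * (peck - center)
    print(f"stage {stage}: criterion >= {criterion:4.1f}, center now {center:5.1f}")

print(f"final center {center:5.1f} (key is at {key_position})")

Each tightening of the criterion plays the role of a new operant
definition, and the center of the peck-position distribution tracks the
criterion until pecks are landing at key level.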

    So we do not first observe the operant, and then make
    a reinforcement contingent on it. We do not observe the operant unless
    reinforcement is contingent upon it from the start.

This is incorrect. In an experiment, we first observe _an_ operant, and
then select it for reinforcement. What qualifies as "the" operant is
determined by the experimenter, who defines the objective criteria by which an
instance of the operant will be identified and counted.

    And in fact, what
    usually happens, as I understand it, is that the _first_ action that
    depresses the lever is normally used from then on ("superstitious
    behavior").

In that case what defines the operant is the closure of the lever's
contacts. Any set of behavioral acts that have this result would qualify as
"the" operant. Because a fair number of distinctly different acts can all
achieve this end, any one of them might be reinforced in a given instance.
The particular act that is observed to increase in frequency is the act that
is followed by reinforcement, not necessarily all members of the class
selected for reinforcement. However, if there is variation in that act, all
those variations get reinforced, too. The consequence is that the behavior
used to depress the lever may "drift" into other forms over repetitions.

    So an operant is not observed at all.

I think you can now see that this conclusion is based on a false premise and
is incorrect.

    I know you are trying to link the operant with controlled variables. But
    this way of doing it misses a principal aspect of a controlled variable:
    that when something interferes with the action that normally controls
    it, behavior will _change_, it will change _immediately_, and it will
    change specifically in the way needed to keep the value of the
    controlled variable the same. A lever depression would be a controlled
    variable if we did something to interfere with depressing it by one
    means, and a different means was immediately used to depress it anyway.
    This fits the definition of an operant, but it also brings in the
    purposiveness of behavior. The behavior changes to a new method in
    precisely the way needed to achieve the same outcome as before.

Yes, but we have to be careful here. What is the controlled variable? One
thing our animal is trying to control is the availability to it of food.
Because of the conditions established by the experimenter, food is not
available unless some member of a class of actions defined by the
experimenter has been performed: one that depresses the lever to the point
of switch-closure. If some one of those acts is immediately followed by the
reinforcer, it tends to be repeated. Is the rat attempting to control the
position of the lever, or control some other perceptions in a way that
results in the repetition of a particular behavioral act? In other words,
has the rat learned that it must depress the lever, or that it must grab
this moveable thing between the teeth and shake it?

And what do we mean when we say that a given act has been "repeated"? At
some level, the activity observed will be "the same" and can be described in
phrases like "pressing the lever with the right paw." At another, more
detailed level, the specific movements comprising that activity will be seen
to vary, compensating for various exigencies along the way. If the rat is
not near the lever, it will approach it, and it will do so from different
starting positions that require it to move in different directions, using
different patterns of leg movement. It is said that "approaching the lever"
is reinforced, because the lever cannot be depressed, and thus food made
available, unless the rat is first positioned near the lever.

Phrases like "approaching the lever" and "pressing the lever" describe what
must be _done_ but do not indicate how this is to be accomplished.
Reinforcement theory states that acts described in this way are what get
selected for repetition ("reinforced") when a contingency between them and a
reinforcer is established; such acts are the operants of operant
conditioning. Control theory provides the missing mechanism and shows that
what gets "selected" is a set of perceptions to be controlled. Once the
mechanism has been constructed, it is capable of producing variations in
behavior that automatically compensate for disturbances. But first the
animal must "know what to do" to achieve that compensation. In the operant
experiment the "knowing what to do" [control-system structure] at the level
of locomotion and body positioning is already there, having been provided by
genetically-determined structure in interaction with early experience. The
"knowing what to do" at the higher level implied by phrases like "press the
lever" must be learned in the course of the experiment. When you get down
to the bottom of it, what reinforcement theory says is that determining
"what to do" involves an evolution-like process of variation in behavior and
selection of those variants which have consequences of current importance to
the animal. An interesting problem (from my point of view) is how to
construct a structural model that will behave in this way, as _described_ by
reinforcement theory, but which will establish the necessary control
structure so as to produce a disturbance-resistant observable pattern of
behavior that can be described as, e.g., "pressing the lever with the left paw."
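
One toy version of such a model (a sketch under stated assumptions, not a
finished proposal: the "E. coli"-style reorganization scheme is borrowed
from the PCT literature, and every name and constant below is an
assumption made for the example) varies a single control parameter at
random and keeps the variation only when it reduces error:

# Toy "reorganization" sketch: blind variation of a control parameter,
# with variants retained only when they reduce accumulated error.
import random

random.seed(2)
reference = 10.0

def total_error(weight, trials=50):
    """Accumulated absolute error of a one-loop controller whose output is
    driven by the error with the given weight; a disturbance appears
    halfway through the episode."""
    output, err_sum = 0.0, 0.0
    for t in range(trials):
        disturbance = 3.0 if t >= 25 else 0.0
        cv = output + disturbance
        error = reference - cv
        output += weight * error
        err_sum += abs(error)
    return err_sum

weight = 0.0                     # the parameter being "reorganized"
best = total_error(weight)
step_size = 0.3
direction = random.choice([-1, 1])

for _ in range(100):
    trial = weight + direction * step_size
    err = total_error(trial)
    if err < best:               # improvement: keep the change, keep going
        weight, best = trial, err
    else:                        # no improvement: "tumble" to a new random direction
        direction = random.choice([-1, 1])

print(f"reorganized weight: {weight:.2f}  residual error: {best:.1f}")

Described from outside, behavior varies and the variants that reduce
error are retained, which reads like reinforcement; described from
inside, what has been built is a control loop whose parameter now
resists disturbances.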

    As I understand the concept of the operant, this is not its intent. The
    main idea is that the animal is producing a lot of effects in its
    environment, and some of the different acts happen, more or less at
    random, to have a similar consequence. So when the common consequence is
    made to produce a reinforcer, the operant is reinforced; it is made to
    be more likely to happen. However, if something prevented that
    particular act from succeeding, the animal would NOT immediately switch
    to another degree, direction, or kind of behavior that has the same
    consequence. If that happened, we would see the operant as creating an
    _intended_ consequence, a concept that has long been rejected by
    behaviorists. We would see the lever depression as a controlled
    variable.

Yes, but I think we are free to redefine the operant in light of control
theory. Reinforcement theory is a descriptive theory that relates
observable behavior to observable conditions, past and present, in the
environment of the animal, through a process of blind variation and
selection. The theory is more elaborate than we have thus far described,
but these elaborations are simply more descriptions of conditions under
which certain regularities are observed (for example, effect of "delay" of
reinforcement). These regularities can guide the construction of an
adequate structural model; any successful model must produce them. If the
molecular theory of heat had not predicted Boyle's Law, that would have been
grounds for rejecting the theory.

    So if as you say reinforcement is an empirical fact, it is an empirical
    fact seen through a narrow-band philosophical filter. It isn't just the
    apparent fact of reinforcement that's involved here; it's the
    interpretation of the supporting observations that is critical.

I think you've missed something crucial here. The empirical _fact_ of
reinforcement is the set of observations I've described, which show that
certain acts, when followed by certain consequences, are likely to be
repeated. What you are calling "reinforcement" is not this empirical fact
but a "theory of reinforcement" -- an interpretation of the empirical fact
of reinforcement. As too often is the case, we are talking about different
things.

We can keep the empirical _fact_ of reinforcement (even if we discard the
name) and the descriptive model (if to do so serves a useful purpose), but
junk the interpretation in favor of a structural model based on
control-system theory.

Regards,

Bruce