Reinforcement Theory

[From Rick Marken (950623.2020)]

···

------------------------
Mark Abrams -- I am still working on the "Action Science" review -- at least,
in my heart. I don't have your phone number so, if you are still going to the
CSG meeting, please give me a buzz so we can nail down the plans.
-----------------------

Bruce Abbott (950622.1740 EST) --

Rick, I don't understand your new R. Coli demo, or rather, I don't
understand why your "subject" should show the behavior you
diagrammed (950621.1410).

I don't either. I presume it is because this behavior was selected by its
consequences. But I'm having trouble seeing how this might have happened. What
do you make of it?

Me:

I think it is significant that reinforcement theorists must develop a
new version of their theory each time a new behavioral situation is
described...I have never heard reinforcement described as the change
from "no food" to"food".

Bruce Abbott (950622.1920 EST) --

What nonsense! ... What changes is not the theory but the model
which applies the theory to the situation in question.

I seem to recall that a reinforcer was defined as a consequence of responding
that increases the probability of the response that produces it. There is
nothing in that definition about the consequence being a _change_ from what
it was before the response to what it is after.

In E. coli I empirically determined the reinforcing value of each consequence
of a response, based on the notion that the reinforcingness of a consequence
can be measured by the probability, after the consequence, of the response
that produced it. If the consequence of a response was movement away from the
target (regardless of what the movement relative to the target had been before
the response), the probability of a response after this consequence was high
(the interval between responses was brief). If the consequence of a response
was movement toward the target (again, regardless of what the movement
relative to the target had been before the press), the probability of a
response after this consequence was low (the interval between responses was
long).
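
For concreteness, here is a minimal sketch of that measurement scheme in
Python (a toy simulation with made-up timing parameters, not the demo's
actual code): each press re-randomizes the direction of travel, the
consequence of a press is the new direction relative to the target, and
reinforcingness is read off the mean interval between each kind of
consequence and the next response.

import random

# Toy version of the measurement described above; the mean waits are
# assumed numbers, not the demo's. Each press re-randomizes the direction
# of travel, so the consequence of a press is simply the new direction.
def simulate(n_presses=10000):
    intervals = {"toward": [], "away": []}
    for _ in range(n_presses):
        consequence = "toward" if random.random() < 0.5 else "away"
        # Assumed strategy: respond again quickly after an "away"
        # consequence, slowly after a "toward" one.
        mean_wait = 3.0 if consequence == "toward" else 0.5
        intervals[consequence].append(random.expovariate(1.0 / mean_wait))
    for c, iv in intervals.items():
        print(f"after '{c}': mean interval = {sum(iv) / len(iv):.2f} s")

simulate()

The short intervals after "away" consequences are a high post-consequence
response probability, so by the definition above "away" counts as
reinforcing, no matter what the movement was before the press.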

your understanding of reinforcement theory is about 25 years out of
date.

I'm willing to believe that and that's why I accepted your explanation.

In developing a control model, for example, you must first decide
what the controlled variable is.

This is not true. When we develop a control model we know what the controlled
variable is; otherwise there is no reason to develop a control model in the
first place.

Your guess might be wrong, and you will then have to try another.
Does this mean that control theory is wrong?

No. It means that the person trying to apply control theory doesn't
understand what the theory is about. Control theory is about control;
you can't even sensibly talk about control until you know what variable(s)
is (are) being controlled.

Me:

Since [reinforcement] theory provides no independent means of
measuring the reinforcingness or punishingness of the environment,
we are free to ascribe these changing characteristics to the environment
as necessary to explain behavior.

Bruce:

Based as it is on a misconception of how reinforcers and punishers are
defined, and of what we are doing with these models, this conclusion
is moot. How about "since the theory (PCT) provides no independent
means of measuring the controlled variable, we are free to ascribe
these changing characteristics to the environment as necessary to
explain behavior." Your statement regarding reinforcement and
punishment is as true as my paraphrase, and for the same reason. It
sure sounds good.

Actually, as you well know, there IS an independent means of measuring the
controlled variable; it is The Test for the controlled variable. The Test
is of critical importance to PCT. If you don't know that a variable is under
control or what that variable is, then you have no reason to try to model
behavior using PCT.
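
For anyone who hasn't seen The Test in action, here is a toy version in
Python (gain, slowing, and the disturbance are all assumed numbers; this
simulates the logic, not a lab protocol): apply a disturbance to the
hypothesized controlled variable and compare its observed variation with
the variation the disturbance alone would have produced.

import random

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Stability ratio: observed variance of the hypothesized controlled
# variable divided by the variance the disturbance alone would produce.
def stability_ratio(controlled, steps=5000, gain=10.0, slowing=0.01):
    ref = output = d = 0.0
    disturbances, observed = [], []
    for _ in range(steps):
        d += random.gauss(0.0, 0.1)   # slowly drifting disturbance
        qi = output + d               # input quantity = output + disturbance
        if controlled:                # leaky-integrator control loop
            output += slowing * (gain * (ref - qi) - output)
        disturbances.append(d)
        observed.append(qi)
    return variance(observed) / variance(disturbances)

print("controlled:  ", round(stability_ratio(True), 3))   # << 1
print("uncontrolled:", round(stability_ratio(False), 3))  # ~ 1

A ratio far below 1 means the variable is being protected against the
disturbance; a ratio near 1 means you guessed the wrong variable and
should define another and Test again.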

Actually, I think that the "old fashioned" definition of a reinforcement (as a
consequence that increases the probability of the response that produced it)
does provide an independent means of measuring the "reinforcingness" of
environmental events. But, as you say, my understanding of reinforcement
theory is about 25 years out of date (well, 15 years; I did read Skinner's
1981 Science article) so maybe now there is no independent means of measuring
reinforcingness at all.

Best

Rick

[From Bruce Abbott (950623.0715 EST)]

Rick Marken (950623.2020)

Bruce Abbott (950622.1740 EST)

Rick, I don't understand your new R. Coli demo, or rather, I don't
understand why your "subject" should show the behavior you
diagrammed (950621.1410).

I don't either. I presume it is because this behavior was selected by its
consequences. But I'm having trouble seeing how this might have happened. What
do you make of it?

It's a control system that doesn't control? Obviously I'm missing something
here. Help me out.

I seem to recall that a reinforcer was defined as a consequence of responding
that increases the probability of the response that produces it. There is
nothing in that definition about the consequence being a _change_ from what
it was before the response to what it is after.

I can't imagine a "consequence" without a change, can you? If the
"consequence" of a response is that nothing changes, then the response has
no consequence. Thus if your cursor is drifting to the left, away from the target,
you press the button, and nothing changes, the response has had no
consequence and has been neither punished nor reinforced. (Note: I am
ignoring possible "response cost" here, which I assume is slight.)

In developing a control model, for example, you must first decide
what the controlled variable is.

This is not true. When we develop a control model we know what the controlled
variable is; otherwise there is no reason to develop a control model in the
first place.

That would depend on the available information. For example, I seem to
recall your assisting me in developing TWO models of control in the
inverted-t illusion situation, one based on a difference between two
perceived lengths and the other on their ratio. Seems to me we didn't know
what the controlled variable was until we developed the models and then
determined empirically which one generated a better fit to the data.
Furthermore, even those two possibilities were only (informed) guesses. The
actual variable being controlled might be something different yet, like a
weighted combination of difference and length. So don't tell me you always
"know" what the controlled variable is when you develop a control model. It
isn't true.
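
(For illustration, a hypothetical Python sketch of that comparison step,
with a synthetic noise-free "subject" standing in for the real data and
every parameter invented: fit both candidate models to the same behavior
and keep the one with the smaller error.)

import random

def control_run(cv, reference, horizontals, gain=10.0, slowing=0.02):
    # One-level control loop; cv maps (vertical, horizontal) lengths to
    # a perception, and the output adjusts the vertical line.
    output, trace = 0.0, []
    for h in horizontals:
        perception = cv(1.0 + output, h)
        output += slowing * (gain * (reference - perception) - output)
        trace.append(output)
    return trace

def rms(a, b):
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

difference = lambda v, h: v - h
ratio = lambda v, h: v / h

horizontals = [1.0 + 0.3 * random.random() for _ in range(500)]
subject = control_run(ratio, 1.0, horizontals)  # "subject" controls the ratio

print("difference model RMS:",
      round(rms(control_run(difference, 0.0, horizontals), subject), 4))
print("ratio model RMS:     ",
      round(rms(control_run(ratio, 1.0, horizontals), subject), 4))
# The ratio model fits exactly (zero only because the synthetic subject
# is noise-free); with real data both errors are nonzero and you keep
# the smaller one -- and, as noted above, the true controlled variable
# may still be something else.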

Your guess might be wrong, and you will then have to try another.
Does this mean that control theory is wrong?

No. It means that the person trying to apply control theory doesn't
understand what the theory is about. Control theory is about control;
you can't even sensibly talk about control until you know what variable(s)
is (are) being controlled.

I see. So now you're telling me that you developed two models of control in
the inverted-t task because you didn't understand what control theory is
about. This is getting interesting...

Actually, as you well know, there IS an independent means of measuring the
controlled variable; it is The Test for the controlled variable.

Yes, and there is a similar independent means of determining whether a
consequence is a reinforcer, punisher, or neutral event under given
conditions. Thorndike described it in 1898.

Regards,

Bruce

[From Peter Burke (950623 9:30 PDT)]

[From Rick Marken (950623.2020)]

I seem to recall that a reinforcer was defined as a consequence of responding
that increases the probability of the response that produces it.

If a reinforcement is defined as a consequence of responding that
decreases the error between the perception and the reference signal,
then there would seem to be little difference between PCT and
reinforcement theories!! ;-)

···

-------------------------------------------------------------------
Peter J. Burke Phone: 509/332-0824
Sociology Fax: 509/335-6419
Washington State University
Pullman, WA 99164-4020 E-mail: burkep@unicorn.it.wsu.edu
-------------------------------------------------------------------

[Bill Leach 950625.10:44 U.S. Eastern Time Zone]

[From Bruce Abbott (950623.0715 EST)]

I REALLY don't want to "get between you and Rick"... since the EPA banned
asbestos, and it's quite hot around here already...

I can't imagine a "consequence" without a change, can you? If the ...

I realize that this is bordering on "word games" but yes, I can. If I
press the key and nothing happens, that is a "consequence". The
"consequence" is that I did something and "nothing" happened.

You may view Rick's logic for the program as faulty and possibly you are
right. However, I don't think that it is "strange" to consider that a
purposeful being that has "set out to do something":

      explicitly performs an action and
      observes that the action fails to produce desired results

would be expected to conclude that the consequences of the action were
unfavorable.

In one sense, I do agree with you that "consequence" can not exist
without change. Where I disagree is that I believe that it is illogical
to presume that only a change in _result_ is a consequence. The
"initiating action" is also a _change_.

If I push on a door and it does not open, you are claiming that there
were no consequences to my action -- that is preposterous!

advanced knowledge of the CEV

You "hooked" him pretty good on that one Bruce!

EAB does acknowledge that "reinforcement", "punishment" and "neutral
events" are specific to the individual experimental subject under the
specific conditions of the experiment, yes? That is, EAB researchers
recognize that they can not accurately "generalize" the experimental
results?

For example, one of the concepts of EAB (that I might well be rather
confused about) is the idea that a "reinforcer" causes repetition of the
behaviour that "resulted" in the reinforcer.

Nearly anyone more than 8 years old would see that the above is just
plain not true (including EAB researchers of course).

So there is a concept of "satiation"(?) added to account for some of the
observed limits on the effect of reinforcers upon behaviour. Is not
"satiation" an outright admission that the "reinforcer" is only related
in some actually unknown way to whatever process is being observed? Is
not satiation admitted to be "somehow" internally controlled by the
subject? Specifically, does not EAB recognize that their own data proves
that "reinforcers" can not be the cause of behaviour?

It is true, is it not, that in an "action gets you a food pellet" type
experiment the delivery of four pellets vice one pellet per action
results in fewer actions? What happens if each action produces a
reinforcer that is greater than that amount that the subject would eat
for the duration of the experiment?

It seems to me that this presents a significant challenge to the basic
principle of EAB. Every "modification" points to the idea that the
external "thing" IS NOT determining the subject's actions but rather what
the subject wants determines the subject's actions. Thus, the
"reinforcer" is incidental - if the subject wants some amount of the
reinforcer then behaviour will adjust (if possible) to get just that
amount.
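
The arithmetic behind that expectation is trivial on the control view; a
sketch (the desired amount is a made-up number):

# If the subject is controlling its intake, the response count just
# follows from division (desired_pellets is an invented figure).
desired_pellets = 60
for pellets_per_press in (1, 2, 4):
    presses = desired_pellets / pellets_per_press
    print(f"{pellets_per_press} pellet(s)/press -> {presses:.0f} presses")
# Bigger "reinforcer" -> fewer responses, the inverse of the naive
# reinforcement prediction. And if one press delivered more than the
# subject would eat all session, a single press (or none) would do.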

The matter is so profoundly different when compared to PCT research as to
be incomparable (even when it appears otherwise). PCT asserts that, with
the single exception of the application of overwhelming force, the ONLY
thing that is important is the wants, goals, or references of the
subject. Any research that does not explicitly search for, test for, and
describe the postulated reference(s) obtains data that, at best, can NOT
be reliably generalized and, at worst, is completely wrong as understood.

-bill

[From Chris Cherpas (970815.1803 PT)]

[re Bill Powers (970815.1129 MDT)/etc.]

As one who was trained in EAB and is now trying to learn PCT
and apply it to education, I'd first like to tell a story about EAB
(skip it if you like), and some observations about PCT vis a vis EAB
(of course, skip that too!).

The Story:

Much of EAB, to me, represents a kind of "meantime" strategy
for explaining behavior. The implicit philosophy (variably
made explicit by B.F. Skinner) is this: since the "causes" of what
people experience as behavior have not been determined neurologically,
one is safest to experimentally isolate single organism/environment
functional relations which can be accumulated and "systematized" with a
minimum of theory -- until the discovery of the physiological mechanisms
catches up (i.e., it's what EAB does in the "meantime").

Skinner went on to say that what EAB produces will not be invalidated
by neuroscience, and, in fact, the relationship between the micro-mechanisms
of within-organism biology and EAB's whole-organism/environment relations may
forever require bridges made of probabilities, not deterministic causal links.
Not wanting to slide into a lazy kind of wait-and-see reductionism, and perhaps
drawing from J.R. Kantor's "interbehaviorism," EAB-ers have maintained a fairly
broad contextualist stance, so that the occasional forays into mechanistic
(immediate cause) theorizing are always balanced by a defining respect for what
can be observed in the onion skins of context surrounding the organism,
including the history of environment-organism interactions, as well as in the
immediate environment.

Skinner took a step away from S-R psychology by defining operants, which
describe variable means to common ends, but EAB has repeatedly invented ways
of sliding back -- only to have to reinvent a way back out. The best EABers
knew that S-R was not right, but still needed to find causes. Selection
became a way of talking about a "different kind of causation." Perhaps
the synthesis of evolutionary biology and genetics would be recapitulated
in the field of behavior. Discriminative stimuli only "set the occasion"
for behavior, even as a sub-field called "stimulus control" by its very
name betrayed the commitment to avoid S-R explanations.

None of this terminological hooey would have really been of much consequence,
but the problem I kept facing was that not much ever seemed to happen in
the area of aggregating the typical operants into more organized systems
at the "big" end, nor decomposing operants into the components which the
typical lab operants might have organized at the small end. In behavioralese,
I might say that while I could do operant analysis, I couldn't do much
"repertoire analysis." In short, EAB lacked what HPCT at least promises
to deliver. It's not as if there was _nothing_ either. I did my graduate
experimental work with concurrent operants and explored the matching law just
as far as I could. Some EABers, especially in the earlier days, like
Jack Findley, and later, Murray Sidman, tried to put some organization
into the "bag of operants" I felt I had inherited from the EAB program,
but nobody seemed to be able to keep extending the work without looking
like another pie-in-the-sky mentalist.

Despite Skinner's concern that the continuous nature of behavior could not
be adequately described by parsing it into stimuli and responses (e.g.,
see Science and Human Behavior), the attempts to work with "continuous
repertoires" (e.g., Holland's) and the isolated experiments with
micro-level operants just didn't add up for me (and I doubt anybody
else). The "illusion of behavior" which PCT reveals is especially
meaningful to me, because it addresses the problems of continuity
and organization directly.

The Observations:

1. The prototype of a disturbance in EAB is the _closing_ of the food
hopper when the programmed duration for food access "times out" (sorry,
no pellets, I worked with pigeons). The bird would do anything to
get back to working on bringing down the state of deprivation (or however one
may eventually define the controlled variable and related reference),
but it can't. Turning _off_ the discriminative stimulus is another
disturbance which you don't get to see the organism oppose very
successfully in most operant preparations. Of course, it's easier
to talk in terms of disturbances when electric shock is involved,
for which the reference is going to be zero in most cases.

2. Jack Michael, Skinner's long-time interpreter, put together a concept
called an "establishing operation" which brings current purpose back
into the arena of current behavior, though the internal reference is not
the focus -- only the "operation" (disturbance) which makes it clear that
something is now important for the organism to perceive. Contingencies don't
cause behavior, they just constrain the possibilities for current action
and/or eventual reorganization. But the "EO" (establishing operation),
I believe, is the crack in the structure that could open the way.

3. Non-contingent delivery of food has been done lots. In a "virtual"
concurrent operant set-up, you establish a baseline of getting food
contingently and vary the frequency of the non-contingent access to food,
and you see the corresponding (inverse) change in rate of key-pecking (again, the
matching law). EAB would not predict that the rates of pecking would go
up.
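
One common way to formalize that inverse relation is Herrnstein's
hyperbola with the non-contingent food rate folded into the background
term; a sketch (k, Re, and all the rates are illustrative numbers only):

# B = k * R / (R + Re + R_free): response rate falls as free food rises.
k, Re = 100.0, 10.0   # asymptotic peck rate; "extraneous" reinforcement
R = 40.0              # response-produced food deliveries per hour
for R_free in (0, 20, 40, 80):   # non-contingent deliveries per hour
    B = k * R / (R + Re + R_free)
    print(f"free food {R_free:2}/hr -> peck rate {B:5.1f}")
# Peck rate declines as non-contingent food goes up, matching the
# observation above; nothing in the bare "reinforcement strengthens
# behavior" principle predicts that direction.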

4..., actually, I gotta go now...

Best,
cc

What does EAB stand for?

[From Bill Powers (970816.0348 MDT)]

Note to Ellery: EAB means "Experimental Analysis of Behavior," the term
used by radical behaviorists for what they do.

Chris Cherpas (970815.1803 PT)--

As one who was trained in EAB and is now trying to learn PCT
and apply it to education, I'd first like to tell a story about EAB
(skip it if you like), and some observations about PCT vis a vis EAB
(of course, skip that too!).

Wouldn't skip it for the world, Chris -- what an excellent post.

Skinner went on to say that what EAB produces will not be invalidated
by neuroscience, and, in fact, the relationship between the micro-mechanisms
of within-organism biology and EAB's whole-organism/environment relations may
forever require bridges made of probabilities, not deterministic causal links.

If you read carefully what Skinner said about this, you will see that he
expected future neurology merely to discover the internal connections
between inputs and outputs (whether probabilistic or not). Only if you
assume such connections does it make any sense to suppose that by measuring
observable inputs and outputs you can adequately describe what the nervous
system does.

Skinner took a step away from S-R psychology by defining operants, which
describe variable means to common ends, but EAB has repeatedly invented ways
of sliding back -- only to have to reinvent a way back out.

This is one of the most frustrating things about Skinner's near-miss. He
saw that there were many means to a given end -- but he rejected means-ends
concepts, because they involve the organism's intentions. And most
frustrating, he never saw that each different behavior in an "operant" was
accurately adjusted so that in the current environment, the same result
occurred.

He wasn't the only one who missed this point; look at Brunswik's "lens
model," cited in B:CP. One of the prevailing ideas of the time was that
behavior is randomly variable, but because many different behaviors could
produce the same result, all the different behaviors had the same effect.
There was some vague idea that the statistical average of all the varying
behaviors was what produced the consistent results. Nobody seemed to see
that the variations in behavior were _required_ if the same result was to
occur -- not just any old variations would do. Behavior had to change
specifically so as to counteract the effect of disturbances and changes in
environmental properties. Of course if anyone had tumbled to that
relationship, PCT would have been here long ago.
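
The point is easy to see in a minimal control loop (a Python sketch;
gain, slowing, and the disturbance are assumed): for the result to stay
constant, the output has to vary as the near mirror image of the
disturbance, and no other pattern of variation will do.

import math

gain, slowing = 20.0, 0.05
output, ref = 0.0, 0.0
for t in range(201):
    d = math.sin(t / 15.0)           # a smooth disturbance
    qi = output + d                  # result = output + disturbance
    output += slowing * (gain * (ref - qi) - output)
    if t % 50 == 0:
        print(f"t={t:3}  disturbance={d:+.2f}  "
              f"output={output:+.2f}  result={qi:+.2f}")
# The output column tracks -disturbance while the result stays near zero;
# random variation in output, however wide, would not keep it there.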

Some EABers, especially in the earlier days, like
Jack Findley, and later, Murray Sidman, tried to put some organization
into the "bag of operants" I felt I had inherited from the EAB program,
but nobody seemed to be able to keep extending the work without looking
like another pie-in-the-sky mentalist.

This was the crippling consequence of deciding that only observables could
be used in explaining behavior. If you have to avoid looking like a
mentalist, you have to give up trying to model what goes on inside the
organism. And as we know, that means giving up the most powerful tool of
science. Just ask any physicist (well, not ANY physicist). Ask Watson and
Crick.

The Observations:

1. The prototype of a disturbance in EAB is the _closing_ of the food
hopper when the programmed duration for food access "times out" (sorry,
no pellets, I worked with pigeons). The bird would do anything to
get back to working on bringing down the state of deprivation (or however one
may eventually define the controlled variable and related reference),
but it can't. Turning _off_ the discriminative stimulus is another
disturbance which you don't get to see the organism oppose very
successfully in most operant preparations. Of course, it's easier
to talk in terms of disturbances when electric shock is involved,
for which the reference is going to be zero in most cases.

Nice observation. In fact the basic arrangement in an operant conditioning
experiment removes most obvious possibilities for control by the organism.
When Bruce and I started looking at the data in the beginning of the
experiments, it was clear that something was haywire, control-wise. We
realized that the problem was that Bruce was trying to keep the animals'
weights at a fixed level by manipulating the total food supply, while the
animals were trying to control the same weight (or something related to
it). The experimenter was in direct conflict with the animal. Their
bar-pressing was going all over the place. When we allowed the animals to
control whatever they were controlling by varying their food intake, the
data started looking better.
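
Here is a minimal sketch of that conflict (all numbers invented): two
control systems act on the same quantity with different references, their
outputs grow in opposition, and the shared variable ends up parked
between the two references, controlled by neither.

# Two leaky-integrator controllers fighting over one variable.
gain, slowing = 20.0, 0.02
ref_exp, ref_animal = 300.0, 350.0   # target weights in grams (invented)
out_exp = out_animal = 0.0
for _ in range(500):
    weight = 325.0 + out_exp + out_animal   # both act on the same variable
    out_exp += slowing * (gain * (ref_exp - weight) - out_exp)
    out_animal += slowing * (gain * (ref_animal - weight) - out_animal)
print(f"weight={weight:.1f}  experimenter output={out_exp:+.1f}  "
      f"animal output={out_animal:+.1f}")
# With equal gains the weight settles midway between the references while
# both outputs are driven to large opposing values; neither system gets
# what it wants, and any added disturbance whipsaws both outputs.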

2. Jack Michael, Skinner's long-time interpreter, put together a concept
called an "establishing operation" which brings current purpose back
into the arena of current behavior, though the internal reference is not
the focus -- only the "operation" (disturbance) which makes it clear that
something is now important for the organism to perceive.

Bruce A. brought this up earlier. The "establishing operation" is, of
course, deprivation of the thing that is to be used as a reinforcer. The
EO, in other words, is a great big disturbance of something the animal
wants or needs to control, as you say. In the typically narrow view of an
experiment, where nothing outside the temporal or physical boundaries of
the formal setup counts, it is naturally overlooked that the disturbed
variable is something that the organism was previously perceiving and
controlling. It's not that this variable is NOW important to perceive; it
always was important, but now there is a huge error that was formerly
prevented from happening. The animals are always getting food inside
themselves by _some_ means, when they're allowed.

Contingencies don't
cause behavior, they just constrain the possibilities for current action
and/or eventual reorganization. But the "EO" (establishing operation),
I believe, is the crack in the structure that could open the way.

Say it again, Chris, in case Bruce A. is listening. The "crack in the
structure" is not what he is looking for, but it's there (in multiple copies).

3. Non-contingent delivery of food has been done lots. In a "virtual"
concurrent operant set-up, you establish a baseline of getting food
contingently and vary the frequency of the non-contingent access to food,
and you see the corresponding (inverse) change in rate of key-pecking (again, the
matching law). EAB would not predict that the rates of pecking would go
up.

Yay! So the experiment I had in mind has been done! It's a weird thing, how
people can sit and stare at data that absolutely contradict the basic
principles of reinforcement, and simply not notice the contradiction. Even
looking at contradictory data from experiments (like Motheral's and many
others that show behavior decreasing as reinforcement increases) the
dedicated EABer can go right on saying that (1) reinforcement creates and
maintains behavior, and (2) this is simply a report on what is observed.
The basic principles of reinforcement have become so solidly entrenched
that mere data can no longer invalidate them.

Thanks for a lovely post, from my point of view.

Best,

Bill P.