PCT and EAB

[From Bill Powers (950626.0630 MDT)]

Bill Leach (950625.13:07 U.S. Eastern Time Zone) --

You bring up a number of interesting points about reinforcement theory
and PCT. One recurrent theme is that the environment does have some
controlling effects on our behavior -- but then again, it doesn't!

     Within limits, it is not unreasonable to conclude that the
     "environment makes us 'behave' the way that we do behave."

     That some environmental factors have a consistent, testable cause-
     effect relationship to living beings is beyond dispute (at least
     with anyone even pretending a scientific approach). For example,
     placing a human (or any mammal) in a normal atmosphere except that
     it is oxygen free for a sufficient length of time will result in
     the organism's death.

But in another post:

     If my goal is to drink a glass of milk, there are quite literally
     thousands of different ways in which that goal might be satisfied
     including the many different actions that would occur should there
     not be any milk in the house or what exists is spoiled, etc.

One of the main points of the PCT model is to explain how it is that
there are quite literally thousands of different ways in which a given
goal might be satisfied, yet somehow an organism finds just the way that
will do the trick -- most of the time. This goes clear back to the basic
puzzle about behavior that William James noticed: variable behavior
producing consistent ends.

Probably the greatest hole in reinforcement theory was patched by the
most casual plastering job. Skinner noticed way back in the beginning
that what we casually describe as "the same behavior as before" is most
often a very different behavior from the one we saw before. The actions
of organisms, in the sense of the outputs they produce with muscles and
limbs, do not actually repeat from one instance of behavior to another.
What repeats is some particular consequence of acting. Rather than
asking how this can possibly be, Skinner simply defined "the operant" as
any class of actions that has a particular effect.

At that point a serious question for a materialist explanation of
behavior arises: how is it that the same effect can be produced by
different actions? If Skinner had been inclined to take a physical-
science approach to behavior, this would have been a serious question
indeed. And a physical-science answer would have been of little comfort
to a behaviorist. The only way in which a consistent result can come
from variable actions is for the environment to vary in just the way
needed to make up for the variations in the actions.

That, of course, is putting it the wrong way around. The right way
starts by asking what the same consequence would have looked like
without the organism's actions. Where would the milk that you drank have
been if you had not acted? It might have been in a thermos jug, in a
carton in the refrigerator, in a grocery bag waiting to be unpacked, in
a store, or in a cow. Yet no matter where it is, you end up getting a
drink of milk. The only POSSIBLE explanation is that under each
different environmental circumstance, you produced just the physical
action required for the milk to end up in a glass and the glass to be
emptied down your gullet. Your action does not produce the final effect.
All it does is make up the difference between a particular final effect
and the effect that would have happened without the action. If someone
goes and gets the milk and puts it into a glass and tips the glass into
your mouth, the only action required from you to get a drink of milk is
to open your mouth and swallow.

The subject matter of PCT lies in the very part of behavior that Skinner
dismissed as basically unexplainable. When he said that behavior is
"emitted," he ceased to talk about the physical actions of the organism
and began talking about _physical consequences_ of those actions. What
is "emitted" is a change in the physical observable world due to a
change in the organism's action coupled with any change in the physical
world capable of altering that same physical variable. What we commonly
call behavior is really a resultant: the outcome of combining forces
created by an organism with forces that originate elsewhere.

When we see it this way, we realize how strange it is that an organism
can actually appear to emit the same physical consequence of action over
and over. We can't just say, as Skinner did, "Oh, well, the organism
just did one of the many things that could have had the same physical
effect." It's not as simple as that. In any one physical situation,
there is only ONE action the organism can take that will have a
particular physical effect. If the local environment changes in any way,
there is still only ONE action the organism can take to create the same
effect as before, but now it is a DIFFERENT action.

What's really going on is that when the organism emits a particular
physical effect, its action is PRECISELY THE ONLY ACTION THAT COULD HAVE
PRODUCED THAT EFFECT AT THAT TIME. Given the same state of the
environment, any other action would have resulted in a DIFFERENT effect.
As Skinner noted, there are many actions that can have the same effect.
But what he failed to notice is that this is true only over a large
number of instances of the actions. In any one instance, it is not true
that any number of actions can have the same effect. Given the current
effects of the environment on a particular variable, for that variable
to be in or remain in a particular state requires ONE AND ONLY ONE
ACTION by the organism. It is that action which, when added to all other
influences acting at the same time, will produce that particular effect.

When this fact finally sinks in, the problem of explaining purposive
behavior returns with full force. We can no longer just wave vaguely at
the details and say that one of the actions that could have had the
observed effect must have occurred. We must account for the fact that
the observed effect is produced and produced again, each time by THE
ONLY ACTION THAT COULD HAVE PRODUCED IT AT THAT TIME.

If you think of the process of getting from one state of the environment
to another, there are of course many trajectories by which this can
happen even in a single instance of behavior. To understand the import
of the point I'm trying to make you have to see the world at each
instant, not over a series of instants. At any point during any action,
the state of any variable including all its time derivatives is
determined by the sum of all influences on it. If the trajectory is to
repeat, then the action of the organism must at all times be exactly
what is required to make up the difference between the trajectory that
would have occurred without the influence of the action and the
trajectory that is to be reproduced. And at every instant, there is only
ONE action that can do this.
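
To put a number on this, here is a throwaway sketch in Pascal (the
numbers and the simple additive environment are invented purely for
illustration). If the consequence q is the sum of the organism's action
o and all the other influences d acting at the same moment, then
reproducing a given q under a different d requires a different, and
unique, o:

program OneAction;
{ Sketch: if the consequence q is the sum of the organism's action o }
{ and all other influences d acting at the same moment, then         }
{ reproducing a given q fixes o exactly: o = q - d.  Different d,    }
{ different (and unique) o.  All numbers are arbitrary.              }
const
  q: real = 10.0;                 { the consequence to be reproduced }
  dvals: array[1..4] of real = (0.0, 2.5, -1.0, 7.3);
var
  i: integer;
  o: real;
begin
  for i := 1 to 4 do
  begin
    o := q - dvals[i];            { the ONLY action giving this q }
    writeln('other influences d = ', dvals[i]:6:1,
            '   required action o = ', o:6:1);
  end;
end.

Same q every time, but a different o every time -- and only that o.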

The basic problem in explaining behavior, therefore, is to explain how
it is that the one action that is necessary to produce a given result is
the one that is produced by the organism -- even when each instance of
"the same behavior" requires that a specific different action be
produced. This is the very question that Skinner's definition of "the
operant" bypassed.

···

--------------------------------------------
The other question raised by your comments concerns the role of the
natural environment in behavior. Does the environment "make" organisms do
anything? I claim it does not. But if it doesn't, what _is_ the part
played by the environment in behavior?

One thing the environment does is to determine the physical effects of
any action generated by the organism. The organism has, generally
speaking, no way of altering the physical laws that determine these
effects. If you exert a force on a free mass, it will accelerate; the
only way you can keep it from accelerating is to remove the force or add
a second cancelling force. There is no way any organism can learn to
apply a net force to a free mass without accelerating it.

A second thing the environment does is to have physiological and
physical effects on the bodies of organisms. A human organism will die
without oxygen, food, water, the right temperature range, and protection
against physical damage, to mention a few items. There is nothing an
organism can learn that will free it from such effects.

And a third thing the environment does is to stimulate sensory nerve
endings. When we speak about the environment causing behavior, the
causal path we usually mean involves the sensory systems. The question
really is, can the environment make the organism produce certain actions
by stimulating its sensory endings in the right way?

Let's not confuse this causal path with others. It's true that if the
environment is suddenly depleted of oxygen, a person would "respond" by
dying. So a requirement of continued life is a continued supply of
oxygen. But this fact does not make the organism seek oxygen. If the
organism did nothing to counteract the lack of oxygen, it would simply
die. The question is not whether it is necessary for life that the
environment be in a certain state; it is whether deviations from that
state can, in themselves, make an organism do anything in particular.

The answer is, of course, no. The only possible way in which lack of
oxygen could stimulate an organism to seek oxygen would be for the
organism to possess sensory equipment that could report on the state of
the oxygen supply, and internal organization that would convert a bad
report into an action that would have the effect of restoring the oxygen
supply. Whether such an action would occur does not depend on anything
in the environment. It depends on processes inside the organism. If
those processes are missing, the organism will die. If a lack of oxygen
could make an organism behave to restore the oxygen supply, then
organisms would never die from lack of oxygen. The environment would
make them do what is necessary.
------------------------------------------
     A search for "bigger" reinforcers or for the CCEV could look very
     much alike.

Suppose we say that food is a reinforcer: more food, more behavior that
produces food.

If we are looking at food as a CCEV, then we see it a little
differently. We say, the more food there is, the less behavior there is
to produce food. When the amount of food reaches a certain level, the
behavior that produces it disappears altogether.

Are you sure you want to say that searching for reinforcers looks very
much like searching for CCEV's?

It seems to me that this, not the points you cite, is the basic
irreconcilable difference between EAB and PCT. We are talking about a
FACTUAL difference. When the amount of reinforcement increases, does
behavior increase (EAB) or decrease (PCT)?

In recent posts, I have suggested, and Bruce Abbott has tentatively
agreed, that the apparent increase in behavior due to an increase in
reinforcement that is seen under some schedules of reinforcement is
actually due to the organism's turning to other kinds of behavior when
the reinforcement rate is low, and spending more time on a particular
behavior when the reinforcement rate associated with that behavior is
higher than for other behaviors. The implication is that if the organism
continued to be engaged in a particular reinforcement-producing behavior,
the relationship would be that an increase in reinforcement goes with a
decrease in behavior and vice versa -- the opposite of the basic
assumption of EAB, but consistent with PCT.
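
To make the factual difference concrete, here is a steady-state sketch
of the PCT side of it (the proportional output function, the gain, and
the "free food" term are assumptions made only for illustration, not a
fitted model):

program FoodCCEV;
{ Steady-state sketch of a control system "pressing for food":        }
{   intake = k*B + freefood       (schedule plus free deliveries)     }
{   B      = g*(ref - intake)     (organism function), B >= 0         }
{ Solving: B = g*(ref - freefood)/(1 + g*k).  Gain, schedule factor,  }
{ and reference are arbitrary numbers.                                }
const
  g   = 50.0;     { output gain                      }
  k   = 0.5;      { food produced per unit behavior  }
  ref = 10.0;     { reference level for food intake  }
var
  i: integer;
  freefood, B, intake: real;
begin
  writeln('free food   behavior   intake');
  for i := 0 to 6 do
  begin
    freefood := 2.0 * i;
    B := g * (ref - freefood) / (1.0 + g * k);
    if B < 0.0 then B := 0.0;     { behavior cannot go negative }
    intake := k * B + freefood;
    writeln(freefood:9:1, B:11:2, intake:9:2);
  end;
end.

The behavior column falls as the free-food column rises, and goes to
zero once the reference intake is met -- the CCEV picture, not the
reinforcement picture.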
-------------------------------------------
     PCT then takes the position that events in the environment are only
     the ultimate cause of behaviour overall because the subject would
     not even exist in the environment if some events had not occurred.

The cause of existence is different from the cause of behavior -- i.e.,
specific actions. Not so?
-----------------------------------------------------------------------
     If I have a reference for "picking up mount Shasta with my hands",
     the "environment" will "stop me" from doing such. To PCT the
     "consequences" of my attempt are neither a "reinforcer" nor a
     "punisher". PCT does recognize that somehow my "control failure
     experience" is "learned". PCT does NOT predict that I will
     immediately discontinue attempting to pick up mountains as a result
     of this failure or any other. THE MERE FACT OF THE EXISTENCE OF A
     CONTROL FAILURE IS NOT A GUARANTEE OF BEHAVIOURAL CHANGE!

I think you're treating this on too intellectual a plane. If you picture
actually trying to pick up Mount Shasta with your hands as a REAL
PROCESS, I think you would discover a lot of reasons for reorganizing.
Your hands would be raw and bleeding; you would have ruptured tendons
and torn muscles all over your body; you would be in a state of chronic
and complete exhaustion. I predict that you would very quickly give up
this attempt because of extreme errors in other control systems,
including intrinsic systems that are involved in reorganization. It's
not the environment that stops you from having this goal; it's the felt
consequences of trying to carry it out. If you didn't mind bleeding and
aching and gasping for breath, nothing else could keep you from going
right on with the attempt. The environment doesn't care how you feel,
and it won't stop you from even suicidal efforts.
-------------------------------------------------
Direct post:

     Have you tried removing the "world model" code and running his
     program?

Not yet, but I will.

     There are some restrictions as to the compatibility between the
     Amiga Pascal that I have and Turbo Pascal.

I understand that you have to take special precautions when using a
mouse as a control device on an Amiga to prevent time-sharing with other
processes from throwing the timing off. Are there other Amiga users out
there who can help with this?

     I believe that I will have to rewrite the display routine to get a
     useable display and there is something "funny" about keyboard
     control attempts.

Right. My experiences with the Amiga led me to give up on it. To read a
keystroke and display the character on the screen takes a whole lot of
foreplay that, in the end, I wasn't willing to go through.

If you can get our programs to run on an Amiga, there might be others
who would be interested in seeing how you do it.
-----------------------------------------------------------------------
Best,

Bill P.

[Bill Leach 950626.22:27 U.S. Eastern Time Zone]

[From Bill Powers (950626.0630 MDT)]

Probably the greatest hole in reinforcement theory was patched by the
most casual plastering job. Skinner noticed way back in the beginning
that what we casually describe as "the same behavior as before" is most
often a very different behavior from the one we saw before. The actions

Which points out to me that I really did not understand the EAB meaning
of "operant".

The subject matter of PCT lies in the very part of behavior that Skinner
dismissed as basically unexplainable. When he said that behavior is
"emitted," he ceased to talk about the physical actions of the organism
and began talking about _physical consequences_ of those actions.

and that can be soooooo close! (yet so far away)

and over. We can't just say, as Skinner did, "Oh, well, the organism
just did one of the many things that could have had the same physical
effect." It's not as simple as that. In any one physical situation,
        ...
What's really going on is that when the organism emits a particular
physical effect, its action is PRECISELY THE ONLY ACTION THAT COULD HAVE
PRODUCED THAT EFFECT AT THAT TIME. Given the same state of the
environment, any other action would have resulted in a DIFFERENT effect.
        ...
If you think of the process of getting from one state of the environment
to another, there are of course many trajectories by which this can
happen even in a single instance of behavior. To understand the import
of the point I'm trying to make you have to see the world at each
instant, not over a series of instants. At any point during any ...

I agree with this but it is doubtlessly related to where some of the
misunderstanding comes from.

An individual repeating the "same act" under the same environmental
circumstances will likely not employ exactly the same trajectory twice
even if the final result is exactly the same.

Thus, we are faced with the fact that in most behavioural situations a
slight change in the output will still provide acceptable results. The
set of correct actions and the set of incorrect actions are both
infinite sets of the same order; with any discrete measurement system,
however, the incorrect-action set has many orders of magnitude more
members.

I suspect that "emitted behaviour" thinking fails to recognize that the
theoretically infinite number of possibly correct trajectories masks the
fact that, relative to the full range of output in any one of the
available degrees of freedom, an almost infinitesimally tiny variation
is typically a "complete miss".

oxygen and environmental paths

No, I definitely was not trying to say that lack of oxygen was any sort
of control event. Of course many other "schools" of psychology don't see
it as a control event either but then that might be because they don't
recognize control at all.

Interestingly enough, there is no "natural" sensing of the oxygen
supply, and thus death in oxygen-depleted closed volumes remains a
recurring problem.

    A search for "bigger" reinforcers or for the CCEV could look very
    much alike.

Suppose we say that food is a reinforcer: more food, more behavior that
produces food.

I was being too kind! One very serious problem that I am having
personally with "reinforcers" is that there seem to be at least two
"kinds". When Bruce was talking about "rats preferring control" and
Dennis; "Johnny wants attention", these seem like one kind and an object
in the environment (ie: food) another kind.

Searching for a "reinforcer" related to "preferring control" should
definitely look a lot like a CCEV search. Looking for a "reinforcer"
similar to food (that is an object in the environment) would most
certainly NOT resemble a PCT type search.

As mentioned before, the "ultimate cause in the environment" was meant to
be a bit humorous. Had a few events over which I had no possible control
not occurred in the environment, I would not exist.

intellectual plane

Yes, I agree. Such attempts at examples are anything but easy. I could
add a few goals for no physical pain or injury in the attempt but the
whole idea is just too weak. I can't think of a physical example that
does not become too convoluted.

A miscellaneous comment:

We do have many years' experience with engineered open loop "control" in
well defined environments (the early "pattern" machine tools coming
immediately to mind). These worked as long as the disturbances were not
too great (and at that quite a bit of manual intervention was used).

OTOH, the _actions_ of a negative feedback machine tool are NOT exactly
"predictable". The actual amount of movement and force varies as
necessary to counteract disturbances.

Pascal and Amiga

I understand that you have to take special precautions when using a
mouse as a control device on an Amiga to prevent time-sharing with other
processes from throwing the timing off. Are there other Amiga users out
there who can help with this?

I don't expect any problems in this area. You were probably using an
Amiga A1000 or A2000 machine (7 MHz 68K). I am using either an A3000
(25 MHz 68030) or an A2000/040 (33 MHz 68040).

The real "problem" with mouse events is in knowing how to request them.
Intuition will "give" you enough IDCMP messages to keep your program busy
"for a week" if you let it (the problem is not latency between your
request and the arrival of the information but rather that, if you
request continuous reporting, your program may not be able to keep up).

    I believe that I will have to rewrite the display routine to get a
    useable display and there is something "funny" about keyboard
    control attempts.

Right. My experiences with the Amiga led me to give up on it. To read a
keystroke and display the character on the screen takes a whole lot of
foreplay that, in the end, I wasn't willing to go through.

Actually things are not that bad. C is complex because there are so many
structures that are required to 1) create a screen, 2) create one or more
windows and 3) set up message ports (if using system calls for screen and
window writes).

I think that my problems with the code written by both you and Bruce
exist in two areas. Your choices for the window color palette result in
"invisible" stuff, and the screen geometry might be different. The second
area might be a compatibility error, in that I believe that the code (as
compiled on my machine) misinterprets the keystrokes.

Hans's code is fine on both scores so I may just print out all three and
compare.

-bill

[From Bruce Abbott (970902.2000 EST)]

Bill Powers (970902.0757 MDT) --

This is a long post and I have several issues to take up in reply. It might
be best to take things one step at a time, reaching agreement (if possible)
on one issue before taking up the next. I'll start at the beginning, with
my statement on the phenomenon of reinforcement and your comments on that
statement.

Bruce Abbott (970901.1030 EST)

Added comments on post concerning relation of EAB terms to control theory.

Your paragraph:

1. The empirical phenomenon of reinforcement is demonstrated by showing that
   the operant occurs at a relatively high rate when the operant produces the
   reinforcer and at a relatively negligible rate when the operant does not
   produce the reinforcer.

My comment:

Since "the operant" is defined in terms of its effects on producing a
reinforcer, can we say that there is an operant when no contingency
connects behavior to the reinforcer? If I recall correctly, the operant is
that class of responses all of which produce the same effect on a
reinforcer. So when there is no contingency, and thus no effect of any
response on a reinforcer, there is no operant. There are just responses, or
behaviors of various kinds.
The output of the control system, when it produces a reinforcer, is an
operant. Otherwise it is just an output.

In the prototypical experiment of the rat in the operant chamber, one
defines an operant in terms of specific observable consequences of the
behavior, e.g., depression of the lever to the point of switch-closure.
This is the event that will be required during the contingency phase in
order to produce the reinforcer. The class of responses that produces this
event is the operant. The same operant can occur during extinction, when
the event does not produce the reinforcer.

The operant is an "end accomplished through variable means" and therefore is
the product of a control system, the intended act (although the initial
accomplishment of that act may be an unintended byproduct of action, as
during exploration.) But from the point of view of the control system
involving the reinforcer, the control system producing the operant can be
placed within a little box labeled "output."

Also, we need a word for "input quantity", because when the input quantity
is not affected by the output, it is not a reinforcer although it is still
an input quantity.

I find this confusing, because in PCT the term "input quantity" means
something else -- an input to the perceptual input function. The quantity
here in question is a variable affecting the CV. Has it been given a name
in the PCT model?

Finally, please note that this paragraph applies ONLY to phase 1 and
extinction.

The comparison is between rate of the operant when the control system has
been formed and the error between the CV and its reference is still
relatively high, versus the late phases of extinction, when the operant
is being
high, versus the late phases of extinction, when the operant is being
produced once again at baseline rates. This does not correspond to phase 1
versus extinction.

That's enough for now.

Regards,

Bruce

[From Bill Powers (970902.2105 MDT)]

Bruce Abbott (970902.2000 EST)--

In the prototypical experiment of the rat in the operant chamber, one
defines an operant in terms of specific observable consequences of the
behavior, e.g., depression of the lever to the point of switch-closure.
This is the event that will be required during the contingency phase in
order to produce the reinforcer. The class of responses that produces this
event is the operant. The same operant can occur during extinction, when
the event does not produce the reinforcer.

The operant is an "end accomplished through variable means" and therefore is
the product of a control system, the intended act (although the initial
accomplishment of that act may be an unintended byproduct of action, as
during exploration.) But from the point of view of the control system
involving the reinforcer, the control system producing the operant can be
placed within a little box labeled "output."

I thought the operant was a name for a _class_ of behaviors -- those that
produce a common result. I guess in this case the operant is whatever
behavior will close the contacts -- if that's it, my suggestion is
irrelevant. I suppose I was generalizing, to the natural case where many
different "contingencies" exist that will produce the reinforcer and the
animal might employ any of them.
I stand corrected.

Also, we need a word for "input quantity", because when the input quantity
is not affected by the output, it is not a reinforcer although it is still
an input quantity.

I find this confusing, because in PCT the term "input quantity" means
something else -- an input to the perceptual input function. The quantity
here in question is a variable affecting the CV. Has it been given a name
in the PCT model?

Yes, I see the problem. Our difficulty is that the CV as you've defined it
is inside the organism where it can't be directly observed. This would make
it part of the perceptual input function. The "input quantity" in PCT is
the _observable_ counterpart of the perceptual signal in the model -- in
other words, it's what you have been terming the reinforcer.

When we do the test, we apply disturbances to an observable variable that
affects the inputs of the organism. Of course this means we have to observe
it in the right way, through the right perceptual function. But in terms of
the observer's experiences, it appears to be an aspect of the environment.
For example, the observer might see not just one pellet, then another and
another, but a _rate of appearance_ of pellets. That would be the input
quantity, as far as the Test is concerned. To see if this input quantity is
controlled, the observer would do things tending to disturb the rate, such
as adding or subtracting pellets at various rates. If the behavior changed
so that the pellets provided by behavior increased or decreased in the
opposite direction to the disturbance, keeping the net rate approximately
constant, that phase of the test would be passed. So the rate of pellet
delivery would be the controlled variable. Prior to passing the test, this
variable is just an input quantity.
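
Here is what that phase of the Test looks like in a toy simulation
(everything in it -- the integrating output, the gain, the size of the
disturbance -- is invented for illustration, not taken from any
particular experiment):

program PelletTest;
{ Sketch of one phase of the Test.  The "organism" is a simple        }
{ integrating control system acting on pellet rate; the experimenter  }
{ adds pellets at rate d starting halfway through.  If pellet rate is }
{ controlled, the behavior's contribution changes so as to cancel d.  }
{ All parameters are arbitrary.                                       }
const
  ref  = 5.0;     { reference pellet rate                }
  gain = 0.2;     { fraction of error corrected per step }
var
  t: integer;
  o, d, rate: real;
begin
  o := 0.0;
  for t := 1 to 200 do
  begin
    if t <= 100 then d := 0.0 else d := 3.0;   { experimenter's disturbance }
    rate := o + d;                             { observed pellet rate       }
    if (t = 100) or (t = 200) then
      writeln('t=', t:4, '  disturbance=', d:5:1,
              '  behavior contribution=', o:6:2, '  net rate=', rate:6:2);
    o := o + gain * (ref - rate);
  end;
end.

The behavior's contribution drops from about 5 to about 2 when the
disturbance comes on, and the net pellet rate stays near 5: that is the
sign that the rate is controlled.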

I think this misunderstanding arose when you decided to add a step between
the contingency and the CV, so as to provide a separate place to measure
the reinforcer. Unfortunately, this made the actual CV, as you defined it,
invisible to the experimenter. I tried to show mathematically that because
the internal CV is simply proportional to the rate of reinforcement, in the
steady state, these are not really different measures; either one would
pass the Test. Of course dynamical observations would show a difference, if
we could disturb and measure nutrient levels directly. But from outside the
organism, it is the rate of pellet delivery that would be called the
controlled variable. And it would, indeed, be controlled.

In any case, when pellets are delivered but not through the contingency,
they are not (as I understand it) reinforcers. That is, they do not
increase the probability of a response (or the observed rate of responding,
if you consider probability a theoretical term). Isn't this true?

Finally, please note that this paragraph applies ONLY to phase 1 and
extinction.

The comparison is between rate of the operant when the control system has
been formed and the error between the CV and its reference is still
relatively high, versus the late phases of extinction, when the operant
is being
high, versus the late phases of extinction, when the operant is being
produced once again at baseline rates. This does not correspond to phase 1
versus extinction.

The statement was "The empirical phenomenon of reinforcement is
demonstrated by showing that the operant occurs at a relatively high rate
when the operant produces the reinforcer and at a relatively negligible
rate when the operant does not produce the reinforcer." This is contrasting
the state of affairs during phase 1 (where the search predominates) with
the state of affairs during phases 2 and 3, or between phases 2 and 3 on
the one hand, and the state after extinction on the other hand. It is not
the situation that holds _within_ phases 2 and 3.

In phase 1, at the point in the search where the correct action has been
found, there is an increase in the reinforcement (from zero) produced by an
increase in the behavior (from zero). This is the only time when there is a
qualitative shift from no reinforcement to some. The rest of the time,
reinforcement rate depends on behavior rate according to the contingency,
and both variables change together, quantitatively. During extinction, we
make the transition the other way: from some reinforcement to none, with a
return to the search mode.

As your first paragraph has it, reinforcement is defined only by this
transition, not by the relations that appear during phases 2 and 3. I said
this poorly the first time.

As I showed in the mathematical treatment, during phases 2 and 3 both
behavior and food delivery rates are dependent variables, affected in
common by loop gain and the setting of the reference level. So any apparent
effect of food delivery rate on behavior rate during those phases is
illusory, according to the control model.

Best,

Bill P.

[From Bruce Abbott (970903.0850 EST)]

Bill Powers (970902.2105 MDT) --

Bruce Abbott (970902.2000 EST)

In the prototypical experiment of the rat in the operant chamber, one
defines an operant in terms of specific observable consequences of the
behavior, e.g., depression of the lever to the point of switch-closure.
This is the event that will be required during the contingency phase in
order to produce the reinforcer. The class of responses that produces this
event is the operant. The same operant can occur during extinction, when
the event does not produce the reinforcer.

The operant is an "end accomplished through variable means" and therefore is
the product of a control system, the intended act (although the initial
accomplishment of that act may be an unintended byproduct of action, as
during exploration.) But from the point of view of the control system
involving the reinforcer, the control system producing the operant can be
placed within a little box labeled "output."

I thought the operant was a name for a _class_ of behaviors -- those that
produce a common result.

It is. The class of behaviors consists of those variable means that produce
a common result.

I guess in this case the operant is whatever
behavior will close the contacts

It is. Closing the contacts is the common result.

-- if that's it, my suggestion is
irrelevant. I suppose I was generalizing, to the natural case where many
different "contingencies" exist that will produce the reinforcer and the
animal might employ any of them.
I stand corrected.

One could also define all those operants that will produce the reinforcer as
belonging to the same class: those that will produce the reinforcer. But
each member of this class would be a different operant.

Also, we need a word for "input quantity", because when the input quantity
is not affected by the output, it is not a reinforcer although it is still
an input quantity.

I find this confusing, because in PCT the term "input quantity" means
something else -- an input to the perceptual input function. The quantity
here in question is a variable affecting the CV. Has it been given a name
in the PCT model?

Yes, I see the problem. Our difficulty is that the CV as you've defined it
is inside the organism where it can't be directly observed. This would make
it part of the perceptual input function.

No. Inside the organism need not be the same as inside the control system.

The "input quantity" in PCT is

the _observable_ counterpart of the perceptual signal in the model -- in
other words, it's what you have been terming the reinforcer.

No, it isn't. Here's the basic diagram:

                   r
                   |
             p     v      e
     [pif]------>[comp]------>[out. f]
       ^                         |
     i |                         | o
       +----[CV]<---?---[eff]----+
             ^
             |
             d

The "?" symbol labels the variable that tends to drive the CV toward its
reference level; it and the disturbance d contribute to the value of the CV.
Note that ? is not identical to i. In fact, even in the absence of
disturbance, ? may differ from i; for example ? may be the rate of food
intake and i may be the integral of ?

In any case, when pellets are delivered but not through the contingency,
they are not (as I understand it) reinforcers. That is, they do not
increase the probability of a response (or the observed rate of responding,
if you consider probability a theoretical term). Isn't this true?

In the same way that a CV is not a CV when control over it is lost, but we
still call it the CV, so the reinforcer will be called the reinforcer even
when it is presented outside of the contingency. It still affects the CV in
the same way, but does not participate in the closed loop through which
output (the operant) changes.

As your first paragraph has it, reinforcement is defined only by this
transition, not by the relations that appear during phases 2 and 3. I said
this poorly the first time.

As I showed in the mathematical treatment, during phases 2 and 3 both
behavior and food delivery rates are dependent variables, affected in
common by loop gain and the setting of the reference level. So any apparent
effect of food delivery rate on behavior rate during those phases is
illusory, according to the control model.

Yes. That was to be one of the conclusions of my analysis, if you hadn't
jumped ahead of me.

Regards,

Bruce

[From Bill Powers (970903.0956 MDT)]

Bruce Abbott (970903.0850 EST)--

I guess in this case the operant is whatever
behavior will close the contacts

It is. Closing the contacts is the common result.

OK.

Our difficulty is that the CV as you've defined it
is inside the organism where it can't be directly observed. This would make
it part of the perceptual input function.

No. Inside the organism need not be the same as inside the control system.

In terms of working with observable variables, there is a problem. The
concentration of nutrients in the bloodstream and plasma is not observable
in normal behavioral experiments. You'd have to make continuous assays of
the internal fluids. If we're not able to do this, we have to take
everything between what we observe and the comparator as part of the input
function. I'm not saying we shouldn't make guesses, but that takes us into
conjecture until we can think of ways to test the guesses.

The "input quantity" in PCT is

the _observable_ counterpart of the perceptual signal in the model -- in
other words, it's what you have been terming the reinforcer.

No, it isn't.

(yes it is) no it isn't (yes it IIIIIS!)

I don't know what you mean by "observable." I mean something we can observe
from outside the organism.

Here's the basic diagram:

                   r
                   |
             p     v      e
     [pif]------>[comp]------>[out. f]
       ^                         |
     i |                         | o
       +----[CV]<---?---[eff]----+
             ^
             |
             d

The "?" symbol labels the variable that tends to drive the CV toward its
reference level; it and the disturbance d contribute to the value of the CV.
Note that ? is not identical to i. In fact, even in the absence of
disturbance, ? may differ from i; for example ? may be the rate of food
intake and i may be the integral of ?

Fine, but where is the boundary between the observable environment and the
unobservable insides of the organism? It's between the "?" and the CV, and
between "out f." and o. The only thing we could apply the Test to, from
outside, is the "?".

In any case, when pellets are delivered but not through the contingency,
they are not (as I understand it) reinforcers. That is, they do not
increase the probability of a response (or the observed rate of responding,
if you consider probability a theoretical term). Isn't this true?

In the same way that a CV is not a CV when control over it is lost, but we
still call it the CV, so the reinforcer will be called the reinforcer even
when it is presented outside of the contingency. It still affects the CV in
the same way, but does not participate in the closed loop through which
output (the operant) changes.

No, a CV that is not a CV when control is lost is just an input variable;
it is no longer a CV. You're using CV in two ways here; one to designate a
role in a closed loop, and the other just to identify one variable and
distinguish it from others, independently of its role. To go on referring
to an input quantity as a CV when it proves not to be under control is
simply wrong. That is why we need names like "input quantity" which serve
only to identify a particular physical variable, without implying anything
but its ability to affect an input to the organism, whether there is a
closed loop or not. A reinforcer is first of all an input quantity: a
physical quantity that serves as an input to the organism. Such an input
quantity can still have sensory and other effects without being either a
reinforcer or a controlled quantity. To call it an input quantity simply
distinguishes it from other quantities, like output quantities.

As your first paragraph has it, reinforcement is defined only by this
transition, not by the relations that appear during phases 2 and 3. I said
this poorly the first time.

As I showed in the mathematical treatment, during phases 2 and 3 both
behavior and food delivery rates are dependent variables, affected in
common by loop gain and the setting of the reference level. So any apparent
effect of food delivery rate on behavior rate during those phases is
illusory, according to the control model.

Yes. That was to be one of the conclusions of my analysis, if you hadn't
jumped ahead of me.

OK. You can still have the candy bar.

Best,

Bill P.

[From Bill Powers (941025.1235 MDT)]

Bruce Abbott (941024.1740 EST)--

OK, so we start with the Prelac and Herrnstein equation,

(1)  R = Br/(B + ar),

where R is the delivered rate of reinforcement, B is the rate of
responding, r is the programmed rate of reinforcement, and a is a
constant that is supposed to be around 0.5 for regular responding and
1.0 for random responding (Poisson model, I believe).

Substituting 1/R = VId, 1/r = VIp, and IRT = 1/B, we get (in stages)

(1a) VId = (B + ar)/Br = 1/r + a/B

(1b) VId = VIp + a*IRT

just to make sure I follow your derivation.
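
A throwaway numeric check of the rearrangement (B, r, and a are
arbitrary values; the programmed rate r is spelled rp in the code only
because Pascal ignores case and would confuse it with R):

program CheckVId;
{ Check that the reciprocal of R from equation (1) equals             }
{ VIp + a*IRT, i.e. equation (1b).  Values are arbitrary.             }
const
  B  = 20.0;    { responses per minute           }
  rp = 2.0;     { programmed reinforcement rate  }
  a  = 0.5;
var
  Robt, VId, VIp, IRT: real;
begin
  Robt := B * rp / (B + a * rp);     { equation (1)                          }
  VId  := 1.0 / Robt;                { obtained inter-reinforcement interval }
  VIp  := 1.0 / rp;
  IRT  := 1.0 / B;
  writeln('1/R         = ', VId:8:5);
  writeln('VIp + a*IRT = ', VIp + a * IRT:8:5);
end.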

···

----------------------------------------
At very high behavior rates, the obtained reinforcement rate R is the
programmed reinforcement rate r. At very low behavior rates, R is
proportional to B. Actually, at very low behavior rates, R should be
equal to B, because the key will be enabled on every response. So _a_
should be 1 regardless of whether behavior is regular or randomly
distributed.

In my simulation in terms of intervals, I found that the obtained
reinforcement rate follows a curve a*(1 - exp(-b*t)) very closely, where
t corresponds to IRT (regular). I expect that if you expanded the
exponential into a series, truncated it, and converted into terms of
rates (1/intervals), you would get something like equation (1). The main
difference would be in the curvature between the two limiting
conditions. I'm not going to quibble. I think our way of simulating the
VI schedule will give the right results.
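
For whatever it is worth, here is one way to check equation (1) against
a brute-force simulation of a constant-probability VI schedule (my own
assumptions, for illustration only: arming probability 1/60 per second,
perfectly regular responding every T seconds, one response collects an
armed reinforcer):

program VICheck;
{ Brute-force check of equation (1) against a simulated constant-     }
{ probability VI schedule.  Rates below are per minute.               }
const
  T     = 3;          { seconds between responses                     }
  pArm  = 0.0166667;  { per-second arming probability, = 1/60         }
  nSecs = 1000000;    { length of simulated session, seconds          }
  a     = 0.5;        { equation (1) constant for regular responding  }
var
  sec, nRft: longint;
  armed: boolean;
  B, r, Robt, Rpred: real;
begin
  Randomize;
  armed := false;
  nRft := 0;
  for sec := 1 to nSecs do
  begin
    if not armed then armed := Random < pArm;
    if (sec mod T = 0) and armed then    { a response collects it }
    begin
      nRft := nRft + 1;
      armed := false;
    end;
  end;
  B := 60.0 / T;                         { responses per minute         }
  r := 60.0 * pArm;                      { programmed rate per minute   }
  Robt  := 60.0 * nRft / nSecs;          { simulated obtained rate      }
  Rpred := B * r / (B + a * r);          { equation (1) with a = 0.5    }
  writeln('simulated R = ', Robt:7:3, '    equation (1) R = ', Rpred:7:3);
end.

The two numbers should come out close for moderate response rates; how
close they stay across a range of T is exactly the curvature question I
said I wouldn't quibble about.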
---------------------------------------

... an interesting article by Bill Palya that examines the fine
structure of responding on simple schedules, including the VI. Figure
3 shows that a bird responding on a VI 60-s constant probability
schedule produced a fairly normal-looking distribution of interresponse
times with a mean of 1.05 s and standard deviation of 0.28 seconds.
The post-reinforcement pauses followed a positively skewed distribution
with a mean of 1.84 s and a standard deviation of 1.16 s.

The numbers above suggest that q1 does not reach the reference level
after a reinforcement, so the effect of a reward would be to reduce the
rate of responding somewhat, but not turn it off altogether. If the
responding were perfectly regular, the post-reinforcement pause would be
just the interresponse time of 1.05 sec. It was actually measured as
1.84 sec, so the response rate was lowered by one reinforcement to 0.57
of the pre-reinforcement rate, then (I presume) recovered. If one unit of
error signal was enough to maintain the 1.05 sec interresponse interval,
then the increment of reinforcement reduced the error signal to 0.57
units.

Do you suppose we (you) could replicate Palya's experiment and get a
complete record of the data? I believe that it would be possible to
estimate values for the constants in a PCT model if we had a record of
key-presses, rewards, and the method for generating the schedule. By
looking at the distribution of post-reward behavior rates we could
estimate the decay time constant decay1. The peak suppression due to one
reinforcement could also be estimated that way. We know the transfer
function from behavior rate to reinforcement rate, and knowing the
percentage reduction in behavior rate we can provide a scaling factor to
the percentage reduction in error signal. Then, assuming some arbitrary
reference level like r1 = 100, we could calculate the effective reward
size and the output gain. It seems that the only factor we can't
evaluate is an overall scaling factor from environmental physical units
to signal units inside the control system, and that is arbitrary anyway.

Factors such as cage size, lighting schedule, housing conditions, size
of the test chamber, lights, force required to depress the key or bar,
type of food, amount of food or duration of access to food, level of
food deprivation, species, sex, strain, age, and source of animal,
handling, and many other descriptions typically appear in the methods
sections of EAB articles.

Simply mentioning these things is not the same as calculating their
expected effects on the outcome of the experiment. Listing the
conditions rather leaves it up to the reader to decide whether they
deviated from standard conditions, and whether any deviations that exist
are important. But I don't mean to set up roadblocks. Let's do our own
experiments and decide for ourselves what's important. For the same
reasons, we can make up our own minds about the applicability (and
relevance) of matching.

I'm planning to set up some simple schedules in order to collect the
type of data we are looking for here, but that will take time as I will
have to get the project approved by our local animal care and use
committee, then order the rats, receive them, and bring them down to
80% ad libitum weight.

Whoopee!

I have a personal objection, as well as technical objections, to
reducing animals to 80% of their ad libitum weight, which can be
overcome only by showing a compelling need for this kind of treatment. I
would far prefer to start with animals maintaining themselves at normal
weight. Aside from my personal feelings on the matter, when an animal is
reduced so far in weight, reorganization (says PCT) is probably going on
at a significant rate, making all parameters of behavior highly variable
and introducing unnecessary noise into the experiments. In addition,
when the experimenter is varying the food intake to control the animal's
weight, the animal's own systems for weight control are running open-
loop, and their effect on the reference signals of the subordinate
control systems becomes unpredictable, as well as greatly exaggerated in
comparison with normal operation. My feeling is that we should get our
data in the region of normal operation first, exploring a wide range of
schedules that still permit the animal to maintain its normal weight or
close to it. Then, if there are questions that can be answered by
arbitrarily reducing the animals' weight, I would not object to doing
this.

A great advantage of doing things this way is that we can simply set the
schedule and let the experiment run continuously. I have always been
bothered by the idea of running an experiment for 4 hours, and then
maintaining the animals under entirely different conditions for the
other 20 hours of the day. Timberlake and colleagues have done a lot of
full-time experiments, with results that are highly significant for PCT.

One reason I can think of for running rats at reduced weight is to test
the hypothesis that large intrinsic errors should increase the
variability of the parameters that characterize a control system. At
normal weight, we can match models to behaviors of individual animals by
adjusting parameters. From these studies we can determine how variable
the best-fit parameters are over time within a given animal, and what
the normal range of parameters is between animals. Then, after reducing
body weight by using difficult schedules or reducing the nutritional
content of the reinforcers, we can do the same assessment to see whether
variability has increased, and if so, how it increases with a drop in
body weight. We can also get a preliminary estimate of the gain of the
weight-control system.

I hope you will have some way of recording the quantity of reinforcer
ingested, instead of just how many accesses were given. A detailed
record would be best, but even a daily estimate of consumption would be
better than no information at all. If amount consumed varies on a short
time-scale, but we treat it as a constant, there will be variations in
the behavior (according to the PCT model) that will add unnecessary
noise to the results.

By the way, I've decided to try reproducing some of the published PCT
models as a way to test whether I really understand how to construct
PCT simulations properly. (Rick Marken's 1986 JEP:HPP model of
coordinated action looks like a nice place to start.) With this VI-VI
simulation I feel as though I'm being asked to fly before I'm sure I
can walk: it may get quite complicated before we're done. And I agree
with you that perhaps we should start by modeling a simpler situation,
such as FR or FI schedule performance.

All this is great news. I couldn't agree more about walking before we
try to fly. There are lots of PCT predictions to be checked out with the
simplest kinds of schedules before we start on more complex ones. For
example, one simple prediction is that an arbitrary additive disturbance
of the controlled quantity will be opposed by a change in the output
behavior. So if you arbitrarily remove reinforcers before they can be
eaten, the animal will immediately start pressing faster, and if you
arbitrarily add them, the animal will slow down its pressing. Of course
to see this effect you have to have the control system operating in its
normal range, near but not too near to zero error. Also, still in this
normal range, we would predict that small changes in the schedule will
be met by opposing changes in the behavior, to keep the rate of
reinforcement nearly the same. That is, if you go from one reward per
two presses to one reward per three presses, you should see the behavior
rate increase by close to a factor of 1.5. The detailed model that
matches the behavior should predict what the exact change in behavior
rate should be. Because changes like these come out of the normal
operation of the control system, with no changes in its parameters, we
should be able to check the effects of such disturbances very quickly.
No learning is involved.
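
The arithmetic behind that factor of 1.5, as a sketch (assuming perfect
control, so that the obtained reinforcement rate is held at exactly the
same value under both ratios; the rate itself is an arbitrary number):

program RatioChange;
{ If the controlled quantity is the reinforcement rate R and the      }
{ schedule is a fixed ratio n (one reward per n presses), then the    }
{ behavior rate needed to hold R constant is B = n*R.  Going from     }
{ FR 2 to FR 3 therefore raises B by 3/2 = 1.5.                       }
const
  R = 4.0;     { reinforcers per minute, held constant by control }
var
  n: integer;
begin
  for n := 2 to 3 do
    writeln('FR ', n, ':  required behavior rate = ', n * R:6:1,
            ' presses/min');
  writeln('ratio of behavior rates = ', (3.0 * R) / (2.0 * R):4:2);
end.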

Would you be willing to start out with a simple full-time FR experiment
in which the ratio is varied from day to day within a range that keeps
the animal's weight at 98% of ad libitum or greater? Let's talk about
it.
---------------------------------------------------------------------
Sam(uel) Saunders (941024.1935 EDT)--

Your longitudinal study sounds great. By the time I might have started
such a thing, all my kids were teenagers and couldn't care less about
the old man's computer obsessions.

The latest Turbo appears to be 7.0. Anyone know how compatible this is
with the versions other CSG model enthusiasts are using?

That's the one I just ordered. I know that Bruce's programs are
compatible with version 5.5, which I already had, and Tom Bourbon
reports they are compatible with 6.0. Rick Marken can probably be
prevailed upon to get 7.0 as well. I think you'll be fine with 7.0.

Even if you used some other Pascal, we could all be careful to split off
the graphical routines so that you could adapt the main part of the
program to whatever graphics system your computer uses.

I did use Forth once, while looking for the fastest way to program a
rather large project. It's a lot of fun, but for me it's a write-only
language. After 6 months I could no longer understand my own code. I
think it's a matter of inborn talent (and degree of mental organization)
-- real Forth programmers have expressed pity and amazement that I can't
remember my own word definitions. Good Heavens, they say, what's so hard
about that?

What you need now are some simple experiments in which a model is fitted
to the data, for prediction of future runs. On DEMO-2, the next to last
demonstration actually takes data and lets you adjust perceptual delay
and output integration factor for the best fit, and compares the model's
behavior with the real behavior. Maybe Tom or Rick has some Pascal code
that will do this for you more flexibly. I'll look around to see what I
may have. Best of all is to think up your own experiment, program it,
fit a model to it, and amaze yourself with how well it predicts. That's
what started both Tom and Rick on the road to perdition.
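
If it helps to see the shape of such a fit, here is a toy version
(entirely invented: the "data" come from a known control model plus
noise, and a grid search over the output integration factor recovers
it; real tracking data would replace the generated series, and the
perceptual delay is left out to keep the sketch short):

program ToyFit;
{ Toy illustration of fitting a control model to tracking data, in    }
{ the spirit of the DEMO-2 fit: a "subject" run is generated from a   }
{ known integrating control model (gain 0.15) plus noise, then a grid }
{ search recovers the gain that best reproduces it.                   }
const
  nsteps = 500;
type
  TSeries = array[1..nsteps] of real;
var
  dist, subjout, modout: TSeries;
  t, ig: integer;
  g, bestg, rms, bestrms: real;

procedure RunModel(gain, noise: real; var outp: TSeries);
var
  t: integer;
  c, o: real;
begin
  o := 0.0;
  for t := 1 to nsteps do
  begin
    c := o + dist[t];                 { cursor = output + disturbance }
    o := o + gain * (0.0 - c);        { reference level is zero       }
    o := o + noise * (Random - 0.5);  { "subject's" motor noise       }
    outp[t] := o;
  end;
end;

begin
  Randomize;
  dist[1] := 0.0;
  for t := 2 to nsteps do             { smoothed random disturbance   }
    dist[t] := 0.95 * dist[t - 1] + 2.0 * (Random - 0.5);

  RunModel(0.15, 0.2, subjout);       { pretend this is the real data }

  bestrms := 1.0e30;
  bestg := 0.0;
  for ig := 1 to 30 do
  begin
    g := 0.01 * ig;
    RunModel(g, 0.0, modout);         { noise-free model run          }
    rms := 0.0;
    for t := 1 to nsteps do
      rms := rms + sqr(modout[t] - subjout[t]);
    rms := sqrt(rms / nsteps);
    if rms < bestrms then
    begin
      bestrms := rms;
      bestg := g;
    end;
  end;
  writeln('true gain = 0.15   best-fit gain = ', bestg:5:2,
          '   RMS error = ', bestrms:6:3);
end.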

I'll get started on obtaining the cited articles (meaning, Help, Mary!).
-----------------------------------------------------------------------
Best to all,

Bill P.

[From Bill Powers (951110.2100 MST)]

Chris Cherpas (951110.0921 PT) --

     PCT-ers: what's the problem with saying that "behavior controls the
     environment AND the environment controls behavior?"

The problem is that in PCT, the term "control" has a very specific
technical meaning, which does not include other meanings that are
commonly intended (such as, "temperature controls the rate of a chemical
reaction," or "a rudder controls the direction of a ship").

Suppose you have a system A that acts to affect a variable B outside it.
A is said to control B if, for every disturbance tending to change B
away from some reference state, A changes its action on B so as to
oppose the disturbance and restore B to its reference state. The
reference state of B is the state to which the action of A always tends
to restore it.
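
A bare-bones illustration of that definition (a sketch only; the
integrating output and the particular numbers are arbitrary): A senses
B, compares it with a reference of 10, and adjusts its action, so a
step disturbance is opposed and B returns to near 10.

program ControlDef;
{ Minimal illustration of the definition: B = action + disturbance;   }
{ A adjusts its action by a fraction of (reference - B) each step.    }
{ A step disturbance at t = 30 is opposed and B returns to near 10.   }
const
  ref  = 10.0;
  gain = 0.2;
var
  t: integer;
  act, dist, B: real;
begin
  act := 0.0;
  for t := 1 to 60 do
  begin
    if t < 30 then dist := 0.0 else dist := 4.0;    { step disturbance }
    B := act + dist;
    if (t = 29) or (t = 60) then
      writeln('t=', t:3, '  disturbance=', dist:5:1,
              '  action=', act:6:2, '  B=', B:6:2);
    act := act + gain * (ref - B);
  end;
end.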

A rat feeding itself exclusively by pressing a bar will produce a
certain rate of reinforcement. If something external changes that rate
of reinforcement (a change in schedule, extra reinforcements, or missing
reinforcements), the rat will change its behavior in a way that tends to
restore the reinforcement rate toward its original value. So according
to the technical PCT definition of control, the rat is controlling the
reinforcement rate.

This does not work the other way around. Suppose we say that the
reinforcement rate controls the rat's behavior rate. According to the
PCT definition, if the behavior rate is changed by some extraneous
disturbance, the reinforcement rate should then change in such a way as
to restore the behavior rate to its original state. Thus if something
increased the behavior rate, the reinforcement rate should fall enough
to bring the behavior rate back to the original state. This, of course,
is not what happens.

Consider our favorite example of driving a car. The position of the car
relative to the road is controlled by the driver's application of
steering efforts to the wheel. The driver knows of that position through
the appearance of the car and road in the windshield. A disturbance due
to a crosswind tends to alter the position of the car, and the driver
changes his steering effort to maintain the visual appearance in the
windshield in a particular state.

On the other hand, if the driver changes his steering effort, the
appearance of the road in the windshield does not alter its stimulus
effects on the driver in a way that restores the driver's steering
efforts to their original state. The driver controls the perceived
position of the car, but the perceived position of the car does not
control the driver. Thus the driver is able to steer the car around
another car or onto an off-ramp.

In PCT, control has a meaning distinct from "influence", "affect", and
"cause." The latter terms apply to straight-line effects; control
applies strictly to closed circles of causation.

···

-------------------------------
     Do PCT-ers understand that radical behaviorism considers what
     happens within the skin to be part of the functional environment
     controlling behavior?

Yes. But the radical departure of PCT from radical behaviorism concerns
what is controlled, not only the special usage of "control." Behaviorism
generally treats behavior as being what organisms do -- their visible
outputs, such as a rate of bar pressing. They assume that it is this
behavior that is controlled. In PCT, however, it is not output, but
input that is controlled.

Remember the definition above. If A is controlling B, then the behavior
of A is the physical action it produces that affects B. When a
disturbance acts on B, the physical action of A changes in such a way
that its effect on B is equal and opposite to the effect of the
disturbance. This is the process by which B is controlled.

So in general, the behavior of A varies with every disturbance. This is
part of _how_ A controls B, by changing its action as required to
counteract every disturbance of B. The remainder of the explanation is
that A is _sensing the state of B_ and acting to maintain what it senses
in a particular state. The small variations in B are amplified to
produce large variations in A's behavior, and it is those variations
that maintain B in an almost constant state. B is therefore an input to
A, a sensory input, and it is that sensory input that is controlled.

The last thing that has to be understood is that the controlled input is
not simply the input itself, but the input as compared with an internal
reference state, defined inside the controlling system A. If the
reference state is set to 10 units of B, then B will be maintained at 10
units, plus or minus a small error that drives disturbance-opposing
changes in the behavior, the action that affects B. If the reference
state then changes to 20 units, the action will change enough to bring B
to a value of 20 units and, again, maintain it at that value in spite of
disturbances acting on B. Because in general we have to assume that
unpredictable disturbances are always present, these different values of
B could not be brought about just by producing correspondingly different
values of A's behavior. In general, the only way to predict the behavior
is to know both the value of the reference setting and the magnitude and
direction of effects from all disturbances that are acting on B.
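
To make that concrete, here is the same sort of loop with the reference
switched from 10 to 20 units midway and a drifting disturbance acting
throughout (all parameters invented). The action settles at whatever
reference-minus-disturbance requires, so neither the reference nor the
disturbance alone predicts it.

program RefChange;
{ B stays near whatever the current reference specifies, despite a    }
{ slowly varying disturbance; in the steady state the action equals   }
{ (reference - disturbance), so predicting it requires knowing both.  }
var
  t: integer;
  ref, act, dist, B: real;
begin
  act := 0.0;
  for t := 1 to 120 do
  begin
    if t <= 60 then ref := 10.0 else ref := 20.0;
    dist := 3.0 * sin(t / 10.0);          { slowly varying disturbance }
    B := act + dist;
    if t mod 30 = 0 then
      writeln('t=', t:4, '  ref=', ref:5:1, '  dist=', dist:6:2,
              '  action=', act:7:2, '  B=', B:6:2);
    act := act + 0.3 * (ref - B);
  end;
end.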

Neither the young Skinner nor any of his contemporaries or antecedent
psychologists knew of any kind of physical system that could behave in
this way. A simple description of how an artificial control system such
as a servomechanism works would have seemed to these people to be an
exercise in mysticism or illogic. All the variables in the closed loop
appear mathematically as values of functions that include the same
variables as arguments. The reference signal seems to create a situation
in which future outcomes are affecting present behaviors. The most
common arguments against purposive or goal-directed behavior cited
exactly these features of control systems -- the circular causality and
the ability of goals to determine outcomes -- as obvious fallacies and
absurdities which disproved the ideas of purposes and goals. So the
early Skinner and those who thought along the same lines had, by the
mid-1930s, armed themselves against the very ideas on which control
theory (just being invented by engineers during those same years) is
based. By discarding these ideas as absurdities and fallacies, they lost
their chance to understand purposive behavior, and left the engineers to
make the breakthrough.
-------------------------------
     The changes _within the skin_ produced by action are no less
     relevant than those outside; it's just that their private status
     shouldn't confer any _special_ causal status on them, and that it
     makes it harder to get the verbal community to establish very good
     agreed-on discriminations involving such events.

Of course. The PCT model is all about "changes within the skin," where
the nervous system and brain are. But PCT is about a particular kind of
organization "within the skin," an organization consisting of neural
signals and functions. There is no special causal status of any given
signal or function; what creates the asymmetry between causes inside and
causes outside is the mathematics of control, the fact that the internal
processes involve enormous power amplification, while natural outside
processes normally involve only power losses.

     Then there's the open-loop, closed-loop dilemma: do PCT-ers think
     that radical behaviorists see a stimulus as producing a response
     and that's it for the causal stream?

What EABers have not understood is the special nature of a closed-loop
system. In an approximate way, they have realized that it is there:
behaviors affect reinforcements, and reinforcements affect behaviors. But
knowing that it is there is not the same as understanding how it works --
understanding the laws that govern closed-loop systems. The nearest that
EABers have come to getting a glimmer of these laws is through seeing an
alternating sequence: behavior-reinforcement-behavior-reinforcement. But
this sequential treatment is fundamentally incorrect; starting with that
kind of analysis you will never get to the actual equations describing
how the whole loop works as a unit.

This problem is compounded by the fact that while the mathematics of
feedback functions have been explored and laid out as equations, the
reciprocal effects, from the consequence through the organism to the
behavior, have been characterized only in the loosest way, mostly
verbally. This means that true system equations can't be written and
can't be solved to find the behavior predicted by any assumptions about
the organism. Modelers like Staddon have come very close to setting up
true system equations, but nobody I know of has yet gone all the way to
a true working simulation in which feedback effects are correctly
treated. Staddon actually wrote the equations, but as far as I'm aware he
didn't spend much time applying them.
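
For concreteness, here is the sort of thing I mean by true system
equations, written as a toy program. The feedback function below just
restates a fixed-ratio schedule; the "organism function" and every
number in it are invented purely for illustration, not proposed as a
model of any real animal. The two equations are solved together, as a
loop, rather than stepped through as behavior-then-reinforcement.

def environment(press_rate, ratio):
    """Fixed-ratio feedback function: reinforcement rate = press rate / ratio."""
    return press_rate / ratio

def organism(reinf_rate, reference=2.0, gain=100.0):
    """Hypothetical organism function: press faster the further the
    obtained rate falls below an internal reference rate."""
    return max(0.0, gain * (reference - reinf_rate))

def solve_loop(ratio, steps=2000, relax=0.01):
    """Relax toward the point where both equations hold simultaneously."""
    press_rate = 0.0
    for _ in range(steps):
        reinf_rate = environment(press_rate, ratio)
        press_rate += relax * (organism(reinf_rate) - press_rate)
    return press_rate, environment(press_rate, ratio)

for ratio in (5, 10, 20, 40):
    presses, reinfs = solve_loop(ratio)
    print(f"FR-{ratio:<3} press rate {presses:7.2f}  "
          f"reinforcement rate {reinfs:5.2f}")

The particular functions don't matter; what matters is that the
predicted behavior falls out of the organism function and the feedback
function holding at the same time, as one system.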
-----------------------------------------------------------------------
Best to all,

Bill P.

[From Bruce Abbott (951201.1510 EST)]

Bill Powers (951130.0600 MST) --

    Bruce Abbott (951129.1930 EST)

Your argument about the explanation of autoshaping is persuasive but I'm
not yet convinced.

    But you seem to be assuming that reinforcement theory can only
    account for data in this post hoc fashion. On what evidence?

Only negative evidence; I haven't yet seen any experiment in which a new
result was predicted beforehand according to basic principles of
reinforcement theory. For all I know, thousands of examples exist; I
just haven't seen any. In the case you mention, classical conditioning
theory and reinforcement theory both existed before the initial
experiments with autoshaping, yet apparently nobody was able to predict
what would happen in this experiment until after it had been done.

Absence of evidence is not evidence of absence. One thing to keep in mind
when looking for examples in which predictions have been made is that
"reinforcement theory" is not a single theory to which everyone in the field
subscribes, although most versions share a common core of assumptions. Much
of the work being done is aimed more at identifying how certain factors
should enter into the analysis (i.e., theory development) than at testing
the predictions of a finished theory. Consider the recent work on choice in
which various alternative proposals have been developed and their
predictions compared: all may assume that choice is dictated by relative
reinforcement and "response cost" and yet differ in their assumptions as to
how pigeons perceive what might be termed the reinforcing value of each
alternative and what is being optimized or maximized via those choices. The
quantitative models that have been developed to account for choice make
specific, testable predictions, which have in fact been submitted to
empirical test. The surviving models have been able to successfully predict
choice before the fact and some have yielded surprising predictions that
have been supported by experimental results.

Where you generally get into the post hoc type of reasoning is when somebody
gets some result and challenges the other guys to account for it with their
particular view. It's not a great test of theory, and it is not the way
theories are normally assessed, but it does say at least something if the
promoter of a theory can't think of a way to make the theory postdict the data.

If you want to see experiments in which theory-based predictions have been
made and tested, I suggest taking a look at just about any issue of JEAB;
you will generally find a number of reports of this type, along with more
purely empirical studies.

    The explanation emerged as a result of experimental testing to
    determine what was actually going on in this situation. It had
    nothing to do with guesses about what the illuminated key looked
    like to the pigeon.

Well, here is how your words went:

    The explanation that was eventually developed for autoshaping goes
    like this. Hungry pigeons normally exhibit an innate (unlearned)
    response toward things that look like they might be edible: they
    peck at them. ...

I think you brought up the way keys look to the pigeon; am I misreading
this?

I can see how you might read it that way, but yes, you are misreading it.
Pigeons (without training) peck at seeds, etc., that they see on the ground
and seize them. (They soon learn which visual objects usually turn out to be
palatable and which do not.) Through classical conditioning, another
stimulus, not necessarily resembling the US in any way, triggers the same
response. The pigeon pecks at the key, not because the key looks like it
might be edible, but because key illumination has been reliably followed by
food delivery, setting up the conditions necessary to convert the
illuminated key into a CS that elicits pecking.

Now that it has been established that classical conditioning can be
involved, have all new experiments been analyzed with that in mind? Or
is this explanation invoked only when standard reinforcement theory
can't explain the results? Does the classical conditioning stop working
when you're not paying attention to it?

Classical conditioning needs to be taken into account only when the
experimental arrangement meets the requirements for classical conditioning.
Those requirements were met in the autoshaping experiment because
illumination of the keylight was reliably followed, after a short and fixed
time period, by delivery of grain, without any response being required of
the pigeon for that delivery. Where a keypeck is required (operant
conditioning), food cannot be expected in the absence of the response and
classical conditioning to the illuminated key per se will not occur. So no,
all new experiments are not analyzed with that in mind, nor need they be.
Once it is recognized that a particular response _can_ be classically
conditioned, one can predict under which conditions such conditioning is to
be expected.

Now before I get accused of being a rabid reinforcement theorist again, I
will note that reinforcement theory works, when it does work, because of the
actions of control systems. Without a correct appreciation of what
"reinforcement" really is, one can only identify by empirical means what
will and will not serve to "reinforce" a response, and under what
conditions. Guessing about reinforcers is a lot like guessing about
controlled perceptions, and in both cases a good model cannot be built until
the empirical work has identified those things. But control theory is
fundamental: tell me what perception is under control, and I'll tell you
what sorts of things can serve as the "reinforcers."

I suggest, by the way, that our first rat experiment should be focused
on answering a very simple question: can rats vary their rate of bar-
pressing in a systematic relationship with food intake? From our
previous modeling efforts, there is a strong suggestion that they can't,
or that if they can, the conditions under which they can do this are
ill-defined. I should think it would be very significant to EAB if it
turned out that rats either press at some fixed rate (perhaps related to
deprivation level) or don't press at all. For PCT, the original idea of
what the behavioral control system is demands that in at least some
small region near the free-feeding situation, a decrease in food intake
rate should lead to an increase in pressing rate. If that's not true, we
have to look for a very different model.

That's my intention, as I want to begin by replicating the FR findings in
both the traditional and Collier-type (all food earned via lever-pressing)
situations while collecting the additional high-resolution data we need for
the analysis. But I suspect that disturbing performance by changing the
ratio (i.e. environment function) may prove to be a poor way to go; we may
want to do it by other means, such as by changing the amount of food
delivered or the response force requirement.
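
Just to be concrete about the contrast you describe, here is the sort
of pattern we would be trying to tell apart in the data. The functional
forms and the numbers below are placeholders I made up, not models of
the rat.

def fixed_rate(intake_rate, deprivation_rate=30.0):
    """Rats press at some fixed rate (set by deprivation) or not at all."""
    return deprivation_rate if intake_rate > 0.0 else 0.0

def control_like(intake_rate, reference=2.0, gain=40.0):
    """Near free feeding, pressing rises as intake falls below a reference."""
    return max(0.0, gain * (reference - intake_rate))

print(f"{'intake rate':>11} {'fixed-rate':>12} {'control-like':>14}")
for intake in (2.0, 1.5, 1.0, 0.5):
    print(f"{intake:11.1f} {fixed_rate(intake):12.1f} "
          f"{control_like(intake):14.1f}")

If the obtained rates look like the fixed-rate column, the original PCT
pressing-rate model is in trouble; if they look like the control-like
column, it survives, at least near free feeding.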

P.S. Dag, I haven't forgotten you. I'm working on a reply.

Regards,

Bruce

[From Bruce Abbott (951201.2105 EST)]

Bill Powers (951201.1430 MST) --

    Bruce Abbott (951201.1510 EST)

    yes, you are misreading it. Pigeons (without training) peck at
    seeds etc. they see on the ground and sieze them. (They soon learn
    which visual objects usually turn out to be palatable and which do
    not.) Through classical conditioning, another stimulus, not
    necessarily resembling the US in any way, triggers the same
    response.

Wait a minute. Let's see if we can't get explicit about what you're
proposing here.

1. "Pigeons peck at seeds" you say. "Peck" I understand, and "seeds" I
understand, but what does it mean to say that the pigeons peck AT the
seeds? If a seed is an unconditional stimulus, is the effect of the
stimulus connected to the muscles that create pecking so that the
pecking response brings the beak near or on the seed that stimulated it
(wherever it happens to be) instead of somewhere else? Is this
phenomenon accounted for in EAB (remembering that it is said to be
innate)?

I suspect that the phenomenon of pecking at seeds (or things that might be
seeds) is so "obvious" that it has escaped the attention of reinforcement
theorists. Either the pigeon's brain computes the muscle contractions
necessary to propel its beak toward the seed (a ballistic strike), or it
computes the error between beak and target and adjusts tensions as necessary
to keep the trajectory on target; either way I doubt that
reinforcement theorists have thought much about how the pigeon does this,
other than to think that there is no real problem accounting for it. This
directed pecking is presumably part of an innate action pattern released
(under the right conditions) by the sight of stimuli having the right form
(i.e., look like seeds or other objects a pigeon will accept as edible).

2. When you say "through classical conditioning" another stimulus comes
to trigger the "same response," are you saying that another stimulus
acts on the nervous system to have the same effect that a seed has; that
is, triggering a pecking action that is aimed at a seed? Or does it now
trigger the same response but in such a way that it is aimed at the new
stimulus instead of a seed?

The latter.

3. Exactly what is it _about_ classical conditioning that produces this
result? Are you sure that you're not just naming this effect "classical
conditioning" and then using the name to explain the effect?

Like so many things, classical conditioning is a name both for the procedure
and for the process that supposedly takes place in the context of that
procedure. The _procedure_ involves pairing CS (e.g., tone) with US (e.g.,
meat powder), a stimulus that reflexively produces a response (i.e., innate
S-R connection). The _process_ is the acquisition of a (usually similar)
response to the CS (i.e., the formation of a learned S-R connection). If
you apply the procedure properly and the CS is discriminable, you expect to
get the process. If you can show that the acquisition of the response
depends on the administration of the classical conditioning procedure
(ruling out other explanations), then it is a classically conditioned response.

In the autoshaping case, sight-of-seed = US, pecking (directed at the seed) =
unconditional response to the US, CS = illuminated key, and pecking (directed
at the illuminated key) is presumably the conditioned response. Experimental tests
designed to rule in/rule out classical conditioning as the process involved
in the acquisition of the keypeck led to the conclusion that classical
conditioning is the process responsible.

    Where a keypeck is required (operant conditioning), food cannot be
    expected in the absence of the response and classical conditioning
    to the illuminated key per se will not occur.

Is the first phrase, "food cannot be expected in the absence of the
response," an explanation of the second one, "classical conditioning ...
will not occur"? Or are you only saying that classical conditioning is
defined as the case where food can occur when the key is illuminated and
without any pecking, and not as the case where pecking is required to
produce food?

I meant the former; if a peck is required, then the illuminated key (CS) by
itself is not a reliable signal for food; only CS+peck is. CS alone then
will not elicit a classically conditioned peck.

If the first phrase is meaningful, what does the term "expected" mean?
Do the pigeon's expectations play a role in differentiating between
classical and operant conditioning?

I was being a bit looser with my terms than an EABer would tolerate;
expectation is a cognitive term. I only meant to convey the idea that if a
keypeck is required, then illuminated key + keypeck signals food, whereas
illuminated key alone does not.

OK. If I understand you correctly, if food is produced by a peck on a
key, classical conditioning is not predicted, but if the key is
illuminated and food follows even without a keypeck, classical
conditioning is predicted.

Yes.

What happens during the time that the key is
illuminated but before the noncontingent delivery of food occurs?

Nothing is _programmed_ to occur.

Does the food appear immediately upon a keypeck when the key is
illuminated?

Yes.

I have to presume it does, because otherwise the food would appear at
the end of the delay whether or not there was a peck. And if so, is not
the delivery of food contingent on the keypeck just as in operant
conditioning?

Yes.

I fail to see any rigorous distinction between the classical and operant
phenomena in autoshaping.

The delivery of food contingent on the keypeck reinforces the keypeck (in
the presence of the illuminated key); this is pure operant conditioning, and
one expects that pecking of the key will become more frequent.

However, it is possible to program the experiment so that food delivery
would _not_ be contingent on a keypeck; the pigeon need do nothing but await
the delivery of grain at the end of CS presentation. Here there is no
operant procedure in effect (no immediate reinforcement of the keypeck
response), yet what is observed is that the pigeon begins to peck at the key
anyway, although it is a total waste of the pigeon's effort. But if
keypecking is an innate response sequence released by certain perceptions
(under the right conditions), then the procedure meets the specification for
a classical conditioning procedure. Tests indicate that this procedure is
sufficient to produce the keypeck, which pigeons otherwise will not produce
without first being subjected to an elaborate shaping operation.

For operant conditioning, a contingency must be arranged between response
and reinforcer. Such a contingency is absent here, so the keypeck cannot be
attributed to operant conditioning.

However, in the traditional autoshaping situation, both types of procedural
relationship are present. The initial peck(s) is/are a product of classical
conditioning (because it is known that such pecks do not occur in strictly
operant procedures without shaping). Classical conditioning generates a
baseline of responding, thus making operant conditioning (via immediate,
contingent delivery of food) possible (because you can't reinforce a
response that never occurs).
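
To keep the two procedural relationships straight, here is the trial
logic reduced to a little sketch. This is my own illustration, and the
8-second CS duration is an arbitrary placeholder, not a claim about
actual trial parameters.

def food_delivery_time(peck_time, cs_duration=8.0, peck_contingent=True):
    """When does grain arrive on one trial?

    peck_time:        seconds after keylight onset of the first keypeck,
                      or None if the bird never pecks during the CS.
    peck_contingent:  True  = traditional autoshaping (a peck produces
                              food at once; otherwise food arrives at the
                              end of the CS anyway),
                      False = purely classical variant (food arrives at
                              the end of the CS no matter what the bird
                              does).
    """
    if peck_contingent and peck_time is not None and peck_time < cs_duration:
        return peck_time      # operant relationship: food follows the peck
    return cs_duration        # classical relationship: food follows the CS

print(food_delivery_time(3.2))                         # 3.2: food at the peck
print(food_delivery_time(None))                        # 8.0: food at end of CS
print(food_delivery_time(3.2, peck_contingent=False))  # 8.0: food at end of CS

In the peck-contingent arrangement both relationships are present at
once; in the noncontingent arrangement only the classical one remains,
yet the keypeck appears anyway.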

It seems to me that the so-called classical conditioning doesn't work
properly. Why shouldn't it produce pecks wherever they happen to be
occurring when the light turns on? Those pecks, just as surely as pecks
on the key, will be followed by delivery of food. If the pigeon pecks 6
inches to the left of the key each time the light turns on, food will
appear. So if the light turns on, the pigeon should peck 6 inches to the
left of the key. How does classical conditioning ever get it to peck ON
the key?

Presumably the keypeck is target-directed whenever it occurs; what classical
conditioning does is define the illuminated key as a "proper" target.

I would suggest that most theorists in EAB would be perfectly comfortable
with the idea of innate, target-directed pecking and have no problem with
the notion that directed pecking includes an error-correction mechanism that
keeps the peck aimed toward the target through corrective action. Just
about everyone these days is familiar with simple guidance mechanisms like
this, although most are likely to explain their workings incorrectly in
terms of a sequence of events around the loop. The reference (target
position) is an external environmental variable, isn't it? (;-> Just the
sort of thing an EABer likes to see controlling a response!

If you introduce disturbances in an operant conditioning experiment, you
will no longer be able to define a reinforcer, because no state of the
variable in question will increase the probability of any particular
response. In fact, the responses can be made independent of the state of
the reinforcer, yet control will continue.

I have repeatedly brought this point up, without a comment from you. If
you will promise to discuss this point, I will work up a simple
experiment to illustrate it. But if you're just going to let it slide
past again, I won't bother. What about it?

Ah, you cut me to the quick! I have indeed commented: check your archive.
I don't remember which post, but my reply was given the last time you
brought up the topic. As I recall, you asked me about it in the form of a
test. After reading my reply, you said I passed.

Remember?

Regards,

Bruce