Signal, Schmignal?

[From Bruce Abbott (970830.2125 EST)]

Bill Powers (970830.1812 MDT) --

Ok, so we're agreed on the PCT predictions. Now, what does reinforcement
theory say about the effect of reinforcements on behavior, under the same
circumstances?

I think that discussion of reinforcement _theory_ at this point would be
premature. What I'm attempting to do at present is link certain EAB terms
(e.g., reinforcer, establishing operation) to control theory. Because these
terms label certain observations rather than theoretical constructs, we can
talk about them and their relation to control theory independent of any
particular theory of reinforcement.

Regards,

Bruce

[From Bruce Abbott (970830.2210 EST)]

Rick Marken (970830.1740) --

Bill P.

I'm asking if we are agreed that when the contingency is first
established, we have a condition of minimum CV, maximum error,
and maximum behavior, and that AS REINFORCEMENTS ARE DELIVERED,
the CV increases, the error decreases, and the BEHAVIOR DECREASES.
Is this correct? [my emphasis -- RM]

Bruce A.

CORRECT [my emphasis -- RM]

Bill Powers (970830.1812 MDT)--

Ok, so we're agreed on the PCT predictions. Now, what does
reinforcement theory say about the effect of reinforcements
on behavior, under the same circumstances?

You're good. You're VERY good;-)

Not in this case.

I can only surmise from this that you believe that Bill Powers has laid an
oh-so-clever trap for the poor unsuspecting Bruce Abbott, which has now been
sprung on that dimwit in the final Powers paragraph quoted above. Not so.

I am not out to defend reinforcement theory, so this can hardly be a trap
for me. But even if I were, it would be no trap: reinforcement theory's
prediction is the same as the PCT prediction. I'll explain later, when I've
had a chance to lay the proper foundation. Good, schmood.

Regards,

Bruce

[From Bill Powers (970830.2049 MDT)]

Bruce Abbott (970830.2125 EST)--

I think that discussion of reinforcement _theory_ at this point would be
premature. What I'm attempting to do at present is link certain EAB terms
(e.g., reinforcer, establishing operation) to control theory. Because
these terms label certain observations rather than theoretical constructs,
we can talk about them and their relation to control theory independent of
any particular theory of reinforcement.

All right, let's do. What is the observed relationship of reinforcement to
behavior in the situation we've been describing? That is, an establishing
operation has properly taken place, and a contingency is established that
enables behavior to begin producing reinforcements. Just after the
contingency has been established, there is zero reinforcement. What, under
the set of observations to which you are referring, is the expected
behavior rate at that time? After the first reinforcement has been
delivered by a lever-press, what then happens to the rate of
lever-pressing? As reinforcements continue to be delivered, what is the
ensuing change in behavior rate?

Best,

Bill P.

[From Bruce Abbott (970831.0915 EST)]

Bill Powers (970830.2049 MDT) --

Bruce Abbott (970830.2125 EST)

I think that discussion of reinforcement _theory_ at this point would be
premature. What I'm attempting to do at present is link certain EAB terms
(e.g., reinforcer, establishing operation) to control theory. Because
these terms label certain observations rather than theoretical constructs,
we can talk about them and their relation to control theory independent of
any particular theory of reinforcement.

All right, let's do. What is the observed relationship of reinforcement to
behavior in the situation we've been describing? That is, an establishing
operation has properly taken place, and a contingency is established that
enables behavior to begin producing reinforcements. Just after the
contingency has been established, there is zero reinforcement. What, under
the set of observations to which you are referring, is the expected
behavior rate at that time? After the first reinforcement has been
delivered by a lever-press, what then happens to the rate of
lever-pressing? As reinforcements continue to be delivered, what is the
ensuing change in behavior rate?

Assuming that this is an experienced animal, the first lever-press will
occur shortly after the rat has been placed in the chamber, and will then
occur (initially) at a rate limited by the time required to press the lever,
retrieve the pellet, consume it, and return to the lever. As the animal
fills up on food pellets, the rate will decline, both because the chain of
behaviors will be executed less vigorously and because time spent doing
other things (usually between swallowing a pellet and returning to the
lever) will increase. Both changes will be far less marked during a typical
session if the rate of pellet delivery is limited to some low value than if
the animal can consume a great many pellets during that time.
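
To put rough numbers on this picture, here is a minimal Python sketch; the
chain time and the per-pellet slowdown are invented purely for illustration,
not measured values.

# Response rate limited by the press-retrieve-consume-return chain, plus
# time spent on other activities that grows as the animal fills up.
def press_rate(pellets_eaten,
               t_chain=4.0,              # seconds per press-retrieve-consume-return cycle
               t_other_per_pellet=0.5):  # extra seconds of "other" behavior per pellet eaten
    t_cycle = t_chain + t_other_per_pellet * pellets_eaten
    return 60.0 / t_cycle                # presses per minute

for eaten in (0, 10, 20, 40):
    print(eaten, round(press_rate(eaten), 1))   # rate falls as pellets accumulate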

If reinforcers are output-contingent consequences that tend to reduce the
error between a CV and its reference, then the operant search for the
reinforcer is the search for a variable that has a certain physical effect
on the CV. Thus the reinforcer and the CV are closely related, though
separate, concepts. The reinforcer is the means by which error between the
CV and its reference may be corrected. It will be effective only when the
variable we are labeling the "CV" is actually being controlled, error is
present between the CV and its reference, and the error is in the direction
such that production of the reinforcer will tend to reduce that error. When
one has shown that a given behavior is being "maintained" by a given
reinforcer (in operant language), one has identified something with this
property, without necessarily identifying what the CV is that is so
affected. However, by homing in carefully on the specific property of the
reinforcer that is the "effective ingredient," one can also converge on a
limited set of possible CVs; those that would be affected by the property
identified.

I'll stop here and await comments and questions.

Regards,

Bruce

[From Rick Marken (970830.1740)]

Bill P.

I'm asking if we are agreed that when the contingency is first
established, we have a condition of minimum CV, maximum error,
and maximum behavior, and that AS REINFORCEMENTS ARE DELIVERED,
the CV increases, the error decreases, and the BEHAVIOR DECREASES.
Is this correct? [my emphasis -- RM]

Bruce A.

CORRECT [my emphasis -- RM]

Bill Powers (970830.1812 MDT)--

Ok, so we're agreed on the PCT predictions. Now, what does
reinforcement theory say about the effect of reinforcements
on behavior, under the same circumstances?

You're good. You're VERY good;-)

Love

Rick


--

Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bill Powers (970831.0936 MDT)]

Bruce Abbott (970831.0915 EST)--

Assuming that this is an experienced animal, the first lever-press will
occur shortly after the rat has been placed in the chamber, and will then
occur (initially) at a rate limited by the time required to press the
lever, retrieve the pellet, consume it, and return to the lever. As the
animal fills up on food pellets, the rate will decline, both because the
chain of behaviors will be executed less vigorously and because time spent
doing other things (usually between swallowing a pellet and returning to
the lever) will increase. Both changes will be far less marked during a
typical session if the rate of pellet delivery is limited to some low value
than if the animal can consume a great many pellets during that time.

If the food pellets are the reinforcers, and filling up on food pellets
results in a decline in behavior rate, what is the observed relationship
between the reinforcements and the behavior rate? Isn't it that the greater
the reinforcement rate is, the less the behavior is?

You're talking now about an experienced animal, the case in which it is
said that continued reinforcement maintains or supports behavior that has
already been conditioned. However, the observations you describe entail a
negative effect of reinforcements on behavior, because increasing
reinforcements go with decreasing behavior as long as the contingency is in
effect. This can be seen most easily by artificially raising or lowering
the rate at which the reinforcers appear. The maximum behavior is observed
when the reinforcements aren't appearing at all. As behavior begins
producing reinforcers, the rate or amount of behavior declines. The steady
state is reached when the reinforcement rate ceases to increase and the
behavior rate ceases to fall.

This does not seem consistent with the general observation of "the
strengthening of behavior which results from reinforcement" (Skinner, 1953,
_Science and human behavior_ p. 65). What is actually observed seems to be
just the opposite effect.

If reinforcers are output-contingent consequences that tend to reduce the
error between a CV and its reference, then the operant search for the
reinforcer is the search for a variable that has a certain physical effect
on the CV. Thus the reinforcer and the CV are closely related, though
separate, concepts. The reinforcer is the means by which error between the
CV and its reference may be corrected. It will be effective only when the
variable we are labeling the "CV" is actually being controlled, error is
present between the CV and its reference, and the error is in the direction
such that production of the reinforcer will tend to reduce that error.

This is simply the PCT model, which I already understand. In the PCT model
the negative relationship between the variable you are calling the
reinforcer and the behavior rate is to be expected, and is explained by the
negative feedback loop. But we are talking about observations, not models.
My objection is that the description does not match the observations.

When
one has shown that a given behavior is being "maintained" by a given
reinforcer (in operant language), one has identified something with this
property, without necessarily identifying what the CV is that is so
affected. However, by homing in carefully on the specific property of the
reinforcer that is the "effective ingredient," one can also converge on a
limited set of possible CVs; those that would be affected by the property
identified.

If an increase in variable R is observed to result in a decrease in
variable B, I fail to see why it is appropriate to say that R "maintains"
B. I could see saying that B "maintains" or "supports" R, when B is the
output of a control system and R is an input to it, and there is an
observable positive dependence of R on B. But a leak in a bucket does not
maintain the water level in the bucket; it causes the water level to
decrease. When a negative relationship is observed, "maintain" and
"support" are simply inappropriate terms.

It seems to me that the place where you are most justified in using these
terms is in describing the overall picture of what happens when a
contingency is first established (with a naive animal) and when it is
finally disabled for good. What we observe (using the standard methods of
recording data in EAB) is that the animal at first shows little operant
behavior, and reinforcements occur only rarely. As time goes on, the
reinforcement rate rises as the recorded behavior rate rises, until finally
the behavior rate has come to some high value and so has the reinforcement
rate. As long as the contingency is enabled, the (relatively) high rates of
behavior and reinforcement continue. When the contingency is disabled
permanently, after a while we observe that the reinforcement and behavior
rates decline to some relatively low background level. This can be
interpreted as a theoretical reflexive positive effect of reinforcements on
behavior when the contingency is enabled, a description that seems to fit
both the initial onset of behavior and the subsequent extinction when the
contingency is disabled. Under that interpretation, it makes sense to say
that the reinforcement maintains or supports the behavior, because we're
talking about a positive relationship. We would expect the minimum behavior
rate to go with the minimum reinforcement rate, and the maximum with the
maximum. This is the opposite of the relationship that we would expect from
a control system defined as we've been assuming here.

However, and I don't want to leave this out before turning the mike back
over to you, these observations depend greatly on how the data are
recorded. The usual method is to record contact closures, the rate being
determined by dividing the total number of closures in a session by the
duration of the session. If, instead, we recorded contact closures only
when the animal was in position with its paws in the appropriate place, and
measured elapsed time only under the same conditions, we would find a very
different relationship between reinforcement rate and behavior rate. The
rest of the time, when the animal was roaming around in the cage, grooming,
sleeping, or licking various parts of the cage, we would not measure rate
of behavior or reinforcement.

If we simply look at total behavior, whether it takes place in contact with
the lever or elsewhere, we will see that total behavior rate is a rising
function of deprivation, and that maximum behavior rate goes with the
minimum rate of reinforcement, as in the case of the experienced animals.
The behavior that is occurring initially could be called "exploration,"
moving from one place to another and doing various things. This behavior is
driven by the error signal as usual. When food begins to appear at one
place during this exploration, the error drops and exploration is
interrupted, to be replaced by a control loop in which the behavior
produces food. So the appearance of a given behavior being reinforced
results from a cessation of the exploration behavior -- the principle
behind reorganization. Is this more or less where you're headed?

Best,

Bill P.

[From Bruce Abbott (970831.2010 EST)]

Bill Powers (970831.0936 MDT) --

Bruce Abbott (970831.0915 EST)

Assuming that this is an experienced animal, the first lever-press will
occur shortly after the rat has been placed in the chamber, and will then
occur (initially) at a rate limited by the time required to press the
lever, retrieve the pellet, consume it, and return to the lever. As the
animal fills up on food pellets, the rate will decline, both because the
chain of behaviors will be executed less vigorously and because time spent
doing other things (usually between swallowing a pellet and returning to
the lever) will increase. Both changes will be far less marked during a
typical session if the rate of pellet delivery is limited to some low
value than if the animal can consume a great many pellets during that time.

If the food pellets are the reinforcers, and filling up on food pellets
results in a decline in behavior rate, what is the observed relationship
between the reinforcements and the behavior rate? Isn't it that the greater
the reinforcement rate is, the less the behavior is?

No, but close. The greater the number of reinforcers delivered, the less
the behavior is, assuming that the increase in error between CV and its
reference due to disturbance is small by comparison, on the same time scale.

You're talking now about an experienced animal, the case in which it is
said that continued reinforcement maintains or supports behavior that has
already been conditioned. However, the observations you describe entail a
negative effect of reinforcements on behavior, because increasing
reinforcements go with decreasing behavior as long as the contingency is in
effect. This can be seen most easily by artificially raising or lowering
the rate at which the reinforcers appear. The maximum behavior is observed
when the reinforcements aren't appearing at all. As behavior begins
producing reinforcers, the rate or amount of behavior declines. The steady
state is reached when the reinforcement rate ceases to increase and the
behavior rate ceases to fall.

Yes, the effect of reinforcers in this case is to reduce the rate of the
operant that produces them, and thus the rate of reinforcement. This effect
is termed "satiation."

This does not seem consistent with the general observation of "the
strengthening of behavior which results from reinforcement" (Skinner, 1953,
_Science and human behavior_ p. 65). What is actually observed seems to be
just the opposite effect.

You are returning to theory again. Don't forget that the effectiveness of
reinforcement has been shown to depend directly on the degree of deprivation
(in this example), and that deprivation level is declining as more and more
food is consumed. But if each pellet delivered is small and the rate of
delivery limited, little of this effect will be observed (i.e., the error
between CV and its reference will remain high throughout the session). So
long as lever presses continue to produce the reinforcer, lever presses
continue to occur: maintained behavior. I don't see a contradiction here.

If reinforcers are output-contingent consequences that tend to reduce the
error between a CV and its reference, then the operant search for the
reinforcer is the search for a variable that has a certain physical effect
on the CV. Thus the reinforcer and the CV are closely related, though
separate, concepts. The reinforcer is the means by which error between the
CV and its reference may be corrected. It will be effective only when the
variable we are labeling the "CV" is actually being controlled, error is
present between the CV and its reference, and the error is in the direction
such that production of the reinforcer will tend to reduce that error.

This is simply the PCT model, which I already understand. In the PCT model
the negative relationship between the variable you are calling the
reinforcer and the behavior rate is to be expected, and is explained by the
negative feedback loop. But we are talking about observations, not models.
My objection is that the description does not match the observations.

This is more than "simply the PCT model." What I have done is to relate to
the PCT model the phenomena to which the terms "reinforcer" and
"establishing operation" refer, and in a way that differs from your own
previous attempts to do so. (Earlier you said that the reinforcer was the
CV.) Now you claim that "the description does not match the observations."
How so?

When
one has shown that a given behavior is being "maintained" by a given
reinforcer (in operant language), one has identified something with this
property, without necessarily identifying what the CV is that is so
affected. However, by homing in carefully on the specific property of the
reinforcer that is the "effective ingredient," one can also converge on a
limited set of possible CVs; those that would be affected by the property
identified.

If an increase in variable R is observed to result in a decrease in
variable B, I fail to see why it is appropriate to say that R "maintains"
B. I could see saying that B "maintains" or "supports" R, when B is the
output of a control system and R is an input to it, and there is an
observable positive dependence of R on B. But a leak in a bucket does not
maintain the water level in the bucket; it causes the water level to
decrease. When a negative relationship is observed, "maintain" and
"support" are simply inappropriate terms.

The relevant observation is not the comparison between performance early in
the session and later, when significant error reduction has occurred, but
between performance when error is high and responses either do or do not
produce the reinforcer. Responding continues to occur when responses
produce the reinforcer (so long as there is not significant satiation) and
extinguishes when they do not. You can keep a rat pressing that lever nearly
continuously, all day long, if the rate at which food is received equals the
rate at which it is lost due to energy needs, so that error between the CV
and its reference remains high throughout.
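
A quick numerical check of this balance (a Python fragment with the pellet
value k1 and loss constant k2 invented for illustration): if the delivery
rate is chosen so that intake just equals loss, the CV, and hence the error,
stays put and pressing persists.

k1, k2 = 0.02, 0.01          # nutrient value per pellet, loss-rate constant (made up)
CV, CV_ref = 1.0, 3.0        # current and reference levels (made up)
R = k2 * CV / k1             # delivery rate at which intake equals loss
dCV_dt = k1 * R - k2 * CV
print(dCV_dt, CV_ref - CV)   # 0.0 and 2.0: the error never shrinks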

It seems to me that the place where you are most justified in using these
terms is in describing the overall picture of what happens when a
contingency is first established (with a naive animal) and when it is
finally disabled for good. What we observe (using the standard methods of
recording data in EAB) is that the animal at first shows little operant
behavior, and reinforcements occur only rarely. As time goes on, the
reinforcement rate rises as the recorded behavior rate rises, until finally
the behavior rate has come to some high value and so has the reinforcement
rate. As long as the contingency is enabled, the (relatively) high rates of
behavior and reinforcement continue. When the contingency is disabled
permanently, after a while we observe that the reinforcement and behavior
rates decline to some relatively low background level. This can be
interpreted as a theoretical reflexive positive effect of reinforcements on
behavior when the contingency is enabled, a description that seems to fit
both the initial onset of behavior and the subsequent extinction when the
contingency is disabled. Under that interpretation, it makes sense to say
that the reinforcement maintains or supports the behavior, because we're
talking about a positive relationship. We would expect the minimum behavior
rate to go with the minimum reinforcement rate, and the maximum with the
maximum. This is the opposite of the relationship that we would expect from
a control system defined as we've been assuming here.

You're getting into reinforcement _theory_ again, but I would argue that
this is consistent with a control model that includes reorganization, and
that this is the only proper model to use for comparison with reinforcement
theory.

However, and I don't want to leave this out before turning the mike back
over to you, these observations depend greatly on how the data are
recorded. The usual method is to record contact closures, the rate being
determined by dividing the total number of closures in a session by the
duration of the session. If, instead, we recorded contact closures only
when the animal was in position with its paws in the appropriate place, and
measured elapsed time only under the same conditions, we would find a very
different relationship between reinforcement rate and behavior rate. The
rest of the time, when the animal was roaming around in the cage, grooming,
sleeping, or licking various parts of the cage, we would not measure rate
of behavior or reinforcement.

In reinforcement theory, reinforcement is supposed to affect the
_probability_ with which the operant occurs, which is reflected in the rate
of responding. To the extent that the animal's time must be filled with
something (if only resting behavior), any increase in the probability of the
operant necessarily comes at the expense of a decrease in the probability of
one or more other activities. The observations to which you refer are not
inconsistent with this conception.

If we simply look at total behavior, whether it takes place in contact with
the lever or elsewhere, we will see that total behavior rate is a rising
function of deprivation, and that maximum behavior rate goes with the
minimum rate of reinforcement, as in the case of the experienced animals.
The behavior that is occurring initially could be called "exploration,"
moving from one place to another and doing various things. This behavior is
driven by the error signal as usual. When food begins to appear at one
place during this exploration, the error drops and exploration is
interrupted, to be replaced by a control loop in which the behavior
produces food. So the appearance of a given behavior being reinforced
results from a cessation of the exploration behavior -- the principle
behind reorganization. Is this more or less where you're headed?

Yes.

Regards,

Bruce

[From Rick Marken (970831.0720)]

Me to Bill Powers --

You're good. You're VERY good;-)

Bruce Abbott (970830.2210 EST) --

I can only surmise from this that you believe that Bill Powers
has laid an oh-so-clever trap for the poor unsuspecting Bruce
Abbott

No. He just made a great point (with your help). There is no
need to set traps for you; you're already caught in the trap
of behaviorism. Actually, Bill's trying to spring you!

which has now been sprung on that dimwit

You are not even close to being a dimwit. A dimwit could not
possibly have maintained his belief in the existence of
reinforcers and reinforcement as skillfully as you have over the
last several weeks. No, you are _definitely_ not a dimwit!

I am not out to defend reinforcement theory

I know.

But even if I were, it would be no trap: reinforcement theory's
prediction is the same as the PCT prediction.

I am quite certain that you believe this and that you will
continue to believe this. But I didn't want to say that
because it pisses Bill off; he thinks it means that I _want_
you to keep believing this. I don't. But I know that there
is nothing I can do about it. Bill still has hopes; that's
why I made my snide comment about him being a glutton for
punishment. All I can say is that I hope, for Bill's sake
(and mine), that I am dead wrong about this.

I'll explain later, when I've had a chance to lay the
proper foundation.

I can hardly wait. I think the proper foundation has already
been laid down: perceptual control theory. Have you got another
one?

Best

Rick


--

Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bill Powers (970831.2100 MDT)]

Bruce Abbott (970831.2010 EST)--

If the food pellets are the reinforcers, and filling up on food pellets
results in a decline in behavior rate, what is the observed relationship
between the reinforcements and the behavior rate? Isn't it that the greater
the reinforcement rate is, the less the behavior is?

No, but close. The greater the number of reinforcers delivered, the less
the behavior is, assuming that the increase in error between CV and its
reference due to disturbance is small by comparison, on the same time scale.

This is not quite it, either. It is not the total number of reinforcers
delivered that causes the CV to increase, but the rate of delivery. We
assume that the loss rate is proportional to the nutrient concentration, as
a first approximation. So the mean level of the CV will be the level at
which the loss rate equals the intake rate, and will come to a steady state
at a constant intake rate, rather than continually increasing.

The dynamics of the control system will depend on which controlled variable
you're talking about. The stomach-filling system, for example, would
accumulate the pellets that are eaten far faster than the stomach is
emptied, so the CV would be brought to its reference level quickly and
eating would stop. The mouth-filling control system would work on an even
shorter time-scale. The slowest system would be the nutrient control system.

You're talking now about an experienced animal, the case in which it is
said that continued reinforcement maintains or supports behavior that has
already been conditioned. However, the observations you describe entail a
negative effect of reinforcements on behavior, because increasing
reinforcements go with decreasing behavior as long as the contingency is
in effect. This can be seen most easily by artificially raising or
lowering the rate at which the reinforcers appear. The maximum behavior is
observed when the reinforcements aren't appearing at all. As behavior begins
producing reinforcers, the rate or amount of behavior declines. The steady
state is reached when the reinforcement rate ceases to increase and the
behavior rate ceases to fall.

Yes, the effect of reinforcers in this case is to reduce the rate of the
operant that produces them, and thus the rate of reinforcement. This
effect is termed "satiation."

You're using "reinforcers" here simply as a label for designating the food
pellets. It would not make sense to use it to designate the technical
definition, under which an increase in reinforcement produces an increase
in the observed frequency of behavior, because that is not what is
observed. Calling the reduction effect "satiation" merely puts a label on
this contradiction; it doesn't do away with it. This sort of usage is
tantamount to defining a reinforcer as something that increases the
probability of behavior, except when it decreases that probability.

This does not seem consistent with the general observation of "the
strengthening of behavior which results from reinforcement" (Skinner,
_Science and human behavior_ p. 65). What is actually observed
seems to be just the opposite effect.

You are returning to theory again.

No, I was just repeating Skinner's observation -- unless you mean that
"strengthening" is a theoretical term. I took it to be synonymous with
observing an increase in behavior rate.

Don't forget that the effectiveness of
reinforcement has been shown to depend directly on the degree of deprivation
(in this example), and that deprivation level is declining as more and more
food is consumed. But if each pellet delivered is small and the rate of
delivery limited, little of this effect will be observed (i.e., the error
between CV and its reference will remain high throughout the session).

The sign of the relationship is not affected by the size of the changes.
You seem to be arguing that if the food deliveries are small enough, an
increase in reinforcement will produce an increase in behavior. In the
system as we're discussing it, this can never happen.

So long as lever presses continue to produce the reinforcer, lever presses
continue to occur: maintained behavior. I don't see a contradiction here.

You're confusing the overall effect starting with a naive animal and going
through to extinction with the effect in an experienced animal while the
contingency is enabled. And you're using a transitive verb again without
designating the subject, a habit into which you've fallen because of
hanging out with a crowd that doesn't speak English very well. Subject
maintains object; the direct implication is a positive relationship. You
don't say that A "maintains" B if you find that decreasing A results in an
increase in B, and increasing A decreases B. The term "maintain" in this
context carries an implication of a positive sign of effect, as is shown by
the general statement that an increase in reinforcement increases behavior.

I wish you would get off your high horse about having your language
corrected and realize that your use of "maintain" is JARGON. In this case
it's especially pernicious because it allows you to talk about a reinforcer
as if it is having a positive effect on behavior, when it actually has a
negative effect.

If reinforcers are output-contingent consequences that tend to reduce the
error between a CV and its reference, then the operant search for the
reinforcer is the search for a variable that has a certain physical
effect on the CV.

I missed this the first time around. Here you are talking strictly about
the behavior of searching, not the behavior of pressing.

This is more than "simply the PCT model." What I have done is to relate to
the PCT model the phenomena to which the terms "reinforcer" and
"establishing operation" refer, and in a way that differs from your own
previous attempts to do so. (Earlier you said that the reinforcer was the
CV.) Now you claim that "the description does not match the observations."
How so?

Again: you confuse the behavior of searching for an action that will have
an effect on the CV with the behavior of carrying out that action to
control the CV, once it is found. These are two completely independent
control processes.

If you apply the Test to the reinforcement rate, you will find that it,
too, is a CV. In fact, it is simply a surrogate for the nutrient level,
covarying with it.

The relevant observation is not the comparison between performance early in
the session and later, when significant error reduction has occurred, but
between performance when error is high and responses either do or do not
produce the reinforcer. Responding continues to occur when responses
produce the reinforcer (so long as there is not significant satiation) and
extinguish when they do not.

Now you're speaking qualitatively about the process of searching for an
action that will have the effect of reducing error. This has NOTHING TO DO
with the relation between reinforcement and behavior once the correct
behavior has been found. You're talking about different control processes,
one of which varies the _kind_ or _place_ of behavior, while the other
varies the _amount_ of behavior, both toward the end of maintaining a CV at
a given level. And in this case I am using "maintaining" correctly.

You can keep a rat pressing that lever nearly
continuously, all day long, if the rate at which food is received equals
the rate at which it is lost due to energy needs, so that error between
the CV and its reference remains high throughout.

Yes, but even in that case, increasing the reinforcement will decrease the
behavior rate. This proves beyond doubt that the reinforcements are not
having a positive effect on behavior rate. If the behavior continues, it is
not because the reinforcements are making it continue. It is just the other
way around: the reinforcements continue because the continuing behavior is
making them occur. That isn't theory: it's direct observation.
...

In reinforcement theory, reinforcement is supposed to affect the
_probability_ with which the operant occurs, which is reflected in the rate
of responding.

"Probability" is just an intervening variable; I thought behaviorists were
against such things.

To the extent that the animal's time must be filled with
something (if only resting behavior), any increase in the probability of
the operant necessarily comes at the expense of a decrease in the
probability of one or more other activities. The observations to which
you refer are not inconsistent with this conception.

How could they be? It's a tautology. If you include "no behavior" as a kind
of behavior, then naturally any behavior occurs at the expense of some
other kind of behavior that could be happening instead -- even no behavior.
However, if you mean _operant_ behavior, then "no behavior" is not a kind
of operant behavior, and you can't use it as a wild card to make the
probability calculations come out right.
...................

If we simply look at total behavior, whether it takes place in contact with
the lever or elsewhere, we will see that total behavior rate is a rising
function of deprivation, and that maximum behavior rate goes with the
minimum rate of reinforcement, as in the case of the experienced animals.
The behavior that is occurring initially could be called "exploration,"
moving from one place to another and doing various things. This behavior is
driven by the error signal as usual. When food begins to appear at one
place during this exploration, the error drops and exploration is
interrupted, to be replaced by a control loop in which the behavior
produces food. So the appearance of a given behavior being reinforced
results from a cessation of the exploration behavior -- the principle
behind reorganization. Is this more or less where you're headed?

Yes.

Fine. Then let us separate once and for all the exploration process from
the bar-pressing process. The exploration process is clearly not
"maintained" by any reinforcements it produces: on the contrary, it is
immediately terminated when reinforcements begin appearing. And the process
that takes its place, producing food by executing a specific kind of
behavior in one place, is not "maintained" by the reinforcement it
produces; the reinforcement actually has a negative feedback effect on the
behavior, reducing the amount of behavior that takes place initially when
the reinforcement rate is the lowest. None of this is theoretical; it is a
plain statement of what is observed.

Best,

Bill P.

[From Bill Powers (970901.0400 MDT)]

Bruce Abbott (970831.2010 EST)--

Some loose ends in this post exist, which I failed to deal with properly. I
feel that the exchange is going off the track, with qualitative and
approximate arguments beginning to creep in. The informal meanings of words
are beginning to play a larger role, especially terms that have customarily
been used in behaviorism to assert a direction of causality (like maintain
and support) and to which I object for that reason _as interpretations_. If
we are trying to construct a precise description of phenomena, we must
stick with denotative and quantitative words as much as possible and
mathematics where we can.

At the point where the rails began to give way, we had established that the
reinforcer was to be defined as an observable variable just prior, in the
loop, to the CV, with the CV being a function of it. The CV itself
(nutrient level in the running example) is a hypothetical variable, not
observable in any ordinary behavioral experiment. When we talk about the
CV, or the reference signal, or the error signal, we are talking about a
model, not about observations.

I will go through the predictions that can be derived from the PCT model.

The particular CV we have chosen can be related to food intake in the
following way:

d(CV)/dt = k1*R - L

where R = reinforcements per unit time (reinforcement rate)
       L = loss rate per unit time.

The constant k1 absorbs the weight of the pellets and their nutritive value
per gram.

If the loss rate is proportional to the level of CV (L = k2*CV), a
reasonable first approximation, this equation leads to the steady-state
relationship (where CV has become constant, and therefore d(CV)/dt = 0),

k1*Rss = k2*CVss.

The value of CVss at that point will be

CVss = k1/k2*Rss

This tells us that the steady-state value of the CV is simply proportional
to the steady-state reinforcement rate. Thus the steady-state reinforcement
rate is a measure of the steady-state CV. The Test applied to reinforcement
rate (when the loop is closed) will yield the same results as the Test
applied to the CV.
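
A minimal numerical sketch of this relation (a Python fragment with
arbitrary constants and time step), integrating the intake/loss equation at
a fixed reinforcement rate:

# Integrate d(CV)/dt = k1*R - k2*CV at a fixed reinforcement rate R and
# confirm that CV settles at (k1/k2)*R, the steady-state relation above.
# Constants are arbitrary illustrative values.
k1, k2, R = 0.5, 0.1, 2.0
dt, CV = 0.01, 0.0
for _ in range(20000):                       # long enough to reach steady state
    CV += dt * (k1 * R - k2 * CV)
print(round(CV, 3), round(k1 / k2 * R, 3))   # both print 10.0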

When the appropriate establishing operation has been carried out, the CV
will be at a level CVmin, lower than the reference level CV' (which we
assume constant). The error will be (CV' - CVmin). CV will be declining
slowly at a rate k2*CV, since R is zero (the contingency has not yet been
enabled).

We assume that the behavior rate B is proportional to the error by

B = gain*(CV' - CV)

Note that at this point, just before the contingency has been turned on,
the behavior rate is at its maximum: Bmax = gain*(CV' - CVmin). This, of
course, is reasonable only if we are talking about an existing control
system, not about the search phase or the learning phase.

The contingency is turned on when CV has declined to the value that we
designate as CVmin.

To complete the system equations we must specify how R depends on B. The
reinforcement rate R is some function f of the behavior rate B:

R = f(B)

For our purposes now, we can simply assume a proportionality factor k3:

R = k3*B

The initial dynamic response of the system will be such that the error
declines from its initial value to a final value, CV increases from CVmin
to some steady-state value CVss, and the behavior declines from its initial
rate Bmax to some steady-state value Bss. All these changes will follow
negative exponential courses with a time constant of k1/(loop gain). Note
that loop gain is not "gain" in the equations below.

We can solve for the steady-state values of the variables by solving the
steady-state system equations simultaneously:

CVss = k1/k2*Rss
Rss = k3*Bss
Bss = gain*(CV' - CVss)

Solving for the behavior rate, we have

           gain * CV'
Bss = ----------------------
        1 + gain*k1*k3/k2

This looks different from our usual solutions because the _loop_ gain is
not all concentrated in the output function.

Note that k3 appears only in the denominator. This means that as the
contingency ratio decreases, the behavior rate will decrease, with the
other constants remaining the same, even if the output gain is very high.
Contingency ratios (as in FR schedules) correspond to 1/k3.

Note also that R, the reinforcement rate, has disappeared from the
equation. This shows that R is not an independent variable. In fact,
because we have no disturbance, the only independent variable is the
reference level CV'.

We can solve the same set of equations for the steady-state rate of
reinforcement:

         k3*gain*CV'
Rss = ------------------
       1 + k3*gain*k1/k2

Now the behavior rate has disappeared, showing that Rss depends only on the
reference level. Behavior rate is not an independent variable, either.

So we have the classical case of apparent causality being created when two
variables depend on a single third variable: Bss depends on CV' alone, and
Rss depends on CV' alone. There is, of course, a relationship between Rss
and Bss: it is given by

Rss = k3*Bss

This is one of the initial system equations. If the solutions for Rss and
Bss above are solved together to eliminate CV', exactly the same result
will be found (the hard way).

But that is the ONLY relation between Rss and Bss. Neither one can be
changed without a change in the other; the relation Rss = k3*Bss must
always remain true. This relation is just a description of the
environmental feedback connection, and is not a property of the organism.
The only causal relation between Rss and Bss is the effect of Bss on Rss.

When the system as a whole is nonlinear, and when the contingency is made
more complex, the above equations will in general not be solvable by
analytic means. However, they can be solved in simulation (if any solution
exists) and the major results will remain exactly the same. Rss and Bss
will each be a function of CV'. Rss will be determined by Bss, through some
function Rss = f(Bss), and that will be the ONLY causal relation between
them. The only _independent_ variable will still be CV'.

If anything "maintains" anything else, it is the setting of CV' that
maintains both Rss and Bss.
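
Since the equations can also be checked in simulation, here is a minimal
Python sketch of the linear closed loop above. All parameter values are
invented; the sketch only confirms that the simulated steady state matches
the analytic Bss and Rss, and that changing k3 moves Bss in the predicted
direction.

# Closed-loop simulation of the system equations above:
#   d(CV)/dt = k1*R - k2*CV,   B = gain*(CV' - CV),   R = k3*B
# (illustrative constants only)
k1, k2, k3, gain, CVref = 0.5, 0.1, 0.2, 4.0, 10.0

def simulate(k3, dt=0.001, steps=200000):
    CV, B, R = 0.0, 0.0, 0.0             # CV starts at its deprived minimum
    for _ in range(steps):
        B = gain * (CVref - CV)          # behavior rate driven by the error
        R = k3 * B                       # reinforcement rate set by the contingency
        CV += dt * (k1 * R - k2 * CV)    # intake minus loss
    return B, R

Bss, Rss = simulate(k3)
Bss_pred = gain * CVref / (1 + gain * k1 * k3 / k2)
print(round(Bss, 2), round(Bss_pred, 2))        # simulated Bss matches the formula
print(round(Rss, 2), round(k3 * Bss_pred, 2))   # and so does Rss = k3*Bss
print(simulate(k3 / 2)[0] > Bss)                # smaller k3 (leaner schedule): more behavior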

After we have finished the comparison of EAB terminology to the above
analysis, we can turn to the other two situations: search and learning. The
search phase involves trying different kinds of behaviors in different
places until some reinforcement is obtained. Then the learning phase
involves increasing the skill with which a specific behavior is produced in
a specific place to control the CV. This latter phase has the primary
result of raising the output "gain" factor.

Best,

Bill P.

[From Bruce Abbott (970901.1030 EST)]

Bill Powers (970831.2100 MDT) --

Rather than responding in detail to this long post, I'm going to attempt to
recover our original direction, which is not about reinforcement theory but
about the relationship between certain EAB terms/phenomena and control theory.

1. The empirical phenomenon of reinforcement is demonstrated by showing that
    the operant occurs at a relatively high rate when the operant produces the
    reinforcer and at a relatively negligible rate when the operant does not
    produce the reinforcer.

2. This phenomenon can be explained in control theory by assuming that
    (a) there exists a variable (CV) for which there is an internally-
        specified reference level (r);
    (b) the variable is currently well away from its reference level, or in
        other words, there is a certain error (e) between CV and r;
    (c) the effect of the reinforcer on e is to tend to reduce e.
    (d) the operant is the output of the control system.

    When the operant produces the reinforcer, the loop is closed. The
    operant will occur because e is large; to the extent that the reinforcer
    successfully reduces e, the rate of the operant will diminish,
    eventually reaching a steady-state rate dependent on the level of
    continuing disturbance to the CV.

    When the operant fails to produce the reinforcer, the loop is open
    and the operant will occur at maximum intensity (immediately following
    the switch to extinction). Observation shows, however, that the operant
    soon drops to prereinforcement levels, so another mechanism must be
    appealed to to explain why the operant does not continue at maximum
    rate as predicted by the simple control model. This is called saving
    the theory. (;-> (Just making a point, Bill. See below for
    reorganization.)

3. The phenomenon of satiation is observed when repeated application of
    the reinforcer is accompanied by a lowering in the rate of the operant.

4. Satiation can be explained in control theory by assuming that the
    reinforcer is cumulatively reducing the size of the error (e). For
    this to be the case, the rate of error reduction must exceed the rate
    of error production owing to disturbances.

5. The phenomenon of establishment is observed when some "establishing
    operation" must be performed before the reinforcer will act as such.

6. Establishment can be explained in control theory by assuming that the
    establishing operation produces error between the CV and its reference
    in the direction such that the reinforcer reduces that error. Only
    when there is error will the system produce the operant.

7. The phenomenon of trial and error learning is observed when the reinforcer
    is made contingent on performance of the operant, the establishing
    operation has been performed, and behavior which is initially quite
    variable becomes focused on the performance of the operant, so that the
    rate of the operant increases above baseline levels.

8. This phenomenon can be explained in control theory by assuming that
    there exists a "supervisory" system which varies the outputs of another
    control system until the CV is brought under control by the latter, and
    then sticks with that output selection.

9. The phenomenon of extinction is observed when an operant has been
    acquired, the contingency between the operant and reinforcer is then
    broken, and the rate at which the operant is performed declines below
    that observed when the contingency was in effect (usually to baseline
    levels).

10. Extinction can be explained in control theory via the same mechanism
    used to explain trial-and-error learning; when control of the CV is
    lost, the supervisory system again begins to vary the type of output
    of the lower system in an attempt to restore control; as a result,
    rate of the operant declines.

11. Punishment is observed when establishing a contingency between an
    operant and some consequence leads to the suppression of the operant
    (i.e., its rate is driven downward, the opposite of reinforcement.)

12. Punishment can be explained in control theory if it can be shown
     that the effect of the consequence on the CV is to increase the error
     between the CV and its reference. Such a consequence is termed a
     punisher. Error produced by emitting the operant can be reduced by
     suppressing the operant.

Please note that nowhere am I asserting that reinforcement theory is the
same as PCT (an idea that comes from confusing usage of EAB _terms_
referring to certain empirical facts, with espousal of the literal meaning
of some of those terms within reinforcement theory). I have only given
descriptions of the phenomena and the names they have been given; all
explanations provided are based on control theory, not reinforcement theory.
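
Here is a minimal Python sketch of points 1-4 above, with all numbers
invented: the operant rate is high while the error is large and responses
produce the reinforcer, and it declines as repeated reinforcers cumulatively
shrink the error (satiation). The supervisory system of points 8 and 10 is
not modeled.

gain = 2.0                  # output gain of the control system
effect = 0.02               # error reduction per reinforcer delivered
error = 1.0                 # error left by the establishing operation

for block in range(5):                   # five 10-minute blocks of the session
    rate = gain * error                  # operant rate (responses per minute)
    reinforcers = rate * 10              # reinforcers earned in the block
    print(block, round(rate, 2))         # rate declines block by block
    error = max(0.0, error - effect * reinforcers)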

[From Rick Marken (970831.2100)]

Bill Powers (970831.0936 MDT) --

If the food pellets are the reinforcers, and filling up on food
pellets results in a decline in behavior rate, what is the
observed relationship between the reinforcements and the
behavior rate? Isn't it that the greater the reinforcement
rate is, the less the behavior is?

Bruce Abbott (970831.2010 EST) --

No, but close. The greater the number of reinforcers delivered,
the less the behavior is, assuming that the increase in error
between CV and its reference due to disturbance is small by
comparison, on the same time scale.

I want to tell you how nice it is of you to explain to Bill
how his theory of control (where organisms select the intended
consequences of their actions) is also a theory of reinforcement
(where actions are selected by their reinforcing consequences).
Does control theory also explain the behavior of balls rolling
down inclined planes?

PCT. What a theory!

Best

Rick


--

Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bill Powers (970901.1021 MDT)]

Bruce Abbott (970901.1030 EST)--

Bill Powers (970831.2100 MDT) --

Rather than responding in detail to this long post, I'm going to attempt to
recover our original direction, which is not about reinforcement theory but
about the relationship between certain EAB terms/phenomena and control theory.

Excellent, and a good job of doing it. I'd like to expand on it a little,
particularly with regard to thinking of three phases rather than two.

Consider the naive adult animal first. From the standpoint of the behaving
system, the establishing operation has brought some variable well below its
normal reference level. At this point there is no control loop, because the
error has not been connected to any particular output process more than any
other.

Phase 1: Searching.

The error is large and is not decreasing. This brings into play a control
system that is activated by large nondecreasing errors. In the adult animal
this is some kind of search strategy, in which the animal moves from place
to place, sniffing, tasting, looking, and scratching at the environment.
There may be a random element in this strategy -- reorganization may be
involved -- but in the adult animal the strategy is most likely to involve
patrolling a set of locations in some fixed sequence, locations where the
error has been corrected before. This strategy continues until something is
found that ends to reduce the error. A decreasing error satisfies the
search system (feedback is rate plus proportional) and the search routine
slows in the vicinity of the location where sniffing, tasting, and
scratching combine to reduce the error signal somewhat. The sharper the
reduction in error, the more the search process is slowed.

Phase 2: Tuning or skill-learning.

The error begins to decrease. Now the search strategy is turned completely
off when the critical location is reached, and the animal's actions
repeatedly produce inputs that reduce the error. Since the effective action
has not been isolated from ineffective ones, however, the output gain is
relatively low and the error is in an intermediate range. Now a process
begins in which the weightings of the connection of the error signal to
various lower-level reference signals (sniff, taste, scratch, etc.) are
reorganized. As irrelevant weights are reduced toward zero and relevant
ones are increased, the loop gain increases, and _both_ the observed
behavior rate and the observed reinforcement rate increase.

Phase 3: Control.

Once the loop gain is high enough to keep the error relatively small, the
organism simply produces enough behavior to keep the error from increasing
by more than a slight amount for a short time. Now we observe the final
organization: artificially increasing the input that is reducing the error
will result in an immediate decrease of behavior, and artificially reducing
the same input will result in an immediate increase in behavior.
Reinforcement and behavior now change in opposite directions.

Phase 2 blends into phase 3 as practice continues. Actually, artificially
increasing or decreasing the reinforcement at any time during phases 2 or 3
will produce an opposite change in behavior, but if the system is left to
itself behavior and reinforcement rate will spontaneously increase together
as the loop gain rises, until maximum loop gain is reached. Phase 2 is the
explanation of the commonplace observation that an increase in
reinforcement goes with an increase in behavior.
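
A small Python sketch of this phase-2 point, reusing the steady-state
equations from the earlier post with invented constants: as the output gain
rises during skill-learning, B and R rise together, yet at any fixed gain an
artificial addition of reinforcement still lowers behavior.

k1, k2, k3, CVref = 0.5, 0.1, 0.2, 10.0

def steady_state(gain, extra_R=0.0):
    # steady state of d(CV)/dt = k1*(k3*B + extra_R) - k2*CV, with B = gain*(CVref - CV)
    CV = k1 * (k3 * gain * CVref + extra_R) / (k2 + k1 * k3 * gain)
    B = gain * (CVref - CV)
    return B, k3 * B

for g in (0.5, 1.0, 2.0, 4.0):
    B, R = steady_state(g)
    print(g, round(B, 2), round(R, 2))      # B and R climb together as gain rises

B_free, _ = steady_state(4.0, extra_R=1.0)  # add "free" noncontingent reinforcement
print(B_free < steady_state(4.0)[0])        # True: extra reinforcement lowers behavior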

When the contingency is turned completely off, the first effect, as the
error begins to increase, will be an increase in the behavior rate. As the
error continues to rise, however, it will become large enough (magnitude
plus derivative) to start the search routine going again. We will see
increasingly long periods of moving away from the location where the
control process took place and running through the search. Eventually the
search routine will take over completely, with the action associated with
control occurring only at the times when the search routine reaches that
place in the "patrol." This is the original condition as in phase 1.

There is, however, a difference. If the search routine takes over soon
enough, the reorganization involved in phase 2 (intermediate values of
error) will have little time to work, and the output weightings in the
control system will not be much changed. Therefore if we reconnect the
contingency, when the search routine reaches the effective location again,
good control will be exerted quickly, with less fine tuning than was
required the first time.

This much we know matches observations. Here, however, is a prediction that
may not have been looked for. In the repertoire of behaviors that can be
executed at any location (sniffing, looking, tasting, scratching -- the
"portable" behaviors), there is a possibility that the error signal remains
connected after extinction to all the reference signals for these control
behaviors with the same weightings, at least for a while. Since the error
remains large, we would then predict that as the search routine continues,
we would see, after extinction, the same relative amounts of these
behaviors as during fully-learned control. If scratching was the effective
action, we would see a great deal of scratching in each searched location,
relative to looking, sniffing, and tasting. The relative amounts of
behavior of each kind would be different from what they were in the naive
animal. These differences might decrease with time if reorganization of the
weights continues after extinction.

It is, of course, possible that when the search routine kicks in again, the
reference signal for the ingestion system is simply turned off. In that
case, the pre and post distributions of the different behaviors would be
the same.

This could be tested by making another of the "portable" behaviors the
effective one. For example, licking could be made the behavior that
produces reinforcements. In that case, licking would become the predominant
mode during phase 2, and after extinction would remain predominant at all
positions in the search routine, at least for a while.

Finally, the matter of recording data. If behavior rate is measured by
dividing total contact closures by total session time, it will be
impossible to distinguish between phase 1 and phases 2 and 3. If ONLY
paw-pressing on a lever is recorded, it will be impossible to distinguish
phase 2 from phase 3, although the increase in output gain can be seen.


-----------------------------
I think this more detailed analysis can be merged into your series of
points without major modifications.

Best,

Bill P.

[From Bruce Abbott (970901.1340 EST)]

Bill Powers (970901.0400 MDT) --

Bruce Abbott (970831.2010 EST)

Some loose ends in this post exist, which I failed to deal with properly. I
feel that the exchange is going off the track . . .

I had the same feeling; the result was my post of 970901.1030 EST. The
problem may be that we are trying to get to different destinations. You
seem bent on showing me how the reinforcement principle (a theoretical
concept) cannot account for the phenomenon labeled as reinforcement (an
empirical relationship), whereas I already know that to be the case.
Meanwhile, I keep trying to use control theory to explain the observations
given certain labels in EAB (e.g., reinforcement) and you keep telling me
that that's just PCT. I've been trying very hard to keep reinforcement
_theory_ out of it, but for some reason you keep turning the discussion back
to that topic. Is there some reason we can't deal with these two topics
_separately_? What I was hoping to do was to get your agreement as to the
control-theory definitions of the EAB terms I've been introducing, and
_then_ talk about the implications with respect to the theoretical concept
of reinforcement.

By the way, I understand and agree entirely with the analysis you presented
in this post, including the fact that neither the operant rate nor the
reinforcer rate is an independent variable. Does that help clear the way?

Regards,

Bruce

[From Bruce Abbott (970901.1510 EST)]

Bill Powers (970901.1021 MDT) --

Bruce Abbott (970901.1030 EST)

Rather than responding in detail to this long post, I'm going to attempt to
recover our original direction, which is not about reinforcement theory but
about the relationship between certain EAB terms/phenomena and control theory.

Excellent, and a good job of doing it. I'd like to expand on it a little,
particularly with regard to thinking of three phases rather than two.

Looks like I responded too soon; our posts are crossing in the mail. Good,
glad you liked it.

Consider the naive adult animal first. From the standpoint of the behaving
system, the establishing operation has brought some variable well below its
normal reference level. At this point there is no control loop, because the
error has not been connected to any particular output process more than any
other.

[etc.]

I like this: phase 1 establishes the control system by linking the right
output through environment to input; phase 2 fine-tunes the parameters, and
in phase 3, we have a well-oiled control system efficiently keeping its CV
under control.

If there is anything I would add, it is that reorganization (during
extinction) does not destroy the old control system that was honed to
perfection during phase 2. It is still available even after extinction is
complete, as shown by the fact that reacquisition of the operant following
restoration of the contingency is much more rapid than initial acquisition.
Additional evidence for this is that, when the old operant does occur during
the late phases of extinction, it is performed complete. For example, if an
FR-20 schedule had been in effect during acquisition, bursts of around 20
responses (probably slightly more) tend to occur, separated by wider
intervals of nonresponding.

Now that I look ahead to the end of your post, I see that these are the
phenomena you have predicted. Looks like we are thinking along the same lines.

Regards,

Bruce

[From Bruce Gregory (970903.1515 EDT)]

Rick Marken (970829.1500) --

Therefore, I find it much _clearer_ to have The Test described
as an attempt to discover controlled _perceptions_ rather than
controlled perceptual _signals_. Even better would be to say that
The Test is an attempt, by the Tester, to discover which of his
own perceptions correspond to the perceptions controlled by an
actor.

Just when I think you are hopeless, you say something brilliant.

Bruce