a reinforcement story

[From Bill Powers (951223.0130 MST)]

Samuel Saunders (951222:14:46:54 EST) --

My first inclination, after reading this post, is to agree with you when
you say "I would like to spend my time working on PCT models." But once
again I yield to temptation.

     Let us assign a value to each behavior possible in some situation.
     We can then express the probability of some behavior Ai in terms of
     the values:

         P(Ai) = v(Ai) / SUMx v(Ax).

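To make the quoted rule concrete: it just normalizes the "values" into
probabilities. A minimal sketch in Python, with the values invented for
illustration:

    # Saunders's rule: P(Ai) = v(Ai) / SUMx v(Ax), with made-up values.
    values = {"lever_press": 2.0, "grooming": 1.0, "eating": 5.0}
    total = sum(values.values())
    probs = {a: v / total for a, v in values.items()}
    print(probs)  # {'lever_press': 0.25, 'grooming': 0.125, 'eating': 0.625}
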
Suppose there are two behaviors, running in a wheel (A1) and eating food
(A2) (non-contingent; just two different behaviors). To speak of the
probabilities of these behaviors is to say that for any observation at
any instant, there is a probability P1 that behavior A1 will be
occurring, and P2 that A2 will be occurring, where P1 + P2 = 1. If the
probability of each is 0.5, we would expect that any sampling of the
behavior would be equally likely to reveal A1 or A2. This, however, is
false: A sample that reveals A1 will continue to reveal A1 with a very
high (0.99) probability until some time has passed; then the probability
of finding A2 will begin to climb until A2 occurs, after which
successive samples will reveal A2 with a high probability for some time.
In short, the switching is more regular than random, and "probability"
is an inappropriate and misleading measure.
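
A simple simulation makes the point; this sketch assumes, purely for
illustration, that the animal switches behaviors about once per 50
samples:

    import random

    # Two behaviors occurring in bouts rather than at random. Marginal
    # P(A1) comes out near 0.5, yet each sample strongly predicts the
    # next one -- the "probability" hides the bout structure.
    random.seed(1)
    mean_bout = 50                       # hypothetical average bout length
    state, seq = 0, []
    for _ in range(100000):
        seq.append(state)
        if random.random() < 1.0 / mean_bout:
            state = 1 - state            # switch to the other behavior
    p_a1 = seq.count(0) / len(seq)
    repeat = sum(a == b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)
    print(p_a1)    # ~0.5: the marginal "probability" of A1
    print(repeat)  # ~0.98: probability the next sample repeats this one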

     Data from Skinner (1938) The Behavior of Organisms (p.348) show
     that a rat deprived of food except in an apparatus ate pellets,
     after a brief initial burst, at a rate of 1 pellet a minute. When
     eating was interrupted by making pellets unavailable for a 13 min
     period 25 min into the session, rate of pellet consumption went to
     about 3 per minute for several minutes before returning to the 1
     per minute rate.

Here is a graph of Skinner's "data":

             4 |
               |
               |
             3 |                                     *
Consump-       |
tion           |
Rate         2 |                                      ....?..
               |
               |
             1 | * * * * * * * * * * * *               * * * *
               |
               |
    ***********0------------------------************----//----
        -10             10        20        30        40
                         Elapsed time, min

The decay of consumption rate with time after the initial increase was
not reported quantitatively, nor was the "initial burst." So we have a
grand total of one point on this graph showing the effect of the
deprivation period. This is data?

     (This is cited to show two kinds of deprivation: deprivation of
     eating for a long period raises the probability of eating, and
     'relative deprivation,' by interfering with the resulting rate,
     produces an additional increment.) Evidence can be marshalled that
     this is typical of behavior in general: deprivation produces an
     increment in the behavior when the opportunity to perform it is
     first introduced. This can be treated as a change in the relevant
     v(Ax), as long as the Ax are independent.

Yes, but why should it be treated that way? What is "value," other than
the change in behavior rate? And what is there in the "data" to suggest
that different phenomena are present when the time of deprivation is
different? There is an unspecified "initial burst" after a long
deprivation of unspecified duration, and a burst that starts out at 3
ppm after a shorter one. I see nothing to indicate two kinds of
deprivation.

This observation shows that the initial effect of a reduction in
reinforcement rate (to zero) is to produce a large increase in behavior
rate. That is certainly an oddity, considering the assumed properties of
reinforcement as opposed to extinction. The argument says that behavior
rate increases because the value of the behavior increases at the
termination of deprivation. And how do we know that the value increases?

From seeing the behavior rate increase. "Value" is observable only
through observing the effect it is supposed to produce. There is no
possibility of verifying that the change in behavior rate is due to a
change in value.

     When opportunity to engage in a behavior is made contingent on
     another behavior, a base response B must be performed in order to
     perform a contingent response C. So, if B is much less likely than
     C, B must be performed more than it would be without the
     contingency in order for C to be performed as much as it would be
     without the contingency.

Why should C be performed as much with as without the contingency? Isn't
"in order to" a forbidden expression in behaviorism? This comes close to
proposing that there is a reference level C' for C, and that the
behavior will be varied until C = C'. This suggests that it is the
difference, C' - C, that determines the rate of behavior. And that is
control theory, not reinforcement theory.
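
Read as control theory, that is a one-line law. A minimal sketch, in
which the names, the gain, and the clipping at zero are all my
assumptions:

    # Behavior rate driven by the error C' - C, not by C alone.
    def base_behavior_rate(c_ref, c, gain=2.0):
        """Rate of the base behavior as a function of the error signal."""
        return max(0.0, gain * (c_ref - c))

    print(base_behavior_rate(10.0, 4.0))   # large error -> high rate: 12.0
    print(base_behavior_rate(10.0, 10.0))  # zero error -> behavior stops: 0.0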

     The v(Ax) cannot be measured directly, but only relative to a
     context that offers other alternative behaviors.

It's worse than that: the v(Ax) can't be measured independently at all.
The only way to measure them is by observing the relative behavior rates
or time allocations. Given the observed rates or allocations, the
relative v-values are simply proportional to the relative rates or
allocations, by definition. You have an algebraic identity, not a system
equation.
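
The identity is easy to exhibit numerically. In this sketch the
"observed" rates are made up; the point is that any set of v-values
proportional to them fits equally well, so the overall scale of "value"
is unmeasurable in principle:

    # Any v proportional to the observed rates satisfies
    # P(Ai) = v(Ai) / SUM v(Ax); the scale factor k drops out.
    rates = [0.2, 0.5, 0.3]              # hypothetical observed rates
    for k in (1.0, 7.3, 100.0):
        v = [k * r for r in rates]
        p = [vi / sum(v) for vi in v]
        assert all(abs(pi - ri) < 1e-12 for pi, ri in zip(p, rates))
    print("every scale k fits the observations equally well")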

    Reinforcement:
       v1(B)/v(M) = b * v0(B)/v(M) + c * v1(C)/v(M)
    Deprivation:
       v1(C)/v(M) = v0(C)/v(M) + h/v(M) * (1-t)
    setting h' = h/v(M):
       v1(C)/v(M) = v0(C)/v(M) + h' * (1-t)

These are simply manipulations of the original identity.

     Let us consider the proposed Marken experiment. The contingent
     response is viewing pictures for a fixed length of time p, while
     the base response is pressing the mouse button, which must be
     pressed n times to view a picture.

As I recall, Rick did not say that button-pressing could not continue
while the picture is viewed. Viewing the picture and pressing the button
are not mutually exclusive activities. Come to think of it, consumption
of food and bar-pressing are not mutually exclusive activities if the
food pellet can be held in the mouth: only the act of seizing the food
requires momentary interruption of the bar-pressing.

I think that a human being would figure out that the picture could be
maintained continuously on view by doing all but one of the required
presses while the picture was showing, then doing the remaining one as
soon as the picture disappeared. Unless, of course, you deliberately
arranged for presses to be ineffective while the picture was on, in
which case your whole analysis would be a put-up job, because you, not
the subject, would be determining when the switch from one behavior to
the other could occur.
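
The arithmetic of that strategy is simple. A sketch with invented
timings (n presses buy p seconds of picture; each press takes t_press
seconds):

    # Bank n-1 presses while the picture is on, then make the final
    # press the instant it goes off. All timings are hypothetical.
    n, p, t_press = 10, 5.0, 0.2
    assert (n - 1) * t_press <= p    # the banked presses fit in one viewing
    dark_per_cycle = t_press         # picture off only during the last press
    duty_cycle = p / (p + dark_per_cycle)
    print("picture visible %.0f%% of the time" % (100 * duty_cycle))  # ~96%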

     This all appears to have a hidden control model inside. The v0
     terms particularly suggest reference levels.

The control model is pretty well hidden.

     I would like to spend my time working on PCT models, and maybe even
     experiments, rather than trying to put reinforcement thinking into
     reasonable form for comparison and contrast. I volunteered to do
     this, however, so I will continue to do it. I think it is
     important to represent the reinforcement view as accurately as
     possible, but I don't assert that view myself, so please try to
     keep comments on a scientific rather than personal level.

Right, nothing personal about it, but the above analysis is crap. It's
just ringing the changes on a single algebraic expression, and from that
you will never get an analysis of the system.

-------------------------------
The basic rule of systems analysis is that every variable in a system
must be accounted for, as one of these three types:

  a. An independent variable that can be arbitrarily altered from
outside the system.

  b. A system constant, which remains the same under all conditions.

  c. A dependent variable, which is a stated function of other system
variables.

A solution of the system equations shows one of the dependent variables
as a function of independent variables and system constants alone. If
more than one dependent variable appears in an expression, that
expression is not a complete solution of the equations.

Another basic rule is that all dependencies are treated as one-way: that
is, if y = f(x), varying x will alter y, but varying y will not alter x
backward through the same function f. If there is a mutual effect
between x and y, each effect must be separately represented:

   y = f(x)
   x = g(y)

In general, g is not the inverse of f, although in special cases it may
be. That is how you know you have a system, and not just a single
cause-effect relationship.
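
Both rules show up in a toy closed-loop system, sketched here with
linear functions chosen only for illustration. Let p = o + d (the
environment: output plus disturbance) and o = k*(r - p) (the organism:
gain times error). Then r and d are independent variables, k is a
system constant, and p and o are dependent variables; solving the two
equations simultaneously gives p = (k*r + d)/(1 + k), a function of
independents and constants alone:

    # Two one-way dependencies, y = f(x) and x = g(y), solved together.
    k, r, d = 10.0, 5.0, 2.0           # system constant; independent variables
    p = (k * r + d) / (1 + k)          # closed-form solution for dependent p
    o = k * (r - p)                    # then the other dependent variable, o
    assert abs(p - (o + d)) < 1e-12    # both equations hold simultaneously
    print(p, o)                        # p ~ 4.73, o ~ 2.73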

If a schedule determines that C = f(B), it is not true that
B = f^-1(C), in terms of causation -- although it is true algebraically.
It is true that if you know C, you can deduce what B must have been, but
this does not mean that on a fixed-ratio-10 schedule, if you present 1
reinforcer you will get 10 presses. It means only that if 1 reinforcer
is presented, 10 presses must have occurred. In general, the dependency
of number of presses on number of reinforcements will have a different
form, and will involve functions of time.
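
A sketch of that one-way reading for a fixed-ratio-10 schedule (the
function name is mine):

    # The schedule is a one-way function: C = f(B) = B // 10 on FR-10.
    def reinforcers_delivered(presses, ratio=10):
        return presses // ratio

    print(reinforcers_delivered(37))   # 3 reinforcers from 37 presses
    # At the instant the 3rd reinforcer drops, B was exactly 30; sampled
    # later, C = 3 only brackets B (30..39). Either way, presenting a
    # reinforcer does not produce ten presses: f has no causal inverse.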

Do you think that the manipulations above conform to these rules?
-------------------------------
I agree that it would be more interesting to work on PCT models. I don't
hold you to your promise.
-----------------------------------------------------------------------
Best,

Bill P.