Let's Get Experimental

[From Rick Marken (951206.0930)]

Thanks a ton to Bill Powers (951206.0530 MST) for taking the time to say
exactly what I would have said (though Bill said it better, of course) in
reply to Bruce Abbott (951205.1745 EST), Chris Cherpas (951205.1801 PT),
Samuel Saunders (951205:22:00:58 EST) and even Martin Taylor (951205 13:10)
(who always seems to be willing to jump in on the wrong side of an
argument);-). This gives me time to make a proposal.

I think all parties to the reinforcement/control theory debate can agree
that the best way to compare these models is in terms of how well each
accounts for the data obtained in operant conditioning experiments. I think
we can also agree that, so far, we have not found any operant data in the
literature that can be used for this purpose (not enough data is available,
the quality of the data is poor, the wrong data was reported, the data was
collected under extreme or poorly defined conditions, etc.).

What I propose is that we build our own operant conditioning experiment: one
that uses a human subject and can easily be run on a computer (PC and Mac).
It would, of course, have to be one that reinforcement mavens (like Abbott,
Cherpas, and Saunders) agree is a "real" operant conditioning experiment.

Once we agree on the experiment, I propose that Bill P. write it up in Pascal
for PC users while I write it (in parallel) in HyperCard for Mac users.

I also propose that, before any data is collected, the reinforcement
theorists provide a reinforcement model of the behavior expected in the
experiment; the control theorists will, of course, have to do the same with
control theory. These models will be written, along with the experiment, by
Bill and Rick. This way, we can see how the models perform before we see how
the subjects perform -- and make a true prediction of behavior.

Once we have agreed on the experiment and the models we can start collecting
data (from ourselves and, if we have any, our friends). Then we can compare
the actual behavior to the behavior predicted by the two models.

Finally, we can report our results at the Behavior Analysis (or whatever it
is) meeting in SF this summer. If the reinforcement model does better than
the control model then there we jolly well are, aren't we? We will have to
reject control theory as a model of operant behavior. If, on the other hand,
the control model does better than the reinforcement model, then there they
jolly well are, aren't they? They will have to reject the reinforcement model
of operant behavior.

I propose the following, very simple human operant conditioning experiment:

The subject presses the mouse button (B) in order to see a brief
exposure of a picture on the screen (R); the appearance of the picture is a
consequence of mouse presses. The picture will remain on the screen for only
a short time (say, 1/2 sec?) before it is "consumed".

The picture has to be something that is "reinforcing" to the subject (in
ordinary language, it has to be something the subject wants to see). I
suggest just asking the subject to try to keep a simple design (circle?) on
the screen for as long as possible. But Gary Cziko has some pictures that the
male subjects might find rather reinforcing -- so we don't have to tell the
subjects what to want (and I bet my wife or daughter would be happy to try
to find some pictures that the female subjects would find reinforcing).

The program would make it possible for the user to determine how button
presses (B) are related to reinforcements (R); this is the reinforcement
"schedule". I think it would be best to start with simple ratio schedules,
for example, R occurs after N presses, where the experimenter can set N. We
would have to decide whether presses during the occurrence of the reinforcer
should be counted toward the schedule requirement; but the reinforcement
mavens can tell us the "correct" choice here.
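
Just to make the schedule bookkeeping concrete, here is a rough sketch in
Python (standing in for the Pascal and HyperCard versions); the names, and
the choice to ignore presses made while the picture is on, are only my
guesses pending the mavens' ruling:

    # Minimal fixed-ratio (FR-N) schedule bookkeeping.
    # Assumption: presses made while the picture is on screen do NOT
    # count toward the next ratio requirement.

    class FixedRatioSchedule:
        def __init__(self, n):
            self.n = n         # presses required per reinforcement (FR-N)
            self.count = 0     # presses since the last reinforcement

        def press(self, picture_on_screen):
            """Register one button press; return True if R is delivered."""
            if picture_on_screen:
                return False   # press ignored while R is being "consumed"
            self.count += 1
            if self.count >= self.n:
                self.count = 0
                return True
            return False

On FR-5, for example, every fifth counted press would return True and
trigger a picture exposure.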

The program should also make it possible to introduce disturbances; I suggest
a disturbance to the "size" of the reinforcement: how long the picture stays
on before it's "consumed". Without disturbance the picture could always be
on for 1/2 second; with disturbance, each presentation of the picture could
last anywhere from 1/10 to 2 seconds (maybe?).
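
As a sketch only (the 1/10 to 2 second range is just the guess above):

    import random

    def reinforcer_duration(disturbed):
        """Return how long, in seconds, the picture stays on the screen."""
        if not disturbed:
            return 0.5                   # fixed 1/2-second exposure
        return random.uniform(0.1, 2.0)  # disturbed: 1/10 to 2 seconds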

So what do you say? Can we try to develop such an experiment? Does the
experiment I described seem like a real operant conditioning experiment to
you operant experts? If not, what should we do to make it a real operant
conditioning experiment -- one that could be used to test reinforcement (and
control) theory?

Best

Rick

[Martin Taylor 951206 13:10]

Rick Marken (951206.0930)

...and even Martin Taylor (951205 13:10)
(who always seems to be willing to jump in on the wrong side of an
argument);-)

I know why you say that. It's because I so often agree with you.
As I do on this occasion as well.

However, I think I must have missed a message somewhere, because you
prefaced this quote with:

Thanks a ton to Bill Powers (951206.0530 MST) for taking the time to say
exactly what I would have said (though Bill said it better, of course) in
reply to ...

I haven't seen a response, except Bruce's, to my suggestion that there are
two normally independent control loops involved in the experimental
situation called "reinforcement." Bill has been concentrating (in what I've
seen) on the control loop in which the food pellet reduces error. If he
has commented on the proposition that the function of the "reinforced
behaviour" is to modify the environmental feedback function in that loop,
I haven't seen it.

If one is to make a PCT prediction of what happens in a "reinforcement
experiment," I would have thought the first thing to do would be to
describe a plausible set of control loops that are involved, and then
to determine, quantitatively if possible, but qualitatively if not, how
those control loops should behave with the disturbances the experimenter
will try to use. So far, I haven't seen that done, though I made a first,
perhaps naive, attempt. Let me try again, a bit more diagrammatically.

         |some reference             |"fullness" reference
         | signal                    | signal
         |                           |
    -----O-----                 -----O-----
    |         |                 |         |
perceptual  output          perceptual  output
  input    function           input    function
 function     |              function     |
    |    actions to be         |          |
    |    "reinforced"          |  switch  |
   CEV----<----+              CEV---\---<-+
    |           \                   |
    side-effects \                  | switching mechanism
                  \------->---------+
                           |
                           | disturbance introduced by
                           | experimenter (e.g. food
                           | deprivation)

The left loop represents something like "key pressing". The right-hand loop
is where the "reinforcer" is involved. The experimenter (or Nature) arranges
that when "key pressing" actions occur, they have a side-effect that gives
the "reinforcing" loop a mechanism whereby it can influence its perceptual
error. The side-effect is irrelevant to the "reinforced" control loop,
and the experimental conditions are chosen so that it would normally be
irrelevant to the "reinforcer" control loop. The experimenter sets up
the apparatus so that the "reinforced" actions include a side-effect that
is important to the control of the "reinforcer." Specifically, it eases
or allows the subject's control of the perception in the "reinforcer" loop.
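
To make this concrete, here is a toy discrete-time simulation of the two
loops; every gain, reference level, and the switch rule are illustrative
assumptions of mine, not quantitative claims about either model:

    # Toy simulation of the two-loop structure diagrammed above.
    # All numbers (gains, references, switch threshold) are assumptions.

    def simulate(steps=100):
        press_ref = 1.0      # left loop: reference for "pressing" perception
        fullness_ref = 10.0  # right loop: reference for "fullness"
        press_out = 0.0      # integrating output of the left loop
        fullness = 0.0

        for t in range(steps):
            deprivation = 0.1  # experimenter's standing disturbance

            # Left ("reinforced") loop: integrating control of its own CEV.
            press_perc = press_out
            press_out += 0.3 * (press_ref - press_perc)

            # Side-effect of pressing: it operates the switch in the right
            # loop's environmental feedback function.
            switch_closed = press_out > 0.5

            # Right ("reinforcer") loop: its output can reduce error only
            # while the switch is closed.
            full_out = 0.5 * (fullness_ref - fullness)
            fullness += (full_out if switch_closed else 0.0) - deprivation

            if t % 20 == 0:
                print("t=%3d  press=%5.2f  fullness=%6.2f"
                      % (t, press_perc, fullness))

    simulate()

With the switch open, "fullness" just drains under the disturbance; once
pressing closes the switch, the right loop can oppose the disturbance, which
is exactly the sense in which the "reinforced" action matters to the
"reinforcer" loop.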

If the "reinforcer" control loop involves the reduction of some intrinsic
error, then it reduces the rate of reorganization, thereby stabilizing
the "reinforced" loop in comparison to other loops that might have had
incompatible actions.

All the situations I've seen discussed in the "reinforcement" debate seem
to have this same structure. The particular facet that seems to me to
be important is that the perception controlled in the "reinforced" loop
is not an element of the perceptual input function (PIF) of the "reinforcer" loop. The illuminated
key is not part of the "fullness" perception, and does not become part of
it during the "reinforcement" procedure.

Despite this (to me) obvious two-loop structure, much of the discussion
has treated the situation as if there were only a single control loop
to be discussed, the "fullness" control loop. The place of the actions
being "reinforced" has been left unstated, so far as I have seen, but
implicitly it has been taken as part of the action set that is the output
of "fullness" control. Eating the pellet/seed is a lower-level perceptual
control that is part of "fullness" control, but the key peck(s) that
trigger the machinery to dump a pellet/seed seem to me to have quite a
different status.

Is this so screwy? Does it provide an opportunity for some kind of prediction?

Inasmuch as qualitative changes in behaviour often relate to reorganization,
it seems only reasonable to suggest that getting pigeons to do abnormal
things such as pecking illuminated keys, or walking in figure-8s, also
might involve reorganization. If so, then experimental predictions must
be predictions about how reorganization proceeds.

And that brings us back to Thorndike's cats, doesn't it? They always jump
in when they smell a rat on the wrong side of the argument, where the
grass is always greener, and that puts the cat among the pigeons in the box:-)

Martin