Martin's diagram

[From Bruce Abbott (951207.1330 EST)]

Bill Powers (951206.0655 MST) --

Martin Taylor 951206 13:10

    I haven't seen a response, except Bruce's, to my suggestion that
    there are two normally independent control loops involved in the
    experimental situation called "reinforcement."

I've been thinking about it.

Guys, pardon me if I sound a bit negative, but isn't this all obvious? I
started to write up a program about 12 months ago that included this
arrangement, about the time Bill was posting his OPCOND.PAS series of
models, but then decided just to assume that the food, once delivered, would
be eaten. For the purpose of the simulation we were developing, this was
sufficient. This is the same kind of simplification we have done in
tracking models, where the lower-level systems doing the mouse movements
aren't modeled but are simply assumed to be doing their jobs.

In this case you do want to model both main systems. For a rat experiment,
having no pellet present constitutes error in the system doing the
lever-pressing; lever-pressing produces (through the schedule of
reinforcement) a food pellet (eliminating the error in this system), which
is a precondition for eating. Eating reduces error a little bit in the
nutrient-control system and rapidly removes the food (time directly
proportional to size of pellet and inversely proportional to rate of eating;
rate of eating proportional to error in nutrient-control system unless
eating-rate has reached a maximum), thus restoring error in the system
involved in lever-pressing. In other words, eating disturbs the
pellet-perception control system while eliminating a condition that must be
true before eating food can occur or continue.

For a pigeon experiment, the only important difference is that the grain
magazine is raised for a fixed period; thus food-access is terminated at a
fixed time following presentation rather than being terminated by
consumption of the available food, as is the case with the rat.
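
To make the arrangement concrete, here is a minimal sketch of the two
loops in Python (the pigeon case would simply replace the
pellet-consumed test with a fixed-duration timer). Every name, gain,
and constant in it is an illustrative assumption, not something taken
from OPCOND.PAS or from the model we were developing:

MAX_EAT_RATE = 1.0     # ceiling on eating rate per time step (assumed)
PELLET_SIZE  = 3.0     # amount of food in one pellet (assumed)
RATIO        = 5       # fixed-ratio schedule: presses per pellet (assumed)

nutrient     = 0.0     # controlled quantity of the nutrient loop
nutrient_ref = 100.0   # its reference level
pellet       = 0.0     # food in the cup: the pellet-loop perception
presses      = 0       # presses accumulated toward the next pellet

for t in range(5000):
    nutrient_error = nutrient_ref - nutrient
    if pellet > 0.0:
        # Nutrient loop: eating rate is proportional to error up to a
        # maximum, so time to consume a pellet is directly proportional
        # to its size and inversely proportional to eating rate.
        eat_rate = min(0.05 * nutrient_error, MAX_EAT_RATE)
        eaten = min(eat_rate, pellet)
        pellet -= eaten        # eating removes the food...
        nutrient += eaten      # ...and reduces nutrient error a little
    elif nutrient_error > 0.0:
        # Pellet loop: no pellet present is the error that drives
        # pressing; the schedule converts presses into the next pellet.
        presses += 1
        if presses % RATIO == 0:
            pellet = PELLET_SIZE
    nutrient -= 0.01 * nutrient    # slow metabolic loss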

If you wish to extend this model even further, note that the rat can't press
the lever until it has approached the lever and made some kind of contact
with it in a way that will permit the lever to be moved downward, and that
it can't eat the pellet until it has approached the food cup and seized the
pellet. There is a definite sequence that must be followed, a series of
perceptions that must be "true" before the next control system can do its
job. In operant conditioning terminology, each completed act produces a
"stimulus" (perception) that "sets the occasion" for the next act. Whole
cycle from lever-press to eating and back to lever-press comprises a series
of such "discriminated operants." Thus, even a simple contingency between
some act and food actually implements what is termed a "chain" schedule,
although this term is usually reserved for schedules in which two or more
explicit "links" are programmed, as when pecking a green key is reinforced
on an FR-5 schedule, completion of which changes the key color to red, and
pecking the now red key is reinforced on a VI 15-s schedule with grain.
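
A toy sketch of such a chain, with purely hypothetical states and acts:
each act can run only once the perception produced by the previous act
is "true", so the acts fire in a fixed sequence.

def approach_lever(s): s["at_lever"] = True
def press_lever(s):    s["pellet_in_cup"] = True   # via the schedule
def approach_cup(s):   s["at_cup"] = True
def seize_and_eat(s):
    s["pellet_in_cup"] = False                     # food is consumed
    s["at_lever"] = s["at_cup"] = False            # back to the start

# Each link: (perception that "sets the occasion", the act it enables)
chain = [
    (lambda s: not s["at_lever"] and not s["pellet_in_cup"], approach_lever),
    (lambda s: s["at_lever"] and not s["pellet_in_cup"],     press_lever),
    (lambda s: s["pellet_in_cup"] and not s["at_cup"],       approach_cup),
    (lambda s: s["pellet_in_cup"] and s["at_cup"],           seize_and_eat),
]

state = {"at_lever": False, "at_cup": False, "pellet_in_cup": False}
for step in range(8):              # two full trips around the cycle
    for occasion, act in chain:
        if occasion(state):
            act(state)
            break                  # one act per time step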

The way you're both talking, you sound as though you believe you've come up
with some fundamental new insight into the nature of an operant conditioning
experiment. Am I missing something? Or is what I thought so obvious as not
to be worth mentioning (given that it was not required for the simulations
we were developing) really new to you?

Puzzled,

Bruce

[Martin Taylor 951207 13:40]

Bruce Abbott (951207.1330 EST)

On two control loops involved in reinforcement.

    Guys, pardon me if I sound a bit negative, but isn't this all obvious? I
    started to write up a program about 12 months ago that included this
    arrangement, about the time Bill was posting his OPCOND.PAS series of
    models, but then decided just to assume that the food, once delivered,
    would be eaten.

    The way you're both talking, you sound as though you believe you've come
    up with some fundamental new insight into the nature of an operant
    conditioning experiment. Am I missing something? Or is what I thought so
    obvious as not to be worth mentioning (given that it was not required for
    the simulations we were developing) really new to you?

I don't know whether it was obvious to you that the critical aspect
of a reinforcement experiment was the modification of the environmental
feedback path of one control loop by a normally irrelevant action of an
arbitrary other control loop. It was far from obvious to me, until I
thought about what you and Bill P were arguing about. Both of you, and
Rick in his occasional interjections, seemed to be talking about just one
loop, in which the "reinforced" action was a component, and in which the
functional effect of the action was to reduce error in the reinforcer loop.

I think your discussions might have been easier to comprehend, and the
critical differences between the reinforcement and PCT approaches easier
to see, if you had made it clear to the naive (including me) that it is
the changes in the environmental feedback path of the "reinforcer" loop
that matter, not the existence (or otherwise) of error in it. In
retrospect, it does seem obvious, in that the "reinforced"
action can be anything of which the subject is capable, regardless of whether
it has any everyday-life association with the reinforcer, regardless of
the relative perceptual levels in the hierarchy of the "reinforcer" and
the "reinforced", and regardless of the fact that execution of the
"reinforced" action has zero effect on the error level of the perception
of the reinforcer.
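
To make the point concrete, here is a minimal sketch (every name and
number in it is an illustrative assumption, nobody's actual model). The
"reinforced" action leaves the error in the reinforcer loop untouched;
its only effect is to open that loop's environmental feedback path:

nutrient, nutrient_ref = 0.0, 100.0
path_open = False          # state of the environmental feedback path

def arbitrary_action():
    # Any act the subject is capable of: a press, a peck, a turn.
    # Note that it leaves nutrient, and hence nutrient error, untouched.
    global path_open
    path_open = True       # its sole effect: the feedback path opens

for t in range(200):
    error = nutrient_ref - nutrient
    if not path_open and error > 0.0:
        arbitrary_action()                   # emitted while the path is blocked
    elif path_open and error > 0.0:
        nutrient += min(0.05 * error, 1.0)   # ordinary control, now possible
        path_open = False                    # one delivery, then blocked again

(Delivery by injection, as in my question below, would amount to
collapsing the two branches into one, with arbitrary_action itself
incrementing nutrient directly.)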

Even though you say the switched-feedback-path configuration was obvious,
in this new message you still seem to say, at least by implication,
that the reinforced action is part of a control loop at a lower level than
the reinforcer loop:

    This is the same kind of simplification we have done in
    tracking models, where the lower-level systems doing the mouse movements
    aren't modeled but are simply assumed to be doing their jobs.

I now understand that you don't mean this implication, and that the
reinforcer can be at a much lower level than the reinforced action.

Sorry to have been so dense. But even so, perhaps you can answer the
question I asked: whether you see any difference between the situation
in which the "reinforcer" nutrient is delivered by the intentional
action of the subject once the environmental feedback path permits it,
and the situation in which it is delivered directly as a consequence
of the reinforced action, say by injection?

Never mind. One learns better how a wheel works if one reinvents it, as
opposed to assuming that someone else can tell you when you need to know.
Ending up with the same old wheel doesn't mean the exercise was a total
waste of time. And I'm glad to know that my wheel does have the configuration
you were thinking of all the time, because it will make your future
messages easier to understand.

At least it wasn't a total waste of _my_ time.

Martin