Arthur? (reinforcement)

[From Bruce Abbott (951205.1715 EST)]

Martin Taylor 951205 13:10 --

An experimenter observes that some definable action (e.g. pecking at a key)
occurs with some baseline frequency or probability, but when the action is
"reinforced" it occurs with higher frequency or probability (or lower, if
the reinforcement is "negative"). As far as I can determine, that's a fair
description of the reinforcement process, isn't it?

Except that what you labeled as negative reinforcement is called
"punishment" in today's terminology.
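To make the terminology concrete, here is a minimal Python sketch, assuming we have a response rate measured without the contingency (baseline) and one measured with it. The function name `classify_consequence`, the responses-per-minute units and the 5% `tolerance` are illustrative assumptions, not anything from the experiments under discussion. A rise in rate is labelled reinforcement and a fall is labelled punishment, following the usage above.

```python
def classify_consequence(baseline_rate: float, contingent_rate: float,
                         tolerance: float = 0.05) -> str:
    """Label the effect of making a consequence contingent on an action.

    baseline_rate   -- responses per minute with no contingency (assumed units)
    contingent_rate -- responses per minute when the action produces the consequence
    tolerance       -- hypothetical fractional change treated as "no effect"
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive for a relative comparison")
    change = (contingent_rate - baseline_rate) / baseline_rate
    if change > tolerance:
        return "reinforcement"  # the action became more frequent
    if change < -tolerance:
        return "punishment"     # the action became less frequent
    return "no effect"


if __name__ == "__main__":
    print(classify_consequence(10.0, 20.0))  # reinforcement
    print(classify_consequence(10.0, 5.0))   # punishment
```

The relative change, rather than the raw difference, is used so that the same tolerance applies whether the baseline rate is high or low.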

"Reinforcement" occurs when something is presented to the animal that is
not a normal part of the behaviour being reinforced (!?*& This is such an
unnatural way of talking *&#%). The reinforcer might be food, but the
reinforced actions might be pecking at a key when it is illuminated and
not when it isn't. The key (!) point is that there is nothing in the usual
environment of the subject that makes the reinforced actions connect with
the reinforcer. Pigeons in the wild don't have to go around looking for
illuminated keys to avoid starving.

I think you are making an artificial distinction here. Pigeons DO have to
go around looking for SEEDS to avoid starving; if the pecks in a given
location produced nothing but inedibles, the pigeon would soon stop pecking
there. The relevant feature is that the behavior does not _necessarily_
produce a consequence that has an effect on the controlled variable. Thus
you can compare the frequency of the behavior when it produces some
consequence and when it doesn't.

Let's try to get back to PCT-talk. The reinforcer clearly is an element
of some controlled perception. Let's imagine that to be hunger-satiation.
A reinforcer affects the feedback path for the "reinforcement-controlled-
perception" in that if a pellet is perceptible then hunger can be reduced
(satiation increased) by pecking at the pellet, but if there is no pellet
the pecking action will have no influence on the H-S perception. The
apparatus does not disturb the H-S perception by putting a pellet into
the pigeon's view. But unless there is a pellet in view, an error "too
much H, not enough S" cannot be influenced by anything the bird does.
So the role of the reinforcer is to provide an environmental feedback
path for the perception that the experimenter has disturbed (e.g. by
not allowing the bird to feed as much as it would have liked prior to
the experiment).
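The role of the reinforcer as an environmental feedback path can be sketched as a single proportional control loop. Everything numeric here is a hypothetical assumption: `reference` stands for the bird's H-S reference, `satiation` for the H-S perception, `feedback` for how strongly pecking at available food raises satiation, and `drain` for slow metabolic loss. The sketch is only meant to show that when `pellet_available` is false the output has no path to the perception, so the error cannot be reduced however hard the loop acts.

```python
def run_loop(pellet_available: bool, steps: int = 200,
             reference: float = 10.0, gain: float = 0.5,
             feedback: float = 0.2, drain: float = 0.05) -> float:
    """Simulate one hunger-satiation (H-S) control loop; return the final error.

    All parameter names and values are illustrative assumptions.
    """
    satiation = 0.0  # the perception: a deprived bird starts low
    for _ in range(steps):
        error = reference - satiation
        pecking = max(0.0, gain * error)  # output grows with the error
        # Environmental feedback path: pecking affects satiation only if food is present.
        effect = feedback * pecking if pellet_available else 0.0
        satiation = max(satiation + effect - drain, 0.0)  # metabolism lowers satiation
    return reference - satiation


if __name__ == "__main__":
    print(run_loop(True))   # close to 0.5
    print(run_loop(False))  # 10.0
```

With these values the loop with food settles near an error of 0.5, the usual residual error of a proportional loop where the effect of pecking just balances the drain, while the loop without food never moves from an error of 10.0.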

Yes (except that for the pigeon, the reinforcer is usually a few seconds
access to grain, not a pellet).

The experimenter is interested in some action X that normally does not
influence the bird's H-S perception. To "reinforce" X, the experimenter
sets up the bird's environment so that when it performs action X, the
bird sometimes becomes able to reduce the H-S error, whereas it cannot
do so (or is less able to do so) when it doesn't perform X.

Correct.

The EFFECT of reinforcement is on the control system(s) that use(s) action X.

The effect is to set a reference (output of the system whose error is
reduced by the reinforcer) for a lower-level system that will bring about
the delivery of the reinforcer. This system in turn will set references for
yet other perceptions (e.g., perceiving contact of the beak with the key)
which must be realized to bring the perception of reinforcer-delivered to
reference.
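The reference-setting chain just described can be sketched as a cascade of proportional units, each passing its output down as the reference of the level below. The class `ControlSystem`, the function `reference_cascade`, the level names and the gains are hypothetical, chosen only to mirror the hunger-satiation, reinforcer-delivered and beak-on-key levels above; a real hierarchy would also run continuously in time rather than in one pass.

```python
from dataclasses import dataclass


@dataclass
class ControlSystem:
    """One proportional control unit: output = gain * (reference - perception)."""
    name: str
    gain: float

    def output(self, reference: float, perception: float) -> float:
        return self.gain * (reference - perception)


def reference_cascade(levels, top_reference, perceptions):
    """Pass each level's output down as the reference of the level below.

    levels      -- ControlSystems ordered from highest to lowest (hypothetical)
    perceptions -- current perception at each level, in the same order
    Returns the reference each level receives. The lowest level's own output
    would drive the muscles and is not returned.
    """
    references = []
    reference = top_reference
    for system, perception in zip(levels, perceptions):
        references.append(reference)
        reference = system.output(reference, perception)  # next level's reference
    return references


if __name__ == "__main__":
    levels = [ControlSystem("hunger-satiation", 0.5),
              ControlSystem("reinforcer delivered", 2.0),
              ControlSystem("beak contacts key", 1.0)]
    # A hungry bird: satiation 6 against a reference of 10.
    print(reference_cascade(levels, 10.0, [6.0, 0.0, 0.0]))   # [10.0, 2.0, 4.0]
    # A sated bird: no H-S error, so no lower-level references are set.
    print(reference_cascade(levels, 10.0, [10.0, 0.0, 0.0]))  # [10.0, 0.0, 0.0]
```

The second call illustrates that the lower-level goals exist only in service of the higher-level error: once hunger-satiation is at its reference, nothing asks for reinforcer delivery or key contact.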

The PROCESS of reinforcement seems to be very like that of reorganization,
though I don't see it as necessarily the case that the "reinforcing" (H-S)
control system needs to involve an intrinsic variable. I could see, for
example, "access to money" as being a suitable "reinforcer" for actions
like "Screwing up the bolts on the car passing down the assembly line."

Yes, that's my view, too. Contingent events that reduce error in the
lower-level systems forming the earlier links of the chain that ends with
reduction of hunger (or whatever error it is that grain-ingestion corrects)
are the "secondary" reinforcers of reinforcement theory. Having the key depressed
is of no importance to the bird if achieving that goal-state does not
ultimately lead to control of hunger-satiation. Organizing the set of
control systems required to achieve that end involves the "selective"
function of the reinforcement process. The mechanism that accomplishes this
function is as yet unknown, although I have in the past offered some
speculative thoughts about how such a mechanism might do its job. At this
point it is enough to note that the system does get organized, however that
might be accomplished, and that the selected outputs are those which produce
consequences that reduce error in the relevant control systems.
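Since the selecting mechanism is unknown, the following Python sketch uses one purely hypothetical rule, chosen only for illustration and not the mechanism I have speculated about elsewhere: keep the current output while it produces a consequence that reduces the hunger error, and switch to a randomly chosen output when it does not. The action names, the `grain_delivered` contingency and the assumption that hunger error persists throughout are all inventions for the example; it shows only that such a rule ends up holding onto the output whose consequences reduce error.

```python
import random

# Hypothetical repertoire of outputs available to the bird.
ACTIONS = ["peck_key", "peck_floor", "preen", "turn"]


def grain_delivered(action: str) -> bool:
    """Hypothetical contingency: only key pecks produce access to grain."""
    return action == "peck_key"


def select_output(trials: int = 200, seed: int = 1) -> tuple[str, int]:
    """Keep an output while it reduces hunger error; otherwise switch at random.

    Assumes the hunger error stays positive for the whole session.
    Returns the output in use at the end and the number of rewarded trials.
    """
    rng = random.Random(seed)
    action = rng.choice(ACTIONS)
    rewarded = 0
    for _ in range(trials):
        if grain_delivered(action):
            rewarded += 1                 # consequence reduced error: keep this output
        else:
            action = rng.choice(ACTIONS)  # error persisted: try another output
    return action, rewarded


if __name__ == "__main__":
    print(select_output())
```

Any rule that retains outputs whose consequences reduce error would converge in the same way here; random switching is used only because it needs no knowledge of which output is "right".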

Regards,

Bruce