Where's the reinforcement model?

[From Rick Marken (951207.1300)]

Chris Cherpas (951207.0933 PT) --

Here's a rough design:

...

Before we start considering alternative designs, could you please explain (in
terms of a WORKING reinforcement model) what's wrong with the simple design
I proposed. Your design looks fine -- but it seems a tad complex. I would
REALLY like you to show me, in terms of a working reinforcement model of
operant behavior, what is wrong with the simple operant experiment I
described.

As I said, it would be relatively easy to build a WORKING control model of
the behavior in my simple operant experiment. It would look something like
this:

p = running count of the number of pictures/sec

r = reference for p (adjusted for best fit)

o = 1/interval between presses

The program determines the physical relationship between presses (which occur
at a rate proportional to o) and consequences (picture occurrences, which
occur at some rate of pictures/sec).

The control model of the organism is:

o := o + k*(r - p)

o is limited by the program to be >= x (where 1/x defines the maximum
interval between presses) and <= y (where 1/y defines the minimum
interval between presses -- the highest possible press rate). As the program
runs, a press occurs each time the timer value becomes >= 1/o. When a press
occurs the timer (not o) is reset. I think a program like this would mimic
the responding that occurs in the operant experiment I suggested.

I would like to see a computer model of a reinforcement model that will
produce responding as a function of reinforcement in my suggested experiment.

Thanks

Rick

[From Chris Cherpas (951207.1334 PT)]
[re: 1> Rick Marken (951206.0930)]

RM:
1>I propose the following, very simple human operant conditioning experiment:
1>
1>The subject presses the mouse button (B) in order to see a brief
1>exposure of a picture on the screen (R); the appearance of the picture is a
1>consequence of mouse presses. The picture will remain on the screen for only
1>a short time (say, 1/2 sec?) until it is "consumed".

cc:
Sounds like an operant arrangement to me (...reminds me of early work by
Og Lindsley, who had mental patients respond to keep a TV picture from

RM:
1>The program would make it possible for the user to determine how button
1>presses (B) are related to reinforcements (R); this is the reinforcement
1>"schedule". I think it would be best to start with simple ratio schedules,
1>for example, R occurs after N presses, where the experimenter can set N. We
1>would have to decide whether presses during the occurrence of the reinforcer
1>should be counted toward the schedule requirement; but the reinforcement
1>mavens can tell us the "correct" choice here.

cc:
I don't know much about ratio schedules, so they wouldn't be my first pick.
In any case, you don't usually count presses during the presentation
of the "reinforcer." In fact, you could make mouse-clicks produce visual
feedback -- i.e., one for each "accepted" click -- which does not appear if
clicks occur during the picture presentation. Actually, if they're clicking
during the picture, it kind of implies that the picture ain't so reinforcing
because, if it were, they should be spending their behavior/time "consuming,"
not clicking. As you can imagine, pigeons don't bother with the keys if
there's food ready to eat in the feeder.

RM:
1>The program should also make it possible to introduce disturbances; I suggest
1>a disturbance to the "size" of the reinforcement; how long the picture stays
1>on before it's "consumed". Without disturbance the picture could always be
1>on for 1/2 second; with disturbance, each presentation of the picture could
1>last anywhere from 1/10 to 2 seconds (maybe?).

cc:
This is also a "standard" operant manipulation -- duration of rft, which is
often used as a way of implementing "magnitude" of reinforcement.

--- [re: 2> Rick Marken (951207.1300)] ---

RM:
2>Before we start considering alternative designs, could you please explain (in
2>terms of a WORKING reinforcement model) what's wrong with the simple design
2>I proposed. Your design looks fine -- but it seems a tad complex. I would
2>REALLY like you to show me, in terms of a working reinforcement model of
2>operant behavior, what is wrong with the simple operant experiment I
2>described.

cc:
I don't know that anything is wrong with this. I just don't know of a
reinforcement model that makes quantitative predictions in a procedure
that varies ratio size and duration of reinforcement, per se. Fixed ratios
can be interpreted in terms of delay to reinforcement, but tend to be
multi-phasic, involving a post-reinforcement pause and a burst of
responding. I suggested a matching situation because that's as close
to a rft "paradigm" as I've seen, although I'm not as familiar with all
the reinforcement models as you might like. A lot of the ratio work
in EAB is done by behavioral economics types, although I think James
Mazur has a model that may work here -- I need to look around.

For better or worse, I'm not one of four people in the world doing
EAB research, so it's hard for me to find "the EAB model" of reinforcement.
I got into matching/melioration because that seemed to promise some kind
of direction for a general model.

RM:
2>As I said, it would be relatively easy to build a WORKING control model of
2>the behavior in my simple operant experiment. It would look something like
2>this:
2>
2> p = running count of the number of pictures/sec
2>
2> r = reference for p (adjusted for best fit)
2>
2> o = 1/interval between presses
2>
2>The control model of the organism is:
2>
2> o := o + k*(r - p)

cc:
This looks wonderfully straightforward and intuitively plausible.
Some naive questions: How often is o updated? When is p reset?
What does k represent?

RM:
2>I would like to see a computer model of a reinforcement model that will
2>produce responding as a function of reinforcement in my suggested experiment.

cc:
I would like to see it too. I just don't have it. Sorry to be an
inadequate reinforcement "maven" in this context.

Regards and regrets,
cc

[From Rick Marken (951207.1600)]

Me:

I propose the following, very simple human operant conditioning experiment:

Chris Cherpas (951207.1334 PT) --

Sounds like an operant arrangement to me

Great. One step closer to implementation. I'll take this to mean that you
see this as a reasonable first step at an operant conditioning experiment.
Hooray!

you don't usually count presses during the presentation of the "reinforcer."

OK. So the picture IS the reinforcer. That's what I thought. And thanks for
the info about presses during presentation of the reinforcer: I will set
up the experiment so that presses are not counted during presentation of the
reinforcer.

I'm happy to use interval schedules, by the way. It should be easy to have
the program select any schedules you like; the model of the organism (and the
organism itself) is, of course, the same regardless of the schedule.

I just don't know of a reinforcement model that makes quantitative
predictions in a procedure that varies ratio size and duration of
reinforcement, per se.

By "ratio" I presume you are referring to the ratio requirement of responses/
reinforcement; I am rather amazed that there are no reinforcement models that
make quantitiative predictions about the effect of ratio size; what could be
a more basic finding of operant research? As far as duration of reinforcement
goes, that was just a way of introducing a disturbance to a controlled
variable; there are other ways we could do this. For now, let's assume that
the duration of picture presentation (like food pellet size) is a constant;
there MUST be a reinforcement model that can produce the appropriate rate of
responses when the consequence is a fixed-size reinforcement that is always
delivered after a fixed number of responses. Mustn't there?
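As an illustrative sketch of that arrangement (fixed ratio, fixed reinforcer duration, presses during the picture not counted, per Chris's note), with all names and numbers being assumptions:

```python
# Sketch of an FR-n schedule with a fixed reinforcer duration.
# Presses made while the picture is on are simply not counted.
def make_fr_schedule(n=4, duration=0.5):
    """Return press(t): records a press at time t (seconds) and
    reports whether it produced a picture presentation."""
    state = {"count": 0, "picture_until": 0.0}

    def press(t):
        if t < state["picture_until"]:
            return False                 # press during the picture: ignored
        state["count"] += 1
        if state["count"] >= n:          # ratio requirement met
            state["count"] = 0
            state["picture_until"] = t + duration
            return True                  # picture is presented
        return False

    return press

press = make_fr_schedule(n=3)
shown = [press(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 0.9, 1.0)]
# every third counted press produces the picture; the presses at
# t = 0.3, 0.4, 0.5 fall during the 0.5 s presentation and are ignored
```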

What say the other ex-reinforcement theorists? Bruce? Samuel? Can you comment
on my operant conditioning experiment and provide a computer version of a
reinforcement model for it?

it's hard for me to find "the EAB model" of reinforcement.

Any model of reinforcement will do; all it has to do is work (imitate the
responding of the subject).

Me:

The control model of the organism is:

o := o + k*(r - p)

Chris:

This looks wonderfully straightforward and intuitively plausible.
Some naive questions: How often is o updated?

o will be updated on each iteration of the computer loop that runs the
experiment; if this were a continuous system, I imagine that o would be
the continuously varying value of the integral of error (r-p).

When is p reset?

p is never reset. It is the perceptual variable and gives a continuous
indication of the rate (within some fixed-size, continuously moving
time window) of occurrence of picture presentations.

What does k represent?

Amplification; the degree to which the system amplifies error into output.
With high amplification, small changes in error produce large changes in
response rate.
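For instance, one update step of o := o + k*(r - p), with the same error under two different (purely illustrative) gains:

```python
# One step of the control law with illustrative numbers.
error = 0.2                    # r - p, in pictures/sec
o_low = 0.0 + 0.05 * error     # low amplification: tiny output change
o_high = 0.0 + 5.0 * error     # high amplification: large output change
```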

Thanks for the help, Chris. I would like to hear suggestions from Bruce
and Samuel also.

Best

Rick