modeling operant conditioning

[From Bill Powers (941015.1335 MDT)]

Bruce Abbott (941014.1440 EDT) --

I deliberately chose FR schedules in my critique of the matching law,
because there it is clearest that the matching law can't work. The ratio
of behavior rate to reinforcement rate is set strictly by the apparatus.
If you count total bar-presses and total rewards received on any key in
a set of FR choices, there is no way that this ratio can be other than
what is set by the fixed-ratio schedule(s).

I agree that a joint simulation project is a good idea. Let's work it
out here on the net, because I'm sure there are others who would be
interested.


---------------------------------
Let me muse my way through a start on this problem.

Thus, a VI 30-s schedule "sets up" the reinforcer for delivery on
average once each 30 seconds, giving an average rate of 2 reinforcers
per minute.

I don't think you can say that a schedule of reinforcements sets up any
particular average delivery rate, since the delivery rate depends just
as much on the rate of pecking as on the schedule. Let's say that the
minimum interval is 20 sec and the maximum is 40 sec, with an average of
30 sec and a uniform distribution. If the pigeon pecked at some steady
and very high rate, you would sample all the intervals between 20 and 40
sec. Over a long run, the number of intervals of any given length would
be the same as the number of any other length. We could compute the
average rate by saying that there is just one of each interval with a
total time of 20 + 21 + 22 ...+ 40 sec or 630 sec, during which 21
deliveries take place. The average time per delivery is 30 sec, as
stated.

On the other hand, if the pigeon pecked at a rate of once per 40 sec,
there would be 1 delivery per 40 sec; if once per 80 sec, one delivery
per 80 sec, and so on. So the rate of delivery is not an independent
variable in relation to the rate of pecking.
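
Just to check that reasoning, here is a quick throwaway program,
separate from the one farther down, that runs a single schedule with a
uniform 20-40 sec interval against a bird pecking once every "peckgap"
seconds, one clock tick per second. The names and constant values are
only placeholders.

program vicheck;
{QUICK CHECK: ONE UNIFORM 20-40 SEC SCHEDULE, BIRD PECKING AT A FIXED RATE}

const
  lowint = 20;         {shortest interval, sec}
  highint = 40;        {longest interval, sec}
  runlength = 30000;   {length of run, sec}

var
  peckgap,clock,nextavail,deliveries: integer;

begin
  randomize;
  peckgap := 1;
  while peckgap <= 80 do
   begin
    deliveries := 0;
    nextavail := lowint + random(highint - lowint + 1);
    for clock := 1 to runlength do
      if (clock mod peckgap = 0) and (clock >= nextavail) then
       begin
        deliveries := deliveries + 1;
        nextavail := clock + lowint + random(highint - lowint + 1);
       end;
    writeln('peck every ',peckgap:2,' sec: ',
            60.0*deliveries/runlength:5:2,' deliveries per minute');
    peckgap := 2*peckgap;
   end;
end.

At a high pecking rate the obtained rate should come out close to 2 per
minute; once the pecking is slower than the longest interval, the
obtained rate simply tracks the pecking rate, as argued above.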

How is a variable-interval schedule with a nominal interval of t
defined? See program below for my guess.

With two such schedules simultaneously setting up reinforcement
opportunities on two separate keys, how would the pigeons divide their
responses? The answer was summarized by the matching law: the
proportion of keypecks emitted on a key was proportional to the
relative rate of reinforcement provided by the schedule associated with
that key:

P1/(P1 + P2) = R1/(R1 + R2) ["P" represents keypecks]

The denominators of the left and right terms of the equation represent
total keypecks emitted during the session and total reinforcers
delivered, respectively. The matching law thus indicates that if 75%
of the reinforcement opportunities are programmed to occur on the
schedule associated with Key 1, then 75% of the keypecks will occur on
Key 1.

I don't see how the variable-interval schedule can program the number of
reinforcement opportunities, since the number of such opportunities
depends on the distribution of pecking as well as on the schedule. But
anyway --

Suppose the schedules are set up with nominal delivery rates of 2 per
minute and 6 per minute. Suppose also that the pigeon, pecking at the
same high rate throughout, obtained 75% of its reinforcements from the
richer key. Then 1/4 of the reinforcements would take 1/2 minute each,
and 3/4 of them would take
1/6 minute each. In 100 reinforcements, we would have 25 reinforcements
taking a total of 12.5 minutes, and 75 reinforcements also taking a
total of 12.5 minutes. So the bird would be spending an equal amount of
time on the two keys.
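
That arithmetic is easy to check directly; here is a trivial program
that just does the bookkeeping (the names are mine).

program timecheck;
{TIME ACCOUNTED FOR BY EACH KEY UNDER A 25/75 SPLIT OF 100 REINFORCEMENTS}

var
  slowtime,fasttime: real;

begin
  slowtime := 25*(1.0/2.0);   {25 reinforcements at 1/2 minute each}
  fasttime := 75*(1.0/6.0);   {75 reinforcements at 1/6 minute each}
  writeln('time on the 2-per-minute key: ',slowtime:4:1,' min');
  writeln('time on the 6-per-minute key: ',fasttime:4:1,' min');
end.

Both come out at 12.5 minutes.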

Therefore if we hypothesize that the bird pecks at a constant high rate
and switches from one key to the other at regular intervals, we can
explain why there is matching in a variable-interval 2-key experiment.
We will find that 75% of the pecks take place on the key that provides
75% of the reinforcements. The ratio will be, in general, whatever the
ratio of the nominal interval times is. This is not an effect of the
schedule on the bird's behavior, but the natural result of the bird's
pecking at a high rate and spending equal time on both keys. We can now
predict that if the bird pecked randomly at the two keys, at a
sufficiently high average rate, matching would be observed. Change-over
delay should not, I think, have any effect on this result.

We can test this in simulation.

Here's my start on a program:

program varint;

uses dos,crt,graph,mycrt,textunit;   {mycrt and textunit are my own units}

const
  maxkey = 2;

var
  nextpress,minint,maxint: array[1..maxkey] of integer;
    {nextpress[i] is the clock time at which schedule i next makes a
     reinforcer available; minint and maxint bound the interval}
  leftpress,leftreinf,rightpress,rightreinf: array[1..maxkey] of integer;
  lastpress: integer; {number of key last pressed}
  clock: integer;     {simulation time in clock ticks}

{ENTER WITH SKED NUMBER, KEY PRESS STATUS; RETURN TRUE IF REINFORCEMENT}

function schedule(i:integer; press: boolean): boolean;
begin
  if press and (clock >= nextpress[i]) then
   begin
    {deliver, then set up the next availability time; the +1 lets the
     interval reach maxint itself}
    nextpress[i] := clock + minint[i] + random(maxint[i] - minint[i] + 1);
    schedule := true;
   end
  else
   schedule := false;
end;

Note that some variable declarations are looking ahead to future parts
of the program.

What we need next is a model of the organism which pecks on a key every
n clock ticks (you can make that random with a specific average) and
switches between key 1 and key 2 also at random clock intervals. The
"schedule" function is called on every clock tick, with i set to the key
currently being pressed and "press" set to true if there has been a
peck. The schedule function returns "true" if there has been a
reinforcement.
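
Just to show what I have in mind, here is a rough sketch of how the
organism and the main loop might go, picking up right after the
schedule function above. The extra declarations, the constant values,
and the simple random switching rule are only placeholders; substitute
whatever scheme you think is better.

const
  sessionlength = 30000;   {clock ticks in the session}

var
  presscount,reinfcount: array[1..maxkey] of integer;
  pecked: boolean;
  i: integer;

{ORGANISM: PECKS ON THE CURRENT KEY ABOUT ONCE PER PECKMEAN TICKS,
 SWITCHES KEYS ABOUT ONCE PER SWITCHMEAN TICKS}

procedure organism(var key: integer; var press: boolean);
const
  peckmean = 2;      {mean ticks between pecks}
  switchmean = 200;  {mean ticks between key changes}
begin
  if random(switchmean) = 0 then     {change over to the other key}
    key := 1 + (key mod maxkey);
  press := (random(peckmean) = 0);   {peck on this tick?}
end;

begin  {main loop: one pass per clock tick}
  randomize;
  minint[1] := 20;  maxint[1] := 40;   {nominal VI 30 sec}
  minint[2] := 5;   maxint[2] := 15;   {nominal VI 10 sec}
  for i := 1 to maxkey do
   begin
    nextpress[i] := minint[i] + random(maxint[i] - minint[i] + 1);
    presscount[i] := 0;
    reinfcount[i] := 0;
   end;
  lastpress := 1;
  for clock := 1 to sessionlength do
   begin
    organism(lastpress,pecked);
    if pecked then
     begin
      presscount[lastpress] := presscount[lastpress] + 1;
      if schedule(lastpress,true) then
        reinfcount[lastpress] := reinfcount[lastpress] + 1;
     end;
   end;
  for i := 1 to maxkey do
    writeln('key ',i,': ',presscount[i],' pecks, ',
            reinfcount[i],' reinforcements');
end.

Running something like this with different pecking rates and switching
rules would show directly whether the peck proportions and the obtained
reinforcement proportions come out matched.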

How does this look to you? Want to try the next part of the program and
send it back to me?
--------------------------------------------------------------------
Best,

Bill P.