VI schedules and matching

[From Bill Powers (941021.0930 MDT)]

RE: VI schedules

I've been pondering the variable (or random) interval schedule,
constant-probability type, looking for a way to express the average
relationship between behavior rate and reinforcement rate. The purpose
is to see whether the same "matching" problem occurs with the VI
schedule as with the FR schedule.

It's obvious that at very low behavior rates, the reinforcement rate
will be proportional to behavior rate, and that at very high behavior
rates the reinforcement rate will tend to a constant value. I sort of
guessed that the relationship might be of the form r = 1 - exp(-ab), but
trying to work that out from basic principles has overtaxed my feeble
math neurons. So I simulated it last night, varying the behavior rate
and accumulating reinforcer in a leaky integrator just as in the
varint.pas model, thus:

-----------------------------------------------------------------------
program testvi;

uses dos, crt, graph;

var
  enabled: boolean;              { true when the schedule has "armed" a reinforcer }
  timer, time0, j: integer;      { timer counts down to the next response          }
  i: longint;
  q, frequency: real;            { q = accumulated reinforcer (leaky integrator)   }
  clock: longint;
  k, graphdriver, graphmode, maxx, maxy: integer;
  ch: char;

procedure setgraphics;
begin
  clrscr;
  graphdriver := 0; graphmode := 0;
  detectgraph(graphdriver, graphmode);
  initgraph(graphdriver, graphmode, '');
  graphmode := getmaxmode;
  setgraphmode(graphmode);
  maxx := getmaxx; maxy := getmaxy;
  clearviewport;
end;

begin
  randomize;
  k := 100;                      { arming probability = k/10000 = 0.01 per iteration }
  setgraphics;
  for j := 1 to 100 do           { step through 100 behavior frequencies }
  begin
    frequency := 0.0003*j;                 { responses per iteration      }
    time0 := round(1.0/frequency);         { iterations between responses }
    clock := 0; q := 0.0; timer := time0;
    enabled := false;                      { start each run disarmed      }
    for i := 0 to 200000 do                { run long enough to reach equilibrium }
    begin
      { constant-probability VI: arm the key with probability k/10000 per
        iteration, and hold the arming until a response collects it }
      if not enabled then enabled := random(10000) < k;
      dec(timer);
      if timer <= 0 then                   { a response occurs            }
      begin
        timer := time0;
        if enabled then
        begin
          q := q + 1.0;                    { deliver one reinforcer       }
          enabled := false;
        end;
      end;
      q := q * (1 - 3e-5);                 { leaky-integrator decay       }
      if (clock mod 1000) = 0 then
        putpixel(5*j, maxy - round(q), white);  { bar for this frequency  }
      inc(clock);
    end;
    time0 := 2 * time0;   { leftover; overwritten when the next frequency is set }
  end;
  ch := readkey;
  closegraph;
end.

This is slow because for each behavior frequency (indexed by j) the
program waits long enough for the accumulated reinforcer q to come to
equilibrium with the decay rate. The reward size is set at 1.0. Go have
a cup of coffee and let it run. The result is a bar chart, which does
indeed seem to have the form r = 1 - exp(-ab) where r and b are
reinforcement and behavior rates and a is a constant that depends on the
probability per unit time of enabling the key (I think it is 1 minus
that probability). Perhaps someone else can use this hint to work out
the proof.
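
A note on reading the bars: at equilibrium the leaky integrator gains, on
average, r reinforcers per iteration (each of size 1.0) and loses q*3e-5 per
iteration, so each bar settles at roughly

        q  =  r / 3e-5,

about 33,000 times the reinforcement rate. The decay constant only rescales
the plot, so the bar heights are proportional to reinforcement rate as a
function of behavior rate.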

Seeing this underlying form of the relationship between response rate
and reinforcement rate for constant-probability VI schedules makes me
wonder whether matching can be a real phenomenon. With the large amount
of noise that is superimposed on the data by the randomness in the
schedule, noise which increases at the higher behavior rates, it would
seem very difficult to determine whether matching did in fact occur
according to the given formula. If we substitute the underlying
relationship above for the random schedule, the matching law would take
the form

           b1               1 - exp(-a1*b1)
        ---------  =  --------------------------------------
         b1 + b2      (1 - exp(-a1*b1)) + (1 - exp(-a2*b2))

Is there any a priori reason to think this relationship might hold true
universally? Is it even possible for it to hold true universally, for
any pair of values a1 and a2? Reduced to minimum terms, it says

              b1                     b2
        ---------------  =  ---------------
        1 - exp(-a1*b1)     1 - exp(-a2*b2)

I can't solve this. But it doesn't seem possible that for any pair of
values a1 and a2 (which are determined by the probabilities on the two
schedules) the equality above should hold true. My suspicion is that the
only solution is a1 = a2 and b1 = b2, and that for all other
combinations the proposed equality is not true.
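
For anyone who wants to probe the reduced equality numerically rather than
analytically, here is a minimal sketch in the same Pascal style (the program
name and the particular values of a1, a2, b1, and b2 are mine, chosen only
for illustration); it simply prints the two sides for a chosen combination:

program matchchk;

var
  a1, b1, a2, b2: real;

{ one side of the reduced matching relation:  b / (1 - exp(-a*b)) }
function side(a, b: real): real;
begin
  side := b / (1.0 - exp(-a*b));
end;

begin
  a1 := 0.010; b1 := 0.030;     { illustrative values only }
  a2 := 0.020; b2 := 0.030;
  writeln('b1/(1 - exp(-a1*b1)) = ', side(a1, b1):10:4);
  writeln('b2/(1 - exp(-a2*b2)) = ', side(a2, b2):10:4);
end.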

This problem is entirely analogous to the problem with the FR schedule,
where the setting of the apparatus enforces the equality r = b/m, with r
being the reinforcement rate, b the behavior rate, and m the number of
presses per reinforcement delivery. Put in the same reduced form, the
matching law for FR schedules says

            b1          b2
          -------  =  -------
           b1/m1       b2/m2

Which reduces to

          m1 = m2.

So for FR schedules, there is only one condition under which the
matching law can hold true: the case where the schedules are the same. I
suspect that this will also be true of the matching law for VI
schedules.
------------------------------------
Note that given only the equation for the feedback function, there is
nothing to fix the rate of reinforcement because there is nothing to fix
the rate of behavior, and reinforcement depends on behavior according to
some fixed rule. The control system model introduces a second equation,
stating that b = f(r,r*). Now fixing the reference level r* fixes both
the behavior rate and the reinforcement rate.
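
As a concrete illustration of how the second equation closes the loop, here
is a minimal sketch (my own construction, not varint.pas): it assumes the
simplest integrating control law, in which b is nudged in proportion to the
error r* - r, together with the feedback form r = 1 - exp(-a*b) conjectured
above; the constants a, r*, and the gain are arbitrary illustrative values.

program ctrlvi;

const
  a    = 0.5;     { feedback-function constant (illustrative) }
  rref = 0.8;     { reference level r* (illustrative)         }
  gain = 0.1;     { integration gain (illustrative)           }

var
  b, r: real;
  i: integer;

begin
  b := 0.0;
  for i := 1 to 10000 do
  begin
    r := 1.0 - exp(-a*b);          { feedback function: r depends on b  }
    b := b + gain*(rref - r);      { control law: error r* - r drives b }
    if b < 0.0 then b := 0.0;      { behavior rate cannot be negative   }
  end;
  writeln('equilibrium:  b = ', b:8:4, '   r = ', r:8:4);
end.

With the loop closed, both b and r settle at the values where r = r*
(provided r* is attainable), which is the point of the second equation.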
-------------------------------------
I've already said how I would approach this problem: start with simple
one-key FR schedules, match a control-system model to specific behaviors
of specific animals, and then try to predict how those animals will
behave under other kinds of schedules, using the same model. And then
tackle the 2-key situation to see what has to be added. Using schedules
with a random component, it seems to me, just makes the job of measuring
model parameters harder, for no compelling reason. Also, it tends to
disguise the failure of hypotheses like the matching law, by putting us
back into the situation that psychology has been in for so long: trying
to extract regular information from signals buried in large amounts of
noise.
----------------------------------------------------------------------
Best,

Bill P.