[From Bill Powers (941021.0930 MDT)]

RE: VI schedules

I've been pondering the variable (or random) interval schedule,
constant-probability type, looking for a way to express the average
relationship between behavior rate and reinforcement rate. The purpose
is to see whether the same "matching" problem occurs with the VI
schedule as with the FR schedule.

It's obvious that at very low behavior rates the reinforcement rate
will be proportional to the behavior rate, and that at very high
behavior rates the reinforcement rate will tend to a constant value. I
sort of guessed that the relationship might be of the form
r = 1 - exp(-a*b), but trying to work that out from basic principles
has overtaxed my feeble math neurons. So I simulated it last night,
varying the behavior rate and accumulating reinforcers in a leaky
integrator just as in the varint.pas model, thus:


-----------------------------------------------------------------------

program testvi;

uses dos, crt, graph;

var
  enabled: boolean;
  timer, time0, j: integer;
  i: longint;
  q, frequency: real;
  clock: longint;
  k, graphdriver, graphmode, maxx, maxy: integer;
  ch: char;

procedure setgraphics;
begin
  clrscr;
  graphdriver := 0; graphmode := 0;
  detectgraph(graphdriver, graphmode);
  initgraph(graphdriver, graphmode, '');
  graphmode := getmaxmode;
  setgraphmode(graphmode);
  maxx := getmaxx; maxy := getmaxy;
  clearviewport;
end;

begin
  randomize;
  k := 100;                        { key arms with probability k/10000 per tick }
  enabled := false;
  setgraphics;
  for j := 1 to 100 do
    begin
      frequency := 0.0003*j;       { behavior rate, responses per tick }
      time0 := round(1.0/frequency);
      clock := 0; q := 0.0; timer := time0;
      for i := 0 to 200000 do
        begin
          if not enabled then enabled := random(10000) < k;
          dec(timer);
          if timer <= 0 then       { a response occurs }
            begin
              timer := time0;
              if enabled then
                begin
                  q := q + 1.0;    { collect one reinforcer }
                  enabled := false;
                end;
            end;
          q := q * (1 - 3e-5);     { leaky integrator, as in varint.pas }
          if (clock mod 1000) = 0 then
            putpixel(5*j, maxy - round(q), white);
          inc(clock);
        end;
    end;
  ch := readkey;
  closegraph;
end.

This is slow, because for each behavior frequency (indexed by j) the
program waits long enough for the accumulated reinforcer q to come to
equilibrium with the decay rate. The reward size is set at 1.0. Go have
a cup of coffee and let it run. The result is a bar chart, which does
indeed seem to have the form r = 1 - exp(-a*b), where r and b are the
reinforcement and behavior rates and a is a constant that depends on
the probability per unit time of enabling the key (I think it is 1
minus that probability). Perhaps someone else can use this hint to work
out the proof.
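Here is one route toward that proof, with a numerical cross-check (a
Python translation of the simulation above, not part of the original
Pascal; the function name and parameters are mine). If the key arms
with probability p per tick and responses come every 1/b ticks, the
chance the key is armed when a response arrives is 1 - (1-p)^(1/b), so
the mean reinforcement rate should be about r = b*(1 - (1-p)^(1/b)):
proportional to b at low rates and saturating near p at high rates,
the same qualitative shape as 1 - exp(-a*b).

```python
import random

def vi_reinforcement_rate(b, p=0.01, ticks=200_000, seed=1):
    """Mean reinforcements per tick on a constant-probability VI
    schedule, for response rate b (responses per tick) and arming
    probability p per tick. Mirrors the Pascal simulation's logic."""
    rng = random.Random(seed)
    interval = round(1.0 / b)       # ticks between responses
    enabled = False
    timer = interval
    reinforcers = 0
    for _ in range(ticks):
        if not enabled:             # key arms with probability p per tick
            enabled = rng.random() < p
        timer -= 1
        if timer <= 0:              # a response occurs
            timer = interval
            if enabled:             # armed key: collect the reinforcer
                reinforcers += 1
                enabled = False
    return reinforcers / ticks
```

At b = 0.001 nearly every response finds the key armed, so r is close
to b; at b = 0.1 the simulated rate sits near b*(1 - (1-p)^(1/b)),
which is close to the ceiling p.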

Seeing this underlying form of the relationship between response rate
and reinforcement rate for constant-probability VI schedules makes me
wonder whether matching can be a real phenomenon. With the large amount
of noise that the randomness in the schedule superimposes on the data
(noise that increases at the higher behavior rates), it would seem very
difficult to determine whether matching did in fact occur according to
the given formula. If we substitute the underlying relationship above
for the random schedule, the matching law would take the form

     b1                 1 - exp(-a1*b1)
   -------  =  -------------------------------------
   b1 + b2     (1 - exp(-a1*b1)) + (1 - exp(-a2*b2))

Is there any a priori reason to think this relationship might hold true
universally? Is it even possible for it to hold true universally, for
any pair of values a1 and a2? Reduced to minimum terms, it says

          b1                      b2
   ---------------   =   ---------------
   1 - exp(-a1*b1)       1 - exp(-a2*b2)

I can't solve this. But it doesn't seem possible that for every pair of
values a1 and a2 (which are determined by the probabilities on the two
schedules) the equality above should hold true. My suspicion is that
the only solution is a1 = a2 and b1 = b2, and that for all other
combinations the proposed equality fails.
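One way to see why, short of an analytic proof: write the reduced
matching ratio as g(a, b) = b/(1 - exp(-a*b)) (the function name and
the numerical check below are mine, sketched in Python). Since g rises
strictly with b and falls with a, the condition g(a1, b1) = g(a2, b2)
with a1 <> a2 can hold only for specially chosen rate pairs, never
across the whole range of behavior rates; and with a1 = a2 it forces
b1 = b2.

```python
import math

def g(a, b):
    """The reduced matching ratio b / (1 - exp(-a*b)) for a
    constant-probability VI schedule with parameter a."""
    return b / (1.0 - math.exp(-a * b))
```

For example, g(0.5, 1.0) and g(1.0, 1.0) differ substantially, so two
schedules with different parameters cannot satisfy the reduced equality
at equal behavior rates.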

This problem is entirely analogous to the problem with the FR schedule,
where the setting of the apparatus enforces the equality r = b/m, with
r being the reinforcement rate, b the behavior rate, and m the number
of presses per reinforcement delivery. Put in the same reduced form,
the matching law for FR schedules says

     b1           b2
   -------  =  -------
    b1/m1       b2/m2

which reduces to

   m1 = m2.

So for FR schedules there is only one condition under which the
matching law can hold true: the case where the schedules are the same.
I suspect that this will also be true of the matching law for VI
schedules.
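The FR reduction can be verified mechanically (a trivial Python check;
the helper name is mine): since b/(b/m) = m for any b > 0, the gap
between the two sides of the reduced equality is just m1 - m2,
independent of the behavior rates.

```python
def fr_matching_gap(b1, b2, m1, m2):
    """Difference between the two sides of the reduced FR matching
    law, b1/(b1/m1) - b2/(b2/m2). Because b/(b/m) = m for any b > 0,
    this equals m1 - m2 regardless of the behavior rates."""
    return b1 / (b1 / m1) - b2 / (b2 / m2)
```

So the gap vanishes exactly when m1 = m2, whatever b1 and b2 are.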

------------------------------------

Note that given only the equation for the feedback function, there is
nothing to fix the rate of reinforcement, because there is nothing to
fix the rate of behavior, and reinforcement depends on behavior
according to some fixed rule. The control-system model introduces a
second equation, stating that b = f(r, r*). Now fixing the reference
level r* fixes both the behavior rate and the reinforcement rate.
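To illustrate (a sketch of my own in Python, not from the original
post): pair the VI feedback function r(b) = b*(1 - (1-p)^(1/b)) with a
simple proportional control law b = gain*(r* - r) and solve the two
equations simultaneously by bisection. Choosing r* then pins down a
unique equilibrium pair of rates.

```python
def vi_feedback(b, p=0.01):
    """Mean reinforcement rate on a constant-probability VI schedule
    with arming probability p per tick and response rate b."""
    return b * (1.0 - (1.0 - p) ** (1.0 / b))

def equilibrium(r_star, gain=50.0, p=0.01):
    """Solve b = gain*(r_star - vi_feedback(b)) by bisection.
    The residual rises monotonically with b, so the root is unique."""
    lo, hi = 1e-6, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - gain * (r_star - vi_feedback(mid, p)) > 0.0:
            hi = mid
        else:
            lo = mid
    b = 0.5 * (lo + hi)
    return b, vi_feedback(b, p)
```

Raising the reference level r* moves both the equilibrium behavior
rate and the equilibrium reinforcement rate upward, which is the point:
the second equation is what fixes the operating point.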

-------------------------------------

I've already said how I would approach this problem: start with simple
one-key FR schedules, match a control-system model to specific
behaviors of specific animals, and then try to predict how those
animals will behave under other kinds of schedules, using the same
model. Then tackle the two-key situation to see what has to be added.
Using schedules with a random component, it seems to me, just makes the
job of measuring model parameters harder, for no compelling reason. It
also tends to disguise the failure of hypotheses like the matching law,
by putting us back into the situation that psychology has been in for
so long: trying to extract regular information from signals buried in
large amounts of noise.

----------------------------------------------------------------------

Best,

Bill P.