[From Bruce Abbott (960127.1920 EST)]

Rick Marken (960127.1400) --

In case Bruce is still interested, here is what I had in mind.

Well of course I'm still interested!

In an earlier post, Bruce seemed to agree that an equation like

(1) B' = (zeta) Bmax * aR / (aR + 1)

is the forward organism function in Killeen's model, showing how

response rate, B', depends on reinforcement (incentive) rate, R.

On fixed ratio schedules, reinforcement rate is a function of

response rate and the ratio, N, as follows:

(2) R = B'/N

This is the feedback function. It describes a property of the

environment, not the organism. It is not a theoretical equation.

Non-contingent incentives (reinforcements) change only this

feedback function (2); they have no effect on the organism function (1).

The new feedback function might be something like this:

(2a) R = B'/N + K

The non-contingent reinforcement adds a random amount, K, to the

reinforcement rate. Determining the effect of non-contingent incentives

on behavior is then just a matter of solving equations 1 and 2a

simultaneously to determine B' as a function of K. There is nothing

about the addition of non-contingent (free) reinforcements (incentives)

that implies a change in the way the organism operates (equation 1).

Now you're talking! Taking things only as far as you've gone, you are

absolutely right. There is nothing thus far to suggest that adding

noncontingent reinforcement changes the way the organism operates.
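
To make this concrete, here is a minimal sketch (in Python) of solving equations 1 and 2a together with zeta held fixed. Substituting 2a into 1 gives a quadratic in B', a*B'^2 + [N(aK + 1) - zeta*Bmax*a]*B' - zeta*Bmax*a*K*N = 0, and its nonnegative root is the steady-state response rate. The function name and all parameter values are assumptions chosen only for illustration; they are not estimates from any data.

```python
import math


def steady_state_rate(zeta, b_max, a, n, k):
    """Nonnegative solution for B' of equations 1 and 2a, zeta held constant.

    zeta, b_max, a : parameters of the organism function (equation 1)
    n              : ratio requirement N
    k              : rate of noncontingent incentives K (k = 0 gives equation 2)
    """
    c = zeta * b_max                      # asymptotic response rate
    q = n * (a * k + 1.0) - c * a         # linear coefficient of the quadratic
    disc = q * q + 4.0 * a * a * c * k * n
    return (-q + math.sqrt(disc)) / (2.0 * a)


# Hypothetical parameter values, for illustration only.
for k in (0.0, 1.0, 2.0, 4.0):
    print(k, steady_state_rate(zeta=0.5, b_max=200.0, a=0.1, n=5, k=k))
```

With zeta constant, the solution for B' rises as K rises: the free incentives add to R and nothing in equation 1 offsets them.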

The problem is that you have left out any effect of noncontingent food

delivery on zeta, the coupling coefficient. Zeta equals rho*M, where rho is

the proportion of incited behavior taken up by target (instrumental)

responses and M is the representation of responses in memory at the time of

incentive delivery.

I have not yet presented the reinforcement mechanism of Killeen's model.

(Because we were dealing with steady-state behavior it was not relevant to

the discussion at that time.) I have not yet modeled this, but here is what

Killeen has to say about it:

The learning model invoked here assumes that upon reinforcement the

probabilities of all events in the trajectory (pi) are increased some

proportion (wj) of the distance to their maximum (1 - pi):

pi' = pi + wj(1 - pi), (13)

where pi' is the updated probability of emitting a response in position i

of the sequence. The proportion wj is simply the weight of the item in

memory, which is:

wj = beta(1 - beta)^(j-1). (14)

Here, j is one for the last (reinforced) response, 2 for the penultimate,

and so on, being indexed by _any_ response the animal makes: in the case

of a sequence of length L, its value is j = L - i + 1.

Equation 13 does not tell us which response occurred and was strengthened

in the trajectory. For simplicity, rather than index that separately I

lump all nontarget responses together in one category and write their

probability as 1 - pi. Then Equation 13 may be rewritten to cover both

measured and unmeasured responses:

pi' = wjXi + (1 - wj)pi,

where pi' is the updated probability of a target response in position i;

wj the weight in memory of any response that is j - 1 elements away from

reinforcement; Xi is 1 for a measured response, 0 for any other response,

and pi is the prior probability.

Killeen (1994, p. 120)
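
A minimal sketch of Equations 13 and 14 as quoted above, applied once to a short trajectory. The function names, the 0-based list layout (oldest response first), and the value of beta are assumptions made for illustration; this is not Killeen's own simulation code.

```python
def memory_weight(beta, j):
    """Equation 14: weight of the response j - 1 elements before reinforcement."""
    return beta * (1.0 - beta) ** (j - 1)


def reinforce(probs, responses, beta):
    """Apply p_i' = w_j * X_i + (1 - w_j) * p_i to every position.

    probs     : prior probability of a target response at each position
    responses : 1 if the response emitted at that position was a target
                response, 0 for any other response
    Both lists run from the oldest position to the most recent one.
    """
    length = len(probs)
    updated = []
    for i, (p, x) in enumerate(zip(probs, responses)):
        j = length - i            # 0-based form of j = L - i + 1
        w = memory_weight(beta, j)
        updated.append(w * x + (1.0 - w) * p)
    return updated


# Hypothetical example: three positions, the two most recent were target responses.
print(reinforce([0.5, 0.5, 0.5], [0, 1, 1], beta=0.5))
```

The most recent position moves furthest toward 1; the oldest, which held a nontarget response, moves slightly toward 0.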

Killeen goes on to describe how this reinforcement model applies to the

computation of the coupling coefficient; I'm still trying to understand it

in terms of the actual iterative computational steps involved. However, I

think that at this point you can see how the model should work. Whatever is

in response memory at the time of reinforcement gets "strengthened," meaning

that a greater proportion of the incitement provided by the reinforcer goes

into producing those responses, in inverse proportion to their "age"

(position) in memory (i.e., more recent responses are strengthened more than

less recent ones). This applies whether the responses in memory at the time

are target responses (e.g., keypecks) or nontarget responses.

Because of the FR contingency, the animal will always be making a target

response at the time the ratio is completed and a reinforcer thereby

delivered, so target responses get reinforced fairly consistently. But

start introducing incentives at random times and any behavior ongoing

immediately prior to delivery will be strengthened. This will tend to disrupt target

behavior, so target responses will tend to decrease in frequency (because

there is only so much time available in which to respond). Also, deliveries

that occur in the middle of a ratio run will reduce the number of target

responses in memory at the time of reinforcement, which itself lowers the

coupling. However, I need to confirm this purely verbal conjecture (based

on my understanding of the mechanism) via simulation before I will be

completely confident that this is what Killeen's theory predicts. There is

still some hope for your crucial experiment, Rick, but I don't give it a

high probability.
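
One way to picture this conjecture, pending a proper simulation, is the rough sketch below. It assumes, purely for illustration, that the target share of memory at a delivery is the sum of the weights wj (Equation 14) over positions holding target responses, and that memory holds only the responses made since the previous delivery. These are assumptions of the sketch, not Killeen's stated procedure for computing zeta, and beta is a hypothetical value.

```python
def target_weight(memory, beta):
    """Sum of memory weights w_j over positions holding a target response.

    memory lists responses oldest first: 1 = target, 0 = any other behavior.
    """
    total = 0.0
    for j, x in enumerate(reversed(memory), start=1):
        total += beta * (1.0 - beta) ** (j - 1) * x
    return total


beta = 0.25                      # hypothetical memory parameter

full_run = [1] * 8               # FR 8 completed with nothing but target responses
short_run = [1] * 3              # a free delivery interrupted the run 3 responses ago
off_task = [1, 1, 1, 0, 0]       # free delivery arrives during other behavior

for label, memory in (("full run", full_run),
                      ("short run", short_run),
                      ("off task", off_task)):
    print(label, target_weight(memory, beta))
```

Under these assumptions both kinds of disruption lower the weight given to target responses, which is the direction the verbal argument expects.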

I should point out that I am not particularly impressed by Killeen's

reinforcement system, and not only because it fails to specify any

physiological mechanism that might realize it. As I noted previously,

Killeen really doesn't tell us how to "parse" the behavior stream into

discrete "responses" that reinforcement can act upon in the manner that he

suggests. In fact, Killeen tells us:

It is of course unrealistic to think that each element in a trajectory

constitutes a unique response that has its own memory register. Therefore,

in the present simulations, after each reinforcement the probability at

position i is assigned the average value for it and the ones immediately

before and after it. This averaging provides the minimal "coarse graining"

that smooths the results of the finite-element analysis. . . . But such

verisimilitude was sacrificed for simplicity . . .

Killeen (1994, p. 121)
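
The averaging step described in this passage can be sketched as a three-point moving average over the position probabilities. The function name is hypothetical, and the treatment of the two end positions (averaging over whichever neighbors exist) is an assumption, since the quoted passage does not say how the ends are handled.

```python
def smooth(probs):
    """Replace each probability by the mean of itself and its immediate neighbors."""
    smoothed = []
    for i in range(len(probs)):
        # Slice covers positions i-1, i, i+1, clipped at the ends of the list.
        window = probs[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical probabilities at three positions, oldest first.
print(smooth([0.0, 0.3, 0.9]))
```

All averages are taken from the unsmoothed values, so the result does not depend on the order in which positions are visited.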

Even so, it ought to be possible to derive from the smoothed finite-element

analysis a predicted effect of added reinforcer deliveries. I expect this

analysis to show that the added deliveries reduce the coupling coefficient,

zeta, leading to a lower rate of responding on a given FR schedule.

Regards,

Bruce