[From Bruce Abbott (960127.1920 EST)]
Rick Marken (960127.1400) --
In case Bruce is still interested, here is what I had in mind.
Well of course I'm still interested!
In an earlier post, Bruce seemed to agree that an equation like
                         aR
(1) B' = (zeta) Bmax * ------
                       aR + 1
is the forward organism function in Killeen's model, showing how
response rate, B', depends on reinforcement (incentive) rate, R.
On fixed ratio schedules, reinforcement rate is a function of
response rate and the ratio, N, as follows:
(2) R = B'/N
This is the feedback function. It describes a property of the
environment, not the organism. It is not a theoretical equation.
Non-contingent incentives (reinforcements) change only this
feedback function (2); they have no effect on the organism function (1).
The new feedback function might be something like this:
(2a) R = B'/N + K
The non-contingent reinforcement adds a random amount, K, to the
reinforcement rate. Determining the effect of non-contingent incentives
on behavior is then just a matter of solving equations 1 and 2a
simultaneously to determine B' as a function of K. There is nothing
about the addition of non-contingent (free) reinforcements (incentives)
that implies a change in the way the organism operates (equation 1).
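To make this concrete, here is a minimal numerical sketch (my own, not anything from the post) of what solving equations 1 and 2a simultaneously amounts to, using fixed-point iteration. All parameter values are illustrative assumptions, not Killeen's fitted estimates.

```python
# A sketch of solving organism function (1) together with the modified
# feedback function (2a) by fixed-point iteration.  Parameter values
# are made up for illustration.

def steady_state_rate(zeta, b_max, a, n, k, iters=200):
    """Find B' satisfying B' = zeta*Bmax*aR/(aR + 1) with R = B'/N + K."""
    b = b_max                      # start at the ceiling rate
    for _ in range(iters):
        r = b / n + k              # feedback function (2a)
        b = zeta * b_max * (a * r) / (a * r + 1.0)  # organism function (1)
    return b

# K = 0 recovers the plain FR case; K > 0 adds free reinforcer
# deliveries to the obtained rate R without touching equation (1).
baseline  = steady_state_rate(zeta=0.5, b_max=100.0, a=2.0, n=10, k=0.0)
with_free = steady_state_rate(zeta=0.5, b_max=100.0, a=2.0, n=10, k=0.5)
```

With these made-up parameters the pair of equations predicts a slightly higher steady-state rate when K > 0; the point is that nothing in (1) and (2a) by themselves changes how the organism operates.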
Now you're talking! Taking things only as far as you've gone, you are
absolutely right. There is nothing thus far to suggest that adding
noncontingent reinforcement changes the way the organism operates.
The problem is that you have left out any effect of noncontingent food
delivery on zeta, the coupling coefficient. Zeta equals rho*M, where rho is
the proportion of incited behavior taken up by target (instrumental)
responses and M is the representation of responses in memory at the time of
I have not yet presented the reinforcement mechanism of Killeen's model.
(Because we were dealing with steady-state behavior it was not relevant to
the discussion at that time.) I have not yet modeled this, but here is what
Killeen has to say about it:
The learning model invoked here assumes that upon reinforcement the
probabilities of all events in the trajectory (pi) are increased some
proportion (wj) of the distance to their maximum (1 - pi):
pi' = pi + wj(1 - pi), (13)
where pi' is the updated probability of emitting a response in position i
of the sequence. The proportion wj is simply the weight of the item in
memory, which is:
wj = beta(1 - beta)^(j-1). (14)
Here, j is one for the last (reinforced) response, 2 for the penultimate,
and so on, being indexed by _any_ response the animal makes: in the case
of a sequence of length L, its value is j = L - i + 1.
Equation 13 does not tell us which response occurred and was strengthened
in the trajectory. For simplicity, rather than index that separately I
lump all nontarget responses together in one category and write their
probability as 1 - pi. Then Equation 13 may be rewritten to cover both
measured and unmeasured responses:
pi' = wjXi + (1 - wj)pi,
where pi' is the updated probability of a target response in position i;
wj the weight in memory of any response that is j - 1 elements away from
reinforcement; Xi is 1 for a measured response, 0 for any other response,
and pi is the prior probability.
Killeen (1994, p. 120)
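The quoted update rule is easy to express computationally. The sketch below is my own rendering (an assumption, not Killeen's code) of Equation 14 and the rewritten Equation 13 covering measured and unmeasured responses; beta = 0.5 and the example values are illustrative.

```python
# Memory-weighted update on reinforcement: Equation 14 gives the memory
# weight wj, and the rewritten Equation 13 moves each position's
# target-response probability toward what actually occurred there.

def update_on_reinforcement(p, x, beta):
    """p[i]: prior probability of a target response at position i (0-based);
    x[i]: 1 if the response emitted at position i was a target, else 0.
    The last position holds the reinforced response (j = 1)."""
    L = len(p)
    updated = []
    for i in range(L):
        j = L - i                           # 0-based form of j = L - i + 1
        w = beta * (1 - beta) ** (j - 1)    # Equation 14
        updated.append(w * x[i] + (1 - w) * p[i])  # rewritten Equation 13
    return updated

# A five-response run of target responses ending in reinforcement:
p = update_on_reinforcement([0.2] * 5, [1] * 5, beta=0.5)
# More recent responses (smaller j) are strengthened more.
```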
Killeen goes on to describe how this reinforcement model applies to the
computation of the coupling coefficient; I'm still trying to understand it
in terms of the actual iterative computational steps involved. However, I
think that at this point you can see how the model should work. Whatever is
in response memory at the time of reinforcement gets "strengthened," meaning
that a greater proportion of the incitement provided by the reinforcer goes
into producing those responses, in inverse proportion to their "age"
(position) in memory (i.e., more recent responses are strengthened more than
less recent ones). This applies whether the responses in memory at the time
are target responses (e.g., keypecks) or nontarget responses.
Because of the FR contingency, the animal will always be making a target
response at the time the ratio is completed and a reinforcer thereby
delivered, so target responses get reinforced fairly consistently. But
start introducing incentives at random times and any behavior ongoing
immediately prior to delivery will be strengthened. This will tend to disrupt target
behavior, so target responses will tend to decrease in frequency (because
there is only so much time available in which to respond). Also, deliveries
that occur in the middle of a ratio run will reduce the number of target
responses in memory at the time of reinforcement, which itself lowers the
coupling. However, I need to confirm this purely verbal conjecture (based
on my understanding of the mechanism) via simulation before I will be
completely confident that this is what Killeen's theory predicts. There is
still some hope for your crucial experiment, Rick, but I don't give it a
good chance of success.
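To make the conjecture a bit more concrete, here is a toy calculation under my own simplifying assumption (not Killeen's actual coupling computation) that coupling can be approximated by the memory-weighted proportion of target responses in the last L responses preceding a delivery, with the free delivery assumed to land on nontarget behavior.

```python
# Toy comparison: memory contents at a ratio-completing reinforcer
# versus at a free delivery that catches the animal mid-run doing
# something other than the target response.

def coupling(memory, beta):
    """memory[-1] is the response just before delivery (j = 1);
    entries are 1 for target responses, 0 for anything else."""
    L = len(memory)
    weights = [beta * (1 - beta) ** (j - 1) for j in range(1, L + 1)]
    weighted = sum(w * memory[L - j]
                   for j, w in zip(range(1, L + 1), weights))
    return weighted / sum(weights)

beta = 0.5
fr_memory   = [1] * 8                    # ratio run: memory is all targets
free_memory = [1, 1, 1, 1, 0, 0, 0, 0]  # free delivery: the most recent
                                         # slots hold other behavior
fr_coupling   = coupling(fr_memory, beta)
free_coupling = coupling(free_memory, beta)
```

Under this crude measure the free delivery yields a far lower coupling than the completed ratio, consistent with the verbal prediction; only a full simulation of Killeen's scheme could confirm it.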
I should point out that I am not particularly impressed by Killeen's
reinforcement system, and not only because it fails to specify any
physiological mechanism that might realize it. As I noted previously,
Killeen really doesn't tell us how to "parse" the behavior stream into
discrete "responses" that reinforcement can act upon in the manner that he
suggests. In fact, Killeen tells us:
It is of course unrealistic to think that each element in a trajectory
constitutes a unique response that has its own memory register. Therefore,
in the present simulations, after each reinforcement the probability at
position i is assigned the average value for it and the ones immediately
before and after it. This averaging provides the minimal "coarse graining"
that smooths the results of the finite-element analysis. . . . But such
verisimilitude was sacrificed for simplicity . . .
Killeen (1994, p. 121)
Even so, it ought to be possible to derive from the smoothed finite-element
analysis a predicted effect of added reinforcer deliveries. I expect this
analysis to show that the added deliveries reduce the coupling coefficient,
zeta, leading to a lower rate of responding on a given FR schedule.