[From Chris Cherpas (951030.1315 PT)]
[re: >Bill Powers (951029.1430 MST) and others]
(2) P1/R1 = P2/R2
When you reduce the original equation (1) to its simplest form (2), you
find that the animals behaved so as to make the number of behaviors per
reinforcement equal on the two keys.
But the simplest form of the matching law could instead be stated as:
P1/P2 = R1/R2
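(The two forms are algebraically equivalent, of course: cross-multiplying
either one gives P1*R2 = P2*R1, so matching stated one way is matching
stated the other. The difference is in which ratios you treat as the
primary terms of the analysis.)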
The matching law, as a framework for theorizing, seems to be in the spirit
of looking at behavior as a "system" in which relative behavioral allocation
(e.g., P1/P2) matches relative reinforcement (e.g., R1/R2), not as pecks per
reinforcer. Baum has shown that time allocation is at least as sensitive
to relative reinforcement as response allocation is. In other words, what happens at
each operandum is not as important as the relative changeover behavior,
maintained by the conditioned reinforcement associated with switching
between alternatives. That's one of the reasons that so much work went into
concurrent chains, instead of simple concurrent schedules -- i.e., to
"cancel out" the pattern-shaping aspect of reinforcement and to get to
the "value" of reinforcement.
Yes, the matching law has some absurd consequences for concurrent ratios
(but not concurrent VI-VR, per se). In fact, Herrnstein's work on concurrent
ratios was basically to show that "probability learning" was not
consistent with the matching law. The matching law also cannot say
which of two ratios will be exclusively chosen, just that the choice
will be exclusive, not "probabilistic." This is because the matching
law is not intended as a description of the dynamics of behavioral
allocation, just the relations between reinforced behaviors in a
steady state. As Bruce Abbott and I have both mentioned, there are
MANY reinforcement theories about why you get matching, ranging from
economic maximization and optimal foraging theory, at a fairly "molar"
level, to momentary maximizing, at a "molecular" level, with some
in-betweeners, like delay-reduction hypothesis (closer to molar) and
melioration (closer to molecular, and what Herrnstein ended up
proposing with Will Vaughan).
There's not even a consensus on what the "responses" are. Skinner (1950)
definitely was not thinking in terms of two response classes (i.e., one
at each operandum); instead, he thought of "choice" as the result of
reinforcing pecking regardless of which operandum, plus the responses
of switching over. Catania analyzed it in terms of four responses.
Shimp and others thought it was differential reinforcement of distributions
of interchangeover times, where the "responses" are actually temporally
defined (i.e., an interresponse time or IRT).
Reinforcement is an evolving concept. Eventually, there may be a
convergence with PCT concepts -- I really have no idea at this point.
On a more empirical note, I've done an experiment with concurrent VIs
in which, instead of having the VI timer stop until a reinforcement is
picked up, it just goes right on timing the next interval. That means
that if you "spend too much time" on one alternative, more than one
reinforcement can be waiting on the other alternative, and you can pick
them up one right after the other with no delay between them. Vaughan
called this linear VI, after the feedback function relating response
rate (x) to reinforcement rate (y): you pick up the maximum rate of
reinforcement as soon as you reach a minimum response rate. On a
"normal" VI there's
a curve, because whenever you don't pick up a scheduled reinforcement,
the timer is stopped. The point of all this is that I got matching,
even though there could have been huge variations in allocations while
still getting the maximum rate of reinforcement per session. That weakens
a (molar) maximization account, since even a random allocation would have
given you the same reinforcement rate overall -- yet the birds matched.
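For concreteness, here's a toy discrete-time simulation of the two timer
rules (my own sketch, just for illustration -- the one-response-opportunity-
per-tick simplification and the parameter values are mine, not the actual
procedure):

    import random

    def obtained_rate(mean_interval, p_respond, linear, ticks=200000, seed=1):
        # Toy single-key VI: one response opportunity per tick.
        # linear=True:  the interval timer keeps running, so uncollected
        #               reinforcers pile up (the "linear VI" rule).
        # linear=False: the timer halts while a reinforcer is waiting and
        #               restarts only when it is collected ("normal" VI).
        rng = random.Random(seed)
        waiting = 0       # reinforcers set up but not yet collected
        collected = 0
        for _ in range(ticks):
            timer_runs = linear or waiting == 0
            if timer_runs and rng.random() < 1.0 / mean_interval:
                waiting += 1                  # schedule sets up a reinforcer
            if waiting and rng.random() < p_respond:
                waiting -= 1                  # a response collects one
                collected += 1
        return collected / ticks              # obtained reinforcers per tick

    # Programmed rate is 1/30 per tick; vary the response probability:
    for p in (0.02, 0.05, 0.2, 0.8):
        print(p, obtained_rate(30, p, linear=False),
                 obtained_rate(30, p, linear=True))

The normal VI traces a curve that only approaches the programmed rate,
while the linear VI pays off at the full programmed rate as soon as the
response rate exceeds it -- which is why, in that procedure, a wide range
of allocations earns the same overall rate.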
I'm only guessing, but from a PCT perspective, what variable is a
bird controlling here? Not the overall rate of reinforcement, it would
seem (where "overall" means reinforcements per session). While it's only
suggestive, I plotted allocations within session while the schedules
were changing (in matching studies you have to change them from, say,
VI1-VI1 to VI2-VI1 to VI1-VI2, etc., to get a matching function) and found
allocations changed within sessions as a function of the differences
in LOCAL reinforcement rates (i.e., reinforcements per time spent at
an alternative) calculated in a temporal window just prior to the
measure of allocation. All this means is that differences in local
rates of reinforcement (again, reinforcements over some amount of
time at an alternative) seemed to be systematically "controlled" by
changes in behavioral allocation.
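In code, that analysis amounts to something like the following (the log
format, one-sample-per-second bookkeeping, and window size are
hypothetical -- the point is just to pin down what "local rate" means):

    def local_rate(log, key, now, window):
        # Local reinforcement rate at one alternative: reinforcers
        # collected there during the last 'window' seconds, divided by
        # the time spent there in that window.
        # log: one (t, key_at_t, reinforced) sample per second.
        recent  = [(k, r) for (t, k, r) in log if now - window < t <= now]
        time_at = sum(1 for k, _ in recent if k == key)
        rf_at   = sum(r for k, r in recent if k == key)
        return rf_at / time_at if time_at else 0.0

    def allocation(log, now, window):
        # Fraction of the recent window spent at alternative 1.
        recent = [k for (t, k, _) in log if now - window < t <= now]
        return recent.count(1) / len(recent) if recent else 0.5

The within-session finding, in these terms: allocation(log, t, window)
shifts toward whichever alternative had the higher local_rate in the
window just prior to t.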
I don't know how relatively recent events get discounted (there are certainly
models in TEAB), but this reminds me of some discussions I've seen about
deprivation vis-a-vis reinforcement. I doubt if there's a simple relation
between the receipt of each food delivery and effective deprivation. There
appears to be some lag involved in any case (ever eat too much?).
Experiments on polydipsia, hyperphagia, and other adjunctives make these
relationships somewhat less straightforward than one would like.
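For what it's worth, the simplest discounting scheme would be exponential
weighting; this is a generic sketch, not any particular TEAB model, and
the time constant is arbitrary:

    import math

    def discounted_rate(reinforcer_times, now, tau=60.0):
        # Exponentially weighted reinforcement rate: a delivery 'age'
        # seconds in the past counts exp(-age/tau), so recent events
        # dominate and older ones fade; dividing by tau normalizes
        # the weights so the result reads as a rate.
        return sum(math.exp(-(now - t) / tau)
                   for t in reinforcer_times if t <= now) / tau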
I know this is pretty verbose, but from the sample I've seen of csg-l so
far, I get a sense that "reinforcement theory" is not always confronted on
its own grounds; in other words, some may oversimplify what is already assumed
to be a (less than?) worthless approach. On the other hand, I recall that
Skinner repeatedly stated that trying to analyze data from others' perspectives
was pretty useless. So it goes.
Regards,
cc