The Opposition

[From Bruce Abbott (990508.1835 EST)]

Rick Marken (990507.1350) --

I agree that Bruce's statement above is puzzling but I am
starting to think (I'm willing to be corrected) . . .

We'll see.

. . . that such a
program won't shed much light on the matching law per se. At
the level at which the matching law is described there is no
need (I think) to look at the details of how response patterns
are turned into reinforcement rates. I think a "molar"
analysis alone can show that the matching law reveals nothing
more about behavior than the fact that matching feedback
functions produce "matching" behavior; that is, I think the
matching law tells us nothing about the organism who shows
matching; rather, it tells us about the nature of the environment
in which that organism is trying to feed itself.

Wrong.

The matching law says that

P1/(P1+P2) = r1/(r1+r2)

(where Pi is response rate on key i and ri is reinforcement
rate on key i). Bruce A. says that this law only applies to VI
schedules. I believe that, at the level this law is described,
it is sufficient to view a VI (or any) schedule as a "black box"
that transforms an input response rate (Pi) into an output
reinforcement rate (ri). We know, from the physical set up of
the operant situation, that an observed reinforcement rate
(ri) is completely determined (somehow) by the corresponding
response rate (Pi). So ri will be an observed proportion of
Pi; this proportion (ri/Pi), I argue, is a sufficient
characterization of the feedback function relating input
(Pi) to output (ri) for the VI (or _any_) schedule -- at least
for purposes of analyzing the matching law.

This proportion (ri/Pi) shows up in Bill's equivalent algebraic
representation of the matching law:

r1/P1 = r2/P2.
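
The algebra behind this equivalence is a short rearrangement
(assuming all four rates are nonzero):

    \frac{P_1}{P_1+P_2} = \frac{r_1}{r_1+r_2}
    \iff P_1(r_1+r_2) = r_1(P_1+P_2)
    \iff P_1 r_2 = r_1 P_2
    \iff \frac{r_1}{P_1} = \frac{r_2}{P_2}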

This means that we will see matching [P1/(P1+P2) = r1/(r1+r2)]
only if the feedback functions (ri/Pi) on the two keys are
equal; that is, only if the ratio of average reinforcement rate
to average response rate is the same on each key.

No. You will see matching only if the pigeon adjusts its rate of pecking on
the two keys so as to bring responding into the corresponding regions of the
_different_ feedback functions of the two schedules where the two ratios
become equal. That is a very different thing from your assertion. The
feedback function for VI schedules is not ri/Pi; that's the feedback
function for a ratio schedule.

I believe that this is a mathematical fact, independent of the
_type_ of schedule (VI, VR, FR, FI, etc.) that determines the
transformation of response rate (Pi) into reinforcement rate (ri).

By definition, the feedback functions of two different schedules are
different. Yet matching occurs when the two schedules are different. Put
that in your pipe and smoke it.

I think all that a simulation of behavior on a VI schedule can
show (re: the matching law) is that some average rate of responding
(Pi) is transformed into some average rate of reward (ri). That is,
the simulation will show that a particular VI schedule is associated
with some proportional relationship between ri and Pi. This
proportion (ri/Pi) is likely to be different for different VI
schedules. But this seems irrelevant to the matching law because
we already _know_ from the mathematics of the matching law that
there will be matching (that is, P1/(P1+P2) = r1/(r1+r2)) _only_
if the feedback functions (ri/Pi) for the VI schedules on the two
keys in the matching procedure are equal (or nearly so); i.e.,
r1/P1 = r2/P2. So we will only see matching if the VI schedules
for the two keys have the same feedback functions (ri/Pi). This
must be true whatever those VI functions are (in terms of average
and variance of intervals) and even if the VI functions are
_different_ for each key. There will be no matching unless the
feedback functions (ri/Pi) for the schedules on the two keys are
_equal_.

I did test this out in a spreadsheet; everything looked OK. But
feel free to fire away.

See above. I'll wager that your spreadsheet tests fixed ratios of response
to reinforcement, which is equivalent to programming two fixed-ratio
schedules. In VI schedules, the ratio depends on the rate of responding.

Regards,

Bruce

[From Rick Marken (990508.1830)]

Me:

This means that we will see matching [P1/(P1+P2) = r1/(r1+r2)]
only if the feedback functions (ri/Pi) on the two keys are
equal; that is, only if the ratio of average reinforcement rate
to average response rate is the same on each key.

Bruce Abbott (990508.1835 EST) --

No. You will see matching only if the pigeon adjusts its rate
of pecking on the two keys so as to bring responding into the
corresponding regions of the _different_ feedback functions of
the two schedules where the two ratios become equal.

I suppose I shouldn't have called ri/Pi a "feedback function".
I know that the real feedback function is very complex; probably
not even analytic. But I see what you are saying. You are saying
that a particular VI schedule will produce very different ri/Pi
ratios depending on the pattern of responding. I doubt this.
But I guess this is why doing a simulation is a good idea.

But even if it's true that different pecking patterns _can_
produce different ri/Pi ratios for the _same_ VI schedule,
I find it hard to believe that a pigeon would adjust its
pecking pattern to keep r1/P1 = r2/P2. This seems like a _very_
complex adjustment. But, we'll see.

I _still_ think that ri/Pi is a good measure of the "transfer
function" of even a VI schedule; it's a ratio so it doesn't depend
on the average rate at which the bird pecks on the key (Pi). I
think the main aspect of response pattern that the bird can vary
(and that can make a difference on a VI schedule) is basically
average pecking rate (Pi). So I think what we are seeing in ri/Pi
is a measure of the rate at which a bird can get reinforcement
from a VI schedule as a function of its response rate. If ri/Pi
is 2 for a particular VI schedule, then I think this means that,
if average response rate is 1/sec, then average reinforcement rate
will be 2/sec; if average response rate is 10/sec then average
reinforcement rate will be 20/sec _on this VI schedule_. But I
guess this is what we have to determine using simulation.

Best

Rick

···

--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bruce Abbott (990508.2230 EST)]

Rick Marken (990508.1830) --

I suppose I shouldn't have called ri/Pi a "feedback function".
I know that the real feedback function is very complex; probably
not even analytic. But I see what you are saying. You are saying
that a particular VI schedule will produce very different ri/Pi
ratios depending on the pattern of responding. I doubt this.
But I guess this is why doing a simulation is a good idea.

The "pattern" of responding can have some effect, but the main thing I'm
talking about is _rate_ or responding. It doesn't take a simulation to see
that different rates of responding will produce very different ri/Pi ratios
on a VI schedule: at very low rates ri/Pi approaches 1/1; at higher rates
typical of such schedules ri/Pi could be very high, e.g., 1/200. Only the
first response after the completion of an interval produces food; with an
average interval length of, say, 30 seconds, it would be easy for a pigeon
to produce 200 responses on average between food presentations.
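
Here is a back-of-the-envelope check (a sketch only, assuming random
Poisson responding and exponentially distributed intervals, which no
particular experiment used exactly). Under those assumptions, with
unlimited hold, the expected ratio works out to r/P = 1/(P*I + 1),
which runs from near 1/1 at very low rates down to about 1/200 at
6.7 pecks/sec on a VI 30-s schedule:

    import numpy as np

    rng = np.random.default_rng(0)

    def vi_ratio(mean_interval, resp_rate, n_cycles=100_000):
        # Each cycle: the interval times out ("sets up" the reinforcer
        # and holds it), then the very next response collects it, so
        # reinforcers-per-response is cycles / total responses.
        setup = rng.exponential(mean_interval, n_cycles)  # interval durations
        during = rng.poisson(resp_rate * setup)           # unreinforced pecks
        return n_cycles / (during.sum() + n_cycles)       # +1 reinforced peck/cycle

    for p in (0.01, 0.1, 1.0, 6.7):                       # pecks per second
        print(p, round(vi_ratio(30.0, p), 4), round(1 / (p * 30 + 1), 4))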

But even if it's true that different pecking patterns _can_
produce different ri/Pi ratios for the _same_ VI schedule,
I find it hard to believe that a pigeon would adjust its
pecking pattern to keep r1/P1 = r2/P2. This seems like a _very_
complex adjustment. But, we'll see.

It _is_ a very complex adjustment, and a large number of sessions of
exposure to a given combination of schedules is required before the relative
rate of responding on the keys stabilizes. Just why performance tends to
move toward matching is a theoretical question that continues to be hotly
debated.

I _still_ think that ri/Pi is a good measure of the "transfer
function" of even a VI schedule; it's a ratio so it doesn't depend
on the average rate at which the bird pecks on the key (Pi).

On a VI schedule and at typical rates of responding, the variation in ri/Pi
reflects mainly the variation in Pi.

I
think the main aspect of response pattern that the bird can vary
(and that can make a difference on a VI schedule) is basically
average pecking rate (Pi). So I think what we are seeing in ri/Pi
is a measure of the rate at which a bird can get reinforcement
from a VI schedule as a function of its response rate.

Yep.

If ri/Pi
is 2 for a particular VI schedule, then I think this means that,
if average response rate is 1/sec, then average reinforcement rate
will be 2/sec; if average response rate is 10/sec then average
reinforcement rate will be 20/sec _on this VI schedule_. But I
guess this is what we have to determine using simulation.

What you've overlooked is that ri/Pi will change as Pi changes: the ratio
itself changes, so you can't just plug a new value of Pi into the old
ratio and get the new value of ri.

Regards,

Bruce

[From Dick Robertson,990509.0737CDT]

I have been trying to follow this discussion, but I still need
clarification of my original question in everyday terms.

[From Bruce Abbott (990507.1010 EST)]

Richard Kennaway (990507.1439 BST) --

I'm not clear on what these mean. Assuming an experiment involving
lever-pressing to get food-pellets, with each lever yielding identical
pellets, is Ra the rate of delivery of pellets per unit time, or per press?

Ra is the rate of delivery of pellets per unit time.

In that case the matching law states that the rate of pressing on each
lever would be adjusted by the subject so as to produce an equal ratio
of lever-presses per pellet across the schedules, within the limits of
experimental error.

To answer one of Dick Robertson's questions, the data are reported for
individual subjects (they are not group averages), but do represent
average rates over the duration of a session.

Regards, Bruce

I wasn't being sarcastic when I asked originally whether the matching
law stated anything more profound than the common sense view that people
(or presumably, rats) do what they get rewarded for rather than what
they don't.

What interested me in the original report from Jeff's friend was the
apparent implication of the "matching law" that research had established
that: given two levers, one that would deliver a food pellet twice as
often as the second, the rat would divide his efforts to push the first
twice as often as the second, rather than the more "expectable" (to the
lay observer at least) act of devoting all his efforts to the lever that
gave the best payoff.

If that IS what the matching law says, I would find that somewhat
interesting. (And BTW in the original post there was some reference to
people in shopping malls, suggesting that the M-Law has some real-life
relevance.) I would wonder what the explanation might be, what could be
surmised about how the subject, person or rat, controls for getting
rewarded that would make such a strategy fruitful. If I have got the
wrong idea, I would like to know what the right one is.

Best, Dick R.

[From Bill Powers (990509.0902 MDT)]

Bruce Abbott (990508.1815 EST)-

I don't understand what you mean. Are you saying that the reinforcements
are "manipulated" independently of the behavior?

The schedule values are varied parametrically over the course of the
experiment to produce different values of r1/(r1+r2). For example, one
might program VI 40-s versus VI 120-s, VI 60-s versus VI 60-s, and so on.
(Here, the overall rate of food delivery programmed by the schedules was
kept constant at 2 deliveries per minute across the two conditions given.)

OK, I think I get it. You're really talking about _predicted_ values,
aren't you? If the animal spent the whole time pressing on one key, the
predicted values wouldn't appear on the other key, would they? You can't
actually "schedule" the reinforcements in sense of scheduling dosages of
medicine.

I thought the
reinforcements were generated as a function of the pattern of presses
acting via the properties of the apparatus.

They are. However, the rate of pressing associated with these schedules is
usually high enough so that fairly large variations in rate of pressing
produce very little variation in rate of food delivery.

So in effect, ra/(ra+rb) and rb/(ra+rb) are constants determined by the
schedules? I guess I'm just going to have to play with the numbers.

1. Does the next interval start at the instant the previous interval ends,
or when the reinforcer is actually delivered, or after the animal has
collected the reinforcer and is ready to start pressing again (and how
would that be determined)?

It's been done both of the first two ways; let's assume that the next
interval begins when the pellet is actually delivered.

OK. But what happens when the animal is pressing on key A at the time when
the schedule on Key B runs out? Eventually the animal will return to key B,
but a very long interpress time would be implied for key B, which would
skew the average. How do you handle the first press after a change from one
key to the other? That's not really a valid measure for either key's
schedule, is it?

2. If an animal is pressing rapidly and overruns, so there are several
presses after the interval ends and before the food is collected, are the
extra presses or pecks counted as "responses" in the previous interval, or
do they add to the total for the next interval, or are they dropped?

Typically all responses are counted.

Yes, but to which interval do they belong? You answered my question about
their being dropped, but not which interval gets an overrun count.

3. In a schedule designated as, for example, VI 5 minutes, how are the
minimum and maximum intervals determined, and what is the distribution
between the limits? It seems to me that not all "VI 5 min" schedules are
comparable without these three additional parameters being specified. What
should I use in setting up a VI schedule? I'll use whatever is customary.

It's been done in a number of ways, including but not limited to both
arithmetic and geometric progressions. Probably the best is to use a
constant probability schedule: you can program this by using the random
function to generate a number between 0 and 1 at each time step dt, and
"setting up" a food delivery if the value is less than or equal to some
value x that is computed based on the desired average interval size. For
example, if dt = .1 sec, then to produce an average interval size of 30 sec
(for a VI 30-s schedule), x would be dt/I = .1/30 = 1/300.
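
In code, that recipe looks like this minimal sketch (parameter names
are mine); over a long run, the gaps between set-ups average out to
the scheduled interval:

    import random

    def vi_setups(mean_interval=30.0, dt=0.1, session=3600.0):
        # Constant-probability VI: on each tick of length dt, "set up" a
        # food delivery with probability x = dt / mean_interval.
        x = dt / mean_interval            # e.g. .1 / 30 = 1/300 per tick
        return [i * dt for i in range(int(session / dt))
                if random.random() <= x]

    times = vi_setups()
    gaps = [b - a for a, b in zip(times, times[1:])]
    print(sum(gaps) / len(gaps))          # comes out near 30.0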

OK, I'll do that.

5. And finally, how are Ra, Rb, ra, and rb measured?

ra and rb are the number of food-accesses per minute, per hour, or per
session (it doesn't matter what the divisor is so long as it is the same for
all measures); Ra and Rb are the number of keypecks or, alternatively, the
time spent in each activity (the two measures correlate highly). As most
studies have used number of keypecks, let's stick to that; it's easier to
measure.

It does make a difference, because the relationship of interest is
nonlinear. If you compute ra/(ra+rb) over small intervals of time, then
average the values of the quotients together over the whole session, you'll
get a different number from what you'd get by just taking the
session-average values of ra and rb, and _then_ calculating ra/(ra + rb).
The only way in which "matching" could make any physical sense would be for
running or cumulative averages of ra/(ra+rb) and Ra/(Ra+Rb) to be
compared. Otherwise, the animals would have to be adjusting their rate of
pressing at a given time during the session to match a session-average
value that will not exist until the end of the session.
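
A toy two-interval example (numbers invented for illustration) shows
how far apart the two computations can come out:

    ra = [9, 1]    # reinforcers earned on key A in each half of a session
    rb = [1, 1]    # reinforcers earned on key B in each half

    avg_of_ratios = sum(a / (a + b) for a, b in zip(ra, rb)) / 2
    ratio_of_totals = sum(ra) / (sum(ra) + sum(rb))
    print(avg_of_ratios)      # 0.7 -- mean of the per-interval quotients
    print(ratio_of_totals)    # about 0.83 -- quotient of the session totals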

Those are excellent questions; I hope you found my answers adequate.

Thanks, but they just reflect my ignorance.

Best,

Bill P.

[From Bill Powers (990509.1036 MDT)]

Bruce Abbott (990508.1835 EST) writing to
Rick Marken (990507.1350) --

Rick:

The matching law says that

P1/(P1+P2) = r1/(r1+r2)

(where Pi is response rate on key i and ri is reinforcement
rate on key i). Bruce A. says that this law only applies to VI
schedules.

Bill (now):

That's the impression I get, too: that Bruce A. agrees that P1/(P1+P2) etc.
does apply in the case of VI schedules. I'm less interested here in whether
it applies to _all_ schedules.

But when Rick says

This means that we will see matching [P1/(P1+P2) = r1/(r1+r2)]
only if the feedback functions (ri/Pi) on the two keys are
equal; that is, only if the ratio of average reinforcement rate
to average response rate is the same on each key.

Bruce replies as follows:

No. You will see matching only if the pigeon adjusts its rate of pecking on
the two keys so as to bring responding into the corresponding regions of the
_different_ feedback functions of the two schedules where the two ratios
become equal. That is a very different thing from your assertion. The
feedback function for VI schedules is not ri/Pi; that's the feedback
function for a ratio schedule.

Well, Bruce, be that as it may, but if the matching law says that
P1/(P1+P2) = r1/(r1+r2) then it ALSO says, necessarily, P1/r1 = P2/r2,
whether you like it or not. You can't have the first condition without the
second. And the second says that the ratio of obtained reinforcement per
press is equal across schedules.
I agree that in a VI schedule it may be achieved by finding a Pi that will
yield that result through a nonlinear relationship, but the final result is
that equal ratios of reinforcements to presses are achieved.

I agree that Rick's implication is wrong, if he meant that the feedback
functions were simple ratios on the VI schedule.

Herrnstein's formula for matching, which looks complex, is equivalent to a
much simpler formula. Either Herrnstein had some reason for preferring the
more complex way of writing it, or he didn't realize that his formula had a
simpler form. In mathematical analysis, the simplest possible form, or
lacking that, an agreed canonical form, is always to be preferred, so that
situations just like this will not occur: two parties arguing about which
formula is correct, when in fact the two formulas express one and the same
relationship.

Best,

Bill P.

[From Bill Powers (990509.1122 MDT)]

Dick Robertson,990509.0737CDT--

I wasn't being sarcastic when I asked originally whether the matching
law stated anything more profound than the common sense view that people
(or presumably, rats) do what they get rewarded for rather than what
they don't.

That's not the common sense I want to sell people on using. I want to say
that when people or animals lack something, they begin trying to get it by
looking first in one place and then in another, and this search only
_ceases_ when they find it. Then they start doing whatever is needed to get
it. It's not the reward that causes the behavior; it's the behavior that
causes the reward.

Best,

Bill P.


[From Rick Marken (990509.1800)]

Here are some results from a VI schedule model I just completed.
The model computes the average reinforcement rate (ri) that
results when an organism is responding on a VI schedule at a
particular average rate (Pi). The VI schedules are characterized
by the maximum duration of an interval (I'll call it VImax).
Intervals are then drawn randomly from a uniform distribution
of intervals that ranges from 1 sec to VImax secs. I believe
this means that the average duration of an interval is then
VImax/2; I forget the formula for the variance of a uniform
distribution but I think it may be the square of the average.

The user of the model determines the average response rate (Pi)
by entering a maximum inter-response interval (IRmax). Inter-
response intervals are then drawn randomly from a uniform
distribution of intervals that ranges from 1 sec to IRmax secs.
The program then runs through the specified schedule, keeping
track of the interval between reinforcements and responses.
(The program counts all response intervals prior to a reinforcement
and measures the inter reinforcement interval as the time between
the last and the current _obtained_ reinforcement; if a reinforcement
was missed because there was no response in the interval then
that reinforcement interval is added to the next to make a
"total reinforcement interval").

When the program is finished it computes the actual average
response rate (Pi in responses/sec) and average reinforcement
rate (ri in reinforcements/sec) observed in the schedule.
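
In outline, the procedure just described looks like this (a compressed
sketch reconstructed from the description above; the actual program may
differ in details, and Bruce Abbott's correction below describes the
standard VI arrangement, which differs):

    import random

    def rick_model(vi_max, ir_max, session=100000.0):
        # Draw schedule intervals and inter-response intervals uniformly
        # from 1..max seconds; a reinforcer is delivered at the end of any
        # interval containing at least one response, and an empty interval
        # simply runs on into the next one.
        t, next_resp = 0.0, random.uniform(1.0, ir_max)
        n_resp = n_reinf = 0
        while t < session:
            t += random.uniform(1.0, vi_max)         # current interval ends at t
            responded = False
            while next_resp <= t:
                n_resp += 1
                responded = True
                next_resp += random.uniform(1.0, ir_max)
            if responded:
                n_reinf += 1                         # reward at end of interval
        return n_resp / session, n_reinf / session   # Pi, ri

    Pi, ri = rick_model(10.0, 7.0)
    print(round(Pi, 3), round(ri, 3), round(ri / Pi, 2))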

Here is a table of some results for 4 schedules (with maximum
intervals of 50, 30, 10 and 5 seconds) at two response rates
(approximately .22 and .117 responses/sec). I have also
included the reinforcement rate per response rate (ri/Pi)
calculation for each schedule.

                 VImax (in secs) for VI schedules

                50       30       10        5

    Pi       0.221    0.224    0.229    0.223     (IRmax =  7)
    ri       0.037    0.059    0.138    0.184
    ri/Pi     0.17     0.26     0.60     0.83

    Pi       0.117    0.117    0.116    0.115     (IRmax = 15)
    ri       0.035    0.056    0.094    0.105
    ri/Pi     0.30     0.48     0.81     0.91

The results seem to make sense; as the average interval
of a schedule decreases (moving across columns from left
to right), the rate of reinforcement obtained from the
schedule (ri) increases. As response rate decreases (going
from the top to the bottom of the table) the reinforcement
rate obtained from all schedules decreases.

The table can be used to predict what might happen in
a matching experiment if the organism were responding
randomly (as the model does) at the same rate on both
keys. For example, if one key were VI 10 and the other
were VI 5 and the organism were responding randomly to
both keys at the rate of .22 responses/sec then

P1/(P1+P2) = .51 and r1/(r1+r2) = .43

These values are not too close; no matching. But
suppose that .22 responses/sec were the overall
response rate; in that case, the rate on each key
would be closer to .112 responses/sec. In that case,

P1/(P1+P2) = .5 and r1/(r1+r2) = .47.

Now we're getting a lot closer to matching. In fact,
my results suggest that, as the average response rate
on two VI schedules goes down, P1/(P1+P2) comes closer
and closer to r1/(r1+r2) for those schedules.

I would sure like to see some real matching data now.
One prediction I would make is that matching will
_increase_ if you test an organism on _three_ simultaneous
VI schedules instead of two. I am assuming that what may
be happening in these simultaneous VI schedules is that
the organism simply divides his "responding" between the
different alternatives.

Best

Rick

···

--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bill Powers (990509.2032 MDT)]

Rick Marken (990509.1800)

Hey, great that you're taking on the VI-VI schedule. Your first results
look like a good start. Let's make sure, though, that we're accurately
representing what goes on in these experiments. For example, saying that
the response rate is 0.22 per second on each key implies that they're
running in parallel, but we know that while the animal is pecking one key
it can't be pecking the other one. I've been sitting here trying to figure
out how to handle this, and haven't really settled on anything.

Also, my impression of an interval schedule is that pecks occurring before
the current interval has expired have no effect, while the first peck after
expiration of the current interval produces the reinforcer and starts the
next interval. So unless the animal totally stops pecking, there is no way
that a reinforcer can be "missed." As I understand it, expiration of the
current interval "cocks" a trigger, and the next peck (whenever it happens)
pulls it to deliver the reinforcer. That would also start the next
interval, per Bruce A.'s recommendation. Bruce A., is this correct?

I'm thinking of a model in which the animal pecks at some mean rate plus or
minus maybe 50% ALL THE TIME, and simply shifts at random intervals back
and forth between side 1 and side 2. Interpeck intervals would range from
about 0.2 sec to 0.6 sec, and the switching would occur with a mean rate of
perhaps 5 seconds, with a big variance. The two schedules run concurrently,
but if the current interval on side 2 runs out while the pecking is on side
1, side 2 simply has to wait until the random switching brings the pecking
back to side 2. I think this expresses the hypothesis that there is no
actual effect of the schedule on _contiguous_ rate of pressing, and that
the switching is actually random. I suppose it would be appropriate to
include perhaps 2 to 4 seconds of collection time after each reinforcement.
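
In code, the proposed model might look something like the sketch below
(a sketch only: the constants are the tentative values mentioned above,
and the schedule intervals use Bruce's constant-probability recipe,
i.e. exponentially distributed intervals):

    import random

    def run_session(mean_ints=(15.0, 45.0), session=3600.0):
        # Two VI schedules run concurrently. Each one "sets up" (arms) its
        # reinforcer when its interval expires, then holds it until the
        # next peck on that key collects it (unlimited hold).
        next_setup = [random.expovariate(1.0 / m) for m in mean_ints]
        armed = [False, False]
        pecks, reinf = [0, 0], [0, 0]
        key, t = 0, 0.0
        next_switch = random.expovariate(1.0 / 5.0)  # mean 5 s between switches
        while t < session:
            t += random.uniform(0.2, 0.6)            # interpeck time: 0.4 s +/- 50%
            for k in (0, 1):                         # schedules time out even unattended
                if not armed[k] and t >= next_setup[k]:
                    armed[k] = True
            if t >= next_switch:                     # switching is random, schedule-blind
                key = 1 - key
                next_switch = t + random.expovariate(1.0 / 5.0)
            pecks[key] += 1
            if armed[key]:                           # this peck is reinforced
                reinf[key] += 1
                armed[key] = False
                next_setup[key] = t + random.expovariate(1.0 / mean_ints[key])
                t += 3.0                             # time spent at the hopper
        return pecks, reinf

    pecks, reinf = run_session()
    print("P1/(P1+P2) =", round(pecks[0] / sum(pecks), 3),
          "r1/(r1+r2) =", round(reinf[0] / sum(reinf), 3))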

I don't know how long to let a session run -- another question for Bruce A.

I probably won't have much time to put in on this for a week. I repeat what
I hope has gotten into some post: that I'm going to Boston to be a speaker
at the first "National conference on internal control psychology" at
Northeastern University's Burlington, MA campus. The speakers in order will
be Bill Powers, Albert Ellis, Bill Glasser, and Alfie Kohn. The organizer
is Larry Litwack, editor of the International Journal of Reality Therapy.
Each speaker gets an hour the first day to present the case, and on the
second day will be questioned for an hour by the audience. I'm playing the
role of presenting a scientific theory of internal control called PCT.

Best of all, Mary and I will see some old friends in and out of PCT, and
visit Isaac Kurtzer in his lair at Brandeis to see his goodies. Back a week
from Monday night.

Best,

Bill P.

[From Rick Marken (990509.2100)]

Bill Powers (990509.2032 MDT)--

Hey, great that you're taking on the VI-VI schedule.

It's fun! I was never much interested in this esoteric
stuff 'til now. Suddenly I'm really curious about what's
going on with this "matching" stuff. It's especially
interesting because behaviorists make such a fuss about it;
I have to find out what _that's_ about.

Let's make sure, though, that we're accurately representing
what goes on in these experiments.

I _knew_ this was too easy:-)

my impression of an interval schedule is that pecks occurring
before the current interval has expired have no effect, while
the first peck after expiration of the current interval produces
the reinforcer and starts the next interval.

Yes. This is quite different from what I did. I just generated
an interval; if one or more responses fall in it then the animal
gets the reward at the end of the interval, otherwise there
is no reward and a new interval starts.

I guess I'll wait to get the emmis from Bruce Abbott before
going any farther.

Have a great time in Boston! I hope you write a "trip report"
for us when you get back.

Best

Rick

···

---
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bruce Abbott (990510.way too early)]

Rick Marken (990509.1800) --

Here are some results from a VI schedule model I just completed.
The model computes the average reinforcement rate (ri) that
results when an organism is responding on a VI schedule at a
particular average rate (Pi). The VI schedules are characterized
by the maximum duration of an interval (I'll call it VImax).

By convention, VI schedules are described in terms of their average interval
size, not their maximum interval size.

Intervals are then drawn randomly from a uniform distribution
of intervals that ranges from 1 sec to VImax secs. I believe
this means that the average duration of an interval is then
VImax/2; I forget the formula for the variance of a uniform
distribution but I think it may be the square of the average.

If the intervals are drawn from a uniform distribution ranging from 1 s to
VImax s, then the average interval size is (VImax-VImin)/2 + VImin. As the
minimum here is 1 s, this computes to (VImax-1)/2 + 1, i.e., (VImax+1)/2.

The user of the model determines the average response rate (Pi)
by entering a maximum inter-response interval (IRmax). Inter-
response intervals are then drawn randomly from a uniform
distribution of intervals that ranges from 1 sec to IRmax secs.
The program then runs through the specified schedule, keeping
track of the interval between reinforcements and responses.
(The program counts all response intervals prior to a reinforcement
and measures the inter reinforcement interval as the time between
the last and the current _obtained_ reinforcement; if a reinforcement
was missed because there was no response in the interval then
that reinforcement interval is added to the next to make a
"total reinforcement interval").

This isn't right. A VI schedule "sets up" the reinforcer when the currently
programmed interval elapses; the next response produces the reinforcer and
starts the timing of the next interval. Responses occurring _during_ the
timing of the interval are counted but have no other effect on the apparatus.
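
In program form, the rule comes down to two small state changes (a
sketch; the names and dict layout are mine):

    def tick(s, dt):
        # The interval timer runs until it expires; the reinforcer is then
        # "set up" and held (unlimited hold) until collected.
        if not s["armed"]:
            s["clock"] += dt
            if s["clock"] >= s["interval"]:
                s["armed"] = True

    def peck(s, draw_interval):
        # Every response is counted, but a response has no other effect on
        # the apparatus unless a reinforcer has been set up.
        s["pecks"] += 1
        if s["armed"]:
            s["armed"] = False
            s["clock"], s["interval"] = 0.0, draw_interval()
            return True     # this peck delivers food and restarts timing
        return False

    s = {"armed": False, "clock": 0.0, "interval": 30.0, "pecks": 0}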

When the program is finished it computes the actual average
response rate (Pi in responses/sec) and average reinforcement
rate (ri in reinforcements/sec) observed in the schedule.

Here is a table of some results for 4 schedules (with maximum
intervals of 50, 30, 10 and 5 seconds) at two response rates
(approximately .22 and .117 responses/sec). I have also
included the reinforcement rate per response rate (ri/Pi)
calculation for each schedule.

These would be rather low response rates. The responses/s given are
equivalent to averages of 4.55 and 8.55 s/response. More typical values
would be averages in the range of .5 to 2 s/response, although rate tends to
decrease with increases in the average VI interval.

Regards,

Bruce

[From Bruce Abbott (990510.still-too-early)]

Bill Powers (990509.2032 MDT) --

Rick Marken (990509.1800)

Hey, great that you're taking on the VI-VI schedule. Your first results
look like a good start. Let's make sure, though, that we're accurately
representing what goes on in these experiments. For example, saying that
the response rate is 0.22 per second on each key implies that they're
running in parallel, but we know that while the animal is pecking one key
it can't be pecking the other one. I've been sitting here trying to figure
out how to handle this, and haven't really settled on anything.

I wonder how the pigeon decides when to switch . . .

Also, my impression of an interval schedule is that pecks occurring before
the current interval has expired have no effect, while the first peck after
expiration of the current interval produces the reinforcer and starts the
next interval. So unless the animal totally stops pecking, there is no way
that a reinforcer can be "missed." As I understand it, expiration of the
current interval "cocks" a trigger, and the next peck (whenever it happens)
pulls it to deliver the reinforcer. That would also start the next
interval, per Bruce A.'s recommendation. Bruce A., is this correct?

Yes. This version implements what is called an "unlimited hold" on the
reinforcer, because the reinforcer is "held" until it is collected.

I'm thinking of a model in which the animal pecks at some mean rate plus or
minus maybe 50% ALL THE TIME, and simply shifts at random intervals back
and forth between side 1 and side 2. Interpeck intervals would range from
about 0.2 sec to 0.6 sec, and the switching would occur with a mean rate of
perhaps 5 seconds, with a big variance. The two schedules run concurrently,
but if the current interval on side 2 runs out while the pecking is on side
1, side 2 simply has to wait until the random switching brings the pecking
back to side 2. I think this expresses the hypothesis that there is no
actual effect of the schedule on _contiguous_ rate of pressing, and that
the switching is actually random. I suppose it would be appropriate to
include perhaps 2 to 4 seconds of collection time after each reinforcement.

Yes, that would do it. In pigeons there is a fixed length of time the
hopper is raised, during which the pigeon can access grain from the food
magazine. Typical values would be 3 or 4 s.

I don't know how long to let a session run -- another question for Bruce A.

An hour would be good. Data usually are collected until behavior stabilizes
within some criterion, such as no variation greater than x across 5
consecutive sessions. The data from the stable sessions are usually then
averaged to produce a more stable estimate of the final value.

It won't matter much in the simulation, so long as a sufficiently long time
is allowed to smooth out local random variations.

Regards,

Bruce

[From Bill Powers (990510.0510 MDT)]

Bruce Abbott (990510.still-too-early)--

I've been sitting here trying to figure
out how to handle this, and haven't really settled on anything.

I wonder how the pigeon decides when to switch . . .

For purposes of establishing a baseline I'm going to assume it's cosmic
rays -- whatever the reason, it is independent of the schedule. Then we can
see how much the predictivity of the model can be improved.

In pigeons there is a fixed length of time the
hopper is raised, during which the pigeon can access grain from the food
magazine. Typical values would be 3 or 4 s.

Woops. What happens if the pigeon is pecking key 2 at the moment that the
interval on key 1 runs out? Which key-press gets the reinforcing effect? Is
there just one food dish that is raised and lowered? And are you saying
that it _is_ possible for the pigeon to miss a reinforcement? Another
preconception bites the dust -- I thought there would be one unique food
delivery place for each key. Now you seem to be saying that a known amount
of food isn't just dropped into the cup, but the animal eats as much as it
can in a fixed time, and -- unless I misread you -- the _same_ reinforcing
event occurs no matter which key is pressed. And wouldn't it eat a lot more
when it was maximally hungry than later in the session? How can you tell
how much reinforcement it's getting? Jeez, you're always throwing me curves
like this.

Best,

Bill P.

[From Rick Marken (990510.0750)]

Bruce Abbott (990510.way too early) --

This isn't right. A VI schedule "sets up" the reinforcer when
the currently programmed interval elapses...

Great. Thanks. I'll work on this as soon as I get a chance
(which may not be for a while now that the work week is
upon us).

In the mean time, could you give a quick explanation of
why the "matching" phenomenon is such a big deal for
behaviorists?

Best

Rick

···

--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken

[From Bruce Abbott (990510.1110 EST)]

Bill Powers (990510.0510 MDT) --

Bruce Abbott (990510.still-too-early)

In pigeons there is a fixed length of time the
hopper is raised, during which the pigeon can access grain from the food
magazine. Typical values would be 3 or 4 s.

Woops. What happens if the pigeon is pecking key 2 at the moment that the
interval on key 1 runs out? Which key-press gets the reinforcing effect?

I'm puzzled as to why you are now confused about this. The two schedules
run independently on the separate keys. As you noted earlier, the schedule
on one key can set up reinforcement while the pigeon is pecking at the other
key. That reinforcement will simply wait until the pigeon switches keys.
Meanwhile, if the pigeon keeps pecking at the current key, and the
associated schedule sets up reinforcement, then the next peck on that key
will deliver it.

Of course, the only thing the pigeon can know about this is that a peck at a
given key is or is not immediately followed by access to the grain. (The
pigeon doesn't know when a reinforcer has been set up on a given key, and
can discover that fact only by pecking at the key.)

Is
there just one food dish that is raised and lowered?

Yes.

And are you saying
that it _is_ possible for the pigeon to miss a reinforcement?

No. The schedule holds until the reinforcement is collected.

Another
preconception bites the dust -- I thought there would be one unique food
delivery place for each key. Now you seem to be saying that a known amount
of food isn't just dropped into the cup, but the animal eats as much as it
can in a fixed time, and -- unless I misread you -- the _same_ reinforcing
event occurs no matter which key is pressed.

That's correct. Unless a difference in hopper presentation time is
explicitly programmed (done only when there is some experimental question
whose answer requires it), the same hopper is raised for the same duration
no matter which schedule paid off. Some cue, such as a light going on in
the magazine and/or the sound of the solenoid operating that raises the
hopper, tells the pigeon when the hopper is raised, and the pigeon quickly
learns to approach the magazine and feed whenever the cue occurs.

And wouldn't it eat a lot more
when it was maximally hungry than later in the session? How can you tell
how much reinforcement it's getting?

On the VI schedules typically used, the pigeon isn't going to get enough to
eat over the course of a session to substantially change its hunger level.
Because the same event (hopper-presentation for x seconds) follows
completion of either schedule, there is little reason to expect anything
more than small random variations in the amount of grain consumed per
presentation on the two schedules. For rats, the reinforcer is usually
standard food pellets, so there the total amount of food earned per
presentation on the two schedules will be the same. In support of my
assertion that the pigeon data are not skewed by differential access to
grain, the rat and pigeon results are not distinguishable except for the
fact that pigeons can peck faster than rats generally press.

Jeez, you're always throwing me curves
like this.

Curves? Jeez, Bill, we've gone over all of this before! Of course, it's
been a while . . .

Regards,

Bruce

[From Bruce Abbott (990510.1240 EST)]

Rick Marken (990510.0750) --

In the mean time, could you give a quick explanation of
why the "matching" phenomenon is such a big deal for
behaviorists?

There are a number of reasons. Here are a couple. First, it seems to apply
more broadly than first suspected; for example, it can characterize animals'
foraging in a "patchy" environment (one in which the stuff being foraged for
occurs in relatively dense patches separated by relatively large expanses of
little or none). Thus, whatever underlies matching, it appears to occur not
only in the laboratory, but under natural conditions as well, so it's not
just a laboratory curiosity. Second, the reasons why the matching
relation emerges (when it does emerge) are not understood, and several
competing explanations have been proposed and are being tested in various
ways. Any time you have a number of theoretical accounts vying to explain a
particular phenomenon, a lot of researchers usually get involved in
developing tests of those views and identifying what variables are involved
in the phenomenon and which are not. The present inability to arrive at
consensus as to the underlying mechanism(s) highlights the fact that there's
still a lot that cannot be readily understood from the perspective of
traditional reinforcement theory, and so any advance here would be viewed as
significant for the field in broadening, supplementing, or even replacing
current explanatory principles.

Regards,

Bruce

[From Dick Robertson,990510.1221CDT]
Bill Powers wrote:

[From Bill Powers (990509.1122 MDT)]

Dick Robertson,990509.0737CDT--

>I wasn't being sarcastic when I asked originally whether the matching
>law stated anything more profound than the common sense view that people
>(or presumably, rats) do what they get rewarded for rather than what
>they don't.

That's not the common sense I want to sell people on using. I want to say
that when people or animals lack something, they begin trying to get it by
looking first in one place and then in another, and this search only
_ceases_ when they find it. Then they start doing whatever is needed to get
it. It's not the reward that causes the behavior; it's the behavior that
causes the reward.

Best, Bill P.

I understand that. What I want to ask is whether this matching law says, and
has confirmation for, subjects (people or animals) choosing alternative options
for getting payoffs in proportion to the relative proportions of the payoff,
instead of simply devoting all their efforts to the option that has the best
payoff. I don't understand why it is so hard for me to get an answer to this
question. It seems a simple one to me. What am I missing?

Best, Dick R.

[From Bill Powers (990510.1141 MDT)]

Bruce Abbott (990510.1110 EST) --

Bruce Abbott (990510.still-too-early)

In pigeons there is a fixed length of time the
hopper is raised, during which the pigeon can access grain from the food
magazine. Typical values would be 3 or 4 s.

Woops. What happens if the pigeon is pecking key 2 at the moment that the
interval on key 1 runs out? Which key-press gets the reinforcing effect?

I'm puzzled as to why you are now confused about this. The two schedules
run independently on the separate keys.

My surprise came from finding that there is no way to say which of several
different actions is being reinforced when a reinforcement occurs. But your
subsequent explanation clears up the problem: the "cocking" of the schedule
is not detectable by the pigeon; only the "firing" is, when the pigeon
pecks a key that is ready to deliver a reinforcement.

You say:

On the VI schedules typically used, the pigeon isn't going to get enough to
eat over the course of a session to substantially change its hunger level.

Is this, then, a schedule on which the pigeon could not survive
indefinitely? If so, then I may have some reason to think of this as more
than a baseline experiment. When life-threatening conditions appear, I
would expect reorganization to start, and if I am right about
reorganization being essentially random, it could well be that the pigeons
simply cast about at random for some action that will provide more food.

The very complicated variations on the basic choice experiment may be an
instance of complex results being mainly a function of a complex
environment, with the actual organization of the pigeon being much less
complex than it is interpreted to be. While I believe devoutly in animal
intelligence, I don't believe they have very much of it.

Best,

Bill P.


[From Bill Powers (990510.1146 MDT)]

Dick Robertson,990510.1221CDT--

What I want to ask is whether this matching law says, and
has confirmation for, subjects (people or animals) choosing alternative
options for getting payoffs in proportion to the relative proportions of
the payoff, instead of simply devoting all their efforts to the option
that has the best payoff. I don't understand why it is so hard for me to
get an answer to this question. It seems a simple one to me. What am I
missing?

Yes, as I understand it that is the claim: the choice is supposedly
proportional to the payoff (sort of). You have to understand, however, that
the payoff is VERY SMALL, so the animal may well have reason to continue
searching for a better payoff by trying other keys. Pressing the key with
the higher payoff exclusively would still not produce subsistence-level
food intake. I suspect that with much higher payoffs, animals would tend
much more to pick the best key and ignore the other one. But that's just my
guess.

Hmm. Thinking out loud.

Apparent matching _could_ be explained by saying that animals basically
select the best key and use it, but as the total payoff declines, they
begin more and more often trying the other key. Actually, with concurrent
schedules I think that there is always a net gain from trying the other key
(this just occurred to me). An occasional press on the less productive key
does not substantially reduce the delivery rate from pressing the best key.
In fact, it need not reduce it at all unless the animal uses the other key
too often. The schedules are set up, according to Bruce, so that once the
interval runs out, the key is primed to produce a reinforcer on the next
peck, so anytime between the start of an interval and its termination, the
other key can be pecked without any penalty at all.

This means that pressing the best key exclusively is NEVER the best
strategy, under the schedules as defined. The animal should peck one key
for a short time, and if that doesn't produce a bit of food immediately, it
should peck the other key for a short time, and so on back and forth as
quickly as possible. This will produce what looks like a tendency to
matching, because the animal will give up on the less productive key
without receiving a reward sooner than on the more productive one.

Hmm again.

Hey, it should use the e. coli strategy! The "tumbles" here are switches
from one key to another. If there is no food, switch to the next choice; if
there is food, delay the switch to the next choice -- all the while pecking
away at a constant rate at whichever key is the current choice. That will
lead to delivering more pecks on the side that produces the most
reinforcement.

We can generalize: one of the "keys" could be "no key" -- i.e. going away
from the keys and pecking at different places in the cage. Of course the
yield there is zero, so there will be a quick switch to the next place, and
a return to pecking on keys. Since the keys do usually provide food on the
first peck after a long delay, the time spent on keys as opposed to away
from them will increase. It should be easy to set the parameters of an e.
coli model to get the best match of model behavior to the data.
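
A bare-bones version of this e. coli scheme, grafted onto the same sort
of concurrent VI-VI arrangement sketched earlier (the "patience" rule
and every constant here are placeholder guesses, not part of the data):

    import random

    def ecoli_session(mean_ints=(15.0, 45.0), session=3600.0):
        # Peck steadily at the current key; a run of unreinforced pecks
        # triggers a "tumble" (switch to the other key), while food
        # postpones it.  Schedules arm and hold as in a standard VI.
        next_setup = [random.expovariate(1.0 / m) for m in mean_ints]
        armed = [False, False]
        pecks, reinf = [0, 0], [0, 0]
        key, t, misses = 0, 0.0, 0
        patience = 5                      # unreinforced pecks tolerated per key
        while t < session:
            t += random.uniform(0.2, 0.6)
            for k in (0, 1):
                if not armed[k] and t >= next_setup[k]:
                    armed[k] = True
            pecks[key] += 1
            if armed[key]:
                reinf[key] += 1
                armed[key] = False
                next_setup[key] = t + random.expovariate(1.0 / mean_ints[key])
                t += 3.0                  # collection time at the hopper
                misses = 0                # food: postpone the next tumble
            else:
                misses += 1
                if misses >= patience:    # no food: tumble to the other key
                    key, misses = 1 - key, 0
        return pecks, reinf

    pecks, reinf = ecoli_session()
    print(pecks[0] / sum(pecks), reinf[0] / sum(reinf))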

I expect that by the time I get back from Boston, three people will have
this model working.

Best,

Bill P.

[From Bruce Abbott (990510.1410 EST)]

Bill Powers (990510.1141 MDT) <--- reconstructed

Bruce Abbott (990510.1110 EST)

On the VI schedules typically used, the pigeon isn't going to get enough to
eat over the course of a session to substantially change its hunger level.

Is this, then, a schedule on which the pigeon could not survive
indefinitely? If so, then I may have some reason to think of this as more
than a baseline experiment. When life-threatening conditions appear, I
would expect reorganization to start, and if I am right about
reorganization being essentially random, it could well be that the pigeons
simply cast about at random for some action that will provide more food.

If they could stay in the apparatus 24 h/day the pigeons could get enough to
eat on most tested values of the VI schedule; what limits satiation is that
the sessions typically are short (an hour or less).

Some years ago Bill Baum placed an apparatus in his attic that was
accessible to wild pigeons. The apparatus reinforced some natural response
(I don't now recall what) on concurrent VI VI schedules. Over time, and
despite the fact that data from several birds were being mixed (the
apparatus did not discriminate which pigeon was doing the responding),
relative rate of responding matched relative rate of reinforcement. These
birds were not food deprived in the sense of having food withheld from them
by the experimenter -- they presumably dropped in when they became hungry or
possibly because they preferred the seed mix being offered over what was
available naturally in the neighborhood. It's a little dangerous to
generalize these results to the performance of a single, food-deprived bird,
but they do seem to suggest that chronic error conditions are not crucial to
the result.

Regards,

Bruce