Reinforcement and control models

[From Bill Powers (951216.1245 MST)]

Bruce Abbott (951215.2030 EST)--

     Just to tweak your interest a little, behavior on ratio schedules
     has a strong tendency to be two-valued: high rate or zero. A high
     rate is not infinite, of course, but the instability is just what
     the first equation predicts. Put THAT in your snipe and poke it!

According to your tentative findings last summer, repetitive pressing
behavior on ALL schedules may have a strong tendency to occur at a high
rate or not at all. The main research question before us is whether
animals _ever_ vary their rate of pressing as a function of schedule. It
seems perfectly possible that the main variable is the intervals of non-
pressing that occur during a run. Of course when pressing is computed
over long periods of time as total presses divided by total time, there
can be an appearance that there is a smooth relationship, but this gives
entirely the wrong picture -- as your analysis suggested quite strongly.

Your "interest-tweaking" factoid is misleading, because it refers not
just to behavior on a single schedule but also to the division of
behavior between alternate keys -- so-called "matching." If your prior
analysis holds up, it may be true of both situations, but there are
really two distinct situations that have to be handled.

During the "acquisition" phase of bar-pressing behavior, there are two
dimensions of control to be considered by the rat: _where_ to apply
actions, and _how much_ action to apply (there are others: what kind of
action and under what conditions to act, for example). The "where"
dimension involves moving around in the cage and finding places where
there is a maximum in the rate of reinforcement. Then, when the
coordinates of maximum reinforcement rate are found, the problem is
to vary the behavior in that one place to further increase the
reinforcement, if possible bringing the rate up to the reference level.

The "where" problem involves finding the location in space that goes
with the maximum yield of reinforcers. The organization of the control
system that does this must be like the organization we use for tuning a
radio (if it's systematic). A simple design is to monitor the rate of
change of reinforcement rate and reverse the direction of spatial
movement whenever the rate of change becomes negative. This will result
in an oscillatory solution where the location moves back and forth
across the peak yield. If the velocity of spatial movement of the target
position is proportional to the rate of change of reinforcement rate,
the oscillations can converge to a steady position. Many other designs
are possible, including an E. coli biased random walk. By observing how
animals zero in on the right location, we should be able to model the
strategy that is actually used.
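[Editor's sketch: the two designs described above can be simulated in a few
lines. The bell-shaped yield function and every parameter here are
illustrative assumptions, and the converging variant uses the spatial slope
of the yield as a stand-in for "rate of change of reinforcement rate."]

```python
import math

def yield_rate(x, peak=5.0, width=2.0):
    """Reinforcement rate as a bell-shaped function of position x (assumed)."""
    return math.exp(-((x - peak) / width) ** 2)

def reverse_on_decline(x0=3.0, step=0.5, n=40):
    """Reverse the direction of movement whenever the reinforcement rate
    starts to fall: the position oscillates back and forth across the peak."""
    x, direction, prev = x0, 1.0, yield_rate(x0)
    path = [x]
    for _ in range(n):
        x += direction * step
        r = yield_rate(x)
        if r < prev:              # rate of change went negative: turn around
            direction = -direction
        prev = r
        path.append(x)
    return path

def velocity_proportional(x0=3.0, gain=4.0, dt=0.1, n=200):
    """Move with velocity proportional to the local slope of the yield;
    the back-and-forth damps out and the position converges on the peak."""
    x, eps = x0, 1e-4
    path = [x]
    for _ in range(n):
        slope = (yield_rate(x + eps) - yield_rate(x - eps)) / (2 * eps)
        x += gain * slope * dt
        path.append(x)
    return path
```

The first design never settles (it hunts around the peak forever); the second
damps toward a steady position, just as the paragraph above predicts.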

The "how much" problem is confounded with the "where" problem when we
use only the apparent rate of behavior at a single location as the
criterion. A rise in the apparent rate of behavior can come from (1) an
approach of the position-control system to the right location with
constant amount of food-getting action, (2) a quantity-control system
that is increasing the amount of food-getting action (i.e., learning the
right pattern of actions to bring the reinforcement rate toward the
reference level) while the position of action remains in the right
place, or (3) any combination of the two. To separate the two dimensions
of control for modeling purposes, we have to keep track of both
variables: position of acting and amount of acting. These can be under
independent control if the controlled variables are reasonably
orthogonal. In systems analysis, the value of a variable and its first
derivative are treated as independent dimensions.

In an ideal experiment, we should be able to monitor control of "where"
and "how much." Imagine a cage with a lot of identical levers ranged
along one wall (I saw a paper in which an arrangement similar to this was
used, but don't have it at hand right now). One lever is the "right
place" and the other levers have no effect except to produce the same
click (but no food). After the rat has learned to press the right lever
for food, we can vary both the schedule and the assignment of the right
lever, and observe both "where" control and "how much" control.

The reinforcement and control models I described apply to the situation
where "where" is being maintained in the right position, and only "how
much" behavior is varied. They assume that rate of behavior varies with
the schedule, but if that is not the operative output variable, the
analysis can be done using whatever the correct output variable is. The
actual output variable might be a combination: a crude adjustment of
press/don't-press combined with a fine adjustment of interval between
bursts of pressing. A single control-system model can be made to behave
this way (OPCOND5 will do this if the output gain is set high and the
error limit is set low).
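[Editor's sketch: a minimal stand-in for the point about OPCOND5. This is
not the OPCOND5 code; the dead-zone reading of "error limit" and every
parameter value are assumptions. With a small error dead zone and a high,
saturating output gain, the instantaneous press rate is two-valued, full
bursts or nothing, even though the run-averaged rate looks intermediate.]

```python
def run(gain=1000.0, error_limit=0.05, max_rate=10.0,
        ref=1.0, yield_per_press=0.05, decay=0.2, steps=200):
    perceived = 0.0                      # perceived reinforcement rate
    rates = []
    for _ in range(steps):
        error = ref - perceived
        if error < error_limit:          # dead zone: no pressing at all
            rate = 0.0
        else:                            # high gain saturates at max_rate
            rate = min(gain * error, max_rate)
        rates.append(rate)
        # pressing raises the perceived rate, which also decays over time
        perceived += yield_per_press * rate - decay * perceived
    return rates

rates = run()
# every instantaneous rate is either 0 or max_rate, yet the average
# over the whole run sits somewhere in between
```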

Finally, the critical test of the reinforcement model and the control-
system model involves applying a disturbance to the reinforcement rate.
In the reinforcement model, this should _increase_ behavior, or if
limits are used, leave it the same. In the control model, it should
_decrease_ behavior.
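[Editor's sketch of the proposed critical test under a simple proportional
control model; all names and parameters are assumptions. The disturbance
adds "free" reinforcement directly to the perceived reinforcement rate.
The control model predicts pressing falls; a reinforcement model would
predict it rises or, with limits, stays the same.]

```python
def steady_press_rate(disturbance=0.0, gain=5.0, ref=1.0,
                      yield_per_press=0.05, decay=0.2, steps=500):
    """Run the loop to steady state and return the final press rate."""
    perceived, rate = 0.0, 0.0
    for _ in range(steps):
        error = ref - perceived
        rate = max(gain * error, 0.0)    # press rate cannot go negative
        perceived += (yield_per_press * rate + disturbance
                      - decay * perceived)
    return rate

undisturbed = steady_press_rate(0.0)
disturbed = steady_press_rate(0.1)       # free reinforcement added
# under the control model, disturbed < undisturbed
```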

Have fun in the sun. This should give you something to think about while
you roast.

-----------------------------------------------------------------------
Best,

Bill P.

[From Bruce Abbott (951216.2140 EST)]

Bill Powers (951216.1245 MST) --

    Bruce Abbott (951215.2030 EST)

    Just to tweak your interest a little, behavior on ratio schedules
    has a strong tendency to be two-valued: high rate or zero. A high
    rate is not infinite, of course, but the instability is just what
    the first equation predicts. Put THAT in your snipe and poke it!

    According to your tentative findings last summer, repetitive pressing
    behavior on ALL schedules may have a strong tendency to occur at a
    high rate or not at all. The main research question before us is
    whether animals _ever_ vary their rate of pressing as a function of
    schedule. It seems perfectly possible that the main variable is the
    intervals of non-pressing that occur during a run. Of course when
    pressing is computed over long periods of time as total presses
    divided by total time, there can be an appearance that there is a
    smooth relationship, but this gives entirely the wrong picture -- as
    your analysis suggested quite strongly.

If I recall correctly, we only investigated fixed ratio schedules, so I'm
not sure whether it would be appropriate as yet to infer that such
tendencies would be characteristic of ALL schedules. The pattern of
behavior that tends to develop on fixed ratio schedules is a high-rate,
continuous "knocking off" of the response requirement (ratio "run"),
followed by a zero-rate "post-reinforcement pause" whose length tends to be
proportional to the size of the ratio.

    Your "interest-tweaking" factoid is misleading, because it refers
    not just to behavior on a single schedule but also to the division
    of behavior between alternate keys -- so-called "matching." If your
    prior analysis holds up, it may be true of both situations, but
    there are really two distinct situations that have to be handled.

I don't see where it is misleading; it's just an observation of how behavior
rate varies in fixed ratio schedules. And I'm not sure how you see that
prior analysis relating to that observation.

    During the "acquisition" phase of bar-pressing behavior, there are
    two dimensions of control to be considered by the rat: _where_ to
    apply actions, and _how much_ action to apply (there are others:
    what kind of action and under what conditions to act, for example).
    The "where" dimension involves moving around in the cage and finding
    places where there is a maximum in the rate of reinforcement. Then,
    when the coordinates of maximum reinforcement rate are found, the
    problem is to vary the behavior in that one place to further
    increase the reinforcement, if possible bringing the rate up to the
    reference level.

Perhaps more generally, the problems for the rat are _what_ to do, and how
fast to do it. (Generally speaking, the what will include the where: e.g.,
press the lever _here_.) Initially the rat must learn both; when the
contingency is broken, and later reestablished, the only real question for
the rat is whether to do it or not.

    The "where" problem involves finding the location in space that goes
    with the maximum yield of reinforcers. The organization of the
    control system that does this must be like the organization we use
    for tuning a radio (if it's systematic). A simple design is to
    monitor the rate of change of reinforcement rate and reverse the
    direction of spatial movement whenever the rate of change becomes
    negative. This will result in an oscillatory solution where the
    location moves back and forth across the peak yield. If the velocity
    of spatial movement of the target position is proportional to the
    rate of change of reinforcement rate, the oscillations can converge
    to a steady position. Many other designs are possible, including an
    E. coli biased random walk. By observing how animals zero in on the
    right location, we should be able to model the strategy that is
    actually used.

Your first proposal is a "hill-climbing" strategy and is equivalent to what
we do in modeling to find the parameter that yields the best fit. Your
point about needing a model that will find the optimum location is well
taken. I've read over the rest of the post and agree that we will have to
consider -- and gather data on -- not only the "intensity" of behavior but
where it is being directed.
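[Editor's sketch of the parameter-fitting analogy Bruce draws: the same
hill-climbing idea, applied to a model parameter. Step the parameter, keep
the step if the fit improves, reverse and halve the step if it worsens.
The quadratic loss and all numbers are assumptions for illustration.]

```python
def fit_parameter(loss, p0=0.0, step=1.0, n=60):
    """Minimize loss(p) by reversing and halving the step on failure."""
    p = p0
    best = loss(p)
    for _ in range(n):
        trial = loss(p + step)
        if trial < best:          # improvement: keep moving this way
            p += step
            best = trial
        else:                     # worse: turn around and take finer steps
            step = -step / 2
    return p

# example: recover the best-fitting value for a toy quadratic loss
best_p = fit_parameter(lambda p: (p - 3.2) ** 2)
```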

    Finally, the critical test of the reinforcement model and the
    control-system model involves applying a disturbance to the
    reinforcement rate. In the reinforcement model, this should
    _increase_ behavior, or if limits are used, leave it the same. In
    the control model, it should _decrease_ behavior.

That depends on whether or not _rate_ of reinforcement turns out to be a
controlled variable, but if not, I suspect there will be other CVs we can
disturb to provide this kind of test.

    Have fun in the sun. This should give you something to think about
    while you roast.

I don't know about "roasting" (it's been running in the 60s lately around
St. Petersburg), but it sure will be a welcome relief from THIS weather
(freezing rain came in last Thursday). But yes, I'll think about it and see
where it leads.

Happy holidays to everyone!

Regards,

Bruce