# TRT and PCT modeling

[From Bill Powers (941101.0740 MST)]

Last night's post, with the graphs, contains some errors in the final
section. For a FI schedule, the schedule curve follows the FR1 line to
the right, then turns _vertical_ (not horizontal) at a behavior rate
equal to the reciprocal of the interval. Also, for a VI (constant
probability) schedule, the curve bends vertically, not horizontally, and
the formula for the average curve is R = a(1 - exp(bB)). I believe that
is the right formula, with Herrnstein's formula being an approximation
to it. At least I know that a curve of that form fits the results of the
constant-probability VI simulation very closely.
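
A rough sketch of the two corrected curves, written as Python functions. The function names, the sample numbers, and the idealized `min` form of the FI curve are illustrative assumptions, not taken from the simulation. The sign of b is not stated above; the sketch assumes b < 0 so that R levels off at a as B grows.

```python
import math


def fi_reinforcement_rate(behavior_rate, interval):
    """Idealized FI curve: follows FR1 (R = B) until B reaches 1/interval,
    after which R stays capped at 1/interval however large B becomes."""
    return min(behavior_rate, 1.0 / interval)


def vi_reinforcement_rate(behavior_rate, a, b):
    """Average VI curve R = a(1 - exp(bB)); b < 0 is assumed here."""
    return a * (1.0 - math.exp(b * behavior_rate))


if __name__ == "__main__":
    # Hypothetical values: a 10-second interval, a = 0.1, b = -10.
    for B in (0.0, 0.05, 0.1, 0.5, 2.0):
        print(B,
              fi_reinforcement_rate(B, interval=10.0),
              vi_reinforcement_rate(B, a=0.1, b=-10.0))
```

Plotted with behavior rate on the vertical axis (an assumption about the graphs), the cap in the first function is what shows up as the vertical segment.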

Bruce Abbott (941031.1500 EST) --

> In Traditional Reinforcement Theory (I'll call it TRT), reinforcers not
> only alter the probability of behavior, they also serve to MAINTAIN it.

It's very hard for me to get past this -- pardon me -- animistic
language. You have to admit that this sounds as if the reinforcers, the
little bits of kibble or the little drops of water, have some ability to
initiate an effect on behavior (and now to maintain it), all by
themselves. "Alter" and "maintain" are transitive verbs, usually used
after the name of an agent capable of causing something. Help me out
with this.

> My e. coli model is intended to illustrate control over behavior by
> consequences as envisioned by TRT ...

Again, this puts consequences in the position of an independent agency
capable of controlling something, doesn't it? For me to take your
statement literally, I would have to imagine that the consequence can
sense the behavior, compare it with a desired behavior, and exert
effects on the organism until the actual behavior matches the desired
behavior. In fact, one organism can do this to another organism by
applying disturbances to a variable the other organism is controlling,
but I can't see how a mere consequence could do the same thing. A kibble
doesn't have the internal organization required to control anything,
does it?

I think we need to take a brief break from the E. coli modeling and
spend some time with the basic model of a control system. When we work
with an example that uses on-off comparison and probabilistic outputs, a
lot of the fine detail is hidden in the noise, leaving a lot of room for
vague interpretations. I'm going to work up a short program that will
illustrate the basic relationships in a generic control system, to give
us a better base for this discussion.
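
In the meantime, here is a minimal sketch of the kind of generic loop meant here. This is not the promised program. The integrating output function, the additive disturbance, and every name and number in it are assumptions chosen for illustration.

```python
def simulate_control(reference, disturbance, gain=50.0, dt=0.01, steps=2000):
    """Run a one-level control loop and return (perception, output).

    Assumed environment: the controlled quantity is output + disturbance,
    and the perceptual signal is a direct copy of that quantity.
    """
    output = 0.0
    perception = 0.0
    for _ in range(steps):
        perception = output + disturbance   # input function (feedback path)
        error = reference - perception      # comparator
        output += gain * error * dt         # integrating output function
    return perception, output


if __name__ == "__main__":
    p, o = simulate_control(reference=10.0, disturbance=-5.0)
    print("perception:", p, "output:", o)
```

With these settings the perception settles at the reference, and the output ends up at whatever value cancels the disturbance. The output is not specified anywhere in advance; only the reference is.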

> So, my TRT model is REALLY a cleverly disguised control system! That's
> hard to see, because there is no mention of a reference level, feedback
> loop, gain, output function, or perceptual signal. These components
> lurk hidden in the alien vocabulary of "reinforcers," "punishers,"
> "contingencies of reinforcement," "strengthening," and so on. This is
> NOT to say that TRT theorists recognize it as such (they usually
> don't), but I think we have made important progress here in
> understanding why these folks do not immediately see why TRT is
> wrong...it seems to work!

_Certain examples_ of TRT are organized like a control system, but by no
means all of them. But separating the wheat from the goats is not easy
until the basic control model is completely understood. Behind the few
examples of TRT that I've seen, there is a strong bias to see the
environment as the causal agency, and behavior as its effect. But rather
than arguing about that, let's look at the properties of a basic control
system and work from that model into the more complex models of operant
conditioning.

> In PCT, how is a control system learned? Trial and error? If so is
> not trial and error selection-by-consequences?

Let's set up the basic model; then I'll show, in a simple-minded way,
how the model can learn to set its parameters appropriately for its
environment and its own basic properties. Trial and error
(reorganization) in the PCT model is a control process of the E. coli
type.
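
To make "a control process of the E. coli type" concrete, here is a small sketch of a search that keeps changing two parameters in the current direction while the error is falling, and "tumbles" to a new random direction when it is not. It stands in for reorganization only loosely: the two parameters, the distance-based error, the step size, and the seed are all hypothetical. The target appears in the code only to compute the error; the search itself uses nothing but whether the error went up or down.

```python
import math
import random


def ecoli_search(start, target, step=0.05, n_steps=5000, seed=0):
    """E. coli-style search over two parameters; returns (params, final_error)."""
    rng = random.Random(seed)
    x, y = start
    angle = rng.uniform(0.0, 2.0 * math.pi)       # initial random direction
    prev_error = math.hypot(x - target[0], y - target[1])
    for _ in range(n_steps):
        # Keep changing the parameters in the current direction.
        x += step * math.cos(angle)
        y += step * math.sin(angle)
        error = math.hypot(x - target[0], y - target[1])
        if error >= prev_error:
            # Error is not decreasing: tumble to a new random direction.
            angle = rng.uniform(0.0, 2.0 * math.pi)
        prev_error = error
    return (x, y), prev_error


if __name__ == "__main__":
    params, err = ecoli_search(start=(10.0, 0.0), target=(0.0, 0.0))
    print("final parameters:", params, "final error:", err)
```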

-----------------------------------
In your "learning" model, the effect of the learning is to convert a
goal-seeking process into a random walk around the goal that never gets
to the goal except by chance. This program does not prevent the organism
from getting into regions of "lethal" concentration (as you mentioned);
it simply disrupts its control in a large region around the target
position.

You have the organism adjusting speed on the basis of the error signal
dNut < 0, but there is a second comparator the use of which is
contingent on the output of the first:

    if dNut <= 0 then
      begin
        if Speed > 0.3 then Speed := Speed - LearnRate;
        {else} Tumble(Angle);
      end
    else
      if Speed < 2.0 then Speed := Speed + LearnRate;

The statement

    if Speed > 0.3 then Speed := Speed - LearnRate;

implements a little control system which maintains speed at or below
0.3. The implication is that speed is sensed and compared with a
reference level of 0.3, and if there is a positive error, speed is
reduced in steps of "LearnRate". This is simply a one-way speed control
system.

There is another one-way speed control system which acts only if
dNut > 0. It maintains speed at or above 2 units.

I don't see any learning going on here. You have three control systems
related to each other in rather complex ways, but this is a fixed
structure with properties set by your program, and it doesn't alter its
properties at all. It simply behaves according to its design.
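
A small Python transcription of the quoted rule shows the point. The thresholds 0.3 and 2.0 come from the quoted code. The step size of 0.25 is a hypothetical stand-in for LearnRate, whose value isn't given here, and the tumble is reduced to a returned flag.

```python
def adjust_speed(speed, d_nut, learn_rate=0.25, low=0.3, high=2.0):
    """One pass of the quoted rule. Returns (new_speed, tumbled)."""
    if d_nut <= 0:
        if speed > low:
            speed -= learn_rate      # one-way system: push speed down toward 0.3
        return speed, True           # {else} is commented out, so it always tumbles
    if speed < high:
        speed += learn_rate          # one-way system: push speed up toward 2.0
    return speed, False


if __name__ == "__main__":
    speed = 1.0
    for _ in range(20):              # nutrient change stays non-positive
        speed, _tumbled = adjust_speed(speed, d_nut=-1.0)
    print("after falling nutrient:", speed)
    for _ in range(20):              # nutrient change stays positive
        speed, _tumbled = adjust_speed(speed, d_nut=+1.0)
    print("after rising nutrient:", speed)
```

The thresholds and the step size are fixed arguments. Nothing in the rule ever changes them, so only the speed varies, never the rule that sets it.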

I'll start working on that control-system program. It shouldn't take
long.
-----------------------------------------------------------------------
Tom Bourbon (941031.1413) --

> I have seen many creators of models say their models do not contain
> reference signals, when they do.

So have I. I saw a model by -- some famous writer of a book on
biological control systems -- who didn't believe in set points, but in
his equations for an organism's temperature regulating system there it
was, nestled in the middle of a page of equations: the number 37. He was
measuring temperatures in centigrade, of course.