[From Rick Marken (941101.1100)]
Bruce Abbott (941031.1500 EST) claims that reinforcement theory (TRT) is
control theory:
In its crude, inept, and misguided way, TRT IS a model of a control system,
as I think my e. coli simulation demonstrates. But by failing to recognize
the reference, by ignoring in purely verbal descriptions the role of
feedback, and by mistakenly placing the emphasis on the disturbance-behavior
relationship, TRT commits a set of fundamental errors.
These are fundamental errors, indeed. But what is the reference a reference
for? Why does it matter if feedback is ignored? And why is it a mistake to
place emphasis on disturbance-behavior relationships?
The answer to all these questions is: because a control system's behavior is
organized around the control of its own perceptual input. What is
important about your model is that it _controls_ a perceptual variable,
dNut (the same perceptual variable is also controlled in your new version of
the model, the one with varying movement rate). Since dNut is a consequence of
the varying outputs of the system, the system acts to control the
consequences of its actions. The system selects the consequence it wants to
perceive (this is determined by the reference setting for dNut) and acts to
keep the actual consequences of its actions near this reference setting. So
control is appropriately described as _selection OF consequences_.
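
A minimal sketch of what "selection OF consequences" looks like in code may help here. It is not Bruce's ECOLI program; it assumes a hypothetical world in which the nutrient source sits at the origin, concentration is simply minus the distance to the source, and the organism moves one unit per step. The names run_ecoli, ref and d_nut are illustrative only. The system perceives dNut, compares it with the reference, and tumbles whenever the perception falls below the reference.

```python
import math
import random

def run_ecoli(ref=0.0, steps=2000, start=(100.0, 0.0), seed=1):
    """Hypothetical E. coli-style controller; returns final distance to source.

    Assumptions (not from the original posts): source at the origin,
    concentration = -distance, speed = 1 unit per step.
    """
    rng = random.Random(seed)
    x, y = start
    heading = rng.uniform(0.0, 2.0 * math.pi)
    nut_prev = -math.hypot(x, y)
    for _ in range(steps):
        x += math.cos(heading)
        y += math.sin(heading)
        nut = -math.hypot(x, y)
        d_nut = nut - nut_prev      # perceived consequence of the last move
        nut_prev = nut
        error = ref - d_nut         # reference minus perception
        if error > 0:               # perception below reference: tumble
            heading = rng.uniform(0.0, 2.0 * math.pi)
    return math.hypot(x, y)
```

Calling run_ecoli() with the default reference of 0 returns the distance from the source after 2000 steps. The only thing that says which value of dNut the system "wants" is ref; the environment supplies no such specification in this sketch.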
There is no selection BY consequences in your control model. If you had read
Bill Powers' (941031.0700 MST) post more carefully, you would have seen why
this is true -- graphically. Here's the important graph:
[Graph from Bill Powers' post: tumble rate (vertical axis) plotted against
dNut (horizontal axis, running from -dNut through 0 to +dNut), with the
reference signal setting marked on the dNut axis.]
Whether the organism tumbles or not (tumble rate) depends on the setting of
the reference signal; it does not depend on the consequence of actions, dNut.
You might be able to see this more clearly if you make the reference setting
in your model a random variable that is changing relatively slowly over time.
If you keep track of the probability of tumbling (or of continuing in the
same direction) as a function of dNut, I think you will find that, in the
long run, this probability will approach 0.5 for all values of dNut. In
other words, no consequence of action is inherently reinforcing or punishing;
no value of dNut "selects" tumbling or continuing in the same direction. The
behavior of E. coli is not selected BY its consequences; the model selects
(via the reference signal setting) the consequences it wants to perceive and
it makes those consequences happen.
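
The bookkeeping for this experiment could be sketched as follows. This is only an illustration under stated assumptions, not Bruce's model: the source is at the origin, concentration is minus the distance, speed is one unit per step (so dNut lies between -1 and +1), and the reference follows a slow bounded random walk whose step size, ref_step, is an arbitrary choice. The function tumble_table and its binning are hypothetical. The sketch just tabulates how often a tumble follows each range of dNut; no outcome is asserted here, since the numbers will depend on how the reference is made to vary.

```python
import math
import random

def tumble_table(steps=200_000, ref_step=0.01, n_bins=4, seed=1):
    """Tabulate tumbles per dNut bin while the reference drifts slowly.

    Returns one (tumbles, observations, proportion) tuple per bin, with
    bins dividing the dNut range [-1, 1] into n_bins equal parts.
    """
    rng = random.Random(seed)
    x, y = 100.0, 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    nut_prev = -math.hypot(x, y)
    ref = 0.0
    counts = [[0, 0] for _ in range(n_bins)]  # [tumbles, observations]
    for _ in range(steps):
        # Slow, bounded random walk of the reference setting (assumption).
        ref = max(-1.0, min(1.0, ref + rng.uniform(-ref_step, ref_step)))
        x += math.cos(heading)
        y += math.sin(heading)
        nut = -math.hypot(x, y)
        d_nut = nut - nut_prev
        nut_prev = nut
        tumbled = d_nut < ref       # tumble when perception is below reference
        if tumbled:
            heading = rng.uniform(0.0, 2.0 * math.pi)
        b = min(n_bins - 1, max(0, int((d_nut + 1.0) / 2.0 * n_bins)))
        counts[b][0] += int(tumbled)
        counts[b][1] += 1
    return [(t, n, t / n if n else float("nan")) for t, n in counts]
```

Printing the rows of tumble_table() gives, for each dNut bin, the number of tumbles, the number of observations, and their ratio.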
Thus, when you say that the contribution of control theory to TRT is explicit
incorporation of a reference setting, you are missing a very important point;
you are missing the point that this reference setting is a specification for
a perceived consequence of action. You say, for example, that:
It [missing reference setting] is why my poor e. coli will continue moving
up the concentration gradient even if the concentration becomes high enough
to kill it. But THIS is where TRT fails, not in selection-by-consequences.
But this statement misses the main point of the control theory model of
behavior, which is that behavior is the _control OF perception_. What is
perceived (dNut in this case) is (at least in part) a consequence of action.
Actions (tumbles or non-tumbles) produce consequences that are close to the
selected consequence. This is control and it clearly involves selection OF
consequences.
If behavior were actually selected BY consequences, then the organism would
not have control of the consequences of its actions; the organism would not
be in control. Moreover, by maintaining their belief that behavior
is selected BY its consequences, reinforcement theorists are able to ignore
the central question about the behavior of a control system, viz. WHAT
CONSEQUENCE(s) (PERCEPTUAL VARIABLE(S)) IS THE ORGANISM CONTROLLING?
Regarding your model I said:
So, is this the way reinforcement works, Bruce? And, if so, why aren't EAB
researchers talking more about reinforcement as controlled perceptual input?
And you reply:
Rick, our posts must have crossed paths, so see my previous one (941031.1500
EST) for my "reply." (Hey, I seem to be developing ESP! I am now answering
your questions BEFORE you ask them!)
I don't mind if you answer my questions before or after I ask them as long as
you answer them;-) In this case, you didn't answer my main question, viz. why
aren't EAB researchers talking more about reinforcement as controlled
perceptual input? After all, you claim that reinforcement theory IS control
theory, but with different names for the components of control:
my TRT model is REALLY a cleverly disguised control system! That's hard to
see, because there is no mention of a reference level, feedback loop, gain,
output function, or perceptual signal. These components lurk hidden in the
alien vocabulary of "reinforcers," "punishers," "contingencies of
reinforcement," "strengthening," and so on.
Ok, so reinforcement theory is control theory. Then why aren't EAB
researchers trying to determine the perceptual inputs that organisms actually
control?
By the way, have you run ECOLI?
I haven't run your particular implementation (because I still only have Turbo
Pascal for the Mac) but I have run many versions of the model that are very
much like yours; after all, it is a control model. Bill and I wanted to know
how to build a reinforcement model. If the control model IS a reinforcement
model then, mazel tov; we've already tested the reinforcement model and
found that it works. But now we have the problem of figuring out why TRTists
don't know HOW their model works (it controls perception) even though
they've been working with it for years. I find this very puzzling -- it makes
me think that, perhaps, the reinforcement model is NOT really a control model.
Anyway, I'll write up your model in QuickBasic and try to get some of the
data I suggested you get (like the probability of tumbling vs continuing as a
function of dNut). Is there any data that would convince you that the
behavior of a control system (like E. coli) is not selected by its
consequences? I would be convinced that the tumbling and continuing behavior
of E. coli is selected by its consequences if these consequences did what was
necessary to keep behavior in some state even if that behavior were
disturbed. Bill Powers (941030.1845 MST) actually tested this already:
I added a disturbance to the output [of Bruce's model], by creating
random tumbles independently of the value of dNut. If the consequence
were actually controlling the tumbles, disturbing the tumbles should
result in a change of dNut that would oppose the change in tumbling, and
restore it to (or toward) the undisturbed state.
As far as I can see, that doesn't happen. When you put in the disturbance,
the number of tumbles needed to get near the goal goes way up (a factor of
10 with the numbers I used), and dNut increases much more slowly.
So it looks like consequences are not selecting behavior. But maybe you have
a way to show that behavior is selected BY its consequences. I look forward
to seeing what it is.
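
A test of this kind can be sketched roughly as follows. This is not Bill's code or Bruce's model; it assumes the same hypothetical world as before (source at the origin, concentration equal to minus the distance, unit speed), and the names run_to_goal, p_disturb and goal_radius are made up for illustration. With probability p_disturb on each step a tumble is forced regardless of dNut, and the function reports how many steps and tumbles it takes to get near the source.

```python
import math
import random

def run_to_goal(p_disturb=0.0, goal_radius=5.0, max_steps=100_000, seed=1):
    """Return (steps, tumbles) needed to get within goal_radius of the
    source, or None if max_steps is exhausted first."""
    rng = random.Random(seed)
    x, y = 100.0, 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    nut_prev = -math.hypot(x, y)
    tumbles = 0
    for step in range(1, max_steps + 1):
        x += math.cos(heading)
        y += math.sin(heading)
        dist = math.hypot(x, y)
        if dist <= goal_radius:
            return step, tumbles
        d_nut = -dist - nut_prev
        nut_prev = -dist
        # Tumble because dNut is below the reference (0 here), or because
        # of a random disturbance that is independent of dNut.
        if d_nut < 0.0 or rng.random() < p_disturb:
            heading = rng.uniform(0.0, 2.0 * math.pi)
            tumbles += 1
    return None
```

Comparing run_to_goal(p_disturb=0.0) with run_to_goal(p_disturb=0.5) over several seeds shows how the added tumbles change the step and tumble counts in this particular sketch; the specific numbers Bill reports come from his own runs and are not reproduced by it.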
You say:
Are we reaching some kind of consensus here about the nature of traditional
reinforcement theory?
I don't think so. You seem to be arguing that reinforcement theory is
the same as control theory except that it doesn't include an explicit
reference signal. I think that there are far more fundamental differences
than that. If, however, reinforcement theory really IS control theory, then
I argue that reinforcement theory is about selection OF consequences; that
is, it is about the control OF perception. In that case, reinforcement theory
has missed the boat, big time, because I can find absolutely no evidence that
reinforcement theorists have dedicated even a fraction of their time to
determining the variables that organisms control.
Do you see why TRTists have not been especially nonplussed by your e. coli
simulation?
Actually, I have known for quite some time why TRTists have not been
especially nonplussed by my E. coli simulation. It is because TRTists are
control systems (like everyone else) who are controlling for (among
other things) being right rather than for learning about how control systems
work. TRTists will say anything to make it seem like they already know all
about how behavior works; they will do whatever is necessary in order to
protect their preconceptions from the effects of a disturbance (like the
notion of control of perception). TRTists would be a lot easier to convince
if their behavior were controlled by consequences; unfortunately, they are in
control of the consequences of their actions, and in very good control of
those consequences, at that.
Bruce Abbott (941031.1500 EST) --
Oh, and one more thing: I'd be interested to hear from Sam Saunders. Sam, do
my descriptions of TRT agree with your understanding of it?
I'd be interested, too. I've heard at least 3 different explanations of how
reinforcement theory accounts for the E. coli effect (from several different
"top notch" reinforcement theorists who reviewed my E. coli papers). One or
two described what is essentially your control model but most described
something else... curiouser and curiouser.
Best
Rick