selection by consequences; E. coli; S-R dead?

William_T_Powers2 · October 28, 1994, 12:25am

[From Bill Powers (941027.1650 MDT)]

[Bruce Abbott (941027.1000 EST)]

I hope everyone appreciates that arguments about similarities between
PCT and EAB will become less and less interesting as we start running
models and doing experiments. Nevertheless there are some points that
deserve to be aired a bit more.

> I DID say that consequences select behavior (although not just any
> behavior, but behavior that reduces the error signal) and I concede
> that this was an unfortunate lapse. However, if you substitute Bill
> Powers's reformulation, my point survives, which is that in both views
> learning involves selection by consequences (even if they differ as to
> what is selected).

In a closed-loop system it's difficult to attribute causation to any one
stage in the process. Behavior does have consequences that we can
observe (delivery of reinforcers), but what makes these consequences
important is what follows upon them: a change in the internal state of
the organism. If we're talking about real learning, the states in question
are intrinsic ones. And we can't stop there, because the effect
of a change in an intrinsic variable on learning depends strongly on
what the internal reference setting for that variable is.

I would guess that an animal on a salt-deficient diet would learn to
press a lever to obtain salt in its food. So it would appear that the
consequence of obtaining salt selects for the behavior that causes salt
to be delivered. But if the amount of salt in the food is continually
increased by the behavior, there would come a time when delivery of more
salt would select for LESS salt-producing behavior, not more.

Reinforcement theory would thus attribute to the salt reinforcer the
power to select either more or less salt-producing behavior. The only
way out of this contradiction is to introduce a new concept, satiation.
Satiation reduces or even reverses the selective power of salt as a
reinforcer. But without the control-system model, satiation is an ad hoc
hypothesis introduced for the sole purpose of correcting a defect in
reinforcement theory.
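A minimal sketch of the point in Python, assuming a one-way control system with made-up parameters (`gain`, `salt_per_press`, `excretion` and the functions `press_rate` and `run` are hypothetical, not measured values or anyone's published model). Pressing is driven by the difference between a reference salt level and the current level, so the same salt deliveries that drive pressing up at first drive it down as the level approaches the reference, and pressing stops if salt is pushed above the reference. No separate satiation term appears; the reversal falls out of the comparison.

```python
def press_rate(salt_level, reference, gain=2.0):
    """Lever presses per unit time from a one-way (salt-seeking) control system."""
    error = reference - salt_level      # comparator: reference minus current salt level
    return gain * max(0.0, error)       # pressing stops once salt meets or exceeds the reference


def run(reference=10.0, salt_per_press=0.5, excretion=0.05,
        start_salt=0.0, steps=400, dt=0.05):
    """Euler-integrate the salt level: presses add salt, excretion removes a fixed fraction."""
    salt = start_salt
    rates = []
    for _ in range(steps):
        rate = press_rate(salt, reference)
        rates.append(rate)
        salt += (rate * salt_per_press - excretion * salt) * dt
    return salt, rates


final_salt, rates = run()
print(f"first rate {rates[0]:.2f}, last rate {rates[-1]:.2f}, salt {final_salt:.2f}")
# Pushing salt above the reference (say, salt added by the experimenter) stops pressing.
print(press_rate(12.0, 10.0))  # prints 0.0
```

With these numbers the press rate starts at 20 and settles near 0.95 as the salt level approaches 10/1.05 (about 9.52), where salt added by pressing just balances excretion.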

You can see the problem I have with this. Salt, which is simply NaCl, is
spoken of as if it had some special properties other than its physical
or chemical ones. The language suggests that it is the delivered salt
that does the selecting, that causes behaviors of a certain kind to be
selected. This gives to salt powers that I do not think it has. This is
the sort of thing Rick was talking about when he mentioned "animism."

The PCT model of reorganization would say that behavior reorganizes
until it raises the salt input to a certain level because the organism
contains an inner specification for how much salt it is to ingest (or
better, what the electrolyte balance in body fluids is to be). The
selection is governed by a comparator, which compares the current level
of salinity with a reference level and turns on reorganization if there
is a significant difference, either higher or lower. So in a fundamental
sense, it is the organism, not the reinforcer, that does the selecting.
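The comparator's role can be sketched as follows. This is an illustration under stated assumptions, not Powers's implementation: the mapping from two behavioral parameters to salinity (`salinity`) is invented, and the rule for changing parameters borrows the E. coli idea discussed later in this message (keep drifting in a random direction while the error shrinks, pick a new random direction when it does not). The comparator alone decides whether reorganization runs, for errors in either direction; nothing about salt itself enters the rule.

```python
import math
import random


def salinity(params):
    """Hypothetical environment: two behavioral parameters jointly set body salinity."""
    return params[0] + 0.5 * params[1]


def reorganize(params, reference=2.0, tolerance=0.1, step=0.05,
               max_steps=1_000_000, seed=1):
    """Comparator-gated, E. coli-style reorganization (an illustrative sketch).

    While |reference - salinity| exceeds `tolerance`, the parameters drift a fixed
    distance per step in a random direction; a new random direction is drawn
    whenever a step fails to shrink the error.
    """
    rng = random.Random(seed)
    params = list(params)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    error = reference - salinity(params)
    steps = 0
    # The comparator switches reorganization on whenever the error is significant,
    # whether salinity is too low or too high.
    while abs(error) > tolerance and steps < max_steps:
        params[0] += step * math.cos(angle)
        params[1] += step * math.sin(angle)
        new_error = reference - salinity(params)
        if abs(new_error) >= abs(error):
            angle = rng.uniform(0.0, 2.0 * math.pi)  # no improvement: "tumble"
        error = new_error
        steps += 1
    return params, error, steps


params, error, steps = reorganize([0.0, 0.0])
print(f"salinity {salinity(params):.3f} after {steps} steps")
```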


--------------------
Let's look at Rick's E. coli experiment.

Please correct me if I'm wrong, but my impression of the concept of
reinforcement is that _past_ reinforcements are supposed to select for
_future_ behavior that will tend to produce the reinforcers. That is, if
a certain behavior has produced reinforcements in the past (up to the
previous instance of behavior and reinforcement), that behavior will
tend to become more probable as the next behavior to be emitted. Natural
variations in behavior ensure that the organism will keep producing more
effective behaviors as long as any are possible.
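For comparison, here is a hypothetical sketch of the reinforcement idea as described above, using a linear-operator update in the style of Bush and Mosteller (the two behaviors, the learning rate `alpha`, the function `train`, and the schedule are all assumptions). The probability of a behavior rises only after that behavior has been emitted and reinforced, so all selection runs from past consequences to future probabilities.

```python
import random


def train(trials=200, alpha=0.1, seed=0):
    """Two-behavior law-of-effect learner using a linear-operator update (hypothetical sketch).

    "press" is always followed by a reinforcer and "other" never is. After a reinforced
    press, the probability of pressing moves a fraction `alpha` of the way toward 1.
    """
    rng = random.Random(seed)
    p_press = 0.5
    history = [p_press]
    for _ in range(trials):
        pressed = rng.random() < p_press        # emit one behavior at its current probability
        if pressed:                             # assumed schedule: every press is reinforced
            p_press += alpha * (1.0 - p_press)  # past consequence raises future probability
        history.append(p_press)
    return history


history = train()
print(f"p(press): start {history[0]:.2f}, after 200 trials {history[-1]:.3f}")
```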

The point of Rick's experiment was to demonstrate a control system that
uses a random output in such a way that the past consequences of
behavior can't possibly select for a "best" next behavior. There is no
"best" next behavior, because the only behavior possible is a random
tumble after some interval of swimming in a straight line. Moreover, the
time-delay used prior to a previous tumble can't be used to indicate a
"best" length of delay during the _current_ episode of swimming, before
the next tumble, no matter how much experience the animal has, or how
long its memory. No matter what the delay times between tumbles, the
next direction of swimming will be chosen randomly, with equal
probability of being favorable or unfavorable. Information from the past
can have no effect on the outcome; this is a memoryless system. I should
add that one can even artificially bias the randomness of the tumbles
_against_ moving in the right direction, and the E. coli process will
still select strongly for progress in the right direction.
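The memoryless character of the tumble can be checked with a small sketch (the names `tumble` and `fixed_interval_swim` and all parameters are hypothetical; a linear gradient along +x and unit swimming speed are assumed). Here the cell tumbles on a fixed timer and senses nothing, so the random directions carry no information: net displacement stays at random-walk scale, and biasing tumbles against favorable directions makes the drift negative.

```python
import math
import random


def tumble(rng, bias_against=0.0):
    """New random heading in radians; the gradient points along +x.

    With probability `bias_against`, a heading with a positive x-component is
    reversed, so favorable headings become less likely than unfavorable ones.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    if math.cos(theta) > 0 and rng.random() < bias_against:
        theta += math.pi
    return theta


def fixed_interval_swim(total_time=40_000.0, interval=5.0, bias_against=0.0, seed=0):
    """Tumble on a fixed timer with no sensing; return net displacement up the gradient."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    while t < total_time:
        theta = tumble(rng, bias_against)
        x += interval * math.cos(theta)   # unit speed for `interval` time units
        t += interval
    return x


print(fixed_interval_swim())                   # random-walk scale, no systematic sign
print(fixed_interval_swim(bias_against=0.3))   # systematically negative
```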

The only factor that can be used to bias the tumbles in the right
direction is the _current_ effect of the _current_ swimming direction on
the _current_ setting of the limit at which the timer will trigger a
tumble. This is the only mechanism I have found that will create the E.
coli effect. There is simply no way to use information about previous
behaviors and previous consequences that will create the same biasing of
the random process, or any biasing at all.
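A sketch of the mechanism described above, with hypothetical parameters (`base`, `gain`, `bias_against`) and an assumed geometry (concentration rising with unit slope along +x, unit swimming speed). The only thing that biases the outcome is that the timer limit for the current run depends on the rate of change sensed during that run. With `gain=0` the timer ignores the current input and the drift disappears; with `gain>0` the cell climbs the gradient, even when `bias_against` makes favorable directions less likely than unfavorable ones.

```python
import math
import random


def ecoli(total_time=40_000.0, base=5.0, gain=4.0, bias_against=0.0, seed=0):
    """E. coli-style control with a random output (a sketch with hypothetical parameters).

    The sensed rate of change is cos(theta). The tumble timer's limit is set by the
    *current* rate: base + gain * rate. Direction is constant between tumbles, so the
    limit is computed once per run. Nothing from earlier runs is stored.
    """
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    while t < total_time:
        theta = rng.uniform(0.0, 2.0 * math.pi)        # tumble: new random direction
        if math.cos(theta) > 0 and rng.random() < bias_against:
            theta += math.pi                            # optional bias against good directions
        rate = math.cos(theta)                          # current dC/dt for the current direction
        limit = max(0.1, base + gain * rate)            # current timer limit; floor is a guard
        x += limit * rate                               # distance gained before the next tumble
        t += limit
    return x


print(ecoli())                   # steady climb
print(ecoli(gain=0.0))           # no current feedback: random walk
print(ecoli(bias_against=0.3))   # still climbs despite biased tumbles
```

With `base=5` and `gain=4`, the expected climb for unbiased tumbles is about 0.4 distance units per time unit: runs last 5 + 4 cos(theta) time units, so the mean displacement per run is 4 E[cos^2(theta)] = 2 against a mean run time of 5.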

I've forgotten what strategy I used in the "tumble" model to represent
reinforcement, but I do remember one I tried at some time. The rule was
that if a previous increase in the delay time resulted in a favorable
direction of swimming after the tumble, the next delay time should be
increased, too -- and the opposite for an unfavorable consequence. This
produces a pure random walk with no tendency to swim upstream or
downstream. So did a strategy that involved averaging over several
previous episodes of delay-time and tumbling. And so did all the other
strategies I tried -- at least four or five others -- that fit the basic
concept of reinforcement as I understand it.
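One hypothetical reading of the delay rule described above, as a sketch (the step size, the bounds, the function name `reinforced_delay`, and the choice to apply each adjusted delay to the run that follows the tumble are assumptions). A favorable direction after a tumble repeats the last change in delay, an unfavorable one reverses it. Because each change is keyed to the sign of the change before it, which is itself a coin flip in the long run, longer runs are not systematically paired with favorable directions.

```python
import math
import random


def reinforced_delay(total_time=40_000.0, start=5.0, step=1.0, lo=1.0, hi=9.0, seed=0):
    """Delay-time reinforcement rule (one hypothetical reading), in a gradient along +x.

    After each tumble, a favorable new direction "reinforces" the last change in
    delay (repeat it); an unfavorable one "punishes" it (reverse it).
    Returns net displacement up the gradient at unit speed.
    """
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    delay = start
    last_change = step
    while t < total_time:
        theta = rng.uniform(0.0, 2.0 * math.pi)               # tumble
        favorable = math.cos(theta) > 0                        # consequence of the last change
        change = last_change if favorable else -last_change    # repeat or reverse it
        delay = min(hi, max(lo, delay + change))               # keep delay within bounds
        last_change = change
        x += delay * math.cos(theta)
        t += delay
    return x


print(reinforced_delay())
```

Over 40,000 time units (about 40,000 distance units of swimming at unit speed), net progress stays small relative to the total path, as expected of a random walk.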

Since you're a modeler, and also know authoritatively how reinforcement
is supposed to select behavior, perhaps you can come up with a
reinforcement simulation that will produce the E. coli effect. Perhaps
both Rick and I failed because we didn't find the right model. If you
could find such a model, that would be an important result, because it
would disprove what Rick and I thought the E. coli model proved: the
production of a systematic effect without any reinforcement as (we
thought) reinforcement is conceived. Demonstration of a successful
method of reinforcement using a random output would give us a second
choice for how the reorganizing system is supposed to work. Either way,
we should know the truth about this matter.
---------------------------------
RE: S-R thinking.

The problem here is that we are finding fault with an aspect of S-R
theory that is not considered central by conventional psychologists.
When psychologists began repudiating S-R theory, what they repudiated
was a particular formulation of it. But they did not repudiate the idea
that behavior is the end of a causal chain that begins somewhere else.

PCT shows that behavior, the actions of an organism, will vary with
every disturbance. Yet the result is to create a consistent outcome. In
operant-conditioning experiments, actions are clearly distinguished from
outcomes: the action is the behavior of pressing a bar, and the outcome
is the rate of delivery of reinforcers. Because of this clear
separation, it is possible to show that schedules have reliable effects
on behaviors -- that is, on the bar-pressing activities. But they have
far less effect on the outcomes, because the outcomes are what the
organism is controlling.
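A sketch of why schedules change actions more than outcomes, assuming a proportional controller of reinforcement rate and approximating a fixed-ratio schedule as reinforcement rate = press rate / ratio (`steady_state`, `gain`, and `reference` are made-up, and the approximation ignores the discreteness of real schedules).

```python
def steady_state(ratio, reference=2.0, gain=1000.0, steps=5000, dt=0.001):
    """Proportional control of reinforcement rate on a fixed-ratio schedule (hypothetical).

    Reinforcement rate is approximated as press_rate / ratio. The press rate relaxes
    toward gain * (reference - reinforcement_rate) with unit time constant.
    """
    press_rate = 0.0
    for _ in range(steps):
        reinforcement_rate = press_rate / ratio
        target = gain * (reference - reinforcement_rate)
        press_rate += (target - press_rate) * dt
    return press_rate, press_rate / ratio


for ratio in (5, 20, 80):
    presses, reinforcers = steady_state(ratio)
    print(f"FR{ratio}: press rate {presses:7.2f}  reinforcement rate {reinforcers:5.3f}")
```

Across FR5, FR20 and FR80 the steady-state press rate goes from about 9.95 to about 148 while the reinforcement rate only drifts from about 1.99 to about 1.85, because for this loop the reinforcement rate is gain * reference / (ratio + gain).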

In other fields of psychology, actions and outcomes are not clearly
distinguished. In fact, the term "behavior" is normally used to mean the
outcomes, not the actions. We speak of maze-running behavior, problem-
solving behavior, territory-marking behavior, and so forth. We speak of
what is _accomplished_ by the actions of the organism.

If the outcomes of behavior are under an organism's control, then they
will repeat even though disturbances cause the organism to vary its
actions in the course of keeping the outcome the same. So we have
the situation where a repeatable "behavior" is produced by actions that
may never repeat.
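The same point with disturbances instead of schedules, again as a hypothetical sketch (`trials`, `gain`, and the disturbance range are assumptions): outcome = action + disturbance, with a new random disturbance on every trial and the action taken from the steady state of a high-gain proportional controller. The outcome repeats to within a fraction of a unit while the actions spread over tens of units and no two are the same.

```python
import random
import statistics


def trials(n=50, reference=10.0, gain=100.0, seed=0):
    """Outcome = action + disturbance, with a new random disturbance on each trial.

    The action is the steady state of action = gain * (reference - outcome),
    solved algebraically: action = gain * (reference - disturbance) / (1 + gain).
    """
    rng = random.Random(seed)
    actions, outcomes = [], []
    for _ in range(n):
        disturbance = rng.uniform(-20.0, 20.0)
        action = gain * (reference - disturbance) / (1.0 + gain)
        actions.append(action)
        outcomes.append(action + disturbance)
    return actions, outcomes


actions, outcomes = trials()
print(f"action spread (sd) {statistics.stdev(actions):.2f}")
print(f"outcome spread (sd) {statistics.stdev(outcomes):.3f}")
```

For this loop the spread of actions is exactly `gain` times the spread of outcomes, since action = gain * (reference - disturbance) / (1 + gain) and outcome = (gain * reference + disturbance) / (1 + gain).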

Unfortunately, the common assumption is that if outcomes repeat, the
actions that produced them must have repeated, too. This assumption is
essential if one is to believe that behavior lies at the end of a causal
chain. The causal chain must be at least as reliable as the outcome is.
So we have theories in which antecedent conditions are said to cause
behavior, and all these theories require that the causal chain from
antecedent to outcome be regular. Both cognitive and other theories make
this assumption.

That is the aspect of S-R theory that we say is not dead.
----------------------------------------------------------------------
Best,

Bill P.