The Party's Over

Tom Bourbon [941130.0833]

[From Bruce Abbott (941129.1230 EST)]

I can almost see Bill, Rick, and Tom doing "high fives" over the "success" of
their recent attempts to discredit the ECOLI4a reinforcement-based learning
model. The problem from my perspective is that these manipulations have
nothing whatever to do with establishing the truth or falsity of the law of
effect, nor are they at all relevant to the thesis I have been developing and
defending. So don't break out the party hats just yet. It's time to go back
to class.

Hmmm. That seems like a strange reading of the recent rounds of modeling,
Bruce.  When were there any ". . . attempts to discredit the ECOLI4a
reinforcement-based learning model"? I have seen a series of alterations to
the environment shared by _both_ the TRT model and the PCT model. The
altered environments provide tests for _both_ models, not just the TRT model.
Under the new conditions, either model, or both of them, can succeed or
fail.  The fact that the TRT model repeatedly came up short, until you gave
it a special retrofit of new features, and that the PCT model continued to
work with no revisions, means the unmodified PCT model passed all of those
tests, but the unmodified TRT model did not. That's all.

The situation here is like the one in "Models and their worlds," where Bill
and I compared the performances of three models of behavior when they ran in
three different environmental conditions. All three models were subjected
to the same tests. One model succeeded in only one environment; another
succeeded in two environments; the PCT model succeeded in all three
environments.  That's all.  ;-)

PCT Defenders of the Faith have been viewing this debate about ECOLI4a as a
test between PCT and traditional reinforcement theory, and have endeavored to
construct critical tests that will show the superiority of the former to the
latter.

The fact that one model succeeds where the other does not is a result of the
simulations. Why should we conclude that the result reflects a deliberate
attempt to show that one model is superior to another? _Both_ models were
tested.

I am a strong believer in this strategy (my publication record
includes several examples in which I pitted one theoretical view against
another), but as I have repeatedly warned, it is not appropriate here. You
will note that my description of ECOLI4a above includes no mention of
traditional reinforcement theory terms such as reinforcement. Instead I have
attempted to place ECOLI4a firmly within the PCT framework as an example of an
efficient reorganizing control system. Seen in this perspective, your efforts
to discredit ECOLI4a amount to an attack on PCT. Is this really what you had
in mind?

Hmm. Then one alleged kind of PCT model outperforms another alleged kind of
PCT model across a wider range of environments. That's interesting. Which
of the two kinds of PCT model works under the wider range of conditions,
with fewer post hoc modifications? Isn't that one of the questions we want
to answer?

Class dismissed.

Party? Class? Hmmm.

Later,

Tom

[From Bruce Abbott (941129.1230 EST)]

I can almost see Bill, Rick, and Tom doing "high fives" over the "success" of
their recent attempts to discredit the ECOLI4a reinforcement-based learning
model. The problem from my perspective is that these manipulations have
nothing whatever to do with establishing the truth or falsity of the law of
effect, nor are they at all relevant to the thesis I have been developing and
defending. So don't break out the party hats just yet. It's time to go back
to class.

Review of the ECOLI4a Model

ECOLI4a demonstrates a ubiquitous method whereby more complex organisms than
e. coli learn to bring perceptual variables under control, a process called
"selecting and connecting" by Thorndike but more typically referred to as
trial-and-error learning. Although the physical mechanisms that implement
this form of learning have yet to be elucidated, they can be modeled for the
purpose of simulation as conditional probabilities that are altered by certain
consequences of behavior.

For the ECOLI4a simulation, the behavior to be learned is when to tumble. To
determine whether a tumble is beneficial or harmful to e. coli's interests
under a given circumstance, ECOLI4a monitors the change in the rate of change
in nutrient concentration that follows a tumble. This requires a number of
abilities no real e. coli has, including the ability to compare dNut before
and after a tumble, the ability to "remember" whether conditions were
favorable or unfavorable immediately prior to a tumble, and the ability to
assign "credit" to the appropriate prior favorable or unfavorable state (i.e.,
adjust the correct probability).

As modeled, e. coli will adjust its behavior according to its internal
criteria: increase the probability of a tumble associated with the condition
present prior to a tumble when tumbling makes things better, and decrease that
probability when tumbling makes things worse. These criteria work in the
standard environment: as ECOLI4a senses the effects of its tumbling behavior
under favorable and unfavorable conditions, it quickly learns that tumbling is
a good thing to do when conditions are unfavorable and not a good thing to do
when conditions are favorable.
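
To make the rule concrete, here is a minimal sketch, in Python rather than the
actual ECOLI4a code, of the kind of probability adjustment just described.
The class and parameter names, the step size, and the use of one probability
per dNut sign are illustrative assumptions:

import random

class Ecoli4aLearner:
    """Sketch of an ECOLI4a-style "selecting and connecting" rule (illustrative)."""

    def __init__(self, p_tumble_up=0.5, p_tumble_down=0.5, step=0.05):
        # One tumble probability for each dNut state: "up" when dNut is
        # positive (moving up the gradient), "down" when dNut is negative.
        self.p = {"up": p_tumble_up, "down": p_tumble_down}
        self.step = step  # size of each probability adjustment (assumed)

    def maybe_tumble(self, dnut):
        """Decide whether to tumble, given the sign of the current dNut."""
        state = "up" if dnut >= 0 else "down"
        return state, random.random() < self.p[state]

    def learn(self, state_before, dnut_before, dnut_after):
        """Assign credit to the dNut state that held just before the tumble.

        If the tumble made things better (dNut increased), raise the tumble
        probability for that state; if it made things worse, lower it.
        """
        if dnut_after > dnut_before:
            self.p[state_before] = min(1.0, self.p[state_before] + self.step)
        else:
            self.p[state_before] = max(0.0, self.p[state_before] - self.step)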

ECOLI4a as a Two-Level Control System

ECOLI4a can be described as a two-level control system. The bottom-level
system actuates tumbles under appropriate conditions as determined by the
current state of dNut (positive or negative) and the associated probabilistic
tumble generators, which control tumble rate under each dNut state. The top-
level system determines the parameters of the lower-level system as a function
of experience with the effect of tumbling on dNut under the two dNut states.
Given an environment that offers a different set of experiences (tumble
outcomes) from those provided by the standard environment, ECOLI4a will learn
to behave differently in the presence of positive and negative nutrient
gradients, consistent with my claim that these changes can be appropriately
described as learning. From the PCT perspective the top-level system is not a
control system per se but a reorganization system, as was nicely stated by
Hans Blom. As Hans noted, reorganization is different from control.
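
A rough sketch of how the two levels might be wired into a simulation loop
follows, again only as an illustration.  It assumes the learner sketched above
and a point nutrient source whose concentration falls off with the square of
the distance; the starting position, speed, and step count are arbitrary:

import math
import random

def nutrient(x, y, src=(0.0, 0.0)):
    # Concentration of a point source, falling off with the square of the
    # distance from the source (illustrative form).
    d2 = (x - src[0]) ** 2 + (y - src[1]) ** 2
    return 1.0 / (1.0 + d2)

def run(learner, steps=5000, speed=1.0):
    x, y = 50.0, 50.0
    heading = random.uniform(0.0, 2.0 * math.pi)
    prev_nut = nutrient(x, y)
    dnut = 0.0
    for _ in range(steps):
        # Bottom level: tumble (pick a new random heading) with the
        # probability attached to the current dNut state.
        state, tumbled = learner.maybe_tumble(dnut)
        dnut_before = dnut
        if tumbled:
            heading = random.uniform(0.0, 2.0 * math.pi)
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        nut = nutrient(x, y)
        dnut, prev_nut = nut - prev_nut, nut
        # Top level: after a tumble, credit the pre-tumble state with
        # whether the tumble made dNut better or worse.
        if tumbled:
            learner.learn(state, dnut_before, dnut)
    return x, y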

Artifact Busters

I tried Rick's "artifact buster" code in ECOLI4a and it does indeed destroy
the model's ability to learn appropriate behavior from its experiences with
tumbling under favorable and unfavorable dNut conditions. Rick could not
state why his code had this effect but seemed very pleased that it did. What
this code does is to displace e. coli 15 units from its previous position, in
a random direction, following a tumble. The reason it disrupts ECOLI4a's
learning is that the random displacement changes the value of dNut following a
tumble, even if the tumble results in movement along the same pre-tumble
trajectory as before. The change in dNut following displacement is a direct
consequence of the nutrient gradient, which falls off with the square of the
distance from the nutrient source. Make the displacement large enough and the
change in dNut produced by the tumble (change in trajectory) is overwhelmed by
the change induced by the displacement.
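
In outline, the change amounts to something like the following sketch, in
which the 15-unit jump is applied in a random direction after every tumble
(the names are illustrative, not Rick's actual code):

import math
import random

JUMP = 15.0  # displacement added after each tumble, per the description above

def tumble_and_jump(x, y):
    # The tumble itself: a new, randomly selected heading.
    heading = random.uniform(0.0, 2.0 * math.pi)
    # The added displacement: 15 units in an unrelated random direction,
    # which changes the local nutrient level regardless of the new heading.
    jump_dir = random.uniform(0.0, 2.0 * math.pi)
    x += JUMP * math.cos(jump_dir)
    y += JUMP * math.sin(jump_dir)
    return x, y, heading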

The conclusion to draw here is that the learning criterion used by ECOLI4a is
no longer appropriate given e. coli's new behavior.  A mutated ECOLI4a that
engages in random "tumble-and-jump" behavior is an evolutionary failure: under
the new consequences of its behavior, it cannot learn to behave appropriately.
A mutated one-level e. coli with the sign of its gain reversed would similarly
fail.

ECOLI4a as a Model of Human Behavior

In an earlier post I suggested that ECOLI4a MAY (and I emphasized the word
MAY) provide a model of human performance in the e. coli task. In that task,
the participant was told to make a moving spot on the screen move to another,
stationary spot. The only available way to control the moving spot was by
pressing the space bar, which initiated a tumble that set the spot moving off
in a new, randomly selected direction. Participants quickly mastered the
task.

Rick informs us that the mutated ECOLI4a fails to climb the nutrient gradient,
but that human participants have no trouble making the "tumble-and-jump" e.
coli reach the center of the screen. He concludes that he has proven the law
of effect (trial-and-error learning) invalid while once again showing the
superiority of the control model. The actual state of affairs is somewhat
different.

Unlike ECOLI4a, Rick's human participants can see the location of the nutrient
source, the current position of the e. coli spot, and the latter's angle of
travel relative to the nutrient location. They do not know anything about the
nutrient concentration around the e. coli spot, nor can they sense the change
in nutrient concentration as the spot moves toward or away from target. I
must conclude that the human participants are using a different set of
perceptual variables from those modeled by ECOLI4a. Those variables are not
greatly disturbed by a random displacement. For example, the participants may
be observing the angle of travel of the spot relative to the target. While
trying out the space bar, they soon learn that pressing the space bar while
the spot is moving more-or-less toward target (angle < 90 degrees away from
target) tends to make things worse, whereas pressing the space bar while the
spot is moving more-or-less away from target usually makes things better. The
random jump has, under most circumstances, little effect on the computed
change in angle following a tumble.
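
For illustration, such an angle-based criterion might be sketched as follows;
the function names and the 90-degree threshold are assumptions drawn from the
description above, not from any participant's report:

import math

def angle_off_target(x, y, heading, target=(0.0, 0.0)):
    # Absolute angle (0 to pi radians) between the spot's direction of travel
    # and the bearing from the spot to the target.
    bearing = math.atan2(target[1] - y, target[0] - x)
    diff = abs(heading - bearing) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)

def should_press(x, y, heading, target=(0.0, 0.0)):
    # Press the space bar (tumble) when the spot is moving more-or-less
    # away from the target, i.e., more than 90 degrees off the bearing.
    return angle_off_target(x, y, heading, target) > math.pi / 2.0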

Because the human participants and ECOLI4a base their learning on different
perceptual variables, the conclusion drawn by Rick about the relative validity
of the two models is invalid. All that Rick has succeeded in demonstrating is
that the perceptual variables in the two cases are different.

When I said that ECOLI4a MAY provide a model of human performance, I did not
intend that statement to be taken literally unless the human participants were
given only the information about the consequences of a tumble that ECOLI4a
has.  I meant that the basic reorganization system employed in ECOLI4a might
provide a model, using an appropriately defined consequence-variable. To
transform ECOLI4a into a model of human performance, one would have to use the
same perceptual variable for learning that the human participants are using--
one that, as I have described, is less disturbed by those random jumps. Of
course, the new model will no longer pretend to be an e. coli, capable of
sensing only the change in nutrient concentration.

In addition, one has to keep in mind that humans are far more capable than a
simple two-level model. If one variable does not seem to work, they will use
their multi-variable perceptual capabilities and other capacities to identify
a perceptual variable that does work, if one is present and not too obscure.
To model this sort of ability would require a far more sophisticated model
than any of us has yet written--or is likely to write. It would be unfair to
require ECOLI4a to shift its learning-criterion variable the way humans can
when the currently-used criterion is made to fail by some clever
"deartifacting" change in contingencies.

ECOLI4a and the Plain-Vanilla Control Model

It is also inappropriate to compare a mutated ECOLI4a's ability to control
with that of the standard one-level control model, as in the recent Powers
demo. The altered consequences of a tumble destroy ECOLI4a's ability to learn
the correct behavior from its experience. These changes have no effect on the
simple control model because the simple control model does not learn--the
appropriate behaviors are built into the code. If we take ECOLI4a and set the
probabilities to their terminal values reached under the normal conditions in
which ECOLI4a was designed to learn, we will have reduced ECOLI4a to its non-
learning lower-level system, the equivalent of the Powers model.  You will find
that this model has no more trouble finding its way to the nutrients than the
Powers version does.
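
Using the learner sketched earlier, that reduction amounts to something like
the following; the particular terminal values are illustrative assumptions:

# Freeze the probabilities at plausible terminal values (high tumble
# probability while dNut is unfavorable, low while it is favorable) and set
# the adjustment step to zero so the top level can no longer change anything.
frozen = Ecoli4aLearner(p_tumble_up=0.02, p_tumble_down=0.98, step=0.0)

# run(frozen) now exercises only the fixed lower-level tumble policy, much
# like the one-level Powers model.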

No Contest

PCT Defenders of the Faith have been viewing this debate about ECOLI4a as a
test between PCT and traditional reinforcement theory, and have endeavored to
construct critical tests that will show the superiority of the former to the
latter. I am a strong believer in this strategy (my publication record
includes several examples in which I pitted one theoretical view against
another), but as I have repeatedly warned, it is not appropriate here. You
will note that my description of ECOLI4a above includes no mention of
traditional reinforcement theory terms such as reinforcement. Instead I have
attempted to place ECOLI4a firmly within the PCT framework as an example of an
efficient reorganizing control system. Seen in this perspective, your efforts
to discredit ECOLI4a amount to an attack on PCT. Is this really what you had
in mind?

Class dismissed.

Regards,

Bruce (:->