[From Bill Powers (941125.1620 MST)]
Bruce Abbott (941122.1615) --
> I'm sorry, but I just can't see any learning going on in your model.
> Learning, as envisioned in PCT, involves a reorganization of the
> system in such a way as to establish or improve negative-feedback
> control over some perceptual variable. All I see is a fairly
> ordinary two-level control system, one for regulating rate of change
> in nutrients and one, operating through the first, for regulating
> nutrient concentration. My ECOLI6 works in a similar way to control
> stored nutrient levels. It's entertaining, it may represent the
> actual way some organism regulates its nutrient levels, but it does
> not model learning. In contrast, ECOLI4a alters its response
> probabilities in the presence of increasing or decreasing nutrient
> gradients as a result of experience with the consequences of its
> behavior. Its program code does not tell it how to behave in order
> to climb the nutrient gradient; instead it must learn what works,
> under what conditions.
What does "altering a response probability" mean in ECOLI4a, your
program? It means changing the delay before the next tumble. If you
increase the response probability, you shorten the delay, on the
average. What we observe is the length of the delay, not the response
probability. We don't see the many comparisons (random number < response
probability) that precede the tumble. All we observe is that there is a
delay, and then the tumble occurs. So if your model is intended to be a
simple representation of observables, it doesn't achieve that intent: it
introduces an unobservable imaginary process.
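To make that concrete, here is a minimal sketch in rough code of how a
per-step response probability turns into the delay we actually observe.
The function name, the time step, and the details are my own shorthand,
not anyone's actual listing:

    import random

    def delay_until_tumble(p_tumble, dt=1.0):
        """Apply the hidden test (random number < response probability)
        once per time step and return the elapsed time, which is all
        we actually observe."""
        t = 0.0
        while random.random() >= p_tumble:
            t += dt
        return t

On average the delay comes out near dt/p_tumble (for small probabilities),
so increasing the response probability just shortens the mean delay before
the tumble.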
So your computation of response probability is a proposed mechanism
whereby the delay is varied as a function of the logical conditions.
The logical conditions, too, contain a proposed mechanism. More than
that, they contain a value judgement: that it is good for dNut to
increase. The organism's reference level for dNut is contained in the
logic. But it is hard to see, because it is mixed in with the logic that
tells how the old dNut predicts the new one. It is contained in the
wiring diagram, so to speak. We see that if dNut(new) > dNut(old), the
result is to reward the behavior that was taking place during dNut(old)
and increase its probability. But why is that? Where does that condition
come from? Why should we not punish rather than reward under that
condition? There is nothing about dNut itself that predicts whether the
organism will seek it or avoid it. We can only say after the fact that
it does seek a positive dNut, because that is the outcome we observe.
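As I read your description, the contingency amounts to something like the
rough sketch below. The names pts_plus and pts_minus (for PTS+ and PTS-),
the fixed step size, and the clamping are my shorthand, not your listing:

    def update_after_tumble(dnut_old, dnut_new, pts_plus, pts_minus, step=0.05):
        """pts_plus / pts_minus: probability of tumbling while dNut is
        positive / not positive. Adjust whichever one governed the
        behavior in effect before the tumble."""
        rewarded = dnut_new > dnut_old          # the built-in value judgement
        if dnut_old > 0:                        # condition S+ held before the tumble
            pts_plus += step if rewarded else -step
        else:                                   # condition S- held before the tumble
            pts_minus += step if rewarded else -step
        # keep both probabilities inside [0, 1]
        pts_plus = min(1.0, max(0.0, pts_plus))
        pts_minus = min(1.0, max(0.0, pts_minus))
        return pts_plus, pts_minus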
So your model contains no learning, either, in the sense you are using
the term. It is foreordained by the logic and the assumption about what
is rewarding that PTS+ will decrease inevitably to 0, and PTS- will
increase inevitably to 1. My simplified logic simply cut out all the
superfluous steps and created that result directly.
You haven't had time to respond to my post with the diagram in it, but
I'm very curious as to what you will say. If dNut was positive before
the tumble, and dNut increased from before to after the tumble, then it
should be that the behavior taking place when dNut was positive is
rewarded and the probability of responding to a present positive dNut
should be increased. But for purposes of going up a gradient, that is
the wrong response. Increasing the probability of responding to a
currently positive dNut will shorten the time delay, and result in the
tumble occurring sooner, not later as is required.
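A quick back-of-the-envelope check of that point, with purely illustrative
numbers: if the per-step probability of tumbling is p, the expected wait
before the next tumble is on the order of 1/p steps, so raising p cuts the
up-gradient run short.

    for p in (0.05, 0.10, 0.20, 0.40):
        print(f"p_tumble = {p:.2f}  ->  expected wait ~ {1.0/p:.1f} steps")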
If you didn't spend much time looking at my alternate model, you may
have missed a significant change in sign. I say (sketched in code below):
    if dNut > 0 then DECREASE PTS+
Decrease, not increase.
There is no condition under which it would be advantageous to tumble
sooner when you are already going up the gradient, none under which
increasing the probability of a response with dNut > 0 would improve
matters. In
your diagram, however, that path is present, and in your descriptions
you refer to that path, apparently not realizing that it would work to
reduce progress up the gradient.
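Sketched out (with a step size and clamping of my own choosing, not part
of the rule itself), it touches only that one sign:

    def adjust_pts_plus(dnut, pts_plus, step=0.05):
        """While the cell is already going up the gradient (dNut > 0),
        make the next tumble LESS likely, so the run lasts longer."""
        if dnut > 0:
            pts_plus = max(0.0, pts_plus - step)   # decrease, not increase
        return pts_plus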
What makes your model work is not the simple straightforward path that
you describe, but a branch of the "punishment" path which is somewhat
complex and certainly not the most obvious path.
Have you tried to make your model work in a gradient of a repellent? As
it is stated, it clearly won't work: an increase in a repellent is not
reinforcing. You will have to change some definitions, particularly the
definitions of S+ and S-. Now a positive dNut must be considered an
unfavorable condition, and an increase in dNut from before to after a
tumble must be considered punishing. So the logic has to be adjusted on the
basis that we know the organism will see a positive dNut as having
negative _value to the organism_. We must change our concept of the
organism's reference level for dNut.
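One way to see where that adjustment lives, as a hedged sketch only (the
valence parameter and the symmetric step are mine, not anything in
ECOLI4a), is to make the hidden reference explicit as a sign that
redefines both S+ and what counts as reinforcing:

    def update_after_tumble(dnut_old, dnut_new, pts_s_plus, pts_s_minus,
                            valence=+1, step=0.05):
        """valence = +1 for an attractant, -1 for a repellent. It stands
        for the organism's reference for dNut: it decides which sign of
        dNut counts as S+ and which change counts as reinforcing."""
        s_plus_before = valence * dnut_old > 0            # redefined S+
        rewarded = valence * (dnut_new - dnut_old) > 0    # redefined reinforcement
        if s_plus_before:
            pts_s_plus += step if rewarded else -step
        else:
            pts_s_minus += step if rewarded else -step
        pts_s_plus = min(1.0, max(0.0, pts_s_plus))
        pts_s_minus = min(1.0, max(0.0, pts_s_minus))
        return pts_s_plus, pts_s_minus

Everything specific to the organism has been collected into that one
parameter.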
I said that somehow, a model that works must contain a model of the
organism, not just of its environment. I think we can see how that model
exists in your logical structure.
-----------------------------------------------------------------------
Best,
Bill P.