adaptive system; Rick's program

[From Bill Powers (950615.1230 MDT)]

Bruce Abbott (950615.1220 EST) --

     S+ and S- are only names denoting discriminably different stimulus
     conditions, such as red and green. The designations as S+ or S-
     simply denote whether those stimuli are consistently associated
     with relatively attractive versus relatively aversive conditions
     (i.e., given that two stimuli were associated with different
     nutrient concentrations, S+ would be the one associated with the
     greater concentration, S- with the lesser). If instead of
     nutrient, we substitute something toxic, S+ would be the stimulus
     associated with the relatively more favorable conditions, the lower
     concentration of the toxin, and S- with the greater.

This is the opposite of what I meant by "manually" changing the model:
you're "mentally" changing it by changing what you mean by S+, without
making any corresponding change in the model.

     There is no need to "manually change the definitions" of S+ and S-
     as these definitions have no functional significance in the model.
     Red and green are still red and green. If you change what is
     associated with them, then WE might want to change what we call
     them (S+ or S-) in order to be consistent with the definitions,
     but the model does not change.

OK, try this. Mentally associate a repellent with what you call Nut,
dNut, and so forth, mentally consider S+ to be S- and vice versa, and
run the model without changing anything. It still maintains a positive
gradient of the variable you are now thinking of as a repellent. So it
is making its experience of the repellent increase, not decrease.

As the model is now written, it will always act to keep dNut (or whatever
symbol you use in that position) greater than zero. If you do a global
find and replace, changing every instance of "nut" to "rep" (for
repellent), the model will STILL seek to keep the gradient positive, and
will move to regions of greater "rep". You could also change every
instance of the symbol "S+" to "S-" and vice versa without affecting how
the model works. You would have to "manually" change something in the
program to make it maintain dRep negative, which is what is required if
dRep is a repellent (note that a positive dRep is not simply a negative
dNut: concentrations are always positive, and they are inverse-square
fields). Ideally, the model should do this itself automatically. It
should move E. coli down the gradient of a repellent and up the gradient
of an attractant. It can't do this now without reprogramming.
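
To make the point concrete, here is a bare-bones, one-dimensional sketch
in Python -- my own illustration, not your program; the field, the names,
and the parameter values are all just stand-ins:

import random

# One-dimensional illustration: the model tumbles whenever the sensed
# change (dNut) is negative, so it climbs the gradient no matter what the
# variables are called.

def field(x):
    # inverse-square concentration around a source at x = 0 (always positive)
    return 1.0 / max(x * x, 1e-6)

def run(steps=2000, speed=0.01):
    x = 5.0
    heading = random.choice([-1.0, 1.0])
    prev = field(x)
    for _ in range(steps):
        x += speed * heading
        now = field(x)
        dNut = now - prev            # rename this dRep and nothing changes
        prev = now
        if dNut < 0:                 # the functional relationship that matters
            heading = random.choice([-1.0, 1.0])     # "tumble"
        # To make the model avoid a repellent, this test itself has to be
        # reversed (tumble when dNut > 0), not the symbols.
    return field(x)

print(run())   # typically far above field(5.0): the model has climbed the gradient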

You can't make the model work properly by just changing symbols, in your
mind or in the program. The _functional relationships_ among variables,
not our mental associations with symbols in the program, determine how
it behaves. That's why we do simulations: the simulation doesn't care
about our private associations with the variable names. If you want the
program to move E. coli down an inverse-square gradient instead of up,
you have to change one or more functional relationships in the model.

     To handle a toxin gradient, the "valence" of the substance located
     at the source in the environment, as perceived by the organism,
     must be changed from positive to negative. This can be done by
     hand in the present model by reversing the signs of the effects on
     PTS+ and PTS-, but a better approach would be to designate a sensed
     valence for the environmental substance and have the model use that
     valence as a multiplier in the equations that modify the two
     probabilities.

OK, after all that talk you ended up realizing that you have to
reprogram the model to make it work with a toxin. Your suggestions about
how to do this are reasonable. When you implement them, you will have a
truly adaptive model, if not the simplest one possible.
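
Something along these lines would do it. The update rule below is invented
purely to show where the valence multiplier would go; I don't know the
actual equations you use for PTS+ and PTS-:

# PTS_plus and PTS_minus stand for the two tumbling probabilities; dSub is
# the sensed change in the substance; valence is +1 for an attractant and
# -1 for a repellent.

def update_probabilities(PTS_plus, PTS_minus, dSub, valence, rate=0.05):
    effect = valence * dSub                      # the valence enters only here
    PTS_plus = min(1.0, max(0.0, PTS_plus - rate * effect))
    PTS_minus = min(1.0, max(0.0, PTS_minus + rate * effect))
    return PTS_plus, PTS_minus

print(update_probabilities(0.5, 0.5, dSub=0.2, valence=+1))
print(update_probabilities(0.5, 0.5, dSub=0.2, valence=-1))

The point is only that the valence enters as a single multiplier; the rest
of the model stays exactly as it is.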

     But all this is rather academic. The e. coli situation is not one
     in which reinforcement principles actually apply beyond the simple
     notions of attraction and repulsion. There is no learning going
     on, only behavior determined by the internal structure of the
     organism and the properties of its environment. I think most
     reinforcement theorists would simply suggest that there is no
     reinforcement here, only the functioning of a hard-wired adaptive
     mechanism for seeking nutrients--an instinctive or reflexive
     mechanism rather than one operating according to reinforcement
     principles.

So is learning NOT "behavior determined by the internal structure of the
organism and the properties of its environment"? When we model anything
that resembles learning, we come down to specialized circuits that have
the effect of modifying parameters in other circuits. What else could
learning be?

The notion of "hard-wired" isn't very useful: if you hard-wire
multipliers into the system, it can change its organization just as
readily as it could do under "software" control. Anything you can do
with a program can be done with hard-wired components. If you stop and
think about it, anything you can do with a program is ALWAYS done with
hard-wired components.

If reinforcement principles work, they don't work by magic. They work
because of the way certain components in the behaving system work. That
is also true of "reflexive" and "instinctive" behaviors. These are old-
fashioned distinctions that don't mean much any more.

     For this reason I believe that it would be fruitless to demonstrate
     that reinforcement principles cannot "handle" some restrictive
     situation, such as this one, that the control model works well in.
     Reinforcement theorists would be more than happy, in that case, to
     grant that the control model works, but would declare that the
     failure of the reinforcement model in this situation would say
     nothing at all about its validity, as it would not be expected to
     operate under those conditions. Reinforcement theorists do not
     claim that reinforcement is the ONLY source of "control over
     behavior."

If we remove, one at a time, all the examples in which control theory
works as well as or better than reinforcement theory, sooner or later
there will be no pure examples of reinforcement left. That is, there
will be no case in which ONLY reinforcement theory can explain the data.

It may be true that "the failure of the reinforcement model in this
situation would say nothing at all about its validity", but that is also
true of SUCCESS of the reinforcement model. All that success shows is
that the reinforcement model fits the observations, not that it explains
them correctly in terms of the actual mechanisms involved.

If you set up the E. coli model in the PCT fashion, without using the
four logical conditions, you could record the successive values of dNut
before and after tumbles, and show that the same relationships still
exist: the percentages of each case would be exactly the same, and you
could compute the states of the reinforcement and of the discriminative
stimulus. They would be completely consistent with reinforcement theory.
However, while these descriptions would fit the observations, they would
not be correct accounts of the mechanisms actually creating the behavior
in the PCT model, or in whatever non-reinforcement model is actually
running.
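
Here is the kind of bookkeeping I have in mind, as an illustrative sketch
only: a continuous tumbling probability stands in for whatever form the
PCT model finally takes, and the parameter values are arbitrary.

import math
import random
from collections import Counter

def concentration(x, y):
    # inverse-square field around a source at the origin (always positive)
    return 1.0 / max(x * x + y * y, 1e-6)

def pct_run(steps=5000, speed=0.01, gain=5000.0, base=0.02):
    x, y = 5.0, 5.0
    angle = random.uniform(0.0, 2.0 * math.pi)
    prev = concentration(x, y)
    cases = Counter()
    for _ in range(steps):
        x += speed * math.cos(angle)
        y += speed * math.sin(angle)
        now = concentration(x, y)
        dNut = now - prev
        prev = now
        # PCT-style: the error (reference 0 minus dNut) sets the tumbling
        # probability continuously; nothing in the loop branches on four
        # discrete logical conditions.
        error = 0.0 - dNut
        if random.random() < min(1.0, max(0.0, base + gain * error)):
            angle = random.uniform(0.0, 2.0 * math.pi)   # tumble
            x += speed * math.cos(angle)                 # one step on the new heading
            y += speed * math.sin(angle)
            post = concentration(x, y)
            after = post - now
            prev = post
            # record what an observer would score for this tumble
            cases[("dNut " + ("+" if dNut >= 0 else "-") + " before",
                   "dNut " + ("+" if after >= 0 else "-") + " after")] += 1
    return cases

print(pct_run())

An observer tallying those cases could describe them entirely in
reinforcement terms, even though nothing in the loop uses the four
conditions.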

Wherever both PCT and reinforcement theory can explain the same
observations, choosing between the models has to depend on something
other than the fit of predictions to data.


-----------------------------------------------------------------------
Rick Marken (950615.0900) --

From what Bruce says, if reinforcement theory doesn't explain how we
learn to do the task in your new program, that will only show that
reinforcement theory isn't _expected_ to work in that situation.

I note that you have a disturbance in this experiment. I think that is
the key to distinguishing between a reinforcement model and the PCT
model. It will be interesting to see if the reinforcement model has any
problem with this task.
-----------------------------------------------------------------------
Best to all,

Bill P.