# Bruce:100; Bill and Rick: 0

[From Bill Powers (950613.0815 MDT)]

Bruce Abbott (950611.1635 EST) --

Rick and I have been wrong and you have been right.

Last night I finally did a little experiment. I set up the following
program:


====================================================================
program testecol;
uses dos,crt;
{
if R+ then
if S+ then
inc(PTS+)
else inc(PTS-) *
else if R- then
if S+ then
dec(PTS+) *
else dec(PTS-).

R := n2 > n1; S := n1 > 0;

inc(pts+) = n2 > n1 and n1 > 0
inc(pts-) = n2 > n1 and n1 < 0
dec(pts+) = n2 < n1 and n1 > 0
dec(pts-) = n2 < n1 and n1 < 0
-------------------------------------
}

var n1,n2,q1,q2,q3,q4,t,tmax: real;
    i: word;
    ch: char;
begin
  q1 := 0.0;
  q2 := 0.0;
  q3 := 0.0;
  q4 := 0.0;
  randomize;
  clrscr;
  tmax := 1e6;
  t := 0.0;
  while t < tmax do
  begin
    n1 := cos(2*pi*random); {component of velocity to right}
    n2 := cos(2*pi*random);

    if (n1 > 0.0) and (n2 > n1) then q1 := q1 + 1.0; {inc(pts+)}
    if (n1 > 0.0) and (n2 < n1) then q2 := q2 + 1.0; {dec(pts+)}
    if (n1 < 0.0) and (n2 > n1) then q3 := q3 + 1.0; {inc(pts-)}
    if (n1 < 0.0) and (n2 < n1) then q4 := q4 + 1.0; {dec(pts-)}
    t := t + 1.0;
  end;

  writeln('inc(pts+) = ',q1*100.0/tmax:6:1,
          '% dec(pts+) = ',q2*100.0/tmax:6:1,'%', chr(13),chr(10),
          'inc(pts-) = ',q3*100.0/tmax:6:1,
          '% dec(pts-) = ',q4*100/tmax:6:1,'%');
  ch := readkey;
end.

The results of this "Monte Carlo" test with one million trials were

inc(pts+): 12.5% of the trials
dec(pts+): 37.5%
inc(pts-): 37.5%
dec(pts-): 12.5%

This is exactly what you said would happen, and what I have been trying
to deny through verbal reasoning for months. You are perfectly right in
saying that everything I have accused you of, I have been doing myself.

Well, I knew SOMEBODY was doing it.

The condition "N1>0" is equivalent to S+; "N2>N1" is equivalent to R+.
It doesn't matter what N stands for. If any series of numbers is
generated randomly within a fixed zero-centered range, the two most
common conditions will be

(n1 > 0.0) AND (n2 < n1) and
(n1 < 0.0) AND (n2 > n1)

One or the other of these two conditions will occur 75% of the time; one
of the other two, the remaining 25% of the time. I don't know how you
arrived at this result, but you were right. It doesn't matter whether
you compute the velocity to the right, as above, or simply

N1 = random - 0.5
N2 = random - 0.5

(Where "random" returns a real number between 0.0 and 1.0).
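The comparison between the two generators can be rerun with a short script. This is a minimal sketch in Python rather than the Pascal of the original program; the names `quadrant_fractions`, `cosine_draw` and `uniform_draw` are made up for illustration, and the trial count is reduced from one million to keep it quick.

```python
import math
import random


def quadrant_fractions(draw, trials=200_000, seed=1):
    """Estimate how often each (S, R) combination occurs for independent draws."""
    rng = random.Random(seed)
    counts = {"inc(pts+)": 0, "dec(pts+)": 0, "inc(pts-)": 0, "dec(pts-)": 0}
    for _ in range(trials):
        n1 = draw(rng)  # value before the tumble
        n2 = draw(rng)  # value after the tumble
        # Exact ties (n1 == 0 or n2 == n1) are vanishingly rare; unlike the
        # Pascal program, which skips them, they fall into "-" and "dec" here.
        s = "+" if n1 > 0.0 else "-"
        r = "inc" if n2 > n1 else "dec"
        counts[f"{r}(pts{s})"] += 1
    return {key: value / trials for key, value in counts.items()}


def cosine_draw(rng):
    """Component of velocity to the right, as in testecol."""
    return math.cos(2.0 * math.pi * rng.random())


def uniform_draw(rng):
    """The simpler alternative: random - 0.5."""
    return rng.random() - 0.5


if __name__ == "__main__":
    for name, draw in (("cosine", cosine_draw), ("uniform", uniform_draw)):
        print(name, quadrant_fractions(draw))
```

Both generators should give fractions close to 0.125, 0.375, 0.375 and 0.125, within sampling error.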

The condition that occurs 75% of the time is the "right" condition, the
one leading either to a decrement in probability of a tumble given S+,
or an increment in probability of a tumble given S-. When we subtract
the two "wrong" cases, we find that on the average, the probabilities
are adjusted the right way at half the rate they would be adjusted
without the added pair of conditions N2 > N1 or N2 < N1. But they are
adjusted the right way and by a large margin.
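One way to see where the 12.5% and 37.5% figures come from is a symmetry argument. It assumes only that n1 and n2 are independent draws from the same continuous distribution, symmetric about zero; both generators mentioned above satisfy this, but the assumption is mine and is not spelled out in the original exchange.

```latex
\begin{align*}
P(n_1 > 0,\ n_2 > n_1)
  &= P(n_1 > 0,\ n_2 > 0)\; P(n_2 > n_1 \mid n_1 > 0,\ n_2 > 0)
   = \tfrac{1}{4} \cdot \tfrac{1}{2} = \tfrac{1}{8} \\
P(n_1 > 0,\ n_2 < n_1)
  &= P(n_1 > 0) - P(n_1 > 0,\ n_2 > n_1)
   = \tfrac{1}{2} - \tfrac{1}{8} = \tfrac{3}{8} \\
P(n_1 < 0,\ n_2 > n_1) &= \tfrac{3}{8}, \qquad
P(n_1 < 0,\ n_2 < n_1) = \tfrac{1}{8} \quad \text{(by symmetry)} \\
\underbrace{\tfrac{3}{8} + \tfrac{3}{8}}_{\text{right}}
  - \underbrace{\tfrac{1}{8} + \tfrac{1}{8}}_{\text{wrong}}
  &= \tfrac{3}{4} - \tfrac{1}{4} = \tfrac{1}{2}
\end{align*}
```

The last line is the "half the rate" figure: the net adjustment in the right direction is 75% minus 25% of the trials.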

II. The Purpose of the Demonstration

The purpose of the demonstration was to prove that a model based on
reinforcement principles _could be constructed_ which would behave
as specified (a proof of principle). It was asserted that one
could not.

That assertion was wrong. Rick and I have been assuming without proof
that there is no systematic way to predict the outcome of random tumbles
using the history of results before and after a tumble. In the E. coli
case or any similar case this is not true.

Note that if we apply the above analysis to the PCT model, we will get
exactly the same results. Since the tumbles are generated at random, it
will still be true that the logical functions [(N1 > 0) and (N2 > N1)]
and the other three will occur with the same probabilities, 12.5% and
37.5% as in the table above. This distribution is a property of
randomly-generated numbers within a fixed range, passed through the
appropriate logical filters. This has nothing to do with the operation
of the PCT system, which does not make use of this fact.

[Asserting that no] reinforcement-based model could behave properly in the test
situation is equivalent to asserting that the Ptolemaic system
could not properly describe the motions of Mars. Ptolemaic theory
may be wrong, but is it true that it can't handle these data?

You are right. Reinforcement theory does fit the data. It does so not
because it is necessarily based on the mechanism that is actually
producing the data, but because it expresses a natural law concerning
random numbers, a law which the framers of reinforcement theory noticed
empirically. The parallel to epicycles is quite exact, because what
epicycles do is express the true idea that any waveform can be
represented as the sum of a series of sine-waves with suitable phases
and amplitudes. Using such a description, one can fit a curve to any
series of planetary positions, regardless of the mechanism that is
actually governing those positions. And while I haven't worked this out
at all, it may be that reinforcement theory, using a change in a
variable to define reinforcement and the prior state of that variable
(or one related to it) to define a discriminative stimulus, can be fit
to any purposive behavior regardless of the mechanism actually
responsible for the behavior.

What's more important, I think, is that a physical system could be
constructed with a logical perceptual function that worked according to
the above program steps, and that it could become part of the mechanism
of a control system. In other words, if a physical system is in fact
organized to take advantage of the natural law concerning randomly
generated numbers in a bounded range, the reinforcement model could be
the correct model. If the stars and planets were in fact attached to
rotating crystalline spheres, the epicycle model would be the correct
model.

It's just as important, however, to realize (as you do) that the
reinforcement model is not necessarily the correct model of a system
even if it correctly describes the observable relationships. The PCT
model of E. coli does not make use of the special property of bounded
random numbers, yet it produces the same overall result. By the same
token, the fact that the PCT model produces the right behavior does not
automatically make it the right model. To select the right model, or at
least the more correct one, we must turn to auxiliary evidence that can
help us choose.

I am getting a feeling that you've been ahead of me for some time.

For these relationships to hold, the organism must be assumed to
have appropriate structures that provide the necessary functions.
For reinforcement and punishment to work, there must be a sensory
structure that detects the rate of change in nutrient
concentration, a structure to store the rate immediately prior to a
tumble, and a structure to compare this rate to the rate
immediately after a tumble. It is not difficult to imagine a set
of molecular components that might provide these functions, but
these would involve pure speculation on my part, so I included in
the model only what they do, not how they do them.

Yes, I can now see that such a perceptual function is possible. What I
did not see before, or believe, was that there would be anything for it
to perceive. I thought the distribution of percentages above would be
equal.
------------------------------------------
Now let's talk about the output mechanism.

For discrimination to take place, there would also need to be a
mechanism that could selectively associate the stored state of
nutrient change prior to a tumble with the appropriate structural
representation of tumble probability (perhaps the concentration of
a chemical whose effect on tumble probability is mediated by an
enzyme whose concentration represents the stored value of, say S+,
but again, such mechanisms are speculative; only the functions are
modeled).

When probabilities are reduced to physical mechanisms for generating
them they usually turn out to be something pretty simple. A device for
converting the probable mean value of a signal to a signal representing
that number could consist of a resistor and a capacitor.

In the case of your model, the probability generator is a program
statement that is executed over and over until a tumble occurs;
typically

if (Random < pTumbleGivenSplus) then DoTumble

The adjustment of probabilities is not done until a tumble finally
occurs. So this program step does nothing but create a delay: the lower
the probability of a tumble, the more iterations are likely to occur
before the tumble occurs. Since the modeled E. coli continues to move on
every iteration, the more distance is likely to be covered.

When the tumble occurs and the probabilities are to be adjusted, we have
a (typical) program step

pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;

The desired effect could just as easily be obtained by writing

DelayGivenSplus := DelayGivenSplus + LearnRate;

because decreasing the probability is the same as increasing the delay.
The mechanism for creating the delay is unimportant.
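The link between tumble probability and delay can be checked numerically. The sketch below is in Python rather than Pascal, and `mean_delay` is a hypothetical helper rather than part of any of the models discussed. It repeats the test `random < p` until it succeeds and averages the number of iterations needed, which should come out near 1/p.

```python
import random


def mean_delay(p_tumble, runs=20_000, seed=2):
    """Average number of iterations until `random < p_tumble` first succeeds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        steps = 1
        while rng.random() >= p_tumble:  # no tumble on this iteration
            steps += 1
        total += steps
    return total / runs


if __name__ == "__main__":
    for p in (0.5, 0.2, 0.1):
        print(f"p = {p}: mean delay about {mean_delay(p):.2f} iterations")
```

Lowering the probability lengthens the expected delay, as the text says. The two update rules agree in direction but not in step size: subtracting a fixed LearnRate from the probability does not add a fixed amount to the mean delay.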

Thus using the concept of "changing a probability" is merely a way of
altering the delay before the next tumble. There are many physical
circuits that can create a delay that can be varied by varying a signal
entering the delay generator. The circuit least likely to be found in a
real system is one which literally calculates a random number and
compares it with a fixed number, as in the program step

if (Random < pTumbleGivenSplus) then DoTumble.

In my model for operant conditioning, I convert an error signal to a
frequency of bar-pressing by letting the error signal determine the rate
at which a timer (an integrator) counts upward. When the integrator
output reaches a fixed trigger level, an output event is generated and
the timer is reset. This is a simple circuit that is easy to implement
in neurons or biochemistry: a "variable-frequency relaxation
oscillator."
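A variable-frequency relaxation oscillator of this kind can be sketched in a few lines. This is only an illustration in Python under simple assumptions (a constant error signal, a fixed time step, and a reset to zero after each event); the function name and parameter values are hypothetical and are not taken from the operant conditioning model itself.

```python
def relaxation_oscillator(error_signal, trigger=1.0, dt=0.01, duration=10.0):
    """Count output events produced while integrating a constant error signal."""
    level = 0.0
    events = 0
    steps = int(round(duration / dt))
    for _ in range(steps):
        level += error_signal * dt  # integrator counts upward at a rate set by the error
        if level >= trigger:
            events += 1             # output event, e.g. one bar press
            level = 0.0             # timer is reset
    return events


if __name__ == "__main__":
    for err in (0.0, 1.0, 2.0, 4.0):
        print(f"error = {err}: {relaxation_oscillator(err)} events in 10 time units")
```

Doubling the error signal roughly doubles the event rate, and no random number is drawn anywhere.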

In models of operant conditioning you and others have proposed, the
variation in output event rate is accomplished just as DoTumble is
calculated: by comparing a random number generator's output with a fixed
value representing a probability. If the probability is increased, the
next event is generated sooner: the event rate is increased. The net
result is the same as increasing the error signal in my model: the
threshold for the event generation is reached sooner.

So the concept of "the probability of a response" is converted simply to
"the frequency of response generation," and the mechanism is converted
from the literal generation of random numbers to the operation of a
simple relaxation oscillator.

If the real system operates by using a simple relaxation oscillator,
characterizing it as varying a "probability" is unnecessary and
inappropriate. Applying the concept of probability relies on an analogy
rather than a description of the actual system. If we have a choice
between a literal probability calculation and a simple relaxation
oscillator as elements of the model, but no direct evidence as to the
actual mechanism, we would choose the simpler model rather than
introducing complexity for its own sake. Or at least, I would.
---------------------------------------
The Challenge was not to build a model that would work in any
gradient, it was to build one that would work in the gradient
supplied. This model does. Whether it works in other environments
is irrelevant.

Actually, the relationships in the "testecol" program will appear in any
that don't change the order of the random numbers make no difference.

When you were arguing that your model did work, you did not go through
the model as I have done to see whether it worked as you said it
worked. You went rapidly through some verbal arguments, but the
clincher for you was that the right result occurred: E. coli approached
the target.

Now these are real fighting words, Bill. I suggest you get out
those old posts of mine concerning ECOLI4a and READ THEM CAREFULLY.

Fighting words, indeed -- Champ.

As I recall, I expended considerable effort carefully describing
the mechanism of ECOLI4a. We went through at least two
misdescriptions on your part and I posted not only a clear diagram
of the model's logic but an equally clear diagram as to how the
specific nutrient gradient determined the outcome of the
simulation. Remember those "Marken probabilities," you know, where
Rick said the outcome HAD to be 50-50, so that no learning was
possible, his computer program said so, never mind the diagram? I
strongly encourage you to go review those exchanges and see whether
your recollection of the events matches what appears there.

At least you can give me a crumb of credit for continuing to worry over
the problem and finally coming up with what was for me the missing fact.
If you had known what the key problem was -- the actual distribution of
probabilities -- you would no doubt have come up with a rigorous proof
that they were as you claimed. I can claim distraction by terms like
"reinforcement" and "discriminative stimulus," which sound very complex
until you realize they can be reduced to (N2 > N1) and (N1 > 0), and
nonphysical ideas like "probability of a response", which reduces to the
rate of response generation or its reciprocal, delay to the next
response.

Sorry, Bill. (1) My arguments are sufficient. (2) They do not
gloss over any defects, fundamental or otherwise, in logic. (3)
Therefore your speculations as to why I thought they were
sufficient are moot. I thought they were sufficient because they
are sufficient, not because I was being led by the nose by any
foregone conclusions.

Yes, yes, yes. Can we talk about something else pretty soon, I hope? I
have been a living example of goal-directed reasoning, and what's worse
have projected my own fault onto you.

Bill, before you go calling the kettle black, I suggest you take a
good, long look at your own performance in this little debate. You
may find it to be an eye-opener. Even PCT theorists can't escape
from behaving as PCT predicts. (;->

Come on, somebody else make a comment that is really truly wrong. I need
someone to beat up on.