[From Bill Powers (950613.0815 MDT)]

Bruce Abbott (950611.1635 EST) --

Rick and I have been wrong and you have been right.

Last night I finally did a little experiment. I set up the following

program:


====================================================================

program testecol;
uses dos,crt;

{
  if R+ then
    if S+ then inc(PTS+)
    else inc(PTS-)        *
  else if R- then
    if S+ then dec(PTS+)  *
    else dec(PTS-)

  R := n2 > n1;  S := n1 > 0;

  inc(pts+) = n2 > n1 and n1 > 0
  inc(pts-) = n2 > n1 and n1 < 0
  dec(pts+) = n2 < n1 and n1 > 0
  dec(pts-) = n2 < n1 and n1 < 0
  -------------------------------------
}

var n1,n2,q1,q2,q3,q4,t,tmax: real;
    ch: char;

begin
  q1 := 0.0;
  q2 := 0.0;
  q3 := 0.0;
  q4 := 0.0;
  randomize;
  clrscr;
  tmax := 1e6;
  t := 0.0;
  while t < tmax do
  begin
    n1 := cos(2*pi*random);   {component of velocity to right}
    n2 := cos(2*pi*random);
    if (n1 > 0.0) and (n2 > n1) then q1 := q1 + 1.0;  {inc(pts+)}
    if (n1 > 0.0) and (n2 < n1) then q2 := q2 + 1.0;  {dec(pts+)}
    if (n1 < 0.0) and (n2 > n1) then q3 := q3 + 1.0;  {inc(pts-)}
    if (n1 < 0.0) and (n2 < n1) then q4 := q4 + 1.0;  {dec(pts-)}
    t := t + 1.0;
  end;
  writeln('inc(pts+) = ',q1*100.0/tmax:6:1,
          '% dec(pts+) = ',q2*100.0/tmax:6:1,'%', chr(13),chr(10),
          'inc(pts-) = ',q3*100.0/tmax:6:1,
          '% dec(pts-) = ',q4*100.0/tmax:6:1,'%');
  ch := readkey;
end.

The results of this "Monte Carlo" test with one million trials were

inc(pts+): 12.5% of the trials

dec(pts+): 37.5%

inc(pts-): 37.5%

dec(pts-): 12.5%

This is exactly what you said would happen, and what I have been trying

to deny through verbal reasoning for months. You are perfectly right in

saying that everything I have accused you of, I have been doing myself.

Well, I knew SOMEBODY was doing it.

The condition "N1>0" is equivalent to S+; "N2>N1" is equivalent to R+.

It doesn't matter what N stands for. If any series of numbers is

generated randomly within a fixed zero-centered range, the two most

common conditions will be

(n1 > 0.0) AND (n2 < n1) and

(n1 < 0.0) AND (n2 > n1)

One or the other of these two conditions will occur 75% of the time; one

of the other two, the remaining 25% of the time. I don't know how you

arrived at this result, but you were right. It doesn't matter whether

you compute the velocity to the right, as above, or simply

N1 = random - 0.5

N2 = random - 0.5

(where "random" returns a real number between 0.0 and 1.0).

The conditions that together occur 75% of the time are the "right" ones, leading either to a decrement in the probability of a tumble given S+, or an increment in the probability of a tumble given S-. When we subtract the two "wrong" cases, we find that on the average the probabilities are adjusted the right way at half the rate they would be adjusted without the added pair of conditions N2 > N1 or N2 < N1. But they are adjusted the right way, and by a large margin.
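The same tally is easy to cross-check outside Pascal. Here is a minimal Python sketch of the test (the function name `tally` is my own, not from the original program):

```python
import math
import random

def tally(trials=1_000_000, seed=1):
    """Count the four logical cases from the testecol program
    over many random (n1, n2) pairs."""
    rng = random.Random(seed)
    counts = {"inc(pts+)": 0, "dec(pts+)": 0, "inc(pts-)": 0, "dec(pts-)": 0}
    for _ in range(trials):
        n1 = math.cos(2 * math.pi * rng.random())  # rightward component before the tumble
        n2 = math.cos(2 * math.pi * rng.random())  # rightward component after the tumble
        if n1 > 0.0 and n2 > n1:
            counts["inc(pts+)"] += 1   # S+ and R+: expected 12.5%
        elif n1 > 0.0 and n2 < n1:
            counts["dec(pts+)"] += 1   # S+ and R-: expected 37.5%
        elif n1 < 0.0 and n2 > n1:
            counts["inc(pts-)"] += 1   # S- and R+: expected 37.5%
        else:
            counts["dec(pts-)"] += 1   # S- and R-: expected 12.5%
    return {k: 100.0 * v / trials for k, v in counts.items()}

print(tally(trials=200_000))
```

The 12.5/37.5 split follows directly: for independent, symmetrically distributed n1 and n2, P(n2 > n1 > 0) = P(both positive)/2 = 1/8, leaving 3/8 for the other case on each side of zero.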

> II. The Purpose of the Demonstration
>
> The purpose of the demonstration was to prove that a model based on
> reinforcement principles _could be constructed_ which would behave
> as specified (a proof of principle). It was asserted that one could
> not.

That assertion was wrong. Rick and I have been assuming without proof

that there is no systematic way to predict the outcome of random tumbles

using the history of results before and after a tumble. In the E. coli

case or any similar case this is not true.

Note that if we apply the above analysis to the PCT model, we will get

exactly the same results. Since the tumbles are generated at random, it

will still be true that the logical functions [(N1 > 0) and (N2 > N1)]

and the other three will occur with the same probabilities, 12.5% and

37.5% as in the table above. This distribution is a property of

randomly-generated numbers within a fixed range, passed through the

appropriate logical filters. This has nothing to do with the operation

of the PCT system, which does not make use of this fact.

> To return to our favorite illustration, asserting that NO
> reinforcement-based model could behave properly in the test
> situation is equivalent to asserting that the Ptolemaic system
> could not properly describe the motions of Mars. Ptolemaic theory
> may be wrong, but is it true that it can't handle these data?

You are right. Reinforcement theory does fit the data. It does so not

because it is necessarily based on the mechanism that is actually

producing the data, but because it expresses a natural law concerning

random numbers, a law which the framers of reinforcement theory noticed

empirically. The parallel to epicycles is quite exact, because what

epicycles do is express the true idea that any waveform can be

represented as the sum of a series of sine-waves with suitable phases

and amplitudes. Using such a description, one can fit a curve to any

series of planetary positions, regardless of the mechanism that is

actually governing those positions. And while I haven't worked this out

at all, it may be that reinforcement theory, using a change in a

variable to define reinforcement and the prior state of that variable

(or one related to it) to define a discriminative stimulus, can be fit

to any purposive behavior regardless of the mechanism actually

responsible for the behavior.

What's more important, I think, is that a physical system could be

constructed with a logical perceptual function that worked according to

the above program steps, and that it could become part of the mechanism

of a control system. In other words, if a physical system is in fact

organized to take advantage of the natural law concerning randomly

generated numbers in a bounded range, the reinforcement model could be

the correct model. If the stars and planets were in fact attached to

rotating crystalline spheres, the epicycle model would be the correct

model.

It's just as important, however, to realize (as you do) that the

reinforcement model is not necessarily the correct model of a system

even if it correctly describes the observable relationships. The PCT

model of E. coli does not make use of the special property of bounded

random numbers, yet it produces the same overall result. By the same

token, the fact that the PCT model produces the right behavior does not

automatically make it the right model. To select the right model, or at

least the more correct one, we must turn to auxiliary evidence that can

help us choose.

I am getting a feeling that you've been ahead of me for some time.

> For these relationships to hold, the organism must be assumed to
> have appropriate structures that provide the necessary functions.
> For reinforcement and punishment to work, there must be a sensory
> structure that detects the rate of change in nutrient
> concentration, a structure to store the rate immediately prior to a
> tumble, and a structure to compare this rate to the rate
> immediately after a tumble. It is not difficult to imagine a set of
> molecular components that might provide these functions, but these
> would involve pure speculation on my part, so I included in the
> model only what they do, not how they do it.

Yes, I can now see that such a perceptual function is possible. What I

did not see before, or believe, was that there would be anything for it

to perceive. I thought the distribution of percentages above would be

equal.

------------------------------------------

Now let's talk about the output mechanism.

> For discrimination to take place, there would also need to be a
> mechanism that could selectively associate the stored state of
> nutrient change prior to a tumble with the appropriate structural
> representation of tumble probability (perhaps the concentration of
> a chemical whose effect on tumble probability is mediated by an
> enzyme whose concentration represents the stored value of, say,
> S+; but again, such mechanisms are speculative; only the functions
> are modeled).

When probabilities are reduced to physical mechanisms for generating them, they usually turn out to be something pretty simple. A device for converting the probable mean value of a signal to a signal representing that number could consist of a resistor and a capacitor.
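For instance (a discrete-time sketch in Python, with names of my own): the resistor-and-capacitor idea is a leaky integrator, whose output settles near the mean of a noisy input.

```python
import random

def rc_average(samples, alpha=0.01):
    """Discrete-time analog of an RC low-pass filter:
    out := out + alpha*(in - out). After the initial transient,
    the output hovers near the mean of the input signal."""
    out = 0.0
    for x in samples:
        out += alpha * (x - out)
    return out

rng = random.Random(0)
# A noisy signal whose true mean is 0.3
signal = [0.3 + (rng.random() - 0.5) for _ in range(20_000)]
print(rc_average(signal))  # close to 0.3
```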

In the case of your model, the probability generator is a program

statement that is executed over and over until a tumble occurs;

typically

if (Random < pTumbleGivenSplus) then DoTumble

The adjustment of probabilities is not done until a tumble finally occurs. So this program step does nothing but create a delay: the lower the probability of a tumble, the more iterations are likely to occur before the tumble happens, and since the modeled E. coli continues to move on every iteration, the more distance it is likely to cover.

When the tumble occurs and the probabilities are to be adjusted, we have

a (typical) program step

pTumbleGivenSplus := pTumbleGivenSplus - LearnRate;

The desired effect could just as easily be obtained by writing

DelayGivenSplus := DelayGivenSplus + LearnRate;

because decreasing the probability is the same as increasing the delay.

The mechanism for creating the delay is unimportant.
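The equivalence is easy to check: the number of iterations before "Random < p" succeeds is geometrically distributed with mean 1/p, so lowering the probability is exactly the same as lengthening the average delay. A Python sketch (the function name is mine):

```python
import random

def mean_delay(p, trials=50_000, seed=2):
    """Average number of iterations until 'Random < p' fires.
    Geometric distribution: the mean is 1/p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        n = 1
        while rng.random() >= p:   # same test as: if Random < p then DoTumble
            n += 1
        total += n
    return total / trials

print(mean_delay(0.2))   # near 1/0.2 = 5 iterations
print(mean_delay(0.1))   # near 1/0.1 = 10 iterations
```

Halving the probability doubles the expected delay, whatever mechanism produces it.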

Thus using the concept of "changing a probability" is merely a way of

altering the delay before the next tumble. There are many physical

circuits that can create a delay that can be varied by varying a signal

entering the delay generator. The circuit least likely to be found in a

real system is one which literally calculates a random number and

compares it with a fixed number, as in the program step

if (Random < pTumbleGivenSplus) then DoTumble.

In my model for operant conditioning, I convert an error signal to a

frequency of bar-pressing by letting the error signal determine the rate

at which a timer (an integrator) counts upward. When the integrator

output reaches a fixed trigger level, an output event is generated and

the timer is reset. This is a simple circuit that is easy to implement

in neurons or biochemistry: a "variable-frequency relaxation

oscillator."
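A minimal sketch of such an oscillator (in Python rather than neurons or biochemistry; the names are mine):

```python
def relaxation_oscillator(rate, steps, threshold=1.0):
    """Variable-frequency relaxation oscillator: an integrator
    counts upward at a rate set by its input (the error signal);
    when it reaches a fixed trigger level, an output event is
    generated and the timer is reset."""
    level = 0.0
    events = 0
    for _ in range(steps):
        level += rate             # the timer counts upward
        if level >= threshold:
            events += 1           # output event generated
            level = 0.0           # timer reset
    return events

# Doubling the input doubles the event frequency:
print(relaxation_oscillator(0.25, 1000))  # fires every 4 steps: 250 events
print(relaxation_oscillator(0.5, 1000))   # fires every 2 steps: 500 events
```

The event rate is simply proportional to the input signal; no random numbers are involved.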

In models of operant conditioning you and others have proposed, the

variation in output event rate is accomplished just as DoTumble is

calculated: by comparing a random number generator's output with a fixed

value representing a probability. If the probability is increased, the

next event is generated sooner: the event rate is increased. The net

result is the same as increasing the error signal in my model: the

threshold for the event generation is reached sooner.

So the concept of "the probability of a response" is converted simply to

"the frequency of response generation," and the mechanism is converted

from the literal generation of random numbers to the operation of a

simple relaxation oscillator.

If the real system operates by using a simple relaxation oscillator,

characterizing it as varying a "probability" is unnecessary and

inappropriate. Applying the concept of probability relies on an analogy

rather than a description of the actual system. If we have a choice

between a literal probability calculation and a simple relaxation

oscillator as elements of the model, but no direct evidence as to the

actual mechanism, we would choose the simpler model rather than

introducing complexity for its own sake. Or at least, I would.

---------------------------------------

> The Challenge was not to build a model that would work in any
> gradient, it was to build one that would work in the gradient
> supplied. This model does. Whether it works in other environments
> is irrelevant.

Actually, the relationships in the "testecol" program will appear in any

gradient. We're talking about logical relationships, so nonlinearities

that don't change the order of the random numbers make no difference.

When you were arguing that your model did work, you did not go through

the model as I have done to see whether it worked as you said it

worked. You went rapidly through some verbal arguments, but the

clincher for you was that the right result occurred: E. coli approached

the target.

> Now these are real fighting words, Bill. I suggest you get out
> those old posts of mine concerning ECOLI4a and READ THEM CAREFULLY.
> Talk about selective memory! Wow!

Fighting words, indeed -- Champ.

> As I recall, I expended considerable effort carefully describing
> the mechanism of ECOLI4a. We went through at least two
> misdescriptions on your part, and I posted not only a clear diagram
> of the model's logic but an equally clear diagram of how the
> specific nutrient gradient determined the outcome of the
> simulation. Remember those "Marken probabilities," you know, where
> Rick said the outcome HAD to be 50-50, so that no learning was
> possible, his computer program said so, never mind the diagram? I
> strongly encourage you to go review those exchanges and see whether
> your recollection of the events matches what appears there.

At least you can give me a crumb of credit for continuing to worry over

the problem and finally coming up with what was for me the missing fact.

If you had known what the key problem was -- the actual distribution of

probabilities -- you would no doubt have come up with a rigorous proof

that they were as you claimed. I can claim distraction by terms like

"reinforcement" and "discriminative stimulus," which sound very complex

until you realize they can be reduced to (N2 > N1) and (N1 > 0), and

nonphysical ideas like "probability of a response," which reduces to the rate of response generation or its reciprocal, the delay to the next response.

> Sorry, Bill. (1) My arguments are sufficient. (2) They do not
> gloss over any defects, fundamental or otherwise, in logic. (3)
> Therefore your speculations as to why I thought they were
> sufficient are moot. I thought they were sufficient because they
> are sufficient, not because I was being led by the nose by any
> foregone conclusions.

Yes, yes, yes. Can we talk about something else pretty soon, I hope? I

have been a living example of goal-directed reasoning, and what's worse

have projected my own fault onto you.

> Bill, before you go calling the kettle black, I suggest you take a
> good, long look at your own performance in this little debate. You
> may find it to be an eye-opener. Even PCT theorists can't escape
> from behaving as PCT predicts. (;->

Come on, somebody else make a comment that is really truly wrong. I need

someone to beat up on.

Bruce, I thank you for your steadfastness and keeping your temper during

what must have been an extraordinarily frustrating debate. We will no

doubt have more disagreements in the future, but I won't soon forget the

lessons of this one.

-----------------------------------------------------------------------

Respectfully,

Bill P.