[From Bill Powers (941127.0730 MST)]
Bill Leach (941126.2358 EST) --
Glad you and Dick Robertson, at least, find the discussion interesting.
You bring up an excellent point: that when controlled variables are
involved, it is almost inevitable that people will think that
consequences are selecting behaviors. In fact, it takes a pretty bold
person to look at the obvious causal relationships running from the
behavioral acts to the consequences and still conclude that the
consequences
are somehow selecting the acts. But in one sense the interpretation is
reasonable: what varies from circumstance to circumstance are the acts,
and the consequences just keep repeating. The consequence is far less
likely to change than the actions, which makes the consequence look like
an independent variable and the actions look like a dependent variable.
When you start from some random initial condition, you end up with the
same consequence, and the action becomes whatever is needed in order for
that consequence to appear. Inducing even a small change in the
consequence leads to a large change in the behavior. So the consequence
appears to rule. This is a natural misinterpretation of a control
process.
Your other point is also very pertinent: that when the consequences have
to do with intrinsic states, states that bear directly on survival,
behavior will be very similar from organism to organism when such
consequences are disturbed and similar means of correcting errors are
offered. If you withhold water, practically any higher organism will
learn, or try to learn, whatever behavior will actually produce some
drinkable water. If only one kind of behavior will accomplish that, the
organisms are highly likely to select that behavior. But there are very
few organisms that will try very hard in order to get half an hour of
watching a Cheech and Chong video, no matter how long you withhold the
opportunity, or how standardized the means you offer for getting that
opportunity back.
----------------------------------------------------------------------
On the general subject of modeling and probabilities, I woke up this
morning with a few thoughts.
Suppose a rat is pressing a bar pretty regularly every 5 seconds (don't
tell me about real rats, we're supposing). If you know this, and turn
around to look at the cage when you happen to think of it, what is the
probability that you will see a press within the next second? It's not
hard to figure out that you have a 20% chance of seeing a press in the
next second. In fact, you can see that the probability density of
observing a press is constant, at 0.2 per second.
So how would you _model_ this bar-pressing rate? One way is to say that
a press occurs every time the expression "random < 0.2" is true
("random" is a function that returns a number between 0 and 1). If you
sample 10 times per second, then the expression is "random < 0.02", and
so on. In general, it's "random < 0.2t," where t is the sampling
interval.
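Here is that rule as a minimal sketch in Python (the names and the
1000-second test run are mine, just for illustration):

    import random

    RATE = 0.2   # probability density of a press, per second
    DT = 0.1     # sampling interval t, in seconds

    def press_occurs(rate=RATE, dt=DT):
        # A press occurs whenever "random < 0.2t" is true; with
        # dt = 0.1 this is "random < 0.02", as in the text.
        return random.random() < rate * dt

    # Check the mean rate over 1000 simulated seconds:
    n_samples = int(1000 / DT)
    presses = sum(press_occurs() for _ in range(n_samples))
    print(presses / 1000.0, "presses/second; expect about 0.2")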
Suppose we propose a model of what is causing the rat's pressing the bar
every 5 seconds. There's a substance that gradually builds up in the rat
until it reaches some trigger level, and when it hits that level the rat
presses the bar and the substance is discharged back to a level of zero,
where it starts building up again. Now we have a model for why the rat
is pressing the bar every 5 seconds: it takes 5 seconds for the
substance to build up from 0 to the trigger level. There's a little
noise in the system, so the intervals range from 4.9 to 5.1 seconds. We
can add a little noise to the model so it does the same.
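A sketch of that timer model (exactly where the noise enters is my
assumption; here it jitters the trigger level by about two percent,
which gives intervals of roughly 4.9 to 5.1 seconds):

    import random

    DT = 0.01        # simulation step, seconds
    BUILDUP = 0.2    # substance build-up per second (0 to 1 in 5 s)

    def simulate(duration=200.0):
        # The substance builds up until it hits the trigger level;
        # then a press occurs and the substance resets to zero.
        times, level, t = [], 0.0, 0.0
        trigger = random.uniform(0.98, 1.02)   # noisy trigger level
        while t < duration:
            level += BUILDUP * DT
            if level >= trigger:
                times.append(t)
                level = 0.0
                trigger = random.uniform(0.98, 1.02)
            t += DT
        return times

    times = simulate()
    intervals = [b - a for a, b in zip(times, times[1:])]
    print(min(intervals), max(intervals))   # roughly 4.9 to 5.1 s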
How does knowing about this model affect the response probability? It
has no effect at all on our computation of the chance of seeing a press
within a time t if we look at the cage at some randomly-selected time.
It is still 0.2t. But now what about the idea of using "Pr{press in time
t} = 0.2t" as a model of the behaving system?
If the timer model is correct, the probabilistic model is no longer
correct. The timer model tells us that if we look at the behavior at any
time up to 4.9 seconds after the previous press, the probability of a
press per unit time is 0. If we look at times ranging upward from 4.9
seconds after the last press, the probability rises to a peak at 5.0
seconds and falls back to zero at 5.1 seconds, with the area under the
curve being 1.0. The probabilistic model above tells us that there is an
equal probability-density of the rat's producing a press at any time.
Both models will predict the same mean behavior rate. But they will
predict very different distributions of the probability of producing a
press. In the model that proposes a systematic process containing some
small amount of noise, the distribution is peaked sharply; in the other
model, the distribution is very broad.
Notice that if we don't know the right criterion for when to sample the
behavior, both models give the same probability of _observing_ a press
in an interval t, with the same broad distribution. Only if we
synchronize our observations to the correct event, the previous press,
will we discover the regular law and the narrow distribution. And only
then will we see the need for a model of a _regular_ process to
represent the internal workings of the rat. Of course there may be many
possible regular models that would represent this simple regular
behavior, but the point is that we know that a statistical model would
not be appropriate.
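A quick way to see the difference between the two distributions is to
simulate the intervals directly (a sketch; the exponential form for
the probabilistic model follows from its constant probability
density):

    import random

    def timer_intervals(n=10000):
        # Systematic model: 5-second timer with a little noise.
        return [random.uniform(4.9, 5.1) for _ in range(n)]

    def random_intervals(n=10000, rate=0.2):
        # Probabilistic model: a constant 0.2/s probability density
        # makes the intervals exponential with a mean of 5 seconds.
        return [random.expovariate(rate) for _ in range(n)]

    for name, data in (("timer", timer_intervals()),
                       ("probabilistic", random_intervals())):
        mean = sum(data) / len(data)
        sd = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
        print("%-14s mean %.2f s, spread %.2f s" % (name, mean, sd))

Both means come out near 5 seconds, but the timer model's spread is
about 0.06 second while the probabilistic model's is about 5 seconds.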
----------------------------------
Let's apply these thoughts to the E. coli model (just the simple model,
without learning). According to the probabilistic model, a tumble occurs
every time the expression
(dNut > 0 AND random < p) OR (dNut <= 0 AND random < (1.0 - p))
is true, where p is the probability density of a tumble when dNut > 0.
The simplest systematic model says that a tumble will occur when
dNut < 0
is true -- but this model is actually not the best systematic model.
By measuring the behavior of the actual E. coli, and knowing the
concentration gradient everywhere, we can measure the actual
relationship between tumble delay and dNut. When the actual
concentration gradient is zero, there is some baseline interval between
tumbles regardless of swimming direction. The systematic model we need
says (in one possible form) that
delay := k*(dNut' - dNut) + delta
where dNut' is a constant.
k*dNut' is the mean delay when dNut is zero, k*dNut is the amount by
which the delay changes as dNut departs from zero, and delta is a
representation of statistical deviations from this relationship. Rick
Marken used a formula like this to match model behavior against the
behavior of people playing the E. coli game.
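The two tumble rules might be written side by side like this (a
sketch; the values of p, k, dNut', and the noise in delta are
placeholders to be fit to data, not values from the text):

    import random

    P = 0.05        # tumble probability per sample when dNut > 0
    K = 10.0        # gain relating dNut to delay (placeholder)
    DNUT_REF = 0.5  # dNut': with K = 10 the baseline delay is 5 s

    def tumble_probabilistic(dNut, p=P):
        # Probabilistic model: tumble with probability p when the
        # nutrient change is positive, 1 - p otherwise.
        if dNut > 0:
            return random.random() < p
        return random.random() < 1.0 - p

    def tumble_delay_systematic(dNut, k=K, dnut_ref=DNUT_REF):
        # Systematic model: delay := k*(dNut' - dNut) + delta, where
        # delta stands for statistical deviation from the rule.
        delta = random.gauss(0.0, 0.1)
        return max(0.0, k * (dnut_ref - dNut) + delta)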
The probabilistic model has to be changed to take into account that when
the gradient is zero, there is still a finite probability of a tumble.
The mean tumble probabilities per unit time are (I think)
pr{tumble | (dNut > dNut')} * pr{(dNut > dNut') | (random < p)}
and
pr{tumble | (dNut <= dNut')} * pr{(dNut <= dNut') | (random < (1.0 - p))}
Did I do that right?
We now have two models that can be fit to the same data. We will choose
between them on the basis of goodness of fit. I predict that the worst
fit will come from the probabilistic model, because it lacks any way to
express the _degree_ to which dNut differs from dNut', and thus lacks
any way to evaluate k. I'm assuming that for each model we could choose
(or measure) the distribution of the random component for best fit.
Whether or not we actually bother to do this with people playing E.
coli, I think this is the way to settle the question of how best to
model E. coli-type behavior. The verbal window-dressing we add to the
mathematical models is irrelevant.
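The mechanical part of that comparison could be as simple as this (a
sketch; the sum-of-squared-errors criterion and the grid search are my
choices, not specified above):

    def sse(observed, predicted):
        # Goodness of fit: sum of squared errors between observed and
        # model-predicted values (e.g., binned tumble-delay counts).
        return sum((o - m) ** 2 for o, m in zip(observed, predicted))

    def best_fit(data, simulate, params):
        # Grid-search a model's parameter for the minimum error.
        return min((sse(data, simulate(p)), p) for p in params)

    # best_fit(data, simulate_probabilistic, candidate_ps) versus
    # best_fit(data, simulate_systematic, candidate_ks): the model
    # with the smaller minimum error fits better.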
-----------------------------
Now let's think a minute about operant conditioning, along the same
lines. A probabilistic model will say that a reward increases the
probability of a bar-press per unit time. This casts the bar-press as a
purely stochastic event which has an equal probability of occurring per
unit time (or perhaps some other distribution can be justified). With
each reinforcement, the probability becomes greater. In order to prevent
this probability from simply going to 1 after a sufficient time, it will
be necessary to propose that the probability decays naturally toward
zero. Perhaps intervals without a reinforcement have a negative effect
on the probability.
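A minimal sketch of that probabilistic account (the increment and
decay constants are invented for illustration):

    import random

    DT = 0.1        # time step, seconds
    BOOST = 0.05    # probability increase per reinforcement (invented)
    DECAY = 0.01    # proportional decay rate, per second (invented)

    p_press = 0.02  # current press probability density, per second

    def step(reinforced):
        # Each reinforcement raises the press probability; between
        # reinforcements it decays toward zero, which keeps it from
        # simply going to 1 after enough time.
        global p_press
        if reinforced:
            p_press = min(1.0, p_press + BOOST)
        else:
            p_press = max(0.0, p_press - DECAY * p_press * DT)
        return random.random() < p_press * DT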
The systematic (control) model posits a desired rate of reinforcement
and a comparator which compares the sensed rate against the desired
rate. The resulting error signal is converted in a systematic way to a
rate of pressing of the bar. I do this by letting the error signal
determine the size of a quantum that is added to a counter on every
iteration, and when the counter reaches a fixed trigger level a bar-
press occurs and the counter is reset to zero. So the larger the error
signal, the more rapidly the counter will reach the trigger level and
the sooner the bar-press will occur. Bar-pressing rate is thus
proportional to the error signal, by a gain factor. If we want to model
uncertainty, we can add a random variation somewhere in the output
process.
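In code, that quantum-and-counter output function might look like
this (a sketch; the trigger level and gain values are placeholders):

    TRIGGER = 100.0   # fixed trigger level for the counter

    class Output:
        # Error signal -> bar presses: the error sets the size of the
        # quantum added to a counter on every iteration; when the
        # counter reaches the trigger level a press occurs and the
        # counter resets to zero. Press rate is thus proportional to
        # the error signal, through the gain factor.
        def __init__(self, gain=1.0):
            self.gain = gain
            self.counter = 0.0

        def step(self, error):
            self.counter += self.gain * max(0.0, error)
            if self.counter >= TRIGGER:
                self.counter = 0.0
                return True    # bar press on this iteration
            return False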
In the control model, the perceived rate of reinforcement is implemented
by a leaky integrator. Each reinforcement contributes an impulse input
to the integrator, of a size dependent on the reinforcement size, a
parameter of the model. The output of the integrator decays toward zero
at an adjustable rate (another parameter). One result of this is that
for certain decay rates and reinforcement sizes, each reinforcement
causes the perceived reinforcement rate to rise above the reference
level, driving the error signal to zero (negative values are set to
zero). There is thus a pause in the incrementing of the timer until the
perceptual signal falls below the reference signal, and counting
resumes. For long time constants of decay in the perceptual function,
this can lead to slowdowns or complete pauses in the behavior rate right
after each reinforcement. From this we get the scalloping or pause
effect that is seen in real behavior. By examining detailed records of
rat behavior, we should be able to deduce the decay rate of the
perceptual function, the effect of one reinforcer on the output of the
leaky integrator, and the gain of the output function in converting
error to rate of behavior. The reference level for reinforcement rate
should also become apparent from fitting the model to the data.
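The perceptual side, sketched the same way (impulse size, decay rate,
and reference level are the adjustable parameters named above; the
commented loop at the end shows how the pieces connect):

    class Perception:
        # Leaky integrator: each reinforcement adds an impulse whose
        # size depends on reinforcement size; the output decays toward
        # zero at an adjustable rate and serves as the perceived rate
        # of reinforcement.
        def __init__(self, impulse=1.0, decay=0.05):
            self.impulse = impulse
            self.decay = decay
            self.signal = 0.0

        def step(self, reinforced):
            self.signal -= self.decay * self.signal
            if reinforced:
                self.signal += self.impulse
            return self.signal

    # One iteration of the whole loop (Output is the counter sketch
    # above; reference is the desired reinforcement rate):
    #   perceived = perception.step(reinforced)
    #   error = reference - perceived    # negative error -> no output
    #   pressed = output.step(error)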
I don't know how Bruce Abbott proposes to fit the probabilistic model to
the data, but at the end we will have two models and two fits to the
data. This gives us one basis for choosing between the models.
We can then change the schedule and see how well each model fits the new
data without any change in parameters. Presumably, they will both work
over some range. The preferable model will be the one that fits the data
the best over the widest range.
When the schedule is changed enough, both models will probably begin
predicting incorrectly. The question then is what modifications have to
be added to achieve a good fit over the extended range. This is done by
first determining how much change there must be in which parameters to
restore the former degree of fit. Then, from examining the way the
parameters have to change, we can propose a mechanism for changing them,
as additions to the models: a statistical mechanism, or a systematic
one.
Finally, we change the type of schedule, starting with fixed-ratio, then
fixed-interval, then variable-ratio, and so forth.
As I see it, the point in all of this will be to find models that fit
asymptotic behavior, not the process of acquisition. Once a model has
been developed that fits all schedules over the whole range, we can
start trying to model the process of acquisition of the behaviors -- the
way the parameters change on the way to the asymptotic condition.
This seems to me a sensible program that is aimed at comparing two basic
approaches to modeling operant conditioning, with (we can hope) outcomes
that make the choices between models clear at each stage. It may be that
the control model will do better at predicting some aspects of behavior
and the statistical model better at predicting others. It doesn't really
matter: whatever happens, we will have more confidence in our answers
than we can have now.
-----------------------------------------------------------------------
Best to all,
Bill P.