[From Bill Powers (950623.0730 MDT)]
Martin Taylor (950622.1530) --
Each perceptual function you have ever described is at least as
ponderous as one component of a Fourier Transform. Most are far more
so, as they involve more than the linear weighted sum of a set of input
values.
So, what I am saying is not that we do or that we do not use localized
Fourier transforms in our neural system, but that they or related
transforms are so easy and natural for neurons to do that we have to be
careful about seeing them where they don't exist.
What doesn't strike me as easy and natural is placing sampling points
along a shift register, weighting them by the sine and cosine of an
oscillation at a specific frequency, and gathering all those weighted
signals into a summation device, then doing the same for a sufficient
set of other frequencies, and so forth. This is what the mathematical
treatment demands, but we have the advantage of knowing what the final
arrangement has to be to get a Fourier Transform. If learning or
evolution is to come up with such a result, then _each step_ from simple
random connections to this final highly systematic connection scheme
must somehow yield a decrease in error, or an increase in fitness -- in
a system that has no blueprint guiding it toward any particular
organization. Or perhaps to put that better, there must be some path
consisting of small reversible changes of organization that leads from
the initial organization to the final one, with a reasonably monotonic
positive slope in some measure of benefit to the organism.
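To make concrete how much systematic wiring the mathematics demands,
here is a minimal sketch in Python of a single frequency bin computed
over a tapped delay line (the names and parameters are mine, purely for
illustration); every additional frequency requires its own duplicate of
this machinery with a different set of weights:

    import numpy as np

    def fourier_component(shift_register, freq, sample_rate):
        # One frequency bin of a discrete Fourier transform, computed
        # as two weighted sums over a tapped delay line: cosine taps
        # feeding one summation device, sine taps feeding another.
        t = np.arange(len(shift_register)) / sample_rate
        cos_taps = np.cos(2 * np.pi * freq * t)   # in-phase weights
        sin_taps = np.sin(2 * np.pi * freq * t)   # quadrature weights
        re = np.dot(shift_register, cos_taps)
        im = np.dot(shift_register, sin_taps)
        return np.hypot(re, im)       # amplitude at this frequency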
I'm reminded of early attempts to produce a self-programming machine-
language program. These attempts failed because there was no such path
from an initial to a final program. At any point, the wrong change,
however small, could create an irreversible disaster. I'm reminded also
of genetic algorithm methods in which the evolving system is rewarded
for a move in the right direction, without that move itself having to
confer any benefit. SOMEBODY has to know what the right direction is.
Any model involving complex mathematical operations has this problem,
especially a model in which any slight change in the basic organization
would render it unworkable. The more complex the mathematics, the harder
it becomes (on a steeply-accelerating curve) to understand how the
circuitry required to embody the mathematical operations could develop.
PCT certainly has this problem, too, which is one reason we have no
models for higher perceptual functions. Even in our models of simple
tracking, we have to draw a box standing for the perceptual input
function and say that the inputs consist of visual information and the
output is a signal representing cursor position relative to target
position. How that information is derived from differential retinal
illumination is still a mystery to me. I would happily look up the
information if you could cite an authoritative reference!
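For concreteness, here is the simple tracking loop as we typically
simulate it (a Python sketch; the gain and time step are illustrative).
Notice how little of the perceptual problem the model actually solves:

    def run_tracking(disturbances, dt=1.0/60.0, gain=8.0, ref=0.0):
        # One-level tracking loop. The perceptual input function is
        # the single subtraction below; everything that turns
        # differential retinal illumination into 'cursor' and 'target'
        # is left out, just as the box in the diagram leaves it out.
        out = 0.0
        target = 0.0
        history = []
        for d in disturbances:
            cursor = out + d        # environment: output plus disturbance
            p = cursor - target     # the entire contents of the box
            err = ref - p
            out += gain * err * dt  # integrating output function
            history.append(p)
        return history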
···
-----------------------------------------------------------------------
Peter Burke (950622.1530 PDT) --
I have always wondered if it would not be possible to use such
"jitter" in the output of a system to "test" the environment for
the proper sign to put on the output by tracking the effects of
these small changes (jitters) on the movement of the perceptual
signal closer to or further from the reference signal.
Norbert Wiener (1948) illustrated such a system in _Cybernetics_, p.
134. The assumption is that a variable test signal can pass through the
output function and its effects can be detected in the environment, to
allow an auxiliary system to determine some measures of performance
which then could be optimized by adjusting system parameters. Presumably
the effects of the high-frequency oscillator he used would be outside
the bandwidth of control, or else would be removed by a narrow-band
notch filter from the main feedback path. This idea was probably not
worked out in much detail in 1948.
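As a sketch of the general idea (my own construction in Python, not
Wiener's circuit; the gains and the dither size are arbitrary): inject a
small alternating test signal into the output, correlate it with the
resulting changes in error, and adopt whichever output sign the
correlation favors:

    def jitter_sign_probe(env_sign=-1.0, steps=400, ref=10.0):
        # env_sign is the unknown sign of the environmental feedback link.
        p = 0.0          # perceptual signal
        out_sign = 1.0   # controller's current guess at its output sign
        corr = 0.0       # leaky correlation of dither with error change
        for t in range(steps):
            err = ref - p
            dither = 0.1 if t % 2 == 0 else -0.1  # high-frequency jitter
            out = out_sign * 0.2 * err + dither
            new_p = p + env_sign * out    # effect through the environment
            corr = 0.95 * corr + dither * ((ref - new_p) - err)
            out_sign = 1.0 if corr < 0.0 else -1.0  # sign that shrinks error
            p = new_p
        return out_sign, ref - p    # inferred sign, residual error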
I think we commonly use strategies (at a higher level) for determining
the required sign of the control action. When you come up to a closed
door, you may not immediately be able to see which way it opens. So you
turn the knob, give a tentative push, and if that doesn't work you give
a tentative pull. This tells you which sign of output to use for making
the door be open. This method isn't infallible, though. Long ago, car
manufacturers used reverse threading for the wheel nuts on the left side
of the car, so if you assumed the standard relationship of
"counterclockwise loosens," you could twist off the studs trying to
loosen the nuts on the wrong side of the car. Tentative
actions wouldn't get you anywhere, of course. My Subaru uses the
standard convention of "counterclockwise unlocks" for the door lock on
the driver's side -- but the opposite convention on the passenger side.
Evidently the rule is "top of key toward front of car unlocks." But I do
manage to get the doors unlocked eventually, even the hatchback where
the rule is -- guess which.
So there are really two approaches: systematic and trial-and-error. If
you're smart enough, you can learn a systematic approach and formalize
it from watching the outcomes of the trial-and-error approach. The
higher-level system then chooses the sign for the lower-level control
process, maybe reversing the sign of the connection between error signal
and output function.
There is probably some point in the animal kingdom where the systematic
approach doesn't develop because the basic brain equipment needed for
developing the required higher-level system isn't there. Many dogs lack
the ability to solve the simple problem of how to get the leash unwound
from around a tree. Each time it happens they use trial and error, where
the probability of success drops sharply as a function of the number of
turns of the leash around the tree. After the 50th time, you'd think
they would have worked out a systematic approach, but NO. Yap, Yap, Yap,
until the bleary-eyed master in a bathrobe stumbles out through the snow
to solve the problem.
Well, come to think of it, I guess that _is_ a systematic approach.
Problem-solving algorithms have to be learned, so I don't count them as
reorganization. You still have to account for how they were learned. The
particular algorithm that is learned depends on the particular
properties of the local environment, so evolution can't help. The only
truly all-purpose method is random trial and error.
-----------------------------------------------------------------------
Bruce Abbott (950622.1740 EST) --
Welcome back.
What Rick is trying to do (beneath all the din) is find a situation that
will distinguish between the reinforcement model and the control model.
I don't think he's done it yet -- it's necessary to equalize the
probabilities that a behavior will be rewarding or punishing, without
also making control impossible.
I was able to modify the properties of the tumbling in the E. coli model
so that no matter what direction E. coli is traveling, the probability
of a tumble improving the rate of change of nutrient is equal to the
probability of making it less favorable. This drastically reduces the
success rate of your basic Ecoli4 model. I say drastically reduces
rather than eliminates because by chance it is possible that the
probabilities will depart from equality in the direction that produces
some control, so the goal may be reached eventually. In such cases we
find that the four outcome frequencies were in fact not equal. On
many trials, of course, the spot disappears from the screen and never
(i.e., within five or ten minutes) comes back. With this setup, a human
operator can still get to the goal every time, but the model can't.
All this shows is that in this special environment, the control model
works where the reinforcement model, as given, doesn't. But I can't show
that NO definition of a reinforcer and discriminative stimulus would
work.
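To show the shape of the modification and why a control model survives
it, here is a sketch (my reconstruction in Python, not the actual Ecoli4
code; the geometry, the tumble rule, and all parameters are
assumptions):

    import math, random

    def equalized_tumble(theta):
        # Pick a new angle between heading and goal direction so that
        # improving and worsening the rate of change of nutrient are
        # equally likely whatever the current angle may be.
        if random.random() < 0.5:
            return random.uniform(0.0, theta)   # improves the rate
        return random.uniform(theta, math.pi)   # worsens it

    def run_control_model(steps=5000, speed=1.0):
        # Minimal PCT-style E. coli: perceive the rate of nutrient
        # change (here proportional to cos(theta)) and tumble sooner
        # when it falls below the reference. Each tumble's outcome is
        # unbiased, but good directions are kept longer, so control
        # still succeeds.
        x = 100.0                        # distance to the goal
        theta = random.uniform(0.0, math.pi)
        ref_rate = 0.9                   # reference for the rate
        for _ in range(steps):
            rate = math.cos(theta)
            x -= speed * rate
            if x <= 0.0:
                return True              # goal reached
            err = ref_rate - rate
            if random.random() < min(1.0, max(0.0, 0.2 * err)):
                theta = equalized_tumble(theta)
        return False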
The only final answer will come from comparing the details of real
behavior with the details of what the model predicts. For example, when
you vary the delay between tumbles by using a probability calculation,
the relationship between delay and angle of travel will have a rather
large random component; there should be a calculable proportion of long
delays, for example, while traveling away from the goal. In a model that
uses a systematic output function, this would not happen. If the
probabilities reached 0 and 1, however, this difference would disappear.
Other possibly observable differences would be in the apparent reference
setting, and in the slope connecting error to delay.
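That difference can be made concrete (a sketch; both functional forms
are illustrative, not proposed models):

    import random

    def probabilistic_delay(p_tumble):
        # Steps until the next tumble under a per-step probability:
        # geometrically distributed, so there is a calculable tail of
        # long delays at every error level, even while heading away
        # from the goal.
        t = 1
        while random.random() > p_tumble:
            t += 1
        return t

    def systematic_delay(error, k=20.0):
        # A deterministic output function: exactly one delay for each
        # error level, with no random tail at all.
        return max(1, round(k / (1.0 + max(0.0, error))))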
I don't think, however, that we really want to spend a lot more time on
E. coli. If it takes this much work to distinguish between two theories,
the results will never be clear-cut. And we're not really comparing the
two models correctly anyway, because the control model is a performance
model, not a learning model. To test the reinforcement theory of
learning, we would have to propose _other learning models_ and test them
against the reinforcement model. We should be comparing the
reinforcement model with a Kalman Filter model, or a random
reorganization model, or some such model. With E. coli that would get
very confusing, because the behavior that is to be learned also uses the
E. coli method.
-------------------------------------------
So far, we've been playing the game using the conventional rules:
constant reference signal, no disturbance. I think this is really the
reason for the difficulty in distinguishing the two models. Even just
introducing disturbances of the controlled variable should reveal some
major differences. A change of reference signal would also cause
problems for reinforcement theory: a sudden change in the definition of
the reinforcer, for no externally-explainable reason (is this a hint
about the data Rick presented?).
We could make such a comparison in the rat experiments that will be
coming up pretty soon now, I hope. If we start with a simple fixed ratio
schedule, we could develop a control model and a reinforcement model
that would fit the behavior over a range of schedules.
Once the behavior is well-learned, we would be talking about a
performance model (in the PCT case at least). According to the PCT
model, slowly but unpredictably changing the ratio should lead to a
change in behavior rate in a specific relationship to the varying ratio,
with the obtained reinforcement rate varying considerably less than the
behavior rate does (and in the _opposite_ direction). In fact, from the
model fitted to the data over the range of schedules, we would be able
to predict how much the reinforcement rate and behavior rate would
change as the ratio is varied. For example, we could start with a
baseline ratio of 6, and vary it slowly and randomly in the range from 1
to 11 (five ratios higher than 6, and five lower). We would then compare
the behavior of a simulated control model with fixed parameters with the
behavior of the real rat (we would fit the control model to each rat,
not to the average over rats). Of course we would do the same using the
reinforcement model you develop.
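As a sketch of the kind of prediction involved (Python, with made-up
parameters, not values fitted to any rat): a proportional control model
on a ratio schedule of size m gives r = b/m and b = gain*(ref - r),
which can be solved for the steady state:

    def steady_state(m, ref=1.0, gain=50.0):
        # Steady-state behavior rate b and reinforcement rate r for a
        # proportional control model on a fixed-ratio schedule of
        # size m: r = b/m and b = gain*(ref - r).
        b = gain * m * ref / (m + gain)
        return b, b / m

    for m in (1, 6, 11):
        b, r = steady_state(m)
        print("ratio %2d: behavior %5.2f, reinforcement %4.2f" % (m, b, r))

With these illustrative numbers, the behavior rate rises from about 0.98
to 9.0 as the ratio goes from 1 to 11, while the obtained reinforcement
rate falls only from 0.98 to 0.82: a large change in behavior, and a
much smaller, opposite change in reinforcement, which is the signature
the control model predicts.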
Another way of introducing disturbances would be to add and subtract
reinforcements arbitrarily. This, however, would be difficult to
distinguish from varying the ratio.
With respect to my proposal of yesterday, by keeping track of when the
rat is actually at the bar and when it is elsewhere, we might be able to
test the idea that the reinforcement model applies most naturally to the
process of acquiring the right kind of behavior rather than the right
amount of a single kind of behavior.
------------------------------
As to John Powers, I don't know much about my own family's history. As I
recall, my Powerses passed through Ohio and Illinois before striking out
to homestead in the Far West (my father spent his youngest years on a
homestead in Christmas Lake Valley, Eastern Oregon). It's possible that
there's a connection. If my ancestor didn't finish paying for that deed
from your ancestor, I may owe you some money. Sounds like your ancestor
settled down while mine moved on: typical Powers restlessness.
-----------------------------------------------------------------------
Avery Andrews (950623) --
I haven't been able to follow the model-based control discussion too
closely, but I certainly don't grasp why there is an argument: a
`world-model' *is* a kind of perception, even if rather heavily
processed; surely the only significant questions are (a) what kinds of
processing are required to get particular jobs done (which Hans would
surely know a lot more about in his field than we would) and (b) what
kinds are in fact employed by living systems.
The world model is really an imagination connection: the model is
controlled INSTEAD OF the real-time perception. The real-time perception
is used only to update the model. The basic problem is that unless the
world model contains accurate models of all possible disturbances, this
sort of model can't resist disturbances of the controlled variable.
There are other problems, too, not worth raking up again.
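A bare-bones illustration of the disturbance problem (a Python sketch
with arbitrary numbers; the perceptual updating of the model is
deliberately left out so that the failure mode stands alone):

    def run(control_the_model, steps=200, ref=10.0, d=0.1):
        # Control either the internal model or the real-time
        # perception while a steady, unmodeled disturbance pushes on
        # the controlled variable.
        real, model = 0.0, 0.0
        for _ in range(steps):
            p = model if control_the_model else real
            out = 0.5 * (ref - p)
            real += out + d   # disturbance acts on the real variable
            model += out      # the model knows only the output's effect
        return real

    print(run(True))    # controlling the model: drifts to about 30
    print(run(False))   # controlling the perception: holds near 10.2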
------------------------------------
[lizards] have some kind of perceptual or cognitive ability such that
when they fall into a standard pitfall trap once, they learn something
that enables them not to fall into one again.
Any learning model is going to have a parameter determining how much
learning occurs on a given trial. "Accounting" for this behavior is just
a matter of making the parameter large enough. I don't think that this
really answers your question.
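In any delta-rule-style update, for instance (an illustrative form, not
a model of the lizard), the whole issue reduces to one number:

    def update(strength, outcome, alpha):
        # With alpha near 1.0, a single trial does all the learning;
        # 'accounting' for the lizard just means setting alpha large.
        return strength + alpha * (outcome - strength)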
One thing we can be sure of: an evolutionary model won't work. The same
lizard that falls into the trap is the lizard that has to do the
learning. This won't work if the lizard falls into the trap and is
eaten.
-----------------------------------------------------------------------
Best,
Bill P.