[Martin Taylor 921218 18:30]
(Tom Bourbon 921218 14:38)
May I repeat and elaborate on an invitation I made about a year
ago? I will send you one of my programs in which two systems (two
people, the two hands of one person, two models, or a person and a
model) interact and produce controlled relationships. ...
I guess I'd better try to describe, as I did a year or so ago, wherein
information theory helps in the understanding of PCT. I didn't succeed
in getting across then, and I'm not sure I'll do any better now. I should
think that the prediction for your proposed system would be no better and
no worse than you would get without it, because you are dealing with a
transparent system of one control level. The understanding you get with
information theory is not at the level of setting the parameters.
If I were to try to develop a model to make predictions in your experiment,
I expect it would look essentially identical to yours, because the key
elements would be the gain and delays in the two interacting loops.
Now consider the interchanges of a week or two ago about planning and
prediction, continued in Bill's post of today to Allan Randall. In those,
the situation is greatly different. The information required from the
lower level for the upper level to maintain control through a hiatus
in sensory acquisition depends greatly on the accuracy of control maintained
at the lower level. Where does that come from, and where does it go?
We come back to the fundamental basis of PCT. Why is it necessary, and is
it sufficient? Let's take two limiting possibilities for how a world might
be. Firstly, consider a predictable world. PCT is not necessary, because
the desired effects can be achieved by executing a prespecified series of
actions. No information need be acquired from the world. From the world's
viewpoint, the organism is to some extent unpredictable, so the organism
supplies information to the world. How much? That depends on the probabilities
of the various plans as "perceived" by the world.
At the other extreme, consider a random world, in which the state at t+delta
is unpredictable from the state at t. PCT is not possible. There is
no set of actions in the world that will change the information at the
sensors.
Now consider a realistic (i.e. chaotic) world. What does that mean? At
time t one looks at the state of the world, and the probabilities of the
various possible states at t+delta are thereby made different from what
they would have been had you not looked at time t. If one makes an action
A at time t, the probability distributions of states at time t+delta are
different from what they would have been if action A had not occurred, and
moreover, that difference is reflected in the probabilities of states of
the sensor systems observing the state of the world. Action A can inform
the sensors. PCT is possible.
In a chaotic world, delta matters. If delta is very small, the probability
distribution of states at t+delta is tightly constrained by the state at t.
If delta is very large, the probability distribution of states at t+delta is
unaffected by the state at t (remember, we are dealing with observations and
subjective probabilities, not frequency distributions--none of this works
with frequentist probabilities; not much of anything works with frequentist
probabilities!). Information is lost as time goes by, at a rate that can
be described, depending on the kinds of observations and the aspect of the
world that is observed.
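
The loss of information with increasing delta can be illustrated with a
small numerical sketch. It is only an illustration under stated
assumptions: the logistic map stands in for "a chaotic world", the ensemble
of states stands in for the observer's subjective uncertainty after an
observation that pinned the state down to a narrow interval, and every
name and parameter here (x0, width, the 32 bins) is hypothetical rather
than taken from the discussion.

```python
import math

def logistic(x, r=4.0):
    """One step of the logistic map, a standard chaotic system at r = 4."""
    return r * x * (1.0 - x)

def entropy_bits(samples, bins=32):
    """Shannon entropy (bits) of a histogram of samples lying in [0, 1]."""
    counts = [0] * bins
    for s in samples:
        counts[min(int(s * bins), bins - 1)] += 1
    n = len(samples)
    return sum((c / n) * math.log2(n / c) for c in counts if c)

def uncertainty_after(delta, x0=0.3, width=1e-6, n=2000):
    """Uncertainty about the state at t+delta, given an observation at t
    that located the state within `width` of x0."""
    states = [x0 + width * i / (n - 1) for i in range(n)]
    for _ in range(delta):
        states = [logistic(x) for x in states]
    return entropy_bits(states)

# Small delta: the observation still constrains the state tightly.
# Large delta: the observation tells us almost nothing any more.
for delta in (0, 5, 10, 15, 20, 30):
    print(delta, round(uncertainty_after(delta), 2))
```

With these assumed settings the uncertainty starts at zero bits (at the
resolution of the bins) and climbs toward the value for a state that was
never observed at all. How fast it climbs depends on the system and on
the precision of the observation, which is the "rate that can be
described" mentioned above.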
The central theme of PCT is that a perception in an ECS should be maintained
as close as possible to a reference value. In other words, the information
provided by the perception, given knowledge of the reference, should be as
low as possible. But in the chaotic world, simple observation of the CEV
provides a steady stream of information. The Actions must provide the same
information to the world, so that the perception no longer provides any
more information. Naturally that is impossible in detail, and the error
does not stay uniformly zero. It conveys some of the information inherent
in the chaotic nature of the world, though less than it would if the Actions
did not occur. The Action bandwidth determines the rate at which information
can be supplied to the world. The nature of the physical aspect of the world
being affected, together with the delta t between Action and sensing the
affected CEV, determines the information that will be given to the sensors
(the unpredicted disturbances, in other words). The bandwidth of the sensory
systems determines how much information can be provided through the
perceptual signal. Any one of these parts of the loop can limit the success
of control, as measured by the information contained in the error signal.
So far, the matter is straightforward and non-controversial, I think.
Think of a set of orbits diverging in a phase space. The information
given by an initial observation is represented by a small region of
phase space as compared to the whole space. After a little while, the
set of orbits represented by the initial uncertainty has diverged, so the
uncertainty has increased. Control is to maintain the small size, which
means to supply information to the world.
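
A rough numerical sketch of this point follows. It is not a model from
the discussion: it assumes a single ECS with reference zero, an
integrating output function, and a random-walk disturbance as the
"information supplied by the world". The gain, step size, and seed are
arbitrary, and the information in the error signal is summarised crudely
by treating the error as Gaussian, so that halving its standard deviation
counts as one bit.

```python
import math
import random

def run_loop(gain, steps=5000, dt=0.01, seed=1):
    """Simulate one control loop with reference 0 and return the variance
    of its error signal. gain = 0 means the Actions do not occur."""
    rng = random.Random(seed)
    disturbance = 0.0
    output = 0.0
    errors = []
    for _ in range(steps):
        # The world keeps supplying information: a random-walk disturbance.
        disturbance += rng.gauss(0.0, 1.0) * math.sqrt(dt)
        perception = output + disturbance
        error = 0.0 - perception
        errors.append(error)
        # Integrating output function: Action opposes the error.
        output += gain * error * dt
    mean = sum(errors) / len(errors)
    return sum((e - mean) ** 2 for e in errors) / len(errors)

open_loop = run_loop(gain=0.0)     # observation only, no Actions
closed_loop = run_loop(gain=20.0)  # Actions oppose the disturbance

# Under the Gaussian assumption, the reduction in error information is
# half the log of the variance ratio.
bits_saved = 0.5 * math.log2(open_loop / closed_loop)
print(open_loop, closed_loop, bits_saved)
```

The error never reaches zero with control on, but it carries less of the
world's unpredictability than the uncontrolled perception does.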
Things become more interesting when we go up a level in the hierarchy.
Now we have to consider the source of information as being the error
signals of the lower ECSs, given that the higher level has no direct
sensory access to the world, and that all lower ECSs are actually controlling
(both restrictions will be lifted later, especially the latter). Even
though the higher ECSs may well take as sensory input the perceptual
signals of the lower ECSs, nevertheless the information content
(unpredictability) of those perceptual signals is that of the error, since
the higher ECSs have information about their Actions (the references supplied
to the lower ECSs) just as the lower ones have information about their
Actions in the world. The higher ECSs see a more stable world than do
the lower ones, if the world allows control. (Unexpected events provide
moments of high information content, but they can't happen often, or we
are back in the uncontrollable world.)
What does this mean? Firstly, the higher ECSs can do without high speed,
high precision, or both. The lower ECSs can take care of things at
high information rates, leaving to the higher ECSs precisely those things
that are not predicted by them--complexities of the world, and specifically
things of a KIND that they do not incorporate in their predictions. In
other words, the information argument does not specify what Bill's eleven
levels are, but it does make it clear why there should BE levels of the
hierarchy that have quite different characteristics in their perceptual
input functions.
It is that kind of thing that I refer to as "understanding" PCT, not the
making of predictions for simple linear phenomena. Linear models are
fine when you have found the right ECS connections and have plugged in
model parameters. I am talking about seeing why those models are as they
are. Look, for example, at the attention and alerting discussions, which
come absolutely straight from the Shannon theory. But the results of the
(almost) a priori argument agree with the (despised) results of experiments
in reading that I discussed in our 1983 Psychology of Reading. Had I known
about PCT then, I could have made a much stronger case than I did, but only
because of Shannon. The whole notion of Layered Protocols in intelligent
dialogue depends on Shannon, and demonstrates the impossibility of simple
coding schemes (which some people have claimed as the basis of Shannon
information).
I invite you to add to the PCT model any features of information
theory that you believe must be there. If necessary, use features
from information theory to replace those from PCT.
Have I done that to your satisfaction?
If your changes
improve the predictions by the model, there will be no argument and
no complaint: You will have demonstrated that a person who does
not understand Shannon or information theory does not understand
PCT.
An electricity meter reader does not need to understand the principles
of electromagnetism to get an accurate meter reading. This challenge
is misdirected. If there are places where I think the prediction would
be improved, they are likely to be structural, such as in the division
of attention, monitoring behaviour or some such. What should be improved,
in general, is understanding, not meter reading.
I said that if you don't understand Shannon, you won't understand PCT. I
didn't say you won't be able to use PCT to make predictions.
···
================
Having said all that, I might soften a bit, and take the analogy of the
use of the mathematically ideal observer in psychophysics. The ideal
observer is assumed to take whatever information is in principle available
in the signal, and to use it to determine whether or not some specified
class of event has occurred. Trained psychoacoustic listeners often perform
rather like an ideal observer who is presented with a signal some 3 or 4 dB
weaker than the actual one. We say they are within 3 or 4 dB of ideal.
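
To make "within 3 or 4 dB of ideal" concrete, here is a small sketch. It
assumes the textbook case of a signal known exactly in white Gaussian
noise, for which the ideal observer's sensitivity is d' = sqrt(2E/N0),
and a two-alternative forced-choice task. The dB figure is taken to refer
to signal energy. The function names and the 10 dB example level are
hypothetical.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ideal_dprime(e_over_n0_db):
    """d' of the ideal observer for a signal known exactly in white
    Gaussian noise: d' = sqrt(2E/N0), with E/N0 given in dB."""
    return math.sqrt(2.0 * 10.0 ** (e_over_n0_db / 10.0))

def listener_dprime(e_over_n0_db, loss_db):
    """A listener 'within loss_db of ideal' is modelled as the ideal
    observer presented with a signal loss_db weaker."""
    return ideal_dprime(e_over_n0_db - loss_db)

def pc_2afc(dprime):
    """Proportion correct in two-alternative forced choice."""
    return phi(dprime / math.sqrt(2.0))

level_db = 10.0  # assumed example value of E/N0
for loss_db in (0.0, 3.0, 4.0):
    d = listener_dprime(level_db, loss_db)
    print(loss_db, round(d, 2), round(pc_2afc(d), 3))
```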
Now we take the observer and add or eliminate possibilities for getting
information about the event. For example, we may let the observer know
what the waveform of the event would be if the event actually occurred,
by presenting it to the other ear. They now approach the performance of
an ideal observer who knows the waveform. Can they do this if the "cue"
is delayed? It depends how much delay, and by looking at the performance
over a variety of delays, we can tell something about what information
the real observers are losing. Is it phase, frequency, or amplitude?
Change the cue and do some more variation, and determine what the ideal
observer might be capable of doing if it lacked this or that kind of
information.
By analogy, it might be possible to make ideal controller predictions,
given different kinds of disturbance or sensory prediction aids. An
example comes to mind (not of an ideal controller). A submarine reacts
very slowly to changes in its control surfaces, but in stable water there
are reasonably simple algorithms to determine where it will be if nothing
changes in the control surfaces over the next few minutes (the chaotic world
doesn't provide much information under these circumstances). So it is
possible to make a display that shows a line indicating where the submarine
will go if the steerer does nothing. If that's where it should go, fine.
Otherwise the steerer moves the controls until the line goes where the
submarine should go. But there are currents and so forth (the world provides
information), so the submarine does not go where the line said it would.
However, the line still predicts where it would NOW go if nothing happens,
so the helm can still control that future position to some extent. How
far should the line go? That depends on the information rate of the world.
If the currents are swirling and unpredictable, probably it should not go
very far, but if they are steady, they provide little information, and the
line can compensate.
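
The question of how far the line should go can be sketched numerically.
This is a toy, not a submarine model: it assumes a one-dimensional vessel
whose speed lags its control setting with time constant tau, a display
line computed by holding the control fixed and assuming the current stays
at its steady value, and a "swirling" current modelled as a random walk.
All names and numbers are hypothetical.

```python
import math
import random

def track(y, v, u, steps, currents, dt=1.0, tau=30.0):
    """Positions of a sluggish vessel with control u held fixed.
    currents[i] is the water velocity acting during step i."""
    path = []
    for i in range(steps):
        v += (u - v) / tau * dt        # speed lags the control setting
        y += (v + currents[i]) * dt
        path.append(y)
    return path

def rms_line_error(horizon, swirl, trials=200, seed=0):
    """RMS gap between the end of the predicted line and where the vessel
    actually ends up when the current wanders by `swirl` per step."""
    rng = random.Random(seed)
    steady = 0.2
    # The display line assumes the current stays steady over the horizon.
    line_end = track(0.0, 0.0, 1.0, horizon, [steady] * horizon)[-1]
    total = 0.0
    for _ in range(trials):
        c, currents = steady, []
        for _ in range(horizon):
            c += rng.gauss(0.0, swirl)  # unpredictable change in current
            currents.append(c)
        actual_end = track(0.0, 0.0, 1.0, horizon, currents)[-1]
        total += (actual_end - line_end) ** 2
    return math.sqrt(total / trials)

for horizon in (30, 120, 300):
    print(horizon, rms_line_error(horizon, 0.0), rms_line_error(horizon, 0.05))
```

With a steady current the line is exact at any length in this toy. With a
swirling current the gap between the line and the outcome grows quickly
with the horizon, so a long line tells the helm little.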
Too long. I must go home. I hope that this has been helpful.
Martin