Information theory vs control theory

[From Bill Powers (921219.0130)]

Martin Taylor (921218.1830) --

Your remarks about the predictability of the world and PCT are
mostly cogent. This is particularly true if you include the
predictability of the output effectors as well.

I can quibble, however, about a number of points.

In a chaotic world, delta matters. If delta is very small, the
probability distribution of states at t+delta is tightly
constrained by the state at t. If delta is very large, the
probability distribution of states at t+delta is unaffected by
the state at t ...

It matters a heck of a lot more to a plan-then-execute model than
it does to a control model. Remember that control of a variable
depends only on the ability of the system to affect that
variable, directly, in present time. It isn't necessary to
produce an output and wait to see its future effects. If progress
doesn't follow the intended path, correction occurs right away,
after a short delta. So in properly-designed control systems,
delta is always small. Even if the stated goal is far in the
future, the path to the goal can be defined as a reference path,
and control can assure that progress stays on that path. This, in
fact, is the only practical way to control for long-delayed ends.
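
Here is a minimal sketch of that point (my own illustration; the
gain, step size, and disturbance level are made up): a loop that
corrects after each short delta, versus a plan fixed at t = 0 in a
world that changes unpredictably.

```python
# Toy comparison: feedback control corrects after each short delta,
# while a plan-then-execute system lets unpredicted disturbances pile up.
import random

random.seed(1)
ref, gain, dt, T = 10.0, 20.0, 0.01, 10.0   # illustrative values only
steps = int(T / dt)

cv_control = cv_plan = 0.0
plan_output = ref / T        # open-loop plan: constant push, chosen at t = 0

for _ in range(steps):
    d = random.gauss(0.0, 1.0)                          # disturbance this tick
    cv_control += (gain * (ref - cv_control) + d) * dt  # correction after a short delta
    cv_plan += (plan_output + d) * dt                   # no feedback: drift accumulates

print("control error:", abs(ref - cv_control))  # stays near d/gain
print("plan error:   ", abs(ref - cv_plan))     # random walk, grows with time
```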

In circumstances where perceptions are uncertain and there are
long delays in the control loop, control is simply going to be
poor. Predictive control with a large delta is doomed to be
lousy.

The central theme of PCT is that a perception in an ECS should
be maintained as close as possible to a reference value. In
other words, the information provided by the perception, given
knowledge of the reference, should be as low as possible.

I think you'd better take that one back to the drawing board. The
reference in no way predicts the perception by its mere
existence. The best control requires the widest bandwidth in the
system, including its input function, up to the point where noise
begins to become significant. I don't see how this is consistent
with saying that the information provided by the perceptual
signal should be as low as possible.

It may be that given an excellent control system, the state of
the reference signal does predict the perceptual signal well, so
in that case an observer will find knowledge of the actual state
of the perceptual signal redundant. The information provided by
the perceptual signal TO THAT OBSERVER is low. But that would
have absolutely nothing to do with the design of the control
system. That's a calculation by an external observer, not a
property of the system. The only reason the perceptual signal
provides low information to the observer is that it provides a
great deal of information to the comparator.

I still think that your analysis is simply descriptive, and has
nothing to do with the design of control systems. It may apply to
a successful design, but it can't provide a successful design.
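
The observer/system distinction can be put in numbers. A small
simulation (mine; the gain and signal levels are arbitrary) shows
that for a good control system the perceptual signal varies widely
on its own, yet almost all of that variation is already given by
the reference:

```python
# For a well-functioning loop, the spread of p alone is large, but the
# spread of p given r (i.e., of p - r) is tiny: an observer who knows r
# learns little more from p, even though p does all the work in the loop.
import random
from statistics import pstdev

random.seed(2)
gain, dt, p, r = 50.0, 0.01, 0.0, 0.0
ps, rs = [], []
for _ in range(5000):
    r += random.gauss(0.0, 0.05)    # slowly wandering reference
    p += gain * (r - p) * dt        # good control: p tracks r closely
    ps.append(p)
    rs.append(r)

print("spread of p alone:  ", pstdev(ps))                              # large
print("spread of p given r:", pstdev([a - b for a, b in zip(ps, rs)])) # small
# For roughly Gaussian signals, the observer gains only about
# log2(large/small) bits per independent sample from seeing p.
```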

In other words, the information argument does not specify what
Bill's eleven levels are, but it does make it clear why there
should BE levels of the hierarchy that have quite different
characteristics in their perceptual input functions.

It is that kind of thing that I refer to as "understanding"
PCT, not the making of predictions for simple linear phenomena.

Dennis Delprato, here is another addition to your list of myths
about PCT: that we can predict only simple linear phenomena.
Martin, have you looked at the Little Man? It is chock full of
nonlinearities. We have actually tested it using a much more
realistic muscle model -- in fact one that proved to be far more
nonlinear (6th power) than the actual muscle (2nd power) owing to
a misunderstanding of a published model (which tried to include
the limits of limb travel in the same equation!). And the
feedback path from output torques to visualized position of
fingertip, and the method of depth perception for the target, are
highly nonlinear.

We can't solve the nonlinear equations analytically (nobody can),
but that is not a constraint on the simulations. In my
Psychological Review article of long ago, I showed tracking data
with a cubic
relationship between handle position and cursor position that
actually reversed slope in the middle. The model handled it just
the way the real subject did, by skipping across the region of
positive feedback. Rick Marken's experiment with size control
modeled one case in which the controlled variable changed as the
square of handle position. No problem. We have tried all sorts of
nonlinear functions. But there's no point in teaching control
theory using nonlinear equations that nobody can solve.
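
For anyone who wants the principle in miniature, here is a hedged
sketch (my own toy example, not the Little Man code): a loop
controlling a variable that depends on the square of handle
position, as in the size-control case, with no analytic solution
of the nonlinear equation anywhere in sight.

```python
# A control loop handling a nonlinear feedback function, cv = handle**2.
# The loop only needs present-time error; it never solves for sqrt(ref).
def simulate(ref, steps=2000, k=2.0, dt=0.01):
    handle = 1.0                       # start where the local slope is positive
    for _ in range(steps):
        cv = handle ** 2               # nonlinear environment function
        handle += k * (ref - cv) * dt  # output driven by present-time error
    return handle, handle ** 2

for ref in (4.0, 9.0, 2.5):
    h, cv = simulate(ref)
    print(f"ref={ref}: handle={h:.3f}, controlled variable={cv:.3f}")
```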

But the results of the (almost) a priori argument agree with
the (despised) results of experiments in reading that I
discussed in our 1983 Psychology of Reading.

What do you mean, they "agree"? Do you mean that you predicted
the reading performance of every single subject with an error of
five percent or so? Or even 20 percent? Come on, what are you
calling "agreement"?

I said that if you don't understand Shannon, you won't
understand PCT. I didn't say you won't be able to use PCT to
make predictions.

Why am I reminded of that poem about Hiawatha?

And in a control
model, the signals in the various paths normally carry far
less information than the theoretical limits allow.

Dubious. I would like to be able to figure out how to test
that assertion.

It's easy. Most perceptions occur on a scale between 0 and
maximum magnitude, and vary at a rate between 0 and some maximum
cutoff frequency. To accommodate the maximum magnitude and
frequency, the perceptual channel must have a certain information
capacity. As perceptual signals can be controlled at any level
within the whole range and can be varied at any rate up to the
maximum, it follows that unless the perception is being
controlled at maximum magnitude and the reference signal is
changing at the maximum rate that still permits control, the
actual information flow must be much less than the channel
capacity. Most perceptions are not controlled at their extremes;
hence most perceptions must use less than the whole channel
capacity.
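
The arithmetic can be made concrete with the Shannon-Hartley
formula, C = B log2(1 + S/N). The bandwidths and signal-to-noise
ratios below are made-up illustrations, not physiological
measurements:

```python
# Capacity a perceptual channel would need for full-range, full-rate
# signals, versus the rate a mid-range, slowly varying signal uses.
import math

def rate(bandwidth_hz, amplitude_snr):
    # C = B * log2(1 + S/N), with S/N expressed as a power ratio
    return bandwidth_hz * math.log2(1.0 + amplitude_snr ** 2)

capacity = rate(10.0, 20.0)   # full cutoff frequency, full-scale amplitude
typical  = rate(1.0, 5.0)     # slow variation, quarter-scale excursion
print(f"channel capacity ~ {capacity:.0f} bits/s")   # ~86 bits/s
print(f"typical use      ~ {typical:.0f} bits/s")    # ~5 bits/s
```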

But
information theory doesn't tell you there will BE an error
signal.

As I pointed out to Tom, it does.

Show me where Shannon's theory says there must be a comparator, a
reference signal, and a perceptual signal.

Natural laws are no use without
boundary conditions to describe particular situations. But if
you understand the abstract principles, you can make better
[bridges/kettles/radios/control systems].

This brings us right back to Tom's challenge. We have, for
example, a simple tracking model that predicts behavior with an
accuracy of better than 5%, measured as RMS error between model
and real handle behavior divided by peak-to-peak real handle
excursion (a signal-to-noise ratio of 20:1, by a standard measure
in electronics). What can we do to this model, using information
theory, that will make it predict any better?
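
For concreteness, the measure works like this (my rendering of the
calculation, run on hypothetical traces, not on any experimental
data):

```python
# RMS difference between model and real handle, divided by the real
# handle's peak-to-peak excursion; 0.05 corresponds to the 20:1 ratio.
import math

def prediction_accuracy(real, model):
    rms = math.sqrt(sum((r - m) ** 2 for r, m in zip(real, model)) / len(real))
    p2p = max(real) - min(real)
    return rms / p2p

# hypothetical traces, purely for illustration
real  = [math.sin(t / 10.0) for t in range(200)]
model = [x + 0.02 for x in real]        # model off by a small bias
print(f"relative RMS error: {prediction_accuracy(real, model):.3f}")
```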

I think that information theory is by its very nature a post-hoc
description, not a model. You can't start with information theory
and come up with a system design. Or so sez I.

Backing up to pick a sentence I passed over:

The reason is that things in the world can change continuously
and (despite Bill's claim) most perceptions are based on very
poor information--the SNR is very low.

I think you're letting theory triumph over observation. I have
just shown that we can predict behavior with an SNR of 20:1. We
do this in a routine way. The SNR of the perceptual channel
certainly can't be _lower_ than that. Do you consider 20:1 very
low?

I come back to my basic statement about the levels I've defined
in the hierarchy: they refer to the world you observe. The world
you observe does not have a predominance of noise in it. I know
that you have said, "Oh, you're talking about _conscious_
perception, which is a different matter." But I see no reason to
suppose that conscious perception consists of anything but neural
perceptual signals, the same signals that are there when we are
using the same control systems unconsciously, as in standing
erect. My theory of perception agrees with a largely noise-free
experienced world; yours appears to predict a world in which
perception barely stands out over the background noise. If your
model were correct, precise control would be impossible. Yet we
manage to control variables of all kinds with great precision,
even variables like the form of an algebraic expression (as in
proving trigonometric identities). We don't control ALL variables
precisely, but the reason is usually not that there is some
inherent imprecision in the process itself, but that we're
attempting to control something we have conceived poorly, or that
nature dictates is not amenable to control (like another person).

One last observation:

The way around this is that category boundaries are not
thresholds, but fold catastrophes.

That's a pretty fancy term for a Schmitt trigger. Anyway, saying
that categories are fold catastrophes says nothing that my
description of hysteresis didn't say. Categorizing categorizing
doesn't tell us how it works. It doesn't work the way it does
because it's a fold catastrophe. It's a fold catastrophe because
of the way it works, which remains undisclosed.

--------------------------------------------------------------
Best,

Bill P.

[Martin Taylor 921221 12:00]
(Bill Powers 921219.0130)

Well, given last year's experience, I didn't expect my information-theory
posting to be understood, and I wasn't disappointed in my expectation. Is
it worth trying some more? I'll give it a little shot, and then give up if
it still doesn't work (rather like CSG papers trying to get into conventional
psychology journals, isn't it!).

In a chaotic world, delta matters. If delta is very small, the
probability distribution of states at t+delta is tightly
constrained by the state at t. If delta is very large, the
probability distribution of states at t+delta is unaffected by
the state at t ...

It matters a heck of a lot more to a plan-then-execute model than
it does to a control model. Remember that control of a variable
depends only on the ability of the system to affect that
variable, directly, in present time. It isn't necessary to
produce an output and wait to see its future effects.

The statement is completely independent of what is acting on or looking at
the world. It has to do only with the rate at which the world supplies
information that can be looked at. Of course it matters "a heck of a
lot more to a plan-then-execute model than it does to a control model."
Didn't I demonstrate that adequately in my posting?

But the delta between the output and its observable effects on
sensory inputs does affect the amount of information contained in
the error signal, and that information is valuable for higher
levels of control.

The central theme of PCT is that a perception in an ECS should
be maintained as close as possible to a reference value. In
other words, the information provided by the perception, given
knowledge of the reference, should be as low as possible.

I think you'd better take that one back to the drawing board. The
reference in no way predicts the perception by its mere
existence. The best control requires the widest bandwidth in the
system, including its input function, up to the point where noise
begins to become significant. I don't see how this is consistent
with saying that the information provided by the perceptual
signal should be as low as possible.

The word "should" seems to be ambiguous. It refers in my posting to the
results of having a good, properly functioning ECS. In your comment, you
take it to refer to how a functioning ECS is to be designed, and that the
perceptual bandwidth should be low. If the perceptual bandwidth is low,
then the ECS will have difficulty matching the perceptual signal to the
reference signal, and thus the error signal will have high information
content. Now it is true that if the perceptual signal has lower bandwidth
than the reference signal and the same resolution, then the error signal
will in part be predictable, thus having lower information content than
would appear on the surface. But I had the presumption that we are always
dealing with an organism with high bandwidth perceptual pathways, so I forgot
to insert that caveat.

It is that kind of thing that I refer to as "understanding"
PCT, not the making of predictions for simple linear phenomena.

Dennis Delprato, here is another addition to your list of myths
about PCT: that we can predict only simple linear phenomena.
Martin, have you looked at the Little Man? It is chock full of
nonlinearities.

I think you know from all my postings, including the one you are
commenting on, that I don't subscribe to that myth. Sloppy wording.
Sorry. But Tom specifically asked me to improve upon the numerical
predictions made by a linear model, which is why I made the posting
in the first place.

And in a control
model, the signals in the various paths normally carry far
less information than the theoretical limits allow.

Dubious. I would like to be able to figure out how to test
that assertion.

It's easy. Most perceptions occur on a scale between 0 and
maximum magnitude, and vary at a rate between 0 and some maximum
cutoff frequency. To accommodate the maximum magnitude and
frequency, the perceptual channel must have a certain information
capacity. As perceptual signals can be controlled at any level
within the whole range and can be varied at any rate up to the
maximum, it follows that unless the perception is being
controlled at maximum magnitude and the reference signal is
changing at the maximum rate that still permits control, the
actual information flow must be much less than the channel
capacity. Most perceptions are not controlled at their extremes;
hence most perceptions must use less than the whole channel
capacity.

The last sentence is a non-sequitur. What follows the semicolon has
no relation to what precedes it.

My theory of perception agrees with a largely noise-free
experienced world; yours appears to predict a world in which
perception barely stands out over the background noise. If your
model were correct, precise control would be impossible.

You place great store on the conscious impression of precise perception.
This impression really has nothing to say about whether evolution has
worked well or not. Conscious impressions can be, and probably are, built
from many noisy samples, which are used as rapidly as possible in the
actual perceptual processes involved in control. Furthermore, if most
of the control is done in the central part of the range, most of the
channel capacity would be expected, in an efficient system, to be devoted
to accurate perception within that region.
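
The sample-averaging point is easy to verify numerically. A quick
sketch (my illustration, with an arbitrary 1:1 raw SNR, which
surely counts as "very low"):

```python
# Averaging many noisy samples of the same quantity yields a far
# cleaner estimate: the SNR improves as sqrt(N).
import math
import random

random.seed(3)
true_value, noise_sd, n = 1.0, 1.0, 100   # raw SNR of 1:1

samples = [true_value + random.gauss(0.0, noise_sd) for _ in range(n)]
estimate = sum(samples) / n
print(f"single-sample error ~ {noise_sd}")
print(f"averaged estimate:  {estimate:.3f}  (error ~ {noise_sd / math.sqrt(n):.2f})")
```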

Show me where Shannon's theory says there must be a comparator, a
reference signal, and a perceptual signal.

Show me where Euclid's axioms say that the sum of the squares on the
two sides of a right-angled triangle equals the square on the hypotenuse.

I am not sure that the discrete individualized ECS is predicted by
Shannon. What I did point out is that Shannon's theorems demonstrate
that S-R and plan-then-execute will not work in a chaotic world, whereas
perceptual control will work. The thesis is that if a structure is to
be stable in the world, perceptual control is necessary, though it may not
always be sufficient. I cannot prove the necessity, because it may depend
on hidden assumptions (like the relatively high sensory bandwidth) that
I have not seen. But the only other way I can see to make a stable structure
is to have one in which the binding energies are high compared to the
thermal regime in which the structure finds itself.

I think that information theory is by its very nature a post-hoc
description, not a model. You can't start with information theory
and come up with a system design. Or so sez I.

There's usually an interplay between abstract principles and practical
prototyping. If you understand the Carnot cycle, you know that superheated
steam engines can be more efficient than ones operating at lower temperatures.
James Watt didn't know that, but he came up with a principle for making
steam engines. Could Sadi Carnot have built Watt's steam engine from
first principles? I doubt it, and quite probably he wouldn't have
invented his cycle either, if Watt's engine hadn't been there. Yes,
the Carnot cycle is a description, not a model. But it's useful.
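
For the record, the Carnot arithmetic (my numbers, chosen only for
illustration):

```python
# Carnot efficiency = 1 - T_cold / T_hot, temperatures in kelvin.
def carnot_efficiency(t_hot_k, t_cold_k):
    return 1.0 - t_cold_k / t_hot_k

# Saturated steam at 100 C vs superheated steam at 300 C, 25 C sink:
print(f"{carnot_efficiency(373.0, 298.0):.0%}")  # ~20%
print(f"{carnot_efficiency(573.0, 298.0):.0%}")  # ~48%
```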

Maybe this is an appropriate place to enter a reminder that we have a
difference of opinion about there being a qualitative distinction between
a description and a model. I deny it, whereas you think it important.
I think Occam's razor is important, and that the difference between
what you call a "description" and a "model" is that your "model" is a
more precise description over a wider range than is your "description."
Occam's razor thereby gives more credence to your "model" than to your
"description." It's simply a question of what is nowadays called
Kolmogorov complexity.

One last observation:

The way around this is that category boundaries are not
thresholds, but fold catastrophes.

That's a pretty fancy term for a Schmitt trigger. Anyway, saying
that categories are fold catastrophes says nothing that my
description of hysteresis didn't say. Categorizing categorizing
doesn't tell us how it works. It doesn't work the way it does
because it's a fold catastrophe. It's a fold catastrophe because
of the way it works, which remains undisclosed.

Actually, I thought it did point out a necessary condition for it to
work--positive feedback. You can't have category perception without
some form of positive feedback, whether it occurs by cross-linking and
mutual inhibition among perceptual functions at a given level, through
some kind of modelling/imagination loop, or through temporal recurrence.

A Schmitt trigger provides a very specific kind of fold catastrophe, which
loses all information other than the category. There's no need to lose
that information, and as the fold approaches the cusp of the three-dimensional
version (stress being the third variable), the "adjectival" information begins
to dominate. What your description didn't say was that the categorical
aspect is of variable importance, and that the degree of overrun is
affected by the amount of stress. The cusp catastrophe, of which the
fold is a cross-section, does say that.
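
A minimal simulation (my own construction, not anything from the
catastrophe-theory literature) shows the necessary condition at
work: give a perceptual unit positive self-feedback with loop gain
greater than one, and the category boundary splits in two, one
crossed going up and one going down.

```python
# Hysteresis from positive feedback: a unit whose output feeds back on
# its own input (gain > 1) flips category at different input values
# depending on the direction of the sweep -- the fold.
import math

def settle(inp, x, gain=2.0, dt=0.1, steps=400):
    # relax the unit's activity to equilibrium for a fixed input
    for _ in range(steps):
        x += (-x + math.tanh(gain * x + inp)) * dt
    return x

x = settle(-1.0, -1.0)                  # start firmly in the "low" category
for name, sweep in (("up", range(-10, 11)), ("down", range(10, -11, -1))):
    was_high = x > 0
    for i in sweep:
        x = settle(i / 10.0, x)
        if (x > 0) != was_high:
            print(f"{name} sweep: category flips near input {i / 10.0:+.1f}")
            was_high = x > 0
```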

And yes, it is "only" a description, with a mechanism.

Martin