A challenge to information theorists

[From Bill Powers (930306.0700)]

It has been said that information theory contains the real meat
of control theory. If this is true, then, as Rick Marken has said,
information theory ought to tell us how to improve our models of
control systems. It would also follow that information theory
should lead to correct predictions about control processes, or at
least not contradict what is observed in simple experiments. I
believe that I have an example of a control situation in which
information theory will have difficulty doing that.

The situation is simply an implementation of the general diagrams
W. Ross Ashby used to describe disturbance-driven and error-
driven regulation. These two situations, and combinations of
them, are easy to set up on a computer with a human subject to
provide quantitative data. From quantitative data, the
information theorist should be able to calculate the amount of
information (or variety) represented by all the variables, and
from the principles of information theory (or the Law of
Requisite Variety) show that the observed degree of regulation
follows from the theory.
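
One rough way to put a number on the "variety" of a recorded
variable is sketched below: quantize the samples into bins and
compute the Shannon entropy of the bin counts. The function name
and the bin width are assumptions made for illustration. The
challenge itself does not say how the information should be
calculated, and the figure obtained depends strongly on the bin
width chosen.

```python
import math
from collections import Counter

def quantized_entropy_bits(samples, bin_width):
    """Shannon entropy, in bits, of samples quantized into bins of bin_width.

    Hypothetical helper: the binning scheme is an assumption, not something
    specified in the challenge.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Map every sample to the integer index of the bin it falls in.
    counts = Counter(math.floor(x / bin_width) for x in samples)
    n = sum(counts.values())
    # H = -sum(p * log2(p)) over the occupied bins.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Applied to the recorded D, E and mouse tables with the same bin
width, this gives comparable (relative, not absolute) figures for
the three variables.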

I am issuing a formal challenge to information theorists. I will
program this experimental situation and offer the program free to
any information theorist who wants to use it to run the
experiment, or will run experiments on a human subject (myself)
and provide the raw data on disk or on the net in ASCII-numerical
form, for analysis. I don't want to waste my time preparing this
experiment if there are no takers, so before I do it I want to
know who is accepting the challenge, if anyone. I believe that
people on the net have seen enough of my programming output to
know that I can produce a program that will do what I claim. But
I will allow challengers to write their own programs and run
their own experiments, as long as they can convince me that the
program satisfies the conditions to be described here.

The experiment:

The basic experiment involves a disturbance that affects an
essential variable through a transmission channel of specified
properties, and a regulator that acts through the same
transmission channel on the same essential variable.

The experiment involves three conditions:

1. The regulating person R has continuously available direct
information about the state of the essential variable AND the
state of the disturbing variable, and acts to keep the variations
in the essential variable as small as possible.

2. The information directly available from the state of the
essential variable is denied to the regulating person.

3. The information directly available from the state of the
disturbing variable is denied to the regulating person.

Condition 2 corresponds to Ashby's diagram of a disturbance-
driven regulator, which I call a compensator:

          D ------> T -------> E
          |         ^
          |         R
          |        /
           ---->---

Condition 3 corresponds to Ashby's diagram of an error-driven
regulator, which I call a control system:

          D ------> T -------> E
                    ^          |
                    R          |
                     \         |
                       --<-----

Condition 1 combines these situations:

          D ------> T -------> E
          |         ^          |
          |         R          |
          |        / \         |
           --->---    --<------

As I understand the information-theoretic approach, information
passes from D to E via the transmission channel T. To the extent
that E is regulated, the actions of R via T are such that some
information is kept from being transmitted to E, thus reducing
the information content of the variations in E. This is
equivalent to regulating E.

In condition 2, the Regulator receives information directly from
D alone, and so in principle could produce outputs affecting T
that completely block the flow of information, thus permitting
the information in E to be reduced to zero and achieving perfect
regulation.

In condition 3, the regulator receives information directly from
E alone. The better the regulation, the less information is
available to R from E, because the action of R via T is
diminishing the information flow from D to E. As a result,
perfect regulation is not possible because perfect regulation
would reduce the information content in variations of E to zero,
preventing any information from passing from E to R.

In condition 1, the regulator receives information from both D
and E. In principle, perfect regulation should be possible
because of the information received from D. The information
received from E is redundant.
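
The three arrangements can be tried out in a toy discrete-time
simulation, assuming the noise-free adder T described under
"Experimental details" (E = D + R). This is only a sketch for
experimenting with the wiring. It is not a model of a human
participant and not a prediction from either theory. The
parameter names comp_gain, comp_lag and fb_gain, and the use of a
simple integrator on the feedback path, are assumptions
introduced here.

```python
import math

def rms(values):
    """Root-mean-square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def simulate(disturbance, use_d, use_e,
             comp_gain=1.0, comp_lag=0, fb_gain=0.5):
    """Return the trace of E, where E[t] = D[t] + R[t] (the adder T).

    use_d: R sees D (condition 2, or condition 1 together with use_e).
    use_e: R sees E (condition 3, or condition 1 together with use_d).
    comp_lag is the delay, in samples, with which R sees D (assumed).
    fb_gain is the per-sample gain of an integrating feedback path (assumed).
    """
    fb = 0.0            # accumulated output of the error-driven path
    e_trace = []
    for t, d in enumerate(disturbance):
        r = fb
        if use_d:
            # Compensator: oppose the (possibly delayed) disturbance.
            seen = disturbance[t - comp_lag] if t >= comp_lag else 0.0
            r -= comp_gain * seen
        e = d + r       # transmission channel T: simple adder
        e_trace.append(e)
        if use_e:
            # Error-driven path: integrate against the observed E.
            fb -= fb_gain * e
    return e_trace

# Example: a slow sinusoid standing in for the filtered disturbance.
d = [math.sin(2 * math.pi * t / 200) for t in range(1000)]
cond2 = simulate(d, use_d=True, use_e=False, comp_lag=10)
cond3 = simulate(d, use_d=False, use_e=True)
cond1 = simulate(d, use_d=True, use_e=True, comp_lag=10)
```

Varying comp_lag, comp_gain and fb_gain shows how sensitive each
arrangement is to those assumed parameters.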

Experimental details

The disturbance D is the output of a pseudo-random-number
generator passed through a three-stage low-pass filter, each
stage being a simple time constant of 0.3 second. The
transmission channel T is a simple noise-free adder which adds D
to the output of the regulator R, passing the result to E. The
essential variable E is a visual display, a moveable object on
the screen, the vertical position of which relative to stationary
reference marks is proportional to the output of T. A second
moveable object of the same kind, located adjacent to the path of
E, also moves vertically in a way proportional to the magnitude
of D, so that D is represented visually in the same way that E is
represented. D is scaled so its position accurately represents
its effects on E, with the zero point corresponding to the
position of the reference marks and to zero effect on E.
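
A disturbance of this kind could be generated along the following
lines. This is a minimal sketch, not the program offered in the
challenge: the function name, the uniform random source, the
60 Hz sample rate and the backward-Euler form of each filter
stage are all assumptions. Only the three cascaded stages and the
0.3-second time constant come from the description above.

```python
import random

def make_disturbance(n_samples, sample_rate_hz=60.0,
                     time_constant_s=0.3, seed=0):
    """Pseudo-random numbers passed through three cascaded low-pass stages."""
    rng = random.Random(seed)
    dt = 1.0 / sample_rate_hz
    # Backward-Euler coefficient for a first-order lag (assumed discretization).
    alpha = dt / (time_constant_s + dt)
    stages = [0.0, 0.0, 0.0]
    out = []
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)      # raw pseudo-random input (assumed uniform)
        for i in range(3):
            # Each stage moves a fraction alpha toward its input,
            # and its output feeds the next stage.
            stages[i] += alpha * (x - stages[i])
            x = stages[i]
        out.append(x)
    return out
```

The output is much smaller in amplitude than the raw input, so in
practice it would be rescaled to suit the display.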

The time-course of both variables -- D and E -- is sampled 50 to
80 times per second (depending on the display characteristics)
and stored in arrays sufficiently long to save the data from a
2-minute experimental run (6000 to 9600 data points per table). In
addition, the output of the regulating person is saved in a third
table, containing a record of the mouse positions during the run
and thus the person's contribution to the state of E. The tables
are saved to disk after a run, in ASCII format with triples of
decimal numbers separated by spaces and terminated by a carriage-
return-line-feed (\x0d\x0a).
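
A file in this format could be read back with something like the
sketch below. The function name is hypothetical, and the column
order (D, E, mouse position) is an assumption, since the
description does not say in which order the three numbers appear
on each line.

```python
def parse_run(text):
    """Parse a saved run into three lists of floats.

    Assumed column order per line: D, E, mouse position.
    """
    d, e, mouse = [], [], []
    # splitlines() accepts CR-LF line endings as well as plain LF.
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue                    # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 numbers, got {len(fields)}")
        a, b, c = (float(f) for f in fields)
        d.append(a)
        e.append(b)
        mouse.append(c)
    return d, e, mouse

# Usage (file name taken from the description of the saved runs):
# with open("cond.1", newline="") as fh:
#     d, e, mouse = parse_run(fh.read())
```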

The task of the participant is to use a mouse to maintain the
object E exactly even with the reference marks. The subject is
allowed to practice on condition 1 as long as necessary to reach
and maintain a minimum in the RMS variations of E averaged over 1
minute.
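
The measure of regulation used here can be computed as in the
sketch below, assuming E is recorded as a deviation from the
reference marks (so that perfect regulation gives zero). The
function names and the handling of the averaging window are
assumptions for illustration.

```python
import math

def rms(values):
    """Root-mean-square of a sequence of deviations from zero."""
    if not values:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(v * v for v in values) / len(values))

def rms_last_window(values, sample_rate_hz, window_s=60.0):
    """RMS over the most recent window_s seconds of a run."""
    n = max(1, int(round(sample_rate_hz * window_s)))
    return rms(values[-n:])
```

Comparing rms(e) with rms(d) for the same run gives a relative
measure of how much of the disturbance reached E.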

Then condition 2 is established by turning off the display of the
state of E, and the participant is allowed to continue practicing
until a minimum in the RMS variations of E is achieved.

Finally, condition 3 is established by turning the display of E
back on, and turning the display of D off. Once again, practice
is allowed until a minimum in the RMS variations of E is
achieved.

The runs for each condition are saved in separate files: cond.1,
cond.2, and cond.3.

The predictions.

Because absolute information content is hard to specify, the
challenge issued here is concerned only with relative measures.
The problem is to form a theoretical ranking of the goodness of
regulation for cases 2 and 3 above, on the basis of either
information theory or PCT.

The PCT analysis predicts that condition 3 will provide the best
regulation, condition 1 either the same degree of regulation or
slightly worse, and condition 2 a degree of regulation that is
worst of all by a large margin. In other words, between
conditions 2 and 3, PCT predicts that regulation will be
unequivocally BEST when the participant gets the LEAST
information about the actual state of the disturbance D --
condition 3.

I believe that information theory will make the opposite
prediction: that condition 2 will provide better regulation than
condition 3. But I will leave it up to information theorists to
derive and explain the authoritative prediction. I expect them to
make their prediction, as I do, before the experiment is run.

The experimental data should then settle the question of the
relative power of PCT and information theory.

Best to all,

Bill P.

[Martin Taylor 930307 00:50]
(Bill Powers 930306.0700)

Bill issues a challenge to compare PCT versus information theory. I don't
understand this. If by "information theorist" Bill means me and Allan
(since I have had much to say on this topic I suppose he does), then
there is hardly likely to be a situation in which PCT and information
theory could make opposite predictions. Information theory as I understand
it leads directly to PCT as Bill understands it, or so I believe and have
tried to explain. Since before Xmas, I have been trying to get time to
write this in a serious paper, but I haven't got as far as I would have
hoped with it.

Somebody's Law: Things take longer.

(Sorry if this is full of spurious characters. I see them in my echo, and
I try to edit them, but I may not catch them all)

The difference I see between PCT with information theory and PCT without
is that with IT it should be possible to make models that need not have
as much arbitrary fitting of parameters in order to account for
real data. If you know that the acquisition rate of information from
say a cursor position for one condition is 60 bps, and for another condition
is 20 bps (perhaps because of viewing distance or brightness contrast
or something) then you should be able to derive the predictions for the
second condition from those for the first. Without IT, a PCT modeller
would just have to find a new best-fitting parameter set--perhaps a
different slowing factor.

There's no conflict in my mind between the information-theoretic approach
and "straight" PCT. I've said this over and over. The models are the same.
The ways of finding the parameters are different. At least I have no
reason so far to think there is a difference in other respects.

If you want to compare Ashby's diagrams as skeletons for models to contrast
with PCT models, that's a completely separate issue, an issue that does
not concern me.

In respect of the specific experiment Bill proposes, the information
theorist has to worry about the information transmission between R and T
as well as T and E. If the task is to keep E small, information about
the future of D might be useful, but E is where the action is. Given a
rigid robot so that there is no loss of information between R and T and
between T and E, information about D just might permit keeping E within
reasonable tolerance, but the time lags would have to be very small
compared to the bandwidth of the disturbance for condition 2 to work
at all well even under such unrealistic conditions.

Martin

[Martin Taylor 930307 12:40]
(Bill Powers 930306.0700)

I suspect my response early this morning might have been a little incoherent.
To it, I would like to add a quote that more or less agrees with what I
want to say:

(Bill Powers 911119.1000)

I suppose that
in principle one could construct an entire control hierarchy out of
probabilistic calculations. Such a model might actually come closer to
the basic mode of operation of a nervous system. But a model that ignores
statistical noise and treats signals as smoothly varying frequencies is
far simpler to express, and its behavior is much easier to calculate. Is
the added rigor of a probabilistic treatment needed, considering the
level at which we can measure and characterize behavior?

The heart of my feeling about using information theory is in that word
"needed" in the second-last line. It is not needed if all you want to do
is to describe the results of specific experiments by fitting parameters
to correctly constructed control circuit diagrams. It is needed if you
want an explanation of why those diagrams are correct and how the parameters
may change under different circumstances of task, environment, or perceptual
(or I guess motor) skill.

PCT is often touted as explanation in contrast to description. Looking
from the other side, it can seem to lack explanatory power. See my Occam's
razor discussion, posted a few weeks ago. There really isn't any difference
between explanation and description except in succinctness of expression
and range of validity. Incorporate information theory within PCT and you
improve both (since nothing need be added to IT already described when it
is used within a PCT framework, and it should reduce the specific fitting
of model parameters to experimental data).

The child asks "why is the sky blue?" The adult explains that the different
wavelengths of light are differentially scattered and absorbed by the
particles in the air, which is an explanation. The child asks "why does
that happen?" The adult answers with lots of quantum-mechanical equations,
which provide an explanation. The child asks "why do those equations work?"
The adult has no answer. They are just a description of the way the world
seems to be. So it is with all explanation.

Let me try to get the PCT-IT paper written (God knows when), and we can
reopen this discussion with a hope of doing what you want--providing
numerical predictions for experiments. If I don't get it done by July,
we can at least talk about it around a blackboard in Durango.

Martin

Bill Powers (930306.0700) writes:

...PCT predicts that regulation will be
unequivocally BEST when the participant gets the LEAST
information about the actual state of the disturbance D --
condition 3.

I believe that information theory will make the opposite
prediction: that condition 2 will provide better regulation than
condition 3. But I will leave it up to information theorists to
derive and explain the authoritative prediction. I expect them to
make their prediction, as I do, before the experiment is run.

The experimental data should then settle the question of the
relative power of PCT and information theory.

This whole challenge is based on the idea that information theory
makes predictions about human control situations opposite to those
of PCT. In order for this to be interesting, I think you need to do
one of two things:

1) Explain in a little more detail why you think information theory
predicts that the compensatory system will perform better. If you do
not have the time to do a formal proof, even a rough outline or
sketch might give us some idea what you're getting at. Personally,
I wouldn't expect information theory, all by itself and without PCT,
to make ANY prediction about which will do better (unless it is
possible, as I think Martin is claiming, that PCT can be derived from
information theory, but in that case there's hardly any conflict).

2) Find an information theorist that actually does make this claim.
I am NOT claiming that there are no such information theorists, but
I am not one. Martin is not one. Ashby is not one. For a challenge
to have any clout, you need to have someone to challenge. However, if
you can provide #1 above, then that's fine too, since you would be
able to show that information theoretic PCTers like Martin and me
are being inconsistent.

-----------------------------------------
Allan Randall, randall@dciem.dciem.dnd.ca
NTT Systems, Inc.
Toronto, ON