# Challenge: info theory & PCT

[Bill Powers (930315.0700)]

Allan Randall (930314.1830) --

This post goes on for a little more than six pages, and I have
already cut out large parts of it. I don't seem to have had
enough to do today. Just a warning to those who would like to use
the delete key and make some space.

To my knowledge, calculus actually CANNOT make the kind of
predictions you are asking of information theory. Calculus can
be used to make predictions only in combination with a physical
model such as Newtonian mechanics.

What are physical models but mathematical forms, manipulated
according to mathematical rules, that model or idealize
observations? Given that each element of mass attracts each other
element with a force proportional to (exactly) the product of the
masses and (exactly) the inverse square of the distance between
them, and given expressions for the conservation of potential +
kinetic energy, one can apply the calculus and derive the fact
that orbits are conic sections. That is the kind of prediction I
am asking of information theory: a prediction of how the system
will actually behave through time.
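That kind of prediction can even be checked numerically. Here is a minimal sketch (Python; units chosen so that GM = 1, and the initial conditions are purely illustrative) that applies nothing but Newton's inverse-square law and confirms that the conserved quantity the conic-section derivation relies on stays constant along the computed orbit:

```python
# Integrate Newton's inverse-square law (semi-implicit Euler) and check
# that total energy -- the conservation premise behind the conic-section
# result -- holds along the computed orbit. Units assume GM = 1.
import math

GM = 1.0
x, y = 1.0, 0.0        # illustrative initial position
vx, vy = 0.0, 1.1      # illustrative initial velocity (a bound orbit)
dt = 1e-4

def energy(x, y, vx, vy):
    return 0.5 * (vx * vx + vy * vy) - GM / math.hypot(x, y)

e0 = energy(x, y, vx, vy)
for _ in range(200_000):
    r3 = math.hypot(x, y) ** 3
    vx -= GM * x / r3 * dt   # velocity updated from the force law...
    vy -= GM * y / r3 * dt
    x += vx * dt             # ...then position from the new velocity
    y += vy * dt

assert abs(energy(x, y, vx, vy) - e0) < 1e-3   # energy conserved
```

The calculus does the predicting only because the physical model (the force law plus the conservation premise) supplies the content.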


----------------------------

If I were trying to model behavior that takes place under
difficult conditions, this analysis might offer more of
interest by way of predicting limits of performance.

You seem to be admitting here that information theory might
have something useful to say about control systems.

Insofar as information theory could predict the limits of
performance given signals and signal-handling devices with
certain characteristics and in a known organization, sure.

So I guess I still don't understand exactly where you stand. Is
information theory completely wrong-headed or is it correct,
but of little use to PCT?

Information theory rests on definitions and mathematical
manipulations. Unless someone has made an undetected mathematical
blunder, the calculations of information theory follow correctly
from the premises. It's unlikely to be "incorrect" in those
terms. The problems I see come not in the internal consistency of
IT, but in its applications to observations. Premises can be
wrong; when they are wrong, no amount of mathematical correctness
will make the conclusions right. I don't yet see how IT is
actually linked in any rigorous way to specific physical
situations.
--------------------------------------
RE: statement of the challenge.

Using the three conditions (prior to the experiment being
performed), make some assumptions about the bandwidths of the
system, and compute in information theory terms the amount of
control required for the three conditions.

The challenge was a response to the assertion that PCT could be
derived from information theory. The prediction I'm asking for is
not how much control is required, but how much control there will
be in the two situations. To use a theory to derive the fact that
control will result from either arrangement means to make
predictions by manipulations that follow the rules of the theory.

I'm going to skip a lot that I wrote here, because prediction is
the real point.
------------------------------------------------------------

... from Ashby's diagrams + information theory, one cannot
predict what exactly R, the regulator, is doing. You cannot
predict that R is going to oppose the disturbance. Whether this
will meet your requirements for the challenge is the main point
I'd like clarified before accepting.

If you stick with these conclusions, the challenge is unnecessary
because you have agreed to my original claim. You are agreeing
that information theory can't provide the predictions of behavior
that control theory provides, but can only be applied once those
predictions are known and verified.
-----------------------------------------------------------

The very concept of a "real physical entropy" distinct from the
"formal entropy" is quite meaningless. There is no reason to
separate the physical entropy from the informational entropy.

You're just reasserting the claim that they are the same. This
claim is based on nothing more (I claim) than a similarity in
mathematical forms. In all the information-theoretic stuff I have
seen, the assumption is that information and energy flow the same
way. I was showing that energy does not go in the direction that
is commonly assumed in the nervous system. But in addition to
that, it's possible to show that messages can be sent with energy
flowing EITHER way (for instance, sending a message to the guy
operating a winch by intermittently putting a frictional drag on
the cable). If information can flow one way with energy going
either way, then the "entropy" involved in information flow is
not physical entropy. Physical entropy is always transferred
opposite to energy flow.

When the information is cancelled, this means, in my
terminology, that E has experienced "information loss," and
thereby has high information content!

Are you saying that the system begins with a certain information
content, and after losing information it has a greater
information content? I don't follow your arithmetic.

The transfer of one bit *requires* an increase in entropy to
accomplish the local decrease involved in the transmission of
information.

This seems to imply that at the terminus of the transmission,
there is a local decrease in entropy. I would like to know where
that occurs. I can see no point in the transmission of a neural
impulse where there is anything but a loss of energy (and an
increase in physical entropy) in the next place that the impulse
reaches. Where is this final place where we suddenly get the
opposite effect (taking energy FROM the arriving signal)?

This usage of "requires" is odd: I think you have the logical
implication backward. Transmission of a bit PREDICTS a change of
entropy, but it does not follow that a change of entropy PREDICTS
transmission of a bit. Entropy is not a cause, but an effect.
The logical implication is "It is not the case that a bit is
transferred and entropy does not change."

There is no requirement that this be anywhere
near the theoretical lower limit (work must be done to transmit
information, but it need not be done with near-perfect
efficiency).

But aren't you assuming that this work is done by the transmitter
on the receiver? I can think of numerous methods to transmit a
message in a way that requires the receiver to do work on the
transmitter. For example, I can detect the presence of a control
system by pushing and seeing if there is resistance to the push.
I have to do work on the system to receive that message, and the
message can be extracted from the amount of work I do. So by
resisting and not resisting, you can send me 1's and 0's. 1 means
I'm doing some work on you and you are controlling; 0 means I'm
not and you're not.
-------------------------------------------------------------

If the system is controlling, the percept signal carries
few bits. The channel carries more bits only if there is a
disturbance.

The latter is true only for homeostasis. Changing the reference
signal to a new value will also cause more bits to be carried,
for a moment, by the percept signal. Even without a disturbance,
the system will then reduce those bits as closely as possible to
zero. The action is based on the state of the percept, not the
state of the disturbance. And the state of the percept does not
depend on the state of the disturbance alone. This means that it
does not depend on the disturbance in a known way AT ALL.

As Rick Marken has been trying to say, if you know only the sum
of two numbers, you know nothing about either of the numbers by
itself. The percept represents the sum of effects from the
system's own output and from the disturbance. A percept of 3
units could represent the sum of 6 and -3, 6000 and -5997, -200
and 203, and so forth. There is NO information in the percept
about the disturbance, in a control system.
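The arithmetic is easy to exhibit; a tiny sketch (Python, numbers purely illustrative):

```python
# A percept of 3 is consistent with unboundedly many output/disturbance
# pairs: knowing the sum reveals neither addend.
pairs = [(6, -3), (6000, -5997), (-200, 203)]
assert all(output + disturbance == 3 for output, disturbance in pairs)

# For ANY candidate disturbance d, an output of 3 - d produces the same
# percept, so the percept alone cannot single out d.
for d in (-1000, 0, 42, 1e6):
    assert (3 - d) + d == 3
```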
-----------------------------------------------------------

E is the thing under control, according to Ashby, so it HAS to
be the percept, doesn't it? You seem to be placing the thing
controlled in the external environment. Ashby places it clearly
internal to the control system.

It's clearly inside the control system according to PCT. I doubt
that Ashby saw it that way, as he made the "regulator" into a
different unit. I was taking the external view in treating E as
the environmental controlled quantity, with the sensor and
percept being internal details of the regulator. But your
interpretation is just as good.
--------------------------------------------------------------

Perhaps we are using the term "disturbance" differently? I
would define it as the net effect of things in the world on the
CEV.

I have spoken of this ambiguity before. I use the term
"disturbing variable" to mean the independent CAUSE of the
disturbance. If control is tight, a 100-unit change in the
disturbing variable might produce only a 1-unit disturbance of
the controlled variable, where without the control it would
produce a 100-unit disturbance. Just speaking of "the
disturbance" is ambiguous -- do you mean the applied force, or
the movement that results from it?

You can unambiguously specify the disturbing variable without
considering what the system it affects is doing. I can specify
that I am pushing on your arm with a force of 20 pounds, or a
slowly-changing sine-wave force with an amplitude of 20 pounds
and a certain frequency. By itself, this doesn't say anything
about how your arm will move; that depends on what muscle forces
are also acting at the same time and how they are changing. You
can pull back on your end of the rubber bands, but you can't say
how the knot will move as a result.

When multiple disturbing variables exist, you can reduce them to
a single equivalent disturbing variable acting through some
equivalent path on a one-dimensional controlled variable. But you
can't say what changes in the controlled variable will actually
occur without knowing how the output of the system is acting.
That output also affects the controlled variable; it can cancel
most of the effect that the disturbing variable would have had in
the absence of control. You can't arbitrarily specify the actual
disturbance of the controlled variable; that depends on the
properties of the system being disturbed. All you can specify
arbitrarily is the state of the disturbing variable.

When you model a control system, you MUST apply modelled
disturbances via a disturbing variable. If you simply assume a
given change in the controlled variable, you're breaking the
loop: you're saying that the output makes no contribution to the
state of the controlled variable. The amount of change in the
controlled variable is one of the effects to be calculated, not
an independent variable.
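A minimal loop model (Python; the gain, slowing factor, and numbers are illustrative assumptions, not any particular published model) makes the point concrete: the disturbing variable is what you specify, and the change in the controlled variable comes out of the calculation:

```python
# The controlled variable c is the SUM of the output o and the disturbing
# variable d; the change in c is computed by the loop, not assumed.
reference = 0.0
gain = 99.0          # assumed loop gain
d = 100.0            # a 100-unit step in the disturbing variable
o = 0.0
for _ in range(1000):                            # iterate to equilibrium
    c = o + d
    o += 0.005 * (gain * (reference - c) - o)    # sluggish output function
c = o + d

# The 100-unit change in the disturbing variable moves the controlled
# variable only about 1 unit, because the output settles near -99.
assert abs(c - 1.0) < 0.01
assert abs(o + 99.0) < 0.01
```

Assuming a given change in the controlled variable instead of applying d would skip exactly the step this loop computes.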
-----------------------------------------------------------

Note that I never said that the control system gets "direct
information" about D - that whole notion is meaningless. But it
most definitely DOES get information about D, however you
choose to arbitrarily define it.

This is not true. If I send you a Morse Code message and John
simultaneously sends you a Morse Code message, and you receive
only the OR of these two messages, how much information do you
have about either John's message or mine? None at all. If the
receiver gets only the OR of the messages, it has no way to sort
out which dot or dash should be attributed to me or to John. The
only information it can get is about the resulting combined
message -- which in fact will be semantic gibberish.

A control system's input function receives information only about
the controlled variable. It can't tell how much of the amplitude
of that variable is due to an independent disturbance and how
much is due to its own output function. It experiences only the
sum.
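The ambiguity can be counted outright. A small sketch (Python; an arbitrary 4-element reception, 1 = key down) enumerating every pair of keying streams consistent with a single ORed stream:

```python
# Enumerate all pairs of 4-bit keying streams whose OR matches what the
# receiver got: each received 1 could be mine, John's, or both of ours.
from itertools import product

combined = (1, 1, 0, 1)                      # the receiver's ORed stream
streams = list(product((0, 1), repeat=4))    # all 16 possible originals
candidates = [(a, b) for a in streams for b in streams
              if tuple(x | y for x, y in zip(a, b)) == combined]

# 3 explanations per received 1, exactly 1 per received 0: 3**3 pairs.
assert len(candidates) == 27
```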

This information is definitely contained in the perceptual
signal, or the organism would be unable to control against the
disturbance.

You're toying with the same paradox that got Ashby. Suppose the
disturbance is transmitting 100 bits per second to the controlled
variable. According to the Law of Requisite Information, the
output must also transmit 100 bits per second to the controlled
variable if perfect control is to be achieved. This is clearly
impossible, because then there would be zero bits per second
coming from the controlled variable into the system's output
function, while the output function is producing 100 bits per
second of information. So what level of control would be
possible? Suppose that the output function transmitted only 50
bits per second, the amount required by Law to "block" 50 bits
per second from the disturbance. That leaves 50 bits per second
unblocked, reaching the controlled variable, which is just
sufficient to cause a perfect output function to produce the 50
bits per second assumed. On this basis you would predict that the
error-driven control system could reduce the information flow
from the disturbing variable to the controlled variable by at
most one half.

At the same time, a compensating regulator could be perfect: 100
bits per second could pass from the disturbing variable to the
controlled variable and also to the regulator. A perfect
regulator would pass the whole 100 bits per second to its output,
which according to the Law is just sufficient to block the 100
bits from the disturbance entirely. So no bits would reach the
controlled variable.

By this reasoning, the disturbance-based system has a wide margin
of performance over the error-based system even with imperfect
signal transmissions.

I claim that the experiment will show that this is not true. If
it does, something is wrong with the Law of Requisite
Information.

...The control system needs no information about them
[multiple disturbances], singly or collectively.

How can you say this, when the sole purpose of the control
system is to oppose the disturbance? It can't oppose something
it has NO information about. It simply cannot.

When you get this figured out, you can finally claim to
understand PCT.
------------------------------------------------------------
RE: tautology in defining compensating control

No. This is a tautology, as you well know. I wasn't trying to
say that if such a thing were possible, then it would be
possible. I was simply trying to state that it *is* possible.

That's what I said. You asserted that it is possible, but you
made your assertion sound like a deduction because you didn't
fill in all the premises that the deduction would require.

You did present the statements as a (partial) deductive argument
(930310):

If I can actually have complete knowledge of the disturbance D,
it is theoretically possible for me to respond appropriately
before it has had an effect on the controlled variable.

I was pointing out that the single "if" you supplied was
insufficient; you might have complete knowledge of D but be
unable to calculate or produce an output of the exactly-required
amount, or you might respond a little early or a little late, and
so forth. To make your conclusion true (I could respond
appropriately) you must supply all the required premises, among
which are those that define what "appropriately" means.

If you had filled in all the premises, you would have found that
your assertion had to be one of them. So you were simply making a
groundless assertion, and the rest is window-dressing.

Anything can be made into a tautology in the manner that you
just did. If I state A, you reword it into "If A then A," and
you have your tautology. This is a common debating tactic, but
it holds no water.

It's a common debating tactic, but it's by no means possible to
make ANY statement into a tautology:

If it rains tomorrow, my car will get wet. It will rain tomorrow;
I conclude that my car will get wet.

It is not tautological to say that my car will get wet, because
that statement does not depend on assuming that it will get wet.
It depends on assuming a fact: that it will rain tomorrow. So the
conclusion becomes testable; the car might not get wet tomorrow.

When you say that a perfect compensator compensates perfectly,
you are just asserting the same statement twice. The premises on
which a perfect compensator depends contain exactly, and only,
the assumption of perfect compensation. The only way to make the
argument non-tautological is to introduce factual premises: it is
possible for a system to (fill in the requirements on each part).
If those premises should hold true, then perfect compensation is
possible. But perfect compensation may not be possible, because
the assumed premises may not be factual. That's the only way to
say something meaningful about perfect compensation.

A tautology is simply an argument that looks like a deductive
argument but contains no possibility of being false. I agree that
people often present such arguments, trying to make them seem to
lead to a necessary conclusion and to disguise the hidden
assertion of the conclusion. That man must have been guilty of
something; the police arrested him, didn't they?
-------------------------------------------------------------
Best,

Bill P.

[Martin Taylor 930315 18:30]
(Bill Powers 930315.0700)

Bill,

In answering Allan, you say:

The challenge was a response to the assertion that PCT could be
derived from information theory.

Two points: (1) It's unfair to ask Allan to justify a claim of mine. He
may or may not be able to do so, but it was my claim, and I told you I
was working on a paper on it. I have shown the beginning of the paper,
describing how "probability" should be interpreted, and he has problems
with that. So if he could derive PCT from IT, he would probably do it
differently from how I would do it.

(2) I see no conceivable way your challenge relates to a derivation of
PCT from IT. All a solution to your challenge can do is compare the
informational consequences of two circuits. My intention is to show
that your circuits are the only reasonable way in which a system can
maintain stability. I want to derive the fact of the hierarchy. I may
not succeed in doing so in a non-circular way (as Tom Bourbon pointed
out, many uses of IT have come to/from circular conclusions/premises).
The challenge is fun, and will be addressed. But it's irrelevant.

I said I hoped to get at it over the weekend, but I didn't get the review
drafted completely until Sunday evening, and now Ina has had a go at
editing it (we are joint reviewers of two books, not only one. Now we
have all four drafts completed, and must coalesce them into two.) So
it still isn't done. But the Control System Editor (Chris Love) is almost
ready to send to Rick for beta testing.


------------------------------------------------------------
Allan:

... from Ashby's diagrams + information theory, one cannot
predict what exactly R, the regulator, is doing. You cannot
predict that R is going to oppose the disturbance.

Bill:

If you stick with these conclusions, the challenge is unnecessary
because you have agreed to my original claim. You are agreeing
that information theory can't provide the predictions of behavior
that control theory provides, but can only be applied once those
predictions are known and verified.

That's a real non sequitur. If I can see a coffee cup on the table,
I must drink it, or the theory falls apart? Why does R have to do
anything particular just because it has some information available to it?
All Allan is saying is that information theory doesn't tell whether
I will drink the coffee, just that I will be able to (or otherwise)
if I want.

----------------------------

When the information is cancelled, this means, in my
terminology, that E has experienced "information loss," and
thereby has high information content!

Are you saying that the system begins with a certain information
content, and after losing information it has a greater
information content? I don't follow your arithmetic.

Terminology is confusing. I hate the wording "information content" as
if it were a conserved quantity. One can talk about uncertainty,
which is a property of a probability distribution. One can talk about
information, which is a difference between two uncertainties. But
"information content" misleads.

The transfer of one bit *requires* an increase in entropy to
accomplish the local decrease involved in the transmission of
information.

This seems to imply that at the terminus of the transmission,
there is a local decrease in entropy.

There may or may not be, depending on the uncertainties involved,
but one thing is for sure: when a non-invertible event occurs in
the universe, overall entropy increases. It may or may not increase
in an open system.

I can see no point in the transmission of a neural
impulse where there is anything but a loss of energy (and an
increase in physical entropy) in the next place that the impulse
reaches. Where is this final place where we suddenly get the
opposite effect (taking energy FROM the arriving signal)?

The energy comes from the neuron's "food" supply. The energy flow
between the input and its excreta allows the local entropy to increase
or decrease.

---------------------

...the state of the percept does not
depend on the state of the disturbance alone. This means that it
does not depend on the disturbance in a known way AT ALL.

Known where? If the ECS maintains a record of the change in the
reference, it has the same potential sources of information as does
one for which the reference level is always zero.

I sometimes use ALL CAPS to set off a word or comment, but I have
the feeling that this capitalization of "AT ALL" is, like much of
Rick's capitalization, more defensive than useful emphasis. If we get
into a divide-by-zero situation (infinite gain, infinite speed),
AT ALL may be justified. But I detect in a lot of the comments from
both of you a lack of middle ground between "totally perfect" and
"non-existent and useless." Rick has said, without your correction,
that if there is noise in the perception there is NO CONTROL. Or
words to similar effect. Poor control is between the unachievable perfect control of
a noiseless infinite gain system and the useless attempt to "control"
without perception. Imperfect control is not no control. It is the real
world.
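That middle ground is easy to exhibit in simulation. A sketch (Python; every parameter is an illustrative assumption) of a loop whose perception carries Gaussian noise, yet which still cancels most of the disturbance:

```python
# A control loop with noisy perception: control is degraded, not abolished.
import math
import random

random.seed(1)                # deterministic run
output, cvs, dists = 0.0, [], []
for t in range(5000):
    d = 50 * math.sin(t / 100)           # slowly varying disturbance
    cv = output + d                      # controlled variable
    percept = cv + random.gauss(0, 2)    # perception with added noise
    output += 0.2 * (0 - percept)        # integrate the error, reference = 0
    cvs.append(cv)
    dists.append(d)

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

# Residual variation in the controlled variable is a small fraction of
# the disturbance's: imperfect control, but control nonetheless.
assert rms(cvs[500:]) < 0.2 * rms(dists[500:])
```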

----------------

If I send you a Morse Code message and John
simultaneously sends you a Morse Code message, and you receive
only the OR of these two messages, how much information do you
have about either John's message or mine? None at all. If the
receiver gets only the OR of the messages, it has no way to sort
out which dot or dash should be attributed to me or to John.

Not so. We are confronted with a very analogous situation almost
all the time, with everything we hear. We do identify and interpret
the music, the voice of our friend, the footsteps in the hall ... all
at the same time. As a closer analogy, there is substantial research
in the technology of recognizing one voice while a louder one is talking
nearby. For the Morse code, there are relations among the keying "hand"
characteristics, such as relative dot and dash length, that can start
the separation of the two streams. There are redundancies both in the
code itself and in the word stream (assuming it isn't encrypted to remove
the redundancy). These can help sort out the two streams. It may not
be possible to segregate and properly interpret both streams exactly,
but to say that the amount of information you can get about either message
is "None at all" is both theoretically and experientially wrong.

I'm not sure where this argument started from, but it doesn't feel to
me as if it is going in a useful direction. Statements like the following
don't help:

How can you say this, when the sole purpose of the control
system is to oppose the disturbance? It can't oppose something
it has NO information about. It simply cannot.

When you get this figured out, you can finally claim to
understand PCT.

Bill, your comment is exactly saying that PCT applies only to entities
that operate in a universe thermodynamically isolated from the disturbing
variable. It would make for a very uninteresting area of application
for PCT. Let's deal in possible worlds, shall we? I think PCT applies
to a very interesting part of the real world. If what you say is true,
then I am much less interested in PCT, which has suddenly become a branch
of abstract mathematics.

Martin