[Allan Randall (930314.1830)]
Bill Powers (930310.1845 MST) writes:
To make the parallel work better, however, you should think of
something other than the calculus, which in fact CAN be applied
to specific problems in mechanics, so Ike Newton would never have
rejected it.
To my knowledge, calculus actually CANNOT make the kind of
predictions you are asking of information theory. Calculus can
be used to make predictions only in combination with a physical
model such as Newtonian mechanics.
I'm sure that AFTER I have designed a control system to behave in
a certain way, an information theorist could estimate the
information flows in the system, the entropies, and all that lot.
If I were trying to model behavior that takes place under
difficult conditions, this analysis might offer more of interest
by way of predicting limits of performance.
You seem to be admitting here that information theory might have
something useful to say about control systems. It just isn't
something that interests you. However, in the past, I believe you
have indicated otherwise, going so far as to say that information
theory is invalid (I think you basically called it a bad analogy).
So I guess I still don't understand exactly where you stand. Is
information theory completely wrong-headed or is it correct,
but of little use to PCT?
I am now going to lay out a proposal for the challenge. This may not
meet what you had in mind. Please let me know what fits and what
doesn't. Forgive me for being so neurotically picky, but I do not
wish to accept a challenge if it is unclear to me exactly what
is being asked for and what the position is of the person doing
the challenging.
THE CHALLENGE:
Using the three conditions (prior to the experiment being
performed), make some assumptions about the bandwidths of the system,
and compute in information theory terms the amount of control
required for the three conditions. This will require PCT *and*
information theory together. No prediction will arise from
information theory alone. This is for the very reasons you have
given: from Ashby's diagrams + information theory, one cannot
predict what exactly R, the regulator, is doing. You cannot predict
that R is going to oppose the disturbance. Whether this will meet
your requirements for the challenge is the main point I'd like
clarified before accepting.
The result might be something like: "Condition x requires
such-and-so entropy flow, while condition y requires so-and-such
entropies in order for control to occur. Condition x, as you can
see, requires an output bandwidth much higher than that allowed for
in our assumptions about the real-world experimental setup. Thus,
we predict that condition x will not work, while condition y might."
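To make this concrete, here is a rough back-of-envelope sketch, in Python, of what such a prediction might look like. All the bandwidths, SNR figures and bit depths are hypothetical placeholders, not measurements; the point is only the shape of the argument: compare the entropy rate of the disturbance against the Shannon capacity of the output channel.

```python
import math

def channel_capacity(bandwidth_hz, snr):
    """Shannon-Hartley capacity of an analog channel, in bits/s."""
    return bandwidth_hz * math.log2(1.0 + snr)

def disturbance_rate(bandwidth_hz, bits_per_sample):
    """Entropy rate of a band-limited disturbance, sampled at the
    Nyquist rate (2*B samples/s)."""
    return 2.0 * bandwidth_hz * bits_per_sample

# Hypothetical assumptions about the experimental setup:
output_capacity = channel_capacity(bandwidth_hz=50.0, snr=100.0)  # R -> T

cond_x = disturbance_rate(bandwidth_hz=40.0, bits_per_sample=8)   # fast disturbance
cond_y = disturbance_rate(bandwidth_hz=10.0, bits_per_sample=8)   # slow disturbance

for name, rate in [("x", cond_x), ("y", cond_y)]:
    verdict = "cannot control" if rate > output_capacity else "may control"
    print(f"condition {name}: {rate:.0f} bits/s vs capacity "
          f"{output_capacity:.0f} bits/s -> {verdict}")
```

Under these made-up numbers, condition x demands more bits per second than the output channel can carry, so we would predict it fails, while condition y remains possible.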
My point was that when you characterize signals in terms of
information flow rather than in terms of amplitude and phase, it
no longer is possible to predict the result of the above
convergence. If ordinary statistical measures like variance are
used, we would NOT in general expect the variance of E to be less
than that of D or of R.
That is correct. It could be either. Again, I assume you must mean
information when you say "variance." But why do you talk
about characterizing "signals in terms of information flow rather
than in terms of amplitude and phase"? Using information content as
a measure does not justify throwing out such vital and related
concepts as amplitude and phase. I don't think even Martin would
be that radical!
>...as long as the process gives off heat,
>the resulting information channel may or may not have higher
>entropy than the sum of the original two.

So you appear to agree that we can't predict whether regulation
will actually occur in the above arrangement on the basis of
information theory alone.
Yes! We agree! This is exactly the point I've been trying to make.
Where I predict a problem for an information theorist in trying
to meet my challenge is in explaining, on IT grounds alone, why
the amplitude of E becomes less than the amplitude of D.
It is the "IT grounds alone" that bothers me. I can't imagine
why you would insist on this condition. You do not require
planetary orbits to be predictable on "calculus grounds alone."
The phenomenon of control depends on the oppositeness of the
signals, not on their respective information contents.
The "oppositeness" can be seen in the blockage of the channel.
This is an information loss - a decrease in entropy that requires
work and heat production to accomplish. This is a classic kind of
situation for an information theoretic description.
The two
signals could have the same information content by any common
measure, yet their amplitudes would not, just because of that,
have to cancel each other quantitatively.
Right. For instance, my apartment sometimes gets into a goshawful
mess. Entropy just takes over - clothes and half-eaten pizzas
knee-deep. To clean this mess up requires work to decrease the
entropy. For that, I need information about the state of the mess.
I can collect this information, but that fact alone does not mean
that I will "cancel out" the mess. In fact, I could amplify it
(mess it up even more). Or I might simply not use the information
to do work at all. I have the information required to clean the
apartment. The bandwidth is there in the output channel to allow
the work to be done. That does not by itself mean I will actually
do the work, thereby decreasing the entropy of the apartment and
increasing the entropy of the universe. Usually, I just clean it in
imagination. Sometimes I go one better, and simply change my
basis for the entropy calculation, and suddenly - presto! - my
apartment is already sparkling clean without lifting a finger!
For example, one half eaten pizza two-thirds of the way from my
desk to the sofa isn't "messy" - that's exactly where the pizza goes!
(Of course, to change the basis, I still must do work to accomplish
the necessary reorganisation of my PCT hierarchy).
I sort of object to using a physical term like entropy in dealing
with information in living systems. If you think that
informational entropy is connected to physical entropy, you can't
separate the formal entropy in a neural signal from the real
physical entropy involved.
The very concept of a "real physical entropy" distinct from the
"formal entropy" is quite meaningless. There is no reason to separate
the physical entropy from the informational entropy. In
order to process information, I need to do work and produce heat.
Information theory is not an analogy to thermodynamics - it is the
same theory. There is no distinction. Information = Physical
Entropy. Both are subjective properties of a system requiring a
basis (model, encoding scheme, subjective probability distribution)
with which to describe them. Macroscopic thermodynamics
is a form of information theory using temperature for this basis.
But temperature is not an absolute property of a system! There is no
inherent reason to consider temperature as an absolute basis for
physical entropy. Believe it or not, this is widely recognized in
standard textbook thermodynamics, although many physicists no doubt
do not fully appreciate it (it is recognized in the Zeroth Law of
Thermodynamics).
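The identity is not just verbal: the conversion factor between informational bits and thermodynamic entropy is k_B ln 2. A toy computation (just constants; the mole-sized figure is only there for scale):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def thermo_entropy_from_bits(n_bits):
    """Physical (Gibbs) entropy of a system whose microstate takes
    n_bits of information to specify: S = k_B * ln(2) * n_bits."""
    return K_B * math.log(2) * n_bits

# One bit of information is this much thermodynamic entropy:
one_bit = thermo_entropy_from_bits(1)
print(f"1 bit = {one_bit:.3e} J/K of physical entropy")

# A mole-sized number of bits recovers everyday thermodynamic magnitudes:
n_avogadro = 6.02214076e23
print(f"{n_avogadro:.3e} bits = {thermo_entropy_from_bits(n_avogadro):.2f} J/K")
```

So the only reason informational entropy looks "formal" and physical entropy looks "real" is the factor of k_B ln 2 and a choice of basis.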
Quantum Mechanics digression (ignore if you don't care about physics):
Interestingly enough, in quantum theory you need this arbitrary
assumption just to separate our universe out from other alternate
universes in the wave-form! In other words, there is no absolute
standard by which our universe can be said to exist as a separate
entity from all other possible universes. If you deny the objective
existence of the other universes, and you wish to claim such an
existence for our own universe, then you must apply an arbitrary
subjective standard as an objective property of the universe and you
end up in a hopeless paradox. In PCT terms, the subjective
standard is the hierarchy. Thus, the PCT hierarchy is what actually
defines the universe as a cohesive entity! In a very real sense, our
universe is defined by our perception of it. For this reason,
PCT may have a great contribution to make to quantum theory.
I would guess that the entropy increase involved in simply
transmitting a neural spike from one neuron to the next would be
hundreds of times greater than the hypothetical decrease involved
in the actual synaptic event - the transfer of a bit of
information.
Yes, so what? The transfer of one bit *requires* an increase in
entropy to accomplish the local decrease involved in the transmission
of information. There is no requirement that this be anywhere
near the theoretical lower limit (work must be done to transmit
information, but it need not be done with near-perfect efficiency).
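The theoretical lower limit here is Landauer's bound, k_B T ln 2 of heat per bit erased. A quick sketch shows how far above the bound a neural event plausibly sits; note that the synaptic energy figure below is a hypothetical order-of-magnitude placeholder, not a measured value:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
T_BODY = 310.0      # body temperature, K

# Landauer's bound: minimum heat dissipated to erase one bit.
landauer_j_per_bit = K_B * T_BODY * math.log(2)

# A rough, hypothetical figure for the energy of one synaptic event
# (order-of-magnitude placeholder only):
synapse_j = 1e-14

print(f"Landauer limit at body temperature: {landauer_j_per_bit:.2e} J/bit")
print(f"Hypothetical synaptic cost:         {synapse_j:.2e} J")
print(f"Ratio: ~{synapse_j / landauer_j_per_bit:.1e}x above the limit")
```

Many orders of magnitude above the bound, exactly as expected: nothing in the theory requires efficient transmission, only that the bookkeeping balances.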
...but if they [Shannon and others]
intended [negentropy] to mean some mysterious connection with
dQ/T, they had it backward...
The actual direction of information flow in the
nervous system is opposite to the direction of energy flow. So
there can't be any connection between entropy as physicists use
the term and entropy as it appears in information theory.
Just to be clear, dQ is the heat flow, not the net energy flow.
Heat flow is arbitrarily defined (in terms of temperature for
macroscopic thermo). In terms of the nervous system, yes, of
course entropy is increased by the transmission of information, but
it is decreased locally (where it counts, in biological terms).
This local decrease in entropy must result in a total increase
in entropy in the organism, so the system must input low entropy
energy as food. The connection with dQ/T is not "mysterious,"
but quite well defined. To really understand this subject, you
need to put macroscopic thermo. aside and learn the full-blown
statistical version. Otherwise, trying to match information
theory to these macroscopic, temperature-based measures
is just going to cause endless confusions.
Some of the confusion here may be simply the terminology.
Negentropy is information content. Something not completely random
is considered to have information content. Note that *high*
information content in this sense means *fewer* bits to describe
it, not more. This is one of the most frustrating confusions in
trying to first learn information theory or thermo. It sounds so
strange: how can high information content mean fewer bits? The
result is a terminological confusion, where some people continually
talk about "lots of information" as meaning more bits and thus
higher entropy, while others mean fewer bits and lower entropy.
My preference, and I think a fairly standard one, is to talk
about "information" strictly as the entropy, or number of bits,
and "information content" as negentropy, which is nonetheless
still measured in bits. Information content is high when the
"amount of information," or number of bits, is low! It's like a
golf score - lower means more "golf-score content." It sounds
contradictory, but it's not. It is, however, unbelievably confusing
when you are first learning information theory, since the experts
tend to forget the distinction (taking it for granted) and they
don't spell it out. People often lapse and fail to even make
the distinction (this probably includes myself!). The physics
and information science communities would do well to work for
better standardisation of the terminology. Here's an example
of something in your post that I think is nothing more than
confusion over these terms:
>The signal to E has very low entropy if the system is
>controlling.

Now why would you say that? When the system is not controlling, E
is varying exactly as D is, and if D is a signal within a certain
bandwidth, then E is also a signal with that bandwidth. All the
information in D is being transmitted to E.
When the system is controlling, the information formerly reaching
E is now very nearly cancelled; E now contains very little
information. Would not this loss of negative entropy amount to a
great INCREASE in entropy?
No. In fact, this is not a "loss of negative entropy," but a gain!
When the information is cancelled, this means, in my terminology, that
E has experienced "information loss," and thereby has high information
content! After all, work is expended and total entropy increased in
order to produce this blocking of the channel. We have to distinguish
between information loss and decrease in information content.
So yes, if the system is controlling, the disturbance will be
transmitted to the percept (E), as you say. So since the percept
is controlled, D is minimised, and has low entropy, as does the
percept E.
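A toy simulation makes the point about E under control. This is a minimal one-variable loop of my own invention, not anyone's published model; the gains and noise levels are arbitrary. Closing the loop collapses the variance of E:

```python
import math
import random

random.seed(42)

def run(gain, steps=5000):
    """One-variable loop: the CEV is pushed by a random-walk disturbance D
    and by the system's output; the percept E is taken to be the CEV."""
    cev, out, d = 0.0, 0.0, 0.0
    trace = []
    for _ in range(steps):
        d += random.gauss(0, 0.1)     # slowly drifting disturbance
        cev = d + out                 # environment: CEV = D + output
        error = cev - 0.0             # reference fixed at zero
        out += -0.1 * gain * error    # integrating output opposes the error
        trace.append(cev)
    return trace

def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

no_control = std(run(gain=0.0))   # loop open: E wanders with D
control = std(run(gain=10.0))     # loop closed: E pinned near the reference
print(f"std(E), loop open:   {no_control:.3f}")
print(f"std(E), loop closed: {control:.3f}")
```

With the loop closed, E carries far fewer bits per sample: high information content, in my terminology, bought by the work of blocking the channel.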
The reason this sounds so contradictory is that high and low entropy
can both seem intuitively like "lots of information," or "very
little information," depending on how you look at it. A completely
random string of digits and a string of all zeros *both* seem
intuitively "lacking in information." The common, everyday use
of the word information is somewhat contradictory. Living systems
seem to exist somewhere between these two extremes, and there
are numerous attempts to define exactly where. This is an
exciting and open area of research in information theory - one in which
I think PCT has much to contribute.
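The two extremes are easy to exhibit numerically. Here is a small sketch computing the empirical per-symbol entropy of a random bit string versus a string of all zeros:

```python
import math
import random
from collections import Counter

def shannon_entropy_bits(s):
    """Empirical Shannon entropy per symbol, in bits."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(0)
random_str = "".join(random.choice("01") for _ in range(10000))
zeros_str = "0" * 10000

h_rand = shannon_entropy_bits(random_str)  # ~1 bit/symbol: many bits to describe
h_zero = shannon_entropy_bits(zeros_str)   # 0 bits/symbol: one short rule suffices

print(f"random string: {h_rand:.3f} bits/symbol (high entropy, low 'content')")
print(f"all zeros:     {h_zero:.3f} bits/symbol (low entropy, high 'content')")
```

Both strings strike the everyday intuition as "uninformative," yet they sit at opposite ends of the entropy scale; living systems live somewhere in between.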
>Now surely you will agree that if it were impossible to encode
>the disturbance D into a number of bits that could be handled
>by the output channel, then the system could not control.

If that number of bits could not be handled by the output channel
(by which I presume you mean the path from T to E),...
No, I mean the path from R to T. I really prefer to talk in terms
of PCT, as I find it to be a more elegant formulation. T is
roughly the CEV and E is the perceptual signal. R is the
comparator, of course, and Ashby's C is the reference.
...then only the
bits that survived passage through the channel would be able
to disturb E.
Yes, and since these are not enough bits to describe the disturbance
to the CEV, and hence E, the system will fail to control. Note that
these surviving bits are not the disturbance - they are correcting
for the disturbance by negating the representation of the disturbance
contained in the percept E.
All that the regulator R would have to do would be
to provide bits to T that subtract, bit for bit, from those that
survive the passage through T, and regulation would be perfect,
Again, you are confusing R->T and T->E. T->E is the input channel,
while R->T is the output. R cannot provide these opposing
bits because the output channel R->T cannot handle the information
load.
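This is just Ashby's law of requisite variety: the entropy of the outcome E can be reduced below H(D) by at most the regulator's capacity H(R), so H(E) >= H(D) - H(R). A minimal sketch, with made-up bit budgets:

```python
def min_outcome_entropy(h_disturbance, h_regulator):
    """Ashby's law of requisite variety (entropies in bits):
        H(E) >= H(D) - H(R)
    i.e. the regulator can remove at most H(R) bits of disturbance."""
    return max(0.0, h_disturbance - h_regulator)

# If the disturbance carries 10 bits but the output channel R -> T can
# only pass 6 bits, at least 4 bits of disturbance survive into E:
residual = min_outcome_entropy(h_disturbance=10.0, h_regulator=6.0)
print(f"at least {residual:.0f} bits of disturbance reach E -> imperfect control")

# With enough output capacity, perfect regulation is at least possible:
print(f"residual with a 12-bit regulator: "
      f"{min_outcome_entropy(10.0, 12.0):.0f} bits")
```

Note that sufficient capacity only makes perfect regulation *possible*; nothing in the inequality says R will actually use its capacity to oppose D.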
>...you need the perceptual functions to compress
>the inputs into a single scalar value (for comparison) that
>STILL RETAINS THE ESSENTIAL CHARACTERISTICS OF THE DISTURBANCE.

No! It is not the disturbance that has to be represented, but the
state of the controlled variable itself.
If the system is controlling, the percept signal carries
few bits. The channel carries more bits only if there is a
disturbance. The only way in which the percept is used (ignoring
higher levels) is to compute an error which does directly
represent the disturbance. So I agree that the percept directly
represents the CEV, not the disturbance. But it is the disturbance
to the CEV that is relevant to control. So the percept does indeed
contain information of "the essential characteristics of the
disturbance" - hence the ability to oppose the disturbance in the
output. The "essential characteristics" part is important. D is
transmitted to E, the percept, only through a transducer that
represents the disturbance with a particular encoding scheme
(there's that information theory again).
The output only has to
act on that variable with sufficient speed to keep the perceptual
representation of that variable matching the reference signal.
The disturbance itself (D above) could contain megahertz
variations; most of those would disappear because E can't respond
significantly to them. E, however, will still contain frequency
components that are not represented in the perceptual signal.
Again, we're talking on different wavelengths. I consider E to be
the perceptual signal, so obviously it's hard to respond to this.
E is the thing under control, according to Ashby, so it HAS to be the
percept, doesn't it? You seem to be placing the thing controlled
in the external environment. Ashby places it clearly internal to the
control system.
Those frequency components will be uncontrolled (from the
viewpoint of an external observer). The perceptual signal itself,
however, will be controlled.
Of course, the "viewpoint of an external observer" is irrelevant to
the organism. It is the percept, E, that counts. Components in the
world that E can't respond to obviously are not characteristics
essential to control. This is the whole idea of filtering the information.
You have to make up your mind whether you're talking about the
variable responsible for the disturbances (D), or the effect of
that variable (variations in E).
You are right - this is an important distinction and gets at the
heart of what I'm trying to say. Perhaps we are using the term
"disturbance" differently? I would define it as the net effect
of things in the world on the CEV. The disturbance is not
an absolutely defined entity - it depends on the model you use
for your description. When you talk about "disturbing variables"
that the control system has no information about, you probably
mean "a description of the disturbance in an 'objective' language
external to the organism." This is how an external observer is
apt to view the disturbance - and its entropy under this basis
is very high, call it H(D | observer). Now, the organism views
D through a very different lens. The same real-world entities
the observer just described are defined within the organism in
terms of the hierarchy (i.e. in terms of the CEV). This description
is much shorter, and has low entropy, call it H(D | organism).
Note that H(D) is meaningless. According to Ashby's Law, the
hierarchy must bring the entropy of the disturbance in line with
the output capacity. If H(D | observer) is high, then we will tend
to call the environment complex. If, in addition,
H(D | organism) << H(D | observer), then we say that the organism
controls in a complex world.
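A toy calculation shows how sharply the two bases can differ. Suppose, purely hypothetically, that twenty independent unit-Gaussian disturbing variables all act on one CEV. The observer describes each variable separately; the organism describes only their net effect:

```python
import math

# Hypothetical world: N independent unit-Gaussian disturbing variables
# all act on a single CEV.
N_SOURCES = 20

# Differential entropy of one unit Gaussian, in bits per sample:
h_gauss = 0.5 * math.log2(2 * math.pi * math.e)

# "Observer" basis: describe every disturbing variable separately.
h_observer = N_SOURCES * h_gauss

# "Organism" basis: only the net effect on the CEV matters -- one scalar.
# The sum of N unit Gaussians is Gaussian with variance N:
h_organism = h_gauss + 0.5 * math.log2(N_SOURCES)

print(f"H(D | observer) ~ {h_observer:.1f} bits/sample")
print(f"H(D | organism) ~ {h_organism:.1f} bits/sample  (much shorter description)")
```

So H(D | organism) << H(D | observer) falls out automatically once you take the net effect on the CEV as the description, which is exactly what the hierarchy does.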
Note that I never said that the control system gets "direct
information" about D - that whole notion is meaningless. But it most
definitely DOES get information about D, however you choose to
arbitrarily define it. This information is definitely contained
in the perceptual signal, or the organism would be unable to
control against the disturbance. How can you control against something
you have no information about? The very notion is nonsensical.
>So I ask you: where above did my reasoning go astray?
In assuming that the control system needs information about the
state of the disturbing variable D, and in assuming that control
is in terms of the external variable E rather than the perceptual
representation of it.
Again, E *is* the perceptual representation, and the only
information about the "disturbing variable" relevant to control
is what survives in E.
The first is the most important error.
There can be any number of independent disturbing variables
acting through environmental links on E
Right, but the "disturbance", from the viewpoint of the organism,
can only be the net effect of these forces on the CEV. You can't define
disturbance otherwise without getting into an arbitrary mess that is
completely irrelevant to control.
...The control system needs no
information about them, singly or collectively.
How can you say this, when the sole purpose of the control system
is to oppose the disturbance? It can't oppose something it has NO
information about. It simply cannot.
>If I can actually have complete knowledge of the disturbance D,
>it is theoretically possible for me to respond appropriately
>before it has had an effect on the controlled variable.

What you actually should have said, to be precise, was "If I can
actually have complete knowledge of the disturbance D, and if I
can use this knowledge to produce actions having a precisely
calculable effect on E, and if the actions actually produced
affect E exactly as calculated so as to cancel the effects due to
variations in D, then the actions will affect E so as to cancel
the effect of variations in D."
No. This is a tautology, as you well know. I wasn't trying to say
that if such a thing were possible, then it would be possible. I was
simply trying to state that it *is* possible. Anything can be made
into a tautology in the manner that you just did. If I state A, you
reword it into "If A then A," and you have your tautology. This is
a common debating tactic, but it holds no water. This is getting silly.
I stand by my original wording. Do you deny that compensatory
systems can in principle work? Do you deny that I could create a toy
world in my computer with a compensatory controller? (The very
idea of complex living organisms controlling via such a mechanism is,
of course, absurd - but that is not the issue).
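For the record, here is such a toy world in a dozen lines of Python. The controller is purely compensatory: it is handed complete knowledge of D and of the environment function, and cancels D before D reaches E. (A sketch of my own, obviously, not a claim about real organisms.)

```python
import random

random.seed(7)

def toy_world(compensate, steps=1000):
    """Toy world in which the controller has *complete* knowledge of the
    disturbance D and of the environment (E = D + output). A purely
    compensatory (feedforward) controller then cancels D exactly."""
    errors = []
    for _ in range(steps):
        d = random.gauss(0, 1.0)           # the disturbance, fully known
        output = -d if compensate else 0.0  # act on D before it reaches E
        e = d + output                      # the controlled variable
        errors.append(abs(e))
    return max(errors)

print(f"max |E| without compensation: {toy_world(False):.2f}")
print(f"max |E| with compensation:    {toy_world(True):.2f}")
```

With compensation on, E never budges, because the two stated "ifs" (complete knowledge of D, exactly calculable effects) hold by construction in this toy world; the interesting question is only whether they can hold in a real one.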
-----------------------------------------
Allan Randall, randall@dciem.dciem.dnd.ca
NTT Systems, Inc.
Toronto, ON