Information about (was Ashby's Law of Requisite Variety)

[Martin Taylor 2012.]

[From Erling Jorgensen (2012.12.14.0830 EST)]
    Another question I tried to address, or at least ask, was uncertainty about what? That came up in several ways, & I am still not sure whether you or anyone else sees it the same way.

EJ: The technical understanding of information here means reduction in uncertainty. So isn't the error signal conveying information (i.e., reducing uncertainty) about the _perception_, not the disturbance, by saying the perception is not yet equal to the reference?
  Erling, you may be interested in the opening paragraphs of a chapter (actually called an Annex) I have recently written for a NATO report, which seems to address your question. The Annex (chapter) is about information and uncertainty in arbitrary networks. In that context, the “recipient” or “observer” is a node of the network, and should not be construed as necessarily biological.

  Short answer: Yes, the error signal conveys information about the reference and the perception; the perception carries information about the disturbance and the output; the output carries information about the error and the history of the error (assuming some kind of integration in the output function). The error signal therefore does carry information about the disturbance (and the reference and the output, as well as its own history).

  ---------start quote---------

      Historically, the three main conceptual approaches to the quantitative study of information and entropy may be labelled with the names of their major proponents, namely Boltzmann-Gibbs, Shannon, and Kolmogorov-Chaitin. All three start from quite different basic considerations, but all three arrive at very similar conceptual results. In this Annex, we will use Shannon for the most part, but that is a matter of choice, not of necessity.


      Boltzmann (1866) recognized that the state of an ideal gas was determined completely by the positions and momenta of the individual molecules of the gas, while at the same time the observable properties of the gas were limited to measures such as temperature, volume, pressure, and the like. A myriad of different configurations of positions and momenta could give rise to any given set of values of the observable variables. The values of the observable variables constitute a “macrostate”, while the actual positions and momenta of the individual particles constitute the “microstate” of the system. He discovered that under very reasonable assumptions, the logarithm of the number of different configurations consistent with a particular set of observable values was proportional to its entropy. The entropy therefore represents what you do not know about some World in which you can specify a few general measures. Gibbs (1875-8) refined Boltzmann’s ideas by taking the combined probabilities of the microstates in a macrostate, rather than their number, to define the probability of the macrostate. The difference is important, but the basic idea was due to Boltzmann.

      Boltzmann assumed that for a given total energy of the gas no point in the microstate space that had that energy (sum of squares of molecular velocities) would be more probable *a priori* than any other. Accordingly, the hypervolume of a macrostate within the microstate descriptive space would be proportional to the probability that the microstate would be in that macrostate. The negative logarithm of that probability was the entropy of the macrostate, the measure of what you do not know about the microstate of the gas when you have specified its macrostate. Gibbs’s refinement allowed for the mixture of different constituents and for the interactions among constituents, but both Boltzmann and Gibbs proposed the –K Σ p(i) log p(i) formulation that later suggested to Shannon that the word “entropy” might be used for his “lack of information” measure.


      Shannon (Shannon and Weaver, 1949) was concerned with telephone communication. His basic question was how much you can learn of what a transmitter intended to send when you receive a message through a channel. Before receiving the message, you know something of what messages the transmitter might intend to send, and after receiving the possibly garbled message you know something more about what the transmitter did intend to send. The difference between these two states is the information about the transmitter’s intent obtained from the message.

      Being concerned with the properties of the telephone network, Shannon was concerned with the rate at which a noisy, distorting channel can allow the receiver to become more precise about the transmitter’s intent, not about what that intent might be for different users of the channel. As he pointed out in the introduction to his monograph, the meaning of the messages that might be transmitted through the communication channel is irrelevant when considering the properties of the channel itself. As recently as Denning and Bell (2012), this comment has almost universally been misinterpreted as saying that quantitative information has nothing to do with meaning. In fact, information always has to do with meaning, since it depends entirely on what the receiver believed about something related to the transmitter before and after receiving the message.

      Shannon took the view of an engineer observing both ends of the channel, who could know the set of messages that the transmitter could have sent and their probabilities, as well as the set of messages compatible with whatever was received by the receiver; based on the properties of the message received, the receiver could assign probabilities to the various possibilities for the transmitter’s intent. Before the message was received, the receiver had a certain set of probabilities for the different possibilities, and this set of probabilities limited the amount of information that the receiver could possibly get from receipt of the message. Setting the initial probabilities to be equal for all possible messages defined the maximum amount of information an initially ignorant receiver could possibly get from the message ensemble. This ignorance, however, did not extend to ignorance of the set of possible messages.

      The receiver’s prior knowledge of both the message coding and the set of possible intended messages was key to the effect of the actual received message on the receiver. In other words, although Shannon did not say this explicitly, the inference from what he did say is that a message that is meaningless to the receiver is a null message that conveys no information. Meaning is irrelevant to designing and measuring a communication channel, but meaning is critical to the interpretation of “information.”

      Shannon gave the name “Entropy” to the amount that the receiver does not know about the transmitter’s intent, because the expression for its measurement is exactly the same as the Boltzmann-Gibbs expression for the entropy of a gas. The entropy of a gas was described by Boltzmann and Gibbs as what you don’t know about the microstate of the gas after you observe its macrostate. For Shannon, a “message” defines a macrostate, while the set of possible variants that could specify the same message are the microstates. If the message is perfectly received, there remains only one microstate in the macrostate determined by the receiver of the message.

      It is important to recognize that for Shannon, what constituted a “message” can be treated at several different levels of perception. Suppose that the transmitter intended to send a message inviting the recipient to meet at a certain place and time. This message had to be translated into words, the words into a letter or sound stream, and the stream into a pattern of variation in an electric current. At the receiving end, the receiver might interpret the electrical pattern into a sound or letter stream. At one level of perception, the message would have been correctly received if the stream matched the transmitted stream. However, if that were all that happened at the receiving end, the message would not have been received. From the stream, the message still must be converted into words, and the words into an understanding that the transmitter wanted a meeting, that the meeting should be in a particular location, and that it should be at a particular time. Suppose the transmitter had identified the location as “at Jake’s” and the receiver knew of no place that would be so identified. Would the message have been correctly received at that level? It would have been correctly received in that all the words in it had been exactly as transmitted, but the meaning would not have been conveyed. The message would have conveyed information about when the meeting was requested, but it would not have conveyed information about where. Without considering the meaning of a message to the recipient, the concept of information makes no sense.

      Garner (1962) treated observation of the world as the equivalent of receiving a message. The key difference is that the world had no intention of being observed, so the observation does not result in the attribution of intent to a transmitter. It does, however, have the same property of changing the observer’s distribution of probabilities over the observable states of the world, and of creating a meaning in the observer’s head. Before looking out of the window, you might say there was a 50-50 chance that it is raining. After looking out of the window, your probability distribution is likely to be 100-0. You now know whether or not it is raining, and if the result is “raining”, that means you would get wet if you went out without protection. You have gained one bit of information. Of course, if beforehand you thought the odds were 4:1 that it was raining, you got less information from your observation.
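[As an illustrative sketch (Python; not part of the quoted Annex), Garner's rain example can be computed as the change in the observer's uncertainty before and after the observation:

```python
import math

def uncertainty(probs):
    """Uncertainty in bits: -sum(p * log2(p)), with 0*log(0) taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Before looking out of the window: 50-50 chance of rain.
# After looking: certainty (100-0). Information gained = change in uncertainty.
gain_even = uncertainty([0.5, 0.5]) - uncertainty([1.0, 0.0])    # 1.0 bit

# If beforehand you thought the odds were 4:1 that it was raining,
# you get less information from the same observation.
gain_odds = uncertainty([0.8, 0.2]) - uncertainty([1.0, 0.0])    # about 0.72 bits
```

The even-odds observer gains a full bit; the observer with strong prior odds gains less, exactly as the paragraph says.]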

      Shannon initially used “uncertainty” for his “missing information” measure, until von Neumann suggested “Entropy” because of the identity of Shannon’s and Gibbs’s equations. Garner (1962) used “Uncertainty” exclusively, because “Uncertainty” takes the viewpoint of the receiver/observer, whereas “Entropy” seems to imply a value computed by some omniscient being able to know correctly all the possibilities inherent in a situation. Such omniscience may be available to the engineer designing a communication system, but it is seldom available either to the user of the communication system or to the observer of the world. We use “Uncertainty” in this Annex. Uncertainty is the opposite of precision. If Uncertainty is reduced by one bit, precision is doubled, or increased by one bit.

      The concept of uncertainty implies that we take the viewpoint of Shannon’s receiver, for it is at the receiver that the uncertainty of the message is reduced by what is transmitted through the communication channel. The receiver observes the transmitter through the channel in order to determine what the transmitter means by sending the message. The transmitter may not even be intending to send a message, but by observing, the receiver can learn something meaningful about the transmitter, regardless.

      The word “observe” is important. For Garner, the receiver of a message is simply observing its environment through the communication channel. Whether there is an active transmitter at the other end is irrelevant and possibly unknowable. What matters is how the observation changes the receiver’s (“observer’s”) understanding of the thing observed. If the observer initially believes a measure is between 4 and 6 mm, and after an observation believes it to be between 4.5 and 5.5 mm, the observation has provided one bit of information about the thing measured (it has halved the range of uncertainty about the length in question).
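[The measurement example can be sketched the same way (Python; not part of the quoted Annex), under the illustrative assumption that belief is uniform over the stated range:

```python
import math

def bits_gained(old_range_mm, new_range_mm):
    """Bits gained when an observation narrows a uniformly held range of belief."""
    return math.log2(old_range_mm / new_range_mm)

# From "between 4 and 6 mm" to "between 4.5 and 5.5 mm": the range is halved,
# so the observation provides one bit of information.
one_bit = bits_gained(6 - 4, 5.5 - 4.5)    # 1.0
```

Each further halving of the believed range would add another bit.]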

      The relation between Shannon’s transmitter-receiver language and Garner’s observer-observed language may be encapsulated in Table D.1:

      Table D.1: Mapping between Shannon’s communication terminology and the language of observation

      Shannon Communication                Natural Observation
      Transmitter                          Observable World
      Channel                              Sensors, displays, and processing
      Received Message                     Sensor data
      Interpretation of the message        Perception and understanding
      Set of possible distinct messages    Set of distinct states of the observed world
      “Information” is a term widely used, and, as a technical construct, almost as widely misunderstood. The problem is that the same word is used to mean several different but related concepts, three in particular within the Shannon conceptual structure:

      1. Information gained from a message, which is the change in uncertainty about the transmitter’s intent as a consequence of receipt of the message; one bit of information equates to a doubling of precision.

      2. Information about the state of the transmitter, which is obtained from consideration of all the messages so far received from that transmitter.

      3. Information that could yet be obtained about the state of the transmitter, which is the remaining uncertainty about the transmitter’s state. Sometimes this is misleadingly called “Information in” the transmitter. This usage is the one that causes most misunderstanding, as there is no sense in which “information” resides in anything waiting to get out. It will not be used in this chapter. However, a legitimate form of statement is “information potentially available about” the transmitter, and occasionally a form such as this may be used in what follows.

      Although we will avoid mathematical descriptions where possible, one crucial formula should be kept in mind in any discussion of information and uncertainty. Because of the interests of the moment, the receiver/observer may divide the messages or observations that might be received into classes. For the receiver/observer, at any moment “t” the ith possibility has a probability p_t(i), where “i” ranges over all the possibilities. Given this set of classes and probabilities, the receiver/observer’s uncertainty at that moment about the transmitter’s intent or the state of the thing observed is U(t) = –Σ p_t(i) log p_t(i), and *the information gained by receipt of a message or making an observation is the change in the value of U(t)*.
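[The formula can be sketched numerically (Python; an illustrative example, not part of the quoted Annex):

```python
import math

def U(probs):
    """U(t) = -sum_i p_t(i) * log2 p_t(i): the receiver's uncertainty in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally probable possibilities for the transmitter's intent: U = 2 bits.
before = U([0.25, 0.25, 0.25, 0.25])

# A message rules out two possibilities, leaving two equally probable: U = 1 bit.
after = U([0.5, 0.5])

# The information gained is the change in U(t).
information_gained = before - after    # 1.0 bit
```

Note that the information gained is defined by the receiver's probabilities, not by any property of the message itself.]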

      Shannon’s concept and Boltzmann’s coincide at one point. Shannon’s entropy and Boltzmann’s both depend on the idea that a macrostate contains many possible microstates, and that only the macrostate enters into a measurement or an observation. For Shannon, the question is which macrostate contains the current microstate. For Boltzmann, the macrostate is known, and it is the microstate that is unknown.

      The boundaries of a Shannon macrostate depend on the receiver’s interest at the moment of receiving a message. The symbols “A” and “a”, in various fonts, might all be in the same macrostate, depending on whether or not the receiver is interested in font and capitalization, but “A” and “Q” would usually not be in the same macrostate. Macrostate boundaries define the possible meanings to the receiver/observer of the possible messages or observations.

      The receiver may be completely sure of having received a message correctly (uncertainty zero about the message) while remaining ignorant of other information that might have been obtained from the message had the receiver’s interest been different. For example, in the days of paper mail the recipient of a typed letter might note only the sender and the content, whereas the detective, after the murder of the recipient, finds information in the paper quality, the envelope postmark, and DNA from the licked stamp, all of which could have been available but were of no interest to the original recipient.

      It is worth re-emphasising the fact that information is always about something. What that something might be, the possible meaning of the information, is determined only by the observer/recipient of the information. The receiver’s initial uncertainty is determined not by the actual set of possible messages from which the transmitter might select the actual message, but by what the receiver believes to be the set of possibilities and the probabilities associated with the different messages in that set.

      Likewise, although the possibilities are unlimited for what an arbitrary observer might observe of some part of the real world, any particular observer will observe only in which macrostate the observation belongs. Again, the set of possible macrostates and their initial probabilities is determined only by the observer, and it varies over time. Although the potential information obtainable from observation may be unlimited, the actual information obtainable is limited by the number of macrostates that make a difference to the observer — the limit is log2 N bits (each of N macrostates having a probability 1/N). That limit is reduced if the observer knows anything about the observed part of the world prior to making the observation.
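[The log2 N limit can be checked with a small sketch (Python; not part of the quoted Annex):

```python
import math

def uncertainty(probs):
    """Uncertainty in bits over a set of macrostate probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# With N macrostates the obtainable information is at most log2(N) bits,
# reached when every macrostate has prior probability 1/N.
N = 4
limit = math.log2(N)                       # 2.0 bits
equal_priors = uncertainty([1.0 / N] * N)  # equals the limit

# Any prior knowledge (unequal priors) reduces what an observation can yield.
informed_priors = uncertainty([0.7, 0.1, 0.1, 0.1])  # less than the limit
```

The uniform prior attains the log2 N bound, while the informed prior falls below it, as the paragraph states.]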

---------end quote--------

  I don't know if this is any help to you in understanding the context of the recent discussion. I hope it is. And it reminds me that I should not talk about information about the disturbance being “in the output.” I should say “carried by the output” or some such.