[Martin Taylor 2012.12.14.10.05]


> [From Erling Jorgensen (2012.12.14.0830 EST)]
>
> Another question I tried to address, or at least ask, was uncertainty about what? That came up in several ways, & I am still not sure whether you or anyone else sees it the same way.
>
> EJ: The technical understanding of information here means reduction in uncertainty. So isn't the error signal conveying information (i.e., reducing uncertainty) about the _perception_, not the disturbance, by saying the perception is not yet equal to the reference?

Erling, you may be interested in the opening paragraphs of a chapter (actually called an Annex) I have recently written for a NATO report, which seems to address your question. The Annex (chapter) is actually about information and uncertainty in arbitrary networks. In that context, the “recipient” or “observer” is a node of the network, and should not be construed as necessarily biological.

Short answer: Yes, the error signal conveys information about the reference and the perception; the perception carries information about the disturbance and the output; the output carries information about the error and the history of the error (assuming some kind of integration in the output function). The error signal therefore does carry information about the disturbance (and the reference and the output, as well as its own history).
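To make that signal-flow claim concrete, here is a minimal sketch (mine, not from the Annex) of a single control loop with an integrating output function. All names and parameter values are illustrative. The error covaries strongly with recent changes in the disturbance, which is one way of seeing that the error signal carries information about the disturbance:

```python
# Minimal illustrative control loop (names and gains are assumptions):
# perception = output + disturbance; error = reference - perception;
# the output function integrates the error.
import math
import random

def run_loop(steps=2000, gain=0.1, reference=0.0, seed=1):
    random.seed(seed)
    output = 0.0
    disturbance = 0.0
    errors, disturbances = [], []
    for _ in range(steps):
        disturbance += random.gauss(0, 0.1)   # slowly drifting disturbance
        perception = output + disturbance      # perception combines output and disturbance
        error = reference - perception         # error compares reference with perception
        output += gain * error                 # integrating output function
        errors.append(error)
        disturbances.append(disturbance)
    return errors, disturbances

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

errors, disturbances = run_loop()
changes = [disturbances[i] - disturbances[i - 1] for i in range(1, len(disturbances))]
# Strongly negative: the error tracks (and so carries information about)
# recent changes in the disturbance.
print(correlation(errors[1:], changes))
```

The correlation is with disturbance *changes* rather than the disturbance level because good control keeps cancelling the level itself, leaving the error dominated by what the output has not yet corrected.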

---------start quote---------

Historically, the three main conceptual approaches to the quantitative study of information and entropy may be labelled with the names of their major proponents, namely Boltzmann-Gibbs, Shannon, and Kolmogorov-Chaitin. All three start from quite different basic considerations, but all three arrive at very similar conceptual results. In this Annex, we will use Shannon for the most part, but that is a matter of choice, not of necessity.

**Boltzmann-Gibbs**

Boltzmann (1866) recognized that the state of an ideal gas was determined completely by the positions and momenta of the individual molecules of the gas, while at the same time the observable properties of the gas were limited to measures such as temperature, volume, pressure, and the like. A myriad different configurations of positions and momenta could give rise to any given set of values of the observable variables. The values of the observable variables constitute a “macrostate”, while the actual positions and momenta of the individual particles constitute the “microstate” of the system. He discovered that under very reasonable assumptions, the logarithm of the number of different configurations consistent with a particular set of observable values was proportional to the entropy of that macrostate. The entropy therefore represents what you do not know about some World in which you can specify a few general measures. Gibbs (1875-8) refined Boltzmann’s ideas by taking the combined probabilities of the microstates in a macrostate, rather than their number, to define the probability of the macrostate. The difference is important, but the basic idea was due to Boltzmann.

Boltzmann assumed that for a given total energy of the gas, no point in the microstate space that had that energy (sum of squares of molecular velocities) would be more probable *a priori* than any other. Accordingly, the hypervolume of a macrostate within the microstate descriptive space would be proportional to the probability that the microstate would be in that macrostate. The negative logarithm of that probability was the entropy of the macrostate, the measure of what you do not know about the microstate of the gas when you have specified its macrostate. Gibbs’s refinement allowed for the mixture of different constituents and for the interactions among constituents, but both Boltzmann and Gibbs proposed the −K Σ p(i) log p(i) formulation that later suggested to Shannon that the word “entropy” might be used for his “lack of information” measure.
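For reference, the two formulations mentioned above can be written out explicitly (these standard forms are my addition, not part of the quoted Annex):

```latex
% Boltzmann: a macrostate containing W equiprobable microstates
S = K \ln W

% Gibbs: the probabilistic form referred to in the text
S = -K \sum_i p(i) \log p(i)
```

When all W microstates are equally probable, p(i) = 1/W and the Gibbs form reduces to the Boltzmann form.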

**Shannon**

Shannon (Shannon and Weaver, 1949) was concerned with telephone communication. His basic question was how much you can learn of what a transmitter intended to send when you receive a message through a channel. Before receiving the message, you know something of what messages the transmitter might intend to send, and after receiving the possibly garbled message you know something more about what the transmitter did intend to send. The difference between these two states is the information about the transmitter’s intent obtained from the message.

Being concerned with the properties of the telephone network, Shannon was concerned with the rate at which a noisy, distorting channel can allow the receiver to become more precise about the transmitter’s intent, not with what that intent might be for different users of the channel. As he pointed out in the introduction to his monograph, the meaning of the messages that might be transmitted through the communication channel is irrelevant when considering the properties of the channel itself. As recently as Denning and Bell (2012), this comment has almost universally been misinterpreted as saying that quantitative information has nothing to do with meaning. In fact, information *always* has to do with meaning, since it depends entirely on what the receiver believed about something related to the transmitter before and after receiving the message.

Shannon took the view of an engineer observing both ends of the channel, who could know the set of messages that the transmitter could have sent and their probabilities, as well as the set of messages compatible with whatever was received by the receiver; based on the properties of the message received, the receiver could assign probabilities to the various possibilities for the transmitter’s intent. Before the message was received, the receiver had a certain set of probabilities for the different possibilities, and this set of probabilities limited the amount of information that the receiver could possibly get from receipt of the message. Setting the initial probabilities to be equal for all possible messages defined the maximum amount of information an initially ignorant receiver could possibly get from the message ensemble. This ignorance, however, did not extend to ignorance of the set of possibilities.

The receiver’s prior knowledge of both the message coding and the set of possible intended messages was key to the effect of the actual received message on the receiver. In other words, although Shannon did not say this explicitly, the inference from what he did say is that a message that is meaningless to the receiver is a null message that conveys no information. Meaning is irrelevant to designing and measuring a communication channel, but meaning is critical to the interpretation of “information.”

Shannon gave the name “Entropy” to the amount that the receiver does not know about the transmitter’s intent, because the expression for its measurement is exactly the same as the Boltzmann-Gibbs expression for the entropy of a gas. The entropy of a gas was described by Boltzmann and Gibbs as what you don’t know about the microstate of the gas after you observe its macrostate. For Shannon, a “message” defines a macrostate, while the set of possible variants that could specify the same message are the microstates. If the message is perfectly received, there remains only one microstate in the macrostate determined by the receiver of the message.

It is important to recognize that for Shannon, what constituted a “message” can be treated at several different levels of perception. Suppose that the transmitter intended to send a message inviting the recipient to meet at a certain place and time. This message had to be translated into words, the words into a letter or sound stream, and the stream into a pattern of variation in an electric current. At the receiving end, the receiver might interpret the electrical pattern into a sound or letter stream. At one level of perception, the message would have been correctly received if the stream matched the transmitted stream. However, if that were all that happened at the receiving end, the message would not have been received. From the stream, the message still must be converted into words, and the words into an understanding that the transmitter wanted a meeting, that the meeting should be in a particular location, and that it should be at a particular time. Suppose the transmitter had identified the location as “at Jake’s” and the receiver knew of no place that would be so identified. Would the message have been correctly received at that level? It would have been correctly received in that all the words in it had been exactly as transmitted, but the *meaning* would not have been conveyed. The message would have conveyed information about when the meeting was requested, but it would not have conveyed information about where. Without considering the meaning of a message to the recipient, the concept of information makes no sense.

Garner (1962) treated observation of the world as the equivalent of receiving a message. The key difference is that the world had no intention of being observed, so the observation does not result in the attribution of intent to a transmitter. It does, however, have the same property of changing the observer’s distribution of probabilities over the observable states of the world, and of creating a meaning in the observer’s head. Before looking out of the window, you might say there was a 50-50 chance that it is raining. After looking out of the window, your probability distribution is likely to be 100-0. You now know whether or not it is raining, and if the result is “raining”, that means you would get wet if you went out without protection. You have gained one bit of information. Of course, if beforehand you thought the odds were 4:1 that it was raining, you got less information from your observation.
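The arithmetic behind the rain example can be checked directly. This sketch (mine, not part of the Annex) computes uncertainty in bits before and after the observation; the information gained is the difference:

```python
# Illustrative calculation: information gained = change in uncertainty,
# where uncertainty is -sum p * log2(p) over the possibilities.
import math

def uncertainty(probs):
    """Uncertainty in bits over a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 50-50 prior, then certainty after looking out of the window:
gain_even = uncertainty([0.5, 0.5]) - uncertainty([1.0, 0.0])
print(gain_even)                 # 1.0 bit

# A 4:1 prior that it is raining yields less from the same look:
gain_biased = uncertainty([0.8, 0.2]) - uncertainty([1.0, 0.0])
print(round(gain_biased, 3))     # 0.722 bits, less than one bit
```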

Shannon initially used “uncertainty” for his “missing information” measure, until von Neumann suggested “Entropy” because of the identity of Shannon’s and Gibbs’s equations. Garner (1962) used “Uncertainty” exclusively, because “Uncertainty” takes the viewpoint of the receiver/observer, whereas “Entropy” seems to imply a value computed by some omniscient being able to know correctly all the possibilities inherent in a situation. Such omniscience may be available to the engineer designing a communication system, but it is seldom available either to the user of the communication system or to the observer of the world. We use “Uncertainty” in this Annex. Uncertainty is the opposite of precision. If Uncertainty is reduced by one bit, precision is doubled, or increased by one bit.

The concept of uncertainty implies that we take the viewpoint of Shannon’s receiver, for it is at the receiver that the uncertainty of the message is reduced by what is transmitted through the communication channel. The receiver observes the transmitter through the channel in order to determine what the transmitter means by sending the message. The transmitter may not even be intending to send a message, but by observing, the receiver can learn something meaningful about the transmitter, regardless.

The word “observe” is important. For Garner, the receiver of a message is simply observing its environment through the communication channel. Whether there is an active transmitter at the other end is irrelevant and possibly unknowable. What matters is how the observation changes the receiver’s (“observer’s”) understanding of the thing observed. If the observer initially believes a measure is between 4 and 6 mm, and after an observation believes it to be between 4.5 and 5.5 mm, the observation has provided one bit of information about the thing measured (it has halved the range of uncertainty about the length in question).
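For a uniform interval like the measurement above, the information gained reduces to the log-ratio of the interval widths (a small illustrative helper, not from the Annex):

```python
# Bits gained when an observation narrows a uniform interval of belief:
# log2(old_width / new_width). Halving the interval yields one bit.
import math

def bits_gained(old_width, new_width):
    return math.log2(old_width / new_width)

print(bits_gained(2.0, 1.0))   # 4-6 mm narrowed to 4.5-5.5 mm: 1.0 bit
```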

The relation between Shannon’s transmitter-receiver language and Garner’s observer-observed language may be encapsulated in Table D.1:

Table D.1: Mapping between Shannon’s communication terminology and the language of observation

| Shannon communication | Natural Observation |
| --- | --- |
| Transmitter | Observable World |
| Receiver | Observer |
| Channel | Sensors, displays, and processing |
| Transmitted Message | Sensor data |
| Received Message | Observer’s perception and understanding |
| Possible distinct messages | Possible distinct states of the observed world |
| Entropy | Uncertainty |
| Information | Information |

“Information” is a term widely used, and, as a technical construct, almost as widely misunderstood. The problem is that the same word is used to mean several different but related concepts, three in particular within the Shannon conceptual structure:

1. Information gained from a message, which is the change in uncertainty about the transmitter’s intent as a consequence of receipt of the message; one bit of information equates to a doubling of precision.

2. Information about the state of the transmitter, which is obtained from consideration of all the messages so far received from that transmitter.

3. Information that could yet be obtained about the state of the transmitter, which is the remaining uncertainty about the transmitter’s state. Sometimes this is misleadingly called “Information in” the transmitter. This usage is the one that causes most misunderstanding, as there is no sense in which “information” resides in anything waiting to get out. It will not be used in this chapter. However, a legitimate form of statement is “information potentially available about” the transmitter, and occasionally a form such as this may be used in what follows.

Although we will avoid mathematical descriptions where possible, one crucial formula should be kept in mind in any discussion of information and uncertainty. Because of interests of the moment, the receiver/observer may divide the messages or observations that might be received into classes. For the receiver/observer, at any moment *t* the i-th possibility has a probability p_t(i), where *i* ranges over all the possibilities. Given this set of classes and probabilities, the receiver/observer’s uncertainty at that moment about the transmitter’s intent or the state of the thing observed is U(t) = −Σ_i p_t(i) log p_t(i), and *the information gained by receipt of a message or making an observation is the change in the value of U(t)*.
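That crucial formula is short enough to sketch directly. The message ensemble below is an invented illustration (mine, not from the Annex): four equiprobable possible messages, and a partially garbled message that rules out two of them:

```python
# U(t) = -sum_i p_t(i) * log2 p_t(i); information gained is the change in U.
import math

def U(probs):
    """Uncertainty in bits over a dict of {possibility: probability}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

prior = {"meet at noon": 0.25, "meet at one": 0.25,
         "cancel": 0.25, "no message": 0.25}

# A garbled message rules out "cancel" and "no message" but cannot
# distinguish "noon" from "one":
posterior = {"meet at noon": 0.5, "meet at one": 0.5,
             "cancel": 0.0, "no message": 0.0}

info_gained = U(prior) - U(posterior)
print(info_gained)   # 2.0 - 1.0 = 1.0 bit
```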

Shannon’s concept and Boltzmann’s coincide at one point. Shannon’s entropy and Boltzmann’s both depend on the idea that a macrostate contains many possible microstates, and that only the macrostate enters into a measurement or an observation. For Shannon, the question is which macrostate contains the current microstate. For Boltzmann, the macrostate is known, and it is the microstate that is unknown.

The boundaries of a Shannon macrostate depend on the receiver’s interest at the moment of receiving a message. The symbols A, a, *A*, *a* might all be in the same macrostate, depending on whether or not the receiver is interested in font and capitalization, but A and Q would usually not be in the same macrostate. Macrostate boundaries define the possible meanings to the receiver/observer of the possible messages or observations.
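The effect of macrostate boundaries on attainable information can be shown numerically. In this sketch (mine, not from the Annex; the symbol names are invented stand-ins for font and case variants), coarser boundaries mean fewer distinguishable states and thus less uncertainty to be resolved:

```python
# Macrostate boundaries set how much information a symbol can convey.
# Eight equiprobable symbol variants collapse to two macrostates when
# the receiver cares only about letter identity.
import math
from collections import Counter

def uncertainty(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

symbols = ["A", "a", "A_italic", "a_italic",
           "Q", "q", "Q_italic", "q_italic"]
micro = Counter(symbols)                         # font and case matter: 8 states
macro = Counter(s[0].upper() for s in symbols)   # letter identity only: A or Q

print(uncertainty(micro))   # 3.0 bits
print(uncertainty(macro))   # 1.0 bit
```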

The receiver may be completely sure of having received a message correctly (uncertainty zero about the message) while remaining ignorant of other information that might have been obtained from the message had the receiver’s interest been different. For example, in the days of paper mail the recipient of a typed letter might note only the sender and the content, whereas the detective, after the murder of the recipient, finds information in the paper quality, the envelope postmark, and DNA from the licked stamp, all of which could have been available but were of no interest to the original recipient.

It is worth re-emphasising the fact that information is always *about* something. What that something might be, the possible meaning of the information, is determined only by the observer/recipient of the information. The receiver’s initial uncertainty is determined not by the actual set of possible messages from which the transmitter might select the actual message, but by what the receiver believes to be the set of possibilities and the probabilities associated with the different messages in that set.

Likewise, although the possibilities are unlimited for what an arbitrary observer might observe of some part of the real world, any particular observer will observe only in which macrostate the observation belongs. Again, the set of possible macrostates and their initial probabilities is determined only by the observer, and it varies over time. Although the potential information obtainable from observation may be unlimited, the actual information obtainable is limited by the number of macrostates that make a difference to the observer: the limit is log_2 N bits (each of N macrostates having a probability 1/N). That limit is reduced if the observer knows anything about the observed part of the world prior to making the observation.

---------end quote---------

I don't know if this is any help to you in understanding the context of the recent discussion. I hope it is. And it reminds me that I should not talk about information about the disturbance being “in the output.” I should say “carried by the output” or some such.

Martin