[Martin Taylor 980314 2305]

Bill Powers (980314.0529 MST)

I must thank you for providing a largely accurate set of statements,
examples, and analogies which may help others to understand the concept
of information.

>[Martin Taylor 980313 20:50] (replying to Bruce Abbott)--
>
>    "Information" means neither more nor less than "reduction in
>    uncertainty."
>
>The problem here is measuring the uncertainty. Consider your example:
>
>    Example: I look at a clock, a calendar, and a book of tide tables.
>    This gives me information about the phase of the tide at
>    Southampton, and reduces my uncertainty also about the height of
>    the high tide.
>
>When you speak of reduction in uncertainty, this implies a difference
>in two measures of uncertainty, one before and one after obtaining the
>information. But how are such measures made? What is the amount of your
>current uncertainty about the phase of the tide at Southampton?

Approximately 13 hours. The point of the example was that there is no
link of influence either way between my clock and books on the one hand
and the height of the sea on the other. For one variable to convey
information about another no more indicates a physical relationship
between them than does the existence of a high correlation between two
variables.

>Also, the amount of information received (as you define it) can't be
>computed simply from examining the message. The same message might
>greatly reduce one receiver's uncertainty about something, but hardly
>affect another's uncertainty. This says that the information can't be
>in the message.

This is a VERY important point that few people notice. I do try to
keep hammering away at it, especially in my Layered Protocol writings,
but so often people talk about how much (or what) information is in a
message. You are absolutely correct. The amount of information a
message conveys is determined by what the recipient of the message
does with it. Shannon knew that, and wrote it. But few of his followers
note it.

Shannon showed that there is a maximum reduction of uncertainty
about the state(s) of the source of a message that can be achieved by
looking at the state(s) of the receiver of the message. That's the
channel capacity. But he didn't say that this amount of uncertainty
reduction would be achieved with a given message. As you say later,
if the receiver already knew in some other way the state of the source,
the message would convey no information.
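The distinction between the capacity (a ceiling) and what a given message
actually achieves can be made concrete. As a minimal sketch, using the
binary symmetric channel as my illustration (it is not an example from
this thread), the capacity is the maximum uncertainty reduction per
symbol; any particular source or message may achieve less:

```python
import math

def h2(p):
    """Binary entropy in bits: uncertainty of a yes/no event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with error probability p.
    This is the *maximum* reduction in uncertainty about the source per
    received symbol, achieved only with an optimally distributed source."""
    return 1.0 - h2(p)
```

A noiseless channel (p = 0) delivers a full bit per symbol; a channel that
flips symbols half the time (p = 0.5) can reduce the receiver's
uncertainty about the source not at all.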

>Isn't it possible that a reduction in uncertainty can be obtained from
>information that one already has, but that one hasn't thought about
>sufficiently to understand?

Yes, definitely. One's uncertainty about something can clearly be changed
by thinking about it. A good example is the correlation message I posted
this evening. A few days ago I would have given only something like 2:1
odds that a Fourier-decomposable waveform with a zero DC component
was always orthogonal to its integral. Now I would give about 10:1 odds.
I have gained perhaps half a bit of information about that by thinking
(I haven't calculated the amount, but it's a lot less than 1 bit).
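The half-bit figure is easy to check. Treating the proposition as a
yes/no question and converting the odds to probabilities, the standard
binary entropy gives the uncertainty before and after (the odds figures
are the ones above; the calculation itself is mine):

```python
import math

def uncertainty_bits(odds_for, odds_against):
    """Entropy (bits) of a yes/no proposition held at the given odds."""
    p = odds_for / (odds_for + odds_against)
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

before = uncertainty_bits(2, 1)    # 2:1 odds  -> about 0.918 bits
after = uncertainty_bits(10, 1)    # 10:1 odds -> about 0.439 bits
gained = before - after            # about 0.48 bits of information
```

So "perhaps half a bit" is just about right.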

Incidentally, it might be worth making the point that accuracy and
uncertainty reduction are not the same thing. I could be very sure that
X is between 0.5 and 0.5005, and yet very inaccurate, if X is actually 6.
Another observation of X might lead me to change my uncertainty, so that
I now believe X is between 0 and 8. The information I got from the
measurement was negative. But I might be more accurate.

>For example, you probably know the phase of the moon right now, and the
>relative location of Southampton, and the fact that the high tide lags
>the position of the moon by about 90 degrees, so if you thought about
>it you could probably make a reasonable estimate of the state of the
>tide at Southampton without using the tide tables. But not having
>worked this out, you are more uncertain about the state of the tide
>than you need to be.

Yes, just so.

>A simpler example would be to ask you the quotient of 4267/13. When I
>ask you what it is, you are very uncertain, but a second's thought will
>show you that it must be between 300 and 400, which is a great
>reduction in uncertainty. All the information you needed to accomplish
>this reduction was already inside of you, but it hadn't yet affected
>your uncertainty.

Yep.
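The reduction in that example can even be put in bits, under an
assumption the text doesn't state: suppose the prior is uniform over,
say, the integers 1..1000. Narrowing the quotient to the hundred values
300..399 then removes log2(10), about 3.3 bits:

```python
import math

def uniform_uncertainty_bits(n_outcomes):
    """Entropy (bits) of a uniform distribution over n equally likely outcomes."""
    return math.log2(n_outcomes)

before = uniform_uncertainty_bits(1000)  # quotient anywhere in 1..1000 (assumed prior)
after = uniform_uncertainty_bits(100)    # narrowed to the 100 values 300..399
reduction = before - after               # log2(1000/100) = log2(10), about 3.32 bits
```

A different prior would give a different number, but the point stands:
a second's thought produces a measurable reduction in uncertainty.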

>A perfect control system would completely block the flow of that
>information.

The phrase "flow of information" also tends to be misleading. Using it
is like saying "flow of correlation." "Information" and "correlation"
are very similar terms, mathematically. In fact, under certain
conditions of normal distributions and so forth, each can be numerically
converted into the other. Richard Kennaway did that for an example a few
weeks ago.
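The conversion in question is presumably the standard Gaussian one (I
don't have Kennaway's posting in front of me): for two jointly Gaussian
variables with correlation coefficient rho, the mutual information in
bits is -(1/2) log2(1 - rho^2):

```python
import math

def gaussian_mutual_information_bits(rho):
    """Mutual information (bits) between two jointly Gaussian variables
    with correlation coefficient rho: the standard numerical conversion
    between a correlation measure and an information measure."""
    return -0.5 * math.log2(1.0 - rho * rho)
```

Zero correlation converts to zero information, and the information grows
without bound as the correlation approaches 1.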

>You're the one with the archives indexed, but I seem to recall that you
>have spoken more than once about the flow of information from the
>disturbance to the perceptual signal.

If I did, I should not have done. One does, of course, when talking
colloquially, just as one uses a lot of technical terms loosely. If I
tell you something you didn't previously know, and you believe me, I am
quite likely to talk of a "flow of information" from me to you. But what
it means is a reduction of uncertainty about something in your total
state.

When there is a physical link from A to B, such that the state of A
influences the state of B, one is quite likely to talk about the "flow"
of information from A to B, even though mathematically it could equally
well be the other way. It's an easy way of talking, useful at most
times, but misleading when the topic is precisely about information,
uncertainty, causality, and control. The fact that the disturbance
signal influences the perceptual signal leads one to talk incautiously
about the flow of information from one to the other. When I'm talking
about recovering the disturbance signal from the perceptual signal, I
don't talk about the "flow of information" the other way, though I
guess I could--it would seem unnatural, since the perceptual signal
does not influence the disturbance signal.

>Richard's calculation must be based on the assumption that the receiver
>had never obtained any prior information about the subject. If the same
>correlation is computed twice, the amount of information obtained the
>second time must be less than it was the first time.

This is, generally speaking, true. But it's not true that the receiver
never obtained any prior information about the subject. The receiver
had information about the probability distribution of measures of the
subject. The correlation indicates how much the probability distribution
tightens up for the next sample of B when you know the corresponding
sample of A. Measuring the correlation again doesn't change that, though
it may make your estimate of the correlation itself a bit more precise.
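The "tightening" has a standard closed form under the same Gaussian
assumption as before (a sketch, not anything computed in this thread):
once the corresponding sample of A is known, the standard deviation of
B's distribution shrinks by the factor sqrt(1 - rho^2):

```python
import math

def conditional_sigma(sigma_b, rho):
    """Std. dev. of B's distribution once the corresponding sample of A
    is known, for jointly Gaussian A and B with correlation rho. The
    distribution tightens by the factor sqrt(1 - rho^2), regardless of
    how many times the correlation itself has been measured."""
    return sigma_b * math.sqrt(1.0 - rho * rho)
```

With rho = 0.8, for instance, knowing A shrinks the spread of B to 60%
of its unconditional value; with rho = 0, knowing A changes nothing.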

>Richard's calculation was based strictly on the "channel capacity"
>concept of information, not on yours.

The "channel capacity" is a perfectly valid consequence of Shannon's
work. It is what Shannon was mainly interested in. What you call "my
concept" of information is Shannon's axioms, from which he derived
channel capacity. There's no contradiction.

>It's quite possible that a high correlation would coincide with zero
>reduction in uncertainty, if the receiver already had the relevant
>information.

I'm not sure how to reword this so that it is a technically meaningful
statement. Perhaps you are correct. The part that bothers me is
"already had the relevant information." Technically that means "already
had achieved the reduction in uncertainty that would be achieved if...".
"If..." what?

Are you saying that if A (= {a1, a2, ..., an}) and B (= {b1, b2, ..., bn})
are correlated, and you have already measured bi, you get no further
improvement in your estimate of bi from a measurement of ai? If so, of
course you are quite correct. That's all of a piece with the earlier
part of your message.

Anyway, thank you for clear and accurate points and examples.

Martin