Inconsistent theories (was Re: Chasing the Wind)

[From Bill Powers (2009.09.30.1442 MDT)]

Martin Taylor 2009.09.30.11.14 –

MT: May I requote from my
paragraph you quoted above? “Only a communication engineer would
consider the uncertainty of the character by character reception as being
related to the information transmitted, and the reason for that is that
the link for which the engineer is responsible must carry messages of
arbitrary types.” Isn’t that what Shannon says in your quote (which
I had not seen before)?

BP: Yes, and Shannon precedes that by saying

"…I wish to review briefly what we mean by redundancy. In
communication engineering, we regard information perhaps a little
differently than some of the rest of you do. In particular we are not at
all interested in semantics or the meaning implications of information.
Information for the communication engineer is something he transmits from
one point to another as it is given to him, and it may not have any
meaning at all. It might, for example, be a random sequence of digits, or
it might be information for a guided missile or a television signal.

For communication work, we abstract all properties of the messages except
the statistical properties which turn out to be very important."
Page 123.

Clearly, Shannon represented himself as an example of “we”
communication engineers. His presentation is titled “The redundancy
of English.” The publication is titled “Cybernetics: circular
causal and feedback mechanisms in biological and social systems.
Transactions of the seventh conference, March 23-24, 1950, New York, NY,”
edited by Heinz von Foerster, assistant editors Margaret Mead and Hans
Lukas Teuber. Publisher: The Josiah Macy, Jr. Foundation, 565 Park
Avenue, New York. Copyright 1951. Sorry I didn’t give the full
particulars before.

MT: My reference to Shannon is
to the book that popularized information analysis (Shannon and Weaver,
“The Mathematical Theory of Communication” U of Illinois Press,
1949).

BP earlier: I think your definition of information as reduction of
psychological uncertainty in the receiver of the message is
impractical…

MT: Shannon’s definition, not mine. You added the word
“psychological”.

BP: Yes, because you are talking about reduction of subjective uncertainty,
not calculable uncertainty in a given message. Shannon says very plainly
that he is a communications engineer who is not concerned with
meaning.

MT: Shannon deals
throughout the book with using observation of what was received to reduce
the receiver’s uncertainty as to what might have been transmitted.

BP: When Shannon speaks of the receiver and transmitter, he is talking
about the electronic devices, not the human being.

MT: That reduction is the
information transmitted. I think it’s made most explicit with the diagram
on page 41 and the text in that neighbourhood.

BP earlier: …because to
calculate that you would have to know every possible meaning of
all possible messages, and I believe that is impossible to know.

MT: Why do you make those two
assertions? To me, they both come out of thin air.

BP: They come out of my imperfect understanding of what is meant by reduction
of uncertainty. What I have understood is that to calculate the
information content of a message you must first determine the number of
possible messages, so you can calculate the number of bits required to
distinguish one message from all others. Shannon, in his article on
English, does this in terms of alphabets and statistical distributions of
characters used in messages, plus distributions of digrams and trigrams
and so on. He calculates the redundancy in this way, to see what the
actual minimum number of bits is for sending an unambiguous message using
English letters.
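
As a minimal sketch of the kind of calculation Shannon describes (the
alphabet and frequencies below are invented, not Shannon's figures),
entropy and redundancy fall out of the symbol statistics alone, with no
reference to meaning:

import math

def entropy_bits(probs):
    # Shannon entropy H = -sum(p * log2 p), in bits per symbol.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy four-symbol alphabet with unequal frequencies (illustrative
# numbers only).
freqs = [0.5, 0.25, 0.125, 0.125]

h = entropy_bits(freqs)        # 1.75 bits per symbol for this alphabet
h_max = math.log2(len(freqs))  # 2.0 bits if all symbols were equiprobable

# Redundancy is the fraction of each symbol that is statistically
# predictable and so need not actually be transmitted.
redundancy = 1 - h / h_max
print(h, h_max, redundancy)    # 1.75 2.0 0.125

Digram and trigram statistics extend the same computation to conditional
probabilities; that is how Shannon arrives at his estimates for English.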

MT: As a receiver, you can’t
ever know the meaning intended by the sender, though you can use
“The Test” to make a pretty good guess after enough
back-and-forth interactions. As a sender, you can’t ever know whether the
receiver has gathered the meaning you intended to convey, although you
can make a pretty good guess by applying “The Test” over a
series of interactions. But as a receiver you CAN know something about
what range of meanings you anticipate the sender might be wanting you to
perceive, and you usually do. In most circumstances, one has a reasonably
restricted range of messages one might expect.

BP: Here you are using the term “receiver” in a way not intended by
Shannon, as far as I can see. His receiver is the box with dials and
lights out of which comes the wire going to a printer. Your receiver is
the person, the recipient, to whom the printed output is handed. These
are two completely different things, and the universe of possible
meanings is very different from the universe of possible messages (in
part because the messages themselves, once received, become experiences
that can be meanings: The meaning of “Yours of the 21st inst”
is a particular message). You propose that we can know something about
the range of possible meanings the person reading the message can get
from it, but I doubt, and indeed flatly reject, the idea that this is how
we get meanings from messages. Certainly, the meaning we end up with is
influenced by the meaning we expect to get, based on the general context
of the communications that are going on. But this is hardly a
quantifiable judgment, and it’s not arrived at in the same way we compute
the information content of a string of characters. You can’t go through the
list of all possible meanings to determine what either you or another
person means. Often you may see that there are several possible meanings
and have to pause to decide which one is intended (or find that you can’t
decide), but the several meanings that come to mind are by no means
the same as the full set of possible meanings.

If you were to accept my definition of meanings in the sense used here,
meanings would be simply the perceptions that are indicated by the
messages you get. The number of possible meanings, therefore, would be
the total number of memories of experiences at all levels that you have
associated with words or other symbols. I have no idea, and I doubt that
anyone else does, either, how many of those there are. It must be an
exceedingly large number. The larger the number, of course, the less
completely any one message can specify a meaning, because it takes a large
number of bits to distinguish one element of a large set from all the
others. But that is moot, because I see no possibility of determining the
number of bits required to isolate one meaning, or even determining the
meaning of “one bit” in that context.
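
The arithmetic of distinguishing one element is the easy part; a
hypothetical sketch (the set sizes are invented) shows that even an
astronomically large set needs only a modest number of bits, provided,
and this is the sticking point, that the alternatives could actually be
enumerated and assigned probabilities:

import math

# Bits needed to single out one item among n equiprobable alternatives.
def bits_to_isolate(n):
    return math.log2(n)

print(bits_to_isolate(26))      # one letter out of 26: ~4.7 bits
print(bits_to_isolate(10**6))   # one meaning in a million: ~19.9 bits
print(bits_to_isolate(10**12))  # one meaning in a trillion: ~39.9 bits

The formula assumes a countable set of equiprobable alternatives, which
is exactly what the argument above says cannot be established for
meanings.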

MT: Take your own example. You
hypothesised that the listener quite clearly heard and understood
"Meet me " and later in the message “but leave your car at
the station and walk to the corner of” and “and West”.

BP: I merely assumed that those strings of characters were transmitted
without error, while some of the others were distorted because of
missing, extra, or flipped bits in the transmitted codes. I assume the
recipient of the message (to distinguish that agent from the electronic
“receiver” that accepts the transmission) could find meanings
to go with all the strings that had meanings for the person, except those
strings that did not have any associations. The latter would result
from extreme distortions like those I included in the message and
which I trust you had no trouble distinguishing from meaningful (for you)
strings. Of course strings like “Bring Jane with you” might
also lose meaning (Jane who? I don’t know any Jane I might bring) if you
don’t happen to have a relevant experience with the referent of that
string. On the other hand, a communication engineer would probably find
the most meaning in the garbled strings, because they indicate errors
that the engineer is supposed to prevent.

MT: " The listener also can
perceive roughly the length of the missing bits. Now it’s extremely
unlikely that the first gap contained a dissertation on Hawking radiation
from black holes, and quite likely that it contained a time of day.
Moreover, it is quite likely that the time of day in question is on the
same day, and at a time both parties are expected to be free of other
obligation. The probability distribution of messages that the receiver
might expect is quite a bit less wide than “the meaning of all
possible messages”. The same is true of the second gap, where the
receiver almost certainly has an uncertainty distribution for the
meaning that ranges only over a small number of street corners, and does
not have a high probability for issues concerning solar
energy.

BP: These are all very fuzzy and iffy guesses based on imagination more
than data. You assume that the message will never have any extraneous
references in it, like “Meet me outside the store that has solar
panels in its window.” You have very little ability to narrow the
possible meanings that might occur in any message. And it’s the number of
meanings that MIGHT occur that goes into the denominator when you
calculate information content.
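
MT's picture amounts to prior entropy minus posterior entropy. A minimal
sketch with invented probabilities (24 plausible meeting times plus a
catch-all "something else") shows the calculation, granting for the
moment that such a prior could be written down at all:

import math

def entropy(probs):
    # H = -sum(p * log2 p), in bits.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented prior over what fills the garbled gap: context makes a time
# of day far more likely than anything else.
prior = [0.90 / 24] * 24 + [0.10]  # 24 plausible times + "something else"

# After the gap is heard clearly, the distribution collapses to one value.
posterior = [1.0]

# Information gained = reduction in the recipient's uncertainty.
print(entropy(prior) - entropy(posterior))  # about 4.6 bits

BP's objection is that, unlike times of day, the set of possible
meanings and its probabilities cannot be enumerated, so no such prior is
available to be written down.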

MT: On the other hand, if the
listener was in the middle of listening to a discussion of Hawking
radiation, the words “Meet me”, though clearly spoken, might
not have been immediately or easily understood, since their meaning would
not have been anywhere in the high-likelihood region of meanings within
the lecture.

BP: They might if Hawking says “you will know what a tidal force in
a black hole is if you meet me on my way into it.” You simply can’t
anticipate all the meanings that might crop up in a message.

BP: The message “I don’t
know” has a meaning that’s entirely dependent on what messages
precede and follow it and that is practically an infinite universe of
possibilities. The preceding message might have been “what’s the
density of tungsten?” or “What was John’s answer?”
or “What’s a good crossword clue for
ignorance?”

MT: Yes, of course. I don’t know what point you are trying to make, here.
Everything that happened prior to the message constrains the expectations
for what might be in the garbled gaps, does it not? In what I said above,
I assumed the situation to be that the two people had expected to meet,
but had not arranged when and where. The situation always constrains
one’s expectation of what might be said. I’m sure you have experienced
occasions when someone has said something that comes “out of left
field” and you have not immediately understood it, perhaps to the
extent of asking “Could you say that again?” or “What do
you mean?”

BP: I guess you just don’t see what I’m talking about. Every word we use
in communicating has more than one meaning even inside a single person;
all I’m trying to get across is that I don’t see any reasonable way to
compute the information content in a message in terms of the meanings
recipients get from the message, compared with the total number of
possible meanings. In my opinion, you’re overstraining the information
metaphor when you take it out of the literal, quantitative engineering
context Shannon talked about. Perhaps you have some dazzling succinct
argument that will change my mind about that, but I haven’t seen it so
far.

MT: We verge on another
discussion of what is meant by “meaning”. It’s not a discussion
in which I have either interest or time to contribute much. But in
this context, I take “meaning” to be “influence on a
perception, whether that perception be controlled or not”.

BP: Influence of what on what perception? What perception does the word
“red” influence? I think it makes more sense to say that the
meaning of a word (itself a perception) is some other perception that is
indicated (perhaps through memory association) by the word serving as a
label or pointer.

I should think you would be vitally interested in trying to figure out
what the word “meaning” means, since everything else you say
depends on that.

Best,

Bill P.

[From Rick Marken (2009.09.30.1550)]

Bill Powers (2009.09.30.1442 MDT) to Martin Taylor 2009.09.30.11.14 –

Hey, I thought it was time for us to give up internal nit-picking.

I’m so confused!

Best

James Dean

PS. But I love it! Keep at it;-)

Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[From Bill Powers (2009.09.30.1653 MDT)]

Rick Marken (2009.09.30.1550) --

RM: Hey, I thought it was time for us to give up internal nit-picking.

BP: Just one more for the road and then I swear I'm off that stuff for good.

Best,

Bill P.

[Martin Taylor 2009.09.30.14.51]

[From Bill Powers (2009.09.30.0946 MDT)]
to [Martin Taylor 2009.09.30.10.35] –

[BP] I’ve been annoyed at you before, Martin, and got over it. This
time it’s going to be hard.

[MT now] I’m sorry you are annoyed. It was certainly not my intention
to annoy you. I don’t suppose it is usually your intention to annoy me
when you do it (which is not uncommon). I generally assume you meant no
annoyance, so I don’t worry about it for too long. It is also
interesting that we get annoyed at each other so often, when in
practice we are 95% in agreement about most things (a figure I plucked
out of the air, but which can’t be too far wrong).

This message of yours is one such occasion on which you really have
annoyed me, I hope inadvertently. Perhaps I should use the word
“puzzled” rather than “annoyed”, because I know you are better informed
than you pretended to be in what you wrote – and yet because I know
this I am led to believe you intended to annoy me, even though the
historical evidence leads me not to believe that to be the case. I
therefore have two perceptions in direct conflict: (1) that you
intended to annoy, and (2) that you did not intend to annoy. But (2)
presupposes that you did not understand what I wrote, and on this
occasion I find that truly hard to believe.

Imagine yourself as an uninvolved observer reading the portion of the
dialogue quoted below, and see whether you would think your
contribution to be scientifically rational or simply confrontational.

I started by making an observation that I really had half-expected you
yourself to make so that you could clarify my earlier messages for the
uninitiated. Some of the less engineering-minded might have thought
there was a contradiction between the lack of information about the
current value of d contained in the current value of p and the complete
information about d contained in the complete waveform of p. Since you
didn’t make that clarification as I had half-hoped, I stepped into the
breach, apologetically.

I meant it when I said: “I apologise if it is too obvious to make
explicit”, because I seriously assumed that you would already have seen that
it was obvious, and had just forborne to make the point because you had
assumed everyone else would know it, too.

Here’s the dialogue chunk in question – it’s not the only part of your
message that I found annoying and unfair, but it’s perhaps the worst
part:

******** start quote ********

MT: It might also be worth noting, though I
apologise if it is too obvious to make explicit, that even though the
entire disturbance waveform can be reconstructed from the entire
perceptual waveform, there is almost no information about the current
value of the disturbance in the current value of the perceptual
waveform.

BP: Martin, that’s beneath you, or should be. “Apologize if it is too
obvious to be made explicit!” The whole discussion was about whether
the current value of the disturbance could be deduced from the current
value of the perception. I said it couldn’t, and all my arguments were
aimed at showing why it couldn’t. Now, in effect, you’re saying I was
right all along about the point I was trying to make, but that you were
really making a different point which was “too obvious to be made
explicit.”

MT: As Richard and I have mentioned many times
over the years, if G is a pure integrator, o is completely uncorrelated
with p; informationally, the information from p to o is distributed
equally over all informationally independent samples in its history, a
notionally infinite number unless we start at some time t0 with known
values of p and o. Hence the information available about the current
value of o in the current value of p approaches (using the usual
mathematical sense of “approaches”) zero.

BP: Now you’re really off into some never-never-land. This kind of
gobbledegook smacks of desperation, not erudition. It’s a smokescreen,
a snow job. “Notionally infinite” – Jesus Christ.

MT: I mention this trivial fact because some
might have seen an apparent contradiction between (1) there being
essentially no information about the current value of d in the current
value of p under conditions of excellent control, and (2) the fact that
the d waveform can be totally reconstructed from the p waveform, and
therefore that all the information about d is available in p and its
history.

“Trivial fact.” Well, that does it for me. If you know what you’re
talking about, you’re the only one here who does. I doubt that you will
ever be able actually to perform the calculations you make these claims
about, and the only way you’ll ever convince me that you can do this is
to do it. Call back when you have something to show me.

******** end quote ********

Point 1. You wrongly say that “The whole discussion was about whether
the current value of the disturbance could be deduced from the current
value of the perception. I said it couldn’t, and all my arguments were
aimed at showing why it couldn’t.” And I have also said it couldn’t,
and showed the information-theoretic reasons why this was so. You have
shown the linear mechanical reasons why it was so. We agreed, coming at
the same result by different routes. The discussion was not about that
at all.

Point 2. You wrongly say: “Now, in effect, you’re saying I was right
all along about the point I was trying to make, but that you were
really making a different point which was ‘too obvious to be made
explicit.’” This is simple nonsense, seemingly aimed only at making me
feel bad by casting me as opposing the point I myself made. I was
making the same point I made in 1992 or 1993 and again in my most
recent messages, that although you can reconstruct the disturbance from
the perceptual signals’ total history (or from the perceptual signal
and the output signal, which is the same thing), you can’t get
information about the current value of the disturbance
from the current value of the perception if control is very good.

Point 3. After I pointed out the truism that the information about any
one moment in the disturbance waveform was distributed over a
notionally infinite duration of the perceptual waveform (because that’s
what a pure integrator does), you come out with: “Now you’re really
off into some never-never-land. This kind of gobbledegook smacks of
desperation, not erudition.”

There’s no erudition, gobbledegook, or anything other than a simple
statement of (dare I say it) self-evident fact. That
the number of independent samples in the history of the perceptual (and
disturbance) waveform is notionally infinite is the reason you can
simultaneously have epsilon information about the current value of the
disturbance in the current value of the perception (where epsilon
approaches zero) and nevertheless be able to reconstruct the whole
disturbance from the whole perceptual waveform. And note that I did
include the case of finite duration, too. In that case, the number of
independent samples is not notionally infinite, but even so, the
information from the current value of the perceptual signal about the
current value of the disturbance signal declines sharply over time
since t0.
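
Both halves of this point can be seen in a toy simulation (a minimal
sketch, not from the original exchange, assuming a unit environmental
feedback function, a constant zero reference, and a slow disturbance):

import numpy as np

rng = np.random.default_rng(0)
dt, n, gain = 0.01, 20000, 50.0

# A slow, smoothed random disturbance, well inside the loop's bandwidth.
d = np.convolve(rng.standard_normal(n), np.ones(500) / 500, mode="same")

p = np.zeros(n)  # perceptual signal: p = o + d (unit feedback function)
o = np.zeros(n)  # output signal, driven by a pure integrator
p[0] = d[0]
for t in range(1, n):
    o[t] = o[t - 1] + gain * (0.0 - p[t - 1]) * dt  # integrate the error
    p[t] = o[t] + d[t]

# Moment by moment, p says almost nothing about the current value of d...
print("corr(p, d):", np.corrcoef(p, d)[0, 1])  # small under good control

# ...yet d is fully recoverable from the whole history of p, because o[t]
# is itself a deterministic function of p's past values.
print("max reconstruction error:", np.abs((p - o) - d).max())  # ~0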

Point 4. [BP] “‘Trivial fact.’ Well, that does it for me. If you know
what you’re talking about, you’re the only one here who does.”

[MT now] I certainly hope that’s not true. It should be trivial to just
about anyone (including you) with enough elementary calculus to know
what an integral is; you don’t even have to be able to compute one. I
believe the readership of CSGnet to be pretty well educated.

Are you surprised I am annoyed at your response? Maybe you are
surprised. I don’t think you should be.

Anyway, I don’t expect to stay annoyed for very long. Saddened, for
sure. But that’s the way the world is, and no getting around it.

Now for the rest of your comments.

MT: Bill [From Bill Powers (2009.09.29.1020 MDT)] managed to
“refute” my argument by altering the fundamental
equation of the control system, in that he introduced the noise that he
usually strongly asserts to be irrelevant to control.

BP: I do? I believe that presence of system noise was my very first
explanation, many years ago, of why there is a low correlation between
control actions and controlled quantities, which gets lower as control
improves, and also between disturbances and controlled quantities. If
you can remind me of when I said noise is irrelevant to control, I’ll
be glad to admit that I no longer believe it. It is irrelevant,
usually, to how well the perceptual signal tracks the reference signal
because we’re comparing the noise to the dynamic range of the reference
signal. It’s certainly not irrelevant to the error signal or the effect
of the error on the output.

I’m not going to do an archival research job, which I think would come
up with just about every occasion on which I have introduced
information-theoretic considerations in PCT. So I will accept your
denial. I would, however, point out that noise never is included in the
equations of control normally cited on CSGnet, and it is those
equations that Allan and I used in our analysis. I might also point out
(as I have done before) that noise has little or nothing to do with the
low correlation between control actions and control quantities. Noise
does reduce the correlation, but usually as a secondary effect, the
primary effect being control with a time-binding output function.

MT: One thing I had hoped that Rick or Bill
would bring up, but they didn’t, was that the signal “o” is a function
of the history of the perceptual signal, if the reference signal is
constant over time. Even if the output function G is a simple
proportionality with no inherent time-binding, the signal paths in the
loop have transport lags that ensure the output value added to the
current disturbance is a function of a past value of the perceptual
signal, not of its present value.

BP: The lags in tracking behavior are measured at about 8/60ths of a
second – about 130 milliseconds. The bandwidth of these control
systems is roughly 2.5 Hz, a number that’s been known for around 60
years. This means that the lags are about 35% of the period of a
sine-wave disturbance at the upper frequency limit of control. This is
also indicated by the phase relationships in the Bode plots of such
systems.
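
A quick arithmetic check of those figures:

lag = 8 / 60         # transport lag in seconds: ~0.133 s (133 ms)
period = 1 / 2.5     # period of a 2.5 Hz disturbance: 0.4 s
print(lag / period)  # ~0.33: the lag is roughly a third of the period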

The result is that over most of the bandwidth range, the feedback
effects are opposed to effects of any disturbances that the control
system can resist, and are not independent of the disturbances that
caused them. If the feedback effects always came too late to affect the
input variations, we would have not a control system but an oscillator.

All true. But why did you write the last sentence, which seems
disconnected from what I wrote and at variance with your preceding
sentences? You start by confirming what I wrote, and then produce a
non-sequitur that deals with a different condition entirely. I can
think of no reason why you would do that, unless it was in order to
show that I must be wrong.

BP: This is obvious when you think about what we observe of the effects of
disturbances on controlled quantities. In fact the changes in the
controlled quantity due to disturbances are very much less than what
they would be if there were no feedback. This says that the feedback
effects occur in time to cancel most of the effect of the disturbance.
If the effects of lags were as you describe them, so that the feedback
effects came only from “past values” of the disturbance, the effects on
the controlled quantity would not be reduced. Only if you use
disturbances with bandwidth greater than the bandwidth of good control
do you see a significant effect from the lags – and then the effect is
to allow greater effects of the disturbance and to reduce the ability
to control. The variations in the perceptual signal at the higher
frequencies involved would then be much larger and we would start to
see information in the perceptual signal about the higher frequencies
in the disturbance – but not the lower frequencies.
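A small check of this frequency picture, assuming the pure-integrator
loop from the earlier sketch (the gain value is invented): for that loop
the disturbance-to-perception transfer magnitude at angular frequency w
is w / sqrt(w^2 + gain^2), so low-frequency components of the
disturbance are suppressed while high-frequency components leak through
into the perceptual signal:

import math

gain = 50.0  # integrator gain, 1/s (invented, as in the sketch above)

for hz in (0.1, 0.5, 2.5, 10.0):
    w = 2 * math.pi * hz
    # Fraction of the disturbance at this frequency that survives in p.
    print(hz, "Hz:", w / math.sqrt(w**2 + gain**2))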

What you say is true, except for “If the effects of lags were as you
describe them, so that the feedback effects came only from ‘past
values’ of the disturbance, the effects on the controlled quantity
would not be reduced.” In the preceding paragraph, you quite explicitly
point out that the feedback effects in your tracking studies involve
lags of some 130 msec, meaning that they come only from past values of
the disturbance, and then you say that if such were the case, there
would be no control. Self-contradiction does not enhance the
persuasiveness of the argument.

It would be really nice if you were a little consistent in something
other than a determination to prove me wrong even when we are in
agreement.

MT: I wonder if Bill's original memory of the
analysis by Allan Randall and me as being “open loop” was engendered
by memory of our statement that one didn’t actually need to use the
signal “o” to recover the disturbance waveform from the perceptual
signal.

BP: Very likely, since that would be my present view, too. If you would
do me the favor of restoring my comments about system noise to their
intended meaning, as explained above, you can see that there is no way
to work backward from the perception to the disturbance. In effect,
you’re trying to solve an equation that has two variable quantities of
about equal size, the perceptual signal and the reference signal, being
subtracted one from the other, leaving a very small difference,
amplified to drive the output function. If the perceptual noise is
comparable in magnitude to that difference, the output will have a
large random component. Since you have to subtract the output from the
perceptual signal (in your simplified set of equations) to calculate
the size of the disturbance, your estimate of the size of the
disturbance will be very uncertain – and get more uncertain as the
loop gain increases.

All of which applies equally to the degree of control. If “the output
has a large random component” then control won’t be all that good. The
“open loop” element involves only the replication of the output
function in order to produce a signal that is exactly the output
signal. Whatever effect noise has on the computation of d is exactly
the effect it has on the quality of control. Noise will, of course,
reduce the accuracy of the reconstruction, but it doesn’t prevent any
reconstruction. Where your argument would hold water would be if the
noise were different in the original output pathway and in its
replicate. Also, as I pointed out initially, reconstruction would fail
to the extent that the environmental feedback path was variable, which
includes being noisy. And so, no, I do not see that in the presence of noise
there is no way to work backward from the perception to the
disturbance.
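
A minimal sketch of the situation described here (invented parameters,
with the noise identical in the original pathway and in the replicate,
and a unit feedback function) shows degraded accuracy rather than
failure:

import numpy as np

rng = np.random.default_rng(1)
dt, n, gain, noise_sd = 0.01, 20000, 50.0, 0.05

d = np.convolve(rng.standard_normal(n), np.ones(500) / 500, mode="same")
noise = noise_sd * rng.standard_normal(n)

p = np.zeros(n)  # noisy perceptual signal: p = o + d + noise
o = np.zeros(n)  # output signal (pure integrator acting on noisy p)
p[0] = d[0] + noise[0]
for t in range(1, n):
    o[t] = o[t - 1] + gain * (0.0 - p[t - 1]) * dt
    p[t] = o[t] + d[t] + noise[t]

# An analyst who records p and replicates the output function recovers o
# exactly, because o depends only on past values of the recorded p...
o_rep = np.zeros(n)
for t in range(1, n):
    o_rep[t] = o_rep[t - 1] + gain * (0.0 - p[t - 1]) * dt

# ...so the reconstruction error is just the injected noise, whatever the
# loop gain: accuracy is degraded, but reconstruction is not prevented.
d_est = p - o_rep
print("rms error:", (d_est - d).std(), "vs noise sd:", noise_sd)

If the replicate saw a different noise sample than the original pathway,
o_rep would diverge from o and the recovery would fail, which is the
exception conceded above.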

I should point out, though, that reconstruction was not the object of
my presentation. It was an extreme illustration of the fact that
information about the disturbance waveform is available in the
perception waveform. If the environmental feedback function is
nonlinear, the disturbance waveform will not be reconstructed
correctly, but the reconstructed waveform will be completely
informationally redundant with the disturbance waveform (and to
forestall a claim that I am trying to introduce some more gobbledegook
out of desperation – though desperation for what, I am unable to guess
– I did make that point in my earlier messages).

[BP] I’ve been annoyed at you before, Martin, and got over it. This time
it’s going to be hard.

It might help if you asked yourself what perception is disturbed by
what I write, especially if you remind yourself that I seldom write
anything intended to annoy. Usually, I try to write without “tongue in
cheek”, even when I am responding to something I think fundamentally
silly – as I am now.

Martin