[Martin Taylor 2009.04.06.14.23]
This comment
[From Bill Powers (2009.04.05.1650 MDT)]
In the Schouten experiment, you imagine (I am guessing) that there is
something gradually accumulating information about something, and that
the longer the information is accumulated the more precisely the
something is “known”, whatever that term is supposed to mean. I
can account for the data by imagining something different going on, but
that doesn’t mean I’m right. What it does mean is that the experiment
as it stands is insufficient to let us choose. Given that we have more
than one explanation, the next step has to be to modify the experiment
to pick one over the other.
and this exchange
[Martin Taylor 2009.04.01.14.57]
If you don’t like to think that “information” really exists, what do
you think about “RMS error”? Does that really exist?
[From Bill Powers (2009.04.01.1106 MDT)]
No, it doesn’t. It’s a computation that gives a number useful for
measuring the average variability in a signal. There is nothing in the
environment that corresponds to it, just as there is nothing that
corresponds to an “average.”
may possibly have led me to an insight as to why we have had so much
trouble communicating over the concept of information.
I hope my so-called insight is true, because it has always puzzled me
why Bill and others are quite happy to use measures like RMS variation
and correlation, but are so very antagonistic to using measures related
to information and uncertainty. I’ve long wondered why Bill has kept
writing as though the use of an information measure implied that the
measured perception was happening at some high logical level rather
than being simply an ordinary perception. Why should any perception
subject to an information measure have to be a high-level perception? I
have privately mused as to whether Shannon might have insulted Bill in
some distant past episode, leading to this unwavering opposition to any
use of Shannon’s discoveries. But now I think I see that there is
only a simple misunderstanding about the concept and computation of
“information”.
The first comment quoted above puzzled me, because I could not think
what Bill could mean by “something different going on”, other than that
he disliked measuring information. He has never described anything
“different going on” from what I thought of as “going on”, namely a
rise in a perceptual signal out of a noise floor. So this was a puzzle
until I thought more about his strange wording “something gradually
accumulating information” in the first sentence.
That’s when the insight hit me that he might possibly be thinking of
“information” as some sort of a magic fluid that could be stored and
released. That conception would make sense of something he said in an
earlier message about “phlogiston”, which threw me for a loop at the
time, because I could not see any analogy between “information” and
“phlogiston”.
From my point of view, measuring the informational relationship between
two variables has the same “reality” status as measuring their
correlation or the RMS difference between their values. I asked Bill
the question as to whether he thought “RMS error” really exists,
because if he said “No”, I would then understand why he says
“information” doesn’t really exist, but if he said “Yes” I would then
have had to ask why he thought “information” does not exist. To me, all
three – correlation, RMS error, and information and its companion
measures – are perceptions in the mind of an analyst. Each, in
conjunction with other perceptions, can be used to infer things about
the entities measured, but none of them taken alone tells you more than
that there is a particular kind of relationship between or among the
variables concerned. Like any other perception that is a function of
several variables, each of these measures is perceived by the analyst
as having a reality in the environment.
Now Bill says: "
In the Schouten experiment, you imagine (I am guessing) that there is
something gradually accumulating information about something, and that
the longer the information is accumulated the more precisely the
something is “known”, whatever that term is supposed to mean."
Here is a clue that I should have picked up months or years ago, that
Bill thinks of “information” as some magical fluid that is transported
alongside physical signals. Then he says “I
can account for the data by imagining something different going on,”
which, from previous postings, I take to be the rise of a perceptual
signal above a noise floor. I haven’t commented on this in prior
discussions, other than to point out that Schouten had the same
explanation. I just took it for granted that this is what was
happening. I don’t remember saying so explicitly, though, as I took it
as almost self-evident. I was more concerned to make the point that the
information measure was a measure of how fast that signal was rising
relative to the noise floor. “Relative to the noise floor” is what
makes the informational measure different from a simple measure of the
magnitude of the rising perceptual signal.
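To make that concrete, here is a minimal numerical sketch in Python of
the kind of rise I have in mind. It is not Schouten’s own analysis or a
model of his apparatus; the drift and noise-floor values are invented,
and the “information” computed is simply the transmitted information
for an ideal two-choice decision at each observation time:

    # A sketch only: a two-choice perceptual signal whose mean grows with
    # observation time t while the accumulated noise grows only as sqrt(t),
    # so detectability d' rises as sqrt(t). Parameter values are hypothetical.
    from math import erf, log2, sqrt

    def phi(z):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    def binary_entropy(p):
        # Entropy in bits of a two-valued outcome with probability p.
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * log2(p) + (1 - p) * log2(1 - p))

    drift = 2.0   # growth rate of the perceptual signal (hypothetical)
    sigma = 1.0   # noise-floor standard deviation per unit time (hypothetical)

    for t in (0.05, 0.1, 0.2, 0.4, 0.8):
        d_prime = drift * t / (sigma * sqrt(t))   # rises out of the noise floor
        p_error = phi(-d_prime / 2.0)             # ideal two-choice error rate
        info = 1.0 - binary_entropy(p_error)      # bits transmitted per trial
        print(f"t={t:4.2f}  d'={d_prime:4.2f}  P(err)={p_error:5.3f}  I={info:5.3f} bits")

The point of the sketch is only that the information measure tracks how
fast the signal rises relative to the noise floor, not the signal’s
magnitude by itself.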
Now let’s consider a bit more closely what I think I’m talking about.
First, think of correlation, because that’s a fairly close analogue to
information. You can take a bunch of measures of two variables, xi and
yi, and create a measure called “correlation” = c(xi, yi) or another
called “RMS difference” = r(xi, yi), where c and r are the appropriate
functions over the values of i. You can take that same set of measures
xi and yi and determine various information measures, such as H(x|y) or
H(x,y) (uncertainty of x when you know y, or joint uncertainty of the
distribution of x and y).
So far, there’s no “reality” difference among the measures. They are
all just algorithms applied to sets of variables. They all tell you
what you can know about one variable if you know the other. None of
them say anything about whether one variable has any causal
relationship with the other. Causality between them may be one-way
(either way), two-way, or no-way. You can’t tell simply from measuring
the variables.
If you know from other sources that the two entities can NOT influence
one another, then you can make inferences. If H(x|y), the uncertainty
of x when y is known, is much less than H(x), its uncertainty when y is
unknown, or if the correlation is high between x and
y, or if the RMS difference between x and y is much less than their
individual standard deviations, then if they cannot influence one
another it is almost certain that they are both subject to a common
influence. Even though they cannot influence each other, nevertheless
if you know the values of one, you know more about the values of the
other than you otherwise would. There is an informational relationship
between them with no signal transmission between them. By observing
both, you may learn something about how the common influence is
varying, though you may not know what the common influence is.
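A small sketch of that common-influence case, with an invented
generating process (z is the common influence; nothing passes between
x and y):

    import numpy as np

    rng = np.random.default_rng(1)
    z = rng.normal(size=5000)                   # the common influence
    x = z + rng.normal(scale=0.5, size=5000)    # x is influenced only by z
    y = z + rng.normal(scale=0.5, size=5000)    # y is influenced only by z

    # No path from x to y or from y to x, yet knowing one reduces
    # uncertainty about the other:
    print(np.corrcoef(x, y)[0, 1])   # roughly 0.8 for these invented numbers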
At the other extreme, if you know that X (whose successive values are
xi) causally influences Y (values yi) and that there is nothing that
independently influences both X and Y, then the correlation, RMS
difference (after scaling), or informational relation between X and Y
lets you know whether there are other influences on Y (e.g., noise in
the channel of causal influence, or independent causal influences). The
measures tell you how important those other influences on Y are,
relative to the causal influence of X. The measures don’t tell you what
the other influences might be, but they guide you in considering how
important it is to look for them. Correlation is a relative measure –
a correlation of 0.5 means the other influences on Y outweigh that of X
by 3:1, since X then accounts for only 0.5² = 0.25 of the variance of Y,
leaving 0.75 to everything else. Uncertainty and RMS variation (which
are interconvertible if the distributions are Gaussian) are absolute
measures, which can be made relative by comparing them to information
and scaled RMS deviation.
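For jointly Gaussian variables the interconversion is a standard closed
form, I(X;Y) = -(1/2) log2(1 - ρ²) bits, where ρ² is the fraction of
Y’s variance accounted for by X. A sketch (the correlation values are
arbitrary):

    from math import log2

    for rho in (0.3, 0.5, 0.9, 0.99):
        info_bits = -0.5 * log2(1.0 - rho ** 2)   # mutual information in bits
        ratio = (1.0 - rho ** 2) / rho ** 2       # other influences : X, by variance
        print(f"rho={rho:4.2f}  I={info_bits:5.3f} bits  other:X = {ratio:4.1f}:1")

At rho = 0.5 the ratio printed is exactly the 3:1 mentioned above.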
The details of all this are unimportant. I mention them only to
illustrate the parallels between the measures based on Gaussian
variances and those based on uncertainty measures. In fact, Garner and
McGill (The relation between information and variance analyses.
Psychometrika, 21, 1956, 219ff) showed that everything in an
analysis of variance can be generalized by using information measures,
and that an analysis of information can be more reliable than an
analysis of variance because it does not depend on distributions being
joint Gaussian.
Now I don’t think anyone believes that there is a magic fluid called
“correlation” that flows between an input to a noisy channel such as a
nerve fibre and its output. Why then should anyone think that there is
a magic fluid called “information” that flows along such a connection?
The waveform that emerges from the output is related to the time
pattern of the inputs, and there are many ways of measuring that
relationship, correlation and information being two.
There are also many ways of describing properties of the connection
channel itself. One of them is bandwidth, which tells you how the
channel deals with signals that vary at different rates. Another is
precision, which tells you how variable the output is likely to be for
a given input. Another is channel capacity, which is a kind of amalgam
of the previous two. A communication channel has a higher capacity if
it is more precise or if it can follow quicker variations in the input.
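For the idealized Gaussian channel this amalgam is the familiar
Shannon-Hartley formula, C = B log2(1 + S/N): capacity grows with
bandwidth B and with precision, expressed here as the signal-to-noise
ratio S/N. A sketch with made-up numbers:

    from math import log2

    def capacity(bandwidth_hz, snr):
        # Shannon-Hartley capacity of a Gaussian channel, bits per second.
        return bandwidth_hz * log2(1.0 + snr)

    print(capacity(100.0, 10.0))   # wide band, modest precision: ~346 bit/s
    print(capacity(50.0, 100.0))   # narrower band, higher precision: ~333 bit/s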
When we talk about perception and the perceived environment, we assume
that variations in the environment have a direct causal influence on
the related perception, rather than that something else independently
influences both perception and the environment. We can be sure, since
this is a physical world, that there is some
limit on how precisely a perceptual signal matches the environment of
which it is a function. Where the limitation matters depends on the
circumstances. In this causal link, channel capacity may be the
important limiting quantity, or the limitation might be in bandwidth,
or it might be in ultimate precision achievable after long observation
of something. The momentary value of a perceptual signal is what it is,
but over time, it may fluctuate even when the input is steady. That
fluctuation would be measured as uncertainty in the perceptual signal
if we could ever track the values of a perceptual signal. The
fluctuations that follow fluctuations in the environmental variables of
which the perceptual signal is a function are the transmitted
information. They don’t mimic some magic flow of information; the
information transmitted is their measure.
I assume the misunderstanding about some kind of magic information
fluid flow comes from the phrasing that we tend to use. We always talk
about information transmission instead of signal transmission, because
it is a more general way of dealing with the relationships that are a
consequence of the signals that pass between entities. That should not
be taken as implying that anything is transmitted other than the
signals. Information, uncertainty, entropy, and the like, are
perceptions in the mind of an analyst, measures that may tell much or
little about the measured entities.
I hope this might help reduce the level of misunderstanding about
information as phlogiston, and help reduce miscommunication in future
discussions.
Martin