Defining information; challenge

[From Bill Powers (930308.1520)]

Martin Taylor (930308.1200) --

What I like about you, Martin, is the civilized way you respond
when people attack your life's work with bludgeons.

I'm looking forward to seeing your paper on IT and PCT. It is
going to be different from the sort of IT work up with which I
have the most difficulty putting -- that seems clear from your
discussions, e.g.:

Probability is something relating only to the information
available at the point where that information is used. It
relates to individual possible events that might be detected at
that point, not to frequency.

I wonder whether you have considered this point yet:

In your terminology, the perceptual signal in an ECS, an
elementary control system, stands for the state of a CEV -- a
complex environmental variable. As we all seem to agree now, that
CEV is a construct created primarily by the forms of all the
perceptual functions lying between the primary sensory interface
and the place where the perceptual signal finally appears.

When we speak loosely (for coherence and convenience), we say
that the perceptual signal is an analog of some aspect of the
environment. This sets up the picture of the environment and the
aspect of it that exists outside the organism, and a perceptual
signal that represents that aspect. It is natural, then, to say
that the perceptual signal contains information ABOUT that aspect
of the environment.

From there, we can go on to ask how well the perceptual signal
represents that aspect. This is where the difficulty starts. In
order to answer that question, we must have an independent
measure of the aspect of the environment in question, one that
tells us its true state so we can compare that state with the
representation in the form of the magnitude of the perceptual
signal (or any other form, come to think of it).

But this contradicts the previous statements, which say that the
CEV is a construction by the nervous system. From the internal
point of view, the perceptual signal is always a PERFECT
representation of the CEV -- in fact, that signal plus all the
perceptual functions leading to its production defines the CEV.

What, then, if the perceptual signal is noisy? We could say that
this defines a noisy CEV, or we could say that there is a
noiseless CEV out there, and that the perceptual signal is being
derived through a noisy channel -- or any combination of these
effects.

Consider a television set tuned to a very distant station. The
screen shows a picture with a lot of noise dots dancing over its
surface. Inside the person looking at the TV set, presumably, is
a set of signals representing the state of the TV screen. This
set of signals would show an amplitude envelope that would be
varying at a high frequency, like some amount of noise
superimposed on an average picture. These noise signals don't
reflect the state of affairs we would measure with optical
instruments at the face of the TV screen -- for one thing, they
don't show the frame rate, and the highest-frequency noise dots
are smoothed out. So the human perceptions of the TV screen
aren't quite as noisy as the screen itself would look to fast-
responding physical instruments.

What the human being experiences is a noisy set of signals. Some
part of this noise, theoretically, is channel noise in the
perceptual system. So the CEV is defined as being noisy in
exactly the way the perceptual signals are noisy. This implies
that the noisy perceptual signal is a noiseless representation of
the hypothetical CEV.

Of course the person is trying to ignore the noise in the
perception -- meaning that some higher perceptual system must be
constructing something closer to a noiseless CEV, extracting the
average picture, which presumably is closer to what the TV
station is transmitting.
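
A toy sketch of the idea, assuming the higher system is nothing
more than a running average (all numbers illustrative):

  # Toy sketch: a noisy "screen" signal and a running-average
  # perceptual function extracting something closer to the
  # noiseless transmitted picture.
  import random

  true_picture = 5.0
  screen = [true_picture + random.gauss(0, 1.0) for _ in range(1000)]

  def running_average(signal, window):
      # A crude low-pass perceptual function.
      out, acc = [], 0.0
      for i, x in enumerate(signal):
          acc += x
          if i >= window:
              acc -= signal[i - window]
          out.append(acc / min(i + 1, window))
      return out

  perceived = running_average(screen, window=50)
  # The perceived values cluster far more tightly around 5.0 than
  # the raw screen samples do: the higher perception is less noisy
  # than the screen itself.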

How, then, do we define information in this situation?

--------------------------------
Some of our disputes seem to depend on the inherent noisiness or
ambiguity of the information being transmitted; others seem to
come down to the nature of the transmitting channel itself (the
discrete-impulse nature of neural signals).

In the first case, the brain itself could be completely free of
noise and ambiguity, and still have a problem with deciding what
the message is. The brain may have to rely on (noise-free)
estimations of probabilities to resolve such problems, but we're
talking about learned algorithms now, not fundamental problems of
information theory or probability in the brain's own operation.

In the second case, the importance of the channel noise depends
on the signal amplitudes (frequencies) involved in the brain.
Here the problem is dynamic range -- how small an average signal
can be detected in the presence of the irreducible channel noise.
In this context, information theory is simply a version of
statistical analysis cast in terms of logarithms, isn't it? Now
the problem can be handled in many ways. Electronically, we would
handle it in terms of noise power spectra and filters designed to
favor the signal spectrum over the noise spectrum. In analyzing a
system operating in this region, we could use a statistical
analysis or a Bode diagram or probably many other methods -- the
results are equivalent. The statistical approach would call the
unpredictable component of the signal "uncertainty" while an
electronics analysis would call it "noise" -- but the phenomenon
is the same.
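
As a sketch of what I mean (numbers illustrative): for a
Gaussian channel, the capacity per sample is half the logarithm
of one plus the signal-to-noise ratio -- the same variance ratio
a statistical analysis would report.

  # Channel capacity as statistics in logarithms: bits per sample
  # = 0.5*log2(1 + S/N), with S/N the ordinary variance ratio.
  import math

  signal_power = 4.0   # variance of the average signal
  noise_power = 1.0    # variance of the irreducible channel noise

  snr = signal_power / noise_power
  capacity = 0.5 * math.log2(1 + snr)
  print(snr, capacity)   # 4.0, ~1.16 bits per sample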

In this second case I doubt that we have any important
divergences of understanding.
---------------------------------
RE: the challenge experiment

If you change, say, the contrast of a tracked target (or the
tracking cursor), or the bandwidth of the disturbance, or put a
grid on the visual surface...is the integration factor the only
thing that changes?

Another way to ask this question is to ask about the conditions
under which such changes would make a difference. We can try
these things later, if you wish. In previous posts, the question
of disturbance bandwidth has already been raised, and the answer
is that the best-fit integration factor varies with bandwidth.
This could be treated as a problem with nonlinearity in the
system, or as a problem of information theory. I would take the
approach of trying to define a nonlinear function -- perceptual
or output -- as a way of making the same model work over a range
of disturbance bandwidths. The least nonlinearity possible would
be the addition of a square term in the output function, and that
is what I would try first.
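
As a sketch, with hypothetical fitting parameters k1 and k2 (the
e*|e| form keeps the sign that a plain square would lose):

  # Output function with the added square term. k1 and k2 are
  # fitting parameters; dt is the simulation time step.
  def output_update(o, error, k1, k2, dt):
      rate = k1 * error + k2 * error * abs(error)
      return o + rate * dt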

As to varying contrast or background, I suspect that these
factors would not make any difference until some extreme was
reached -- very low contrast, or very strong interfering
background patterns. And there, information theory might surprise
me with some useful predictions. This, however, would bring us
into the realm of behavior near the limits of perception, which
isn't a consideration in most real behaviors.

... the challenge is really for me to show whether information
theory could predict the change of integration factor across
experiments, taking into account the conditions of the
experiment. I don't know how I would do that. The integration
factor is not a construct that I have so far analyzed in my
thinking.

The use of an integration factor is motivated by PCT. You may
find an equivalent motivation in IT, but the way you analyze the
situation for your purposes is independent of PCT. If the
integration factor isn't a natural part of your analysis, there's
no rule that says you have to use one.

In making an analysis for a specific experiment, I would use
other experiments to obtain data--perceptual discrimination
experiments, for instance, or control experiments that use
presumed lower-level control systems common to the specified
experiment.

You have two conditions, with data, to use as you wish -- one
without feedback and one with (conditions 2 and 3, and also 1 if
you can make any use of it). The PCT analysis does not need any
other experiments with perceptual discrimination -- that is, it
uses simple assumptions about perceptual discrimination on which
its performance depends. You can make any reasonable assumptions
you like. You can, for example, assume that given an output
signal, the handle will come instantly to a position proportional
to it. The PCT model assumes that.
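
For concreteness, a sketch of that skeleton (k is the
integration factor, dt the time step; the handle is simply set
to the output):

  # Skeleton of the tracking model: perception = handle effect +
  # disturbance, output integrates the error, and the handle comes
  # instantly to a position equal to the output.
  def run_trial(disturbance, k, dt, reference=0.0):
      o, handle = 0.0, []
      for d in disturbance:
          perception = o + d             # cursor position
          error = reference - perception
          o += k * error * dt            # integrating output function
          handle.append(o)               # handle tracks output instantly
      return handle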

I suspect you will find some difficulties at the point where the
disturbance effects and the regulator's output effects converge
to produce a net effect on the essential variable.

I'm glad you're going to do it. Don't feel pressed to hurry: I'll
take your IOU.
----------------------------------------------------------------
Allan Randall (930308) --
RE: challenge

In order for this to be interesting, I think you need to do
one of two things:

1) Explain in a little more detail why you think information
theory predicts that the compensatory system will perform
better.

I'm just guessing and could be wrong. The object is for an
information theorist to show that I am wrong by showing exactly
how IT would be used to arrive at a correct prediction.

My reason for suspecting that IT will produce the wrong result is
Ashby's analysis on p. 224 of Intro to Cybernetics. On reading
this, I realized that the kinds of measures involved in
information theory or "variety" calculations are such that they
can't predict the equilibrium state of a system with a negative
feedback loop in it. As I just said to Martin above, there is a
problem when two information channels come together to produce an
output signal that affects the essential variable. The nature of
these statistical calculations is that they can say nothing about
the coherence of the two converging signal channels. If the
variables are treated only in terms of information content or
probabilities, they will add in quadrature, so the variance of
the signal from T to E will be larger than the variance of either
signal entering T. It is hard, in fact, to see how even the
disturbance-driven regulator could be predicted to reduce the
variance of E.
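
A toy numerical illustration (parameters arbitrary): treating
the two converging channels as independent predicts quadrature
addition, while the coherence of a working feedback loop, where
the output runs nearly opposite to the disturbance, makes the
variance of the sum collapse.

  import random

  def variance(xs):
      m = sum(xs) / len(xs)
      return sum((x - m) ** 2 for x in xs) / len(xs)

  d = [random.gauss(0, 1) for _ in range(10000)]          # disturbance
  unrelated = [random.gauss(0, 1) for _ in range(10000)]  # independent channel
  loop_out = [-x + random.gauss(0, 0.1) for x in d]       # coherent channel

  print(variance([a + b for a, b in zip(d, unrelated)]))  # ~2.0: quadrature
  print(variance([a + b for a, b in zip(d, loop_out)]))   # ~0.01: coherence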

2) Find an information theorist that actually does make this
claim.

Ashby made it: the disturbance-driven regulator was said to be
potentially perfect, while the error-driven regulator was said to
be inherently imperfect. But I'm simply voicing a strong
suspicion -- that if an information theorist were to analyze this
as a problem in information flow, the wrong answer would be
found. Anybody can prove my suspicion wrong right here, in
public.
----------------------------------------------------------
Best to all,

Bill P.

[Martin Taylor 930309 11:30]
(Bill Powers 930308.1520)

Probability is something relating only to the information
available at the point where that information is used. It
relates to individual possible events that might be detected at
that point, not to frequency.

I wonder whether you have considered this point yet:

In your terminology, the perceptual signal in an ECS, an
elementary control system, stands for the state of a CEV -- a
complex environmental variable. As we all seem to agree now, that
CEV is a construct created primarily by the forms of all the
perceptual functions lying between the primary sensory interface
and the place where the perceptual signal finally appears.

And Bill goes on to say that:

the CEV is a construction by the nervous system. From the internal
point of view, the perceptual signal is always a PERFECT
representation of the CEV -- in fact, that signal plus all the
perceptual functions leading to its production defines the CEV.

This is correct.

Between these two statements, Bill brings in an apparently contradictory
one that also seems correct, so there is an issue:

It is natural, then, to say
that the perceptual signal contains information ABOUT that aspect
of the environment.

From there, we can go on to ask how well the perceptual signal
represents that aspect. This is where the difficulty starts. In
order to answer that question, we must have an independent
measure of the aspect of the environment in question, one that
tells us its true state so we can compare that state with the
representation in the form of the magnitude of the perceptual
signal (or any other form, come to think of it).

The issue is, as usual, one of viewpoint. From the outside view,
there is a complex in the world that seems to be what the "subject"
is controlling. It is the experimenter's view of the putative CEV.
The theorist outsider can also "see" the subject's perceptual signal
that is the actual controlled variable. As far as the subject is
concerned, that signal IS the CEV. It is all that the ECS in question
can know about the state of the world.

There are various kinds of "outsiders," as Bob Clark has pointed out.
One of them is the DME, which views all sorts of signals in the hierarchy.
All outsiders use their own perceptions rather than the one actually
being controlled by the observed ECS. It is from the outsider's viewpoint
that we can see a dichotomy between the CEV in the world and the perceptual
signal. The subject cannot see it.

The outsider, who may be using very precise measuring instruments, can see
that there are discrepancies between the state of the putative CEV and the state
of the derived perceptual signal, even if the total perceptual input function
is correctly interpreted. These discrepancies have to do with the resolution
of the perceptual system. The subject may not be able to detect that any
individual discrepancy exists, but may be able to detect the possibility
that a discrepancy exists, by virtue of the success of control. (This is
much the same in principle as the way astronomers judge the numbers of
meteor craters on the moon that are smaller than they can see, or the way
ecologists judge the number of species never yet identified).

The perceptual signal, in this way of looking at things, does not define
the CEV. It defines the operations on the sensed world that create the CEV,
but the CEV is a structure in the world, not in the mind. It is a conceptual
structure that mirrors the mind, and it may not be detectable to anyone
other than the mind that created it, but nevertheless it is in the world,
not in the mind. For example, a CEV may be "the distance between my fingertip
and my nose." Forgetting the irregularities of skin and the like, there is
a perceived value for that CEV--the perceptual signal that corresponds to
it. If I hold up my finger, I may perceive that distance as stable (or
nearly so, with a slow drift), but I know from other information that, if
I could only see it, there is a rapid oscillation in the distance. Someone
with a laser interferometer could probably measure fluctuations that are
not in my perceptual signal. But I would say that they are in the CEV that
the perceptual input function determines. So, the CEV is not defined by
the perceptual signal; it is represented by the perceptual signal. It is
defined by the perceptual input function.

There's a hidden issue here, one that relates to reorganization. There is
no CEV that corresponds to the function that causes the actions of the
subject to control the intrinsic variable. Reorganization controls the
control operations, but it does not work on any perceptual signal in the
usual sense: a perceptual signal based on a function of sensory input
variables. Reorganization works, but it works only because the behaviour
of the world (unperceived) is factually stable over periods longer than
the time it takes to reorganize. That factual stability can be inferred
from the success of the reorganization. It cannot be perceived (I'm
tempted to say "in principle" but I don't know if I could argue that).
An outsider with a perceptual function that operated over a long time scale
(I include memory here) could perceive the stability that permits
reorganization to happen. Likewise, with a normal perceptual signal
and its corresponding CEV, an outsider could perceive discrepancies
between the CEV and the perceptual signal that represents it, even though
the user of the perceptual signal cannot. But as with reorganization,
the user of the perceptual signal might possibly infer that there is a
factual discrepancy.

I realize that the word "factual" in the above paragraph raises its own
issues about Boss Reality and the like. I assume that all such issues
are resolved against the solipsist position.

Martin

[Allan Randall (930310.1220)]

Bill Powers (930308.1520) writes:

Allan Randall (930308) --
>1) Explain in a little more detail why you think information
>theory predicts that the compensatory system will perform
>better.

I'm just guessing and could be wrong. The object is for an
information theorist to show that I am wrong by showing exactly
how IT would be used to arrive at a correct prediction.

Turning away from information theory and PCT for a moment, let's
travel back to the time of Isaac Newton, to an alternate universe,
where Isaac Newton became an apple farmer and it was his brother
Phil who came up with the laws of motion. Unfortunately, Phil
wasn't bright enough to also invent the differential calculus (but
then he could bake a really nice apple pie). Imagine that someone
has come up with a competing theory to Newton's. They've written a
lovely book full of ambiguous, vague language, making it
impossible to even formulate what exactly their theory is, let
alone put it to experimental test. This book is filled with many
"proofs" using Liebniz's differential calculus. Let's say a trend
develops and there is a whole spate of similarly motivated
physical theories, all in direct competition with Newton's theory.
Newton mounts an opposition against this new movement, and rightly
so. Says Newton:

"Calculus has absolutely nothing useful to say about physical
systems. It can't even make a valid prediction about what will
happen in a real physical experiment. It is obviously bogus and
invalid."

This is exactly how the anti-information theory arguments sound
from my end. Information theory is a tool. There is no reason it
must be able to make correct predictions about real world control
systems in order to be valid. Repeatedly, I have seen criticisms
of information theory in this group that treat it as if it MUST
be some kind of competing theory to PCT.

Information theory is NOT a theory of cognition or of living
systems. Shannon's theory defines information in a way that can
be separated from issues of perception and control in living
systems (it does not have to be, but it can be). Entropy is a
mathematical measure, based on probabilities. Like calculus, it
can be used to better understand many things, such as temperature,
work and heat flow. Ashby applies it to control systems. His
analysis jibes with everything I have learned about PCT. Perhaps
I am wrong - I have much to learn. Perhaps, while it is a
valuable tool for some things, it adds no explanatory power to
PCT. Nonetheless, it is still not a competing theory. Of course,
if Martin can show that PCT follows from information theory, then
I will be most happy. But this would be a fundamental new
discovery. PCT is not currently a subset of information theory,
and predictions consistent with PCT should not be expected from
it, any more than differential calculus can be expected to predict
the orbit of Mercury.

...As I just said to Martin above, there is a
problem when two information channels come together to produce an
output signal that affects the essential variable. The nature of
these statistical calculations is that they can say nothing about
the coherence of the two converging signal channels. If the
variables are treated only in terms of information content or
probabilities, they will add in quadrature, so the variance of
the signal from T to E will be larger than the variance of either
signal entering T.

I guess I just don't understand this. I assume by "variance" you
mean the information content? When two information channels
converge, the result of course will be something of higher entropy.
This is the Second Law of Thermodynamics. However, as long as the
process gives off heat, the resulting information channel may or
may not have higher entropy than the sum of the original two.
In other words, the total information in the resulting channel
equals the sum of the information in the individual channels
only if there is no information loss at the point of
convergence. It sounds like you are assuming all
computations are reversible processes. This is not so.
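
As a toy case (two fair one-bit channels, with an XOR as the
irreversible operation at the point of convergence):

  # Two one-bit channels converge through a many-to-one operation.
  # Information out is less than information in: loss occurred at
  # the point of convergence. Purely illustrative.
  import math
  from collections import Counter
  from itertools import product

  def entropy(outcomes):
      counts = Counter(outcomes)
      n = sum(counts.values())
      return -sum((c / n) * math.log2(c / n) for c in counts.values())

  pairs = list(product([0, 1], repeat=2))   # joint inputs: 2 bits
  merged = [a ^ b for a, b in pairs]        # irreversible convergence

  print(entropy(pairs))    # 2.0 bits in
  print(entropy(merged))   # 1.0 bit out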

The signal to E has very low entropy if the system is controlling.
This is pretty straightforward, no? Now surely you will agree
that if it were impossible to encode the disturbance D into a
number of bits that could be handled by the output channel,
then the system could not control. Thus, for real-world complex
systems with limited bandwidth output channels, you need the
perceptual functions to compress the inputs into a single scalar
value (for comparison) that STILL RETAINS THE ESSENTIAL
CHARACTERISTICS OF THE DISTURBANCE. If no information about the
disturbance can be extracted from this data, then there is
no way the system can translate the error from this signal into
an action on the world that will counter the disturbance. Is this
or is this not true? If you agree, then you are agreeing with an
information theoretic analysis. Minimal encodings are what
information theory is all about. So I ask you: where above did
my reasoning go astray?

Note that to talk about minimal encodings (i.e. information
content), you need an encoding scheme. This is a basic principle
of information theory (the encoding scheme is what Martin calls
the subjective probability distribution). In HPCT, the encoding
scheme is the structure of the hierarchy. The act of structuring
the world into CEVs can be understood in these terms. Maybe it
can be understood in other ways too, but that does not
invalidate the information theoretic perspective.

Note that if we are dealing with a very simple system, and the
disturbance entropy already matches the output capacity (with
some trivially simple encoding scheme) then the Law of Requisite
Information does indeed become rather redundant. But then so
does the hierarchy.

>2) Find an information theorist that actually does make this
>claim.

Ashby made it: the disturbance-driven regulator was said to be
potentially perfect, while the error-driven regulator was said to
be inherently imperfect.

Well, I guess you know how I'm going to respond, but I'll say it
anyway. Yes, Ashby does make this claim, but this is NOT the claim
that you were attributing to information theorists. You were
suggesting that information theory would predict better control
with condition 2 (compensation) in a real world situation than
with condition 3 (error-control). Ashby's claim is quite different.
You have stated it correctly above, and it is quite true. If I can
actually have complete knowledge of the disturbance D, it is
theoretically possible for me to respond appropriately before
it has had an effect on the controlled variable. For instance,
the thermostat in the bath sees someone coming with cold water.
This thermostat also has incredibly powerful sensors that can
detect all the relevant positions/momentums of the particles in
the room necessary to precompute the EXACT location, amount and
timing of hot water it must add to counteract the cold water. Of
course, it also has unlimited (but not infinite) time to compute
the results it needs. This thermostat could actually achieve
perfect regulation of the water temperature within the desired
range! An error control system could never do this. It is
impossible in principle, since by definition the error-control
thermostat cannot act until an imperfection is introduced. The
fact that the compensatory system will NEVER work in the real
world does not change the fact that I can build a simple toy
world in which it does (such as the compensatory systems Ashby
discusses).
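
A toy-world sketch of that claim (all quantities illustrative):
the compensatory controller, knowing the disturbance in advance,
holds the error at exactly zero; the error-driven controller
must let an error develop before it can act.

  def simulate(disturbance, mode, gain=0.5):
      temp, reference, worst = 20.0, 20.0, 0.0
      for d in disturbance:
          if mode == "compensatory":
              action = -d                         # precomputed cancellation
          else:
              action = gain * (reference - temp)  # error-driven
          temp += d + action
          worst = max(worst, abs(reference - temp))
      return worst

  cold_water = [-1.0] * 20
  print(simulate(cold_water, "compensatory"))   # 0.0: perfect regulation
  print(simulate(cold_water, "error-driven"))   # ~2.0: error had to appear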

Note that in the case of this perfect compensatory thermostat,
information theory still applies. The system has a high bandwidth
for output, unbelievably humungous bandwidth for input, and
equally ridiculous processing power. But it still has the problem
of all control systems in complex environments (unlike Ashby's
toy systems): its input bandwidth is much, much bigger than its
output. In this case, the input is compressed by the
algorithms that pre-compute the necessary outputs to correct
for the coming cold water.

As for your challenge, I would only respond if we could nail down
what the challenge really is. If all you want is a quantitative
information-based analysis of the data that is consistent with
a PCT analysis, then this is a perfectly reasonable request.
However, you seemed to be asking for a *prediction* about which
(condition 2 or 3) will be better in a real-world situation.
This seems to be beyond the scope of information theory as it
stands now. Information theory is not a theory of living systems,
and as I mentioned, the Law of Requisite Information applies
equally to the (unrealistic) perfect compensatory system as it
does to the (realistic) error control system. So it will NOT
provide the prediction you are looking for.

-----------------------------------------
Allan Randall, randall@dciem.dciem.dnd.ca
NTT Systems, Inc.
Toronto, ON