information

[Avery Andrews 930316.1103]

I haven't had time to follow the information argument with the attention
it deserves, but here's a thought anyway: maybe the problem is one
of scales. A good control system presumably reduces error
(reference - perception) to zero on an ecologically relevant scale
of precision, i.e. the remaining difference doesn't matter. But it can't
do this on the physiologically relevant scale, since the error times
the gain must produce enough output to oppose the disturbance.
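
A minimal sketch of the scale argument, assuming a pure proportional
loop with p = d + o and o = gain * (r - p). The gain values and the
constant disturbance are illustrative assumptions, not taken from the
thread. Solving the loop gives e = (r - d) / (1 + gain): the error
shrinks as the gain grows but never reaches zero, because error times
gain still has to supply the output that opposes d.

```python
def steady_state(r, d, gain):
    """Solve p = d + o, o = gain * (r - p) for a constant disturbance d."""
    p = (d + gain * r) / (1.0 + gain)   # perception at equilibrium
    e = r - p                           # residual error, (r - d) / (1 + gain)
    o = gain * e                        # output that opposes the disturbance
    return p, e, o


if __name__ == "__main__":
    r, d = 0.0, 10.0                    # assumed reference and disturbance
    for gain in (10.0, 100.0, 1000.0):
        p, e, o = steady_state(r, d, gain)
        print(f"gain={gain:7.1f}  error={e:+.5f}  output={o:+.5f}")
```

On the coarse (ecological) scale the error looks like zero; on the fine
(physiological) scale it is exactly what drives the output.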

The ecologically relevant scale is also the one relevant for
control systems higher up the hierarchy - as long as the lower-level
systems are moving the steering wheel close enough to their
reference levels, the driver doesn't have to attend to wheel
manipulation.

  Avery.Andrews@anu.edu.au

[Martin Taylor 930316 10:10]
(Avery Andrews 930316.1103)

On time scales, I think you are clearly getting the idea, just as you
are with the "mechanical" side of PCT. You cut through to where I have
for a long time been trying to guide the argument step by agreed step,
but with no success because every smallest attempt meets with strong
efforts to maintain the state of a controlled perception "that there
is no information about the disturbance in the perceptual signal."
Once you accept that there is, you ask about rates of information,
and you find very quickly that the posting you made on time scales,
precision and ecological relevance is right on target.

You see through problems to their heart, don't you.

Martin

[From Rick Marken (930316.2100)]

Avery Andrews (930317.1404) --

>Spending a bit more time on the information controversy, it seems
>pretty clear to me that Rick Marken & Bill Powers have run afoul
>of the bizarre fact that `information theory' has nothing to do
>with content.

So that's why it seemed to have nothing to say.

>Now knowing only p(t) or o(t), we
>can't recover d(t), but miraculously, knowing both, we can

Vat miracle? I already admitted that, when p(t) = d(t) + o(t) you
can recover d(t) from p(t) if you know o(t) also -- it's called
algebra (not information theory) and algebra is also pretty content
free.
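
A small sketch of that algebra, assuming a simulated loop in which an
outside observer can record both p(t) and o(t). The loop parameters,
the drifting disturbance and the leaky-integrator output function are
all illustrative assumptions. Given both signals, d(t) = p(t) - o(t)
falls out sample by sample.

```python
import random


def simulate(steps=1000, gain=50.0, slowing=0.01, r=0.0, seed=1):
    """Discrete loop with p = d + o; returns a list of (d, p, o) samples."""
    rng = random.Random(seed)
    d = o = 0.0
    records = []
    for _ in range(steps):
        d += rng.gauss(0.0, 0.05)            # slowly drifting disturbance (assumed)
        p = d + o                            # perception = disturbance + output
        records.append((d, p, o))
        o += slowing * (gain * (r - p) - o)  # assumed leaky-integrator output
    return records


def recover_disturbance(p, o):
    """The algebra from the post: p = d + o, so d = p - o."""
    return p - o


if __name__ == "__main__":
    worst = max(abs(recover_disturbance(p, o) - d) for d, p, o in simulate())
    print("largest reconstruction error:", worst)
```

The recovery needs o(t) as a second input; nothing in this sketch gets
d(t) out of p(t) without it or without knowledge of the output function.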

>What we have to do is figure
>out how the information can in some sense be there, and yet be
>uninformative.

Not a problem. If d(t) is in p(t) (which it is) and you don't have
o(t) then the information about d(t) in p(t) is uninformative;
about as informative as having hieroglyphics and no Rosetta Stone.
I already conceded to Martin that if all he means is that d(t) is
part of p(t) then, no problem, I admit it; I assume it. But it's
not much good to know that (from the control system's point of
view) because the control system has no access to o(t) -- which,
by the way, is the variable that is presumably being generated from
the information in p(t) about d(t) -- information that is only
"informative" when o(t) is available (don't forget, this is a CIRCLE).

>Alternatively, we need to understand the sense in
>which it is there, and the sense in which it is not.

Hallelujah. That's what I've been trying to find out. My intuition is
that the information is not there (even if d(t) is part of p(t)) since
there is no way that the system can get to it.

>This requires that people actively *think* and try to make sense
>out of what other people are saying, rather than relentlessly defending
>their positions.

OK. Let's try this. If the fact that p(t) = d(t) + o(t) means
that there is information in p(t) about d(t) then YES, THERE
IS INFORMATION ABOUT d(t) IN p(t). So it's there but it is of
absolutely no use to the system. So the fact that o(t) = -d(t)
has nothing to do with the fact that there IS this precious
information about d(t) in p(t) -- because the system can't use it.
The relationship of the information in p(t) to the output o(t)
is purely coincidental -- it's like me reading the Gettysburg Address
in Hebrew. I can't read any Hebrew but the information ("Four score
and seven ...") is there, encoded in those funny square squiggles.
I can "read" it because I have the Gettysburg Address memorized.
My output o(t) would be seen as matching the input p(t) by a bilingual
(Hebrew-English) observer. But my output is not based on the information
in p(t) -- even though it's there, for me, it's not.

So if admitting that there is uninformative information in p(t) will
help settle this then I'll sign up to it; somehow I don't feel like
I've given up much ground, though.

Best

Rick

[Avery Andrews 930317.1830]
(Rick Marken (930316.2100))

>So if admitting that there is uninformative information in p(t) will
>help settle this then I'll sign up to it; somehow I don't feel like
>I've given up much ground, though.

I'm not sure whether you have to give up any ground at all - the issue
as I understand it is whether so-called `information theory' (which
maybe should have been called `channel capacity theory' or something
like that) has some relevance to PCT. This debate can't even start
until the content issue is set aside as the terminological kafuffle
that it is. But the idea that channel capacity limitations have
some relevance to the design and function of nervous systems seems
eminently plausible to me.

Here's an argument that something vaguely deserving of the name `information
about d(t)' is present in p(t) and o(t). Suppose we add to our system
two random noise generators, one into p(t), one into o(t), both
downstream from where we are taking our measurements. Switching
either of these generators on will clearly degrade our information
about d(t), and switching both on will degrade it more than switching
one on (to the same level of noisiness). The idea is that if we can
damage the info by injecting noise into a channel, it must in some
sense be there. But in what sense is kinda mysterious, isn't it?
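
The noise experiment can be sketched as follows. This reads it one
particular way, which is an assumption: the noise corrupts only the
observer's readings of p(t) and o(t), not the loop itself, and the
observer estimates d(t) as p_read - o_read. The loop parameters and the
noise level are likewise illustrative.

```python
import math
import random


def run_loop(steps, rng, gain=50.0, slowing=0.01, r=0.0):
    """Discrete loop with p = d + o; returns (d, p, o) samples."""
    d = o = 0.0
    trace = []
    for _ in range(steps):
        d += rng.gauss(0.0, 0.05)            # drifting disturbance (assumed)
        p = d + o
        trace.append((d, p, o))
        o += slowing * (gain * (r - p) - o)  # assumed output function
    return trace


def rms_error(trace, rng, noise_p, noise_o):
    """RMS error of the observer's estimate d_hat = p_read - o_read."""
    total = 0.0
    for d, p, o in trace:
        p_read = p + rng.gauss(0.0, noise_p)  # noisy reading of p(t)
        o_read = o + rng.gauss(0.0, noise_o)  # noisy reading of o(t)
        total += ((p_read - o_read) - d) ** 2
    return math.sqrt(total / len(trace))


def noise_experiment(steps=5000, sigma=0.5, seed=2):
    """Compare the four on/off combinations of the two noise generators."""
    rng = random.Random(seed)
    trace = run_loop(steps, rng)
    return {
        "none": rms_error(trace, rng, 0.0, 0.0),
        "p only": rms_error(trace, rng, sigma, 0.0),
        "o only": rms_error(trace, rng, 0.0, sigma),
        "both": rms_error(trace, rng, sigma, sigma),
    }


if __name__ == "__main__":
    for name, err in noise_experiment().items():
        print(f"{name:7s} rms error in d_hat = {err:.4f}")
```

With independent noise of equal strength on both readings, the error
variances add, so switching both generators on does more damage than
switching on either one alone.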

Avery.Andrews@anu.edu.au

[Allan Randall (930317.1030 EST)]

Before continuing too far with this discussion, I'd like to lay
out some points that I propose we all agree to first, so as to
minimise the terminological confusion that has been rampant in
this discussion so far.

I think we are all realizing that we must be sure that we agree on a
definition of "disturbance" before we continue this discussion -
otherwise we'll just be talking past each other. So, is everybody
agreed on the following definition?

disturbance: the total sum environmental influence on the CEV

This is my understanding of the word and what it means, and I think
it is what is indicated by the usual PCT diagrams. The other possible
definition is to talk about "perceived disturbance," if I may call
it that. This is the sum total disturbance to the CEV, and thus
includes the output of the control system. It is easy to confuse the
two, since the "disturbance" is seen from an outsider's point of view
and "perceived disturbance" is seen from the control system's point
of view. I think we can agree that both are reasonable uses of the
everyday word "disturbance." But "disturbance" in this discussion
should refer to the *external* environmental influences, completely
separated out from the output of the control system itself. Agreed?
Are we also agreed that this disturbance, while defined in this
external point of view, is nonetheless defined in terms of the
CEV, which is defined according to the internal point of view? This
seems to be the meaning of disturbance as it appears in most of the
PCT diagrams: it inputs into the CEV (defined internally) but excludes
the output of the control system (defined externally).

Shall we also agree that the hypothetical entities out there in the
universe that actually cause the disturbance are to be called the
"disturbing variables"?

The problem with all this, and something that must be addressed, is
this: At what point in time, if any, do we include the effects of
*past* output as part of the *current* environmental disturbance? Once
the control system outputs to the environment, it can become quite
intractable to isolate the environmental influences from the past
output of the control system. Even our hypothetical external observer
would not be likely to make such an absolute separation between
control system output and environmental influences. The more time
that goes by, the less tractable it is to separate the two. If the
environment is chaotic, as our universe is, then the trajectory of
the disturbing variables in their phase space will exponentially
diverge with even the tiniest deviation in the output of the
control system. Because of quantum effects, it will at some point
become impossible, even in principle, for our external observer
to separate the disturbing variables from the past output of the
control system. At this point, at the latest, the information to
do the separation is truly lost and I think we should agree that
the effects of the past output be included in the disturbing
variables. The other extreme would be to include *all* past output
(up until a single iteration ago, or some tiny dt) in the current
disturbing variables. I would find it preferable, however, to
recognize that this separation can be to some degree arbitrary, so
long as we realize that there are external "disturbing variables" that
can, for some arbitrary time window, be considered external to the
organism.

Now we need to agree on a working definition of "information." Can
everyone agree that if, by making use of B, it is possible to describe
A with fewer bits, then B contains information about A? In this context,
the percept P contains information about disturbance D if using P
would allow a more compact description of D (with fewer bits) than
not using P. This is as opposed to the complete reconstruction of
D from P, which should not be required to say that P "has information
about" D. In other words, "having information about" does not mean
having *complete* information.
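
One way to make this working definition concrete, assuming "fewer bits"
is read in the Shannon sense of average description length (an
assumption; the wording would also fit an algorithmic-complexity
reading). Under that reading, P has information about D exactly when
the conditional entropy H(D|P) is smaller than H(D). The joint
distribution below is a hypothetical toy example.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0.0)


def bits_saved(joint):
    """joint[(d, p)] = probability. Returns H(D) - H(D|P) in bits."""
    p_d, p_p = {}, {}
    for (d, p), pr in joint.items():
        p_d[d] = p_d.get(d, 0.0) + pr      # marginal distribution of D
        p_p[p] = p_p.get(p, 0.0) + pr      # marginal distribution of P
    h_d = entropy(p_d.values())
    h_d_given_p = entropy(joint.values()) - entropy(p_p.values())
    return h_d - h_d_given_p


# Hypothetical example: D is a fair binary value and P agrees with it
# 90% of the time, so P gives partial, not complete, information.
JOINT = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

if __name__ == "__main__":
    print(f"bits saved per value of D: {bits_saved(JOINT):.3f}")
```

The saving is positive but less than the full 1 bit of H(D), which is
the "information about, without complete reconstruction" case.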

I think we should also decide to stop using the term "information
content" and "negentropy." These terms tend to be endlessly confusing,
as seen in my discussion with Bill Powers, and they are not necessary.
Instead, we will talk in terms of "amount of information," "number of
bits," "entropy," "information loss and gain," and similar terms.

Are we also agreed that the reference signal can be considered, for
the purposes of this discussion, to be constant?

Are we also agreed that the *output*, if not the percept, contains
information about D (however we end up defining the time window of
the disturbing variables)?

I think these are things we need to agree on. If anyone disagrees
with any of these points, then THAT argument will have to be settled
before the current debate will go anywhere.

-----------------------------------------
Allan Randall, randall@dciem.dciem.dnd.ca
NTT Systems, Inc.
Toronto, ON

[Martin Taylor 930317 14:20]
(Avery Andrews 930317.1830)

I said I wasn't going to post any more publicly about information theory
for a while. But it's a bit like a New Year's resolution. Hard to
renounce the fun things in life for too long.

>I'm not sure whether you [Rick] have to give up any ground at all.

The concept of "giving up ground" suggests a contest where there is a
winner and a loser. The metaphor seems very wrong. When someone
has an insight, that person wins, and if the insight can be communicated,
other people win as well. If they change their previous views (as I
suppose they must if they have an insight) they still win. They haven't
"given up ground" by changing their opinion unless maintenance of that
opinion was more important than the truth of that opinion.

>the issue
>as I understand it is whether so-called `information theory' (which
>maybe should have been called `channel capacity theory' or something
>like that) has some relevance to PCT.

I think the major reason why Information Theory has got a bad name is
that it is confused with channel capacities. Shannon, for sure, developed
it in that context, but the metaphor just makes no coherent sense other
than from the Engineer's viewpoint (Bob Clark). When you take the
position on which both relativity theory and PCT are built (that you
can only do something about what you can observe) there are at least three
different information numbers relating to a "channel," and one of them
really does deal with content.

One thing to keep in mind is that information is a differential quantity.
It makes no more sense to talk about the information content of something
(an event, a message) than it does to talk about the "velocity content"
of an object. The conserved quantity is sometimes called "uncertainty"
and is a function of the distribution of subjective probabilities of some
state or event. The absolute uncertainty usually depends on the resolution
of measurement, so it is not a unique number, but if the resolution is
constant, the uncertainty after an event happens can be legitimately
subtracted from the uncertainty before the event. The difference is the
information provided by the event about the uncertain situation. It can
be positive or negative.
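
A short sketch of information as a difference of uncertainties,
assuming uncertainty is measured as Shannon entropy over a fixed set of
discrete alternatives (the fixed resolution mentioned above). The
probability values are hypothetical.

```python
from math import log2


def uncertainty(probs):
    """Entropy in bits of a subjective probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0.0)


def information(before, after):
    """Uncertainty before the event minus uncertainty after it."""
    return uncertainty(before) - uncertainty(after)


if __name__ == "__main__":
    # An event that settles an even bet provides positive information.
    print(information([0.5, 0.5], [1.0, 0.0]))
    # An event that undermines a confident belief provides negative information.
    print(information([0.9, 0.1], [0.5, 0.5]))
```

The second case is the negative one: the observer ends up less certain
than before the event.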

The three uncertainties that could be of interest in a communication channel
are (1) the transmitter's uncertainty derived from the probability of
sending any of the possible symbols or signal values; (2) the receiver's
uncertainty about what symbols or signal values will be sent; (3) the
receiver's uncertainty about some aspect of the world that might be affected
by the symbols or signals sent. (It is easy to add more, such as the
receiver's uncertainty about what will be received, and there are indefinitely
many aspects of the world that could be considered under type 3). None
of these represent the classic "channel capacity" except in limiting
cases in which the prior uncertainty is that all symbols (signal values)
of which the channel is capable are equiprobable at every sampling moment.
(A sampling moment is defined by the electronic or physical bandwidth of
the channel, and limits the rate at which independent information can be
obtained through the channel).

It is a mistake to see the circuits of PCT as "channels" with "capacities"
and to try to derive the behaviour of the circuit from those capacities.
The capacities impose limits, for sure, but they don't necessarily indicate
rates of information acquisition at any point in the system.

When you talk of information "about" something, you are talking of a type 3
number. When you talk of "channel capacity theory" you are talking about
the relation between type 1 and type 2 numbers as seen by a third party.
Channel capacity numbers do not touch the idea of semantic content.
"Information about" does.

By the way, these numbers are invented for the purposes of this posting.
You won't find them in any published stuff on information theory that I
know of.

I'll shut up again, until I next break my resolution.

Martin

[Avery Andrews 930321.0509]

I'd agree entirely that the ECS `knows' nothing about the disturbance, but not
with the claim that the perceptual signal contains no information about it.
After all, we might claim that a video cassette contains the information that
Rosemary erased the tape, even though both it and the VCR it gets
played on aren't able to extract this information. If the reference
signal, output system, and connection between output and perception
are constant, we can reconstruct the disturbance from the perceptual
signal, so some info is there, I'd say (but not for the ECS).
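
A sketch of that reconstruction, assuming the observer knows the
reference, the output function and its starting state, and that
p = d + o. The leaky-integrator output function and all parameter
values are illustrative assumptions. The observer replays the output
function on p(t) alone to regenerate o(t), then takes d = p - o.

```python
import random

GAIN, SLOWING, REF = 50.0, 0.01, 0.0   # assumed, fixed loop parameters


def output_step(o, p):
    """Assumed output function: leaky integration of gain * error."""
    return o + SLOWING * (GAIN * (REF - p) - o)


def run_loop(steps=1000, seed=3):
    """Generate a disturbance and the perceptual signal the loop produces."""
    rng = random.Random(seed)
    d = o = 0.0
    ds, ps = [], []
    for _ in range(steps):
        d += rng.gauss(0.0, 0.05)      # drifting disturbance (assumed)
        p = d + o
        ds.append(d)
        ps.append(p)
        o = output_step(o, p)
    return ds, ps


def reconstruct_disturbance(ps):
    """Replay the known output function on p(t) alone, then d = p - o."""
    o = 0.0                            # known starting state of the output
    d_hat = []
    for p in ps:
        d_hat.append(p - o)
        o = output_step(o, p)
    return d_hat


if __name__ == "__main__":
    ds, ps = run_loop()
    d_hat = reconstruct_disturbance(ps)
    print("largest error:", max(abs(a - b) for a, b in zip(d_hat, ds)))
```

The reconstruction is done by an outside party holding a model of the
loop; the loop itself only ever computes its own output from p(t).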

Avery.Andrews@anu.edu.au