Dear Bill,
With great interest I am trying to catch up on these posts.
Can you please elaborate a bit more on what you say about
cybernetics and Ashby in the following part:
Determining channel capacity, as you say, is a pretty simple
proposition – no metaphysics needed. But information theory introduces
metaphysics, and that is where IT and I part company. The concept that information
is a reduction in uncertainty comes from confusing an equation used to describe
a phenomenon with the phenomenon itself. I saw this happen in cybernetics, with
Ashby’s “Law of Requisite Variety.” The whole concept of uncertainty
in physics or in casinos is metaphysics. The fact that we sometimes say we are
“uncertain” about something has no meaning outside our private
experiences. It doesn’t mean that there is something in nature called
uncertainty and we are sensing it. And reducing uncertainty can be accomplished
by many means, including getting a good night’s sleep or regaining one’s
self-confidence (justifiably or not).
Thanks,
Arthur Dijkstra
···
From: Control Systems Group Network (CSGnet) [mailto:CSGNET@LISTSERV.ILLINOIS.EDU] On behalf of Bill Powers
Sent: Thursday, April 16, 2009 6:39 PM
To: CSGNET@LISTSERV.ILLINOIS.EDU
Subject: Re: The reality of “information”
[From Bill Powers (2009.04.16.0819 MDT)]

[Rick Marken (2009.04.15.2200)]
Martin Taylor:
The exact same environmental situation can be perceived and controlled in a literally infinite number of different ways.
RM:
That seems to rule out the idea that perception is a process of
communicating to the mind what is actually out there in the
environment. If the same environmental situation can be perceived in
an infinite number of ways, then there is no information to be
transmitted about it. Information theory assumes that there is a
message to be transmitted and received. The message might be a binary
sequence like 1011010. There are 7 bits of information to be
transmitted in this message. If the received message is 1011010 then
we can say that 7 bits of information have been transmitted. If there
is noise in the transmission channel then the message received might
be 10x101x; only 5 bits were transmitted successfully. In this
situation, measuring the amount of information carried by a
transmission channel makes sense and it might even have practical
value; it can tell us how many times a message should be repeated over
a channel so that we can be sure it was received successfully.
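Rick's bookkeeping here is easy to make concrete. A minimal Python sketch, assuming corrupted bits are marked with “x” as in his example:

def bits_transmitted(sent, received):
    """Count bit positions received exactly as they were sent."""
    return sum(s == r for s, r in zip(sent, received))

print(bits_transmitted("1011010", "1011010"))  # 7 -- clean channel
print(bits_transmitted("1011010", "10x101x"))  # 5 -- noisy channel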
BP:
I think you’re getting close to something here. Electrical engineers, or most
people (like me) when they’re being engineers, are naive realists. We assume
that the soldering iron is really there, that the circuit components are really
what they appear to be, and so on. And the communications engineer assumes that
the dots and dashes the telegrapher is sending are really in the sequence that
appears to be happening. Shannon’s job at Bell Labs was to figure out how
faithfully, and how fast, that sequence could be transmitted via some
particular channel to its destination. Fidelity is determined by comparing the
message that was sent against the message that was received. To define
information transfer, or determine channel capacity, you have to know both. If
you receive a message that says “Mary had a libble limb”, for all you
know that is exactly the message that was transmitted, and the channel capacity
was not exceeded. But if the original message was “Mary hab a labble
lamb,” the message was not transmitted faithfully, regardless of what you
expected the original to be. To know what the channel capacity is you have to
have a way of knowing what is really Out There – what message was really sent.
Determining channel capacity, as you say, is a pretty simple proposition – no
metaphysics needed. But information theory introduces metaphysics, and that is
where IT and I part company. The concept that information is a reduction in
uncertainty comes from confusing an equation used to describe a phenomenon with
the phenomenon itself. I saw this happen in cybernetics, with Ashby’s “Law
of Requisite Variety.” The whole concept of uncertainty in physics or in
casinos is metaphysics. The fact that we sometimes say we are
“uncertain” about something has no meaning outside our private
experiences. It doesn’t mean that there is something in nature called
uncertainty and we are sensing it. And reducing uncertainty can be accomplished
by many means, including getting a good night’s sleep or regaining one’s
self-confidence (justifiably or not).
Here is a quote from a Wiki article:
http://en.wikipedia.org/wiki/Information_entropy
Shannon’s entropy represents an absolute limit on the best possible lossless
compression of any communication, under certain constraints: treating messages
to be encoded as a sequence of independent and identically-distributed random
variables, Shannon’s source coding theorem shows that, in the limit, the
average length of the shortest possible representation to encode the messages
in a given alphabet is their entropy divided by the logarithm of the number of
symbols in the target alphabet.
A fair coin has an entropy of one bit. However, if the coin is not fair, then
the uncertainty is lower (if asked to bet on the next outcome, we would bet
preferentially on the most frequent result), and thus the Shannon entropy is
lower. Mathematically, a coin flip is an example of a Bernoulli trial, and its
entropy is given by the binary entropy function. A long string of repeating
characters has an entropy rate of 0, since every character is predictable. The
entropy rate of English text is between 1.0 and 1.5 bits per letter,[1] or as
low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on
human experiments.[2]
==============================================================================
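For concreteness, here is the binary entropy function the quote mentions, as a short Python sketch (the sample probabilities are invented for illustration):

from math import log2

def binary_entropy(p):
    """Shannon entropy, in bits, of a Bernoulli trial with P(heads) = p."""
    if p in (0.0, 1.0):  # a certain outcome carries no Shannon uncertainty
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))  # 1.0 bit -- the fair coin
print(binary_entropy(0.9))  # ~0.47 bits -- a biased coin, lower entropy
print(binary_entropy(1.0))  # 0.0 -- "a long string of repeating characters"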
BP:
My immediate reaction to the first sentence is to start looking for exceptions
to this wild generalization. What do you mean, the “best possible lossless
compression of any communication?” Who says you have exhausted all the
possibilities ever known or that will ever be known? You can do this only by
defining some small universe with only a few possibilities so you can be sure
nothing has been left out – and this is exactly what information theory does.
That is why Shannon has to say “in a given alphabet”. As soon as he
said that, I knew two things: (1) information theory is not about the real
world, and (2) neither Shannon nor anyone else had any idea of the size of the
alphabet needed to encode all possible messages.
Channel capacity is a physical property of the transmission channel itself –
it does not change when you change alphabets. For example, what is the message
you get if you call someone on the telephone and there is no answer? It doesn’t
matter what alphabet you expect the answer to be written or spoken in: no
answer gives you the information that nobody is answering that telephone. You
don’t know why, but there are endless possibilities, including a mass murder, a
fire, or a fickle friend. Considering all the things that might account for the
lack of an answer, it is clearly impossible to find any finite alphabet in
which every answer could be encoded. So information in the sense of knowledge
about the world is not the same thing as Shannon information. Channel capacity
does not tell you how much information the world has to give us, or how fast it
is generating that information.
An interesting thing happened on the way to the internet. Here’s another
reference, clearly somewhat dated:
http://www.skepticfiles.org/cowtext/comput~1/9600info.htm

And some quotes from it:
The roughly 3000-Hz available in the telephone bandwidth poses few problems for 300 bps modems, which only use about one fifth of the bandwidth. A full duplex 1200 bps modem requires about half the available bandwidth, transmitting simultaneously in both directions at 600 baud and using phase modulation to signal two data bits per baud. "Baud rate" is actually a measure of signals per second. Because each signal can represent more than one bit, the baud rate and bps rate of a modem are not necessarily the same. In the case of 1200 bps modems, their baud rate is actually 600 (signals per second) and each signal represents two data bits. By multiplying signals per second with the number of bits represented by each signal one determines the bps rate: 600 signals per second X 2 bits per signal = 1200 bps.

In moving up to 2400 bps, modem designers decided not to use more bandwidth, but to increase speed through a new signalling scheme known as quadrature amplitude modulation (QAM). In QAM, each signal represents four data bits. Both 1200 bps and 2400 bps modems use the same 600 baud rate, but each 1200 bps signal carries two data bits, while each 2400 bps signal carries four data bits:

600 signals per second X 4 bits per signal = 2400 bps.
- - - - - - - -
ECHO-CANCELLATION

This method solves the problem of overlapping transmit and receive channels. Each modem's receiver must try to filter out the echo of its own transmitter and concentrate on the other modem's transmit signal. This presents a tremendous computational problem that significantly increases the complexity -- and cost -- of the modem. But it offers what other schemes don't: simultaneous two-way transmission of data at 9600 bps.

The CCITT "V.32" recommendation for 9600 bps modems includes echo-cancellation. The transmit and receive bands overlap almost completely, each occupying 90 percent of the available bandwidth. Measured by computations per second and bits of resolution, a V.32 modem is roughly 64 times more complex than a 2400 bps modem. This translates directly into added development and production costs which means that it will be some time before V.32 modems can compete in the high-volume modem market.
=================================================================
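The arithmetic in that quote reduces to a single multiplication; a trivial Python check, using only numbers that appear in the quote and in the speed table below:

def bps(baud, bits_per_signal):
    # bits per second = signals per second x bits encoded in each signal
    return baud * bits_per_signal

print(bps(600, 2))   # 1200 bps -- phase modulation, 2 bits per signal
print(bps(600, 4))   # 2400 bps -- QAM, 4 bits per signal
print(bps(2400, 4))  # 9600 bps -- V.32-era signalling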
BP:
… and now we have dial-up modems that run at 56000 bits per second by
compressing the message before transmission and decompressing the received
message. Net-Zero and Juno, I read, can compress text (in the server) to 4% of
its original size and achieve another factor of 25.
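The effect of compression on effective throughput can be sketched with Python's standard zlib module. The repetitive sample text and the resulting ratio below are illustrative only, not NetZero's or Juno's actual figures:

import zlib

text = ("Mary had a little lamb, its fleece was white as snow. " * 200).encode()
compressed = zlib.compress(text, 9)

ratio = len(text) / len(compressed)  # depends entirely on the input
line_rate = 56_000                   # raw bits per second on the wire
effective = line_rate * ratio        # original-text bits delivered per second

print(f"compression ratio: {ratio:.0f}x")
print(f"effective throughput: {effective / 1000:.0f} kbit/s")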
A last quote from http://en.wikipedia.org/wiki/Modem:
================================================================================
List of dialup speeds
Note that the values given are maximum values, and actual values may be slower under certain conditions (for example, noisy phone lines).[4] For a complete list see the companion article List of device bandwidths.

Connection                                              Bitrate
Modem 110 baud                                          0.1 kbit/s
Modem 300 (300 baud) (Bell 103 or V.21)                 0.3 kbit/s
Modem 1200 (600 baud) (Bell 212A or V.22)               1.2 kbit/s
Modem 2400 (600 baud) (V.22bis)                         2.4 kbit/s
Modem 2400 (1200 baud) (V.26bis)                        2.4 kbit/s
Modem 4800 (1600 baud) (V.27ter)                        4.8 kbit/s
Modem 9600 (2400 baud) (V.32)                           9.6 kbit/s
Modem 14.4 (2400 baud) (V.32bis)                        14.4 kbit/s
Modem 28.8 (3200 baud) (V.34)                           28.8 kbit/s
Modem 33.6 (3429 baud) (V.34)                           33.6 kbit/s
Modem 56k (8000/3429 baud) (V.90)                       56.0/33.6 kbit/s
Modem 56k (8000/8000 baud) (V.92)                       56.0/48.0 kbit/s
Bonding modem (two 56k modems) (V.92)                   112.0/96.0 kbit/s [5]
Hardware compression (variable) (V.90/V.42bis)          56.0-220.0 kbit/s
Hardware compression (variable) (V.92/V.44)             56.0-320.0 kbit/s
Server-side web compression (variable) (Netscape ISP)   100.0-1000.0 kbit/s
================================================================================
BP:
Entropy is not easy to define. A good discussion is in
http://www4.ncsu.edu/unity/lockers/users/f/felder/public/kenny/papers/entropy.html
Here is a quote:
If I were able to measure the complete, microscopic state of the air
molecules then I would know all the information there is to know about the
macroscopic state. For example, if I knew the position of every molecule in the
room I could calculate the average density in any macroscopic region. The
reverse is not true, however. If I know the average density of the air in each
cubic centimeter that tells me only how many molecules are in each of these
regions, but it tells me nothing about where exactly the individual molecules
within each such region are. Thus for any particular macrostate there are many
possible corresponding microstates. Roughly speaking, entropy is defined for
any particular macrostate as the number of corresponding microstates.
To recap: The microstate of a system consists of a complete description of the
state of every constituent of the system. In the case of the air this means the
position and velocity of all the molecules. (Going further to the level of
atoms or particles wouldn’t change the arguments here in any important way.)
The macrostate of a system consists of a description of a few, macroscopically
measurable quantities such as average density and temperature. For any
macrostate of the system there are in general many different possible
microstates. Roughly speaking, the entropy of a system in a particular
macrostate is defined to be the number of possible microstates that the system
might be in. (In the appendix I’ll discuss how to make this definition more explicit.)
==================================================================================
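The counting in that quote can be made concrete with the standard textbook toy model of N molecules distributed between the two halves of a room (a simplification of my own, not taken from the quoted article):

from math import comb, log

N = 100  # molecules in the room
for n_left in (50, 90, 100):
    # macrostate: n_left molecules in the left half; microstates: which ones
    microstates = comb(N, n_left)
    entropy = log(microstates)  # Boltzmann entropy in units of k
    print(f"{n_left:3d} in left half: {microstates:.2e} microstates, S = {entropy:.1f}")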
BP:
Since the number of possible microstates of the outside world is rather large,
about all that we can conclude is that the entropy of any macrostate is
infinite.
RM:
This is not the way I think perception works. Perception is not a channel that brings a message about the “true” state of the environment into the brain. The “true” state of the environment could be represented as a binary “message” like 1011010. But I don’t think of this as a real message; it is just the state of a set of physical variables. If what is perceived is, say, some linear combination of a subset of the elements of this “message”, then it makes no sense (it seems to me) to ask how much information about the state of the environment is communicated by the perceptual signal. It just doesn’t seem like a relevant question.
BP:
This is an important observation. If the taste of lemonade consists of
temperature, tartness, sweetness, and other sensations, the perception of
lemonade is not about some corresponding entity in the outside world. There is
no “message” about lemonade coming into the brain. Instead, the
perception is to the individual components as density is to the positions of
individual molecules. The world consists of microstates; perceptions are the
macrostates. One level of perception consists of the microstates of the next
level up, which relatively speaking consists of the macrostates. Obviously we
can’t go from the macrostates to the microstates, although by considering many
different macrostates derived from the same microstates outside, we can begin
to build a fuzzy picture of the microstates. And the control process, plus
reorganization, allows us to manipulate microstates in such a way as to give us
control of the macrostates without even knowing what the microstates are.
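A minimal sketch of that many-to-one mapping, assuming a perceptual input function that is simply a weighted sum of lower-level signals (the weights and signal values are invented for illustration):

def perceive(weights, signals):
    # one perceptual signal (macrostate) from many lower-level signals (microstate)
    return sum(w * s for w, s in zip(weights, signals))

weights = [0.5, 0.25, 0.25]  # e.g. sweetness, tartness, temperature

a = [1.0, 2.0, 2.0]          # two different states of the lower level...
b = [2.0, 1.0, 1.0]

print(perceive(weights, a))  # 1.5
print(perceive(weights, b))  # 1.5 -- same perception; the mapping can't be inverted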
Well, that is metaphysics, too: it’s one level talking about other levels. I
think the most important point you make here is that we can’t consider
perceptual signals as “messages” passed to higher levels. The higher
levels take whatever inputs they want from the existing lower levels; they
create a new set of perceptions from some set of lower-level perceptions, with the
“taking” being done by the receiving entity, not by some transmitting
entity. The lower levels do not decide what they want to say to the higher
levels. Yet the higher levels can tell the lower ones what they want to receive
from them.
Best,
Bill P.