Data compression and information theory in PCT

[From Bill Powers (940418.0800 MDT)]

Martin Taylor (9404xx and previous) --

I was thinking about how to transmit the great store of experimental
data from the sleep experiment from you to me. Using PKZIP will
compress the data, the amount of compression depending on the degree
of difficulty (because that affects the number of consecutive
repeats of the same binary number). Then it occurred to me that the
degree of compression must reflect the amount of Information in the
compressed file: redundancy is reduced by the compression, so the
compressed data would be a better indication of real Information
than the uncompressed data.

Then the idea popped up that the method of compression itself can be
used to further reduce redundancy. One part of the compression
method is to create a new kind of code in which the most frequent
bytes are represented by the shortest codes; this means that if the
amplitude of transmitted data could be reduced, there would be fewer
different bytes in the data and shorter codes would be used
throughout. Of course if the data were simply scaled down, this
would lose information. But there is a way to scale the data down
WITHOUT losing information, by using the behavioral model.

Given the name of the disturbance table (referring to the tables
that both you and I have), the value of delay, and the value of the
integration factor k, plus the algorithm for the model which we both
have, we can transmit the behavior of the model back and forth just
by naming the disturbance table and transmitting d and k. This will
transmit an exact reproduction of the model's behavior by sending
only 13 bytes (1 for delay, 4 for floating-point value of k, and 8
for the disturbance table name).
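
As a minimal sketch of that 13-byte message, the three values can be packed with Python's standard `struct` module. The field order, the little-endian layout, the space padding of the name, and the example table name "DIST0042" are all assumptions made for illustration; the post does not specify a wire format. The helper `as_sent` is also hypothetical. It shows that a 4-byte float does not carry k exactly, so the sender would have to run its own copy of the model with the rounded value for the two runs to agree.

```python
import struct

# Hypothetical wire layout: 1-byte delay, 4-byte float k, 8-byte table name.
# "<" selects little-endian with no padding, so the size is exactly 1 + 4 + 8.
HEADER = struct.Struct("<Bf8s")

def pack_params(delay, k, table_name):
    """Pack delay, k and the table name into a 13-byte message."""
    name = table_name.encode("ascii")
    if len(name) > 8:
        raise ValueError("table name must fit in 8 bytes")
    return HEADER.pack(delay, k, name.ljust(8, b" "))

def unpack_params(blob):
    """Recover (delay, k, table_name) from a 13-byte message."""
    delay, k, name = HEADER.unpack(blob)
    return delay, k, name.decode("ascii").rstrip()

def as_sent(k):
    """Round k to the 32-bit value the receiver will actually see."""
    return struct.unpack("<f", struct.pack("<f", k))[0]

blob = pack_params(8, 0.1, "DIST0042")
print(len(blob))            # 13
print(unpack_params(blob))  # k comes back as the float32 rounding of 0.1
```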

One of the plots shown during analysis of the data is the difference
between the real handle position and the model's handle position.
For good tracking, this has an amplitude that is ten percent or less
of the amplitude of the handle and disturbance excursions. This
difference is also exact, since it results from subtracting the
model's handle position from the real handle position, both being
integer tables.

If a file is created from (realh - modelh), it will have an
amplitude far less than that in the record of the real handle
positions. It will compress to a much smaller file. Transmitting
this file, plus the 13 bytes required to transmit d, k, and the
disturbance table name, will enable exact reconstruction of the
behavior of the real handle position. All that is required is to
plug in the delay and integration factor, plus the correct
disturbance table, and run the model to get the original model
handle behavior. Adding (realh - modelh) to this will reproduce the
real handle data exactly. That, plus the disturbance table, allows
complete and exact reproduction of all variables in the experimental
run.
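
The scheme can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the sinusoidal `disturbance_table`, the particular integrator-with-delay form of `run_model`, the parameter values, and above all the "real" handle record, which is synthetic (model output plus small integer noise) and not experimental data. `zlib` stands in for PKZIP; both use the DEFLATE method.

```python
import math
import random
import struct
import zlib

def disturbance_table(n=1800):
    """Stand-in for a shared table: a slow, smooth integer waveform."""
    return [round(200 * math.sin(2 * math.pi * i / 600)
                  + 100 * math.sin(2 * math.pi * i / 170 + 1.0))
            for i in range(n)]

def run_model(dist, delay, k, ref=0.0):
    """Assumed model: the handle integrates k * (ref - delayed cursor)."""
    handle = 0.0
    cursors = [0.0] * delay          # cursor values still "in transit"
    out = []
    for d in dist:
        cursors.append(handle + d)   # cursor = handle + disturbance
        perceived = cursors.pop(0)   # seen `delay` samples late
        handle += k * (ref - perceived)
        out.append(round(handle))    # integer table, as in the post
    return out

def pack(values):
    """Serialize integers as little-endian 16-bit values, then deflate."""
    raw = struct.pack(f"<{len(values)}h", *values)
    return zlib.compress(raw, 9)

def unpack(blob):
    """Inverse of pack()."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))

dist = disturbance_table()
delay, k = 8, 0.125                  # 0.125 is exact in 32-bit floating point
model_h = run_model(dist, delay, k)

# Synthetic "real" handle: model plus small integer noise (NOT real data).
rng = random.Random(0)
real_h = [m + rng.randint(-2, 2) for m in model_h]

# Sender: transmit only the compressed residual (plus delay, k, table name).
residual_blob = pack([r - m for r, m in zip(real_h, model_h)])

# Receiver: rerun the model from the shared table and add the residual back.
rebuilt = [m + e for m, e in zip(run_model(dist, delay, k),
                                 unpack(residual_blob))]

print(rebuilt == real_h)
print(len(residual_blob), "bytes for residual vs",
      len(pack(real_h)), "bytes for the raw handle record")
```

The reconstruction is exact only because both ends run the identical algorithm with identical parameter values and integer outputs; any difference in arithmetic between the two machines would show up as an error in the rebuilt record.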

I temporarily don't have enough space on my disk for the disturbance
tables, so I haven't tried this yet. But I thought I'd relay this
idea because it bears on the question of how to determine
Information content in experimental data. Clearly, the best-fit
model contains a _great deal_ of the total Information about the
subject's behavior. In fact, most of the _new_ Information about the
whole experimental run is contained in the record of real-handle
minus model-handle. All the rest of the new information is contained
in the 13 bytes required to reconstruct the model-handle behavior
using the model algorithm (which contains still more information).
And it is actually much less than 13 bytes-worth, because the range
of the delay is only about 15, the range of k is only about
100 (in terms of the least meaningful difference), and there are
only 168 disturbance tables. That data actually requires only about
3 bytes to transmit in the most compact way.
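
The "about 3 bytes" figure can be checked with a short calculation, taking the three ranges quoted above as roughly 15, 100, and 168 distinct values (assigning the 15 to the delay is an assumption). The mixed-radix `encode` and `decode` helpers are hypothetical and only show one way such a compact code could be built.

```python
import math

N_DELAY, N_K, N_TABLE = 15, 100, 168   # approximate counts from the text

combos = N_DELAY * N_K * N_TABLE       # distinct (delay, k, table) triples
bits = math.log2(combos)               # bits needed to pick one of them
n_bytes = math.ceil(bits / 8)
print(combos, round(bits, 2), n_bytes)  # 252000 17.94 3

def encode(delay_idx, k_idx, table_idx):
    """Mixed-radix packing of three zero-based indices into n_bytes bytes."""
    index = (delay_idx * N_K + k_idx) * N_TABLE + table_idx
    return index.to_bytes(n_bytes, "big")

def decode(blob):
    """Inverse of encode()."""
    index = int.from_bytes(blob, "big")
    index, table_idx = divmod(index, N_TABLE)
    delay_idx, k_idx = divmod(index, N_K)
    return delay_idx, k_idx, table_idx

print(decode(encode(14, 99, 167)))     # (14, 99, 167)
```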

If the model were perfect (fit the real data exactly), then the N
bits that describe the model's parameters would perfectly predict
the behavior of the real person, given only the disturbance table.
Far more than N bits is needed to describe the algorithm, so most of
the Information about the behavior (other than that in the
disturbance and in noise) is contained in the algorithm -- in the
presumed physical structure of the controlling system.

In PCT we say that the disturbance, given a constant reference
signal, determines almost exactly the action of a good control
system. It stands to reason that the information in the disturbance
is reflected almost perfectly as information in the action waveform.
The structure of the control system acts as a transformer that makes
the output mirror the disturbance relative to the reference level of
the controlled variable, and so creates Information in the output
that is quantitatively almost equal to the Information in the
disturbance. There is no need for this information to exist between
the disturbance and the output: I believe you have said that there
is no law of conservation of information. Whatever information about
the disturbance there is inside the control system may well be in an
inaccessible form that reappears explicitly only at the output, or
in the effect of output on the controlled variable.

So in summary I'm suggesting a way of assessing information in a
control system that is different from simply starting with variables
and signals and trying to trace their information content through
the control system. This is a model-based approach, in which a model
is constructed that predicts almost all of the behavior that is
seen. Only the difference between the model and the actual behavior
contains new information about the control system.

And of course, this idea may also end up giving us a way of
transmitting large amounts of behavioral data in an extremely
compressed form.

I'm sending this publicly because it may be of interest to others
outside our own applications.

Best,

Bill P.

<Martin Taylor 940418 14:50>

Bill Powers (940418.0800 MDT) and previous.

Bill,

I find it hard to reconcile your referenced posting (about information and
data compression) with <Bill Powers (940415.0705 MDT)>:

>I keep telling you that I'm not as smart as you think.

Your compression posting is spot-on. So far as I can see on first
perusal, there's nothing I would change.

A couple of observations by way of extension, not criticism:

>So in summary I'm suggesting a way of assessing information in a
>control system that is different from simply starting with variables
>and signals and trying to trace their information content through
>the control system. This is a model-based approach, in which a model
>is constructed that predicts almost all of the behavior that is
>seen. Only the difference between the model and the actual behavior
>contains new information about the control system.

For this approach to work, both parties must have the same model. The
"information" in the message for the recipient is the description of how
those data relate to what is already known--the structure and parameters
of the model, and the prespecified disturbance table of values. Without
those, the message would be meaningless, just "data" in some people's
language. With them, the difference between model and actual behaviour
(the "message") becomes "information."

When that data set is compressed with a known algorithm for decompression,
the algorithm has also to be specified. Then the "information" in the
message is dependent on the recipient also being known to have the
decompression algorithm. (Incidentally, I picked up, finally, pcopy93.zip,
not pcopy92c.zip, from wuarchive. I hope the two are compatible).

Last summer, we, including Dag, had a discussion about "description" and
"modelling." My argument was that they were qualitatively the same and
only quantitatively different. It takes a finite amount of information
to specify one model (how much depends on what is known beforehand by
the recipient of the specification). However, if you have the model, it
helps you to describe much behaviour, and it takes only a little information
to describe the deviation of actual behaviour from the model (if the model
is good). So you can recoup the information it takes to specify the model
very quickly.

>Only the difference between the model and the actual behavior
>contains new information about the control system.

I'd add the word "behaviour" after both "model" and "control system." To
gain new information about the control system as such is to modify the model
itself. To do so does indeed use the information in that difference, so
the quoted statement is literally true. Its implication might not be true,
inasmuch as to modify the model using that new information, one requires
external (e.g. general science) knowledge as well.

In summary, Bill's posting is of both practical and theoretical value, and
should be noted and understood.

Bill, just HOW do you know how smart I think you are?

Martin