<Martin Taylor 940418 15:30>

Bill Powers (940418.1155 MDT)

>The amount of compression achieved by sending the difference between
>real and model handles instead of just the real handle positions is
>not as dramatic as I had thought it would be.

That may well relate to the compression algorithm, to the model, or to
the system noise. All three matter, but we can't tell directly which
matters most.

How well a compression algorithm does depends on the kind of material
its designers anticipated--what redundancies it looks for. The most
effective compression algorithms for waveforms are sometimes wildly
different from those that work best with text.

One thing that makes a big difference is the ability to reduce the size
of the numbers from two bytes to one. If the compression algorithm works
initially by looking at sequences of bytes (as a text-compression
algorithm is likely to do), then a string of 001 135 001 135 ... will
not be well compressed, even though every byte pair is the same number.
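To make that concrete, here is a little illustration of my own (in
Python, and assuming a zero-order byte-frequency coder, like plain
Huffman coding, rather than a string-matching one): the identical
repeated sample costs two bits per sample when seen as byte pairs, but
nothing at all once it is reduced to single bytes.

    # My sketch, not anyone's actual compressor: zero-order byte entropy
    # of the same repeated sample stored as two bytes vs. one byte each.
    from collections import Counter
    from math import log2

    def byte_entropy(data):
        """Bits per byte under a byte-frequency (zero-order) model."""
        counts = Counter(data)
        n = len(data)
        return -sum(c / n * log2(c / n) for c in counts.values())

    samples = [391] * 1000                    # 391 -> bytes 001, 135
    two_byte = b"".join(s.to_bytes(2, "big") for s in samples)
    one_byte = bytes([135] * 1000)            # same signal, one byte each

    print(byte_entropy(two_byte) * 2)  # 2.0 bits per sample as byte pairs
    print(byte_entropy(one_byte))      # 0.0 bits per sample as single bytes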

Waveform compression algorithms may work quite differently from text
compression algorithms. For one thing, they have to work with numbers.
They could start by reducing the data to principal-component form, or by
differentiating it, or both (using principal components on the values
and the derivatives simultaneously). Or they might use the new fractal
decomposition techniques. But they wouldn't use the sequential
statistics of English! With waveforms, the peak sample-to-sample
difference is likely to be much smaller than the peak value of the
waveform, and one might get a two-byte to one-byte compression simply by
presenting a starting value and coding the rest as successive
differences before compression.
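Here is a minimal sketch of that last idea (again my own illustration):
keep the first sample as-is and code the rest as successive differences,
which fit in one signed byte whenever the waveform changes slowly enough.

    # My sketch of delta coding: first value kept, rest as differences.
    def delta_encode(samples):
        out = [samples[0]]
        for prev, cur in zip(samples, samples[1:]):
            out.append(cur - prev)   # small if the waveform varies slowly
        return out

    def delta_decode(deltas):
        samples = [deltas[0]]
        for d in deltas[1:]:
            samples.append(samples[-1] + d)
        return samples

    wave = [1000, 1003, 1007, 1006, 1002, 998]  # peak values need two bytes
    coded = delta_encode(wave)                  # [1000, 3, 4, -1, -4, -4]
    assert delta_decode(coded) == wave          # lossless round trip
    # Every difference fits in one signed byte (-128..127), so after the
    # starting value the stream halves before any further compression.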

However, if the compression algorithm isn't the reason for the small
improvement, there are two other possibilities: system noise, or model
failure. If the system noise is truly Gaussian, there's nothing much any
compression algorithm can do. But if there is some improvement in the
model that would reduce the noise level appreciably, it would be
worthwhile spending quite a few bytes of model description to get it.
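A quick way to see the Gaussian point (my own sketch, using an
off-the-shelf compressor as a stand-in for any of them): quantized
Gaussian noise barely compresses at all, while a smooth waveform of the
same length shrinks dramatically.

    # My sketch: Gaussian noise vs. a smooth waveform, same length.
    import zlib, random, math

    random.seed(1)
    noise = bytes(min(255, max(0, int(128 + 40 * random.gauss(0, 1))))
                  for _ in range(10000))
    smooth = bytes(int(128 + 100 * math.sin(i / 50)) for i in range(10000))

    print(len(zlib.compress(noise, 9)))   # close to 10000: little to remove
    print(len(zlib.compress(smooth, 9)))  # far smaller: redundancy found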

So, don't be discouraged by a reduction of only 13% in the amount of
data needed after compression. There are lots of reasons why this might
happen that do not reflect on the value of the underlying concept.

Martin