[Martin Taylor 2007.01.07.14.35]

[From Bill Powers (2007.01.04.1100 MST)]

A much-delayed reaction to the discussion of correlations.

Delay in the service of consideration is no bad thing. Keeping track of threads with long delays would be much easier if we used a Forum rather than an e-mail list, though.

Martin Taylor 2006.12.22.20.58

All the same, it might not hurt for me to lay out the basis for thinking of a waveform as a vector, since doing so makes thinking about all these transforms so much more intuitive.

I think one of us fell into some kind of trap during this discussion. The problem is in the assertion that "the correlation of a function with the time integral of that function is zero."

Yes, I said several times that this seemed to be the potential Achilles heel of the demonstration. You make the point that it is not always exactly true, and you are correct.

I started a looong response to your post, going into the analysis in detail, since I'm trying to work on an answer to Rick's question about the correlations when the reference signal varies. Also, when I realized the correctness of your statement, I wanted to know just by how much the correlation for a real time-limited signal will deviate from the ideal case discussed on the Web page (perfect integrator, and therefore infinitely long signal). As I noted some time ago in a response to Bruce, the claim is true for the ideal, and as you note, it's not precisely true for a real signal. But it turns out to be pretty darn close if the signal has a reasonably long duration.

Rather than provide the long argument here, I'll put it on the Web page in a better form when I'm happy with it. Here, I'll give a more intuitive demonstration of why the deviation from the ideal is ordinarily very small.

The correlation between X and Y is defined as

           SigmaXY
Rxy = ----------------
      N*SigmaX*SigmaY

SigmaX is defined as SQRT[SUM((X - X')^2)/N], and similarly for SigmaY; SigmaXY is SUM[(X - X')*(Y - Y')]. X' and Y' are zero for sine and cosine waves.

The principle is right, but the expression isn't quite; when the means of x and y are both zero, a more readily understood expression might be

           Sum(xi*yi)
Rxy = -------------------------
      Sqrt(Sum(xi^2)*Sum(yi^2))
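As a concrete check of this zero-mean form, here is a minimal Python sketch (function name hypothetical) that correlates sin(t) with cos(t), which is, up to sign, its integral, sampled over a whole number of cycles. The correlation comes out essentially zero:

```python
import math

def corr_zero_mean(x, y):
    # Simplified Pearson correlation for signals whose means are zero.
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

# Sample sin(t) and cos(t) over exactly 5 full cycles.
n = 1000
ts = [5 * 2 * math.pi * k / n for k in range(n)]
r = corr_zero_mean([math.sin(t) for t in ts], [math.cos(t) for t in ts])
print(r)  # essentially zero (only floating-point noise)
```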

Because this is a continuous function, we would have to start with a finite number N of samples and then take the limit as N goes to infinity.

You don't have to do that. You could use the integral form if you want, rather than using the sample sums, but if you use the samples, you have to note that there are only 2WT+1 independent samples for a waveform of bandwidth W and duration T.

However we compute N or do the SUM, R is going to turn out to be a function of wt, and will be zero only for one specific set of values of wt.

That's not actually true. It is true that R will be a function of wt, but it's a function with many zeros.
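To illustrate that R has many zeros as a function of the truncation point, here is a small sketch (sampling density chosen arbitrarily) that applies the zero-mean form of the correlation to sin(t) and cos(t) truncated at various times T. R vanishes at every multiple of the half-period and is nonzero in between:

```python
import math

def r_truncated(T, n=2000):
    # Zero-mean correlation of sin(t) with cos(t) over [0, T],
    # using n midpoint samples.
    ts = [T * (k + 0.5) / n for k in range(n)]
    x = [math.sin(t) for t in ts]
    y = [math.cos(t) for t in ts]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

for T in (0.5 * math.pi, math.pi, 1.5 * math.pi, 2 * math.pi):
    print(round(T / math.pi, 2), round(r_truncated(T), 4))
# R is ~0 at T = pi and T = 2*pi, and positive at the
# quarter- and three-quarter-cycle truncation points.
```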

It's convenient to treat the Fourier transform of the signal, which precisely defines the entire signal. The Fourier transform of a signal consists of a term for the DC component (the fixed average value, which we here take to be zero), and a set of pairs of components which can be written either as sine and cosine of the same frequency or as cosine and phase angle. The frequencies are those that have an integer number of cycles over the duration T of the signal; in other words, they are the frequencies f_n = n/T. The bandwidth W limits f_n to be less than W.
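A quick way to see the f_n = n/T structure is a discrete Fourier transform of a toy signal (the parameters here are arbitrary): whatever the record, the nonzero components fall only on frequencies that complete a whole number of cycles over the duration T.

```python
import cmath
import math

# Toy signal of duration T sampled at n points; its two components
# were chosen (arbitrarily) to complete 3 and 5 cycles over the record.
T, n = 2.0, 64
xs = [math.sin(2 * math.pi * 3 * m / n)
      + 0.5 * math.cos(2 * math.pi * 5 * m / n)
      for m in range(n)]

def dft(xs):
    # Plain O(n^2) discrete Fourier transform.
    n = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * m / n)
                for m, x in enumerate(xs))
            for k in range(n)]

X = dft(xs)
peaks = [k for k in range(n // 2) if abs(X[k]) > 1e-6]
print(peaks, [k / T for k in peaks])
# Nonzero components only at k = 3 and k = 5, i.e. f = 3/T and 5/T.
```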

A signal that is truncated at an arbitrary point in its waveform has an infinitely sharp discontinuity, which means it has infinite bandwidth. This is unrealistic, but for the purposes of this analysis we will treat it as feasible, and work with the signal as though it had two components: (1) the signal from some zero crossing in the distant past up to the most recent zero-crossing, and (2) the part since the last zero-crossing.

The Fourier representation of part 1 can be band-limited (approximately), and has zero contribution to the correlation between the signal and its integral, since every component has an integer number of cycles, and over each half-cycle the contribution of each component to the numerator is zero, as you demonstrated:

For all other values of w or of t, it will be nonzero. Note that

sin(wt)*cos(wt) = sin(2wt)/2

which has an average value of zero and fluctuates at twice the rate implied by w. The denominator is nonzero.

The numerator, I think you mean. It's non-zero except at every half-cycle of the doubled frequency 2w (that is, every quarter-cycle of the base frequency, w).

This means that we need consider only what happens to the signal after its most recent zero-crossing -- and we can go further, treating the continuation of each Fourier component only after its own most recent zero-crossing.

Whatever the results of this "extra" chunk may be, notice that the magnitude of the denominator of the expression for the correlation continually increases throughout the duration of the signal, whereas the contribution to the numerator is zero up to the most recent zero-crossing of the signal, and fluctuates positively and negatively thereafter. The longer the signal, the smaller the correlation between it and its integral. Even if the signal and its integral correlated 1.0 since the last zero-crossing of the signal, the total correlation could be no greater than Ts/(T+Ts) where Ts is the time since the last zero-crossing and T is the signal duration up to the last zero-crossing.
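A rough numerical illustration of this shrinkage (the durations are chosen arbitrarily): correlating sin(t) with its running integral, 1 - cos(t), over a record that overshoots a whole number of cycles by one second, the full Pearson correlation falls roughly in proportion to the record length.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def r_over(T, dt=0.01):
    # Correlation of sin(t) with its running integral over [0, T].
    ts = [dt * (k + 0.5) for k in range(int(T / dt))]
    return pearson([math.sin(t) for t in ts],
                   [1.0 - math.cos(t) for t in ts])

short = abs(r_over(10 * math.pi + 1.0))    # 5 cycles plus a stub
long_ = abs(r_over(100 * math.pi + 1.0))   # 50 cycles plus a stub
print(short, long_)  # the longer record gives the smaller correlation
```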

This upper bound is actually extremely conservative. In the longer analysis (which may, of course, contain errors), I get the result that in the worst case no single sample point can contribute more than 0.83 to the numerator, only one sample in its neighbourhood can contribute so much, and the expectation is zero. But let's use 0.83 as though all samples could contribute that, even though the analysis says otherwise. Then if the time since the last zero-crossing is 10% of the total signal duration, the highest possible correlation would be 0.083, or an angle of about 85 degrees.

In practice, this maximum is far too conservative. It's the same as saying that a mountain range has the height of its highest peak. Even a tenth of that is probably too high, though I haven't computed any better bound (yet).

Bottom line: Yes, it's true that a real signal and its integral are not correlated precisely zero over a finite time, but the deviation is tiny except possibly for specially crafted signals (a possibility that I haven't investigated).

------------------------------------------------------

All of this becomes moot if we switch from the idea of correlation to a formula like that for Chi-squared, a measure of the deviation of observed from expected values, the deviation of X from E(X).

Here you really are getting yourself into the domain of information theoretic analysis. Naturally, I approve, though I don't think it's relevant to the question at hand.

Why is it not relevant? Because for estimating the actual quality of control, the deviation of observed from expected values is of no interest except as a bound on the attainable control performance. What matters is how close the perceptual signal is to its reference value, not how well its value can be predicted from the reference value. It's the _cross_ relationship that matters, the deviation not from a signal's expected value, but from a value derived from a different signal. It wouldn't be very good control if the reference value performed a sine wave while the perceptual signal predictably stayed exactly zero or performed a cosine wave, would it?

The expected value of the integral of sin(wt) is -cos(wt). Correlation doesn't come into the picture. The effect of random variations shows up as deviations of X from E(X). This does away entirely with treating regular variations like sin(wt) as if they were random variables.

That's a red herring, since nobody has been treating them as though they were random variables, so far as I am aware. Correlation comes into the picture, because only when the perceptual signal is highly correlated with the reference variation and not with disturbance variation is there good control. Correlation isn't the be-all and end-all, though, because the perceptual signal variation has to match the reference variation in magnitude, as well as being correlated with it.
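The magnitude point can be made concrete with a toy example (the scale factor is hypothetical): a perceptual signal that is a scaled-down copy of the reference correlates perfectly with it, yet tracks it poorly.

```python
import math

ts = [2 * math.pi * k / 1000 for k in range(1000)]
ref = [math.sin(t) for t in ts]
percept = [0.1 * math.sin(t) for t in ts]  # correlated 1.0, wrong gain

# Zero-mean correlation: exactly 1.0 despite the mismatch.
num = sum(a * b for a, b in zip(ref, percept))
den = math.sqrt(sum(a * a for a in ref) * sum(b * b for b in percept))
r = num / den

# RMS tracking error: large, so control is poor.
rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, percept)) / len(ts))
print(r, rms)
```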

As it turns out, I have been using the chi-squared calculation all along in analyzing tracking experiments. For model-based theories, it is not the raw correlation between different variables that matters, but the correlation between the value of X that is measured and what the theory says should be the measured value of the SAME VARIABLE.

Quite so. Modelling is a different task from the task of determining what the correlation is between observables. When you are modelling, you want to compare signals in the model to signals observed, not one observed signal to another observed signal, or one modelled signal to another modelled signal. The current discussion is not about modelling. It's about the relations among signals within the model or among signals experimentally observed.

When there is anything but a simple linear relationship between two variables, it makes no sense to use correlations.

That depends on what you want to do. Rick, for example, wants to use correlations to determine which visible square a person is controlling. We still assume linear relationships, but correlations do seem to be useful in solving his problem.

What matters is a comparison of the measured and predicted values of each variable in the model.

Yes, if you are modelling. When you are controlling, what matters is how the perceptual signal matches the reference signal, two different signals within the controller.

Martin