# Re: correlations and integrators.

[Martin Taylor 2007.01.07.14.35]

[From Bill Powers (2007.01.04.1100 MST)]

A much-delayed reaction to the discussion of correlations.

Delay in the service of consideration is no bad thing. Keeping track of threads with long delays would be much easier if we used a Forum rather than an e-mail list, though.

Martin Taylor 2006.12.22.20.58

All the same, it might not hurt for me to lay out the basis for thinking of a waveform as a vector, since doing so makes thinking about all these transforms so much more intuitive.

I think one of us fell into some kind of trap during this discussion. The problem is in the assertion that "the correlation of a function with the time integral of that function is zero."

Yes, I said several times that this seemed to be the potential Achilles heel of the demonstration. You make the point that it is not always exactly true, and you are correct.

I started a looong response to your post, going into the analysis in detail, since I'm trying to work on an answer to Rick's question about the correlations when the reference signal varies. Also, when I realized the correctness of your statement, I wanted to know just by how much the correlation for a real time-limited signal will deviate from the ideal case discussed on the Web page (perfect integrator, and therefore infinitely long signal). As I noted some time ago in a response to Bruce, the claim is true for the ideal, and as you note, it's not precisely true for a real signal. But it turns out to be pretty darn close if the signal has a reasonably long duration.

Rather than provide the long argument here, I'll put it on the Web page in a better form when I'm happy with it. Here, I'll give a more intuitive demonstration of why the deviation from the ideal is ordinarily very small.
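[Editorial aside: the claim that the correlation between a real, finite-duration signal and its running integral shrinks as the signal gets longer can be checked numerically. The sketch below is not from either post; the durations, sampling step, and function names are all made up for illustration.]

```python
# Correlate sin(w*t) with its running integral over windows of increasing
# length, using the ordinary Pearson formula from the discussion.
import math

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

w = 2.0 * math.pi   # one cycle per unit time
dt = 0.001          # sampling step (illustrative)

def corr_signal_vs_integral(duration):
    n = int(duration / dt)
    x = [math.sin(w * i * dt) for i in range(n)]
    # running (cumulative) integral of x by the rectangle rule
    ix, acc = [], 0.0
    for xi in x:
        acc += xi * dt
        ix.append(acc)
    return corr(x, ix)

r_short = corr_signal_vs_integral(1.25)    # 1.25 cycles
r_long = corr_signal_vs_integral(20.25)    # 20.25 cycles
# |r_long| comes out much smaller than |r_short|: the longer the signal,
# the smaller the correlation between it and its integral.
```

The extra quarter-cycle beyond the last zero-crossing is the same in both cases, but it is a much smaller fraction of the longer signal, which is the intuition of the argument that follows.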

The correlation between X and Y is defined as

           SigmaXY
Rxy = -----------------
      N*SigmaX*SigmaY

SigmaX is defined as SQRT[SUM((X - X')^2)/N], and similarly for SigmaY and SigmaXY. X' and Y' are zero for sine and cosine waves.

The principle is right, but the expression isn't quite right; when the means of x and y are both zero, a more readily understood expression might be

            Sum(xi*yi)
Rxy = -------------------------
      Sqrt(Sum(xi^2)*Sum(yi^2))
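[Editorial aside: the zero-mean form above and the general mean-subtracting form agree once the means are removed, as a small sketch shows. The sample data below are made up for illustration; nothing here is from the original posts.]

```python
# Direct transcription of the zero-mean formula, checked against the
# general (mean-subtracting) definition on arbitrary sample data.
import math

def corr_zero_mean(x, y):
    """Rxy = Sum(xi*yi) / Sqrt(Sum(xi^2)*Sum(yi^2)); assumes zero means."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

def corr_general(x, y):
    """Pearson's r: subtract the means, then apply the zero-mean form."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return corr_zero_mean([a - mx for a in x], [b - my for b in y])

x = [3.0, -1.0, 2.0, -4.0, 1.0]    # arbitrary data (means are not zero)
y = [1.0, 0.5, -2.0, -1.5, 2.5]

r = corr_general(x, y)
# Centre each vector on its mean; the zero-mean form then gives the same r.
xc = [a - sum(x) / len(x) for a in x]
yc = [b - sum(y) / len(y) for b in y]
r2 = corr_zero_mean(xc, yc)
```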

Because this is a continuous function, we would have to start with a finite number N of samples and then compute the limit as N goes to infinity.

You don't have to do that. You could use the integral form if you want, rather than using the sample sums, but if you use the samples, you have to note that there are only 2WT+1 independent samples for a waveform of bandwidth W and duration T.

However we compute N or do the SUM, R is going to turn out to be a function of wt, and will be zero only for one specific set of values of wt.

That's not actually true. It is true that R will be a function of wt, but it's a function with many zeros.

It's convenient to treat the Fourier transform of the signal, which precisely defines the entire signal. The Fourier transform of a signal consists of a term for the DC component (the fixed average value, which we here take to be zero), and a set of pairs of components which can be written either as sine and cosine of the same frequency or as a cosine with a phase angle. The frequencies are those that have an integer number of cycles over the duration T of the signal; in other words, they are the frequencies fi = n/T. The bandwidth W limits fi to be less than W.

A signal that is truncated at an arbitrary point in its waveform has an infinitely sharp discontinuity, which means it has infinite bandwidth. This is unrealistic, but for the purposes of this analysis we will treat it as feasible, and work with the signal as though it had two components: (1) the signal from some zero crossing in the distant past up to the most recent zero-crossing, and (2) the part since the last zero-crossing.

The Fourier representation of part 1 can be band-limited (approximately), and has zero contribution to the correlation between the signal and its integral, since every component has an integer number of cycles, and over each half-cycle the contribution of each component to the numerator is zero (successive quarter-cycles cancel in pairs), as you demonstrated:

For all other values of w or of t, it will be nonzero. Note that

sin(wt)*cos(wt) = sin(2wt)/2

which has an average value of zero and fluctuates at twice the rate implied by w. The denominator is nonzero.

The numerator, I think you mean. It's non-zero except at every half-cycle (or quarter-cycle of the base frequency, w).
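[Editorial aside: the identity and its zero average over a full period are easy to confirm numerically. The frequency and sample count below are arbitrary choices, not values from the posts.]

```python
# Check sin(wt)*cos(wt) == sin(2wt)/2 pointwise, and that its average
# over one full period of w is zero.
import math

w = 3.0                         # arbitrary angular frequency
n = 10000
period = 2.0 * math.pi / w
dt = period / n
ts = [i * dt for i in range(n)]

# Pointwise identity
max_err = max(abs(math.sin(w * t) * math.cos(w * t) - math.sin(2 * w * t) / 2)
              for t in ts)

# Average over one full period of w (two full periods of 2w)
avg = sum(math.sin(w * t) * math.cos(w * t) for t in ts) * dt / period
```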

This means that we need consider only what happens to the signal after its most recent zero-crossing -- and we can go further, treating the continuation of each Fourier component only after its own most recent zero-crossing.

Whatever the results of this "extra" chunk may be, notice that the magnitude of the denominator of the expression for the correlation continually increases throughout the duration of the signal, whereas the contribution to the numerator is zero up to the most recent zero-crossing of the signal, and fluctuates positively and negatively thereafter. The longer the signal, the smaller the correlation between it and its integral. Even if the signal and its integral correlated 1.0 since the last zero-crossing of the signal, the total correlation could be no greater than Ts/(T+Ts) where Ts is the time since the last zero-crossing and T is the signal duration up to the last zero-crossing.

This upper bound is actually extremely conservative. In the longer analysis (which may, of course, contain errors), I get the result that in the worst case no single sample point can contribute more than 0.83 to the numerator, only one sample in its neighbourhood can contribute so much, and the expectation is zero. But let's use 0.83 as though all samples could contribute that, even though the analysis says otherwise. Then if the time since the last zero-crossing is 10% of the total signal duration, the highest possible correlation would be 0.083, or an angle of about 85 degrees.

In practice, this maximum is far too conservative. It's the same as saying that a mountain range has the height of its highest peak. Even a tenth of that is probably too high, though I haven't computed any better bound (yet).

Bottom line: Yes, it's true that a real signal and its integral are not correlated precisely zero over a finite time, but the deviation is tiny except possibly for specially crafted signals (a possibility that I haven't investigated).

------------------------------------------------------

All of this becomes moot if we switch from the idea of correlation to a formula like that for Chi-squared, a measure of the deviation of observed from expected values, the deviation of X from E(X).

Here you really are getting yourself into the domain of information theoretic analysis. Naturally, I approve, though I don't think it's relevant to the question at hand.

Why is it not relevant? Because for estimating the actual quality of control, the deviation of observed from expected values is of no interest except as a bound on the attainable control performance. What matters is how close the perceptual signal is to its reference value, not how well its value can be predicted from the reference value. It's the _cross_ relationship that matters, the deviation not from a signal's expected value, but from a value derived from a different signal. It wouldn't be very good control if the reference value performed a sine wave while the perceptual signal predictably stayed exactly zero or performed a cosine wave, would it?

The expected value of the integral of sin(wt) is -cos(wt). Correlation doesn't come into the picture. The effect of random variations shows up as deviations of X from E(X). This does away entirely with treating regular variations like sin(wt) as if they were random variables.

That's a red herring, since nobody has been treating them as though they were random variables, so far as I am aware. Correlation comes into the picture, because only when the perceptual signal is highly correlated with the reference variation and not with disturbance variation is there good control. Correlation isn't the be-all and end-all, though, because the perceptual signal variation has to match the reference variation in magnitude, as well as being correlated with it.

As it turns out, I have been using the chi-squared calculation all along in analyzing tracking experiments. For model-based theories, it is not the raw correlation between different variables that matters, but the correlation between the value of X that is measured and what the theory says should be the measured value of the SAME VARIABLE.

Quite so. Modelling is a different task from the task of determining what the correlation is between observables. When you are modelling, you want to compare signals in the model to signals observed, not one observed signal to another observed signal, or one modelled signal to another modelled signal. The current discussion is not about modelling. It's about the relations among signals within the model or among signals experimentally observed.

When there is anything but a simple linear relationship between two variables, it makes no sense to use correlations.

That depends on what you want to do. Rick, for example, wants to use correlations to determine which visible square a person is controlling. We still assume linear relationships, but correlations do seem to be useful in solving his problem.

What matters is a comparison of the measured and predicted values of each variable in the model.

Yes, if you are modelling. When you are controlling, what matters is how the perceptual signal matches the reference signal, two different signals within the controller.

Martin

[Martin Taylor 2007.01.07.22.41]

[From Bill Powers (2007.01.07.1730 MST)]

I think we need to settle a couple of basic issues before we go much farther with this. According to what I have learned about correlations, they are computed in a way that assumes an average straight-line relationship between X and Y, with superimposed random variations.

Random variation has nothing at all to do with it, nor does the notion of a functional relationship, linear or otherwise, between one variable and another. Sometimes, you use correlations to deal with situations in which there are random variations. Other times, you are just interested in the relationship between two sets of numbers (two vectors).

The "degree of relatedness" is a measure of the amount of randomness, not of the slope of the linear relationship.

When you are dealing with a linear relationship on which random variation is superimposed, that's correct. But think what underlies this "linear relationship". What you have is one set of numbers that you lay out on the X axis and another set, each associated with a particular one of the first set, the values of which you plot against the Y axis. Quite possibly, the values of the second set are functionally dependent on the numbers in the first set, with some other influences that cause what we see as "random" (i.e. unaccounted for) variation.

But maybe that's not where the numbers come from, and there's no dependence between them. It doesn't matter. Each set is a vector with the same number of ordered elements. That's all you need in order to compute a correlation. You just take each xi and its corresponding yi and plug them into the formula.

How you interpret the correlation is a separate issue. Quite often correlations are taken to imply things like causation, and that's quite wrong. Sometimes you do indeed consider a random variation about a linear trend.

If Y is any function of X other than Y = aX + b, the standard formula for a correlation (that is, Pearson's r) does not apply.

Oh, yes it does, if you have two vectors for which the elements have a one-to-one relationship. Just subtract the mean of each set of numbers from the values in the set, and compute

            Sum(xi*yi)
Rxy = -------------------------
      Sqrt(Sum(xi^2)*Sum(yi^2))
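[Editorial aside: the point that the formula applies to any pair of matched vectors, linear relationship or not, is easy to illustrate. In the sketch below y is an exactly determined nonlinear function of x (a choice made here for illustration, not taken from the posts), and r still comes out as an ordinary number between -1 and 1.]

```python
# Pearson's r for a deterministic, nonlinear relationship: y = x^3.
import math

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

x = [i / 100.0 for i in range(-100, 101)]   # -1.00 ... 1.00
y = [xi ** 3 for xi in x]                   # exactly determined, nonlinear

r = corr(x, y)
# r is high (the relation is monotone) but less than 1 (it is not linear).
```

Whether r is the right number to look at for such data is a separate question; the formula itself needs nothing more than two matched vectors.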

What you do with the number when you've got it depends on the problem you are trying to address. In quite another life, I am using it as one of many tools to try to generate a space within which the connotational meaning of words can be represented. There's no hint of linear relationships there!

In the case of Y = integral(X), according to my understanding, it is simply inappropriate to apply the formula for a linear relationship to find a correlation of Y with X.

Where in the discussion to this point has there been a suggestion that there's a linear relationship between a signal and its integral? For sure, the integration operator is a linear operator, but that's a different usage of the word. Put the idea of "linear function" entirely out of your mind, and think only of the relationship between two vectors.

If you want to find a functional relationship between the elements of the two vectors, for some purpose, that's a perfectly valid thing to do. But it's a different thing to do, and it probably won't involve correlation, at least not directly.

The second issue is whether correlations apply to cases in which relationships are exact rather than stochastic.

Yes. They do.

This includes the question of whether information theory applies when there is no uncertainty in a relationship.

Like arithmetic, information theory applies even when the value of a variable is zero, if the circumstances are otherwise appropriate for it to be applied. Of course, if you are analyzing a situation in which there is no uncertainty anywhere, to use information theory would be rather like using arithmetic when to do so would require you to divide zero by zero. You can do it, but it won't get you very far.

In the case of the control system we are both assuming (I think) there is no noise, so the state of every variable is fully determined by the waveforms of the two independent variables, the disturbance and the reference signal. If the waveforms are not analytical, we can still get as close as we like to the exact values of all the dependent variables by using numerical methods for solving the system of equations. We can, for example, compute the exact waveform of the error signal.
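[Editorial aside: the noise-free case is easy to exhibit in a few lines. The sketch below is a generic integrating controller with waveforms, gain, and step size chosen arbitrarily for illustration; it is not either author's model, but it shows a fully determined error signal that stays small under good control.]

```python
# Minimal noise-free control loop: perceptual signal p = o + d,
# error e = r - p, integrating output do/dt = G*e.  Euler integration.
import math

G = 50.0        # loop gain of the integrating output (illustrative)
dt = 0.001
n = 20000       # 20 seconds of simulated time

o = 0.0
errors = []
for i in range(n):
    t = i * dt
    ref = math.sin(0.5 * t)        # reference waveform (chosen here)
    d = 0.3 * math.sin(1.3 * t)    # disturbance waveform (chosen here)
    p = o + d                      # perceptual signal
    e = ref - p                    # error signal: fully determined,
    o += G * e * dt                # no randomness anywhere in the loop
    errors.append(e)

# RMS error over the second half of the run (after the transient)
rms_error = math.sqrt(sum(e * e for e in errors[n // 2:]) / (n // 2))
```

Given the two independent waveforms, every value of e is computable exactly; rerunning the loop reproduces it bit for bit.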

Yes, we agree on this. Where I said you were bringing in information-theoretic concepts was when you said you were using the uncertainty of X given Y as your measure.

I guess I should point out, though, that to compute the exact waveform of the error signal you have to know the exact waveform of the reference and disturbance signals. Usually when one applies informational concepts, the basic assumption is that these are not both known exactly. The question often becomes how much you can know about the internal signals based on how much you know about the external ones.

Both information and correlation are related to estimates of uncertainty.

Information, yes. Correlation, sometimes yes, sometimes no. I repeat that if you have two sets of numbers, in neither of which are all the elements of the set the same number, and in which you can identify each number in one set with a corresponding number in the other set, you can compute a correlation. What you do with it is another matter.

What Rick wants to do with it is find out which square the subject is controlling, and when the subject has chosen to stop controlling that square in order to control another. This seems to me to be a perfectly good situation in which to use correlation, and to help Rick is the primary reason why I am at this moment interested in looking into the bounds on the correlations among the signals in a control system. (I am interested in it as a general question, but Rick asked the question near the start of this discussion, and I've been looking into it in my spare time since then.)

When there is no uncertainty in a relationship, we simply analyze it mathematically and that is all there is to do. So I wonder if we're not trying to solve a nonexistent problem here.

Well, do let Rick know that his problem is nonexistent, and I'll stop worrying about it. But it won't change the actual correlational relationships among the signals in a control system -- whether with noise or without.

Martin

[From Bill Powers (2007.01.09.1445 MST)]

Martin Taylor 2007.01.09.15.22 --

You don't need "orthogonality" in the geometric sense. What you need is for the three disturbances to have very low correlations with each other.

That is _precisely_ "orthogonality in the geometric sense!" You just said the same thing in different words.

Orthogonality in the geometric sense means that the three disturbances are plotted in three independent directions, not that their magnitudes are uncorrelated. Consider three disturbances:

d1 = d2 = d3 = f(t)

These disturbances are perfectly correlated (r = 1), whether they operate in the same direction or in three independent directions, x, y and z.

You can also construct three other disturbances which are different functions of time, such that their correlations with each other are close to zero -- even if all three disturbances are directed the same way.

You are confusing covariations with correlations, which are not slopes but measures of randomness. Even knowing that your mathematical sophistication is a lot greater than mine, I truly think you are working under a misapprehension here.

Suppose you have two simple coplanar vectors making an angle of 60 degrees. You once said that the correlation between them was just the cosine of the angle between them. But those two vectors can vary in magnitude in such a way as to be completely uncorrelated (r = 0) or completely correlated (r = 1), even if the cosine of the angle between them remains 0.5. So I don't see how you can say there is any connection at all between the cosine of the angle between the vectors and the correlation between the vectors.

I expect either that I will now become more educated than I was, or hear a big aha from you.

Best,

Bill P.

[From Bill Powers (2007.01.09.1602 MST)]

Martin Taylor 2007.01.09.15.27 --

So just what is this new and improved way of thinking about correlations?

Should we say "more general" rather than "new and improved"?

Correlation has no built-in asymmetry such as is implied by "a simple linear slope-intercept model, y = ax + b." Correlation represents (is the cosine of) the angle between two vectors in a space of dimensionality at least as large as the number of elements in the vectors.
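[Editorial aside: the identity between Pearson's r and the cosine of the angle between the mean-centred vectors can be verified directly. The data below are made up; nothing in this sketch comes from the posts.]

```python
# Pearson's r equals the cosine of the angle between the two
# mean-centred vectors in N-dimensional sample space.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def cos_angle(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # arbitrary sample data
y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0]

xc = [a - sum(x) / len(x) for a in x]   # centre each vector on its mean
yc = [b - sum(y) / len(y) for b in y]

r = pearson_r(x, y)
c = cos_angle(xc, yc)   # identical to r
```

Note that the centring matters: the cosine of the angle between the raw vectors is generally a different number.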

So you still say that correlation is simply the cosine of the angle between two vectors. If that's the case, why do we need a special word for something we can already describe simply as "the cosine of the angle between two vectors"?

And more to the point, since the context of this discussion has been statistics, what language do we then use to describe the "degree of relatedness" of the kind psychologists worry about -- the repeatability of any measure of the relationship between variables? The projection of one vector onto another may be measured by the cosine of the angle between the (coplanar) vectors, but how do we talk about the degree of systematicity in this projection? How do we know that the projection we measure is not simply due to a chance fluctuation?

Pearson's r is designed to show the probability that a measurement of one variable will be paired with some specific measurement of another variable. It applies in the situation where successive measurements of what is supposedly the same relationship differ. The usual way of putting the question being asked is "Is X related to Y?" The methods of statistics allow us to compute the chances that the answer is yes. They do not, however, tell us what that relationship is, other than its being at least as complex as a straight line. A significant correlation tells us only that the relationship is not that between independent random variables -- pure chance.

If successive measures of the projection of one vector onto another differ then we can speak of the (Pearson's) correlation between the magnitude of one vector and the magnitude of its projection, both of course being measurements. Variations in the measurements can take place even with a constant angle between the vectors. So it seems to me that we can talk separately about the cosine of the angle between the vectors, and the correlation between different measures of the vectors. But you seem to claim that these two subjects are really the same thing.

At the moment, it seems unlikely that I am going to understand this.

Best,

Bill P.

[Martin Taylor 2007.01.09.17.34]

[From Bill Powers (2007.01.09.1445 MST)]

Martin Taylor 2007.01.09.15.22 --

You don't need "orthogonality" in the geometric sense. What you need is for the three disturbances to have very low correlations with each other.

That is _precisely_ "orthogonality in the geometric sense!" You just said the same thing in different words.

Orthogonality in the geometric sense means that the three disturbances are plotted in three independent directions, not that their magnitudes are uncorrelated. Consider three disturbances:

d1 = d2 = d3 = f(t)

These disturbances are perfectly correlated (r = 1), whether they operate in the same direction or in three independent directions, x, y and z.

Red herring. The (x, y, z) directions are in a different subspace of space-time. Rick was dealing with the time dimension, if I understood him correctly (as I think I did, given his later message). The three disturbance vectors are each a set of numbers representing successive time sample values, as for example:
d1 = {1, 2, 1, -1, 0, 3,....}
d2 = {1, 2, 1, -1, 0, 3,....}
d3 = {1, 2, 1, -1, 0, 3,....}
f(t)= {1, 2, 1, -1, 0, 3,....}

They are the same vector, pointing in the same direction in the space of time samples.
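[Editorial aside: the point can be made concrete in a few lines. The sample values below follow the ones quoted above, extended arbitrarily; the spatial direction each disturbance pushes in never enters the calculation.]

```python
# Three disturbances with identical time waveforms correlate 1 with each
# other, regardless of which spatial direction (x, y, or z) each acts in.
import math

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) *
                    sum((q - mb) ** 2 for q in b))
    return num / den

f = [1.0, 2.0, 1.0, -1.0, 0.0, 3.0, -2.0, 1.0]   # f(t), sampled
d1, d2, d3 = list(f), list(f), list(f)           # same waveform each time

r12 = corr(d1, d2)   # 1.0: the same vector in the space of time samples
r13 = corr(d1, d3)
```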

You can also construct three other disturbances which are different functions of time, such that their correlations with each other are close to zero -- even if all three disturbances are directed the same way.

Who cares what direction they are pointing in visible space? Rick wanted disturbances to three squares to be orthogonal to each other -- uncorrelated in their time waveforms. He's now created such a set of orthogonal disturbances.

You are confusing covariations with correlations, which are not slopes but measures of randomness.

No, No, NO NO NOOOOOOO!!!!!!!!!! Correlations often are used in situations where there is variation due to unspecified influences (sometimes called randomness), but they ARE NOT MEASURES OF RANDOMNESS unless you interpret them that way in situations where to do so makes sense.

Suppose you have two simple coplanar vectors making an angle of 60 degrees. You once said that the correlation between them was just the cosine of the angle between them.

OK, you have two vectors with elements {0, 1} and {sqrt(3)/2, sqrt(3)}. The correlation between them is found from the formula. In typing, it's easier to subtract the means first, so the vectors become {-1/2, 1/2} and {-0.25*sqrt(3), 0.25*sqrt(3)}. Using the zero-centred vectors, we have

       -0.5*-0.25*sqrt(3) + 0.5*0.25*sqrt(3)
Rxy = ----------------------------------------------------------------
      2*sqrt((0.5*0.5 + 0.5*0.5)*((0.25*sqrt(3))^2 + (0.25*sqrt(3))^2))

        0.25*sqrt(3)
    = --------------- = 0.5, if I did the sums and got the formula right.
      2*sqrt(0.5*3/8)

But those two vectors can vary in magnitude in such a way as to be completely uncorrelated (r = 0) or completely correlated (r = 1), even if the cosine of the angle between them remains 0.5.

You can't measure the angle in one space and talk about correlation in another!! You started off asking about the correlation between {0, 1} and {sqrt(3)/2, sqrt(3)}, and then say that's different from the correlation between {0, 2, 1, -2, -1, 0, 1 ...} and {0, -1, -2, -1, 0, 2, 3, ...}. Of course they are different. What ARE you trying to pull here?

I expect either that I will now become more educated than I was, or hear a big aha from you.

I would love to have an "aha" experience, so that I could communicate with you instead of apparently talking at total cross-purposes. If I could fathom what your vision is that underlies your words, life would be much easier.

I made a guess about why perhaps you thought randomness was essentially connected with correlation. You told me I was wrong, and I haven't figured out any other reason why you should hold so strongly to this misapprehension, and misapprehension it surely is.

When I do get that "aha", maybe we will come to a better understanding.


----------------------

Later: [From Bill Powers (2007.01.09.1602 MST)]

So you still say that correlation is simply the cosine of the angle between two vectors. If that's the case, why do we need a special word for something we can already describe simply as "the cosine of the angle between two vectors"?

(a) It's simpler to say; (b) very often, the domain of interest is more readily understood in terms of a scatter of data than as pairs of vectors that actually represent the data. Different ways of looking at the same thing are more convenient for different purposes, and often different language is useful in the different domains; (c) a lot of people don't understand vectors in high-dimensional spaces.

And more to the point, since the context of this discussion has been statistics,

It wasn't. It was about the correlations that may be observed among the signals in a control system, whether real or modelled. Statistics didn't come into it until you interjected them for reasons known only to yourself.

Pearson's r is designed to show the probability that a measurement of one variable will be paired with some specific measurement of another variable. It applies in the situation where successive measurements of what is supposedly the same relationship differ.

Change that to "was designed" and you are repeating the "history" part of what I presumed to be the reason you keep interjecting randomness and functional relationships into the concept of correlation. Yes, this is a very common way of using correlation. But it isn't the be-all and end-all of correlation.

Martin