[From Bill Powers (2007.01.04.1100 MST)]
A much-delayed reaction to the discussion of correlations. If I had more brain cells left, I could assign more of them to problems like this and the results, right or wrong, would probably show up more quickly.
Martin Taylor 2006.12.22.20.58
All the same, it might not hurt for me to lay out the basis for thinking of a waveform as a vector, since doing so makes thinking about all these transforms so much more intuitive.
I think one of us fell into some kind of trap during this discussion. The problem is in the assertion that "the correlation of a function with the time integral of that function is zero."
The correlation between X and Y is defined as
Rxy = SigmaXY / (SigmaX * SigmaY)
where SigmaX = SQRT[SUM((X - X')^2)/N], similarly for SigmaY, and SigmaXY = SUM[(X - X')(Y - Y')]/N. X' and Y' are the means of X and Y, which are zero for sine and cosine waves taken over whole cycles.
If X = sin(wt) and Y = cos(wt), the factors of N cancel and
R^2xy = (SUM[sin(wt)cos(wt)])^2 / (SUM[sin^2(wt)] * SUM[cos^2(wt)])
Because this is a continuous function, we would have to start with a finite number N of samples and then compute the limit of the function as N goes to infinity. It's past my bedtime so I won't try that now.
However we compute N or do the SUM, R is going to turn out to be a function of wt, and will be zero only for one specific set of values of wt. For all other values of w or of t, it will be nonzero. Note that
sin(wt)*cos(wt) = sin(2wt)/2
which has an average value of zero and fluctuates at twice the rate implied by w. The denominator is nonzero.
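One way to see how much the sampled interval matters is to compute R numerically. What follows is a minimal Python sketch, not part of the original analysis: the function names pearson_r and sin_cos_r, the sample count, and the choice of w are all assumptions made for illustration. It subtracts the sample means explicitly, so it also covers intervals over which X' and Y' are not zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sin_cos_r(w, duration, n=10000):
    """Correlation of sin(wt) with cos(wt) over n samples in [0, duration)."""
    ts = [duration * k / n for k in range(n)]
    xs = [math.sin(w * t) for t in ts]
    ys = [math.cos(w * t) for t in ts]
    return pearson_r(xs, ys)

w = 2 * math.pi                   # assumed: one cycle per unit of time
r_full = sin_cos_r(w, 1.0)        # exactly one whole cycle
r_quarter = sin_cos_r(w, 0.25)    # first quarter cycle only

print("whole cycle:  ", r_full)
print("quarter cycle:", r_quarter)
```

Over one whole cycle the sum of sin(wt)*cos(wt) vanishes and R comes out at zero to within rounding error. Over the first quarter cycle, where the sine rises while the cosine falls, R is strongly negative. So the value obtained depends on which stretch of wt is sampled.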
------------------------------------------------------
All of this becomes moot if we switch from the idea of correlation to a formula like that for chi-squared, a measure of the deviation of observed from expected values: the deviation of X from E(X). The expected value of the integral of sin(wt) is -cos(wt)/w, plus a constant of integration. Correlation doesn't come into the picture. The effect of random variations shows up as deviations of X from E(X). This does away entirely with treating regular variations like sin(wt) as if they were random variables.
As it turns out, I have been using the chi-squared calculation all along in analyzing tracking experiments. For model-based theories, it is not the raw correlation between different variables that matters, but the correlation between the measured value of X and what the theory says the measured value of the SAME VARIABLE should be. When there is anything but a simple linear relationship between two variables, it makes no sense to use correlations. What matters is a comparison of the measured and predicted values of each variable in the model.
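The comparison of measured against predicted values can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculation used in the tracking analyses: the function name chi_squared is hypothetical, the "measured" values are synthetic (the model's prediction plus Gaussian noise of known standard deviation sigma), and w is set to 1 so that the prediction -cos(wt)/w is simply -cos(wt).

```python
import math
import random

def chi_squared(observed, expected, sigma):
    """Sum of squared deviations of observed from expected, in units of sigma."""
    return sum(((o - e) / sigma) ** 2 for o, e in zip(observed, expected))

random.seed(0)   # fixed seed so the synthetic run is repeatable
w = 1.0          # assumed angular frequency; with w = 1, -cos(wt)/w = -cos(wt)
sigma = 0.05     # assumed noise standard deviation (hypothetical)
n = 1000         # number of samples
ts = [0.01 * k for k in range(n)]

# Model prediction E(X): the integral of sin(wt), taking the constant as zero.
expected = [-math.cos(w * t) / w for t in ts]

# Synthetic stand-in for measurements: prediction plus random disturbance.
observed = [e + random.gauss(0.0, sigma) for e in expected]

chi2 = chi_squared(observed, expected, sigma)
chi2_per_sample = chi2 / n
print("chi-squared per sample:", chi2_per_sample)
```

When the deviations of X from E(X) are pure noise of the assumed size, the statistic per sample should come out near 1. A value well above 1 would indicate that the model's prediction is missing some regular part of the variation.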
Best,
Bill P.