# correlations and integrators.

[From Bill Powers (2007.01.04.1100 MST)]

A much-delayed reaction to the discussion of correlations. If I had more brain cells left, I could assign more of them to problems like this and the results, right or wrong, would probably show up more quickly.

Martin Taylor 2006.12.22.20.58

All the same, it might not hurt for me to lay out the basis for thinking of a waveform as a vector, since doing so makes thinking about all these transforms so much more intuitive.

I think one of us fell into some kind of trap during this discussion. The problem is in the assertion that "the correlation of a function with the time integral of that function is zero."

The correlation between X and Y is defined as

      SUM[(X - X')(Y - Y')]
Rxy = ---------------------
         N*SigmaX*SigmaY

SigmaX is defined as SQRT[SUM((X - X')^2)/N], and similarly for SigmaY. X' and Y' are the mean values, which are zero for sine and cosine waves.

If X = sin(wt) and
Y = cos(wt),

          (SUM[sin(wt)cos(wt)])^2
R^2xy = -----------------------------
        SUM[sin^2(wt)]*SUM[cos^2(wt)]

(the factors of N cancel).

Because this is a continuous function, we would have to start with a finite number N of samples and then compute the limit as N goes to infinity. It's past my bedtime so I won't try that now.

However we compute N or do the SUM, R is going to turn out to be a function of wt, and will be zero only for one specific set of values of wt. For all other values of w or of t, it will be nonzero. Note that

sin(wt)*cos(wt) = sin(2wt)/2

which has an average value of zero and fluctuates at twice the rate implied by w. The denominator is nonzero.
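
Whether the sin/cos correlation comes out zero does indeed depend on the span over which the sums are taken, which can be checked numerically (the frequency and sample counts below are arbitrary choices, not anything from the original analysis):

```python
import numpy as np

w = 2 * np.pi  # one cycle per unit time

def pearson_r(x, y):
    """Pearson's r: centered dot product over the product of norms."""
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

# Over an integer number of periods, r(sin, cos) vanishes ...
t_full = np.linspace(0, 3, 3000, endpoint=False)    # exactly 3 periods
r_full = pearson_r(np.sin(w * t_full), np.cos(w * t_full))

# ... but over a fractional span it does not.
t_part = np.linspace(0, 0.25, 250, endpoint=False)  # a quarter period
r_part = pearson_r(np.sin(w * t_part), np.cos(w * t_part))

print(abs(r_full) < 1e-9, r_part)  # True, and a strongly nonzero value
```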


------------------------------------------------------

All of this becomes moot if we switch from the idea of correlation to a formula like that for Chi-squared, a measure of the deviation of observed from expected values, the deviation of X from E(X). The expected value of the integral of sin(wt) is -cos(wt)/w. Correlation doesn't come into the picture. The effect of random variations shows up as deviations of X from E(X). This does away entirely with treating regular variations like sin(wt) as if they were random variables.

As it turns out, I have been using the chi-squared calculation all along in analyzing tracking experiments. For model-based theories, it is not the raw correlation between different variables that matters, but the correlation between the value of X that is measured and what the theory says should be the measured value of the SAME VARIABLE. When there is anything but a simple linear relationship between two variables, it makes no sense to use correlations. What matters is a comparison of the measured and predicted values of each variable in the model.

Best,

Bill P.

[From Bill Powers (2007.01.07.1730 MST)]

Martin Taylor 2007.01.07.14.35 --

Yes, I said several times that this seemed to be the potential Achilles heel of the demonstration. You make the point that it is not always exactly true, and you are correct.
...
I wanted to know just by how much the correlation for a real time-limited signal will deviate from the ideal case discussed on the Web page (perfect integrator, and therefore infinitely long signal). As I noted some time ago in a response to Bruce, the claim is true for the ideal, and as you note, it's not precisely true for a real signal. But it turns out to be pretty darn close if the signal has a reasonably long duration.

I think we need to settle a couple of basic issues before we go much farther with this. According to what I have learned about correlations, they are computed in a way that assumes an average straight-line relationship between X and Y, with superimposed random variations. The "degree of relatedness" is a measure of the amount of randomness, not of the slope of the linear relationship. If Y is any function of X other than Y = aX + b, the standard formula for a correlation (that is, Pearson's r) does not apply.

In the case of Y = integral(X), according to my understanding, it is simply inappropriate to apply the formula for a linear relationship to find a correlation of Y with X.

The second issue is whether correlations apply to cases in which relationships are exact rather than stochastic. This includes the question of whether information theory applies when there is no uncertainty in a relationship. In the case of the control system we are both assuming (I think) there is no noise, so the state of every variable is fully determined by the waveforms of the two independent variables, the disturbance and the reference signal. If the waveforms are not analytical, we can still get as close as we like to the exact values of all the dependent variables by using numerical methods for solving the system of equations. We can, for example, compute the exact waveform of the error signal.

Both information and correlation are related to estimates of uncertainty. When there is no uncertainty in a relationship, we simply analyze it mathematically and that is all there is to do. So I wonder if we're not trying to solve a nonexistent problem here.

Best,

Bill P.

[From Richard Kennaway (20070108.0740 GMT)]

[Martin Taylor 2007.01.07.22.41]
But maybe that's not where the numbers come from, and there's no dependence between them. It doesn't matter. Each set is a vector with the same number of ordered elements. That's all you need in order to compute a correlation. You just take each xi and its corresponding yi and plug them into the formula.

You can still crank the statistical handle, but what does it mean to do so? Pearson's r, by definition, measures how well the data are fitted by the best-fitting straight line (best by the least squares criterion). What can you learn by computing it when X and Y are not related by a linear equation plus noise?

How you interpret the correlation is a separate issue. Quite often correlations are taken to imply things like causation, and that's quite wrong. Sometimes you do indeed consider a random variation about a linear trend.

If Y is any function of X other than Y = aX + b, the standard formula for a correlation (that is, Pearson's r) does not apply.

Oh, yes it does, if you have two vectors for which the elements have a one-to-one relationship. Just subtract the mean of each set of numbers from the values in the set, and compute

      Sum(xi*yi)
Rxy = -------------------------
      Sqrt(Sum(xi^2)*Sum(yi^2))

What you do with the number when you've got it depends on the problem you are trying to address.
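
That recipe can be checked against a standard Pearson routine: on any two equal-length vectors, linear relationship or not, the centered formula above and NumPy's corrcoef agree (the example vectors here are arbitrary):

```python
import numpy as np

def r_xy(x, y):
    """Subtract each mean, then Sum(xi*yi) / Sqrt(Sum(xi^2)*Sum(yi^2))."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

# Any two equal-length vectors will do -- no linear model is assumed.
x = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
y = x**2  # a decidedly nonlinear relationship

print(np.isclose(r_xy(x, y), np.corrcoef(x, y)[0, 1]))  # True
```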

What problem is Pearson's r a solution to, for data that do not come from a linear process plus noise?

BTW, exp(ax) is its own integral, up to a constant factor, therefore correlates 1 with itself, so where does this idea come from that a function has a zero correlation with its integral?
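
Kennaway's exp(ax) observation checks out numerically: the integral is just exp(ax)/a, so the centered vectors are proportional and r is exactly 1 (the constant a below is an arbitrary choice):

```python
import numpy as np

a = 0.7                        # arbitrary constant
x = np.linspace(0.0, 5.0, 500)
f = np.exp(a * x)
F = np.exp(a * x) / a          # integral of f, up to an additive constant

# Correlation of f with its integral: center both, take the cosine.
fc, Fc = f - f.mean(), F - F.mean()
r = (fc @ Fc) / np.sqrt((fc @ fc) * (Fc @ Fc))
print(round(r, 9))  # 1.0
```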


--
Richard Kennaway

[Martin Taylor, replying to Richard Kennaway (20070108.0740 GMT)]

[Martin Taylor 2007.01.07.22.41]
But maybe that's not where the numbers come from, and there's no dependence between them. It doesn't matter. Each set is a vector with the same number of ordered elements. That's all you need in order to compute a correlation. You just take each xi and its corresponding yi and plug them into the formula.

You can still crank the statistical handle, but what does it mean to do so? Pearson's r, by definition, measures how well the data are fitted by the best-fitting straight line (best by the least squares criterion).

By definition?!? By historical accident, I think.

What can you learn by computing it when X and Y are not related by a linear equation plus noise?

Depends on your domain of interest. When you are dealing with, for example, commonalities of context (as I am, in attempting to develop "transforms" of meaning), neither linear equations nor noise have any relevance. Nor do they in the context of the current discussion.

What problem is Pearson's r a solution to, for data that do not come from a linear process plus noise?

Rick's problem, for one.

BTW, exp(ax) is its own integral, up to a constant factor, therefore correlates 1 with itself, so where does this idea come from that a function has a zero correlation with its integral?

As I pointed out in the immediately previous message, zero correlation between a signal and its integral applies (if my analysis is correct) between the zero-crossings of the signal. Where are the zero-crossings of exp(ax)?

We are dealing with signals of mean value zero. exp(ax) rather dramatically fails that criterion.

Martin

[From Bill Powers (2007.01.08.1608 MST)]

Martin Taylor 2007.01.07.22.41 --

Random variation has nothing at all to do with it, nor does the notion of a functional relationship, linear or otherwise, between one variable and another. Sometimes you use correlations to deal with situations in which there are random variations. Other times, you are just interested in the relationship between two sets of numbers (two vectors).

But a correlation doesn't tell you anything about any relationship but a straight line, does it? It's the regression line (or whatever curve you decide to fit to the data) that says something about the (assumed) relationship.

The formula for the regression line is

Y = r(xy) * [sigma(y)/sigma(x)] * (X - X') + Y'

This reduces to

Y = [sigma(XY)/sigma(x)^2] * (X - X') + Y'

(sigma(XY) here being the covariance of X and Y).

This is the least-squares line. The slope is a function of the correlation coefficient and two sigmas. But how would the correlation and the sigmas fit into any other assumed form of the relationship?
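
The slope identity being used here -- slope = r*sigma(y)/sigma(x), which equals cov(X,Y)/var(X) -- can be checked on synthetic data (the line and noise level below are made up for the test):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=1000)  # made-up line + noise

r = np.corrcoef(x, y)[0, 1]
slope_from_r = r * y.std() / x.std()
slope_from_cov = np.cov(x, y, bias=True)[0, 1] / x.var()

# Both expressions give the same least-squares slope, close to the true 2.5.
print(np.isclose(slope_from_r, slope_from_cov))  # True
```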

I speak from ignorance, so don't pay too much attention to me. I don't really care what the right answer is here, but whatever it is, I want to understand it. I had thought that a correlation was first of all a measure of deviations from a linear relationship between two variables, with zero correlation indicating that the two variables had no relationship to each other -- that they were random relative to each other. A correlation of 1 means a completely predictable relationship. But a correlation of 1 does not mean that the slope of the relationship is 1. It just means that the relationship is exact, with no uncertainty, whatever the slope is. If you know only the correlation between the two vectors, you have no idea what kind of relationship exists between them. As I have understood these things, anyway.

If that's not correct, I guess I will need to start over. How was the formula for Pearson's r derived? I know, "Ask Pearson."

Best,

Bill P.

[From Rick Marken (2007.01.08.1820)]

Martin Taylor (2007.01.07.22.41)

What Rick wants to do with it is find out which square the subject is controlling, and when the subject has chosen to stop controlling that square in order to control another. This seems to me to be a perfectly good situation in which to use correlation

I agree. And the reason is that, when the three squares are not being controlled, we know from the physical situation that there will be a linear relationship between disturbance, d, and square position, qi. That is, if the mouse, m, is not moved, qi = d + m. But the mouse will be moving when one of the squares is being controlled, so the relationship between qi and d will not be perfectly linear (a correlation of 1.0) even for the squares that are not being controlled.

So when I compute the correlation between qi and d, I am measuring the degree of _linear_ fit between these two variables (qi and d), which I know will be less than 1.0 because of the variability of m. But the variability in m will be the same for all three squares. So the correlation of interest is the one between d and qi for each square (each square is a different qi with a different associated d). If control is perfect, the correlation between qi and d for that square will be 0. But there will always be a small non-zero correlation between m and qi so, even with good control, the correlation between qi and d for the controlled square will never be exactly 0. But it should be closer to 0 than it is for the uncontrolled squares; the correlation between qi and d will be higher for the uncontrolled squares than for the controlled square. But even for the uncontrolled squares the correlation between qi and d will not be 1.0.
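
The logic of the preceding two paragraphs can be sketched in a toy simulation. This is not Rick's actual demo: the integrating controller, gain, and smoothed disturbances are all invented for illustration. The controlled square's qi-d correlation comes out much lower than the uncontrolled squares':

```python
import numpy as np

rng = np.random.default_rng(3)

def smooth(noise, width=100):
    """Moving-average smoothing, as for a disturbance table."""
    return np.convolve(noise, np.ones(width) / width, mode='same')

n = 20000
d = [smooth(rng.normal(size=n)) for _ in range(3)]  # three disturbances

# qi = d + m for every square; a crude integrating controller keeps
# square 0 near a reference of zero, squares 1 and 2 are uncontrolled.
gain, dt = 50.0, 0.01
m = 0.0
qi = np.zeros((3, n))
for t in range(n):
    for j in range(3):
        qi[j, t] = d[j][t] + m
    m += gain * (0.0 - qi[0, t]) * dt   # integrate the error for square 0

for j in range(3):
    print(j, round(np.corrcoef(d[j], qi[j])[0, 1], 2))
```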

By the way, this thread just helped me fix my Mind Readings demo (after 10 years;-)). Based on this discussion I realized that the correlation between d and qi for each square should be 1.0 when the mouse is stationary. That's because, when the mouse is stationary, qi(t) = d(t); the time variations in square position are completely determined by d(t), so the correlation between d(t) and qi(t) should be 1.0 for each square. But when I left the mouse stationary these correlations were pretty low -- like .6 or so. This was driving me nuts. I tried all kinds of things (that's how I spent my day, but I figure it's worth it because I want to have this demo working correctly for class) and finally figured it out; I was combining the x and y vectors for the disturbance and cursor position into my calculation of the correlation. That is, I was treating all the disturbance values (in the x and y dimensions) as a single d vector and all the cursor values (x and y) as a single qi vector. I had a heck of a time trying to figure out why this is wrong -- after all, qi = d in both the x and y dimensions -- but I finally realized that it results from the fact that the offsets (the intercepts of the line relating qi to d) were different in the x and y dimensions. That's just because of the way the square coordinates are plotted on the screen. I think I could have used the single-vector approach if I had used the raw values of the cursor x and y position rather than the plotted values. But this little exercise does show that the correlation is measuring fit to a linear regression line, so you have to be careful about how you use it.
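
The bug Rick describes is easy to reproduce in a few lines: per-dimension correlations are perfect, but concatenating the x and y vectors with different intercepts drags the combined correlation down (the offsets and disturbance values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
d_x = rng.normal(size=200)   # disturbance, x dimension
d_y = rng.normal(size=200)   # disturbance, y dimension

# Stationary mouse: qi tracks d exactly, but the plotted coordinates
# carry different (hypothetical) intercepts in x and y.
qi_x = d_x + 100.0
qi_y = d_y + 300.0

r_x = np.corrcoef(d_x, qi_x)[0, 1]   # 1.0: a constant offset doesn't affect r
r_y = np.corrcoef(d_y, qi_y)[0, 1]   # 1.0
r_combined = np.corrcoef(np.concatenate([d_x, d_y]),
                         np.concatenate([qi_x, qi_y]))[0, 1]

print(round(r_x, 6), round(r_y, 6), round(r_combined, 3))
```

With offsets this large relative to the disturbance variance, the combined-vector correlation collapses toward zero because the two intercepts dominate the variance; with smaller offsets one gets intermediate values like Rick's .6.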

Bill Powers--

When there is no uncertainty in a relationship, we simply analyze it mathematically and that is all there is to do. So I wonder if we're not trying to solve a nonexistent problem here.

Well, do let Rick know that his problem is nonexistent, and I'll stop worrying about it. But it won't change the actual correlational relationships among the signals in a control system -- whether with noise or without.

There is a problem in my case because there is uncertainty in the _linear_ relationships between qi and d, created partly by the effect of mouse movements on the uncontrolled squares and partly as a result of changes in the reference specification for the controlled square. Actually, I'm finding (from experiment) that changes in the reference for the controlled square seem to have very little effect on the correlation between qi and d for that square. I thought variations in reference would increase this correlation, making it difficult to tell which of the three squares was actually under control. But the main problem seems to be the effect of mouse movements on the uncontrolled squares. Mouse movements can spuriously _reduce_ the qi - d correlation for a square when the disturbance to that square is somewhat correlated with the disturbance to the controlled square. So the way to improve this mind reading demo is to figure out a way to factor out the effect of mouse movements on movements of the uncontrolled squares. I can make the disturbances to two possible controlled variables orthogonal (which sort of solves the problem) but with three or more possible controlled variables one pair will end up being somewhat non-orthogonal.

Any ideas?

Best

Rick


---
Richard S. Marken Consulting
Home 310 474-0313
Cell 310 729-1400

[From Bill Powers (2007.01.09.0340 MST)]

Richard Kennaway (20070108.0740 GMT) --

Pearson's r, by definition, measures how well the data are fitted by the best-fitting straight line (best by the least squares criterion). What can you learn by computing it when X and Y are not related by a linear equation plus noise?

I'm encouraged by your agreement with my understanding. Can't we see a correlation as a measure of deviations of observations from a model that defines the expected value of Y, just like the chi-squared measure? The specific model in that case is E(Y) = aX + b.

Suppose we say that the observed value of Y is

Y[i] = aX[i] + b + R[i]

where R is a random number with an average value of zero.

I use a prime (') to indicate the average value.

Y' = aX' + b

Y - Y' = a(X - X') + R[i]

The regression line is just that equation without the random term. In that case, "a" is

        sigmaY
a = r * ------
        sigmaX

r being Pearson's r.

I can't quite see where to go from here but maybe you can. The fuzzy goal I have in mind is to express R as a function of r. Maybe. I'm sleepwalking, so nothing is very sharp right now, including me.
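
For what it's worth, one standard identity goes part of the way toward expressing R in terms of r: for the model Y = aX + b + R, the variance of the residuals about the regression line is (1 - r^2)*Var(Y), which estimates Var(R). A numerical check (the slope, intercept, and noise level below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=5000)
R = rng.normal(scale=0.8, size=5000)   # the random term, mean ~zero
y = 1.5 * x + 2.0 + R                  # Y = aX + b + R (a, b arbitrary)

r = np.corrcoef(x, y)[0, 1]

# Residual variance about the regression line is (1 - r^2)*Var(Y),
# which recovers Var(R) (0.64 here) up to sampling error.
resid_var = (1 - r**2) * y.var()
print(round(resid_var, 2), round(R.var(), 2))
```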

Best,

Bill P.

[Martin Taylor 2007.01.09.09.55]

[From Bill Powers (2007.01.09.0340 MST)]

Richard Kennaway (20070108.0740 GMT) --

Pearson's r, by definition, measures how well the data are fitted by the best-fitting straight line (best by the least squares criterion). What can you learn by computing it when X and Y are not related by a linear equation plus noise?

I'm encouraged by your agreement with my understanding. Can't we see a correlation as a measure of deviations of observations from a model that defines the expected value of Y, just like the chi-squared measure? The specific model in that case is E(Y) = aX + b.

I've been puzzled by Bill's, and then Richard's, linkage of correlation with "best-fitting straight line", which to me has never been anything other than a fairly common application of correlation. They see it as the only reason for computing a correlation, and I've wondered why this should be. Perhaps I have an answer (very tentative, and I expect they will correct me).

My tentative answer is in two parts, the historical background and present thinking. The historical background is that many of these statistical analyses were developed in the context of experiments in which the experimenter set one variable (the "independent variable") to a particular value and looked to see what was the consequent value of a "dependent variable".

The consequent value of the dependent variable might be different on different occasions when the experimenter set the independent variable to a particular value, and tools were developed to assess how much of the changes in the dependent variable were "truly" caused by changes in the independent variable, and how much was due to other effects in the world. The dependent variable might be assumed to have a functional relationship with the independent variable, and the Pearson correlation allowed "the proportion of variance accounted for" by the function to be assessed if the functional relationship was linear.

That much isn't tentative. What is tentative is my guess that Bill and Richard are still thinking in this way, that when we are dealing with a correlation, we are always treating a situation in which there is one variable that someone (experimenter) has the ability to set at an arbitrary value, and another that must be observed when the first has been set, and that what matters is the degree of variation of the second at a particular setting of the first.

Am I anywhere near right?

Martin

[From Bill Powers (2007.01.09.0910 MST)]

Martin Taylor 2007.01.09.09.55 --

That much isn't tentative. What is tentative is my guess that Bill and Richard are still thinking in this way, that when we are dealing with a correlation, we are always treating a situation in which there is one variable that someone (experimenter) has the ability to set at an arbitrary value, and another that must be observed when the first has been set, and that what matters is the degree of variation of the second at a particular setting of the first.

Am I anywhere near right?

On my part, yes, except for the part about a person's setting one variable to an arbitrary value. For me, what matters is how reliably a proposed relationship between two variables is matched by observations of the variables. A low correlation indicates a poor match. Furthermore, my impression is still that the correlation formula is based on a first-approximation model of the proposed relationship, namely a simple linear slope-intercept model, y = ax + b. So the correlation measure is not appropriate for judging higher-order polynomial models or transcendental-function models.

When you say "still thinking that way," the implication is that there is now some advanced way of thinking about correlations with which Richard and I are unfamiliar or somehow missed out on while everyone else changed their ideas. I wouldn't be surprised if that were true of me, but I would be extremely surprised, not to say skeptical, to hear that assertion made about Richard Kennaway.

So just what is this new and improved way of thinking about correlations?

Best,

Bill P.

[From Bill Powers (2007.01.09.0935 MST)]

Rick Marken (2007.01.08.1820) --

But the main problem seems to be the effect of mouse movements on the uncontrolled squares. Mouse movements can spuriously reduce the qi - d correlation for a square when the disturbance to that square is somewhat correlated with the disturbance to the controlled square. So the way to improve this mind reading demo is to figure out a way to factor out the effect of mouse movements on movements of the uncontrolled squares. I can make the disturbances to two possible controlled variables orthogonal (which sort of solves the problem) but with three or more possible controlled variables one pair will end up being somewhat non-orthogonal.

Any ideas?

You don't need "orthogonality" in the geometric sense. What you need is for the three disturbances to have very low correlations with each other. This will be hard to do if you use sine-waves, but if you construct disturbance tables using smoothed random numbers, you can write a routine that generates tables and computes correlations between the tables until you come across a set that has the required low correlations, say less than 0.1.

If the mutual correlations of the disturbances are small, then the mouse movement that cancels one disturbance will have a low correlation with the other two disturbances. Then it's only a question of how fast the disturbances change, which determines how long the run has to last to get good statistics.
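
The generate-until-low-correlation routine Bill describes can be sketched as follows (table length, smoothing width, and the 0.1 criterion are the only parameters; the particular values here are invented):

```python
import numpy as np

def smoothed_table(rng, n=2000, width=50):
    """One disturbance table: white noise through a moving-average filter."""
    raw = rng.normal(size=n + width - 1)
    kernel = np.ones(width) / width
    return np.convolve(raw, kernel, mode='valid')  # length n

def low_correlation_set(k=3, limit=0.1, seed=0):
    """Regenerate k tables until every pairwise |r| is below `limit`."""
    rng = np.random.default_rng(seed)
    for _ in range(10000):
        tables = [smoothed_table(rng) for _ in range(k)]
        r = np.corrcoef(tables)
        pairwise = r[np.triu_indices(k, 1)]
        if np.all(np.abs(pairwise) < limit):
            return tables, pairwise
    raise RuntimeError("no qualifying set found")

tables, rs = low_correlation_set()
print(np.round(rs, 3))  # every pairwise correlation below 0.1 in magnitude
```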
The idea of factoring out mouse effects on other squares might work, but requires more knowledge than the simple approach you're using. The idea is to deduce which square is being controlled without detailed knowledge of what is going on. In fact the main point of this approach as I originally thought of it was to illustrate how the deduction could be made by looking for the lowest rather than the highest correlation. In fact it might possibly be more efficient to look for the highest negative correlation between mouse velocity and rate of change of disturbance.

Your previous suggestion of using a model of the controller to deduce the reference signal might also prove quite efficient, even if the model is only a one-size-fits-all approximation, a simple integrating control system. But I know you want to get this going for your class, so that's probably a longer-term project.

Best,

Bill P.

[Martin Taylor 2007.01.09.15.22]

[From Bill Powers (2007.01.09.0935 MST)]

Rick Marken (2007.01.08.1820) --

But the main problem seems to be the effect of mouse movements on the uncontrolled squares. Mouse movements can spuriously _reduce_ the qi - d correlation for a square when the disturbance to that square is somewhat correlated with the disturbance to the controlled square. So the way to improve this mind reading demo is to figure out a way to factor out the effect of mouse movements on movements of the uncontrolled squares. I can make the disturbances to two possible controlled variables orthogonal (which sort of solves the problem) but with three or more possible controlled variables one pair will end up being somewhat non-orthogonal.

Any ideas?

You don't need "orthogonality" in the geometric sense. What you need is for the three disturbances to have very low correlations with each other.

That is _precisely_ "orthogonality in the geometric sense!" You just said the same thing in different words.

And Rick can have as many orthogonal disturbance patterns as he wants, two, three, or fifty-three. It's not related to the vertical and horizontal directions on the screen. That issue is quite different. If I understood him properly, he just had a programming bug.

Martin

[Martin Taylor 2007.01.09.15.27]

[From Bill Powers (2007.01.09.0910 MST)]

Martin Taylor 2007.01.09.09.55 --

That much isn't tentative. What is tentative is my guess that Bill and Richard are still thinking in this way, that when we are dealing with a correlation, we are always treating a situation in which there is one variable that someone (experimenter) has the ability to set at an arbitrary value, and another that must be observed when the first has been set, and that what matters is the degree of variation of the second at a particular setting of the first.

Am I anywhere near right?

On my part, yes, except for the part about a person's setting one variable to an arbitrary value.

That was the key point in what I thought was my insight. Apparently I was not right. I'll have to try to figure out what else it is that has you fixated on this idea of a functional relationship, if it isn't thinking of the effect of changing one variable on the value of the other.

When you say "still thinking that way," the implication is that there is now some advanced way of thinking about correlations with which Richard and I are unfamiliar or somehow missed out on while everyone else changed their ideas.

Perhaps I used an unfortunate wording. It comes from believing that you are both working from the historical sources of the notion, and for that reason thinking that to compute correlation you MUST have a situation in which the value of one variable might depend in some way on the value of another. The other element that seemed to me likely to have come from the historical background was the idea you both seem to have that correlation has some necessary connection with the idea of random variation.

This concept of correlation, that it necessarily involves functional relationships and/or randomness, has nothing wrong with it, until you take that concept as the basis for correlation, rather than just an application. It's a bit like saying that because a newspaper was originally printed so that people could read the advertisements, therefore you can't use it to wrap fish and chips!

So just what is this new and improved way of thinking about correlations?

Should we say "more general" rather than "new and improved"?

Correlation has no built-in asymmetry such as is implied by "a simple linear slope-intercept model, y = ax + b." Correlation represents (is the cosine of) the angle between two vectors in a space of dimensionality at least as large as the number of elements in the vectors. That's one way of looking at it symmetrically. The idea of linearity does enter, but only through the assertion that the space is Euclidean, implying that for each dimension and for both vectors, the distance between, say, 0 and 1 is the same as between 100 and 101, or a million and a million and one.
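
Martin's geometric reading can be confirmed in a few lines: Pearson's r on two arbitrary vectors equals the cosine of the angle between their mean-centered versions.

```python
import numpy as np

x = np.array([2.0, -1.0, 3.0, 0.0])   # arbitrary example vectors
y = np.array([1.0, 1.0, 4.0, -2.0])

r = np.corrcoef(x, y)[0, 1]           # Pearson's r

xc, yc = x - x.mean(), y - y.mean()   # mean-centered vectors
cos_angle = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(np.isclose(r, cos_angle))  # True
```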

If you don't like visualising things in high-dimensional spaces, just refer to the calculational formula, in which there's no asymmetry, and no assertion of any kind of functional relationship between the variables, not a linear relationship, not anything (other than the idea that it's legitimate to compute things like means and sums of squares).

I don't know what more I can do on this topic, other than simply continue with the analysis I'm trying to do to extend the Web page to cover Rick's conditions. To understand that, all you really need to know is that zero correlation means geometric orthogonality, and the greater the correlation between two vectors, the smaller the angle between them. You can get a lot of qualitative understanding from knowing just that.

Martin

[From Rick Marken (2007.01.09.1255)]

Martin Taylor (2007.01.09.15.22)

Bill Powers (2007.01.09.0935 MST)

You don't need "orthogonality" in the geometric sense. What you need is for the three disturbances to have very low correlations with each other.

That is _precisely_ "orthogonality in the geometric sense!" You just said the same thing in different words.

And Rick can have as many orthogonal disturbance patterns as he wants, two, three, or fifty-three.

Yes. I want three disturbances that have low correlations with each other. This is actually pretty easy to do with sine waves since sine waves that are 90 degrees out of phase or at integral multiple frequencies of each other are completely orthogonal to each other. I think I've got it fixed up pretty well now.
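
Rick's claim about phase and harmonics is easy to check over whole periods (the sampling grid below is an arbitrary choice):

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)  # spans whole periods
w = 2 * np.pi

def r(x, y):
    """Pearson's r via centered vectors."""
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

r_phase    = r(np.sin(w * t), np.sin(w * t + np.pi / 2))  # 90 degrees apart
r_harmonic = r(np.sin(w * t), np.sin(2 * w * t))          # double frequency

print(abs(r_phase) < 1e-9, abs(r_harmonic) < 1e-9)  # True True
```

Note that, as discussed earlier in the thread, this orthogonality holds only when the sums span whole periods; truncate the span mid-cycle and the correlations become nonzero.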

Best

Rick

Richard S. Marken Consulting
Home 310 474-0313
Cell 310 729-1400

[From David Goldstein (2007.01.09.2016 EST)]

Here is a good discussion of this topic.

http://en.wikipedia.org/wiki/Correlation

David


-----Original Message-----
From: Control Systems Group Network (CSGnet)
[mailto:CSGNET@LISTSERV.UIUC.EDU] On Behalf Of Bill Powers
Sent: Tuesday, January 09, 2007 6:54 PM
To: CSGNET@LISTSERV.UIUC.EDU
Subject: Re: correlations and integrators.

[From Bill Powers (2007.01.09.1602 MST)]
Martin Taylor 2007.01.09.15.27 --

So just what is this new and improved way of thinking about correlations?

Should we say "more general" rather than "new and improved"?

Correlation has no built-in asymmetry such as is implied by "a simple linear slope-intercept model, y = ax + b." Correlation represents (is the cosine of) the angle between two vectors in a space of dimensionality at least as large as the number of elements in the vectors.

So you still say that correlation is simply the cosine of the angle between two vectors. If that's the case, why do we need a special word for something we can already describe simply as "the cosine of the angle between two vectors"?

And more to the point, since the context of this discussion has been statistics, what language do we then use to describe the "degree of relatedness" of the kind psychologists worry about -- the repeatability of any measure of the relationship between variables? The projection of one vector onto another may be measured by the cosine of the angle between the (coplanar) vectors, but how do we talk about the degree of systematicity in this projection? How do we know that the projection we measure is not simply due to a chance fluctuation?

Pearson's r is designed to show the probability that a measurement of one variable will be paired with some specific measurement of another variable. It applies in the situation where successive measurements of what is supposedly the same relationship differ. The usual way of putting the question being asked is "Is X related to Y?" The methods of statistics allow us to compute the chances that the answer is yes. They do not, however, tell us what that relationship is, other than its being at least as complex as a straight line. A significant correlation tells us only that the relationship is not that between independent random variables -- pure chance.

If successive measures of the projection of one vector onto another differ then we can speak of the (Pearson's) correlation between the magnitude of one vector and the magnitude of its projection, both of course being measurements. Variations in the measurements can take place even with a constant angle between the vectors. So it seems to me that we can talk separately about the cosine of the angle between the vectors, and the correlation between different measures of the vectors. But you seem to claim that these two subjects are really the same thing.

At the moment, it seems unlikely that I am going to understand this.

Best,

Bill P.

[From Bill Powers (2007.01.09.1945 MST)]

David Goldstein (2007.01.09.2016 EST) --

Here is a good discussion of this topic.

http://en.wikipedia.org/wiki/Correlation

Thanks, David. I see that Martin is not alone in saying that the correlation coefficient is the angle between two vectors, although in the Wiki article the vectors are specifically drawn from a population of random variables, and aren't just any vectors. It's pretty clear that the vote on whether random variables are an aspect of correlations goes against Martin. There are several pretty clear statements that Pearson's r is based on a linear model. I also appreciate the small expansion on "correlations don't indicate causation -- but they sure go a long way toward doing that." The converse needs to be stated, too: " ... and neither do they rule out causation."

One also must remember that these articles represent someone's opinion. It's permitted to disagree until you get to the level of proofs.

Best,

Bill P.

[David Goldstein (2007.01.10.0343 EST)]

I am not sure it will be allowed, but here is an article that seems to
deal with 'proof.'

I found it at this website, if the article does not come through.

http://cnx.org/content/m12101/latest/

David

m12101.pdf (229 KB)


[Martin Taylor 2007.01.10.00.10]

[From Bill Powers (2007.01.09.1945 MST)]
Thanks, David. I see that Martin is not alone in saying that the correlation coefficient is the angle between two vectors, although in the Wiki article the vectors are specifically drawn from a population of random variables, and aren't just any vectors. It's pretty clear that the vote on whether random variables are an aspect of correlations goes against Martin.

And in the psychology community in general, the vote on whether behaviour is the control of perception goes against Powers. So what? Do you perceive some kind of a contest going on that I don't know about? We are talking about science, aren't we? You don't usually seem to accept that votes come into it when you discuss the validity of PCT.

When you looked at the article, perhaps you noticed the first sentence: "This article is about the correlation coefficient between two random variables." It's not very surprising that the main body of the article deals primarily with random variables and that the vectors concerned represent random variables, is it?

If you remember, I have consistently pointed out that correlation is most often used in dealing with random variation, especially when the user suspects a linear relationship between the variables. I have also said that random variation is not a NECESSARY aspect of correlation, and have demonstrated at least one situation in which correlation is useful in the absence of random variation for analyzing a control system. What I have said is totally consistent with the Wiki article. In fact I drew upon that article in checking some of my postings.

There are several pretty clear statements that Pearson's r is based on a linear model.

They are talking about a linear model that is assumed to represent the relationship between two variables. The vector approach assumes linearity by asserting that the representation space is Euclidean, and that both vectors can be represented in the same Euclidean space. The scatter plot and the vector representation are two complementary ways of looking at the same thing. The generalization implied by the vector representation is not in the mathematics, but in the way it induces one to look at problems. It removes the concept of "independent variable - dependent variable" that is implicit in drawing scatterplots with one variable (independent) on the X axis and the other variable (dependent) on the Y axis.

In the case in question, a scatterplot of the values of the sum of a bunch of cosines against the values of their corresponding sines isn't going to show you very much. It will look like a noise scatterplot. The vector representation helps one to see the problem differently.
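The sine/cosine point here can be checked numerically. A minimal sketch (assuming evenly spaced samples over a whole period; `pearson_r` is my own helper, not from the discussion): sampled sine and cosine values are perfectly deterministic functions of each other, yet their correlation over a full period is zero.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance over the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Evenly spaced samples over exactly one period.
n = 1000
t = [2 * math.pi * k / n for k in range(n)]
x = [math.sin(v) for v in t]
y = [math.cos(v) for v in t]
print(round(pearson_r(x, y), 6))   # ~0: deterministic relation, zero correlation
```

The scatterplot of these pairs is a circle, not a noise cloud, which is exactly why a linear summary statistic reports "no relationship" while the vector view does not.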

When you have more than one way of looking at a problem, your chances of seeing useful answers are often increased.

Martin


[From Bill Powers (2007.01.10.0930 MST)]

Martin Taylor 2007.01.09.17.34 --

I'm a bit overwhelmed by all the mathematicizing that's being showered on
me, so maybe my most prudent move would be to retire to the balcony and
watch. Having seen others in the same position I know that there's a high
risk of saying foolish things out of ignorance. On the other hand, nobody
ever accused me of being prudent ... but I will try not to tilt at
windmills.

Red herring. The (x, y, z)
directions are in a different subspace of space-time. Rick was dealing
with the time dimension, if I understood him correctly (as I think I did,
given his later message). The three disturbance vectors are each a set of
numbers representing successive time sample values, as for example:

    d1   = {1, 2, 1, -1, 0, 3, ...}
    d2   = {1, 2, 1, -1, 0, 3, ...}
    d3   = {1, 2, 1, -1, 0, 3, ...}
    f(t) = {1, 2, 1, -1, 0, 3, ...}

They are the same vector, pointing in the same direction in the space of
time samples.

I was describing the case where there are three dimensions,
i, j, and k, normal to each other, with each
disturbance plotted in one of those dimensions. The angle between any two
disturbance vectors is either 90 degrees or 0 degrees in my examples.
The magnitudes of the vectors, however (their lengths), vary from
one determination to another, with some mean and standard deviation.
We can consider the X positions of the three squares on the screen in
Rick's example as three independent dimensions of the display, since each
can be varied independently of the others. The three disturbances act in
this 3-D space. We could use 6 dimensions to take care of X and Y on the
screen, but that would make no difference to what follows.
First case:
The point of this example was that even with these vectors being at right
angles in disturbance space, their magnitudes can have a correlation of 1.
Suppose that on successive measurements of the disturbances, we find
these sets of values:

    n     i*d1    j*d2    k*d3
    1:      4       8      13
    2:    105     109     114
    3:     65      69      74

... and so on, adding a randomly-selected scalar number (mean of zero)
to all three vectors each time.
Now the correlation between the magnitudes of the three disturbances will
be found to be 1.
Second case:
Now let all three disturbance magnitudes be applied to the first square's
X position only. All three disturbances affect the same square on
the screen display, so the disturbance vectors are collinear. Now,
however, we construct the disturbance magnitudes as

    i*d1 = random(1000);  j*d2 = random(1000);  k*d3 = random(1000)

and so on for each measurement, where random(1000) yields a new number in
the range 0 to 999. To get a mean of zero we subtract 500 each
time.

Now we find that d1, d2, and d3 have a mutual correlation of zero,
despite their all affecting the same dimension of the display, and
despite the angle between them having a cosine of 1.

Note, too, that in the first case, if we used different random numbers for
the three disturbances, the mutual correlations would be zero -- but not
because the vectors are at right angles.
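Both cases can be sketched numerically (a minimal illustration; the helper `pearson_r`, the offset distribution, and the sample sizes are my own choices, not from the original messages). In the first case a single random scalar shared by all three magnitudes forces their mutual correlation to 1; in the second case independent random magnitudes on the same display dimension correlate at roughly zero.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r: covariance over the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(2)

# First case: orthogonal vectors whose magnitudes all receive the
# same random scalar offset on each measurement.
base = (4, 8, 13)
rows = [[b + c for b in base]
        for c in (random.gauss(0, 50) for _ in range(500))]
d1, d2, d3 = zip(*rows)
print(round(pearson_r(d1, d2), 3))   # 1.0: the magnitudes move in lockstep

# Second case: collinear vectors with independent random magnitudes,
# random(1000) - 500 as in the text.
e1 = [random.randrange(1000) - 500 for _ in range(500)]
e2 = [random.randrange(1000) - 500 for _ in range(500)]
print(round(pearson_r(e1, e2), 2))   # near 0 despite a cosine of 1
```

The contrast makes the point of the two cases concrete: correlation of measured magnitudes and the angle between the disturbance vectors are independent facts about the situation.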

It seems to me that in generalizing the idea of a vector, more confusion
than illumination is created. If we start with the physical situation and
construct a mathematical description to fit it, there is no confusion
even if we use abstract coordinates. On the other hand, if we start with
the most general mathematical formulation possible and try to apply it to
a specific physical situation, we spend most of our time trying to
disentangle the multiple referents of "vector",
"angle", and "correlation". The feeling of
understandingness you get from contemplating the most general formulation
falls to pieces when you try to apply it to a specific case in which
several different meanings of the same word show up.

At least, that’s what happens to me.

Best.

Bill P.

[From Rick Marken (2007.01.11.1040)]

Bill Powers (2007.01.10.0930 MST)

Martin Taylor 2007.01.09.17.34 --

I'm a bit overwhelmed by all the mathematicizing that's being showered on me, so maybe my most prudent move would be to retire to the balcony and watch.

I don't understand why we're even having this discussion. Of course correlation is a measure of linear relationship; no question; that's Stat 101. We use correlation in evaluating the results of our tracking experiments because the relationships we are measuring (like that between output and disturbance) are expected to be linear because of the way we typically set up our experiments: with qi = o + d, so that when there is control of qi, o = -d, a perfect negative linear relationship.
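The expected o = -d relationship can be sketched with a toy simulation (all parameter values here are my own illustrative choices, not from any actual tracking experiment): an integrating output keeps qi = o + d near a reference of zero against a drifting disturbance, and the recorded output and disturbance then correlate near -1.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r: covariance over the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(3)
dt, gain, ref = 0.01, 50.0, 0.0     # illustrative loop parameters
o, d = 0.0, 0.0
outs, dists = [], []
for _ in range(5000):
    d += random.gauss(0, 0.05)      # slowly drifting disturbance
    qi = o + d                      # controlled variable: qi = o + d
    o += gain * (ref - qi) * dt     # output integrates the error
    outs.append(o)
    dists.append(d)
print(round(pearson_r(outs, dists), 2))   # close to -1 when control is good
```

The near-perfect negative correlation falls straight out of the loop equations: good control keeps qi near the reference, which forces o to mirror -d.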

Now can we switch to something more interesting, like what to do when you are living in a country full of people who thought it was a good idea to impeach a competent, articulate and generally successful President like Bill Clinton but who will not even consider impeaching a couple miserable incompetents like Bush and Cheney. Even though Pelosi is beholden to the Israel lobby (like every other politician in the US, for some reason), I think she would make a great President. She is definitely my type.

Best regards

Rick

···

---
Richard S. Marken Consulting