[From Bill Powers (2009.05.22.0856 MDT)]

David Goldstein (2009.05.22.10:09 EDT) –

DG: Under these conditions, the textbook states:

r measures the slope of the regression lines and is thus the tangent of the angle between the X-axis and the regression line for the regression of y on x; and is the tangent of the angle between the Y-axis and the regression line for x on y.

BP: OK, that completes the circle of confusion. In fact, I have to wonder if this textbook is correct. If r measures the slope of the regression line, then all data sets showing the same correlation will also yield the same slope of the regression line. That can’t be true.

I think that is true for only one value of correlation: 1.0. The slope of the regression line for that correlation (normalized by the ratio of the x and y standard deviations, as you point out) must also be 1.0. However, it can’t be true of any correlation less than 1.0. I don’t have an analytical proof, but the way I constructed that set of scatter plots long ago would support my claim. I plotted y = a*x + n*(0.5 - random) + b. “Random” is a Pascal function which returns a random real number between 0 and 1, so the expression 0.5 - random generates values between -0.5 and +0.5 with a mean of zero. Therefore no matter what value of n is used, the regression line for a set of x,y values will have a slope of **a** and an intercept of **b**. As n increases from zero, however, the x-y correlation will fall from 1 toward zero. So by demonstration, the correlation does not determine the slope of the regression line and indeed is independent of it.
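The scatter-plot construction described above can be sketched in Python rather than Pascal. This is only an illustration under assumptions: the x values are drawn uniformly from -10 to 10, the point count and seed are arbitrary, and the helper names `fit` and `simulate` are hypothetical. With a finite sample the fitted slope and intercept come out close to a and b rather than exactly equal to them.

```python
import math
import random


def fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares regression of y on x."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def simulate(a, b, n, count=5000, seed=1):
    """Build points y = a*x + n*(0.5 - random) + b and fit a line to them."""
    rng = random.Random(seed)
    # Assumption: the post does not say how x was chosen; uniform on [-10, 10] here.
    xs = [rng.uniform(-10.0, 10.0) for _ in range(count)]
    # rng.random() plays the role of Pascal's Random: a real number in [0, 1).
    ys = [a * x + n * (0.5 - rng.random()) + b for x in xs]
    return fit(xs, ys)


if __name__ == "__main__":
    for n in (0.0, 5.0, 20.0, 80.0):
        slope, intercept, r = simulate(a=2.0, b=3.0, n=n)
        print(f"n={n:5.1f}  slope={slope:.3f}  intercept={intercept:.3f}  r={r:.3f}")
```

Running it for increasing n should show the slope staying near a while r falls. Note that the spread of y also grows with n, so the ratio of the y and x standard deviations is not constant across these runs.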

Since I did not take a statistics course in kindergarten, or any time after that, I did not know that the correlation coefficient could be interpreted as the cosine of an angle in hyperspace (though of course I eventually learned that cosines range from -1 to 1, so I would have realized, had the occasion arisen, that any number in that range could be mistaken for the cosine or sine of an angle). Neither did I ever take a course in vector or matrix algebra, or any course which might have informed me about the possibility of interpreting any random collection of numbers as a vector in hyperspace, or the use of the dot-product between two such vectors to compute an angle in hyperspace between the vectors. I know such things now, from a distance, through reading and picking up what I could, but can’t claim any intuitive or deep familiarity with the subject.

I still don’t know the meaning of “angle” when the term is used to refer to the arccos of the dot product of two vectors divided by the product of their lengths. Obviously I can visualize this in two or three dimensions (where the angle is always in a plane), but going to higher dimensions doesn’t work for me – I find myself still visualizing three-dimensional relationships, probably incorrectly. More to the point, what does this hyperangle look like when projected into the space of x and y, in which the original data set exists? It’s that space that I’m interested in, because it’s in that space that the slope of the regression line exists.
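For concreteness, the arccos computation mentioned above can be written out as a small Python sketch. The function name `correlation_as_angle` is hypothetical. The two vectors are the deviations of x and of y from their means, with one component per data point, so N points give vectors in N dimensions. Any two such vectors still span an ordinary plane, and the angle is measured in that plane; it is not an angle in the x-y scatter plot.

```python
import math


def correlation_as_angle(xs, ys):
    """Return (r, angle in degrees) between the mean-centered x and y vectors."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # One vector component per data point: the deviation from the mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    dot = sum(p * q for p, q in zip(dx, dy))
    length_x = math.sqrt(sum(p * p for p in dx))
    length_y = math.sqrt(sum(q * q for q in dy))
    r = dot / (length_x * length_y)
    # Clamp before acos so rounding error cannot push r outside [-1, 1].
    angle = math.degrees(math.acos(max(-1.0, min(1.0, r))))
    return r, angle


if __name__ == "__main__":
    print(correlation_as_angle([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly correlated
    print(correlation_as_angle([1, 2, 3], [1, 2, 1]))        # uncorrelated
```

The first call gives r of 1 (up to rounding) and an angle near 0 degrees; the second gives r of 0 and an angle of 90 degrees.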

All of which still leaves me wondering how the slope of the regression line can be the correlation times the ratio of the standard deviations. I’ve tried to work it out by going to the underlying computations in terms of X, Y, XY, N and so on, but the algebra quickly gets out of hand and I don’t know the tricks for simplifying it. I imagine that the equations will eventually reduce to an extremely simple form, but I’m not the one to accomplish that. Not so far, anyway.
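One way the reduction can go, offered as a sketch using the standard least-squares formulas: collect the raw sums into three quantities first, and the rest is a single step. The symbols S_xx, S_yy, S_xy, s_x and s_y are notation introduced here, not taken from the textbook under discussion.

```latex
\begin{align*}
S_{xx} &= \sum X^2 - \frac{(\sum X)^2}{N}, \qquad
S_{yy}  = \sum Y^2 - \frac{(\sum Y)^2}{N}, \qquad
S_{xy}  = \sum XY - \frac{(\sum X)(\sum Y)}{N} \\[4pt]
\text{slope of } y \text{ on } x &= \frac{S_{xy}}{S_{xx}}, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad
s_x = \sqrt{\frac{S_{xx}}{N-1}}, \qquad
s_y = \sqrt{\frac{S_{yy}}{N-1}} \\[4pt]
\frac{S_{xy}}{S_{xx}}
  &= \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} \cdot \frac{\sqrt{S_{yy}}}{\sqrt{S_{xx}}}
   = r \, \frac{s_y}{s_x}
\end{align*}
```

The factors of N - 1 cancel in the ratio of standard deviations. Applied to y = a*x + n*(0.5 - random) + b, the noise term has variance n²/12, so s_y grows with n while r shrinks, and the product r * s_y / s_x stays near a.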

Martin Taylor is terminally peeved with us kindergartners, which shows mainly that he doesn’t know how to communicate with people who know less than he does. “Why can’t you people,” he sings to a tune from “My Fair Lady,” “Be more like ME?” That’s all right, of course, since he knows a lot and can probably come up with many useful ideas. And sometimes he does descend to lucidity, when he admits that the people to whom he’s teaching something actually need to have the details filled in and explained. He does that when telling newcomers about PCT. But I think he tires of teaching kindergartners and wishes for the company of those to whom he doesn’t have to explain everything, and gets annoyed because we are not that kind of people, and eventually is driven to blaming his students for his discomfort. Understandable, but not helpful from the students’ point of view. Not that he’s under any particular obligation to care about that, but neither are the students particularly obliged to defer to his wrath. If we just wait quietly for a while, he’ll probably get over it.

Best,

Bill P.