Correlations, spreadsheets

[From Bill Powers (2004.02.26.0233 MST)]

Rick and Bjorn:

A word of caution. If you get a correlation of 1.0, this means you are
correlating a variable with itself, or with another variable proportional
to it. Calculate the regression coefficient to see which is the case. In
any case, the meaning of the correlation is not what you think it is. Note
that if one variable is a linear function of another, with no random
component, the two variables will correlate perfectly. Correlation is not
the right tool for that situation. Also, if you're accidentally correlating
an observed time function against itself, the correlation will be perfect
at zero time-shift, but not generally at any other shift. Again,
correlations are not the right tool here if the correlation is perfect at
any time shift.
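
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test signal, the coefficients 3.0 and 2.0, and the helper names `pearson_r`, `regression_slope`, and `lagged_r` are all hypothetical and do not come from anyone's model in this thread. It shows that a variable correlates perfectly with itself and with any linear function of itself, that the regression slope tells the two cases apart, and that a signal correlated against itself is perfect only at zero shift.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def lagged_r(x, lag):
    """Correlation of x with a copy of itself shifted by `lag` samples."""
    if lag == 0:
        return pearson_r(x, x)
    return pearson_r(x[:-lag], x[lag:])


# Hypothetical test signal (an assumption, not data from the thread).
x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]
y_same = list(x)                       # the variable itself
y_linear = [3.0 * v + 2.0 for v in x]  # a linear function, no random component

r_same = pearson_r(x, y_same)
r_linear = pearson_r(x, y_linear)
slope_same = regression_slope(x, y_same)      # 1 for the variable itself
slope_linear = regression_slope(x, y_linear)  # equals the multiplier, 3 here
r_lag0 = lagged_r(x, 0)
r_lag5 = lagged_r(x, 5)

print(r_same, r_linear, slope_same, slope_linear, r_lag0, r_lag5)
```

Both correlations come out at 1.0 to within rounding, so the correlation alone cannot distinguish the two cases; the slope can. The lag-5 correlation falls well below 1.0 for this particular signal.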

I think it is almost impossible to check equations for consistency and
meaning in the spreadsheet format. You have to carry the meaning of each
cell designation in your head -- if you can avoid mistakes that way, you're
better men than I am. And even if you are able to carry off this feat, it
would be very helpful to those who can't do it if you would always present
the equations themselves, converting to cell designations only after you
have made sure of what each equation represents and explained it to us.
This permits others to check out the simulation in the language of their
choice.

Any system of equations, whether it means anything or not, will "run" when
you cast it in the form of a simulation and set it in motion. Before you
believe the results of any simulation, you should be able to explain
exactly what is going on with each variable and show that it makes sense.
Every variable, every coefficient, every function, must have an explicit
counterpart (observed or proposed) in the real system. If this is not true,
there is no way to understand the time-plots you get from running the
simulation. If you don't have some kind of independent confirmation that
the results are correct, such as tests for extreme values of independent
variables like zero and infinity or known sets of values at specific times,
it's damned hard to know whether you have a valid simulation or not.
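
As a rough sketch of what such independent confirmation might look like, here is a Python example using a deliberately simple stand-in system. Everything in it is an assumption made for illustration: the equation dx/dt = (u - x)/tau, the function name `simulate`, and the parameter values are hypothetical and are not any of the models discussed in this thread. The point is only the pattern: test a zero input, test a very long run, and test a known value at a specific time.

```python
import math


def simulate(u, tau, dt, steps, x0=0.0):
    """Euler integration of dx/dt = (u - x) / tau; returns every x value."""
    x = x0
    trace = [x]
    for _ in range(steps):
        x += dt * (u - x) / tau
        trace.append(x)
    return trace


tau, dt = 1.0, 0.001  # hypothetical time constant and step size

# Check 1: zero input from a zero state. The output must stay at zero.
zero_run = simulate(u=0.0, tau=tau, dt=dt, steps=5000)
max_abs_zero = max(abs(v) for v in zero_run)

# Check 2: a very long run (20 time constants). The output must settle
# at the constant input value.
long_run = simulate(u=2.0, tau=tau, dt=dt, steps=20000)
final_value = long_run[-1]

# Check 3: a known value at a specific time. The exact solution at
# t = tau is u * (1 - exp(-1)); sample 1000 corresponds to t = 1.0.
value_at_tau = long_run[1000]
expected_at_tau = 2.0 * (1.0 - math.exp(-1.0))

print(max_abs_zero, final_value, value_at_tau, expected_at_tau)
```

If any of these checks fails, the fault lies in the equations or the integration step, and no amount of staring at the time-plots will reveal which.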

Incidentally, for the time plots Bjorn is transmitting, all I see are blue
rectangles. I suspect that this has something to do with Apple computer
output being viewed on PCs. Of course I could have Eudora set up wrong. The
most reliable viewing for me is the JPG format. Am I the only one having
this problem?

Best,

Bill P.

[From Bjorn Simonsen (2002.02.26,15:25 EuST)]

From Bill Powers (2004.02.26.0233 MST)

> A word of caution. If you get a correlation of 1.0, this means you are
> correlating a variable with itself, or with another variable proportional
> to it. Calculate the regression coefficient to see which is the case. In
> any case, the meaning of the correlation is not what you think it is. Note
> that if one variable is a linear function of another, with no random
> component, the two variables will correlate perfectly. Correlation is not
> the right tool for that situation. Also, if you’re accidentally correlating
> an observed time function against itself, the correlation will be perfect
> at zero time-shift, but not generally at any other shift. Again,
> correlations are not the right tool here if the correlation is perfect at
> any time shift.

Thank you. This is embarrassing. I compared the matrices and they were
dependent: not Rick’s, but mine.

> Incidentally, for the time plots Bjorn is transmitting, all I see are blue
> rectangles. I suspect that this has something to do with Apple computer
> output being viewed on PCs. Of course I could have Eudora set up wrong. The
> most reliable viewing for me is the JPG format. Am I the only one having
> this problem?

I will use JPG format next time. I now see that the graphs I sent are
horrible, so I won’t resend them in JPG.

bjorn

[From Bill Williams (26 February 2004 8:50 PM CST)]

I would disagree with Bill Powers when he says that a correlation of 1.0
necessarily means that you are "correlating a variable with itself."
Consider the case of, say, an airline. In a year's time it is possible that
every flight ended up where the schedule said it was going to go. So you
could have, for that year or even several years, a correlation of precisely
1.0. Of course, considerable efforts are being made to see that this
happens.

With this one exception, however, the rest of what is recommended seems like
excellent advice. In the program here at UMKC, what Bill is recommending, and
far more stringent measures, are mandatory. Random probing of the data is
not permitted. In Kenneth Arrow's phrase, "Data mining is a sin."

Bill Williams

[From Rick Marken (2004.02.26.1115)]

Bill Powers (2004.02.26.0233 MST)

> A word of caution. If you get a correlation of 1.0, this means you are
> correlating a variable with itself, or with another variable proportional
> to it.

Right. As I said in an earlier post, the correlation between x and any
linear transformation of x (ax+b) will be 1.0.

> I think it is almost impossible to check equations for consistency and
> meaning in the spreadsheet format.

My model equations are in Visual Basic, not in spreadsheet cells.

> Incidentally, for the time plots Bjorn is transmitting, all I see are blue
> rectangles. I suspect that this has something to do with Apple computer
> output being viewed on PCs.

I think it has to do with the mail program you use (or the settings
thereof). I see Bjorn's plots just fine here at work (using Entourage) but I
get the blue rectangles at home (using Apple Mail or whatever the heck I'm
using).

> The most reliable viewing for me is the JPG format.

I agree. That's what I tend to use. I have to make sure that the files
aren't too big, however. Different jpeg formats give different size files.

Best

Rick

--
Richard S. Marken
MindReadings.com
Home: 310 474 0313
Cell: 310 729 1400

[From Bill Powers (2004.02.26.1238 MST)]

Rick Marken (2004.02.26.1115)--

> Right. As I said in an earlier post, the correlation between x and any
> linear transformation of x (ax+b) will be 1.0.

When you run models, they generally don't contain noise generators, so the
relationships you observe have no random component. It's only in comparing
model performance with observational data that correlations will be less
than perfect. I see Bjorn has found his equations to be dependent, so the
model wasn't valid. But it's nice to know that Bjorn knows dependent
matrices when he sees them -- that moves him into a higher bracket among
modelers.
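
A small Python sketch may make the contrast concrete. It rests on assumptions that are not from the thread: a hypothetical noise-free model whose output is a linear function of its input, and made-up "observations" produced by adding Gaussian noise to that output. The names `pearson_r`, `model_output`, and `observed`, and all numeric values, are illustrative only.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model input and a noise-free linear model output.
x = [math.sin(0.2 * t) for t in range(300)]
model_output = [2.0 * v + 1.0 for v in x]

# Stand-in for observational data: the model output plus random noise.
observed = [m + rng.gauss(0.0, 0.5) for m in model_output]

r_model = pearson_r(x, model_output)            # no random component
r_observed = pearson_r(model_output, observed)  # noise lowers the correlation

print(r_model, r_observed)
```

Within the model the correlation is 1.0 to within rounding, which says nothing about how good the model is. Only the second number, model against noisy data, carries information about fit.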

Best,

Bill P.
