[From Richard Kennaway (970421.1605 BST)]

Bruce Abbott (970419.1740 EST):

> Yes, and despite their low values, such correlations may provide useful
> information concerning the linkages among variables. Richard, have you ever
> heard of factor analysis? Path analysis?

Factor analysis yes, path analysis no.
Useful for what? Real examples would be illuminating.

> Richard Kennaway, 970419.1730 BST --
>
>> But they do not tell me
>> that low correlation implies the presence of anything but itself.
>
> I'm not sure what you mean to convey here; could you be more explicit?

What I mean is that the presence of a low non-zero correlation of P with R
does not imply that P is likely to constitute part of a good predictor or
explanation of R. It just tells you that there's a low, non-zero
correlation of P with R. Maybe you decide to see what might be combined
with P to better predict or explain R. Maybe you decide P isn't useful.
The low correlation is an input to your decision as an experimenter about
where to look next to explain R, but it cannot on its own be regarded as
a partial explanation of R.

You gave a fictitious example where four poor correlates of the variable to
be predicted did together give a good -- perfect -- prediction. Does this
ever happen with real data? Can you cite any experiments in which some
number of factors, each poorly correlating (c < 0.6, say) with the variable
to be predicted, together gave an excellent prediction (c > 0.98) on new
data? If I got a result like that in my research, my reaction wouldn't be
"oh yeah, four variables each correlating 0.5, not surprising that together
they account for everything", it would be "Wow! Eureka! Four pieces of
junk produce this pearl of knowledge! At last, a solid prediction that I
can base further research on!"
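The fictitious example is not quoted in this post, so the Python sketch below uses an assumed construction with the properties described: four mutually independent standard-normal predictors P1...P4 and R = (P1 + P2 + P3 + P4)/2. Each Pi then correlates 0.5 with R. Because the predictors are uncorrelated with each other, the shares of the variance of R that they account for add up: 4 * 0.5^2 = 1. The names `pearson` and `simulate` and the parameter values are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_obs=50_000, seed=1):
    """Hypothetical construction: four weak correlates that jointly determine R."""
    rng = random.Random(seed)
    # Four mutually independent standard-normal predictors P1..P4.
    ps = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(4)]
    # R = (P1 + P2 + P3 + P4) / 2, so Var(R) = 4 * 0.25 = 1 and Cov(Pi, R) = 0.5.
    r = [sum(p[i] for p in ps) / 2 for i in range(n_obs)]
    # Each predictor on its own correlates only about 0.5 with R.
    single = [pearson(p, r) for p in ps]
    # Using all four together, with equal weights of 1/2, reproduces R exactly.
    r_hat = [0.5 * (ps[0][i] + ps[1][i] + ps[2][i] + ps[3][i]) for i in range(n_obs)]
    joint = pearson(r_hat, r)
    return single, joint


if __name__ == "__main__":
    single, joint = simulate()
    print("individual correlations:", [round(c, 3) for c in single])
    print("joint correlation:", round(joint, 6))
```

The sketch only shows that the arithmetic permits such a result. Whether real predictors ever combine this cleanly is the empirical question being asked.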

I stipulate that they must predict well on new data, as it's very easy to
fit curves as accurate as you like to any given data. The accuracy of that
fit is no guarantee that it will fit data that were not used to construct
it.
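The following Python sketch illustrates the point about fitting, using hypothetical data generated as R = P plus Gaussian noise. The polynomial that passes through all 12 fitting points has zero error on those points, but its error on freshly drawn points is far from zero. Lagrange interpolation is used here only because it makes an exact fit easy to write with the standard library. The names `lagrange_eval`, `mse` and `demo` and the parameter values are illustrative assumptions.

```python
import random


def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(len(xs)-1) polynomial through (xs, ys) at x."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = yj
        for m, xm in enumerate(xs):
            if m != j:
                term *= (x - xm) / (xj - xm)
        total += term
    return total


def mse(pairs, xs_fit, ys_fit):
    """Mean squared error of the interpolating polynomial on (x, y) pairs."""
    return sum((y - lagrange_eval(xs_fit, ys_fit, x)) ** 2 for x, y in pairs) / len(pairs)


def demo(n_fit=12, n_new=200, noise=0.5, seed=0):
    rng = random.Random(seed)

    def sample(x):
        # Hypothetical "true" relationship: R = P plus Gaussian noise.
        return x + rng.gauss(0.0, noise)

    # Fitting data: equally spaced P values in [0, 1].
    xs_fit = [i / (n_fit - 1) for i in range(n_fit)]
    ys_fit = [sample(x) for x in xs_fit]
    # In-sample error: the polynomial passes through every fitting point.
    fit_err = mse(list(zip(xs_fit, ys_fit)), xs_fit, ys_fit)
    # Out-of-sample error: new observations never used to build the curve.
    new_pairs = [(x, sample(x)) for x in (rng.random() for _ in range(n_new))]
    new_err = mse(new_pairs, xs_fit, ys_fit)
    return fit_err, new_err


if __name__ == "__main__":
    fit_err, new_err = demo()
    print(f"error on fitting data: {fit_err:.3g}")
    print(f"error on new data:     {new_err:.3g}")
```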

>> Back of an envelope calculation tells me that for a multivariate normal
>> distribution of variables P1...Pn and R, such that the bivariate
>> correlations of each of P1...Pn with each other are zero and of each with
>> R is 0.5, then for P1...Pn to jointly "explain" 90% of the variance of R
>> requires n to be at least 2 log 0.1/log 0.75 (0.1 is 1 - 0.9, and 0.75 is
>> 1 - 0.5^2). This is a tad over 16, and it doesn't guarantee that when
>> those 16 observations are made, that 90% of the variance will be explained,
>> only that it cannot happen with fewer. If there is any correlation among
>> P1...Pn, n might be higher or lower.
>
> This is a bit confusing in that you have used n to stand both for the number
> of independent predictors and the number of observations. For a
> multivariate analysis, you must have at least as many observations as
> predictors, and in practice you would want far more.

n is only the number of predictors. I didn't mention the number of
observations (call it N), but I assumed it to be large enough to neglect
the inaccuracy in the measures of correlations -- far more than n, as you
say.

This topic reminds me of a method of mathematical proof called "career
induction". You want to prove a theorem, say, that all Grossmayer syzygies
are noetherian (don't worry, the words are just flimflam). You prove a
special case -- all Grossmayer syzygies of degree at most 1 are noetherian.
Then you prove another case: all semi-flat Grossmayer syzygies of degree 2
are noetherian. Then another, and another...slicing off ever thinner
pieces of salami from the goal. If each piece is thick enough to make a
publication, you can base a career on it, hence the name. As a
mathematician, when I notice I'm engaging in career induction, it's a
wake-up call to me that I'm on the wrong track.

My gut feeling (which I admit is uncontaminated by any experience of
experimental research) is that "career factor analysis" -- accumulating
more and more partial correlates of the thing to be predicted --
constitutes a similar sort of aimless groping in the dark. I am open to
hearing of examples where it panned out.

    __
    \/__ Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
      \/ School of Information Systems, Univ. of East Anglia, Norwich, U.K.