[From Bruce Abbott (2009.06.24.0835 EST)]
Richard Kennaway (2009.06.23.1507 BST)
Bruce Abbott (2009.06.23.0955 EST)
BA: I may be sticking my neck out here as I'm not sure I've completely
understood Kennaway's paper, but I believe that the prediction of sign
was based on distributions of normalized variables; their mean values
would be zero. In the limiting case of zero correlation, there would be
a 50% probability that the prediction of sign would be correct.
This would not be the case for a variable whose mean was non-zero.
RK: For the case of non-zero means, the table in my paper applies to
predicting whether Y is above or below its mean from whether X is above or
below its mean.
O.K.
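The zero-correlation limiting case can be made concrete. For jointly
normal X and Y with correlation c, the probability that Y falls on the
same side of its mean as X does is 1/2 + arcsin(c)/pi (a standard
bivariate-normal result; the sketch below is a minimal illustration,
not taken from Kennaway's table):

```python
import math

def sign_agreement_prob(c):
    """P(Y lies on the same side of its mean as X), for jointly
    normal X and Y with correlation c (standard orthant result)."""
    return 0.5 + math.asin(c) / math.pi

print(sign_agreement_prob(0.0))  # 0.5: a coin flip at zero correlation
print(sign_agreement_prob(0.5))  # about 0.67
print(sign_agreement_prob(1.0))  # 1.0: sign predicted perfectly
```

Note how slowly this climbs: even c = 0.5 gets the sign right only
about two times in three.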
Imagine a scatter plot (Y as a function of X) relating human height
versus weight. Both variables are always positive, and they tend to be
strongly positively correlated. At each height along the X-axis, there
would be a corresponding distribution of weights along the Y-axis. The
regression line fitted to these data would give the predicted value of
Y at a given X value. For each observed value of Y, we could compute
the difference between that observed value of weight and the predicted
value. This difference is called a residual. Residuals can be positive,
negative, or zero depending on whether the observed value is above,
below, or on the predicted value, respectively. The residuals tend to
get smaller as the correlation between X and Y increases, reaching zero
when the correlation is plus or minus 1.0.
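The residual computation described above can be sketched in a few lines
of plain Python (the height/weight pairs here are made up purely for
illustration):

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical height (inches) / weight (pounds) pairs.
heights = [60, 63, 66, 69, 72, 75]
weights = [115, 130, 148, 155, 175, 192]

slope, intercept = fit_line(heights, weights)
residuals = [y - (intercept + slope * x)
             for x, y in zip(heights, weights)]
# Positive residuals lie above the line, negative below;
# by construction they sum to (essentially) zero.
```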
The task here is to "correctly" predict the value of Y for a given
individual, based on that individual's height. How successful you are
depends on how you define a correct prediction. For a theoretical
distribution containing an infinite number of possible Y-values, the
chances of making a perfect prediction to an infinite number of decimal
places are essentially zero. To make the task practical, you have to
specify the range of values around the predicted value that would be
considered close enough to qualify as correct.
As the correlation between X and Y increases, the width of the
distribution of Y-values at a given value of X becomes smaller.
Consequently, any given prediction is more likely to fall within some
acceptable margin of error.
I mention all this because I'm worried that Kennaway's
prediction-of-sign result might be misinterpreted to mean that one
almost cannot predict the sign of Y from X, let alone get close to the
actual value of Y, unless the correlation is extremely high.
RK: No worries -- that is the correct interpretation, understanding sign as
the sign of the difference between Y and its mean. And you really cannot
get close to the actual value of Y, unless the correlation is extremely
high.
Yes, of course: The linear transformation doesn't change things. It might be
worth emphasizing here that we are discussing predicting an individual value
of Y, one of many values of Y that may be paired with a given value of X.
For example, if we plotted the heights and weights of many individuals on a
scatter plot, at a given height we may find many individuals of various
weights. Based on the regression line relating weight to height, we can
predict the weight of a person given his or her height, but the predicted
weight is an estimate of the mean weight of the population of individuals of
that height. If other factors besides height influence weight, then the
individual's actual weight will differ from this predicted value, more or
less, depending on the effect of those other factors. Those other, unknown
or unmeasured factors reduce the correlation between height and weight.
As you have shown so clearly, predicting an individual value is a rather
dicey affair unless the correlation between X and Y is very high. To improve
that prediction, one can seek to identify and measure those other factors
and include them in a multiple regression equation. This will raise the
correlation (in this case, the multiple R), and R-sq, the percentage of
variance in a response variable that is accounted for by variance in the
predictor variables.
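For the two-predictor case, the gain in R-sq can be computed directly
from the pairwise correlations using the standard two-predictor
formula (the correlations below are illustrative numbers, not real
height/weight data):

```python
def r_squared_two_predictors(ry1, ry2, r12):
    """Squared multiple correlation of Y on X1 and X2, from the
    pairwise correlations (standard two-predictor formula)."""
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)

# Suppose X1 correlates 0.6 with Y, X2 correlates 0.5 with Y,
# and the two predictors correlate 0.3 with each other.
r2_one = 0.6 ** 2                                 # 0.36 from X1 alone
r2_two = r_squared_two_predictors(0.6, 0.5, 0.3)  # about 0.47
```

Adding the second predictor raises the explained variance from 36% to
about 47%; the gain shrinks as the two predictors become more
correlated with each other.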
Most of the work involving correlation is aimed at identifying factors that
can account for the variance in some response variable. For example, one can
seek to identify factors that contribute to cardio-vascular disease. Diet,
exercise, level of stored body fat, distribution of body fat, and levels of
cholesterol have been investigated and found to have some predictive value.
However, without a detailed model of how these and other factors have their
influences (directly or indirectly) within the body, one cannot say whether,
for a given individual, reducing cholesterol (or changing the HDL/LDL ratio)
will help or hurt. The regression equations only tell us that making such
changes in the whole population will improve the cardio-vascular health of
the population on average. Still, such research may help to identify factors
that need to be included in a detailed physiological model, as Phil Runkel
noted in "Casting Nets."
It would probably be clearer to speak instead of the confidence
interval (e.g., the 95% CI) around the predicted value (strictly, a
prediction interval, since the target is an individual Y). This is the
interval within which the true value of Y is expected to lie on 95% of
occasions.
RK: This is also covered in the paper. The best estimate of the value of Y
(rather than of its rank in its own distribution) is cX. To determine the
spread of errors, one must look at the standard deviation of Y, conditional
on knowing the value of X. If the unconditional standard deviation of Y is
s, the conditional s.d. is s*sqrt(1-c*c). The ratio of the former to the
latter is called the "improvement ratio", 1/sqrt(1-c*c). This is tabulated
in Table 1, and may be dispiriting reading for someone hoping to predict Y
from X from a scatterplot. For example, a correlation of 0.866 reduces the
standard deviation by a factor of only 2.
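The arithmetic behind that last figure, as a quick sketch:

```python
import math

def improvement_ratio(c):
    """Unconditional s.d. of Y divided by its s.d. conditional on
    knowing X: 1/sqrt(1 - c*c)."""
    return 1.0 / math.sqrt(1.0 - c * c)

for c in (0.5, 0.707, 0.866, 0.95, 0.99):
    print(f"c = {c}: s.d. shrinks by a factor of {improvement_ratio(c):.2f}")
```

Even a correlation of 0.95 narrows the spread by only a factor of
about 3.2.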
I'm picturing a scatter plot relating weight to height, with the regression
line fitted to the data. With a large enough data set, the points would tend
to form an oval cloud around the line. The larger the correlation, the more
the points will tend to stay close to the line, reducing the minor axis of
the oval relative to the major axis. Thus, at any given height, the
distribution of body weights will be narrower (lower standard
deviation). An average of these conditional distributions is provided
by the standard deviation of the residuals (deviations of the points
from the regression line, along the Y-axis). This standard deviation
can be compared to the
standard deviation of body weight overall, ignoring height. With a
correlation of 0.866, the standard deviation of the residuals is half
that of the body weights overall, so the weight predicted from height
is, on average, closer to the actual weight than simply predicting that
each individual has the average weight of the population as a whole.
However, unless the correlation is high, the ability to predict an
individual Y from X will remain poor.
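That factor-of-two claim is easy to check by simulation (pure Python;
the variables below are constructed to be jointly normal with
correlation 0.866, so the numbers are synthetic):

```python
import math
import random
import statistics

random.seed(1)
c = 0.866                       # target correlation
n = 20000
xs = [random.gauss(0, 1) for _ in range(n)]
# Construct Y with correlation c to X and unit variance overall.
ys = [c * x + math.sqrt(1 - c * c) * random.gauss(0, 1) for x in xs]

# Regression through the data (means are ~0, so slope alone suffices).
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
resid_sd = statistics.pstdev([y - slope * x for x, y in zip(xs, ys)])
total_sd = statistics.pstdev(ys)
print(total_sd / resid_sd)      # close to 2, matching 1/sqrt(1 - c*c)
```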
RK: One might compare this to the effect of genuine measurements. My weight
varies over a range of about 5 pounds -- suppose a standard deviation of 2
pounds. When I weigh myself, my scales have a resolution of 0.2 pounds. If
I assume they're accurate, then that's an improvement ratio of roughly 2/0.2
= 10. Equivalent correlation: way off the end of Table 1.
Genuine measurements? I don't understand the distinction you are making
between "genuine measurements" (weighing yourself) and predictions based on
regression. The measurements entered into regression analysis can be just as
"genuine" as any.
RK: Should I dust this paper off and send it to a journal? Which one?
Because the focus of the paper is on making predictions of individual values
and not on using correlation to identify predictive variables, I would
suggest that a journal on tests and measures (psychometrics) would be
appropriate.
Bruce A.