The question occured to me: if A and B have a correlation of rAB, and B and C have a correlation of rBC, what bounds can be placed on the correlation between A and C?
If rAB and rBC are both 1, then obviously rAC must also be 1. If rAB = rBC = sqrt(1/2), it turns out that rAC can be anywhere in the range 0..1, but cannot be negative.
I worked out a general formula: rAC must be in the range
rAB rBC +/- sqrt( (1 - rAB^2) (1 - rBC^2) )
When rAB = rBC = r this simplifies to 2r^2 - 1 <= rAC <= 1.
Then I thought: suppose you have a chain of variables A, B, C, D... such that the correlation between every consecutive pair is r. How long does the chain have to be before the correlation between the first variable and the last is not guaranteed to be positive?
If r=1 then you can make an arbitrarily long chain and the correlation from beginning to end will always be 1. This corresponds to the notion of logical inference, where everything is simply true or false with certainty. r < 1 is a sort of fuzzy inference, and the more steps you take, the fuzzier things get, until all connection is lost with the starting point.
Suppose you want to be able to make n steps in such a chain of fuzzy inference. By n=1 I mean seeing positive correlations between A and B, and between B and C, and concluding a positive correlation between A and C. n=4 would mean getting from A to F.
Here's a table of the minimum correlation you need to perform a chain of length n.
n r
1 0.707104
2 0.866024
3 0.923878
4 0.951054
5 0.965923
10 0.989819
15 0.995182
20 0.997204
50 0.999523
An approximate formula for n >= 4 is r = 1 - 1.22/n(n+1).
So, unless r is at least 0.707, you can't string even two correlations together.
I can't resist a somewhat frivolous comparison with computers. A typical modern desktop or laptop has around a billion transistors being clocked a billion times a second. It can easily run continuously for a million seconds (about 12 days) without going wrong. That amounts to 10^24 individual transistor operations, every one of which works correctly. (You might not think it when you're dealing with the software, but the basic semiconductor transistor is the most reliable device ever invented. It has to be, or computers would be impossible.) Plugging n=10^24 into the above, r = 1 to 48 decimal places.
···
--
Richard Kennaway, jrk@cmp.uea.ac.uk, http://www.cmp.uea.ac.uk/~jrk/
School of Computing Sciences,
University of East Anglia, Norwich NR4 7TJ, U.K.