[From Bill Powers (2009.06.25.0813 MDT)]
Martin Taylor 2009.06.24.23.57 –
MT: I posted a comment 24 hours
ago almost exactly [Martin Taylor 2009.06.23.23.03] about the
impossibility of getting high correlations between x and z or between y
and z when x and y are independent and have a similar degree of influence
on z. In that 24 hours, we have had substantive comments from Rick, Bill,
and Richard, none of whom addressed the point I made.
BP: I don’t have the same level of competence concerning statistics that
others in this group have. However, for me the point has never been
whether correlations are high or low, except as a vehicle for
communicating with people who use correlations all the time. I don’t draw
any conclusions from the correlations I calculate – I use control theory
instead.
MT: It’s interesting that none
of the comments addressed my main point, which I can restate by quoting
my original message: "Bottom line: When more than one independent
variable can substantially influence a dependent variable, high
correlations are impossible in principle. Nevertheless, the existence of
even a low correlation between x and y shows either that x influences y,
y influences x, or something else influences them both."
BP: As I say, I’m not interested in correlations per se, so if I see
a low correlation, that will not lead me to investigate further because
there are so many more reliable relationships to investigate. See my
“Essay on the Obvious.” I assume that if the correlation is low and
I still don’t understand the system, it’s either because I have the wrong
idea of what is going on, or because a low correlation is all that
situation can yield. When I look at
experimental results, all I care about is how well the model fits them,
and if it doesn’t fit very well (one possible low correlation situation),
how the model (or the theory) can be modified until it fits better.
Finding a correlation doesn’t show I’m on the right track – in fact,
that seems to be what you’re saying, too: a low correlation can be all
you’re ever going to get because a higher one is impossible. So that
makes the following seem contradictory to what you said before:
MT: "Do not discard evidence of mutual influence just because the
proportion of variance accounted for is less than 0.5 (correlation
0.707). Consider instead whether the influence indicated by the
“low” correlation is one that might be interesting to investigate
further, using different methods."
By “different methods”, I hope you understood “within a
PCT framework”.
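The 0.707 ceiling Martin cites can be seen in a quick sketch (assuming two equal-variance, independent influences that combine additively; the setup is illustrative, not taken from the exchange):

```python
# Sketch: when z = x + y with x and y independent and of equal variance,
# corr(x, z) is capped at 1/sqrt(2) ~ 0.707, so either predictor alone
# accounts for at most half the variance of z.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)   # independent influence 1
y = rng.standard_normal(n)   # independent influence 2
z = x + y                    # dependent variable

r_xz = np.corrcoef(x, z)[0, 1]
print("corr(x, z) =", round(r_xz, 3))          # close to 0.707
print("variance explained =", round(r_xz**2, 3))  # close to 0.5
```

With more than two independent influences of similar strength, the ceiling on each individual correlation drops further still.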
BP: If a low correlation is all that is possible to obtain, what is there
to “investigate further”? Why not just skip the correlation
part and go on investigating? I didn’t use statistics to arrive at the
PCT model: I used control theory and experience with designing and
building control systems.
The methods that depend on statistics assume that relationships among
variables are hard to find and that the best we can usually hope for is
just a hint, extracted with much labor, from data that consist
mostly of noise. With no other choice than doing the labor, a scientist
will be grateful to fate for granting him a little hint, and will work as
hard as necessary to pursue it. On the other hand, if you’re constructing
models, any failure to predict properly is taken as an error in the
model. You don’t just keep turning the crank and hoping to get steak
instead of hot dogs out of the machine. You fix the machine.
You say, “When more than one independent variable can substantially
influence a dependent variable, high correlations are impossible in
principle.” The control system equations are a good example of a case
like this: the disturbance and the reference signal, both independent
variables relative to the control system, substantially influence all the
variables in the control loop, which are all dependent variables (you can
solve for any one of them). What this means is that if you know both the
disturbance and the reference condition, you can calculate the exact
values of all the other system variables from the model. If there’s an
integrating output function you would calculate low correlations between
input and output variables, but since they are all functions of the two
independent variables, that’s irrelevant. You can already calculate the
system variables exactly, without any uncertainty, so the correlations
are of no interest. Neither does a low correlation imply that you can
improve your knowledge by looking for a higher correlation.
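The control-loop case can be sketched with a minimal simulation of an integrating control system; the loop gain, reference, and disturbance below are illustrative assumptions, not values from the exchange:

```python
# Sketch of a one-level integrating control loop: the two independent
# variables are the disturbance d and the reference r; every loop
# variable is computed exactly from them, yet the correlation between
# the controlled input and the output is low.
import numpy as np

rng = np.random.default_rng(1)
dt, k, r = 0.01, 50.0, 0.0       # step size, integration gain, reference
steps = 20_000

# slowly varying disturbance (smoothed noise)
d = np.convolve(rng.standard_normal(steps), np.ones(500) / 500, mode="same")

qi = np.zeros(steps)             # controlled input quantity
outs = np.zeros(steps)           # system output over time
o = 0.0                          # integrator state
for t in range(steps):
    qi[t] = d[t] + o             # input = disturbance + output
    e = r - qi[t]                # error relative to reference
    o += k * e * dt              # integrating output function
    outs[t] = o

c_io = np.corrcoef(qi, outs)[0, 1]
c_do = np.corrcoef(d, outs)[0, 1]
print("corr(input, output) =", round(c_io, 3))        # low in magnitude
print("corr(disturbance, output) =", round(c_do, 3))  # strongly negative
```

The output mirrors the disturbance almost perfectly while the input-output correlation stays small, even though every variable in the loop is determined exactly by d and r, which is the point: the low correlation carries no information the model doesn't already give exactly.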
Calculating the exact values of system variables in a model does not, of
course, guarantee that they will match measurements of the real variables
exactly. That’s a different matter. When there are errors in the fit, the
first thing we have to do is look for systematic, not random,
errors. It’s not even a statistical problem. It’s a matter of examining
the system more closely to get a better idea of the actual relationships
among the variables. Only when you find that all the residuals are random
do you give up and attribute the errors to unpredictable system noise.
And then, of course, since the errors are truly random, you’ve done all
you can.
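One way to carry out the check described here, systematic versus random residuals, is to test the residuals for serial structure. The lag-1 autocorrelation used below is one common choice, not a method specified in the exchange:

```python
# Sketch: random residuals have near-zero lag-1 autocorrelation; a
# systematic (structured) error pattern has autocorrelation near 1.
import numpy as np

def lag1_autocorr(residuals):
    """Lag-1 autocorrelation of a residual series."""
    res = np.asarray(residuals, dtype=float)
    res = res - res.mean()
    return np.dot(res[:-1], res[1:]) / np.dot(res, res)

rng = np.random.default_rng(2)
random_resid = rng.standard_normal(5000)                    # pure noise
systematic_resid = np.sin(np.linspace(0, 20 * np.pi, 5000)) # structured error

a_rand = lag1_autocorr(random_resid)
a_sys = lag1_autocorr(systematic_resid)
print(round(a_rand, 3))  # near 0: nothing left to model
print(round(a_sys, 3))   # near 1: systematic pattern remains
```

A residual series that still shows structure points to a wrong or incomplete model; only when such tests come up empty is the remaining error plausibly system noise.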
I think the point Richard K. may be leading up to, or about to discover,
is that correlations are what they are and do not increase our
understanding at all. They provide an illusion of understanding. The
whole point of statistics is that it purportedly can reveal the presence
of a relationship without giving any information about what it really is.
As far as model-building is concerned, “degree of relatedness”
is an empty phrase: once you see that there is some degree of
relatedness, you still have to do exactly the same things you have to do
in modeling to find out what the relationship is. I don’t think that
modeling depends very often on tracking down relationships through
statistical methods. Mostly you just examine the real system and try to
represent its components as a system of equations which you can then
solve analytically or by simulation. And that usually works, in my
experience.
MT: There’s another underlying
point, which is that if there is a consistent correlation between two
variables of interest, it really doesn’t matter whether the study was
conducted in a way we might consider proper. Either one variable affects
the other or some unspecified other variable affects them both. That is a
fact. It is my considered belief that properly understood, PCT will
explain that fact one day if at least one of the variables is a property
of a living organism. The fact itself should never be discarded, though
any proposed explanation of the fact may be, and probably will be, some
day.
BP: My view is that models will entirely supplant the statistical
approach, because models propose systematic relationships, not simple
straight-line approximations that only scratch the surface of a
phenomenon. I think we can freely discard low correlations because if
there is really any relationship there (there is probably not), we will
discover it again through modeling and get a far better understanding of
it.
MT: On the whole, I agree with
most of the substance of the comments made by Bill and Richard. I suppose
I must, since, where they are at all relevant to what I said, their gist
more or less expands on what I said. There are quite a few details with
which I might take issue, but I don’t think those details matter to my
point. They might matter in a different discussion, however, and it’s
possible I might bring them up some day in some other
context.
BP: It’s always frustrating to think that you have made a new point only
to have it be ignored. I often have that feeling, since I’ve been making
the same points I’m making now for fifty-some years. On the other hand, you
can be encouraged by seeing others making the same points, which can
indicate that your ideas have independent support, or perhaps that they
have made some inroads even when others don’t remember that they learned
them from you. In the end, what matters is that new ideas get preserved
and spread while the rest sink back into the primordial ooze where they
came from. We are all better off for that.
Best,
Bill P.