Correlation considered as a stage magician

[From Richard Kennaway (2009.09.10.0837 BST)]

[Martin Taylor 2009.09.09.13.02]
The take-away message is that a reliable correlation -- even a low correlation -- always indicates the existence of a causal relation, though that relation may not be directly between the correlated variables. A low correlation could be due to measurement instability, or to the influence of variables neither considered nor measured, but its existence is a demonstration that there is a causal linkage either between X and Y or between them and another variable (in one direction or the other).

Sure. Where there is a correlation, *something* must be going on *somewhere*. The problem is that that something could be almost anything, and the actual causal relations may be far removed from where the correlation was observed.

I find the example of the simple proportional control system quite striking. With the reference fixed, the perception, disturbance, and output show correlations exactly where there is no direct causal connection, and absence of or weak correlation exactly where there is a strong causal connection. The correlations you observe tell you that something is happening somewhere, but the specific links they suggest are exactly the wrong ones.

I would almost suggest a rule of thumb: Ordinary correlations (that is, those smaller than about 0.8) tell you that the mechanism accounting for them is somewhere else. The mechanism, when you find it, may be as exactly reproducible in its operation as the mechanisms of physics, however mushy the original correlations were. The mushiness is not telling you about the strength of the connections, but about how far you are from discovering them. As long as your data make it worth measuring correlations, you aren't there yet.

XKCD had this to say: "Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'." ( http://xkcd.com/552 and mouseover the cartoon.) But the correlation is a stage magician, misdirecting the audience away from the real action.


--
Richard Kennaway, jrk@cmp.uea.ac.uk
School of Computing Sciences,
University of East Anglia, Norwich NR4 7TJ, U.K.

[Martin Taylor 2009.09.10.07.45]

Well, at least we are converging, if not agreeing!

[From Richard Kennaway (2009.09.10.0837 BST)]

[Martin Taylor 2009.09.09.13.02]
The take-away message is that a reliable correlation -- even a low correlation -- always indicates the existence of a causal relation, though that relation may not be directly between the correlated variables. A low correlation could be due to measurement instability, or to the influence of variables neither considered nor measured, but its existence is a demonstration that there is a causal linkage either between X and Y or between them and another variable (in one direction or the other).

Sure. Where there is a correlation, *something* must be going on *somewhere*. The problem is that that something could be almost anything, and the actual causal relations may be far removed from where the correlation was observed.

True. But we do know that in the network of causal linkages, X and Y either are one ancestral to the other, or both have a common ancestor (in the absence of selective sampling, which can happen inadvertently).
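The selective-sampling caveat is easy to demonstrate numerically. Here is a minimal sketch (the distributions and the clamping band are invented for illustration): X and Y are generated with no causal connection whatever, yet keeping only the samples in which X + Y is nearly clamped to a constant induces a strong negative correlation between the retained values.

```python
import random

random.seed(0)

def corr(xs, ys):
    # Pearson correlation of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

n = 100000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]  # no causal link to xs at all
print(corr(xs, ys))  # near 0, as it should be

# Selective sampling: keep only the cases where Z = X + Y happens to sit
# in a narrow band, as if Z were clamped downstream.
clamped = [(x, y) for x, y in zip(xs, ys) if abs(x + y) < 0.1]
cx, cy = zip(*clamped)
print(corr(cx, cy))  # strongly negative, with no causal link anywhere
```

Remove the clamping condition and the induced correlation disappears.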

I find the example of the simple proportional control system quite striking. With the reference fixed, the perception, disturbance, and output show correlations exactly where there is no direct causal connection, and absence of or weak correlation exactly where there is a strong causal connection. The correlations you observe tell you that something is happening somewhere, but the specific links they suggest are exactly the wrong ones.

I, too, find it striking, but for the opposite reason. The reason I agreed back in June with your example of downstream clamping (X + Y = Z, Z held constant) was that I considered Z to be a controlled variable rather than a conditional sampling selector. However, in this case, X could be the disturbance and Y the output. There is a direct causal link between disturbance and output, though not the reverse, so it is actually a case of X ancestral to Y. When the reference value is held constant, the disturbance is the only external causal influence on the output (and to forestall the obvious riposte, I am not forgetting that the output in a loop causally influences itself). The specific link suggested by the correlation is correct.

The control system example is interesting in the inverse sense, because (as we both have independently pointed out more than once over the years), if the output function is a pure integrator, the output signal is uncorrelated with the error signal, so we have causality without correlation across that direct causal link.
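That integrator point, and Richard's observation about correlations appearing across the indirect links, can both be checked in a few lines of simulation. This is only a sketch (the gain, step size, and disturbance are arbitrary choices, not anything from the thread): with the reference fixed and a pure-integrator output function, the disturbance and the output, which have no direct connection, correlate almost perfectly, while the error and the output, which are directly causally linked, are nearly uncorrelated.

```python
import random

random.seed(1)

def corr(xs, ys):
    # Pearson correlation of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

ref = 0.0        # fixed reference
k = 0.2          # integration rate of the output function
o = d = 0.0
ds, es, os_ = [], [], []
for _ in range(50000):
    d += random.gauss(0.0, 0.05)   # slowly wandering disturbance
    p = o + d                      # perception = output + disturbance
    e = ref - p                    # error
    o += k * e                     # pure-integrator output function
    ds.append(d); es.append(e); os_.append(o)

print(corr(ds, os_))  # near -1: no direct link between d and o
print(corr(es, os_))  # near 0: despite the direct causal link e -> o
```

The discrete loop is stable for any integration rate 0 < k < 2; the value 0.2 is just a convenient choice.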

I would almost suggest a rule of thumb: Ordinary correlations (that is, those smaller than about 0.8) tell you that the mechanism accounting for them is somewhere else.

No. I would say instead that a correlation of less than 0.707... tells you that some phenomenon or extraneous variable is accounting for more than half of the variation in one or both of your variables. The phenomenon might be nonmonotonicity in a simple causal link between your X and Y, or as noted above it might be the presence of some memory in the connection such as a differentiator or an integrator. The extraneous variable might be multiple unidentified sources of influence on one or both of your correlated variables. Correlation can be diminished even in the presence of a definitive (physically hard-linked) causal link, but it can't be enhanced in the absence of causal connection, except by selective sampling.
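The 0.707 figure comes from r squared: a correlation of 1/sqrt(2) means the correlated variable accounts for exactly half of the variance, leaving the other half to everything else. A quick sketch (the equal-variance setup is invented for illustration):

```python
import random

random.seed(2)

def corr(xs, ys):
    # Pearson correlation of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

n = 100000
xs = [random.gauss(0, 1) for _ in range(n)]   # the measured cause
zs = [random.gauss(0, 1) for _ in range(n)]   # an unmeasured influence
ys = [x + z for x, z in zip(xs, zs)]          # equal contributions to Y

r = corr(xs, ys)
print(r)      # about 0.707, i.e. 1/sqrt(2)
print(r * r)  # about 0.5: X accounts for half of Var(Y)
```

At Richard's threshold of 0.8, r squared is 0.64, so the unmeasured remainder still carries over a third of the variance.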

The mechanism, when you find it, may be as exactly reproducible in its operation as the mechanisms of physics, however mushy the original correlations were. The mushiness is not telling you about the strength of the connections, but about how far you are from discovering them. As long as your data make it worth measuring correlations, you aren't there yet.

I rather think we agree on this.

Where I think we disagree is that I would not dismiss the existence of a causal connection when a correlation is reliable but low, as you do when you comment on Rick's economic correlations. I would use it as a suggestion that there is a linkage between two phenomena that might be worth investigating further. Furthermore, if you have good reason to believe there is a direct causal link but a low correlation, the lowness of the correlation suggests it might be worthwhile to see whether your causal link includes nonmonotonicity or historical influence, or whether there are undiscovered important influences on one or both of your variables. For example, if we assume Rick's correlations are due to a causal link between increased taxes and increased growth, we could ask why the correlation is so low. Is it because the measuring instruments are poor, is it because there are other factors that in total are more important than tax rates, is it because people's memories of earlier tax rates affect their economic activity ...? What you cannot say, if the correlation is reliable, is that tax rate has no causal relation to growth (remembering that the causal relation might not be of the kind X->Y, but might be of the kind Z->X, Z->Y, where Z could be the general policy climate in a government that would raise taxes).

XKCD had this to say: "Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'." ( http://xkcd.com/552 and mouseover the cartoon.) But the correlation is a stage magician, misdirecting the audience away from the real action.

I don't know XKCD, but it sounds like a fun place to look. On the other hand, the stage magician deliberately uses commonly observed relationships in the world to misdirect. If the relations he used were rare, they wouldn't be effective. Likewise, although there may be occasions when a correlation misdirects the investigator, most of the time it doesn't.

Martin

[From Bill Powers (2009.09.10.0730 MDT)]
Richard Kennaway (2009.09.10.0837 BST)–
Richard, you have an extraordinary way of getting to the heart of the
issue. I think you have put your finger on exactly the spot that shows
what is wrong with trying to found a science on correlations.
“Ordinary correlations (that is, those smaller than about 0.8) tell
you that the mechanism accounting for them is somewhere else.”
To which you also add, in effect, “… and very high correlations
can give you a totally false impression that you have found the
mechanism.” The correlation between disturbance and action is very
high, yet it does not describe the most direct connection between them
through the organism.
I had the idea once of constructing a board with nails stuck into its
surface around its edges and a whole web of rubber bands stretched
between the nails over the board like a randomized spiderweb. You could
take hold of the junction of any two or more rubber bands and pull it
sideways in any direction, and observe what the other junctions did. You
would find that most of the other junctions moved a little, and that
their movements would show a very high correlation with the movements you
were creating.
Then you could invite a few other people to pull different parts of the
web in the same way, while you go on making your experimental moves. You
will still see a correlation between your movements and the movements of
all of the other junctions, but now the correlations will be lower, and
get lower still as you check junctions closer to where the other people
are creating their disturbances. Now imagine that you are doing this
through holes in another board covering the web, so you can see only the
local vicinity of some of the junctions, with most of the junctions
hidden.
As you say, the correlations will be telling you only that something is
correlating with something. When you realize that, you have to make a
decision. Is this all you’re ever going to know about that web of
causation? Or should you perhaps adopt an entirely different approach,
and start over? If this is all you’re ever going to know, then you give
up ever knowing any more and you start trying to learn the rules
empirically. If I do this, that happens. And you write it
down in your big notebook, and envision a future of doing the same thing
forever.
The scientific definition of Hell.
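A toy version of the rubber-band board can be sketched by replacing the actual mechanics with a linear approximation in which each junction's displacement is a fixed weighted sum of all the pulls (the weights, and the numbers of junctions and pullers, are invented):

```python
import random

random.seed(3)

def corr(xs, ys):
    # Pearson correlation of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

# Linear stand-in for the web: made-up, nonzero coupling weights.
JUNCTIONS, PULLERS = 6, 4
weights = [[random.choice([-1, 1]) * random.uniform(0.2, 1.0)
            for _ in range(PULLERS)] for _ in range(JUNCTIONS)]

def junction_positions(pulls):
    # Each junction's displacement is a weighted sum of all the pulls.
    return [sum(w * p for w, p in zip(row, pulls)) for row in weights]

mine = []
alone = [[] for _ in range(JUNCTIONS)]    # only you pulling
crowded = [[] for _ in range(JUNCTIONS)]  # three others pulling too
for _ in range(20000):
    my_pull = random.gauss(0, 1)
    mine.append(my_pull)
    for j, y in enumerate(junction_positions([my_pull, 0, 0, 0])):
        alone[j].append(y)
    others = [random.gauss(0, 1) for _ in range(PULLERS - 1)]
    for j, y in enumerate(junction_positions([my_pull] + others)):
        crowded[j].append(y)

print([round(abs(corr(mine, a)), 2) for a in alone])    # all 1.0
print([round(abs(corr(mine, c)), 2) for c in crowded])  # all lower
```

How far each correlation drops depends on how strongly the other pullers reach that junction, which is the point about correlations falling as you sample nearer their disturbances.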
All of the most successful sciences depend primarily on making models.
The models depict a reality that we can’t, at least initially, observe;
we can only imagine it. We try to imagine what could be causing the
phenomena we can observe, and invent figments like fields and charges and
magnetic poles and lines of force and electrons and quarks, none of which
we can observe directly. But by reasoning from the properties we have
given to these things that we imagine, we can compute what the visible
consequences ought to be, and by observing the difference between what we
compute and what we actually observe, we can adjust the properties of the
model until the prediction errors are too small to measure. That sort of
result could never, ever, be accomplished by looking for correlations. It
has never been done that way and will never be done that way.
The only time we use anything like correlations in PCT is in describing
the difference between what we compute with our models and what we
observe. We never use them to discover anything. In fact,
“discover” is a misleading term: we do not remove a cover
to reveal Reality Itself. We invent models, and then tinker with
them until they work as well as possible. Statistics enters only when we
are judging what “as well as possible” is.

While I have you on the line, Richard, I have a puzzle that you may be
able to solve. How can I make it possible to download a directory from my
web page? I use 1and1.com (that’s one and one), and while I can upload a
folder to my root directory, and I can download individual files, the
most I can get back using a link is a listing of the subdirectories in
the folder. I know I can zip the directory, but is there some simpler way
to package the directory as a single file if compression isn’t the issue?
Basically I just want the directory to be downloaded as is to the
C:\ root directory, the same way I move directories around on the hard
drive.

Or do you charge for consultations?

Best,

Bill P.


[From Bill Powers (2009.09.10.0943 MDT)]

Martin Taylor 2009.09.10.07.45 --

JRK: Sure. Where there is a correlation, *something* must be going on *somewhere*. The problem is that that something could be almost anything, and the actual causal relations may be far removed from where the correlation was observed.

MT:
True. But we do know that in the network of causal linkages, X and Y either are one ancestral to the other, or both have a common ancestor (in the absence of selective sampling, which can happen inadvertently).

This view seems to be like the old idea of the causal "tree" in which there are ancestors and descendants, but no closed loops. Most real systems have multiple loops in them and can't be analyzed as simple trees. Models can capture that aspect of systems; correlations can't.

Best,

Bill P.

[From Rick Marken (2009.09.10.0910)]

Bill Powers (2009.09.10.0730 MDT)--

All of the most successful sciences depend primarily on making models. The
models depict a reality that we can't, at least initially, observe; we can
only imagine it.

I think their success depends on both modeling and observation. And
observation must come first or you don't know what to model. What we
observe is variations in perceptual variables and the relationships
(correlations) between those variables.

The only time we use anything like correlations in PCT is in describing the
difference between what we compute with our models and what we observe. We
never use them to discover anything.

Actually, I discovered (well, you noticed it first) that the lack of
correlation between cursor variations on two trials with the same
disturbance must be a result of noise in the control loop.

We invent models,
and then tinker with them until they work as well as possible. Statistics
enters only when we are judging what "as well as possible" is.

You are talking about the inferential use of statistics. I have been
presenting correlations only as observations.

This is like being on Fox News. Just like they try to make Obama out
to be a Socialist (as though that were a bad thing) you guys seem to
be trying to make me out to be a Statistical Social Scientist (this is
after I've published a paper describing the errors of statistical
social science). Modeling has been the basis of my research even
before I was a PCT fanatic. But my modeling has always been aimed at
accounting for _data_ (observations). That includes observed
relationships between variables. So go ahead and rail against doing
science based on correlations; but know that your arguments are
completely orthogonal to the way I go about doing science.

Good night and good luck.

Best

Rick


--
Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[Martin Taylor 2009.09.10.14.38]

[From Bill Powers (2009.09.10.0943 MDT)]

Martin Taylor 2009.09.10.07.45 --

JRK: Sure. Where there is a correlation, *something* must be going on *somewhere*. The problem is that that something could be almost anything, and the actual causal relations may be far removed from where the correlation was observed.

MT:
True. But we do know that in the network of causal linkages, X and Y either are one ancestral to the other, or both have a common ancestor (in the absence of selective sampling, which can happen inadvertently).

This view seems to be like the old idea of the causal "tree" in which there are ancestors and descendants, but no closed loops. Most real systems have multiple loops in them and can't be analyzed as simple trees. Models can capture that aspect of systems; correlations can't.

I pondered whether to use the term "ancestral" because I was concerned that somebody might pounce on the idea that it implied the existence of a pure tree structure of causality. But I couldn't think of a better word, and I figured that I had made it sufficiently clear earlier that each variable could be ancestral to the other in a closed loop, so that no pounce would happen. I was wrong. Silly me.

Martin

[From Bill Powers (2009.09.11.0842 MDT)]

Martin Taylor 2009.09.10.14.38 --

I pondered whether to use the term "ancestral" because I was concerned that somebody might pounce on the idea that it implied the existence of a pure tree structure of causality. But I couldn't think of a better word, and I figured that I had made it sufficiently clear earlier that each variable could be ancestral to the other in a closed loop, so that no pounce would happen. I was wrong. Silly me.

Excuses, excuses. If you don't say what you mean, you can't blame the reader for not supplying the missing associations. When you say "X and Y either are one ancestral to the other, or both have a common ancestor (in the absence of selective sampling, which can happen inadvertently)" that doesn't seem to leave room for any other alternative, though the ones you mention are not the only possibilities. In a double-star system, which one's orbital position is ancestral to the other?

Anyway, the concept of "ancestral" isn't of much use in analyzing continuous causal networks because variations overlap in time. Final states, if they exist, are reached in many places in the network at the same time. All the variables are always varying; they don't take turns so you can see which event preceded or followed which other event.

Best,

Bill P.

[From Bill Powers (2009.09.11.0856 MDT)]

Rick Marken (2009.09.10.0910) –

Bill Powers (2009.09.10.0730 MDT)–

All of the most successful sciences depend primarily on making models. The models depict a reality that we can’t, at least initially, observe; we can only imagine it.

I think their success depends on both modeling and observation. And observation must come first or you don’t know what to model.

I don’t agree with that. You have to have a model before you can even say
what “data” means. The model tells you which of the infinity of
possible variables present at a given moment is relevant.

What we observe is variations in perceptual variables and the relationships (correlations) between those variables.

Maybe this is the problem: I don’t see “relationships” as
synonymous with “correlations.” In fact you’ve really said the
same thing yourself, when you point out that correlations assume a linear
model of the relationship between two variables. That’s a model right off
the bat.

The only time we use anything like correlations in PCT is in describing the difference between what we compute with our models and what we observe. We never use them to discover anything.

Actually, I discovered (well, you noticed it first) that the lack of correlation between cursor variations on two trials with the same disturbance must be a result of noise in the control loop.

OK, I’ll amend my statement: We never use [correlations] to discover
anything but facts about correlations. They don’t tell us much about
anything else.

We invent models, and then tinker with them until they work as well as possible. Statistics enters only when we are judging what “as well as possible” is.

You are talking about the inferential use of statistics. I have been presenting correlations only as observations.

By which you mean “linear models of relationships between two
variables?”

This is like being on Fox News. Just like they try to make Obama out to be a Socialist (as though that were a bad thing) you guys seem to be trying to make me out to be a Statistical Social Scientist (this is after I’ve published a paper describing the errors of statistical social science).

Don’t say “you guys” when I am arguing with Martin, too.

Modeling has been the basis of my research even before I was a PCT fanatic. But my modeling has always been aimed at accounting for data (observations). That includes observed relationships between variables.

Go back and read your own excellent papers about errors of statistical
social science, in which you point out that correlations usually assume
an a priori (rather than observed) linear relationship between
variables.

So go ahead and rail against doing science based on correlations; but know that your arguments are completely orthogonal to the way I go about doing science.

I think your modeling is based on PCT, not correlations. That’s why you
do it so well.

Best,

Bill P.

[Martin Taylor 2009.09.12.17.56]

[From Bill Powers (2009.09.11.0842 MDT)]

Martin Taylor 2009.09.10.14.38 --

I pondered whether to use the term "ancestral" because I was concerned that somebody might pounce on the idea that it implied the existence of a pure tree structure of causality. But I couldn't think of a better word, and I figured that I had made it sufficiently clear earlier that each variable could be ancestral to the other in a closed loop, so that no pounce would happen. I was wrong. Silly me.

Excuses, excuses. If you don't say what you mean, you can't blame the reader for not supplying the missing associations. When you say "X and Y either are one ancestral to the other, or both have a common ancestor (in the absence of selective sampling, which can happen inadvertently)" that doesn't seem to leave room for any other alternative, though the ones you mention are not the only possibilities. In a double-star system, which one's orbital position is ancestral to the other?

OK, but I do accept your excuse. I guess if I revisit this topic I'll just have to repeat on every occasion "X -> Y, Y -> X, or both". I think that's tedious for the ordinary reader, but if it must be done, then it must be done.

Anyway, the concept of "ancestral" isn't of much use in analyzing continuous causal networks because variations overlap in time. Final states, if they exist, are reached in many places in the network at the same time. All the variables are always varying; they don't take turns so you can see which event preceded or followed which other event.

Apart from restating the obvious, I don't know what you are trying to get at here. The amount of rain and the amount of sun both causally influence the growing of tomatoes; all change continuously. Why would you not call the amount of rain and the amount of sun "ancestral" to the luxuriance of tomatoes? The disturbance in a control system varies continuously, as does the output that is causally connected only to the disturbance and to its own history when the reference is invariant. Why would one not call the disturbance "ancestral" to the output? What does continuous variation have to do with it? And why bring the weird concept of "Final state" into the discussion?

I really don't know what you were thinking of when you wrote that paragraph. Or what effect you intended it to have on what readers.

Martin

[From Bill Powers (2009.09.11.1645 MDT)]

Martin Taylor 2009.09.12.17.56 --

OK, but I do accept your excuse. I guess if I revisit this topic I'll just have to repeat on every occasion "X -> Y, Y -> X, or both". I think that's tedious for the ordinary reader, but if it must be done, then it must be done.

I was thinking more in terms of that board with rubber bands making a web above its surface, where disturbing one junction has multiple effects everywhere in the web. A movement of one junction would show a correlation with movements of other junctions, but there is no simple path between those junctions. In the absence of a detailed model of the web, and when other disturbances are acting on the web at the same time, all one can do is fall back on empiricism and record how various disturbances affect other parts of the web. In that sort of situation, statistical analysis is the best we can do. But as Richard Kennaway pointed out, the correlations don't give much of a picture of what is really happening, especially when only a few of the junctions are visible and many paths are hidden and indirect.

I've been thinking about this situation for some time in the context of the 8 pills I take every day for blood pressure control, smoothing of my heart rate, and making my breathing a little better. Just reading the sheets that come with the pills, I found three of them each of which recommends against using them with the other two. One pill, which I take twice a day at a cost of $500 per month, won't let me eat grapefruit and another is nullified by "green leafy vegetables" containing vitamin K. Huh?

It's a complete mess and it's all based on a mostly empirical and statistical approach. Each pill has a primary desired effect and a host of undesired effects which are known only through vague symptoms such as "metallic tastes" and "dizziness" and "suicidal thoughts." There is nearly no knowledge about how any of the desired effects are brought about by the pills, and even less about what the mechanisms are through which the so-called side-effects are generated. The pills are disturbing a huge biochemical web, and have a very large number of unknown effects, some of which must be direct effects of the pills, and others of which are undoubtedly attempts by the biochemical systems to oppose errors that the pills are causing.

This is the state of the art in medicine. It's better than nothing, but not by much. This is why I found Richard's comments so apropos.

Best,

Bill P.

[Martin Taylor 2009.09.11.18.05]

[From Bill Powers (2009.09.11.0856 MDT)]

Rick Marken (2009.09.10.0910) --

> Bill Powers (2009.09.10.0730 MDT)--

> All of the most successful sciences depend primarily on making models. The
> models depict a reality that we can't, at least initially, observe; we can
> only imagine it.

I think their success depends on both modeling and observation. And
observation must come first or you don't know what to model.

I don't agree with that. You have to have a model before you can even say what "data" means. The model tells you which of the infinity of possible variables present at a given moment is relevant.

You don't know what you might want to model until you look at the world. It's a feedback system. Neither model nor data is primary. You can have an abstract model and then look to see if it corresponds to anything in the world (Einstein), or you can have a whole mess of observations and correlations and try to find a model that makes them mean something. Or you might have a crude model and some suggestive data, the model suggesting how you could better measure the data and the data suggesting how you could make a better model. However you look at it, it's a closed loop.

OK, I'll amend my statement: We never use [correlations] to discover anything but facts about correlations. They don't tell us much about anything else.

On the contrary, they tell you that if there's a reliable (even a low but reliable) correlation between X and Y, the variables do have a causal linkage (and to satisfy you, I will repeat the mantra "X->Y, Y->X, or both, or Z->X and Z->Y"). If the data show a correlation between X and Y, but your model has none of those causal links, then it is an inadequate model. If your model does have one of those causal links, but does not explain why a low correlation is low, it is an inadequate model. Correlations don't tell you where the causal links must be in your model. They just tell you that the links must exist in your model.
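The Z case of that mantra is simple to illustrate. In this sketch (the coefficients are invented), X and Y never influence one another, yet a common ancestor Z alone produces a reliable, and deliberately low, correlation between them:

```python
import random

random.seed(4)

def corr(xs, ys):
    # Pearson correlation of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

n = 100000
zs = [random.gauss(0, 1) for _ in range(n)]        # common ancestor
xs = [0.5 * z + random.gauss(0, 1) for z in zs]    # Z -> X, plus own noise
ys = [0.5 * z + random.gauss(0, 1) for z in zs]    # Z -> Y, plus own noise

r_xy = corr(xs, ys)
print(r_xy)  # about 0.2: reliable but low, with no X->Y or Y->X link
```

Which of the four possible arrangements produced the number is, as the mantra implies, not recoverable from the correlation itself.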

Martin

[Martin Taylor 2009.09.11.20.00]

[From Bill Powers (2009.09.11.1645 MDT)]

Martin Taylor 2009.09.12.17.56 --

OK, but I do accept your excuse. I guess if I revisit this topic I'll just have to repeat on every occasion "X -> Y, Y -> X, or both". I think that's tedious for the ordinary reader, but if it must be done, then it must be done.

I was thinking more in terms of that board with rubber bands making a web above its surface, where disturbing one junction has multiple effects everywhere in the web. A movement of one junction would show a correlation with movements of other junctions, but there is no simple path between those junctions. In the absence of a detailed model of the web, and when other disturbances are acting on the web at the same time, all one can do is fall back on empiricism and record how various disturbances affect other parts of the web. In that sort of situation, statistical analysis is the best we can do. But as Richard Kennaway pointed out, the correlations don't give much of a picture of what is really happening, especially when only a few of the junctions are visible and many paths are hidden and indirect.

Yes. I think I agreed with Richard on that. Finding that a correlation exists is only a pointer, not a map. Finding a web of correlations isn't a map, but an indication that a realistic model will be complicated.

I've been thinking about this situation for some time in the context of the 8 pills I take every day for blood pressure control, smoothing of my heart rate, and making my breathing a little better. Just reading the sheets that come with the pills, I found three of them each of which recommends against using them with the other two. One pill, which I take twice a day at a cost of $500 per month, won't let me eat grapefruit and another is nullified by "green leafy vegetables" containing vitamin K. Huh?

It's a complete mess and it's all based on a mostly empirical and statistical approach. Each pill has a primary desired effect and a host of undesired effects which are known only through vague symptoms such as "metallic tastes" and "dizziness" and "suicidal thoughts." There is nearly no knowledge about how any of the desired effects are brought about by the pills, and even less about what the mechanisms are through which the so-called side-effects are generated. The pills are disturbing a huge biochemical web, and have a very large number of unknown effects, some of which must be direct effects of the pills, and others of which are undoubtedly attempts by the biochemical systems to oppose errors that the pills are causing.

This is the state of the art in medicine. It's better than nothing, but not by much. This is why I found Richard's comments so apropos.

I can't imagine being in such a difficult situation and remaining as cheerful as you do.

As I said to Richard, I think (though I don't know) that our views may be the same, and our way of expressing them seems to be converging. We both have a reasonably good background in the related technical matters, though mine is quite rusty and perhaps old-fashioned. So it would be odd if our disagreements were deeper than cosmetic.

Martin

[From Bill Powers (2009.09.11.1900 MDT)]

Martin Taylor 2009.09.11.18.05 --

On the contrary, they tell you that if there's a reliable (even a low but reliable) correlation between X and Y, the variables do have a causal linkage (and to satisfy you, I will repeat the mantra "X->Y, Y->X, or both, or Z->X and Z->Y"). If the data show a correlation between X and Y, but your model has none of those causal links, then it is an inadequate model. If your model does have one of those causal links, but does not explain why a low correlation is low, it is an inadequate model. Correlations don't tell you where the causal links must be in your model. They just tell you that the links must exist in your model.

I think Richard's point was that when correlations are lower than about 0.8, the low correlations tell you you're not looking in the right place, and when they are very high, they can give you a wrong idea of the causal path.

I think that when you have to use correlations, you're doing so because you have no way to determine the causal paths directly, and it's best not to try to talk about causes at all. You can say what the chances are that doing A will result in B, but you can't answer the question of why that happens.
And it's likely that if you try to find causal links based on correlations, you'll end up with the wrong model. Isn't that where stimulus-response theory came from?

Best,

Bill P.

[Martin Taylor 2009.09.11.21.35]

[From Bill Powers (2009.09.11.1900 MDT)]

Martin Taylor 2009.09.11.18.05 --

On the contrary, they tell you that if there's a reliable (even a low but reliable) correlation between X and Y, the variables do have a causal linkage (and to satisfy you, I will repeat the mantra "X->Y, Y->X, or both, or Z->X and Z->Y"). If the data show a correlation between X and Y, but your model has none of those causal links, then it is an inadequate model. If your model does have one of those causal loops, but does not explain why a low correlation is low, it is an inadequate model. Correlations don't tell you where the causal links must be in your model. They just tell you that the links must exist in your model.

I think Richard's point was that when correlations are lower than about 0.8, the low correlations tell you you're not looking in the right place, and when they are very high, they can give you a wrong idea of the causal path.

I disagreed with Richard on the first point. I'm not sure whether that's because I misinterpreted what he meant or whether the disagreement was deeper than that.

My point is that if the correlation is low, you should be looking for other sources of influence on one or both of your correlated variables, not that you should be looking for influences from afar. (See also below)

As to the second point, all I can say is that the mere fact of correlation cannot allow you to distinguish among four possible paths of influence X->Y, Y->X, X<->Y, and Z->X + Z->Y. For that, you need modelling.
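
A quick simulation makes this concrete (a minimal sketch; the variable names and noise levels are illustrative assumptions, not from the thread). Data generated by a direct X->Y link and data generated by a common cause Z->X, Z->Y can yield essentially the same correlation, so the number alone cannot pick out the structure:

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson r, computed from the raw lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

random.seed(0)
N = 50_000

# Structure 1: direct causation, X -> Y (noise tuned so r is about 0.5).
x1 = [random.gauss(0, 1) for _ in range(N)]
y1 = [x + random.gauss(0, math.sqrt(3)) for x in x1]

# Structure 2: common cause Z -> X and Z -> Y, with no direct X-Y link.
z = [random.gauss(0, 1) for _ in range(N)]
x2 = [v + random.gauss(0, 1) for v in z]
y2 = [v + random.gauss(0, 1) for v in z]

r_direct = pearson(x1, y1)
r_common = pearson(x2, y2)
print(round(r_direct, 2), round(r_common, 2))  # both near 0.5
```

Distinguishing the two structures requires something beyond the correlation, such as an intervention or a model, which is exactly the point about modelling above.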

I think that when you have to use correlations, you're doing so because you have no way to determine the causal paths directly, and it's best not to try to talk about causes at all. You can say what the chances are that doing A will result in B, but you can't answer the question of why that happens.

I think you look at it mirrorwise to the way I look at it. Looking at it the way you seem to do, I come to the same conclusion as you. Looking from the other side, I see what I said earlier: if there is a correlation between variables of interest X and Y, and your model does not contain a causal link of one of the above types, your model is inadequate. The reverse does not hold, in that if your model does show a causal link between X and Y, the details of the model and its parameters will determine whether you should expect to see correlation in the data.

And it's likely that if you try to find causal links based on correlations, you'll end up with the wrong model. Isn't that where stimulus-response theory came from?

I disagree with "it's likely". I would say "it's possible". But as I said to Richard: "the stage magician uses the commonly observed relationships in the world deliberately to misdirect. If the relations he used were rare, they wouldn't be effective. Likewise, although there may be occasions when a correlation misdirects the investigator, most of the time it doesn't." [Martin Taylor 2009.09.10.07.45]

If you think about it, the problem with the stimulus-response model isn't that it incorporates the disturbance-output causal link suggested by the correlation; it is that it does not also include the feedback causal link that is not suggested by any correlation, and by that omission eliminates any possibility of control. (Neither does it include a reference signal that is also a causal influence on output, since there is nothing in their data that could show any related correlation.) The take-away from this, for me, is in my last sentence two paragraphs above: "if your model does show a causal link between X and Y, the details of the model and its parameters will determine whether you should expect to see correlation in the data." What this means is that if you make up a model that contains only the causal links suggested by correlations, you may well be missing something critically important. Example: S-R theory.

Going back to the comment I already quoted: "I think that when you have to use correlations, you're doing so because you have no way to determine the causal paths directly," what you say may be true, but it's not the only possibility. You may have a good idea of the causal paths, but want to know the relative strengths of different influences on some variable under conditions of interest. I re-emphasise that if you have two equally important causal influences on some variable, and there is no noise or extraneous influence on that variable, the correlation between either causal influence and the variable will be only 0.707... That's for the case in which you know exactly the linkages. If you have four equally important linkages, the correlations will be only 0.5, even if you know all the influences exactly. If one of the causal paths leads to a correlation of 0.8 between some cause and an interesting variable, the sum of squares of the correlations between that variable and all the other true causal variables cannot exceed 0.36. Low correlation doesn't necessarily mean lack of knowledge of the situation. It can as easily mean the existence of multiple causes.
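
The 0.707 and 0.5 figures can be checked by simulation (a sketch under the stated assumptions: equal, independent, noise-free causes). With n equally weighted independent causes of an effect, the correlation between any one cause and the effect comes out close to 1/sqrt(n):

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson r, computed from the raw lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

random.seed(1)
N = 50_000
rs = {}
for n_causes in (2, 4):
    # n equally weighted, independent causes; no noise on the effect.
    causes = [[random.gauss(0, 1) for _ in range(N)] for _ in range(n_causes)]
    effect = [sum(vals) for vals in zip(*causes)]
    rs[n_causes] = pearson(causes[0], effect)
    print(n_causes, round(rs[n_causes], 3),
          "expected", round(1 / math.sqrt(n_causes), 3))
```

Two causes give a correlation near 0.707, four give one near 0.5, even though every causal link here is known exactly and deterministic.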

Martin

[From Rick Marken (2009.09.11.2355)]

Bill Powers (2009.09.11.0856 MDT)--

Rick Marken (2009.09.10.0910) --

Bill Powers (2009.09.10.0730 MDT)--

All of the most successful sciences depend primarily on making models.

I think their success depends on both modeling and observation. And
observation must come first or you don't know what to model.

I don't agree with that. You have to have a model before you can even say
what "data" means.

I think there are clear examples where this is not true. The
measurements made by Tycho Brahe were not based on a model. And to the
extent that they were, they were based on the wrong model. Nevertheless
those measurements proved to be very useful for testing what turned
out to be the right model.

The model tells you which of the infinity of possible
variables present at a given moment is relevant.

I don't think that was true for Tycho. He knew what variables to
measure and he measured them well.

Maybe this is the problem: I don't see "relationships" as synonymous with
"correlations." In fact you've really said the same thing yourself, when you
point out that correlations assume a linear model of the relationship
between two variables. That's a model right off the bat.

The linear assumption of correlation is not a model; it's the simplest
possible quantitative relationship that can exist between variables.
The r number simply measures the fit of the data points to that line.
The correlation is no more than a numerical measure of what Dag sees
when he says the data in Frans' graphs show an obvious relationship
between age and number correct (or whatever the Y variable is). What
he is seeing is that the points on the graph are visually close to a
straight line; when you connect the dots they look even more like a
straight line. But if you don't think it's the fit to a straight line
that is being visually evaluated when you look for a relationship in
the data, then you can use the rank order correlation, which is a
correlation number that simply measures the fit of the data to
monotonicity (increasing or decreasing). Or use some other reference
for measuring the graph. The number is just a description, like a mean
or a median.
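
The distinction between fit-to-a-line and fit-to-monotonicity can be illustrated with a sketch (illustrative data, not from the thread). For y = x^3, an exactly monotonic but non-linear relation, the rank-order (Spearman) correlation is 1.0 while Pearson's r falls short of 1:

```python
import math

def pearson(xs, ys):
    """Plain Pearson r, computed from the raw lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def ranks(xs):
    """Rank positions (0-based); assumes no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for r, i in enumerate(order):
        out[i] = r
    return out

xs = list(range(-10, 11))
ys = [x ** 3 for x in xs]           # perfectly monotonic, far from linear

pe = pearson(xs, ys)                # fit to a straight line: less than 1
sp = pearson(ranks(xs), ranks(ys))  # Spearman = Pearson on ranks: exactly 1
print(round(pe, 3), round(sp, 3))
```

Both numbers are just descriptions of the scatter, as the paragraph above says; they differ only in the reference shape (line vs. any monotone curve) against which the fit is measured.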

You are talking about the inferential use of statistics. I have been
presenting correlations only as observations.

By which you mean "linear models of relationships between two variables?"

No, measures (like r) that give a quantitative picture of what could
probably better be seen by looking at a plot of one variable against
the other, as in Frans' graphs.

Don't say "you guys" when I am arguing with Martin, too.

I meant you, Dag and Richard as "the guys".

I think your modeling is based on PCT, not correlations. That's why you do
it so well.

The modeling is based on PCT; the observations are often correlations
(such as the observation of the relationship between mouse and cursor
in the "mind reading" task).

Correlation is just one way to describe the relationships I find in my
research; the other ways to describe these relationships are to show
the time series next to each other and to produce scattergrams.

I have no idea why you are trying to make this out to be a statistical
issue; this is a data representation issue to me. But the whole thing
came up because I reported (among other things) a low positive
correlation between taxes and growth. Perhaps if I just presented a
scattergram of the data (which I might do next) it would have posed
less of a problem for you. But whether that data is presented as a
graph or a number, I think they provide no basis for the idea that
increased taxes slow growth. If you think that these data do provide
such a basis, then there are a bunch of folks who could use
your help tomorrow in their anti-tax march on DC. Be sure to leave
your correlations at home and bring your tin foil hat.

Best

Rick


--
Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[From Bill Powers (2009.09.12.0933 MDT)]

Rick Marken (2009.09.11.2355) --

>> BP earlier: All of the most successful sciences depend primarily on making models.
>
>RM: I think their success depends on both modeling and observation And
> observation must come first or you don't know what to model.

Yes, it's an iterative process. But the observations don't start with correlations; they start with recording the values of variables. The relationships you look for are determined by the kind of model you have in mind. If you use an S-O-R model you identify one variable as a stimulus and another as a response to the stimulus, so you look for a correlation between S and R: you think that stimuli are causing responses. In a PCT model you're looking for a lack of correlation between S and CV and between CV and R. You have to have a reason to look for the CV -- to find the joint effect of S and R that is being kept from varying. The control-system model suggests you look for it. Otherwise, you end up with S-R theory because you're not interested in variables unless they vary.
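
The pattern described here can be reproduced with a minimal control loop (a sketch; the gain, step size, and disturbance statistics are arbitrary choices): the disturbance-output correlation comes out strong even though the disturbance never acts on the output directly, while the correlation between the controlled variable and the output, the loop's one direct causal input, comes out weak.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson r, computed from the raw lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

random.seed(2)
gain, dt, ref = 50.0, 0.01, 0.0     # fixed reference, integrating output
out, d = 0.0, 0.0
ds, cvs, outs = [], [], []
for _ in range(20_000):
    d += random.gauss(0, 0.05)      # slowly drifting disturbance
    cv = out + d                    # controlled variable = output + disturbance
    out += gain * (ref - cv) * dt   # output driven only by the error in cv
    ds.append(d)
    cvs.append(cv)
    outs.append(out)

r_d_out = pearson(ds, outs)    # strong (negative): yet no direct d -> out link
r_cv_out = pearson(cvs, outs)  # weak: yet cv is the only input driving out
print(round(r_d_out, 3), round(r_cv_out, 3))
```

An S-O-R reading of these numbers would put the causal arrow from disturbance to output and dismiss the controlled variable as irrelevant, which is the misdirection the thread is about.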

> BP earlier: I don't agree with that. You have to have a model before you can even say what "data" means.

I think there are clear examples where this is not true. The
measurements made by Tycho Brahe were not based on a model.

What Brahe did was measure relative positions of stars and planets with greater precision than ever achieved before. That is data about positions, so you're right. But what do those positions reveal to us? That depends on what model we're using. If we used Ptolemy's model of crystalline spheres and epicycles, we would be trying to use these measures to calculate the radius and rotational speed of each sphere carrying a planet. If we used the Keplerian model that came later, we would be trying to calculate the elliptical orbits of the planets around the Sun: the period and the major and minor axes of the ellipses. If we used Newton's model of gravitation, we might use the data to calculate G, the universal constant of gravitation. What we calculate, given the observations, depends on the model we're using or testing.

> BP earlier: The model tells you which of the infinity of possible
> variables present at a given moment is relevant.

RM: I don't think that was true for Tycho. He knew what variables to
measure and he measured them well.

I don't think he was trying to predict or explain anything. He was simply observing the appearances very carefully. I agree that this doesn't require a model.

The linear assumption of correlation is not a model; it's the simplest
possible quantitative relationship that can exist between variables.

But see my post to Martin. Correlation assumes a single linear relationship, but tells us nothing about the mechanisms that create that relationship. I am interested mainly in the mechanisms; to me, the observations are just a way of learning more about the mechanisms. This isn't because I'm not interested in the observations, but because I know that until we understand the mechanisms our ability to interpret and predict observations is going to be very limited.

The r number simply measures the fit of the data points to that line.
The correlation is no more than a numerical measure of what Dag sees
when he says the data in Frans' graphs show an obvious relationship
between age and number correct (or whatever the Y variable is). What
he is seeing is that the points on the graph are visually close to a
straight line; when you connect the dots they look even more like a
straight line. But if you don't think it's the fit to a straight line
that is being visually evaluated when you look for a relationship in
the data, then you can use the rank order correlation, which is a
correlation number that simply measures the fit of the data to
monotonicity (increasing or decreasing). Or use some other reference
for measuring the graph. The number is just a description, like a mean
or a median.

I commented on this to Martin: it's a question of whether we're only interested in the apparent relationship between X and Y, or are interested in finding the real relationship, the mechanism lying between X and Y. I'm not interested in exactly how a given person moves a cursor while trying to make it track a target; everybody does it a little differently and exactly how they do it is of no importance to me -- except as a test of the model. I want to know what kind of internal organization -- architecture, as they say -- is needed to produce the sort of tracking behavior we observe. If, just by adjusting a few parameters, I can match a model's behavior equally well to the behavior of any person doing the tracking, I know I have got the right kind of underlying architecture. It doesn't matter to me if the parameters for best fit vary from one person to another, though some day that might be valuable information. I expect them to vary. What I don't expect to see changing is the kind of organization in the model: the fact that it is a negative feedback control model.
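
The fitting procedure described here, adjusting a few parameters until the model's behavior matches a person's, can be shown in miniature (a toy version: the "observed" trace is itself generated by the model with a known gain, so exact recovery is guaranteed; real tracking data would leave a residual):

```python
import random

def run_model(gain, disturbance, dt=0.01, ref=0.0):
    """One-level integrating control loop; returns the output trace."""
    out = 0.0
    trace = []
    for d in disturbance:
        cv = out + d                   # controlled variable
        out += gain * (ref - cv) * dt  # output integrates the error
        trace.append(out)
    return trace

def rms_diff(a, b):
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

random.seed(3)
dist, d = [], 0.0
for _ in range(5_000):
    d += random.gauss(0, 0.05)
    dist.append(d)

# Stand-in for a participant's handle trace: the model itself, gain = 40.
observed = run_model(40.0, dist)

# Grid search over candidate gains; the best fit recovers the true gain.
best_rms, best_gain = min((rms_diff(observed, run_model(g, dist)), g)
                          for g in range(10, 80, 5))
print(best_gain)  # → 40
```

The architecture (a negative feedback loop) is fixed; only the parameter varies between fits, mirroring the point that the parameters may differ across people while the kind of organization does not.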

>> You are talking about the inferential use of statistics. I have been
>> presenting correlations only as observations.

I argue that correlations are not observations or measurements. They are calculations based on measurements. And why are those particular calculations used, when the same measurements could be the basis for many different kinds of calculations, such as those we use in PCT? They are used because they contain a model which assumes a linear relationship as a first approximation to the actual form of a relationship between two variables.

Correlation is just one way to describe the relationships I find in my
research; the other ways to describe these relationships are to show
the time series next to each other and to produce scattergrams.

This makes it clear to me where we differ. You are interested in plots of the variables showing their relationships to each other. I am interested in the nature of the physical connection between the variables, the mechanism that lies between them. The plots of the relationships in no way reveal the nature of the mechanisms. An infinity of different mechanisms could produce those same plots. I am trying to narrow that infinity down to a smaller set of mechanisms, and as nearly as possible narrow that set down to just one kind of organization that would best explain what we observe. PCT is the result, so far.

RM: I have no idea why you are trying to make this out to be a statistical
issue; this is a data representation issue to me.

BP: Not representation: INTERPRETATION. The data are the points obtained from observations, the raw lists of numbers. Anything beyond that is theory and interpretation.

RM: But the whole thing
came up because I reported (among other things) a low positive
correlation between taxes and growth. Perhaps if I just presented a
scattergram of the data (which I might do next) it would have posed
less of a problem for you. But whether that data is presented as a
graph or a number, I think they provide no basis for the idea that
increased taxes slow growth. If you think that these data do provide
such a basis, then there are a bunch of folks who could use
your help tomorrow in their anti-tax march on DC. Be sure to leave
your correlations at home and bring your tin foil hat.

BP: So if I disagree with you, I'm just another kook? Do you ever ask yourself how your remarks might be seen by other people before you let your typing reflexes emit them? Sometimes you sound just like one of those guys on Fox News, or Sarah Palin and her Death Panels. Sort of resentful and vindictive.

The data you talk about provide no basis for any conclusions. Taxes might reduce profits, a drop in profits might lead to layoffs; layoffs might slow growth. But taxes might also, at the same time and in parallel, increase government revenues, redistribute spending where more income is needed, restore profitability of small enterprises, and increase growth. They probably affect growth by many other paths, too. So what is "THE" effect on growth of raising taxes? There is no such thing, unless you just look at the bottom line and ignore the details. There are multiple effects, some of them contradictory, and all of them probably variable over time. You can't predict the effect of raising or lowering taxes without a proper model that takes more into account than just taxes. You have to deal simultaneously with ALL the important variables in the system, not just a few of them. Try predicting control behavior when you omit any signal or function from the PCT model. You can't even solve the equations, and the simulation won't even run.

I should thank both you and Martin because after all this I see what my own position is a lot more clearly.

Best,

Bill P.

[From Rick Marken (2009.09.12.1045)]

Bill Powers (2009.09.12.0933 MDT)

Rick Marken (2009.09.11.2355) --

RM: But the whole thing
came up because I reported (among other things) a low positive
correlation between taxes and growth. Perhaps if I just presented a
scattergram of the data (which I might do next) it would have posed
less of a problem for you. But whether that data is presented as a
graph or a number, I think they provide no basis for the idea that
increased taxes slow growth.

The data you talk about provide no basis for any conclusions.

That's what I've been saying. All I'm saying is that economists (and
others) who confidently conclude that taxes are recessionary either
have no basis for this conclusion (as per the data I present) or there
is some evidence I am unaware of on which this conclusion is based and
no one is willing to tell me.

I should thank both you and Martin because after all this I see what my own
position is a lot more clearly.

I don't think I disagree with your basic position as I understand it;
modeling is the only way to understand phenomena. I think this is
your position and, if so, it's the same as mine, except for one thing.
I think that until we have a relevant model that works well, policy
decisions should be informed by relevant data. Apparently you object
strongly to this; I don't really understand why.

Your position on this seems to be that we should ignore data and not
make any policy decisions (or just make policy decisions by flipping a
coin) until we have a perfect model. Is this right?

Best

Rick


---
Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[From Bill Powers (2009.09.12.1444 MDT)]

Rick Marken (2009.09.12.1045) –

BP earlier: I should thank both you and Martin because after all this I see what my own position is a lot more clearly.

RM: Your position on this seems to be that we should ignore data and not make any policy decisions (or just make policy decisions by flipping a coin) until we have a perfect model. Is this right?

I still haven’t expressed my newly-formulated position
clearly enough. I don’t care whether increasing taxes increases or
decreases growth. I don’t care what policies are put into effect. I am
interested, as a theoretician, only in the construction of a model that
correctly represents all the important mechanisms of economics. Given
this model, you will then be able to plug in any policies you like, any
data you can find, any properties of the human participants that you want
to propose, and observe what the model says will be the consequences. If
the model has been constructed carefully enough, those would be
the consequences. Naturally, during the development of this model, it
will be necessary to start with some preliminary ideas, run the model
based on them, and check its performance against what actually happened.
The purpose of doing this is not to find better policies, but to modify
the model to make its behavior conform more closely to the real behavior
of the economy. Call that Phase 1.

In effect, we’re already making policy decisions by flipping a coin
because nobody understands how the economy works. The only thing we’re
doing wrong is not flipping it again as soon as we see that the policy
isn’t having the effect we thought it would have. Flipping a coin,
if we could get everyone to agree to be bound by it, would be a far more
peaceable way of establishing policies than the present one, and it might
prove to be right more often. As you suggested a few posts ago, even if
the results were only random this might be a good way to find new ideas to
try, if we could see their consequences quickly enough so we could flip
the coin again right away if necessary. So while Phase 1 is under way,
your suggestion may be the best strategy to follow, to avoid as far as
possible doing any further harm.

Of course we would not wait to put the model into practical use until it
was “perfect.” You chose that interpretation to make it seem
that we would never put the model into practice because obviously no
model will ever be perfect. We would start using the model as soon as our
investigations showed that it was sufficiently more likely to help than
the methods we currently use, even though it could still lead to making
mistakes. Ultimately, of course, the modeling approach will become very
accurate and all policy decisions based on it will be highly likely to
work as expected. That’s Phase 2.

Best,

Bill P.

[From Rick Marken (2009.09.12.1740)]

Bill Powers (2009.09.12.1444 MDT)–

Rick Marken (2009.09.12.1045) –

RM: Your position on this seems to be that we should ignore data and not make any policy decisions (or just make policy decisions by flipping a coin) until we have a perfect model. Is this right?

I still haven’t expressed my newly-formulated position
clearly enough. I don’t care whether increasing taxes increases or
decreases growth. I don’t care what policies are put into effect. I am
interested, as a theoretician, only in the construction of a model that
correctly represents all the important mechanisms of economics.

And I am interested, as a citizen, in what policies are put into effect and whether they are successful.

Given
this model, you will then be able to plug in any policies you like, any
data you can find, any properties of the human participants that you want
to propose, and observe what the model says will be the consequences.

This is precisely the approach I took with my prescribing error model. I used the term “model excursions” to describe the process of “plugging in different policies”.

If
the model has been constructed carefully enough, those would be
the consequences.

I think it also requires that the model be validated in terms of its ability to predict existing data. But the actual consequences are certainly data against which to evaluate the model. So I agree with your statement here.

Naturally, during the development of this model, it
will be necessary to start with some preliminary ideas, run the model
based on them, and check its performance against what actually happened.

Yes, that's what I would suggest (and have been suggesting) as well. We are completely on the same page here.

The purpose of doing this is not to find better policies, but to modify
the model to make its behavior conform more closely to the real behavior
of the economy. Call that Phase 1.

Still on the same page.

In effect, we’re already making policy decisions by flipping a coin
because nobody understands how the economy works. The only thing we’re
doing wrong is not flipping it again as soon as we see that the policy
isn’t having the effect we thought it would have.

Yes, this is the e. coli approach to policy that I described several days ago. So how do we see whether or not a policy is having the effect we thought it would? I say we do it by looking at the data. The data that I have been presenting is a historical look at the consequences of implementing various tax policies. I think it’s useful to look at this kind of data because then we are seeing the consequences of implementing different policies while other possible confounding variables are also changing. This is D. T. Campbell’s quasi-experimental approach that I also mentioned earlier. We should certainly look at the consequences of a policy after it is implemented (I suggested after 2 years of collecting data) but one shot “experiments” without controls are not that convincing. Still, if policy makers had actually monitored the growth rate and deficit for 2 years after the Bush tax cuts and used that data as a basis for evaluating policy, I think those cuts would have been seen to be ineffective (at increasing growth and/or keeping the deficit low) and immediately repealed.

Flipping a coin,
if we could get everyone to agree to be bound by it, would be a far more
peaceable way of establishing policies than the present one, and it might
prove to be right more often. As you suggested a few posts ago, even if
the results were only random this might be a good way to find new ideas to
try, if we could see their consequences quickly enough so we could flip
the coin again right away if necessary. So while Phase 1 is under way,
your suggestion may be the best strategy to follow, to avoid as far as
possible doing any further harm.

Ah, I see you were reading my posts. OK, that’s all I was (and am) proposing: to evaluate policies based on their consequences (as best as we can determine those consequences in a non-experimental situation).

Of course we would not wait to put the model into practical use until it
was “perfect.”

Sorry, bad choice of words. I should have just said “until the model accounts for the data with an acceptable level of accuracy”. Since you (and I) like a very high level of accuracy (less than 3% error of prediction, say) I would guess that it will take some time before you get what you would think of as an acceptably accurate model.

Ultimately, of course, the modeling approach will become very
accurate and all policy decisions based on it will be highly likely to
work as expected. That’s Phase 2.

Yes, I’m all for getting to Phase 2. But while Phase 1 is under way (and it is) can’t we agree that, while we have no accurate model, the data suggest that the consequences of Bush’s tax policies have been a huge increase in the deficit and a terrible recession and that this policy should be repealed and a different policy, like raising rather than lowering taxes on the wealthy, should be tried?

Best

Rick



Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[From Bill Powers (2009.09.13.1840 MDT)]

Rick Marken (2009.09.12.1740) --

RM:

Yes, I'm all for getting to Phase 2. But while Phase 1 is under way (and it is) can't we agree that, while we have no accurate model, the data suggest that the consequences of Bush's tax policies have been a huge increase in the deficit and a terrible recession and that this policy should be repealed and a different policy, like raising rather than lowering taxes on the wealthy, should be tried?

BP: I might go along with that if you could estimate the chances that raising taxes is likely to make matters worse instead of better. What were those correlations you were talking about?

I finally found this:

RM:
dGDPdt w/ Top Marginal Tax, 1928-2008:             0.28
dGDPdt w/ Top Marginal Tax, 1947-2008:             0.18
Unemployment Rate w/ Top Marginal Tax, 1947-2008: -0.23

The correlations are not huge; indeed, the second two are not even
statistically significant.

BP:
Saying they are not huge is no exaggeration. According to David G.'s table, at a correlation of 0.2, the odds of predicting the right sign of the effect (that is, correctly predicting that raising taxes will result in an increase in growth rate as opposed to a decrease) are 1.3 to 1, or slightly better than a coin toss -- 56% to 44%. I don't know how to calculate the odds for a correlation of 0.3, but they're not going to be a lot better. At what probability of producing the opposite of the effect you want would you decide not to recommend performing the experiment of raising taxes?
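
The 56%-to-44% figure is consistent with Sheppard's quadrant formula for jointly normal variables, P(same sign) = 1/2 + arcsin(r)/pi, which also answers the question for r = 0.3 (I don't know how David G.'s table was derived; this is one standard route to such numbers, valid only under the bivariate-normal assumption):

```python
import math

def right_sign_prob(r):
    """Sheppard's quadrant formula: probability that two jointly normal
    variables with correlation r have the same sign."""
    return 0.5 + math.asin(r) / math.pi

for r in (0.2, 0.3, 0.8):
    p = right_sign_prob(r)
    print(f"r = {r}: {100 * p:.1f}% right-sign, odds {p / (1 - p):.2f} to 1")
```

For r = 0.2 this gives about 56%, matching the 56-to-44 split above; for r = 0.3 it gives about 60%, odds roughly 1.5 to 1 -- indeed not a lot better than the coin toss.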

Best,

Bill P.