Correlation considered as a stage magician

[From Rick Marken (2009.09.14.0900)]

Bill Powers (2009.09.13.1840 MDT)–

Rick Marken (2009.09.12.1740) –

RM:

Yes, I’m all for getting to Phase 2. But while Phase 1 is under way (and it is) can’t we agree that, while we have no accurate model, the data suggest that the consequences of Bush’s tax policies have been a huge increase in the deficit and a terrible recession and that this policy should be repealed and a different policy, like raising rather than lowering taxes on the wealthy, should be tried?

BP: I might go along with that if you could estimate the chances that raising taxes is likely to make matters worse instead of better. What were those correlations you were talking about?

RM: I don’t think that’s really necessary. I’m not trying to increase growth or reduce unemployment by raising taxes. I am suggesting that we try raising taxes (progressively) to rates that prevailed prior to 1980 in order to reduce what has become a chronic deficit. I think there is some consensus that you reduce deficits by increasing revenue. So assuming that spending remains constant or increases the only way to reduce the deficit is to increase revenue, which means increasing taxes. Economists have been recommending against this policy because they say increasing taxes is recessionary. I’m saying that there is no evidence that this is true (or untrue). So I recommend raising taxes and seeing what happens. If, in fact, the tax increase seems to start a slowdown, then it’s pretty easy to go back (after a long enough trial period – two years, perhaps) to the lower tax rate. It’s a LOT easier to lower taxes than to raise them.

I’ve also been thinking about the fact that our discussion of correlation has been ignoring the fact that we are correlating variables that vary over time. This violates some of the basic assumptions about the correlation coefficient that are made when determining statistical significance (the main assumption being that the data points are independent, which they clearly are not in a time series). I presume the violation of these assumptions would also affect the estimates of prediction odds based on the correlation. In other words, the correlation between two time series (like annual tax and growth rate) violates the basic assumptions of whatever model underlies the statistical use of correlation.

But I do think that the correlation can be useful in the analysis of time series data. In particular, I think we can learn something from the “lagged” correlations where we look at the correlation between one time series and another at different time lags (and/or leads). The lag at which these correlations reach maximum (and minimum) seems like a reasonable measure of something like the phase relationship between these time series. A relatively flat set of lagged correlations would suggest no relationship between the time series; a nicely peaked set would suggest that there is some kind of relationship. This seems like a possibly useful new way of using correlation to study time series data. Whaddaya think?

Maybe such analysis methods already exist; I’ll check.
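
For readers who want to try the lagged-correlation idea, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the helper names (`pearson`, `lagged_correlations`), the synthetic series, the 3-step lag and the noise level are all made up for the example and are not taken from the tax or growth data discussed in this thread.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlations(x, y, max_lag):
    """Map each lag k to the correlation of x[t] with y[t + k].

    Positive k means y follows x by k steps; negative k means y leads x.
    """
    n = len(x)
    result = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            result[k] = pearson(x[:n - k], y[k:])
        else:
            result[k] = pearson(x[-k:], y[:n + k])
    return result

# Hypothetical data: y is a noisy copy of x delayed by true_lag steps.
rng = random.Random(1)
true_lag = 3
n = 300
white = [rng.gauss(0, 1) for _ in range(n + true_lag + 2)]
# A 3-point moving average makes the series smooth, like most real time series.
source = [(white[i] + white[i + 1] + white[i + 2]) / 3 for i in range(n + true_lag)]
x = source[true_lag:]                             # x[t] = source[t + true_lag]
y = [s + rng.gauss(0, 0.1) for s in source[:n]]   # y[t] = x[t - true_lag] + noise

profile = lagged_correlations(x, y, max_lag=8)
best_lag = max(profile, key=profile.get)
for k in sorted(profile):
    print(f"lag {k:+d}: r = {profile[k]:+.2f}")
print("lag with the largest correlation:", best_lag)
```

A peaked profile like this one only says the two series line up best at some offset; it does not by itself say why.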

Best

Rick


Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[From Bill Powers (2009.09.14.1239 MDT)]

Rick Marken (2009.09.14.0900) –

BP: I might go along with that if you could estimate the chances that
raising taxes is likely to make matters worse instead of better. What
were those correlations you were talking about?

RM: I don’t think that’s really necessary. I’m not trying to increase
growth or reduce unemployment by raising taxes.

BP: Whether you’re trying to have those effects or not, they can still
occur.

RM: I am suggesting that we try
raising taxes (progressively) to rates that prevailed prior to 1980 in
order to reduce what has become a chronic deficit.

BP: That may affect the deficit in addition to whatever other effects it
has. But this is a lot like my doctor giving me pills: he tells me the
particular symptom he is giving them for, but the other effects occur
anyway, though he calls them “side” effects. It’s a bit like
going fishing. Fishermen will tell you they’re “trolling for
pike,” but that’s all in their heads – what bites down there under
the water isn’t influenced by what the fisherman hopes to catch.

RM: I think there is some
consensus that you reduce deficits by increasing revenue. So assuming
that spending remains constant or increases the only way to reduce the
deficit is to increase revenue, which means increasing taxes. Economists
have been recommending against this policy because they say increasing
taxes is recessionary. I’m saying that there is no evidence that this is
true (or untrue). So I recommend raising taxes and seeing what
happens.

BP: OK, do it your way, as I’m sure you would have done anyway. I’m going
to put whatever efforts I devote to economics into encouraging the
development of a model that will give a better basis than we have right
now for deciding what governmental actions to take. I don’t think anyone
knows right now what effects changing one variable in the economy
will have on all the others it affects directly or indirectly. It may be
true that taking more of people’s money and giving it to the government
will reduce the deficit. It will also reduce the rate of buying goods,
which will reduce the income of producers and retailers and thus the pay
of their workers, and that will tend to reduce other tax revenues. Which
effect on the deficit will be the larger one is unknown because nobody
can analyze all those simultaneous relationships in his head.

RM: If, in fact, the tax
increase seems to start a slowdown, then it’s pretty easy to go back
(after a long enough trial period – two years, perhaps) to the lower tax
rate. It’s a LOT easier to lower taxes than to raise them.

BP: Two years of going in the wrong direction is a long time. Of course
if you make a lucky guess, you can claim credit for knowing more than
you do, but if you guess wrong a lot of other people will pay the price,
while you find some unanticipated factor to blame. That’s the other side
of the prediction game: what do you lose if you’re wrong? That determines
just how sure you want to be before you act.

RM: I’ve also been thinking
about the fact that our discussion of correlation has been ignoring the
fact that we are correlating variables that vary over time. This violates
some of the basic assumptions about the correlation coefficient that are
made when determining statistical significance (the main assumption being
that the data points are independent, which they clearly are not in a
time series). I presume the violation of these assumptions would also
affect the estimates of prediction odds based on the correlation.
In other words, the correlation between two time series (like annual
tax and growth rate) violates the basic assumptions of whatever
model underlies the statistical use of correlation.

BP: That gives us even less reason to depend on correlations, which I
never have done. The only reason I have ever calculated correlations for
control tasks is that I was talking to psychologists who expected to hear
such things, and who might be impressed by the kinds of correlations we
get using the PCT model to interpret behavior. I suppose you’re right
about the time series – others have said that they make prediction a
workable method of control, though I’ve never seen that demonstrated. For
my part, the RMS prediction error is the only statistical measure I take
seriously, since the predictions are done using a non-statistical
model.

RM: But I do think that the
correlation can be useful in the analysis of time series data. In
particular, I think we can learn something from the “lagged”
correlations where we look at the correlation between one time series and
another at different time lags (and/or leads). The lag at which these
correlations reach maximum (and minimum) seems like a reasonable measure
of something like the phase relationship between these time series.
A relatively flat set of lagged correlations would suggest no
relationship between the time series; a nicely peaked set would suggest
that there is some kind of relationship. This seems like a possibly
useful new way of using correlation to study time series data. Whaddaya
think?

Maybe such analysis methods already exist; I’ll check.

BP: Go ahead; I’ll watch. But do reread my old “Essay on the
Obvious.” Keep in mind that lagged correlations do not rule out a
common cause – the common cause might affect one variable before it
affects the other.
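
The common-cause caution can be illustrated with a small simulation. This is a hedged sketch, not anything from the essay mentioned above: the hidden variable `z`, the delays of 1 and 4 steps, and the noise level are all invented for the example. Here `x` and `y` never influence each other, yet their lagged correlations still show a sharp peak.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def corr_at_lag(x, y, k):
    """Correlation of x[t] with y[t + k], for k >= 0."""
    return pearson(x[:len(x) - k], y[k:])

rng = random.Random(2)
n = 500
delay_x, delay_y = 1, 4   # hypothetical delays from the common cause z

# z is the hidden common cause; it reaches x after 1 step and y after 4 steps.
z = [rng.gauss(0, 1) for _ in range(n + delay_y)]
x = [z[t + delay_y - delay_x] + rng.gauss(0, 0.3) for t in range(n)]
y = [z[t] + rng.gauss(0, 0.3) for t in range(n)]
# There is no term in y that depends on x, and none in x that depends on y.

profile = {k: corr_at_lag(x, y, k) for k in range(0, 9)}
peak_lag = max(profile, key=profile.get)
for k, r in profile.items():
    print(f"lag {k}: r = {r:+.2f}")
print("peak at lag", peak_lag, "= delay_y - delay_x")
```

The peak sits at the difference between the two delays, which looks just like "x drives y with a 3-step lag" even though nothing of the sort was built in.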

Best,

Bill P.

[From Rick Marken (2009.09.14.1450)]

Bill Powers (2009.09.14.1239 MDT) –

Rick Marken (2009.09.14.0900) –

RM: I think there is some
consensus that you reduce deficits by increasing revenue. So assuming
that spending remains constant or increases the only way to reduce the
deficit is to increase revenue, which means increasing taxes. Economists
have been recommending against this policy because they say increasing
taxes is recessionary. I’m saying that there is no evidence that this is
true (or untrue). So I recommend raising taxes and seeing what
happens.

BP: OK, do it your way, as I’m sure you would have done anyway. I’m going
to put whatever efforts I devote to economics into encouraging the
development of a model that will give a better basis than we have right
now for deciding what governmental actions to take.

“My way” simply involves doing something rather than nothing. The fact is that people have problems – some very serious economic problems, like being unemployed or underemployed – that they would like to see solved. I’m just suggesting that we try to solve some of these problems on the basis of data.
My position is like that of Dr. John Snow who created a map showing the locations of cases of cholera in the London epidemic of 1854 (see The Ghost Map: The Story of London’s Most Terrifying Epidemic by Steven Johnson, 2006). Based on this map, which is merely statistical data, Snow inferred that the source of the cholera was a water pump. This was just a guess, based on statistical data, and it went against all the prevailing “knowledge” about cholera, which assumed that it was airborne (miasma theory). The situation is similar to that with economists who “know” that taxes are recessionary. So it was hard to get the city leaders to implement a policy of shutting down the implicated pump. Doing so would have been a big inconvenience but nothing else was working and the epidemic was continuing. So the leaders finally agreed to shut down the pump and the epidemic ended.

The leaders implemented this policy with no knowledge of what cholera was or of the mechanism of its transmission. Such knowledge would come many years later, once the modelers got things figured out. But I consider Dr. Snow to be a hero; he had pretty good data, obtained at the risk of his life, by the way, and he used it to argue forcefully for a policy which seemed pretty reasonable based on the data. It also was a policy where the potential benefits (saving lives) exceeded the costs (inconvenience of getting water from another pump). And the implementation of the policy didn’t stop research into the mechanisms of cholera transmission.

I think we’re in a very similar situation with respect to the economy. The data suggest that increasing taxes (progressively) is likely to have (at worst) no recessionary consequences. The costs of implementing a tax increase (some rich people will have to go without a second Bentley) seem far lower than the potential benefits (reduction of the debt; possible strengthening of the middle class). And it’s not going to inhibit modeling efforts aimed at understanding the economy.

BP: Go ahead; I’ll watch. But do reread my old “Essay on the
Obvious.” Keep in mind that lagged correlations do not rule out a
common cause – the common cause might affect one variable before it
affects the other.

I know. But I’ll read the essay anyway;-)

Best

Rick


[Martin Taylor 2009.09.15.10.25]

[From Rick Marken (2009.09.14.0900)]

I’ve also been thinking about the fact that our discussion of
correlation has been ignoring the fact that we are correlating
variables that vary over time. This violates some of the basic
assumptions about the correlation coefficient that are made when
determining statistical significance (the main assumption being that
the data points are independent, which they clearly are not in a time
series).

This is only partly true. If you sample too closely in time, the
successive data points are indeed correlated, but the correlation
diminishes as the time distance between points increases. In the
idealized case of a time series with a rectangular spectrum of
bandwidth W, data points taken every 1/(2W) seconds are uncorrelated
(that’s another way of talking about the Nyquist limit). Realistic
waveforms do not have a rectangular spectrum, but you can find in the
appropriate engineering books techniques for computing an equivalent
rectangular bandwidth. More reasonably, one can compute the
autocorrelation function (the correlation of data point n with data
point n+x as a function of x), as in this graph for a tracking run.
There are three autocorrelations here for the pairwise differences
among three waveforms: the target, the model, and the human’s track.
Samples were taken every 1/60 second. As you can see, samples taken
within about 1/2 second of each other are correlated, but samples taken
further apart than that have a negligible correlation. So you will not
violate the sampling independence assumption when you correlate one of
these waveforms with anything else if you take your data points no
closer together than 1/2 second.
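
The autocorrelation check described above can be sketched in plain Python. This is only an illustration under assumptions: the waveform below is a made-up stand-in (white noise smoothed over 30 samples at 60 samples per second), not the tracking data from the graph, and the 0.1 cutoff for "negligible" is an arbitrary choice.

```python
import random

def autocorrelation_function(series, max_lag):
    """Return [r(0), r(1), ..., r(max_lag)], the correlation of point n with point n + k."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    total = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / total
            for k in range(max_lag + 1)]

# Hypothetical waveform: 100 seconds sampled at 60 Hz, smoothed over 0.5 s.
rng = random.Random(3)
sample_rate = 60
window = 30
n = 6000
white = [rng.gauss(0, 1) for _ in range(n + window - 1)]
series = [sum(white[i:i + window]) / window for i in range(n)]

max_lag = 60
acf = autocorrelation_function(series, max_lag)

# Smallest spacing (in samples) at which the autocorrelation is negligible.
threshold = 0.1   # assumed cutoff, chosen only for the example
spacing = next((k for k in range(1, max_lag + 1) if abs(acf[k]) < threshold), max_lag)

# Keep only points at least `spacing` samples apart before using formulas
# that assume independent samples.
thinned = series[::spacing]

print(f"r at 1 sample apart:  {acf[1]:+.2f}")
print(f"r at 1 second apart:  {acf[sample_rate]:+.2f}")
print(f"spacing for |r| < {threshold}: {spacing} samples = {spacing / sample_rate:.2f} s")
print(f"points kept: {len(thinned)} of {n}")
```

The point of the thinning step is that the number of roughly independent points, not the raw number of samples, is what belongs in any calculation that uses a sample size.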

I presume the violation of these assumptions would also affect
the estimates of prediction odds based on the correlation. In other
words, the correlation between two time series (like annual tax and
growth rate) violates the basic assumptions of whatever model underlies
the statistical use of correlation.

The violation comes not with the calculation of the correlation
coefficient, but with any related calculations that use the number of
samples, such as significance estimates (but since significance
estimates are at best misleading and at worst insupportable, I guess
that doesn’t matter much).

But I do think that the correlation can be useful in the analysis of
time series data. In particular, I think we can learn something from
the “lagged” correlations where we look at the correlation between one
time series and another at different time lags (and/or leads). The lag
at which these correlations reach maximum (and minimum) seems like a
reasonable measure of something like the phase relationship between
these time series. A relatively flat set of lagged correlations would
suggest no relationship between the time series; a nicely peaked set
would suggest that there is some kind of relationship. This seems like
a possibly useful new way of using correlation to study time series
data. Whaddaya think?

Yes.

Maybe such analysis methods already exist; I’ll check.

Since you are good at modelling using Excel, you might like to just try
a Monte Carlo study, by making a waveform X drive another waveform Y
with a defined lag, and adding a variable amount of noise to Y (or adding
a few extra driving waveforms independent of X) and seeing how that
affects the estimate of phase from the plot of lagged correlations.
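
A rough version of that Monte Carlo study might look like the sketch below. It uses plain Python rather than Excel, and every specific in it is an assumption made for illustration: the smoothed-noise waveform, the true lag of 4 steps, the three noise levels, the 40 trials per level, and the use of the peak lag as the phase estimate.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def estimate_lag(x, y, max_lag):
    """Lag k in 0..max_lag that maximizes the correlation of x[t] with y[t + k]."""
    n = len(x)
    return max(range(max_lag + 1), key=lambda k: pearson(x[:n - k], y[k:]))

def one_trial(rng, n, true_lag, noise_sd, max_lag):
    """Make X drive Y with a known lag, add noise to Y, return the estimated lag."""
    white = [rng.gauss(0, 1) for _ in range(n + true_lag + 4)]
    source = [sum(white[i:i + 5]) / 5 for i in range(n + true_lag)]  # smooth waveform
    x = source[true_lag:]                                  # x[t] = source[t + true_lag]
    y = [s + rng.gauss(0, noise_sd) for s in source[:n]]   # y[t] = x[t - true_lag] + noise
    return estimate_lag(x, y, max_lag)

def hit_rate(noise_sd, trials=40, n=200, true_lag=4, max_lag=10, seed=0):
    """Fraction of trials in which the peak of the lagged correlations is at true_lag."""
    rng = random.Random(seed)
    hits = sum(one_trial(rng, n, true_lag, noise_sd, max_lag) == true_lag
               for _ in range(trials))
    return hits / trials

results = {sd: hit_rate(sd) for sd in (0.0, 0.5, 2.0)}
for sd, rate in results.items():
    print(f"noise sd {sd}: peak at the true lag in {rate:.0%} of trials")
```

Extra driving waveforms independent of X could be tried in the same framework by adding further smoothed-noise terms to `y` in `one_trial`.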

Sorry to be so brief – busy on other things.

Martin