Kennaway paper

[From Bruce Abbott (970416.1055 EST)]

I'd be very much interested to hear Rick Marken's impression of Richard
Kennaway's paper, "The physical meaning of the correlation coefficient for
bivariate normal distributions." Rick?

Regards,

Bruce

[From Bruce Abbott (970416.1140 EST)]

Richard Kennaway (970415.1650 BST) --

A few years ago, when the subject of correlations came up here, Bill Powers
posted a long message about the magnitude of correlations you need to
actually be useful for prediction. I was sufficiently interested to sit
down and work out the mathematics behind his verbal description. I was
very impressed. If he knows the mathematics, he has a rare gift for
explaining it in English, and if he doesn't, he has an even rarer
mathematical intuition. Anyway, I wrote up the maths in a note available
at ftp://ftp.sys.uea.ac.uk/pub/kennaway/drafts/correlationinfo.{dvi,tex}.
All the maths in it can be found in standard stats textbooks, but I've
never seen it collected together. It's incomplete, in that I repeat
critical remarks I've seen on CSGNET about 20% correlations being published
as meaningful results, but I'm just taking that on trust without any actual
references. Can anyone suggest any? The library here doesn't get JEAB, so
I can't browse through it, and I've no experience in reading such material.

Richard, JEAB is the _last_ place you would want to look for
statistically-based studies, let alone the sort you are looking for that
employ correlational methods and report generally low correlations. JEAB is
the premier journal in psychology for reporting the results of "n-of-1"
(single-subject) experimental studies -- the same focus on the individual
that is emphasized in PCT.

I have read your paper on correlation and find it impressive as it
demonstrates that you have an excellent grasp of the mathematics involved.
A minor quibble: correlations are neither proportions nor percentages
(unlike proportions, they vary between -1.0 and +1.0) and are never reported
as such, although it is perfectly acceptable for the coefficients of
determination (r-squared) and nondetermination (1 - r-squared) to be
reported in these ways.

As you note, most of what you present is found in standard texts on
statistics; in fact I found little or nothing new, certainly nothing that
would not be familiar to life-science or social-science researchers
acquainted with correlational methods. That said, you have done a nice job
of laying out the basic logic and interpretation of the Pearson r and its
relationship to the proportion of variance accounted for and unaccounted for,
scatter about the "best-fitting" (least squares) straight line, reduction in
the variance of Y when given X, the correct classification of Y given a
knowledge of X, and the reduction in uncertainty X provides about Y. You
have done a fine job on this and I find nothing here to complain about.

One common application of the findings you present is in the evaluation of
the reliability of psychological tests; another is in the assessment of
interobserver agreement. Correlation coefficients below .95 or .90 are
typically judged to indicate poor reliability of the test. A similar
criterion holds for validity, as for example, when using the results of
tests such as the Scholastic Aptitude Test to predict success in college (as
measured by, e.g., grade point average). In the latter case correlations
lower than this yield unacceptable numbers of errors in the decision to
admit students based on the prediction that they will succeed or not succeed.

As you note, Bill P.'s intuition about the ability (or inability) to predict
_individual_ performance based on Pearson r (that is, on linear regression)
is correct: correlations must be very high before reasonably good prediction
of individual cases becomes possible. So far, so good.

But then you go on to interpret your mathematical findings as follows.
Speaking of correlations of, say, 0.50 or under, you state:

Yet because a sufficient quantity of data has been amassed to be statistically
sure that the correlation differs from 0, perhaps even at the 99% confidence
level, such a result can be published even though it is totally meaningless.

The question is, meaningless for what purpose? For predicting a particular
individual's performance (Y) based on that person's value of X, yes,
correlations must be very high indeed to be of any value. But your
conclusion above assumes that those published correlations were published
based on their supposed utility in making such predictions. This is incorrect.

Correlations are most often computed in order to determine whether there is
any linear association between the variables in question. Such associations
may arise through a variety of mechanisms; one rather well-known case in
this forum is through control-system activity when variations in a
disturbance to the CV are negatively correlated with variations in the
action of the system, as in the rubber band demo. Such associations (or the
_lack_ of them where they might be expected) may be important both for
developing and testing specific theories of system organization. Where a
number of variables are involved, all free to vary at least to some extent,
the correlation between any one of them and the response measure may be
poor. Here is an example I worked up involving four predictor variables and
a response variable. Each predictor variable is orthogonal to the others in
the population (though not necessarily in the sample), and each is normally
distributed. The correlations of predictors with the response variable were
as follows:

             P1      P2      P3      P4
    r        0.519   0.544   0.527   0.468
    r-sq     0.269   0.296   0.278   0.219

All of these correlations are within the range of values declared as
"meaningless" in your paper. It may therefore come as something of a
surprise to some readers that the four predictors used together perfectly
predict the value of the response measure. Low correlations are to be
expected when there are numerous uncontrolled variables having an influence
on the response measure and these variables are not highly correlated with
each other. In the life sciences and social sciences, many influential
variables often cannot be experimentally controlled (held constant); in such
cases relatively low correlations will be the rule.
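
Bruce does not say what function generated his numbers, so the following
Python sketch is only one way to reproduce the pattern: it assumes the
response is an equal-weight sum of four independent standard-normal
predictors (the function and weights are my illustration, not Bruce's
actual simulation).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Four mutually independent ("orthogonal") standard-normal predictors.
P = rng.standard_normal((n, 4))
# Assumed response: an equal-weight sum of the four predictors.
Y = P.sum(axis=1)

# Each predictor alone correlates with Y at only about sqrt(1/4) = 0.5,
# squarely in the "low" range under discussion.
rs = [np.corrcoef(P[:, i], Y)[0, 1] for i in range(4)]
for i, r in enumerate(rs, start=1):
    print(f"P{i}: r = {r:.3f}  r-sq = {r * r:.3f}")

# All four together predict Y exactly: a least-squares fit gives R-sq = 1.
coef = np.linalg.lstsq(P, Y, rcond=None)[0]
r_sq_multiple = 1 - (Y - P @ coef).var() / Y.var()
print(f"multiple R-sq = {r_sq_multiple:.6f}")
```

With this construction each individual r lands near 0.5, close to the
values in the table, while the four predictors jointly determine Y exactly.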

Research aimed at identifying variables having an influence on the response
measure (or, indeed, mutual influences) can be likened to a signal detection
task in which variation in the predictor variable is analogous to the signal
and variation in the response variable is analogous to the signal plus noise
(if the signal has a relationship with the response variable) or noise alone
(if the signal has no such relationship). Viewed in this way, r-sq/(1 -
r-sq) is analogous to a signal-to-noise ratio, in which the "noise" is the
(unmeasured) contribution of other variables to variation in the response
measure. When S/N is low, the signal is difficult to detect and larger
samples are required in order to discriminate the signal from the noise.
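
The analogy can be made concrete with a small sketch (Python; the function
is just the r-sq/(1 - r-sq) expression above):

```python
def snr(r):
    # Bruce's "signal-to-noise" analogy: variance in the response accounted
    # for by the predictor, divided by the variance left to other
    # (unmeasured) variables.
    return r * r / (1 - r * r)

# r = 0.5 gives S/N of only 1/3; r = 0.866 gives about 3; r = 0.95 about 9.
for r in (0.5, 0.866, 0.95):
    print(f"r = {r}: S/N = {snr(r):.2f}")
```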

To summarize, correlations that individually are "useless" for prediction
may sometimes be combined so as to provide useful predictions (so long as
the correlations are real and not sampling artifacts). In addition such
correlations may prove meaningful within the context of building and testing
a theoretical model of the system under consideration. To conclude that
published results are "meaningless" simply because the reported correlations
are relatively low, one must assume that such correlations can only be used
_individually_ to make point predictions of Y based on a single X, and that
they were gathered and published for this specific purpose. Generally
speaking these assumptions are false.

Regards,

Bruce

[From Bill Powers (970417.0420 MST)]

[From Bruce Abbott (970416.1140 EST)]

Richard Kennaway (970415.1650 BST) --

Richard can speak for himself, but I have a couple of comments:

Correlations are most often computed in order to determine whether there
is any linear association between the variables in question.

I think that adding the modifier "linear" is incorrect unless the
correlation is very high. As Richard shows, a correlation of 0.866 is
required to get one bit of information about Y from knowing X -- meaning
that you can't distinguish between linear and nonlinear relationships if the
correlation is less than that. I suspect that even to distinguish between a
first and a second power relationship would require correlations MUCH higher
than 0.866.

Such associations
may arise through a variety of mechanisms; one rather well-known case in
this forum is through control-system activity when variations in a
disturbance to the CV are negatively correlated with variations in the
action of the system, as in the rubber band demo. Such associations (or
the _lack_ of them where they might be expected) may be important both for
developing and testing specific theories of system organization.

In this case the correlation between disturbance and action has to be very
high, _in addition_ to the correlation with the controlled variable being
very low. The Test does not consist ONLY of looking for a lack of effect.
You must also demonstrate that there is an output from the system that is
opposing the effect of the disturbance. Without that high correlation, there
could be many alternatives to control as an explanation for why the
disturbance fails to affect the proposed controlled variable.

Here is an example I worked up involving four predictor variables and
a response variable. Each predictor variable is orthogonal to the others
in the population (though not necessarily in the sample), and each is
normally distributed. The correlations of predictors with the response
variable were as follows:

            P1      P2      P3      P4
   r        0.519   0.544   0.527   0.468
   r-sq     0.269   0.296   0.278   0.219

All of these correlations are within the range of values declared as
"meaningless" in your paper. It may therefore come as something of a
surprise to some readers that the four predictors used together perfectly
predict the value of the response measure.

Obviously the response variable was some regular function of the values of
P1 through P4, and there must have been another function constraining the
four values so they were related to each other. Correlations involving
_regular_ functions are meaningless, as I understand it: the regular
function is all you need. If you have Y = sin(X), where X = k*t, and sample
the values at random times (or regular times incommensurate with k/(2*pi)),
you'll end up with a correlation of zero, even though X perfectly predicts Y
once you know the function relating them.
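
Bill's sin(X) claim is easy to check numerically. The sketch below (Python)
stands in for "random times" by sampling X uniformly over many full periods;
the period count and sample size are arbitrary choices of mine.

```python
import math
import random

random.seed(42)
# "Random times": X sampled uniformly over 100 full periods of sin.
xs = [random.uniform(0.0, 200 * math.pi) for _ in range(100_000)]
ys = [math.sin(x) for x in xs]

# Pearson r computed from its definition.
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
r = cov / (sx * sy)
print(f"r between X and sin(X): {r:.4f}")
```

The printed r comes out near zero even though X determines Y exactly.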

Low correlations are to be
expected when there are numerous uncontrolled variables having an
influence on the response measure and these variables are not highly
correlated with each other.

Low correlations are also to be expected even when there are no uncontrolled
variables, if you are using the wrong model to represent the relationship.
In the case of Y = sin(X), simply to compute a Pearson's r is to assume a
linear relationship between X and Y. In this case, it's using the wrong
model that leads to the low correlation.

In the life sciences and social sciences, many influential
variables often cannot be experimentally controlled (held constant); in
such cases relatively low correlations will be the rule.

That is the usual assumption, I agree. But if the influential variables were
known, they could be measured and accounted for. The problem here is
assuming that the reason for the poor predictions is that there are
_unknown_ variables affecting the result. This is tantamount to claiming
that there is nothing wrong with the theory under which the predictions are
being made, and that if these unknown variables could only be held constant,
the prediction would be perfect. That does not follow at all. This is like
saying that the reason for a failure of reinforcement theory to predict the
application of operant conditioning in the classroom is an unknown history
of reinforcement on the part of each student. That would be the correct
explanation if it were known that the theory is correct. However, it is also
possible that the theory is incorrect, which would lead to the same result.

To summarize, correlations that individually are "useless" for prediction
may sometimes be combined so as to provide useful predictions (so long as
the correlations are real and not sampling artifacts). In addition such
correlations may prove meaningful within the context of building and
testing a theoretical model of the system under consideration.

I thought this would be your point: "A whole lot of bad experiments can be
equal to one good experiment." I don't believe it.

Best,

Bill P.

[From Bruce Abbott (970417.1340 EST)]

Bill Powers (970417.0420 MST) --

Bruce Abbott (970416.1140 EST)

Correlations are most often computed in order to determine whether there
is any linear association between the variables in question.

I think that adding the modifier "linear" is incorrect unless the
correlation is very high. As Richard shows, a correlation of 0.866 is
required to get one bit of information about Y from knowing X -- meaning
that you can't distinguish between linear and nonlinear relationships if the
correlation is less than that. I suspect that even to distinguish between a
first and a second power relationship would require correlations MUCH higher
than 0.866.

Excuse me, but Pearson r is _defined_ as a number indicating the degree and
direction (+ or -) of _linear_ association. It is a number that indexes how
well the points _fit_ a straight-line function (by a least-squares
criterion); this is referred to as the degree of linear association.

Any function can be decomposed into linear, quadratic, cubic, etc.
components, just as any waveform can be decomposed into a set of sine waves
differing in frequency and amplitude. If the "true" function underlying the
data were, for example, a positive exponential viewed over some short range,
it might not be possible to distinguish this upward-curving function from a
linear function with positive slope; in fact the straight line might provide
a good approximation to the exponential over this range. Pearson r would
index how well the linear approximation fit the data. Linear models
generally are so powerful precisely because linear fits often yield good
approximations over the range of data under consideration.
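
Bruce's short-range point can be checked directly: over x in [0, 1], exp(x)
is close enough to a straight line that Pearson r barely registers the
curvature (a sketch; the range and grid spacing are my choices):

```python
import math

# exp(x) sampled on an evenly spaced grid over the short range [0, 1].
xs = [i / 100 for i in range(101)]
ys = [math.exp(x) for x in xs]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
r = cov / (sx * sy)
print(f"r between x and exp(x) on [0, 1]: {r:.4f}")
```

The r here exceeds 0.99, so over this window the straight line is indeed
a good approximation to the exponential.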

Such associations
may arise through a variety of mechanisms; one rather well-known case in
this forum is through control-system activity when variations in a
disturbance to the CV are negatively correlated with variations in the
action of the system, as in the rubber band demo. Such associations (or
the _lack_ of them where they might be expected) may be important both for
developing and testing specific theories of system organization.

In this case the correlation between disturbance and action has to be very
high, _in addition_ to the correlation with the controlled variable being
very low. The Test does not consist ONLY of looking for a lack of effect.
You must also demonstrate that there is an output from the system that is
opposing the effect of the disturbance. Without that high correlation, there
could be many alternatives to control as an explanation for why the
disturbance fails to affect the proposed controlled variable.

A somewhat lower correlation may indicate that (a) there are disturbances
acting on the CV which you are not able to measure, (b) the reference level
of your control system is varying during the test (and this information is
not available for inclusion in your simulation), (c) the environmental
feedback function is varying over the course of observation and, again, you
are not able to measure this variation, (d) your model is wrong, (e) other.
Measuring these other sources of variation would of course be the best
solution to the problem, although this is not always possible given
available resources. The failure to observe the expected high correlations
is itself an important fact, though, which would then justify further
investigation. Thus, even low correlations may have scientific utility.

Here is an example I worked up involving four predictor variables and
a response variable. Each predictor variable is orthogonal to the others
in the population (though not necessarily in the sample), and each is
normally distributed. The correlations of predictors with the response
variable were as follows:

            P1      P2      P3      P4
   r        0.519   0.544   0.527   0.468
   r-sq     0.269   0.296   0.278   0.219

All of these correlations are within the range of values declared as
"meaningless" in your paper. It may therefore come as something of a
surprise to some readers that the four predictors used together perfectly
predict the value of the response measure.

Obviously the response variable was some regular function of the values of
P1 through P4, and there must have been another function constraining the
four values so they were related to each other. Correlations involving
_regular_ functions are meaningless, as I understand it: the regular
function is all you need.

I am not quite sure what you are talking about here. Yes, the response
variable was some regular function of the values of P1 through P4. No,
there was no other function constraining the four values so that they were
related to each other. As noted in my post, the variables P1 through P4
were unrelated in the population, although spurious near-zero correlations
do appear in the statistical sample. Here are the observed correlations
between the predictors:

            P1      P2      P3
    P2      0.060
    P3      0.009   0.057
    P4     -0.031   0.007   0.019

Your last sentence has me baffled. The object of observing the correlations
in the data is to account for the variation in the response variable. You
don't know what "regular function," if any, may be at work. The regular
function is all you need? The regular function is what you seek to discover!

Low correlations are to be
expected when there are numerous uncontrolled variables having an
influence on the response measure and these variables are not highly
correlated with each other.

Low correlations are also to be expected even when there are no uncontrolled
variables, if you are using the wrong model to represent the relationship.
In the case of Y = sin(X), simply to compute a Pearson's r is to assume a
linear relationship between X and Y. In this case, it's using the wrong
model that leads to the low correlation.

Yes, of course. That is where modeling comes in, and analytic skill (e.g.,
I can show you a nice correlation between Y and sin(X) over the X range 90
to 270 degrees). But whether correlations are low because there are
influential variables not being measured or because the wrong model (linear)
is being applied to the data, those correlations are still scientifically
informative and useful. Furthermore, correlations should never be
interpreted without a look at the scatterplot. If a strong _non_linear
relationship is present, that will be revealed by the plot.
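
Bruce does not give the figure, but on the reading that he means the
Pearson r between X and sin(X) with X confined to the 90-270 degree range
(where sin is monotone), a quick sketch puts it near -0.99. That reading
of "Y" is my assumption, not Bruce's statement.

```python
import math
import random

random.seed(7)
# Assumption: "Y" is X itself, so we compute the Pearson r between X and
# sin(X), with X drawn uniformly over 90-270 degrees (pi/2 to 3*pi/2).
xs = [random.uniform(math.pi / 2, 3 * math.pi / 2) for _ in range(100_000)]
ys = [math.sin(x) for x in xs]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
r = cov / (sx * sy)
print(f"r between X and sin(X) over 90-270 degrees: {r:.3f}")
```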

In the life sciences and social sciences, many influential
variables often cannot be experimentally controlled (held constant); in
such cases relatively low correlations will be the rule.

That is the usual assumption, I agree. But if the influential variables were
known, they could be measured and accounted for.

Yes, and the search for them is what generates those tables of correlation
coefficients, most of which are low because the X-variable is not an
influential variable in the setting being investigated. Having obtained low
correlations, one understands that the current model or variable is not
doing a good job, and launches further investigations to rectify the problem.

The problem here is
assuming that the reason for the poor predictions is that there are
_unknown_ variables affecting the result. This is tantamount to claiming
that there is nothing wrong with the theory under which the predictions are
being made, and that if these unknown variables could only be held constant,
the prediction would be perfect.

The problem you note is true of any scientific investigation: the failure to
predict accurately indicates either that the model is incomplete or that it
is incorrect, or both. My statement is _not_ tantamount to claiming that
"there is nothing wrong with the theory." All I have claimed is that the
failure to identify all the influential variables is _one_ reason why one
might obtain low correlations. Your conclusion would be true only if I had
claimed that this is the _only_ reason why one might obtain low correlations.

To summarize, correlations that individually are "useless" for prediction
may sometimes be combined so as to provide useful predictions (so long as
the correlations are real and not sampling artifacts). In addition such
correlations may prove meaningful within the context of building and
testing a theoretical model of the system under consideration.

I thought this would be your point: "A whole lot of bad experiments can be
equal to one good experiment." I don't believe it.

I don't believe it either.

That is the second time in this post you put words in my mouth that I
neither expressed nor implied. I suppose it's good _polemics_ on your part,
but it certainly does not serve to advance _anyone's_ understanding. In
your "restatement" of my summary, you change "low correlations" into "bad
experiments," and paraphrase me as saying that bad experiments can be
combined to yield one good experiment. This is nonsense. I didn't say it,
I didn't imply it.

What I _did_ state is that several variables having low correlations with a
response measure sometimes can be combined to yield a function that predicts
well. As this was _demonstrated empirically_ in the simulation I reported
on in my post, the truth of this statement is beyond question. For this
reason it is simply _wrong_ to assume merely on the basis of low correlation
that a given relationship between two variables is "useless."

I wonder if Richard Kennaway will favor us with a response to my critique of
his paper. Richard, you're not back in "lurk" mode, are you?

Regards,

Bruce

[From Bill Powers (970417.1329 MST)]

From Bruce Abbott (970417.1340 EST)--

I think that adding the modifier "linear" is incorrect unless the
correlation is very high. As Richard shows, a correlation of 0.866 is
required to get one bit of information about Y from knowing X -- meaning
that you can't distinguish between linear and nonlinear relationships if
the correlation is less than that. I suspect that even to distinguish
between a first and a second power relationship would require
correlations MUCH higher than 0.866.

Excuse me, but Pearson r is _defined_ as a number indicating the degree
and direction (+ or -) of _linear_ association. It is a number that
indexes how well the points _fit_ a straight-line function (by a
least-squares criterion); this is referred to as the degree of linear
association.

In the section on "Mutual information", Kennaway shows that to predict Y
from X to one part in N (i.e., to distinguish N different values of the
variable), the correlation c must be

c := sqrt(1 - 1/N^2).

To distinguish 2 points (the minimum required to establish a regression
line) the correlation must be 0.866 (for p < 0.05, I presume). But this does
not allow you to assign values on any finer scale. You can say only that if
X is known to be positive, Y will be SOME positive number -- but you can't
say WHAT positive number. That's what "1 bit of information" means. For 3
points you must have c = 0.94 -- this would enable you to say that if X is
negative, Y is negative, if X is 0, Y is zero, and if X is positive, Y is
positive. To distinguish 5 levels of Y (two negative, two positive, and 0),
you need c = 0.98.
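
Tabulating the quoted formula reproduces the 0.866, 0.94, and 0.98 figures
above (a Python sketch):

```python
import math

def required_r(n_levels):
    # Kennaway's mutual-information result as quoted above:
    # c = sqrt(1 - 1/N^2), the correlation needed to resolve
    # n_levels distinct values of Y from a knowledge of X.
    return math.sqrt(1 - 1 / n_levels ** 2)

# N = 2, 3, 5 give c = 0.866, 0.943, 0.980 respectively.
for n in (2, 3, 5):
    print(f"N = {n}: c = {required_r(n):.3f}")
```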

This says to me that drawing a regression line through a scatter plot
representing c = 0.866 is highly misleading; it implies a linear
relationship _between_ the extremes that is unjustified. It takes three
points to establish a quadratic relationship, so you'd need c = 0.94 to get
a _minimal_ indication that Y = X^2 fits the data (you can perform any
transformation you like before applying the "linear" correlation method).

Any function can be decomposed into linear, quadratic, cubic, etc.
components, just as any waveform can be decomposed into a set of sine
waves differing in frequency and amplitude. If the "true" function
underlying the data were, for example, a positive exponential viewed over
some short range, it might not be possible to distinguish this
upward-curving function from a linear function with positive slope; in
fact the straight line might provide a good approximation to the
exponential over this range. Pearson r would index how well the linear
approximation fit the data. Linear models generally are so powerful
precisely because linear fits often yield good approximations over the
range of data under consideration.

Linear models are "powerful" only if you have some independent way of
knowing that the relationship should be linear. When you're presented with a
data set and have no knowledge of what the relationship is, the linear model
(where I come from) is considered the _weakest_ assumption, and it's
generally used only because of a lack of justification for anything more
complicated. And also, no less important, because there aren't very many
nonlinear equations that we can solve!

To distinguish an exponential from a straight line, you need at least two
slopes; i.e., three data points, which implies c = 0.94, at least. And that
would be quite insufficient to distinguish the exponential (A*exp(kx)) from
a quadratic (Ax^2 + Bx).

You can't get away from the fact that with a correlation of only 0.866, you
could not distinguish a straight line from an exponential relationship by
ANY means. But I should let Richard have his say on that.
......................

A somewhat lower correlation may indicate that (a) there are disturbances
acting on the CV which you are not able to measure, (b) the reference
level of your control system is varying during the test (and this
information is not available for inclusion in your simulation), (c) the
environmental feedback function is varying over the course of observation
and, again, you are not able to measure this variation, (d) your model is
wrong, (e) other.

Measuring these other sources of variation would of course be the best
solution to the problem, although this is not always possible given
available resources. The failure to observe the expected high
correlations is itself an important fact, though, which would then justify
further investigation. Thus, even low correlations may have scientific
utility.

You miss the point. Even to see that there are significant deviations from
theory, you must have very good data; the basic correlations must be in the
high 0.9s. To distinguish among different explanations for the deviations,
you must have better data still, or a better model. This means that the
major part of your labor has to go into getting very good data and improving
the model, not into devising ever more "powerful" ways of extracting
information from noisy data.
............................

Here is an example I worked up involving four predictor variables and
a response variable.

...
I am not quite sure what you are talking about here. Yes, the response

variable was some regular function of the values of P1 through P4. No,
there was no other function constraining the four values so that they were
related to each other. As noted in my post, the variables P1 through P4
were unrelated in the population, although spurious near-zero correlations
do appear in the statistical sample. Here are the observed correlations
between the predictors:

           P1      P2      P3
   P2      0.060
   P3      0.009   0.057
   P4     -0.031   0.007   0.019

Your last sentence has me baffled. The object of observing the correlations
in the data is to account for the variation in the response variable. You
don't know what "regular function," if any, may be at work. The regular
function is all you need? The regular function is what you seek to discover!

Bafflement all around, here. It might help if you told us what the trick is!
...........................

Low correlations are to be
expected when there are numerous uncontrolled variables having an
influence on the response measure and these variables are not highly
correlated with each other.

Low correlations are also to be expected even when there are no
uncontrolled variables, if you are using the wrong model to represent the
relationship. In the case of Y = sin(X), simply to compute a Pearson's r
is to assume a linear relationship between X and Y. In this case, it's
using the wrong model that leads to the low correlation.

Yes, of course. That is where modeling comes in, and analytic skill
(e.g., I can show you a nice correlation between Y and sin(X) over the X
range 90 to 270 degrees).

Well, not THAT nice! What is it, by the way? Not that it would mean anything
-- the distribution is not normal.

But whether correlations are low because there are
influential variables not being measured or because the wrong model
(linear) is being applied to the data, those correlations are still
scientifically informative and useful. Furthermore, correlations should
never be interpreted without a look at the scatterplot. If a strong
_non_linear relationship is present, that will be revealed by the plot.

Not with a correlation of only 0.866. Unless you mean a VERY strong
nonlinearity or an obviously non-normal distribution.

In the life sciences and social sciences, many influential
variables often cannot be experimentally controlled (held constant); in
such cases relatively low correlations will be the rule.

That is the usual assumption, I agree. But if the influential variables
were known, they could be measured and accounted for.

Yes, and the search for them is what generates those tables of correlation
coefficients, most of which are low because the X-variable is not an
influential variable in the setting being investigated. Having obtained
low correlations, one understands that the current model or variable is
not doing a good job, and launches further investigations to rectify the
problem.

I think we agree on that. The point of disagreement between us is what
constitutes "rectifying" the problem.
....

The problem you note is true of any scientific investigation: the failure
to predict accurately indicates either that the model is incomplete or
that it is incorrect, or both. My statement is _not_ tantamount to
claiming that "there is nothing wrong with the theory."

I'm not saying that YOU would do this. But it is commonly done, simply by
failing to mention the alternative to "uncontrolled variables" as an
explanation of the scatter. How many papers have you seen in which the
author says that the low correlation can be explained either by uncontrolled
variables, or by the possibility that stimuli do not cause responses?

I thought this would be your point: "A whole lot of bad experiments can
be equal to one good experiment." I don't believe it.

I don't believe it either.

That is the second time in this post you put words in my mouth that I
neither expressed nor implied. I suppose it's good _polemics_ on your
part, but it certainly does not serve to advance _anyone's_ understanding.

I was thinking back to your remarks about bad experiments in EAB, in which
you said quite clearly that even though each experiment might have left much
to be desired, when many experiments are done, all indicating about the same
results, the net effect can be better than the results of any one
experiment. The problem is in assuming that all these experiments yield
about the same results, when in fact each one could be subject to many
different interpretations, because of the incomplete or wrong observations.
If everyone doing these experiments agrees on an interpretation within the
range covered by the experiments, this merely shows a desire to reach
agreement; it by no means shows that the data especially support that
particular interpretation.

In your "restatement" of my summary, you change "low correlations" into
"bad experiments,"

I think that low correlations are generated by bad experiments, so in my
view I didn't change anything. I don't expect people to suddenly start
getting high correlations by improving their bad experiments; I just wish
they would stop PUBLISHING before they have refined their experiments.

What I _did_ state is that several variables having low correlations with
a response measure sometimes can be combined to yield a function that
predicts well. As this was _demonstrated empirically_ in the simulation I
reported on in my post, the truth of this statement is beyond question.

I'd still like to see what's behind those numbers. Is this a common
situation? If not, how do you tell when you're dealing with an example of it?

For this
reason it is simply _wrong_ to assume merely on the basis of low
correlation that a given relationship between two variables is "useless."

I agree that it's wrong to assume that the correlation couldn't be improved
by a better understanding of the underlying processes. But the fact is, a
correlation of 0.866 permits you only to predict the _sign_ of an effect,
which is useless if you need to predict more than that.

Best,

Bill P.

[From Rick Marken (970417.2200 PDT)]

I have not yet read Richard Kennaway's paper but from what I can
glean about it from the exchange between Abbott and Powers,
it seems to be a _masterpiece_.

I found this exchange to be particularly illuminating.

Bruce Abbott (970417.1340 EST) --

Measuring these other sources of variation would of course be the
best solution to the problem, although this is not always possible
given available resources. The failure to observe the expected
high correlations is itself an important fact, though, which
would then justify further investigation. Thus, even low
correlations may have scientific utility.

Bill Powers (970417.1329 MST) --

Even to see that there are significant deviations from
theory, you must have very good data; the basic correlations
must be in the high 0.9s. To distinguish among different
explanations for the deviations, you must have better data
still, or a better model. This means that the major part of
your labor has to go into getting very good data and improving
the model, not into devising ever more "powerful" ways of
extracting information from noisy data.

I think Bill makes an _extremely_ important point here. When I was
a conventional psychologist I dealt with noisy data by "devising ever
more 'powerful' ways of extracting information from" it. I can
understand the attraction; you get to use all kinds of cool statistical
techniques, you feel like you are doing some very brilliant analyses
and, most important, you don't have to go
through the labor of collecting more data using different
procedures and techniques, with no guarantee that you will ever
get high quality results.

When I became a PCT psychologist I started doing research in a new way.
If the data from a PCT experiment were noisy, I didn't try to make sense
of them with powerful statistical techniques. Instead, I would try to
change the experiment so that I would get good data
or I would try to change the model to get a better fit. It turns out
that this is the way "real" scientists go about doing their research.
But I didn't start doing research this way because I wanted to
imitate real scientists. I did it because the PCT model works so well.

The results of a well conceived PCT experiment are clear and precise and
perfectly consistent with the PCT model. So when the results of
a PCT experiment are noisy, one's inclination is not to blame extraneous
variables or "the inherent variability of behavior" or whatever (as in
conventional psychology); that is, one's
inclination is not to blame the phenomenon. Rather, one's
inclination is to blame oneself.

If I am not able to design an experiment so that it produces clear
and consistent results, I give up on the experiment until a cleverer
person comes along and does it right. If I were still a conventional
psychologist, instead of giving up on an experiment that produced noisy
data, I would do some fancy statistics on the data and publish the
results. Of course, if I were a conventional psychologist I probably
wouldn't find much of value in Richard Kennaway's paper either.

By the way, a good example of what I think is the right way to go about
doing behavioral research is described in Dick Robertson's recently
posted "Testing the self as a control system". Dick and
his colleagues kept changing their test procedure until they got fairly
clear, noise free evidence of control of a rather complex
perceptual variable (which can be called "self image"). Nice work, Dick.

Best

Rick

[From Bruce Abbott (970418.1100 EST)]

Rick Marken (970417.2200 PDT) --

I found this exchange to be particularly illuminating.

Bruce Abbott (970417.1340 EST) --

Measuring these other sources of variation would of course be the
best solution to the problem, although this is not always possible
given available resources. The failure to observe the expected
high correlations is itself an important fact, though, which
would then justify further investigation. Thus, even low
correlations may have scientific utility.

Bill Powers (970417.1329 MST) --

Even to see that there are significant deviations from
theory, you must have very good data; the basic correlations
must be in the high 0.9s. To distinguish among different
explanations for the deviations, you must have better data
still, or a better model. This means that the major part of
your labor has to go into getting very good data and improving
the model, not into devising ever more "powerful" ways of
extracting information from noisy data.

I think Bill makes an _extremely_ important point here.

I agree with Bill's point, as I stated before. And I make an extremely
important point here as well, although you have chosen to ignore it. Both
Bill's position and mine are correct, but refer to different points in an
investigation. The correlational approach is useful during the early stages
of an investigation, when one is attempting to identify what variables are
involved and how they interact. At this stage, even relatively low
correlations can be informative. Later, when you have created a model or
models suitable for testing, very high correlations are desirable as
indicating a good "fit" of model to data, and to discriminate alternative
models.

When I was
a conventional psychologist I dealt with noisy data by "devising ever
more 'powerful' ways of extracting information from" it. I can
understand the attraction; you get to use all kinds of cool statistical
techniques, you feel like you are doing some very brilliant analyses
and, most important, you don't have to go
through the labor of collecting more data using different
procedures and techniques, with no guarantee that you will ever
get high quality results.

When you were a conventional psychologist you must have been enthralled with
statistics. I don't view this sort of activity as productive: if your data
are messy, you can't save the study with "ever more powerful ways of
extracting information from it." This is _not at all_ the approach I am
talking about. What I am talking about is that a set of relatively low
correlations among a number of measured variables can turn out to be useful
for a number of scientific purposes. As I've already gone over this in
another post, I won't repeat it here. But please, if you are going to
offer "replies" to my posts, let's get on the same frequency. To pretend
that I am talking about one thing when I am discussing something else
entirely, and to offer criticism on that basis, contributes nothing to
understanding.

When I became a PCT psychologist I started doing research in a new way.
If the data from a PCT experiment were noisy, I didn't try to make sense
of them with powerful statistical techniques. Instead, I would try to
change the experiment so that I would get good data
or I would try to change the model to get a better fit. It turns out
that this is the way "real" scientists go about doing their research.

What a relief: this is the way _I_ was taught to do scientific research.
Try reading Murray Sidman's (1960) _Tactics of Scientific Research_ for an
excellent presentation of this method.

The results of a well conceived PCT experiment are clear and precise and
perfectly consistent with the PCT model. So when the results of
a PCT experiment are noisy, one's inclination is not to blame extraneous
variables or "the inherent variability of behavior" or whatever (as in
conventional psychology); that is, one's
inclination is not to blame the phenomenon. Rather, one's
inclination is to blame oneself.

When the results of a PCT experiment are noisy, there is no use "blaming"
anyone. One looks for an explanation for the noisy data. This explanation
must _necessarily_ involve either extraneous variables or problems with the
structure of the model being fit to the data. It is not to be found in the
"self."

If I am not able to design an experiment so that it produces clear
and consistent results, I give up on the experiment until a cleverer
person comes along and does it right.

Well, that explains why PCT has not advanced beyond simple tracking
experiments in over 25 years. You leave all the really hard work to others
-- and there ain't no others.

If I were still a conventional
psychologist, instead of giving up on an experiment that produced noisy
data, I would do some fancy statistics on the data and publish the
results.

Of course, if I were a conventional psychologist I probably
wouldn't find much of value in Richard Kennaway's paper either.

My, how you do go on about a paper you haven't even read. (I don't suppose
we can rely on you for an _objective_ evaluation, can we?) But more to the
point: I have already stated (twice now, I believe) that Kennaway's paper is
right on the mark as far as it goes. No statistically competent
"conventional psychologist" would find anything in it objectionable, insofar
as its conclusions about the ability to predict the Y-value of a specific
point from its X-value are concerned. You imply (again, without having even
seen the paper!) that the paper has something important -- and disturbing --
to say about the _general_ usefulness of moderate correlations (disturbing,
that is, to conventional psychologists). I don't know how you can reach
that conclusion without having read the paper. If you still hold that view
_after_ reading Kennaway's paper, well, we'll cross that bridge when we come
to it.

Regards,

Bruce

[Martin Taylor 970419 13:50]

Bill Powers (970417.1329 MST)

To distinguish 2 points (the minimum required to establish a regression
line) the correlation must be 0.866 (for p < 0.05, I presume). But this does
not allow you to assign values on any finer scale. You can say only that if
X is known to be positive, Y will be SOME positive number -- but you can't
say WHAT positive number. That's what "1 bit of Information" means.

No, it doesn't. It means that the variance of the estimate has been cut to
1/4 of its original value (standard deviation halved; uncertainty reduced
by -log2(0.5) = 1 bit). Only if the values are limited to a discrete set
of distinct values (e.g., the integers) could you say what you said.
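Martin's figures can be checked numerically: for a bivariate normal pair with r = 0.866, r^2 = 0.75, so the residual variance is 1/4 of the original and the standard deviation of the estimate is halved. A minimal simulation sketch (Python; the sample size and seed are arbitrary, and the regression uses the known population slope rather than an estimated one):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
r = 0.866  # the correlation figure under discussion

# Construct a bivariate normal pair (X, Y) with population correlation r.
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)

# Residuals of the regression of Y on X (population slope = r here,
# since both variables have unit variance).
resid = y - r * x
ratio = resid.std() / y.std()  # SD of estimate relative to unconditional SD

print(ratio)            # ~0.5: the standard deviation of the estimate is halved
print(-np.log2(ratio))  # ~1 bit of information about Y gained from knowing X
```

That is, knowing X narrows the distribution of Y by a factor of two, which is Martin's "1 bit" in the variance-reduction sense rather than the discrete sign-only sense.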

For the rest, I think Bruce Abbott gave a sufficient response.

But I think a little expansion on his four-variable demonstration might be
in order. In a deterministic universe, if we knew all the influences that
affect a particular variable, and knew the values of all of them, we would
be able to predict the value of the variable exactly. Omit any of the
influences, and the prediction becomes uncertain--for replicated values
of the rest of the influences, the value of the variable changes because
of differences in the value of the unmeasured influence. In Bruce's micro-
universe, the variable was equally influenced by four independent variables.
If one knew all the values of those variables and the nature of the
influence, one would be able to predict the value of the variable with
no error.

Now suppose you were an experimenter, looking to see what was affecting this
variable that interested you, and you guessed that Bruce's influence P1
might be doing so. You might vary P1 and see what effect it had on the
variable V. You get a correlation with r^2 about 1/4. If you say "That's
useless. I'll drop consideration of P1 because the effect is too uncertain
to mean anything" you lose your chance of ever finding out how V is caused.
But if you keep V and P1, and, having now guessed that P2 also has an
influence, look at what more you can account for, you find another lowish
correlation: P2 has an r^2 of about 1/4 with V, but of about 1/3 with
the variation of V after the effect of P1 _on each individual trial_ has
been partialled out. And so on with P3, which has an r^2 of about 0.5
after the "useless" effects of P1 and P2 have been applied to the
individual trials, and finally P4 has an r^2 of 1.0--but _only_ if you
take account of the effects that you had deemed to have uselessly low
correlations. If you don't take account of those "trivial" influences,
P4 also seems to have a trivial effect, with r^2 = 1/4.
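A minimal simulation of this micro-universe reproduces those numbers (Python sketch; the equal-weight sum of four standard normal influences is my reading of Bruce's setup, and the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# Four equal, independent influences on V, as in the micro-universe described.
p1, p2, p3, p4 = rng.standard_normal((4, n))
v = p1 + p2 + p3 + p4

def r2(a, b):
    """Squared correlation between two samples."""
    return np.corrcoef(a, b)[0, 1] ** 2

print(r2(p1, v))        # ~1/4: "uselessly" low on its own
resid = v - p1          # partial out P1's effect trial by trial
print(r2(p2, resid))    # ~1/3
resid = resid - p2
print(r2(p3, resid))    # ~1/2
resid = resid - p3
print(r2(p4, resid))    # ~1.0 once the "trivial" influences are removed
```

Each influence alone accounts for only a quarter of the variance of V, yet removing the earlier influences trial by trial raises the apparent strength of each later one, until P4 predicts the remaining variation perfectly.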

What's useless depends on what you want to use it for.

Martin

For this
reason it is simply _wrong_ to assume merely on the basis of low
correlation that a given relationship between two variables is "useless."

I agree that it's wrong to assume that the correlation couldn't be improved
by a better understanding of the underlying processes. But the fact is, a
correlation of 0.866 permits you only to predict the _sign_ of an effect,
which is useless if you need to predict more than that.

No. It enables you to predict the _value_ of the correlated variable twice
as accurately as you could otherwise have done.

That's what's meant by the difference between _some_ information and none.
And between some information and complete information, which would allow
you to predict the value precisely.

Martin

[From Bill Powers (2007.07.22.1235 MDT)]

Here is the Kennaway paper on predicting individual data from group data.

Bill P.


StatPredictJRK.pdf (250 KB)

[From David Goldstein]

Bill,

Do you know if this paper by Kennaway was actually published?
I imagine that it would have made a big splash.

The psychological tests with the highest test-retest reliability are the
IQ tests by David Wechsler. I recall that their reliabilities are about .95.

From Kennaway's paper, using this test to predict what an individual who
takes it at time 1 will do at time 2 will not be very accurate. Using
Table 3, the probability that the time-2 score will fall within plus or
minus half a decile is .47. On the IQ test, a standard deviation is 15 IQ
points. If plus or minus 2 standard deviations on each side of the mean of
100 covers most of the range, that is 60 IQ points, so a decile would be
6 IQ points and half a decile would be 3 IQ points. So the probability
that the person's IQ would fall within plus or minus 3 IQ points is .47.
This is about what the manual says.

The maximum possible correlation between two tests is equal to the square
root of the product of their respective test-retest reliabilities. So we
would need a test at least as reliable as the IQ test to get any
meaningful results for an individual.

David


-----Original Message-----
From: Control Systems Group Network (CSGnet)
[mailto:CSGNET@LISTSERV.UIUC.EDU] On Behalf Of Bill Powers
Sent: Sunday, July 22, 2007 2:36 PM
To: CSGNET@LISTSERV.UIUC.EDU
Subject: Kennaway paper

[From Bill Powers (2007.07.22.1235 MDT)]

Here is the Kennaway paper on predicting individual data from group
data.

Bill P.
