Correlations again -- request for referees

[From Richard Kennaway (970923.1700 BST)]

I'm on the point of submitting my paper on correlations to "Science". They
require me to suggest 4 to 6 people outside my own institution who would be
competent to referee it. Is there anyone here who would like to volunteer,
or who can suggest other people? Pillars of academic respectability in the
field of statistics would be particularly useful (if that's consistent with
being favourable to the message of the paper). I need to supply Science
with their full contact details and a description of their fields of
interest.

The editors also want me to give them the names of colleagues who have
reviewed the paper. Is there anyone who has read it and would be willing
to have their name listed?

The current version is at
http://www.sys.uea.ac.uk/~jrk/distribution/correlationinfo.ps

-- Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
   School of Information Systems, Univ. of East Anglia, Norwich, U.K.

[From Bill Powers (970924.1526 MDT)]

Richard Kennaway (970923.1700 BST)--

>I'm on the point of submitting my paper on correlations to "Science". They
>require me to suggest 4 to 6 people outside my own institution who would be
>competent to referee it. Is there anyone here who would like to volunteer,
>or who can suggest other people? Pillars of academic respectability in the
>field of statistics would be particularly useful (if that's consistent with
>being favourable to the message of the paper).

I suggest Philip J. Runkel (runk@oregon.Uoregon.edu).

>The editors also want me to give them the names of colleagues who have
>reviewed the paper. Is there anyone who has read it and would be willing
>to have their name listed?

Me.

Best,

Bill P.
P.S., you're on the list.

[From Mike Acree (970925.1037 PDT)]

Richard Kennaway (970923.1700 BST)

I just read your paper for the first time last night, and would be happy
to recommend it for publication, if my credentials as Senior
Statistician at the Center for AIDS Prevention Studies at the University
of California, San Francisco, carried any weight.

Although the paper doesn't really break any new theoretical ground, and
its practical message is not without precedent, it does provide a very
nice and compelling demonstration of the silliness of most research
practice in the social sciences, and such demonstrations are still badly
needed. The basic point I think you are targeting is that statistics in
general are relevant or useful only where our interest lies in random
mass phenomena, in aggregates rather than individuals. That _may_ be
the case with something like large-scale personnel decisions such as
college admissions--seen from the point of view of the college rather
than the individual applicants. But even there I think most
institutions now at least want to avoid the appearance of relying on
simple mechanical formulas. Social scientists, and psychologists
especially (their statistical training being notoriously weak compared
with, say, economists'), rely very heavily on statistics in theoretical
work, but that mostly gives away what a weak notion they have of theory
(as Bill has observed). Most psychological variables seem to correlate
with most others in the range of .2-.4, and it is hard to draw useful
conclusions even about aggregates or patterns with such data. Michael
Oakes' finding may also be relevant here, that academic psychologists
typically think a scatterplot with a .8 correlation represents a
correlation of about .5; i.e. they seriously overestimate the strength
of association represented by correlations.
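
A quick sketch (mine, not from your paper) makes the arithmetic
concrete: a correlation in the .2-.4 range leaves the conditional
spread of one variable barely narrower than its unconditional spread.
Assuming standard bivariate normal data, in Python:

    import numpy as np

    rng = np.random.default_rng(0)

    for r in (0.2, 0.3, 0.4, 0.8):
        # draw pairs with population correlation r
        cov = [[1.0, r], [r, 1.0]]
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=10_000).T
        # r^2 is the fraction of variance in y "explained" by x;
        # sqrt(1 - r^2) is the residual sd of y given x, in units of
        # the unconditional sd of y
        print(f"r = {r:.1f}: explains {100*r*r:3.0f}% of variance; "
              f"residual sd = {np.sqrt(1 - r*r):.2f}  "
              f"(sample r = {np.corrcoef(x, y)[0, 1]:+.2f})")

At r = .3, knowing x shrinks the uncertainty about y by less than 5%.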

Psychologists have never been fond of confidence intervals, the main
reason surely being that they don't preserve the illusion of a
definitive answer to the question in the way that the yes-no outcome of
a significance test does; but a secondary reason is probably the
awareness that confidence intervals based on typical psychology-size
samples, if you compute them, turn out to be wide enough to drive a
truck through--which makes it all too plain how little information there
is in the data. That's especially true of confidence intervals for
individual points based on a regression line, and especially there for
values of X far from the mean. Power analysis, thanks largely to Jacob
Cohen, has increasingly focused our attention on how large samples must
be for confidence limits to be serviceably narrow; what you have done
is essentially to accept the sample sizes in common use and ask how
large the _effect_ must be to achieve similarly narrow and useful
limits.
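
To put numbers on "wide enough to drive a truck through": a minimal
sketch (mine, using the standard Fisher z-transform, not any formula
from the paper) of the 95% confidence interval around a sample
correlation of .30 at typical psychology sample sizes:

    import math

    def corr_ci(r, n, z_crit=1.96):
        """Approximate 95% CI for a correlation, via Fisher's z."""
        z = math.atanh(r)              # Fisher transform of sample r
        se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
        return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

    for n in (20, 50, 100):
        lo, hi = corr_ci(0.30, n)
        print(f"n = {n:3d}, r = .30:  95% CI = ({lo:+.2f}, {hi:+.2f})")

At n = 20 the interval runs from about -.17 to +.66: the data cannot
even settle the sign of the association.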

Though I think you do enough with your examination of correlations,
similar arguments would apply to virtually all other statistical
techniques. I especially appreciated your remarks about multiple
regression, which in one form or another has become the staple in the
field. It is totally commonplace for investigators (and readers) to
interpret regression coefficients as though they represented the
bivariate relationship of a given predictor to the dependent variable,
and to overlook their dependency on whatever other variables happened to
be included in the equation (the model rarely being selected with any
compelling theoretical rationale).
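
The dependency is easy to exhibit. A minimal sketch (my own toy
construction, not an analysis from the paper): y is driven by x1
alone, and x2 is merely correlated with x1; the coefficient x2
receives depends entirely on whether x1 is in the model.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5_000
    x1 = rng.standard_normal(n)
    x2 = 0.7 * x1 + 0.714 * rng.standard_normal(n)  # corr(x1, x2) ~ .7
    y = x1 + rng.standard_normal(n)                 # y depends on x1 only

    def coefs(*cols):
        # least-squares fit of y on the given columns (all zero-mean,
        # so no intercept is needed)
        X = np.column_stack(cols)
        return np.linalg.lstsq(X, y, rcond=None)[0]

    print("y ~ x2 alone:  b2 = %+.2f" % coefs(x2)[0])       # ~ +0.70
    print("y ~ x1 + x2:   b2 = %+.2f" % coefs(x1, x2)[1])   # ~  0.00

Read bivariately, x2 looks like a solid predictor; with x1 included,
its coefficient collapses to zero.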

The least compelling part of the article for me was that on screening
tests, where you were computing the correlation such that an individual
had "a 95% chance of receiving a prediction that has a 95% chance of
being correct." The extreme magnitudes involved in that exercise are
piquant; but the double quantification is demanding and hard to keep
track of--it reminds me of Oakes' wondering: if he calculated a 95%
confidence interval but knew he had a tendency toward overconfidence,
how confident was he _really_?
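
For what it's worth, even a single layer of quantification already
forces the extreme magnitudes. A sketch of a simpler cousin of your
calculation (mine, not the paper's): for a standard bivariate normal,
the classical orthant probability gives P(sign of Y correctly
predicted from sign of X) = 1/2 + arcsin(r)/pi, so the correlation
needed for a given accuracy is r = sin(pi*(accuracy - 1/2)):

    import math

    for accuracy in (0.75, 0.90, 0.95, 0.99):
        # invert P(correct) = 1/2 + arcsin(r)/pi for r
        r = math.sin(math.pi * (accuracy - 0.5))
        print(f"{accuracy:.0%} correct sign predictions need r = {r:.3f}")

Getting the mere sign of Y right 95% of the time already demands
r = .988.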

The point about the irrelevance of statistics for individual prediction
was made very nicely by Keynes, who observed in his dissertation on
probability (published in 1921) that, although the Post Office keeps
statistics on letters mailed without stamps, those statistics had but
the slightest bearing on the question of whether _he_ would post a
letter without a stamp. You make, in a sentence or two, the point that
we still act _as if_ statistics were useful for individual prediction;
though the focus of the article is more on the mathematical
demonstration, I would think this fundamental observation would bear
strengthening.

I found only two typos, which I assume you have caught: the
nonsuperscripted 2s in the formula for the ellipse in Appendices A.4 and
A.6. I make that prediction, of course, not on the basis of any
statistics about authors catching their own typographical errors, but on
my impression of you, from the rest of the manuscript, as a thoroughly
fastidious individual :-).

Best wishes with the publication process.

Michael Acree
UCSF Center for AIDS Prevention Studies
74 New Montgomery Street, Suite 600
San Francisco, CA 94105-3444
(415) 597-9148
Fax: (415) 597-9213