A TABLE SHOWING THE RELATIONSHIP BETWEEN
SEVERAL DESCRIPTIVE STATISTICS*
____________________________________________
   r       r2      k2      k         E
____________________________________________
 1.00    1.00     .00     .00     100 %
  .9995   .999    .001    .032     97 %
  .9987   .997    .003    .054     95 %
  .995    .99     .01     .099     90 %
  .954    .91     .09     .299     70 %
  .90     .81     .19     .435     56 %
  .87     .756    .244    .493     51 %
  .865    .748    .252    .50      50 %
  .80     .64     .36     .60      40 %
  .75     .56     .44     .66      34 %
  .71     .50     .50     .70      30 %
  .65     .42     .58     .76      24 %
  .60     .36     .64     .80      20 %
  .55     .30     .70     .83      17 %
  .50     .25     .75     .87      13 %
  .45     .20     .80     .89      11 %
  .40     .16     .84     .92       8 %
  .35     .12     .88     .94       6 %
  .30     .09     .91     .95       5 %
  .25     .06     .94     .97       3 %
  .20     .04     .96     .98       2 %
  .15     .02     .98     .99       1 %
  .10     .01     .99     .995      0 %
  .00     .00    1.00    1.00       0 %
____________________________________________
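Every column of the table follows from r alone: r2 is r
squared, k2 is 1 - r2, k is the square root of k2, and E
is (1 - k)100. The short Python sketch below recomputes a
few rows as a check. The function name stats_from_r and
the choice of rows are illustrative assumptions, not part
of the original compilation, and the printed values agree
with the table only to within its rounding.

```python
import math

def stats_from_r(r):
    """Return (r2, k2, k, E) for a correlation r, using the table's definitions."""
    r2 = r * r                 # coefficient of determination
    k2 = 1.0 - r2              # coefficient of nondetermination
    k = math.sqrt(k2)          # coefficient of alienation
    e = (1.0 - k) * 100.0      # index of forecasting efficiency, in percent
    return r2, k2, k, e

# Recompute a handful of rows (hypothetical selection) for comparison.
for r in (1.00, 0.90, 0.80, 0.71, 0.50, 0.30, 0.00):
    r2, k2, k, e = stats_from_r(r)
    print(f"{r:5.2f} {r2:6.3f} {k2:6.3f} {k:6.3f} {e:5.0f} %")
```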
DEFINITION AND INTERPRETATION OF THESE STATISTICS**
All of these measures describe two variables (X, Y)
within a particular sample:
r is a correlation (or coefficient of
correlation) which describes the linear association of
one variable with another. It can also be
characterized as "... a relative measure of the degree
of association between two series" of values for two
variables. It ranges from -1 (perfect negative
correlation) through 0 (no linear association) to 1
(perfect positive correlation). The closer this measure
is to 1 or -1, the more confidence one has in
"predicting" the values of one variable from the other.
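As an illustration, r can be computed from a sample with
the usual deviation-score formula: the sum of the
products xy divided by the square root of (sum of x
squared times sum of y squared), where x and y are
deviations from their means. The following is a minimal
Python sketch; the function name pearson_r and the sample
values are hypothetical and are not taken from the
sources cited here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviation scores: each value minus the mean of its variable.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    # Undefined (division by zero) if either variable is constant.
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical sample values, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(round(pearson_r(xs, ys), 3))
```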
r2 (r squared) is a measure of "explained" variance (or
coefficient of determination) which describes "shared"
variation: the proportion of the variance in one
variable that is "explained" by the other variable, or
the proportion of the sum of y squared (the sum of
squared deviations of Y from its mean) that is
dependent on the regression of Y on X. The larger the
numerical value of this measure, the more confidence
one has in "predicting" the values of one variable from
the other.
k2 (k squared) is a measure of "unexplained" variance
(or coefficient of nondetermination), equal to 1 - r2,
which describes "unshared" variation: the proportion of
the variance in one variable that is NOT "explained" by
the other variable, or the proportion of the sum of y
squared that is independent of the regression of Y on
X. The smaller the numerical value of this measure, the
more confidence one has in "predicting" the values of
one variable from the other.
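The "explained" and "unexplained" proportions can be
checked directly by splitting the sum of squared
deviations of Y into the part that lies along the
regression of Y on X and the part left over. The Python
sketch below does this for a least-squares line; the
function name variance_shares and the sample values are
hypothetical, introduced only for illustration.

```python
def variance_shares(xs, ys):
    """Return (r2, k2): explained and unexplained shares of the
    sum of squared deviations of Y, for the regression of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)

    # Least-squares regression of Y on X.
    slope = sum_xy / sum_x2
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]

    # Part of the variation that follows the regression line, and the rest.
    explained = sum((p - mean_y) ** 2 for p in predicted)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return explained / sum_y2, unexplained / sum_y2

# Hypothetical sample values, chosen only for illustration.
r2, k2 = variance_shares([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r2, 2), round(k2, 2), round(r2 + k2, 2))
```

For a least-squares line the two shares always sum to 1,
which is why the r2 and k2 columns of the table are
complements of each other.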
k is a measure (called the coefficient of
alienation), equal to the square root of k2, which
describes the lack of linear association of one
variable with another, or the ratio of the standard
error of estimate to the standard deviation of the
predicted variable (Y). The smaller the numerical
value of this measure, the more confidence one has in
"predicting" the values of one variable from the other.
E is computed as (1 - k)100 and is called an "index
of forecasting efficiency" (Downie and Heath, 1965:
226). It indicates the "improvement" in a prediction
gained by knowing the coefficient of correlation (r)
for two variables, as contrasted with knowing nothing
about the linear association of the two variables. For
example, with a coefficient of correlation of .71 one
can "predict" the values of one variable from another
30% better (on the average) than one could "predict"
those values WITHOUT any knowledge of the relationship
between the two variables. Equivalently, one has
decreased the size of the "error of prediction" by 30%
(on the average) by knowing that the correlation of the
two variables is .71.
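The "error of prediction" reading of E can be sketched
with data. Without knowledge of X, the best guess for
every Y is the mean of Y, and the typical size of the
error is the standard deviation of Y. With the regression
of Y on X, the typical error is the standard error of
estimate. Their ratio is k, so the percentage reduction
in error is (1 - k)100. The Python sketch below assumes
both quantities are computed with n as the divisor (a
descriptive rather than inferential convention); the
function name and the sample values are hypothetical.

```python
import math

def error_reduction_percent(xs, ys):
    """Percent reduction in root-mean-square prediction error for Y when
    using the regression of Y on X instead of the mean of Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    # Error without knowledge of X: always guess the mean of Y.
    # Its root-mean-square size is the standard deviation of Y.
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)

    # Error with knowledge of X: the standard error of estimate.
    # Assumption: divisor n (not n - 2), so that the ratio equals k exactly.
    see = math.sqrt(sum((y - (intercept + slope * x)) ** 2
                        for x, y in zip(xs, ys)) / n)

    k = see / sd_y                 # coefficient of alienation
    return (1.0 - k) * 100.0       # index of forecasting efficiency

# Hypothetical sample values; their correlation is .80.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(round(error_reduction_percent(xs, ys), 1))
```

For these sample values the correlation is .80, so the
result should be about 40, in line with the .80 row of
the table.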
REFERENCES
Arkin, Herbert and Raymond R. Colton. 1956.
Statistical Methods. College Outline Series, Fourth
Edition, Revised.
Downie, N. M. and R. W. Heath. 1965. Basic Statistical
Methods. Second Edition. New York: Harper and Row.
______________________________
*Compiled by Charles W. Tucker with the encouragement and
assistance of the Control Systems Group CSG-L @ UIUCVMD
(especially Gary Cziko) and the comments of Jimy Sanders.
Other comments appreciated - N050024 AT UNIVSCVM.BITNET
**It should be noted that these descriptions and
interpretations, especially those involving
"predictions", are limited to a particular sample. If
another sample is not a random sample from the same
population, then predictions about the other variable
("Y") in that sample will be worse, by an unpredictable
amount, than in the original sample.