Statistics referred to in Tom's post of 921022.13:45

CHARLES_W_TUCKER5 · October 26, 1992, 1:48pm

A TABLE SHOWING THE RELATIONSHIP BETWEEN
SEVERAL DESCRIPTIVE STATISTICS*

···

____________________________________________
   r        r^2      k^2       k      E
____________________________________________

  1.00      1.00      .00      .00   100 %
   .9995     .999     .001     .032   97 %
   .9987     .997     .003     .054   95 %
   .995      .99      .01      .099   90 %
   .954      .91      .09      .299   70 %
   .90       .81      .19      .435   56 %
   .87       .756     .244     .493   51 %
   .865      .748     .252     .50    50 %
   .80       .64      .36      .60    40 %
   .75       .56      .44      .66    34 %
   .71       .50      .50      .70    30 %
   .65       .42      .58      .76    24 %
   .60       .36      .64      .80    20 %
   .55       .30      .70      .83    17 %
   .50       .25      .75      .87    13 %
   .45       .20      .80      .89    11 %
   .40       .16      .84      .92     8 %
   .35       .12      .88      .94     6 %
   .30       .09      .91      .95     5 %
   .25       .06      .94      .97     3 %
   .20       .04      .96      .98     2 %
   .15       .02      .98      .99     1 %
   .10       .01      .99      .995    0 %
   .00       .00     1.00     1.00     0 %

DEFINITION AND INTERPRETATION OF THESE STATISTICS**

All of these measures describe two variables (X, Y)
within a particular sample:

          r is a correlation (or coefficient of
     correlation) which describes the linear association of
     one variable with another. It can also be
     characterized as "... a relative measure of the degree
     of association between two series" of values for two
     variables. It ranges from -1 (perfect negative
     correlation) to 1 (perfect positive correlation).
     The closer this measure is to a perfect correlation
     the more confidence one has in "predicting" the values
     of one variable from another variable.
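
     As an illustration only, r can be computed for a small
     sample with the usual product-moment formula. The sketch
     below is in Python; the function name pearson_r and the
     five data pairs are invented for this example and are not
     taken from the sources cited here.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired values (X, Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each value from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("r is undefined when either series is constant")
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical sample, invented for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)
print(f"r = {r:.3f}")
```

     For these invented values r comes out near .77, which
     falls between the .75 and .80 rows of the table.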

          r^2 is a measure of "explained" variance (or
     coefficient of determination) which describes "shared"
     variation: the proportion of the variance in one
     variable that is "explained" by the other variable, or
     the proportion of the sum of y^2 (the squared
     deviations of Y from its mean) that is dependent on
     the regression of Y on X. The larger the numerical
     value of this measure, the more confidence one has in
     "predicting" the values of one variable from another.

          k^2 is a measure of "unexplained" variance (or
     coefficient of nondetermination) which describes
     "unshared" variation: the proportion of the variance
     in one variable that is NOT "explained" by the other
     variable, or the proportion of the sum of y^2 (the
     squared deviations of Y from its mean) that is
     independent of the regression of Y on X. It equals
     1 - r^2. The smaller the numerical value of this
     measure, the more confidence one has in "predicting"
     the values of one variable from another.

          k is a measure (called the coefficient of
     alienation) which describes the lack of linear
     association of one variable with another; it is the
     ratio of the standard error of estimate to the
     standard deviation of the variable being predicted.
     The smaller the numerical value of this measure, the
     more confidence one has in "predicting" the values of
     one variable from another.

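     The two descriptions of k (the square root of the
     "unexplained" proportion, and the ratio of the standard
     error of estimate to the standard deviation) can be
     checked against each other numerically. The sketch below
     is in Python and rests on stated assumptions: the name
     alienation_ratio and the data are invented, and both the
     standard error of estimate and the standard deviation are
     computed with the divisor n, as purely descriptive
     statistics for one sample.

```python
import math

def alienation_ratio(xs, ys):
    """Return (k, r): k as SEE / SD of Y, plus the correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)

    # Least-squares regression of Y on X.
    slope = sum_xy / sum_x2
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Assumption: both quantities use the divisor n (descriptive use).
    see = math.sqrt(sum(e * e for e in residuals) / n)
    sd_y = math.sqrt(sum_y2 / n)

    r = sum_xy / math.sqrt(sum_x2 * sum_y2)
    return see / sd_y, r

# Hypothetical sample, invented for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
k, r = alienation_ratio(X, Y)
print(f"k from residuals = {k:.4f}")
print(f"sqrt(1 - r^2)    = {math.sqrt(1 - r * r):.4f}")
```

     Under these assumptions the two printed values agree. If
     the standard error of estimate were computed with a
     different divisor (n - 2, for instance), the ratio would
     differ slightly from the square root of 1 - r^2.
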
          E is a measure computed as (1 - k) x 100 and
     called an "index of forecasting efficiency" (Downie
     and Heath, 1965: 226). It indicates the "improvement"
     in a prediction gained by knowing the coefficient of
     correlation (r) for two variables, as contrasted with
     knowing nothing about the linear association of the
     two variables. For example, with a coefficient of
     correlation of .71 one can "predict" the values of one
     variable from another 30% better (on the average) than
     one could "predict" those values WITHOUT any knowledge
     of the relationship between the two variables; that
     is, one has decreased the size of the "error of
     prediction" by 30% (on the average) by knowing that
     the correlation of the two variables is .71.
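
     Every column of the table follows from r alone, so a row
     can be regenerated with a few lines of arithmetic. The
     sketch below is in Python; the function name
     descriptive_row is invented for this example. Because it
     works at full precision, while the table appears to round
     or truncate intermediate values, some entries may differ
     from the table in the third decimal place.

```python
import math

def descriptive_row(r):
    """Return (r^2, k^2, k, E) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    r2 = r * r                 # coefficient of determination
    k2 = 1.0 - r2              # coefficient of nondetermination
    k = math.sqrt(k2)          # coefficient of alienation
    e = (1.0 - k) * 100.0      # index of forecasting efficiency, in percent
    return r2, k2, k, e

print("    r    r^2    k^2      k      E")
for r in (0.995, 0.80, 0.71, 0.50, 0.20):
    r2, k2, k, e = descriptive_row(r)
    print(f"{r:5.3f}  {r2:5.3f}  {k2:5.3f}  {k:5.3f}  {e:3.0f} %")
```

     Note how slowly E rises: a correlation of .80 gives an E
     of only 40%, and r must reach roughly .87 before the
     error of prediction is cut in half.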

REFERENCES

     Arkin, Herbert and Raymond R. Colton. 1956.
     Statistical Methods. College Outline Series, Fourth
     Edition, Revised.

     Downie, N. M. and R. W. Heath. 1965. Basic Statistical
     Methods. Second Edition. New York: Harper and Row.
     ______________________________
     *Compiled by Charles W. Tucker with the encouragement and
     assistance of the Control Systems Group CSG-L @ UIUCVMD
     (especially Gary Cziko) and the comments of Jimy Sanders.
     Other comments appreciated - N050024 AT UNIVSCVM.BITNET

     **It should be noted that these descriptions and
     interpretations, especially those involving
     "predictions", are limited to a particular sample; if
     another sample is not a random sample from the same
     population, then predictions about the other variable
     ("Y") in that sample will be unpredictably worse than
     those for the original sample.