Statistics and predictions

Bill_Powers1 · November 6, 2006, 11:44pm

[From Bill Powers (2006.11.06.1610 MST)]

I'm finding it rather hard to communicate what I'm trying to find out about the uses of statistics. From David G's post, here's an example:

David says:

We do need some way of measuring whether or not MOL Therapy results in a change in a person which the person views as positive or desired.

I'm not disagreeing with that. I am simply questioning whether the usual kinds of tests we find in psychology are capable of giving us this information. If they can, then we can use them. If they can't, we will have to devise our own.

I then say:
The question is, what are the chances that the first test will correctly indicate whether the person has a given characteristic as shown by the second test?

And David replies,

If the two tests are correlated, then knowing a person's percentile rank on the first test will predict the person's percentile rank on the second test with an accuracy that is a function of the magnitude of the correlation coefficient.

There is the whole point I'm trying to raise. What is that function? If I know that a test result correlates 0.6 with an outcome measure, just how accurate is the test as a way of predicting outcome? To refresh memories, here is a scatter plot of a correlation of 0.6 between two variables:


-----------------------------------------------------------------------------------------
*
                                      *
                              *
      *
*
                                        *
                                                      *
              *
                                     *
                                                            *
                 *
                       *
                  *
                                                               *
                    *
                                                     *
                          *
                                       *
                                                                     *

*
*
*
Correlation = 0.620
----------------------------------------------------------------------------------------
Clearly, given knowledge of the value on the x-axis, one's ability to predict the value on the y-axis from the regression equation is very, very low. Looking at the whole plot you can see a trend, but that trend does not allow making any predictions of individual points. Even if we divide the data into simply high versus low values, a binary scale, huge errors can result. Trying to do any finer discriminations would make the predictions even worse (see Runkel, Casting Nets and Testing Specimens, on "Fine slicing.")
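
As a rough illustration of the high-versus-low case, here is a minimal Python sketch. It assumes the two measures are bivariate normal with correlation r, which is an assumption made for illustration and not something established in this thread. Under that assumption the chance that a person falls on the same side of the median on both measures is 1/2 + arcsin(r)/pi. The function names `median_split_agreement` and `simulate_agreement` are hypothetical.

```python
import math
import random


def median_split_agreement(r):
    """Chance that x and y land on the same side of their medians.

    Assumes (for illustration only) a bivariate normal distribution
    with correlation r.
    """
    return 0.5 + math.asin(r) / math.pi


def simulate_agreement(r, n=200_000, seed=0):
    """Monte Carlo check of the formula above, under the same assumption."""
    rng = random.Random(seed)
    same_side = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Build y so that corr(x, y) = r and var(y) = 1.
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        if (x > 0) == (y > 0):
            same_side += 1
    return same_side / n


if __name__ == "__main__":
    print(round(median_split_agreement(0.6), 3))
    print(round(simulate_agreement(0.6), 3))
```

For r = 0.6 the formula gives about 0.70, so under these assumptions roughly three people in ten would be placed on the wrong side of the median.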

From other discussions about this, it seems clear to me that there is a lot of confusion between the "nets" side and the "specimens" side. Inevitably, results are said to be about "people" or "subjects", which means that they are about a population (the whole plot), not an individual specimen (one point). This confusion is so deeply ingrained in psychology that it's almost impossible to speak about individuals. Yet a therapist who deals with patients one at a time never faces anyone but an individual, and using tests to learn something about that individual is, it seems to me on the basis of the above diagram, the stuff from which prejudices are made. But I want some numbers. How do we convert a scatter plot like the one above into some sort of statement about the chances that a given prediction from the regression line will be correct, or within some range of the actual value? Am I even asking the right question?
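
One hedged way to put numbers on the "within some range" question is sketched below in Python. It assumes a linear regression with normally distributed residuals of constant spread, which is an assumption made here and may not hold for real test data. Under it, the residual standard deviation is sd_y * sqrt(1 - r^2), and the chance that an individual's actual value lies within a chosen distance of the predicted value follows from the normal distribution. The names `residual_sd` and `prob_within` are hypothetical.

```python
import math


def residual_sd(r, sd_y=1.0):
    """Standard deviation of prediction errors around the regression line.

    Assumes (for illustration only) linear regression with normal,
    constant-variance residuals.
    """
    return sd_y * math.sqrt(1.0 - r * r)


def prob_within(r, half_width, sd_y=1.0):
    """Chance that the actual y is within +/- half_width of the prediction."""
    z = half_width / residual_sd(r, sd_y)
    return math.erf(z / math.sqrt(2.0))


if __name__ == "__main__":
    r = 0.6
    print(residual_sd(r))                  # error spread, in units of sd_y
    print(round(prob_within(r, 0.5), 3))   # within half a standard deviation
```

With r = 0.6 the residual standard deviation is 0.8 of the outcome's own standard deviation, so under these assumptions knowing the test score narrows the uncertainty about an individual's outcome by only about 20 percent.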

Best,

Bill P.

Fred_Nickols1 · November 7, 2006, 12:32am

[From Fred Nickols (2006.11.06.1930 EST)] --

Bill:

I believe you know I was employed at Educational Testing Service (ETS) for about a dozen years (1990-2001). I still have friends there, including a prominent statistician and former VP of research. If you like, I can put you in touch with him and perhaps he can answer your question(s).

Regards,

Fred Nickols
www.nickols.us
nickols@att.net


Bill_Powers1 · November 7, 2006, 2:45pm

[From Bill Powers (2006.11.07.0740 MDT)]

Fred Nickols (2006.11.06.1930 EST) --

>I believe you know I was employed at Educational Testing Service (ETS) for about a dozen years (1990-2001). I still have friends there, including a prominent statistician and former VP of research. If you like, I can put you in touch with him and perhaps he can answer your question(s).

That might be useful -- perhaps you could forward the same post to him and ask if he has any comments.

Do you have any observations of your own to contribute to this subject? Your ideas might be less biased than those of a "prominent statistician."

Best,

Bill P.

Fred_Nickols1 · November 7, 2006, 3:09pm

[From Fred Nickols (2006.11.07.1006 EST)] --

I didn't see this post until after I had sent my previous one, so ignore the P.S. in that post.

Bill Powers <powers_w@FRONTIER.NET>

[From Bill Powers (2006.11.07.0740 MDT)]

Fred Nickols (2006.11.06.1930 EST) --

>I believe you know I was employed at Educational Testing Service
>(ETS) for about a dozen years (1990-2001). I still have friends
>there, including a prominent statistician and former VP of
>research. If you like, I can put you in touch with him and perhaps
>he can answer your question(s).

That might be useful -- perhaps you could forward the same post to
him and ask if he has any comments.

Do you have any observations of your own to contribute to this
subject? Your ideas might be less biased than those of a "prominent
statistician."

I'll check with him. He's leaving ETS, bound for a professorship at Boston College, so no telling what all he's wrapped up in right now.

I read your earlier posts. I'll go back and re-read them to see if I understand them and will clarify with you before positing any of my own ideas. (Actually, I have very few ideas of my own; most are rehashings of other people's ideas. For example, the GAP-ACT model is simply your PCT stuff slightly reconfigured.)

Regards,

Fred Nickols
www.nickols.us
nickols@att.net