[From Bill Powers (2006.11.06.1610 MST)]
I'm finding it rather hard to communicate what I'm trying to find out about the uses of statistics. From David G's post, here's an example:
David says:
We do need some way of measuring whether or not MOL Therapy results in a change in a person which the person views as positive or desired.
I'm not disagreeing with that. I am simply questioning whether the usual kinds of tests we find in psychology are capable of giving us this information. If they can, then we can use them. If they can't, we will have to devise our own.
I then say:
The question is, what are the chances that the first test will correctly indicate whether the person has a given characteristic as shown by the second test?
And David replies,
If the two tests are correlated, then knowing a person's percentile rank on the first test will predict the person's percentile rank on the second test with an accuracy that is a function of the magnitude of the correlation coefficient.
There is the whole point I'm trying to raise. What is that function? If I know that a test result correlates 0.6 with an outcome measure, just how accurate is the test as a way of predicting outcome? To refresh memories, here is a scatter plot of a correlation of 0.6 between two variables:
[Scatter plot of two variables; the point layout did not survive in this copy. Correlation = 0.620]
Clearly, given knowledge of the value on the x-axis, one's ability to predict the value on the y-axis from the regression equation is very, very low. Looking at the whole plot you can see a trend, but that trend does not allow making any predictions of individual points. Even if we simply divide the data into high versus low values, a binary scale, huge errors can result. Trying to make any finer discriminations would make the predictions even worse (see Runkel, Casting Nets and Testing Specimens, on "fine slicing").
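As a rough illustration of the high-versus-low case, here is a minimal sketch. It assumes the two variables are bivariate normal and that "high" and "low" mean above or below each variable's own median; neither assumption comes from the plot itself. Under those assumptions, a standard result (often called Sheppard's formula) gives the chance that a person lands on the same side of the median on both measures. The function name is hypothetical, chosen only for this example.

```python
import math

def median_split_agreement(r: float) -> float:
    """Chance that x and y fall on the same side of their medians.

    Assumes a bivariate normal distribution with correlation r
    (an assumption of this sketch, not something shown by the data).
    """
    return 0.5 + math.asin(r) / math.pi

# Compare a few correlations; r = 0.0 is pure guessing (0.5).
for r in (0.0, 0.3, 0.6, 0.9):
    p = median_split_agreement(r)
    print(f"r = {r:.1f}: same-side probability = {p:.3f}")
```

If the assumptions hold, a correlation of 0.6 puts the high/low call right only about 70% of the time, against 50% for a coin flip, so roughly three people in ten would be placed on the wrong side.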
From other discussions about this, it seems clear to me that there is a lot of confusion between the "nets" side and the "specimens" side. Inevitably, results are said to be about "people" or "subjects", which means that they are about a population (the whole plot), not an individual specimen (one point). This confusion is so deeply ingrained in psychology that it's almost impossible to speak about individuals. Yet a therapist who deals with patients one at a time never faces anyone but an individual, and using tests to learn something about that individual is, it seems to me on the basis of the above diagram, the stuff from which prejudices are made.
But I want some numbers. How do we convert a scatter plot like the one above into some sort of statement about the chances that a given prediction from the regression line will be correct, or within some range of the actual value? Am I even asking the right question?
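One possible way to put numbers on that question is the standard error of estimate from ordinary linear regression. The sketch below assumes both variables are expressed in standard-deviation units and that the scatter around the regression line is normal with the same spread everywhere; those are assumptions of the example, not facts about any particular test. The function names are hypothetical. Under these assumptions the residual spread is sqrt(1 - r^2), and the chance that the actual value lies within a chosen distance of the predicted value follows from the normal distribution.

```python
import math
from statistics import NormalDist

def residual_sd(r: float) -> float:
    """Spread of actual y around the regression line, in SD units of y."""
    return math.sqrt(1.0 - r * r)

def prob_within(r: float, half_width: float) -> float:
    """Chance that actual y is within +/- half_width (SD units) of the prediction.

    Assumes normally distributed residuals with constant spread
    (an assumption of this sketch).
    """
    s = residual_sd(r)
    if s == 0.0:
        return 1.0  # perfect correlation: every prediction is exact
    z = half_width / s
    return 2.0 * NormalDist().cdf(z) - 1.0

# Compare predicting from the test (r = 0.6) with guessing the mean (r = 0).
for r in (0.0, 0.6):
    print(f"r = {r:.1f}: residual SD = {residual_sd(r):.2f}, "
          f"P(within 0.5 SD) = {prob_within(r, 0.5):.3f}")
```

If these assumptions are acceptable, a correlation of 0.6 shrinks the spread of prediction errors only from 1.0 to 0.8 standard deviations, and the chance of landing within half a standard deviation of the true value rises from roughly 38% to roughly 47%.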
Best,
Bill P.