[From Richard Kennaway (20061107.1803 GMT)]
Bill Powers writes:
I feel a great reluctance to use the kinds of tests that psychologists have devised over the years, tests that David Goldstein feels might be useful in evaluating outcomes of therapy. I don't want to be dogmatic about this, so I'm looking (still) for some consensual way to evaluate this approach.
Here is a way of putting the problem I see:
Given: two tests, one measuring performance in some task and the other purporting to show whether a person has certain characteristics or traits. The question is, what are the chances that the first test will correctly indicate whether the person has a given characteristic as shown by the second test?
This implies that there are characteristics (like weight) that a person has or doesn't have regardless of our ability to measure them with the second test. The second test, we can say, produces results that are disturbed by unknown factors which cause the measurements to vary with some standard deviation. We can stipulate that without the disturbances, the test would measure accurately.
We can also stipulate that the first test is similar: if it were not for unknown randomly varying factors, the first test would always yield the same (correct) measure of performance. I know this assumption is unwarranted, but let's make it anyway since it seems to be a popular one.
Under these assumptions, the question now concerns the translation of a correlation between the first test and the second (given knowledge of the standard deviations) into a probability that the person actually has the characteristic in question. I suppose this can be cast in terms of the proportions of false positives or false negatives. I introduce the correlation topic because that is the figure that is usually obtainable from articles in the literature.
I know that you've already analyzed the case for screening people for traits or predicted performance on the basis of tests. I'm sure that paper is on my computer or in an archive somewhere -- I haven't found it yet. But maybe I'm asking a slightly different question now.
Ah yes, my first ever post to CSGNET, these many years ago. I have the current version of the paper online at http://www.cmp.uea.ac.uk/~jrk/distribution/corrinfo.pdf
That paper analyses the case where there are two measurements, X and Y, of different properties of the same thing, with a correlation between them, and asks questions such as: how well can one predict the value of Y from the value of X? Not very well, is the answer, for correlations below about 0.9. At a correlation of 0.8, estimating the sign of Y from the sign of X (assuming the means are normalised to zero) is right about 80% of the time; at a correlation of 0.5, it is right 2/3 of the time.
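Those figures can be checked numerically. For bivariate normal X and Y with zero means, the exact probability that sign(X) and sign(Y) agree is 1/2 + arcsin(r)/pi (a standard orthant-probability result; stating it this way is my addition, not a quote from the paper). A quick Monte Carlo sketch:

```python
# Monte Carlo check of how often sign(X) predicts sign(Y) for a given
# correlation r, assuming X and Y are bivariate normal with zero means.
# The exact value is 1/2 + arcsin(r)/pi (the orthant probability).
import numpy as np

rng = np.random.default_rng(0)

def sign_agreement(r, n=1_000_000):
    """Fraction of samples where sign(X) == sign(Y) under correlation r."""
    cov = [[1.0, r], [r, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return np.mean(np.sign(x) == np.sign(y))

for r in (0.5, 0.8, 0.9):
    exact = 0.5 + np.arcsin(r) / np.pi
    print(f"r={r}: simulated {sign_agreement(r):.3f}, exact {exact:.3f}")
```

At r=0.5 this gives 2/3, and at r=0.8 about 0.80, matching the figures quoted above.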
I think you're asking a more complicated question, where the measurements of X and Y are themselves uncertain indicators of the real X and Y, giving four variables:
MX: measured X
RX: real X
RY: real Y
MY: measured Y
with correlations between each consecutive pair: c(MX,RX), c(RX,RY), and c(RY,MY). Then the question is, given an observed c(MX,MY), and -- obtained somehow -- values for c(MX,RX) and c(MY,RY), what can one say about c(RX,RY)?
In general, c(RX,RY) is going to be greater than c(MX,MY). Informally, if you can see a correlation through the fog of measurement errors, the real correlation is likely to be higher than the one you see. That assumes that the errors in the two measurements are not correlated with each other.
If I assume that MX = RX + EX and MY = RY + EY, where EX and EY are random variables uncorrelated with each other or with RX and RY, then the loss of correlation can be calculated. Take the ratios of standard deviations A = sigma(EX)/sigma(RX) and B = sigma(EY)/sigma(RY). Then:
c(MX,MY) = k c(RX,RY)
where
k = 1/sqrt( (1 + A^2)(1 + B^2) )
I'm assuming these are all continuous variables, but I expect the effect would be similar for binary variables.
This is actually fairly insensitive to errors. If A=B then k = 1/(1 + A^2), which is 0.99 when A = 0.1, 0.8 when A = 0.5.
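The attenuation formula is easy to verify by simulation. Here is a minimal sketch (the numbers r_real = 0.6 and A = B = 0.5 are my own illustrative choices):

```python
# Sketch: verify c(MX,MY) = c(RX,RY) / sqrt((1 + A^2)(1 + B^2)) by
# simulation, where MX = RX + EX, MY = RY + EY and the errors EX, EY
# are independent of everything else.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
r_real = 0.6          # assumed true correlation c(RX, RY)
A, B = 0.5, 0.5       # error-to-signal standard deviation ratios

rx, ry = rng.multivariate_normal([0, 0], [[1, r_real], [r_real, 1]], n).T
mx = rx + A * rng.standard_normal(n)   # add uncorrelated error EX
my = ry + B * rng.standard_normal(n)   # add uncorrelated error EY

observed = np.corrcoef(mx, my)[0, 1]
k = 1.0 / np.sqrt((1 + A**2) * (1 + B**2))
print(f"observed {observed:.3f}, predicted {k * r_real:.3f}")
```

With A = B = 0.5, k = 0.8, so a true correlation of 0.6 shows up as about 0.48.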
If EX and EY are correlated, and we take A=B again, then:
             c(RX,RY) + c(EX,EY) A^2
  c(MX,MY) = ------------------------
                     1 + A^2
which reduces to the previous formula when c(EX,EY) = 0.
If the errors are more highly correlated than RX and RY, then c(MX,MY) will exceed c(RX,RY). For example:
   A    c(EX,EY)   c(RX,RY)   c(MX,MY)
  0.5     1          0.5        0.6
  0.5     0.75       0.5        0.55
  0.5     0.5        0.5        0.5
  0.5     0.25       0.5        0.45
  0.5     0          0.5        0.4
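The table follows directly from the formula, and can be reproduced by simulation as well (a sketch, using the same illustrative A = 0.5 and c(RX,RY) = 0.5):

```python
# Sketch reproducing the table above: correlated measurement errors can
# inflate or deflate the observed correlation, per
#   c(MX,MY) = (c(RX,RY) + c(EX,EY) * A^2) / (1 + A^2)   when A = B.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
A, r_real = 0.5, 0.5

def observed_corr(err_corr):
    """Observed c(MX,MY) when the errors EX, EY have correlation err_corr."""
    rx, ry = rng.multivariate_normal([0, 0], [[1, r_real], [r_real, 1]], n).T
    ex, ey = rng.multivariate_normal([0, 0], [[1, err_corr], [err_corr, 1]], n).T
    mx, my = rx + A * ex, ry + A * ey
    return np.corrcoef(mx, my)[0, 1]

for ce in (1.0, 0.75, 0.5, 0.25, 0.0):
    predicted = (r_real + ce * A**2) / (1 + A**2)
    print(f"c(EX,EY)={ce}: simulated {observed_corr(ce):.2f}, formula {predicted:.2f}")
```

Note that the dependence on c(EX,EY) is linear, with slope A^2/(1 + A^2) = 0.2 here, which is why the right-hand column steps evenly from 0.6 down to 0.4.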
The context of my question is evaluation of therapy outcomes. This gives the question two meanings.
1. If one gives a patient a test before therapy and it indicates that the person has a condition like depression, what are the chances that the person actually has that disorder, under the assumptions above (i.e., there is actually a condition called depression that the person has or doesn't have, and so on)?
2. If a before-and-after test is given, what are the chances that the condition actually changed in the direction implied by the two tests? I suppose that has to be put in the form of the probability that there is a finding of change when in reality there was a change in the opposite direction, or no change.
In this case, X and Y are different values of the same thing at different times, so it is probably unjustified to assume that EX and EY are uncorrelated. If the test before overestimates the person's depression, the test after might very well do the same, because that's what the test will always do on that person.
The worst case is when the measurement error behaves very consistently on any one person (c(EX,EY) is close to 1), the real before and after variables have lower correlation, and the errors are large. When that is so, the correlation between the before and after tests is primarily the correlation between the two errors, and is not measuring the underlying change.
If the standard deviations of the measurement errors were close to zero, these questions would answer themselves: the correct result would be the one we measure. But with correlations less than 1.0, the chances of a correct conclusion shrink. My suspicion is that in the range of correlations usually found in the psychological literature, the favorable answer has less than a 50% probability. For example, suppose the probability is 70 percent that a person who shows as depressed on a performance test is actually depressed, and suppose a test that measures depression by other means is likewise correct 70 percent of the time. In that case, the chance that a person described as depressive by the first test will measure as depressive on the second is 0.7 x 0.7 + 0.3 x 0.3 = 58% -- little better than a coin toss. And, of course, if antidepressants are prescribed on that basis, much of the time the main effects on the patient will be the side-effects.
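The arithmetic for two tests can be sketched as follows (the assumption that the two tests err independently is the key one, and is exactly what the reply below questions):

```python
# Sketch of the two-test arithmetic: if the first test's positive result
# means the person is actually depressed with probability p1, and a second
# test is correct with probability p2 independently of the first test's
# error, how often does the second test agree with the first?
def agreement(p1, p2):
    """P(test 2 says 'depressed' | test 1 said 'depressed'),
    assuming independent errors."""
    # Either the person is depressed (p1) and test 2 is right (p2),
    # or not depressed (1 - p1) and test 2 is wrong (1 - p2).
    return p1 * p2 + (1 - p1) * (1 - p2)

print(agreement(0.7, 0.7))  # 0.58 -- little better than a coin toss
```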
That depends on whether or not the errors in the two tests are correlated. One can get at that by applying both tests to a large number of people and looking at the correlation between the tests.
This sort of thing is done for intelligence tests. There are a large number of tests purporting to measure "intelligence" (whatever that is). One can analyse the correlations between the tests, and derive a theoretical quantity which one can say that they are all more or less imperfect measurements of. Whether one can conclude that this variable is a physically existing thing or merely one of several ways of analysing the data is a matter of dispute.
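As a minimal sketch of that kind of analysis (the correlation matrix here is hypothetical, not real intelligence-test data): the first principal component of the inter-test correlation matrix gives one candidate for the common quantity the tests all imperfectly measure.

```python
# Minimal sketch: extract a single common factor from a correlation
# matrix among several tests, via the first principal component.
import numpy as np

# Hypothetical correlations among three tests -- illustrative numbers only.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.7],
              [0.5, 0.7, 1.0]])

vals, vecs = np.linalg.eigh(R)           # eigenvalues in ascending order
loadings = vecs[:, -1] * np.sqrt(vals[-1])  # loadings on the largest component
print("first-factor loadings:", np.round(np.abs(loadings), 2))
```

Whether the extracted component is a real underlying variable or just one convenient summary of the data is, as said above, a matter of dispute; the algebra is the same either way.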
···
If my suspicions about these matters are correct, there are very serious implications for psychology which I don't need to spell out. I predict enormous resistance even to finding out the truth of this matter, but what would be new about that?
--
Richard Kennaway, jrk@cmp.uea.ac.uk
School of Computing Sciences,
University of East Anglia, Norwich NR4 7TJ, U.K.