how to predict percentile ranks from correlation coefficients

Bill_Powers1 · November 13, 2006, 3:44pm

Hi, all (now including CSGnet) --
[From Bill Powers (2006.11.013.0818 MST)]

In phone conversations, David Goldstein and I, and I'm sure Richard Kennaway though he didn't say so, came to agree that a test used for judging a person's condition can never be more accurate than the method used to validate the test. As an example, a paper-and-pencil test used to assess depression is validated by clinical examinations of the tested subject, so that each person's state of depression is judged by a panel of experts. The paper-and-pencil test cannot diagnose more accurately than the panel of experts can do.

The paper-and-pencil test is used as a substitute for clinical examinations -- to save time, money, or other costs, or simply because the validation stage is not usually available.

I'm still working on just what it is that we want to know about this way of diagnosing disorders. It seems to me that some of the discussions are getting away from this central problem. Here are the main considerations as I see them:

1. The degree X' of the hypothesized condition that is being tested: depression, ADHD, or any so-called "disorder."

2. The translation of this degree of disorder X' into answers on a paper-and-pencil test, which yields score X, with uncertainty Ux relative to X', and standard deviation Sx over many subjects.

3. The validation score Y', with uncertainty Uy' relative to the actual degree of disorder X' and standard deviation Sy' over many subjects.

4. The projected value of the degree of disorder, Y, obtained as a function of the test score X.

5. The cost of an error in diagnosis in terms of degree of error, severity of consequences, and fraction of the tested population affected.

[The symbols are my own selections. I don't know if there are standard conventions for these things -- there probably are]
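
Item 4 above, and the question in the thread title, can be illustrated with a small calculation. The sketch below assumes that the test score X and the validation score are bivariate normal with correlation r; that assumption, the function name, and the 95% coverage figure are illustrative choices, not anything stated in the post.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def predicted_percentile_band(x_percentile, r, coverage=0.95):
    """Return (low, mid, high) percentile ranks on the validation measure
    for a person at `x_percentile` on the test.

    Assumes bivariate normality with correlation r (an assumption of this
    sketch). `x_percentile` must lie strictly between 0 and 100.
    """
    z_x = STD_NORMAL.inv_cdf(x_percentile / 100.0)
    mean_z = r * z_x                     # regression toward the mean
    resid_sd = (1.0 - r * r) ** 0.5      # spread left unexplained by the test
    half_width = STD_NORMAL.inv_cdf(0.5 + coverage / 2.0) * resid_sd

    def to_percentile(z):
        return 100.0 * STD_NORMAL.cdf(z)

    return (to_percentile(mean_z - half_width),
            to_percentile(mean_z),
            to_percentile(mean_z + half_width))


low, mid, high = predicted_percentile_band(90, 0.8)
print(f"predicted percentile {mid:.0f}, 95% band {low:.0f} to {high:.0f}")
```

Under these assumptions, even a correlation of 0.8 leaves a person who scores at the 90th percentile on the test anywhere from roughly the 44th to the 99th percentile on the validation measure.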

The two "uncertainties" pertain to the initial classification of the subject as having some specific disorder. Assuming that the disorder has some objective existence and nature, the instrument used for detecting the disorder will have some inaccuracy in the form of false positives and negatives. A score indicating a degree X of the disorder will therefore represent a range of possible degrees X1' ... X2' of the disorder. Presumably, this range will be smaller for the panel of experts doing clinical evaluations than it will be for a paper-and-pencil test, but the uncertainty will not be zero in either case. I haven't seen this uncertainty taken into account yet, but it would probably show up as a difference between test-retest variability and variability across a population.

The uncertainties create measurement errors which add either directly or indirectly to the size of the standard deviation in the population being measured. All we see, of course, are the standard deviations in the test results and the clinical evaluations. Any systematic misrepresentations of all subjects' actual state (such as diagnosing a condition which never actually exists) cannot, of course, be seen at all. Neither is it possible to tell whether a "disorder" represents a malfunction in the subject, or a completely healthy behavior used to deal with an abnormal environmental condition. For example, an authoritarian doctor who rubs a patient the wrong way may well classify the patient as having a "resistance to authority" disorder; the disorder exists, but not in the patient. It is in the doctor.
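
How measurement error inflates the observed standard deviation can be sketched under the classical assumption that each score is the true degree plus an independent, zero-mean error. In that case the variances add, so the standard deviations combine in quadrature. The function names and the numbers are hypothetical and only illustrate the arithmetic.

```python
import math


def observed_sd(true_sd, error_sd):
    """SD of observed scores when score = true value + independent error."""
    return math.hypot(true_sd, error_sd)  # sqrt(true_sd**2 + error_sd**2)


def reliability(true_sd, error_sd):
    """Fraction of observed variance that reflects real differences."""
    return true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)


def retest_difference_sd(error_sd):
    """SD of (second score - first score) for the same unchanged person.

    The true value cancels, so only the two independent errors remain.
    """
    return math.sqrt(2.0) * error_sd


# Hypothetical numbers: true spread 3 units, measurement error 4 units.
print(observed_sd(3.0, 4.0))   # 5.0
print(reliability(3.0, 4.0))   # 0.36
```

With these made-up values, most of the spread seen across the population is measurement error rather than real differences between people, which is the kind of gap a comparison of test-retest variability with population variability would expose.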

I'm getting away from the point, too. The main point is summarized in 5. above. It is not the possibility of intellectual errors, but the possibility of misdiagnosis and its consequences that concerns me. It's often said that having even a bad way of diagnosing problems is better than having no way. This is not true, however, if one person's gain is many people's loss, or if a benefit of being occasionally right is offset by a large cost of being wrong.
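
The trade-off described here can be made concrete with an expected-value calculation per person tested. This is only a sketch: the function name, the parameters, and every number below are hypothetical, and it assumes benefits and costs can be put on a single scale, which is itself debatable.

```python
def expected_net_benefit(prevalence, sensitivity, specificity,
                         benefit_true_positive, cost_false_positive,
                         cost_false_negative=0.0):
    """Average gain per person tested, in arbitrary benefit units."""
    p_true_positive = prevalence * sensitivity
    p_false_negative = prevalence * (1.0 - sensitivity)
    p_false_positive = (1.0 - prevalence) * (1.0 - specificity)
    return (p_true_positive * benefit_true_positive
            - p_false_positive * cost_false_positive
            - p_false_negative * cost_false_negative)


# Hypothetical case: 10% of those tested have the condition, and the test
# is right 80% of the time in both directions.
harmful = expected_net_benefit(0.10, 0.80, 0.80,
                               benefit_true_positive=1.0,
                               cost_false_positive=1.0)
mild = expected_net_benefit(0.10, 0.80, 0.80,
                            benefit_true_positive=1.0,
                            cost_false_positive=0.1)
print(f"{harmful:+.3f} {mild:+.3f}")  # -0.100 +0.062
```

With equal stakes for a correct and a mistaken treatment, this hypothetical test does net harm, because false positives among the healthy majority outnumber true positives. It pays off only when a mistaken treatment costs much less than a correct one gains.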

So it seems to me that the focus of a discussion about statistical methods of diagnosis ought to be on the likelihood and costs of misdiagnosis of the individual who is in front of us right now seeking help. It is the very same problem we have with the death penalty for murder. Is it ever worth killing an innocent person to create some undetermined amount of deterrent effect on real potential murderers? Is it worth treating ten people with a mind-altering drug that helps them if that means giving the drug to another 10 people who are, or were, perfectly normal? What does "First, do no harm" mean?

Best,

Bill P.

Martin_Taylor2 · November 13, 2006, 6:35pm

[Martin Taylor 2006.11.13.13.26]

Hi, all (now including CSGnet) --
[From Bill Powers (2006.11.013.0818 MST)]

In phone conversations, David Goldstein and I, and I'm sure Richard Kennaway though he didn't say so, came to agree that a test used for judging a person's condition can never be more accurate than the method used to validate the test. As an example, a paper-and-pencil test used to assess depression is validated by clinical examinations of the tested subject, so that each person's state of depression is judged by a panel of experts. The paper-and-pencil test cannot diagnose more accurately than the panel of experts can do.

From this and the rest of your message, it sounds as though you are asking about how to create a perception that best matches "real reality". You've got two means of perception (analogous to seeing and feeling, say), looking at some state that is presumed to exist in a real world.

Does such a state exist? Is it _defined_ by what the panel of experts says? If your eyes tell you something is there but your hand can pass right through where it seems to be, is your hand wrong? Are your eyes?

I'm wondering whether you are asking a valid question. I'd certainly disagree with "that a test used for judging a person's condition can never be more accurate than the method used to validate the test", unless the "method used to validate the test" itself defines the person's condition. If it doesn't define the condition, then there is presumed to be a "real" condition out there to be tested, and there's no reason to believe (a priori) that the "method used to validate the test" is any better at determining the "real" condition than is the new test. It's just older.
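
A small calculation shows how this could come about. Assume, purely for illustration, that the test and the validation method each equal the "real" condition plus independent error. Then the correlation between test and validation is the product of each one's correlation with the real condition. The function name and the numbers are hypothetical, and the validation method's correlation with the real condition is not something that can be observed in practice.

```python
def implied_test_truth_correlation(r_test_validation, r_validation_truth):
    """Correlation of the test with the real condition implied by the
    independent-error model: r_tv = r_test_truth * r_validation_truth."""
    if not 0.0 < r_validation_truth <= 1.0:
        raise ValueError("r_validation_truth must be in (0, 1]")
    r_test_truth = r_test_validation / r_validation_truth
    if abs(r_test_truth) > 1.0:
        raise ValueError("inputs are inconsistent with the assumed model")
    return r_test_truth


# Hypothetical: the test correlates 0.60 with the validation method, and
# the validation method correlates 0.75 with the real condition.
print(round(implied_test_truth_correlation(0.60, 0.75), 3))  # 0.8
```

In this made-up case the test tracks the real condition (0.80) better than the validation method does (0.75), even though the observed validity coefficient is only 0.60.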

Cross-checking (or as my advisor called it, "triangulation") is often a better way of assessing what may be "out there" than is any single way of looking at it. By that, I mean that triangulation is more likely to let you produce reliable results when you act to alter your perception of the "really real" state. If you can see it and feel a shape that matches what you see, that's a lot more reliable than saying "seeing is believing".
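
One standard way to quantify the gain from cross-checking is inverse-variance weighting of two independent, unbiased measurements of the same quantity. The sketch below assumes exactly that model; the function name and the numbers are illustrative only.

```python
def combine_measurements(value_a, var_a, value_b, var_b):
    """Inverse-variance weighted combination of two independent, unbiased
    measurements. Returns (combined_value, combined_error_variance)."""
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    combined_var = 1.0 / (weight_a + weight_b)
    combined_value = combined_var * (weight_a * value_a + weight_b * value_b)
    return combined_value, combined_var


# Hypothetical: one method reads 10 with error variance 1, a second,
# noisier method reads 14 with error variance 3.
value, variance = combine_measurements(10.0, 1.0, 14.0, 3.0)
print(value, variance)  # 11.0 0.75
```

Under these assumptions the combined error variance is never larger than that of the better single method, so even a noisier second view adds information. If the two methods share a bias or their errors are correlated, the gain shrinks or disappears.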

So, statistically, and philosophically, are you asking the right question?

Martin

Bill_Powers1 · November 13, 2006, 9:06pm

[From Bill Powers (2006.11.13.1350 MST)]

Martin Taylor 2006.11.13.13.26 --

I'm wondering whether you are asking a valid question. I'd certainly disagree with "that a test used for judging a person's condition can never be more accurate than the method used to validate the test", unless the "method used to validate the test" itself defines the person's condition. If it doesn't define the condition, then there is presumed to be a "real" condition out there to be tested, and there's no reason to believe (a priori) that the "method used to validate the test" is any better at determining the "real" condition than is the new test. It's just older.

Hmm. Is that right? I guess my assumption is that the validation method is based on a more thorough "specimens" method that deals with each individual, whereas the test being validated is more indirect and is based on a mass measure. If the idea is to save time and money by administering a simple written test rather than doing the full clinical workup, I think it still holds that the full clinical workup is more likely to find out what is wrong, and that the written test can be evaluated by comparing its results with results of the more stringent examination (as is actually done).

Cross-checking (or as my advisor called it, "triangulation") is often a better way of assessing what may be "out there" than is any single way of looking at it.

Actually, isn't triangulation the ONLY way? Simply doing the same test again would not seem very useful, epistemologically.

So, statistically, and philosophically, are you asking the right question?

Hard to say. I'm trying.

Best,

Bill P.