Hi, Richard -- (copies to Goldstein and CSGnet)
I feel a great reluctance to use the kinds of tests that psychologists have devised over the years, tests that David Goldstein feels might be useful in evaluating outcomes of therapy. I don't want to be dogmatic about this, so I'm looking (still) for some consensual way to evaluate this approach.
Here is a way of putting the problem I see:
Given: two tests, one measuring performance in some task and the other purporting to show whether a person has certain characteristics or traits. The question is, what are the chances that the first test will correctly indicate whether the person has a given characteristic as shown by the second test?
This implies that there are characteristics (like weight) that a person has or doesn't have regardless of our ability to measure them with the second test. The second test, we can say, produces results that are disturbed by unknown factors which cause the measurements to vary with some standard deviation. We can stipulate that without the disturbances, the test would measure accurately.
We can also stipulate that the first test is similar: if it were not for unknown randomly varying factors, the first test would always yield the same (correct) measure of performance. I know this assumption is unwarranted, but let's make it anyway since it seems to be a popular one.
Under these assumptions, the question now concerns the translation of a correlation between the first test and the second (given knowledge of the standard deviations) into a probability that the person actually has the characteristic in question. I suppose this can be cast in terms of the proportions of false positives or false negatives. I introduce the correlation topic because that is the figure that is usually obtainable from articles in the literature.
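One way to get a feel for that translation is a small Monte Carlo sketch (my own illustration, with made-up thresholds, not figures from the literature): generate pairs of test scores correlated at r, split each at the population median into "has the characteristic" / "doesn't," and count how often the two tests classify a person the same way.

```python
import math
import random

def agreement_rate(r, n=200_000, seed=1):
    """Fraction of simulated people whom two tests, whose scores
    correlate at r, classify the same way under a median split."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        z1 = rng.gauss(0, 1)  # score on test 1 (true value plus noise)
        # Score on test 2, constructed to correlate with test 1 at r:
        z2 = r * z1 + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        # Classify each score against the population median (zero).
        agree += (z1 > 0) == (z2 > 0)
    return agree / n

# For bivariate-normal scores the exact value is 1/2 + arcsin(r)/pi:
# r = 0.5 gives about 67% agreement, and r = 0.3 only about 60% --
# not far above the 50% that zero correlation would give.
```

The disagreements split between false positives and false negatives, so a correlation of 0.5 between the two tests already means the classifications conflict for about a third of the people tested.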
I know that you've already analyzed the case for screening people for traits or predicted performance on the basis of tests. I'm sure that paper is on my computer or in an archive somewhere -- I haven't found it yet. But maybe I'm asking a slightly different question now.
The context of my question is evaluation of therapy outcomes. This gives the question two meanings.
1. If one gives a patient a test before therapy and it indicates that the person has a condition like depression, what are the chances that the person actually has that disorder, under the assumptions above (i.e., there is actually a condition called depression that the person has or doesn't have, and so on).
2. If a before-and-after test is given, what are the chances that the condition actually changed in the direction implied by the two tests? I suppose that has to be put in the form of the probability that there is a finding of change when in reality there was a change in the opposite direction, or no change.
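Question 2 can also be sketched by simulation (again with toy numbers of my own, not anyone's data): give each simulated patient a true improvement of size d, add independent measurement noise of standard deviation s to the before and after scores, and count how often the observed difference points the same way as the true change.

```python
import random

def correct_sign_rate(d=1.0, s=1.0, n=200_000, seed=2):
    """Probability that the observed before/after difference has the
    same sign as a true improvement of size d, when each of the two
    measurements carries independent noise of standard deviation s."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        before = 0.0 + rng.gauss(0, s)  # true pre-therapy level 0, plus noise
        after = d + rng.gauss(0, s)     # true post-therapy level d, plus noise
        correct += (after - before) > 0  # did the measured change point up?
    return correct / n

# With a true change equal to the measurement SD (d = s), only about
# 76% of observed differences point the right way; shrink the true
# change to half the measurement SD and it drops to about 64%.
```

So even a genuine improvement as large as the test's own noise leaves roughly one before/after comparison in four showing no change or a change in the wrong direction.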
If the standard deviations were close to zero, these questions would answer themselves: the correct result would be the one we measure. But with correlations less than 1.0, the chances of a favorable result become smaller. My suspicion is that in the range of correlations usually found in the psychological literature, the favorable answer has less than a 50% probability. For example, suppose the probability is 70 percent that a person who shows as depressed on a performance test is actually depressed. And suppose a test that measures depression by other means has a 70 percent probability of being correct. In that case, the chance that a person who is described as depressive by the first test will measure as depressive on the second is about 50% -- a coin toss would do as well as this combination of tests. And, of course, if antidepressants are prescribed, half of the time the main effects on the patient will be the side-effects.
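The 70-percent example works out like this, treating the two tests' errors as independent (the charitable assumption):

```python
p1 = 0.70  # P(a person flagged by the first test actually has the condition)
p2 = 0.70  # P(the second test is correct about the condition)

# Chance that a first-test diagnosis is confirmed as a true positive
# by the second test, if the two tests err independently:
both_correct = p1 * p2
print(round(both_correct, 2))  # 0.49 -- essentially a coin toss
```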
If my suspicions about these matters are correct, there are very serious implications for psychology which I don't need to spell out. I predict enormous resistance even to finding out the truth of this matter, but what would be new about that?
Best.
Bill P.