[From Rick Marken (2007.07.23.0850)]
I have attached a plot of the results of my tests to determine the
relationship between regression and misclassification at the group
level. The graph shows that even when the correlation between
predictor and criterion variable is between 0 and 0.1 you still make
fewer misclassifications in the long run using regression-based prediction
(the square points labeled Regression) than you would by just flipping
a coin (the triangular points labeled Random). These results were
obtained by Monte Carlo simulation. I'll try to give a quick summary
of what I did. I'm willing to send the spreadsheet to whoever wants it,
but it is _not_ self-explanatory.
I created a set of 30 random predictor scores between 0 and 800 and 30
associated criterion scores, which were proportional to the predictors
but with an error term added. The average size of the error term added
to the criterion variables varied on each run of the simulation so
that the correlation between predictor and criterion scores varied from
0 to 1.0 (absolute). On each iteration of the program I did a
regression on the 30 scores and derived 30 Y' scores for 30 "new"
individuals who had the same predictor scores but new criterion
scores. So the Y' scores are predictions of the actual scores of the 30
new individuals. I then classified each new individual as "Success"
or "Failure" twice, once by their Actual score and once by their Y'
score, using a cutoff set in the middle of the range of the predicted
scores: scores above the cutoff were a "Success" and all others a
"Failure". I also randomly classified the scores as "Success" or
"Failure" based on a random "coin flip". I then determined the number
of misclassifications by comparing the Y' and Random based
classifications to the actual score based classification. This was
done on each of 20000 iterations. Each iteration yielded a different
correlation between predictors and criterion scores. The graph shows
the average proportion of misclassifications based on Y' (Regression)
and Random coin flip as a function of the _absolute_ size of the
correlation (some correlations were slightly negative).
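For anyone who would rather not open the spreadsheet, here is a minimal Python sketch of the procedure as I understand it from the description above. It is not the actual spreadsheet. Several details are not given in the post and are assumptions: the criterion is the predictor times a hypothetical slope of 1 plus Gaussian noise; the noise SD is drawn uniformly from 0 to a hypothetical `max_noise_sd` of 5000 on each iteration (chosen so that |r| spans roughly 0 to 1); the correlation is measured on the sample used for the regression; and results are grouped into ten |r| bins of width 0.1. All function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def one_iteration(rng, n=30, slope=1.0, max_noise_sd=5000.0):
    """Fit on one sample, then classify 'new' individuals with the same predictors."""
    xs = [rng.uniform(0, 800) for _ in range(n)]
    noise_sd = rng.uniform(0, max_noise_sd)  # assumption: this is what varies r
    ys = [slope * x + rng.gauss(0, noise_sd) for x in xs]
    a, b = fit_line(xs, ys)
    r = abs(pearson_r(xs, ys))
    # New individuals: same predictor scores, freshly drawn criterion scores.
    actual_new = [slope * x + rng.gauss(0, noise_sd) for x in xs]
    y_prime = [a + b * x for x in xs]
    cutoff = (min(y_prime) + max(y_prime)) / 2  # middle of the range of Y'
    actual_success = [y > cutoff for y in actual_new]
    regression_success = [y > cutoff for y in y_prime]
    coin_success = [rng.random() < 0.5 for _ in xs]  # random "coin flip"
    reg_err = sum(p != t for p, t in zip(regression_success, actual_success)) / n
    rand_err = sum(c != t for c, t in zip(coin_success, actual_success)) / n
    return r, reg_err, rand_err


def simulate(iterations=20000, bins=10, seed=1):
    """Average misclassification proportions, grouped by |r| in bins of width 1/bins."""
    rng = random.Random(seed)
    totals = [[0.0, 0.0, 0] for _ in range(bins)]
    for _ in range(iterations):
        r, reg_err, rand_err = one_iteration(rng)
        k = min(int(r * bins), bins - 1)  # guard against |r| rounding to 1.0
        totals[k][0] += reg_err
        totals[k][1] += rand_err
        totals[k][2] += 1
    # (lower bin edge, mean regression error, mean random error), empty bins skipped
    return [(k / bins, t[0] / t[2], t[1] / t[2])
            for k, t in enumerate(totals) if t[2]]


if __name__ == "__main__":
    for lo, reg, rnd in simulate():
        print(f"|r| from {lo:.1f}: regression {reg:.3f}, random {rnd:.3f}")
```

Because the noise SD is drawn uniformly, most iterations land at low |r|, so the high-|r| bins hold fewer runs; a different noise distribution would spread them more evenly.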
These results show that regression-based prediction reduces
misclassifications at the group level relative to random guessing as
long as the correlation between predictor and criterion variable is
greater than 0. Using regression does _not_ make things worse relative
to random selection, even when the correlation is zero.
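One way to see why regression can't do worse than a coin flip is an idealized case that differs from the simulation: assume predictor and criterion are bivariate normal with correlation rho, that both classifications split at the mean, and that the fitted regression line has the same sign as the true relationship (the large-sample case). Then classifying by Y' is the same as classifying by the predictor, and the standard orthant-probability result for the bivariate normal gives a misclassification rate of 1/2 - arcsin(|rho|)/pi. This is an illustrative sketch under those assumptions, not a description of the spreadsheet (which used uniform predictors, 30 cases, and a mid-range cutoff); the function name is hypothetical.

```python
import math


def misclassification_rate(rho):
    """Idealized probability that predicted and actual scores fall on opposite
    sides of the mean, for a bivariate normal pair with correlation rho,
    assuming the regression slope has the correct sign."""
    return 0.5 - math.asin(abs(rho)) / math.pi


for rho in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"rho = {rho:.1f}: {misclassification_rate(rho):.3f}")
```

At rho = 0 the rate is exactly 0.5, the coin-flip rate, and at rho = 0.1 it is about 0.468: a small but real improvement, in line with the low-correlation points on the plot.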
This is rather comforting and is consistent with my intuitions. Using
regression analysis _does_ improve selection at the group level
relative to random guessing; Martin is right to say that some
information -- even of very low quality -- is better than none. So I
think this shows that regression is a reasonable tool for policy
(group level) research, even though it tells you absolutely nothing
about the individuals in the group.
Best regards
Rick
---
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com
[Attached image: Regression.JPG]