statistical predictions

[From Rick Marken (2007.08.05.2215)]

Bill Powers (2007.08.05.1320 MDT) --

Since we're predicting only whether the IM changes in the same direction as
the predicted change, the chance level would be 50% accuracy. So this looks
pretty much like what I expected. The difference, however, goes only to 10
places. Can we take it on up to half the span? Of course N will get pretty
small for the higher values of D -- maybe just go up to 100. For the largest
values of D, the accuracy ought to get much closer to 100%.

Right again. Here's the latest:

  D   IncD    Acc   NC    N
 ---------------------------
  1   0.02   0.38   45   120
  9   0.13   0.67   75   112
 17   0.25   0.74   77   104
 25   0.37   0.81   78    96
 33   0.49   0.88   77    88
 41   0.60   0.99   79    80
 49   0.71   0.96   69    72
 57   0.83   0.97   62    64
 65   0.94   1.00   56    56
 73   1.06   1.00   48    48
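
(For reference, here is one way a table like this could be produced. It is only a sketch of the procedure as I understand it -- sort the countries by log income, fit the regression line, and for each pair of countries D positions apart in the ranking check whether mortality actually moved in the predicted direction -- not necessarily the code actually used. The function name, and the reading that N is the number of countries minus D, are mine.)

    import numpy as np

    def accuracy_for_spacing(log_income, mortality, D):
        # Sort countries by log income and fit the regression line.
        order = np.argsort(log_income)
        x = np.asarray(log_income, float)[order]
        y = np.asarray(mortality, float)[order]
        slope = np.polyfit(x, y, 1)[0]            # negative for income vs. infant mortality
        # Compare every pair of countries D positions apart in the income ranking.
        dx = x[D:] - x[:-D]                       # income differences (IncD would be their mean)
        dy = y[D:] - y[:-D]                       # actual mortality differences
        correct = np.sum(np.sign(dy) == np.sign(slope * dx))
        n = len(dx)                               # number of pairs = number of countries - D
        return dx.mean(), correct / n, correct, n  # IncD, Acc, NC, N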

Best

Rick
--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Bill Powers (2007.08.06.0630 MDT)]

Rick Marken (2007.08.05.2215) –

Great. Now let’s see what can be said about predictions for groups versus
predictions for individuals at 0.9 correlation between predictor and
predicted variables. We will probably find that Richard Kennaway has
already said what we’re finding, but let’s go ahead and find it
anyway.

At 0.9 correlation, it turns out that the sign of the change can be
predicted correctly 75% of the time when we divide the data into 122/9, or
about 13, groups, and 80% of the time when we use 5 groups (such as totally
unlike, somewhat unlike, don’t know, somewhat like, and exactly like). If
we use just two groups, the number of correct predictions goes up to
somewhere between 97% and 100%. This reflects what Phil Runkel said about
“fine slicing” – the uncertainties go up as the number of
slices increases. Kennaway said that at 0.9 correlation, the probability
of correct prediction of sign was about 86%, but I don’t know how that
relates to the numbers here. It looks similar.
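
(If Kennaway's figure is the standard bivariate-normal result, the probability of getting the sign right is 1/2 + arcsin(r)/pi, which for r = 0.9 comes out at about 86% and would match his number. A quick check, just as arithmetic:)

    from math import asin, pi
    r = 0.9
    print(0.5 + asin(r) / pi)   # 0.856..., i.e. about 86% correct sign predictions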

A better estimate might come from aggregating the data into groups,
taking averages, and then looking at the predictions.

The log difference in income for two levels of resolution interpolates to
0.885. This says that if one country has an income 7.7 times that of
another country, the regression line predicting infant mortality is
almost certain to predict lower mortality for the first country (between
97% and 100% accuracy).
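
(Assuming these are base-10 logs, the 7.7 ratio follows directly from the 0.885 figure:)

    print(10 ** 0.885)   # 7.67..., roughly a 7.7-to-1 income ratio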

The sign can be estimated correctly 80% of the time when the predictor is
divided into 5 levels. The chances of guessing correctly three times in a
row are 51%, and five times in a row, 35%. Or to use a context I brought
up a long time ago, if we have five facts of this same quality, and a
deduction depends on all five of them, the chances of the deduction being
factually correct are about 1 in 3. Since scientific deductions usually
depend on a lot more than five facts being true at once, our 0.9
correlation would not be very useful for reaching complex logical
conclusions.
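
(Treating each guess as an independent event with an 80% chance of being right, the arithmetic is just:)

    print(0.8 ** 3)   # 0.512, three correct in a row
    print(0.8 ** 5)   # 0.328, five correct in a row -- about 1 chance in 3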

All these are very rough calculations and could be done better.

Would it be possible to generate artificial data sets with known
correlations and do this procedure for various levels of correlation? Can
you try this with Martin’s data sets to see what happens? I suppose
you’re getting bored with all this nitpicking of details, but I think
it’s interesting. This is sort of like checking your long division by
multiplying the divisor by the quotient to see how close you come to the
original dividend. We’re taking the regression line we get from a
statistical analysis and seeing how well it does in predicting individual
items from the original data set. We can actually count the number of
wrong predictions for each way of interpreting the results. This probably
gives us an approximate upper limit on the accuracy we can expect when we
use the same regression line and method of prediction to predict items
from new data sets.
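
(A minimal sketch of the artificial-data idea, assuming all that is needed is a pair of unit-normal variables with a chosen population correlation and a count of wrong sign predictions made by the fitted regression line over all pairs of points. The function and variable names here are mine, purely for illustration.)

    import numpy as np

    def wrong_sign_rate(r, n=121, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(n)
        y = r * x + np.sqrt(1 - r ** 2) * rng.standard_normal(n)  # correlation ~ r
        slope = np.polyfit(x, y, 1)[0]
        # For every pair of points, does the regression line predict the right direction?
        i, j = np.triu_indices(n, 1)
        dx, dy = x[i] - x[j], y[i] - y[j]
        return np.mean(np.sign(slope * dx) != np.sign(dy))

    for r in (0.5, 0.7, 0.9, 0.99):
        print(r, wrong_sign_rate(r))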

I know that this is a long way from real grown-up statistical analysis,
but it seems to tell me something I want to know.

Best.

Bill P.

[From Rick Marken (2007.08.06.1230)]

Bill Powers (2007.08.06.0630 MDT)

Great. Now let's see what can be said about predictions for groups versus
predictions for individuals at 0.9 correlation between predictor and
predicted variables.

OK.

Would it be possible to generate artificial data sets with known
correlations and do this procedure for various levels of correlation?

Sure.

We're taking the regression line we get from a statistical
analysis and seeing how well it does in predicting individual items from the
original data set. We can actually count the number of wrong
predictions for each way of interpreting the results.

That's what I did in my earlier analysis of misclassifications based
on the regression prediction for various levels of correlation. Maybe
we can go over that -- once I've finished re-writing my paper.

Best

Rick


--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com