statistical predictions

[From Rick Marken (2007.08.03.1220)]

Jeff Vancouver (2007.08.03.0915)--

> As I see it, you are missing the point. Above you acknowledge that in the
> extreme case (perfect correlation), the group data is no different than the
> individual data. Yet here you seem to suggest that once that effect loses
> perfection, the group and individual levels are completely separate.

I wasn't clear. I think group data, even when there is a perfect
correlation, tells you nothing about the nature of the organism. This
is because, if the system is closed loop, IV-DV correlations, even
when obtained experimentally, are related to the inverse of the
feedback function connecting IV and DV to controlled variable, not to
characteristics of the organism itself. But I agree that when there is
a very high (>.99) correlation between IV and DV you will be able to
predict an individual's behavior quite accurately.
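
As a rough illustration (a Python sketch on simulated Gaussian data, with
invented numbers rather than anything from a real study), here is how often
an IV difference between two randomly chosen individuals predicts the
direction of their DV difference at different correlations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# For several IV-DV correlations, estimate how often the IV ordering of
# two randomly chosen cases correctly predicts their DV ordering.
for r in (0.5, 0.9, 0.99):
    x = rng.normal(size=n)                               # IV
    y = r * x + np.sqrt(1 - r**2) * rng.normal(size=n)   # DV, corr ~ r
    i, j = rng.integers(0, n, size=(2, n))               # random pairs
    acc = np.mean(np.sign(x[i] - x[j]) == np.sign(y[i] - y[j]))
    print(f"r = {r:.2f}: direction predicted correctly {acc:.1%}")
```

At r = .99 the predicted direction is right about 95% of the time; at r = .9
it is already down to roughly 85%.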

> Even the test for the control variable is subject to the limitations I am
> describing.

What limitations are these? The one thing the TCV gives that no group
level analysis could give is a reasonably good picture of the variable
a system is controlling, assuming that the system under study is a
control system. Variations in the reference for a CV will create
problems but there are ways of handling even that through modeling.

> A group that is given chicken soup recovers from a cold more quickly (or
> with fewer symptoms) than a group not given chicken soup. In a follow-up
> study, a group given hot liquids recovers better than a group given chicken
> bouillon. We have learned that the mechanism by which chicken soup works
> appears to have more to do with the soup than the chicken.

I say we have learned nothing about the mechanism at all. You could
get this result because, for some unknown reason, chicken soup works
on some people but has no effect on others. There may be a mechanism
that depends on chicken soup in some people and not in others. Or
there may be a mechanism that sometimes works with chicken soup and
sometimes doesn't. This is an experimental version of Bill's
demonstration, where he shows that the group level data shows a
positive relationship between reward and effort when, in fact, for
every individual, effort is negatively related to reward.
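
A minimal sketch of the kind of demonstration described here, with invented
parameters (not Bill's actual demo): each simulated subject controls its
received input against its own reference level, so effort falls with reward
within subjects while the group correlation comes out positive.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Each subject controls received input (reward_rate * effort) against
# an individual reference level. All numbers are invented.
references = rng.uniform(10, 100, n)     # controlled input, per subject
reward_rates = rng.uniform(0.5, 2.0, n)  # environmental feedback gain

# With tight control, received input ~= reference, so the effort each
# subject emits is reference / reward_rate, and the reward collected is
# simply the reference itself.
efforts = references / reward_rates
rewards = references

print("group-level r(reward, effort): %.2f"
      % np.corrcoef(rewards, efforts)[0, 1])         # comes out positive

# Within one subject (fixed reference), raising the reward rate lowers
# the effort needed to keep input at the reference:
rates = np.linspace(0.5, 2.0, 10)
print("within-subject r(rate, effort): %.2f"
      % np.corrcoef(rates, 50.0 / rates)[0, 1])      # strongly negative
```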

But your "chicken soup" experiment makes me want to do an experimental
version of Bill's reward-effort demo (which was correlational) to see
how group level IV-DV results look when the IV has a different
relationship to the DV for each individual because each is controlling
a different variable and those controlling the same variable are
controlling it at different levels. This is something I should have
done a long time ago: determine how group level experimental
results relate to the actual characteristics of the individual control
systems in a group. Thanks for goading me into doing this; this should
really be what my "Revolution" paper is about, since psychologists
rarely ever do individual level research anyway.

> I doubt anyone can win this argument. If the study was "done properly" it
> was based on analysis of how the individual might work; if not, it was
> group level. Tautology alert.

I don't understand. There is no tautology. Group level experiments are
done on groups; they report average results. I believe (and I hope to
show with a modeling demonstration) that such experiments tell you
virtually nothing about the nature of the individuals studied if those
individuals are control systems.

> > The problem of using group data to test a model of individual behavior
> > is nicely illustrated in Powers' paper in the Perceptual Control
> > Theory issue of the _American Behavioral Scientist_. I highly
> > recommend it to anyone doing conventional research!!

> Yes, and these are very useful illustrations. But there are also
> illustrations of cases where individual data is misinterpreted.

No. Bill shows how the group level data misrepresents what is actually
happening at the individual level. There is no way to use the group
data to determine what is going on at the individual level. You know
what is going on at the individual level in Bill's demo because you
can see the type of individuals Bill created (individuals that control
input so that the effort they put out is inversely related to the
effect of reward on controlled input).

> For instance, if one is studying a case where the reference level is
> changing, or the cues that might be used by a hypothesized input function
> (i.e., hypothesized controlled variable) are not directly measurable, the
> TCV loses much of its value.

Look at my "TCV"
demo: http://www.mindreadings.com/ControlDemo/ThreeTrack.html. There
the computer does the TCV successfully to determine which of the three
squares is being moved even though you are varying the reference for
the controlled square. There is no way to determine what variables
people are controlling other than by testing for them on an
individual basis.
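
A toy version of the logic, with an invented integrating controller standing
in for the participant (this is not the actual demo code): disturb all three
squares independently, let the controller stabilize one of them, and identify
the controlled square as the one least affected by its own disturbance.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000

# Independent disturbances for the three squares (random walks).
dist = rng.normal(0, 1, (3, T)).cumsum(axis=1)

# Stand-in for the participant: an integrating controller that keeps
# square 0 near zero. Its output moves all three squares.
output = np.zeros(T)
positions = np.zeros((3, T))
for t in range(1, T):
    error = 0.0 - positions[0, t - 1]   # square 0 is the controlled one
    output[t] = output[t - 1] + 0.1 * error
    positions[:, t] = dist[:, t] + output[t]

# The Test: the controlled square is the one whose position is most
# stabilized relative to its own disturbance.
for sq in range(3):
    ratio = positions[sq].var() / dist[sq].var()
    print(f"square {sq}: var(position)/var(disturbance) = {ratio:.2f}")
```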

Thanks for the idea about determining what group experiments might
tell us about the functional characteristics of the individuals in the
groups. I'll start working on that and let you know what comes out of
it. More fun with spreadsheets;-)

Best

Rick

···

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[from Gary Cziko 2007.08.03 15:18 CDT]

···

On 8/3/07, Richard Marken <rsmarken@gmail.com> wrote:

> [From Rick Marken (2007.08.03.1220)]
>
> But I agree that when there is a very high (>.99) correlation between IV
> and DV you will be able to predict an individual’s behavior quite
> accurately.

I think it is important to point out that this prediction is possible only if the relationship between the “IV” (disturbance) and controlled variable does not change. You can predict the controller’s end of the rubber band based on what the disturber does with his end, but this will not work if you add another disturber to the situation with another rubber band attached to the knot.

Discovering the controlled variable allows you to predict behavior (or more importantly, its perceptual consequences) no matter what the disturbance is.

–Gary

[From Rick Marken (2007.08.03.1445)]

Gary Cziko (2007.08.03 15:18 CDT) --

> Rick Marken (2007.08.03.1220)--
>
> > But I agree that when there is
> > a very high (>.99) correlation between IV and DV you will be able to
> > predict an individual's behavior quite accurately.
>
> I think it is important to point out that this prediction is possible
> only if the relationship between the "IV" (disturbance) and controlled
> variable does not change. You can predict the controller's end of the
> rubber band based on what the disturber does with his end, but this
> will not work if you add another disturber to the situation with
> another rubber band attached to the knot.
>
> Discovering the controlled variable allows you to predict behavior (or
> more importantly, its perceptual consequences) no matter what the
> disturbance is.

Excellent point. Worth repeating;-)

Best

Rick

···

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

> I wasn’t clear. I think group data, even when there is a perfect
> correlation, tells you nothing about the nature of the organism. This
> is because, if the system is closed loop, IV-DV correlations, even
> when obtained experimentally, are related to the inverse of the
> feedback function connecting IV and DV to controlled variable, not to
> characteristics of the organism itself.

[From Bill Powers (2007.08.03.1050 MDT)]

Rick Marken (2007.08.03.1220) –

Isn’t that the case only when the independent variable is a disturbance
of a controlled quantity and the dependent variable is the action
involved in controlling it? It seems to me there are many other choices.
We could make the independent variable the gain in the environmental
feedback function and the DV the error in a tracking task. The IV could
be the ratio in a fixed-ratio operant conditioning experiment, while the
DV is the rate of bar-pressing. Also, the DV could be something like the
mean error in a tracking task, while the IV is the number of minutes one
has been doing the task without resting. In fact, just about anything you
choose to vary about the experiment, such as the time of day you carry it
out, could be the independent variable, while the dependent variable is
any aspect of the behavior you want to measure. In my “experiment
with purpose” paper, the IV was a staircase pattern of reference
signal settings that the subject was asked to produce during the middle
third of the experiment, and the DV was the behavior of the reference
signal deduced by applying the model in reverse.

I really don’t think you can use IV as a synonym for stimulus and DV as a
synonym for response.

Best,

Bill P.

[From Rick Marken (2007.08.03.2320)]

Bill Powers (2007.08.03.1050 MDT) --

> Rick Marken (2007.08.03.1220) --
>
> > I wasn't clear. I think group data, even when there is a perfect
> > correlation, tells you nothing about the nature of the organism. This
> > is because, if the system is closed loop, IV-DV correlations, even
> > when obtained experimentally, are related to the inverse of the
> > feedback function connecting IV and DV to controlled variable, not to
> > characteristics of the organism itself.
>
> Isn't that the case only when the independent variable is a disturbance
> of a controlled quantity and the dependent variable is the action
> involved in controlling it?

Yes. I have to make this point in my paper. Thanks!

> I really don't think you can use IV as a synonym for stimulus and DV as a
> synonym for response.

Sorry. Habit borne of spending too much time teaching research
methods, where IV is used virtually as a synonym for stimulus or cause
and DV as a synonym for response. But, of course, there are many
experiments where, even though the IV is conceived of as stimulus and
DV is conceived of as response, the IV is not necessarily a
disturbance to a controlled variable and the DV is not necessarily the
action involved in controlling it.

So what, if anything, is wrong with using group data as a basis for
studying individuals?

Best

Rick

···

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

> So what, if anything, is wrong with using group data as a basis for
> studying individuals?

[From Bill Powers (2007.08.04.0315 MDT)]
Rick Marken (2007.08.03.2320) –
Funny you should ask – I’ve been trying to assemble some thoughts about
that.
Actually, group data is about individuals. You test a lot of
individuals by applying the same manipulation (as nearly as you can) to
all of them, and measure something about them that changes. But then, for
group data, you do something extra: you try to find a general
relationship that applies to everybody. The theory seems to be that
everyone in the group really responds to the treatment the same way, but
the response is masked by random variations which are different in
different people. Statistics allows you to fit a straight line or a curve
to the data, revealing the alleged underlying relationship. Of course
such a relationship will be found whether the individuals are really the
same or really different.
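
A quick invented-data illustration of that last point: fit one line to a
group made of two kinds of individuals with opposite relationships, and a
tidy "relationship" still comes out, one that describes no individual.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented data: 50 people whose DV rises with the IV (slope +2) and
# 50 whose DV falls with it (slope -2), one observation each.
x = rng.uniform(0, 10, 100)
y = np.where(np.arange(100) < 50, 2 * x, 20 - 2 * x) + rng.normal(0, 1, 100)

slope, intercept = np.polyfit(x, y, 1)  # least-squares line for the group
print(f"group fit: y = {slope:.2f}x + {intercept:.2f}")
# The fit comes out nearly flat -- an "underlying relationship" that
# matches no individual, each of whom has slope +2 or -2.
```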

It’s the same principle as using Sprague-Dawley rats. We are all of the
same species, and so it’s assumed we should have the same underlying
characteristics. Measuring many individuals of the same species is
considered conceptually the same as measuring one individual many times,
using the principle that random fluctuations should average out over many
measurements, improving the signal-to-noise ratio. That assumption has
some drawbacks, one being that it’s probably not true. Nobody really
thinks it’s true of randomly-selected individuals, however, so usually
there is an attempt to sort them in terms of shared characteristics. That
is supposed to make the assumption more true.

Of course after a set of individuals has been treated and measured, we do
have individual data for all of them, albeit only one measurement each.
We can use that data to assess how well the curves we fit to the
measurements would work as a predictor of that same data for various
purposes.

Consider your data on infant mortality versus national income, and my
log-log curve fit which is pretty good and gives a correlation of close
to 0.9. Suppose we wanted to use this data to see how well the curve
allows us to predict infant mortality from national income. For example,
we might ask how accurately pair-wise comparisons on the basis of income
would allow us to predict whether infant mortality would be greater in
one country than in another.

We have the actual data for each country, and the regression equation.
Computers do not boggle at big complex calculations, so we can actually
go through the data and compare each country with each other country,
simply counting the cases in which the comparison of incomes predicts the
relative infant mortality to be in the right direction or the wrong
direction, and by how much. This does not involve any assumptions about
distributions of random effects, representativeness of a sample, or any
of those theoretical statistical concepts.

We can also do this for different degrees of differentness in income. If
we compare countries that differ in income by some minimum amount, we can
obtain this information as a function of the amount of income difference.
Presumably, the larger the differences we use, the more accurate the
prediction will be. There will be fewer comparisons, but we can still
find out how accurately the regression line predicts the infant mortality
differences. Sorting the data by income will make this easy.

I don’t know how to use the “for” instruction and macros or I’d
do this myself. Do you have time to try it?
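
A sketch of the counting procedure in Python. The arrays here are invented
stand-ins with a log-log correlation near 0.9; the real run would use the
actual country income and infant-mortality table.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120

log_income = rng.uniform(2.5, 4.5, n)
log_im = 3.0 - 0.9 * (log_income - 2.5) + rng.normal(0, 0.25, n)

correct = total = 0
for i in range(n):
    for j in range(i + 1, n):
        d_income = log_income[i] - log_income[j]
        d_im = log_im[i] - log_im[j]
        total += 1
        # prediction: the higher-income country has lower mortality,
        # so the two differences should have opposite signs
        if d_income * d_im < 0:
            correct += 1

print(f"{correct} of {total} pairs predicted in the right direction "
      f"({correct / total:.1%})")
```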

I’m pretty sure that the eyeball impression one obtains by looking at the
plot is really using all the data points to create that impression, so
the result looks much more impressive than it is. But let’s see.

Best,

Bill P.

Re: statistical predictions
[Martin Taylor 2007.08.04.10.23]

I’m about to shut off the computer before going away for a week,
so this is just a couple of quick comments…

> [From Bill Powers (2007.08.04.0315 MDT)]
>
> Rick Marken (2007.08.03.2320) –
>
> > So what, if anything, is wrong with using group data as a basis for
> > studying individuals?
>
> Funny you should ask – I’ve been trying to assemble some thoughts
> about that.
>
> Actually, group data is about individuals. You test a lot of
> individuals by applying the same manipulation (as nearly as you can)
> to all of them, and measure something about them that changes. But
> then, for group data, you do something extra: you try to find a
> general relationship that applies to everybody. The theory seems to be
> that everyone in the group really responds to the treatment the same
> way, but the response is masked by random variations which are
> different in different people. Statistics allows you to fit a straight
> line or a curve to the data, revealing the alleged underlying
> relationship. Of course such a relationship will be found whether the
> individuals are really the same or really different.
>
> It’s the same principle as using Sprague-Dawley rats. We are all of
> the same species, and so it’s assumed we should have the same
> underlying characteristics.

Remember that in a LOT of cases of drug testing (maybe all?), the
drugs are tested on DIFFERENT species before being tested on humans.
The assumption is that in important respects, the mechanisms are the
same in rats, or guinea pigs, or pigs, or chimps, as they are in
humans. That isn’t always true, but it’s true often enough that the
inter-species differences don’t make the animal testing meaningless
for humans; it’s false often enough to make the human trials
necessary. The whole thing starts with an assumption of the existence
of mechanisms that are likely to be the same within members of a
species if they are the same across different species. But that’s always
only “likely”, not assured.

> We have the actual data for each country, and the regression equation.
> Computers do not boggle at big complex calculations, so we can
> actually go through the data and compare each country with each other
> country, simply counting the cases in which the comparison of incomes
> predicts the relative infant mortality to be in the right direction or
> the wrong direction, and by how much. This does not involve any
> assumptions about distributions of random effects, representativeness
> of a sample, or any of those theoretical statistical concepts.
>
> We can also do this for different degrees of differentness in income.
> If we compare countries that differ in income by some minimum amount,
> we can obtain this information as a function of the amount of income
> difference. Presumably, the larger the differences we use, the more
> accurate the prediction will be. There will be fewer comparisons, but
> we can still find out how accurately the regression line predicts the
> infant mortality differences. Sorting the data by income will make
> this easy.
>
> I don’t know how to use the “for” instruction and macros or I’d
> do this myself. Do you have time to try it?

Why not do it with the much larger spreadsheet of CIA and WHO
data I distributed, that has 220 countries and includes a range of
other kinds of data, many of which correlate highly with each other? I
made quite a few scatter-plots of different relationships that you
could use for your test. You would be able to use real-life values for
correlations from 0.969 (log infant mortality vs median age) down to
0.606 (life span after age 1 vs log GDP) using only the pairwise sets
I checked, but there’s a much wider range, I imagine, among pairs I
didn’t check because I didn’t expect them to be related.

Bye for a week or ten days.

Martin

[From Richard Kennaway (2007.08.04)]

I just chanced across this discussion of a fallacy of probabilistic argument, which bears on the present discussion.

http://www.sfu.ca/philosophy/swartz/modal_fallacy.htm

The relevant section is entitled "Parallel fallacy in inductive logic".

···

--
Richard Kennaway, jrk@cmp.uea.ac.uk, http://www.cmp.uea.ac.uk/~jrk/
School of Computing Sciences,
University of East Anglia, Norwich NR4 7TJ, U.K.

> I just chanced across this discussion of a fallacy of probabilistic
> argument, which bears on the present discussion.
>
> http://www.sfu.ca/philosophy/swartz/modal_fallacy.htm
>
> The relevant section is entitled “Parallel fallacy in inductive logic”.

[From Bill Powers (2007.08.04.0930 MDT)]

My new DSL modem came and I am functioning again, at least in that
respect. I will not try to catch up on all the interesting posts – by
the time I did that the backlog would be even larger. Just a few
scattered comments.

Richard Kennaway (2007.08.04) –

I think this would be more interesting to a mathematician than it can be
to me, since I don’t understand the shorthand. But one thing stands out:
to say that an argument is invalid says nothing about its application to
experiences. “Invalid” means only that it doesn’t follow the
agreed-on rules correctly. And as the author says in the footnotes, a lot
depends on how you translate from ordinary language into the symbolism of
the mathematical system. As someone once said, approximately, it all
depends on what you mean by “is.”

RE your earlier post; I agree that your analysis probably gives the
answers I’m looking for, and that the answer depends on the context. I
will continue to pursue that angle. Right now I’m looking for a simple
way to make a judgment about how useful a regression line is for
prediction. Maybe yours is simplest, so that’s where I will end
up.

Best,

Bill P.

[From Rick Marken (2007.08.04.0955)]

Bill Powers (2007.08.04.0315 MDT)--

> Consider your data on infant mortality versus national income, and my
> log-log curve fit which is pretty good and gives a correlation of close to
> 0.9. Suppose we wanted to use this data to see how well the curve allows us
> to predict infant mortality from national income. For example, we might ask
> how accurately pair-wise comparisons on the basis of income would allow us
> to predict whether infant mortality would be greater in one country than in
> another.

> We have the actual data for each country, and the regression equation.
> Computers do not boggle at big complex calculations, so we can actually go
> through the data and compare each country with each other country, simply
> counting the cases in which the comparison of incomes predicts the relative
> infant mortality to be in the right direction or the wrong direction, and by
> how much. This does not involve any assumptions about distributions of
> random effects, representativeness of a sample, or any of those theoretical
> statistical concepts.

So you want to compare each country with every other country in terms
of their actual log IM and their predicted log IM? And count the
number of predictions that are in the right direction? And then
measure accuracy in terms of the proportion of correct predictions
divided by the total number of comparisons (which is N(N-1)/2)?

> We can also do this for different degrees of differentness in income. If we
> compare countries that differ in income by some minimum amount, we can
> obtain this information as a function of the amount of income difference.
> Presumably, the larger the differences we use, the more accurate the
> prediction will be. There will be fewer comparisons, but we can still find
> out how accurately the regression line predicts the infant mortality
> differences. Sorting the data by income will make this easy.
>
> I don't know how to use the "for" instruction and macros or I'd do this
> myself. Do you have time to try it?

Sure.

> I'm pretty sure that the eyeball impression one obtains by looking at the
> plot is really using all the data points to create that impression, so the
> result looks much more impressive than it is. But let's see.

But I don't quite see what this has to do with using group data to
assess individual functional characteristics.

Best

Rick

···

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Rick Marken (2007.08.04.1035)]

Rick Marken (2007.08.04.0955)

> Bill Powers (2007.08.04.0315 MDT)--

> For example, we might ask
> how accurately pair-wise comparisons on the basis of income would allow us
> to predict whether infant mortality would be greater in one country than in
> another.

OK. I wrote the program for the Log IM vs Log Income prediction. There
were 120 countries. The proportion of correct pairwise comparisons
between countries based on the Y' scores was .85. That's 6200 correct
comparisons out of a total of 7260 pairwise comparisons. Since the R^2
for this regression is .81, the pairwise analysis makes things look
even better than they do when you look at it in terms of proportion of
variance in Y scores accounted for by the prediction based on X
scores. The Y' value gives you an 85% chance of being right when
comparing the predicted infant mortality of one country to another
based on the log of its income.

> We can also do this for different degrees of differentness in income.

I'll do that now if you like but I still don't know what this is
telling us about the merits of using group data to study how
individuals work.

Best

Rick

···

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Rick Marken (2007.08.04.1110)]

Bill Powers (2007.08.04.0315 MDT)--

> We can also do this for different degrees of differentness in income.

OK. I did this one too. If you compare only the predicted log IM
scores (Y') for countries with a difference in log income > .9 then
the pairwise prediction accuracy is .996 (1746 correct comparisons out
of 1753 total comparisons). You can get it to 100% (for this data set)
if you require a difference in log income of 1.2 (765 correct
comparisons out of 765).

I guess I can't avoid it any more; I'm off to try to rewrite the
damned "Revolution" paper. I think I'm almost there. This paper is
starting to seem like it might be a boojum rather than a snark;-)

Best

Rick

···

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Bill Powers (2007.08.04.1145 MDT)]

Rick Marken (2007.08.04.1035) –

> OK. I wrote the program for the Log IM vs Log Income prediction. There
> were 120 countries. The proportion of correct pairwise comparisons
> between countries based on the Y’ scores was .85. That’s 6200 correct
> comparisons out of a total of 7260 pairwise comparisons. Since the R^2
> for this regression is .81, the pairwise analysis makes things look
> even better than they do when you look at it in terms of proportion of
> variance in Y scores accounted for by the prediction based on X
> scores. The Y’ value gives you an 85% chance of being right when
> comparing the predicted infant mortality of one country to another
> based on the log of its income.

Ah, a surprise. I see I didn’t define the analysis properly – it
includes all differences in income from small to large. But it’s
interesting to see that 15% of the predictions will still be
wrong.

What I meant to do was look at minimum income differences first, then
increase the difference. The high accuracy should come when the
differences are larger. So the prediction should go [n, n+1], [n+1,n+2]
and so forth, then [n,n+2], [n+1,n+3],[n+2,n+4] etc. For the smallest
income differences, the accuracy should be low, and it should get larger
as the income difference gets larger.
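
A sketch of that scheme, regenerating the same invented stand-in data as the
pairwise sketch above: sort by income, then compare only pairs exactly D
places apart in the sorted order, so accuracy can be read off as a function
of the income gap.

```python
import numpy as np

# Regenerate the invented stand-in data from the earlier pairwise sketch.
rng = np.random.default_rng(4)
n = 120
log_income = rng.uniform(2.5, 4.5, n)
log_im = 3.0 - 0.9 * (log_income - 2.5) + rng.normal(0, 0.25, n)

order = np.argsort(log_income)           # ascending income
inc_sorted = log_income[order]
im_sorted = log_im[order]

for D in range(1, 11):
    n_pairs = n - D
    # the higher-income member of each pair (index i + D) should have
    # the lower infant mortality
    hits = int(np.sum(im_sorted[D:] < im_sorted[:-D]))
    gap = float(np.mean(inc_sorted[D:] - inc_sorted[:-D]))
    print(f"D={D:2d}  mean log-income gap={gap:.2f}  "
          f"accuracy {hits}/{n_pairs} ({hits / n_pairs:.0%})")
```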

> I’ll do that now if you like but I still don’t know what this is
> telling us about the merits of using group data to study how
> individuals work.

The idea is to look at how useful the regression line is for representing
the data it came from. The appearance of the data plot I generated is
that the predicted values march right up the middle of the data, with the
data falling well within the probable error limits. But in fact there are
many pairs of points that define lines of the opposite slope, until the
results from well-separated income levels are compared. The resolution of
the plot is much lower than it looks to the eye. To get a 90% probability
that the higher income level will go with a lower infant mortality, the
income levels must be separated by a substantial fraction of the total
range – the object here is to find out what fraction.

The nation with the lower income is predicted to have higher infant
mortality than the nation with the higher income. As I noted before,
using the regression line moves the United States from its actual 25th
position in infant mortality to a predicted 3rd position.

This is actually the basis on which we suspect that the United States is
not using its income for health care as wisely as some other nations are
using theirs. We look at the group statistics and compare them with the
individual numbers. Overall, it seems that more income means lower infant
mortality, which makes some sense. Any nation that does markedly better
or worse than the prediction invites examination to find out why. This is
more or less along the lines Martin Taylor looks at as a legitimate use
of statistics in scientific explorations.

Whenever group statistics is used as the basis for evaluating or treating
individuals, the question arises as to what percentage are misevaluated
or given the wrong treatment. This has to be tempered by asking how
important to the individual and to society the mistake is, and by what
margin the greatest good must go to the greatest number for the treatment
to be ethically acceptable.

I’m not trying to reach a conclusion about this, just now. I’m trying to
establish a basis for getting some numbers that we can take into
consideration as we juggle the various human values involved. One number
we absolutely must have is the percentage of wrong evaluations that
arises at different levels of correlation between predictors and
predicted results. Another number, which will be different in different
situations, is the cost of making a mistake. We must also distinguish
between social and individual problems, because individual statistics are
no better at making social predictions than group statistics are at
making individual predictions. When the correlations are very high, the
two domains largely overlap, but in the low to medium correlation domains
they do not overlap significantly. And the question always comes up about
whether group rights get more weight than individual rights.

I think we have established that when benefits of a treatment apply only
to certain individuals in a population, and those individuals form a
minority of the population, it is possible to misclassify more than half
of the people. This is because in fact the treatment has no beneficial
effect on the majority, the apparent population benefit coming from a
very high effectiveness for the minority, and relatively small damage to
the majority, too small to offset the large benefit for the few. The
false impression is given that everyone stands a chance of benefiting
from the treatment, when in fact only specific individuals do. If you
give mandatory basic French lessons to everyone in France, some of the
people who are there will benefit greatly, but of course those who
already speak fluent French – practically everyone – will get
very little out of the program. Yet the whole population will appear to
benefit somewhat.

That might be harmless if the lessons were free and took no time from
other pursuits. But that would be unlikely.

After we have the numbers and know what we’re talking about, we can start
the debate on where to set the limits. For now that debate is
empty.

Best,

Bill P.

P.S. I must have missed seeing Martin’s data for 200+ countries. I’ll
look for it. Might as well use the best data base we have.

> OK. I did this one too. If you compare only the predicted log IM
> scores (Y’) for countries with a difference in log income > .9 then
> the pairwise prediction accuracy is .996 (1746 correct comparisons out
> of 1753 total comparisons). You can get it to 100% (for this data set)
> if you require a difference in log income of 1.2 (765 correct
> comparisons out of 765).

[From Bill Powers (2007.08.04.1300 MDT)]
Rick Marken (2007.08.04.1110) --
Great, you’re ahead of me. A difference of 0.9 in the log (base 10) is a
factor of 7.9 in income.
But aren’t there too many comparisons here? With 122 countries, how do
you get 1753 comparisons out of this? Maybe it’s because you
did the comparisons for all differences greater than 0.9 for each
country. That’s what “log income > 0.9” seems to say. Of
course that means that for each country, the comparisons include not only
differences of 0.9, but all the larger differences as well, which makes
the accuracy too large.

How do we do this so as to use only the specified income difference?
Maybe it will be necessary to use bins: “(log income >= x) and
(log income < x+delta)” with the range of x being restricted
appropriately.
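
A sketch of the binning, on the same invented stand-in arrays as before:
each bin holds only pairs whose log-income difference falls in
[x, x + delta), so large gaps no longer inflate the accuracy of the
small-gap comparisons.

```python
import numpy as np

# Same invented stand-in data as in the earlier sketches.
rng = np.random.default_rng(4)
n = 120
log_income = rng.uniform(2.5, 4.5, n)
log_im = 3.0 - 0.9 * (log_income - 2.5) + rng.normal(0, 0.25, n)

delta = 0.1
i, j = np.triu_indices(n, k=1)           # every pair, once each
gaps = np.abs(log_income[i] - log_income[j])
# a pair counts as correct when the higher-income member has the lower
# infant mortality, i.e. the two differences have opposite signs
correct = (log_income[i] - log_income[j]) * (log_im[i] - log_im[j]) < 0

for lo in np.arange(0.0, 1.0, delta):
    mask = (gaps >= lo) & (gaps < lo + delta)
    if mask.any():
        print(f"gap in [{lo:.1f}, {lo + delta:.1f}): "
              f"{correct[mask].mean():.0%} correct, {mask.sum()} pairs")
```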

Best,

Bill P.

[From Rick Marken (2007.08.04.1420)]

Bill Powers (2007.08.04.1300 MDT)--

> Great, you're ahead of me. A difference of 0.9 in the log (base 10) is a
> factor of 7.9 in income.
>
> But aren't there too many comparisons here?

Yes. Based on your earlier post I now see what you want to do. I'll
get the new results to you ASAP.

Best

Rick

···

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Rick Marken (2007.08.04.1800)]

Here is the data you ordered, in the table below. D is the ordinal
distance between pairs of countries in an ordered list from highest to
lowest income (x[i] - x[i+D], where x[i] is the i th country in the
list). IncD is the average difference in log income for countries
separated by difference D, Acc is the accuracy with which predicted Y'
scores (for log infant mortality) reflect actual differences in log
infant mortality, Y, for the pairs of countries. NC is the raw number
of correct predictions for the pairs and N is the total number of
pairs. As you can see, the accuracy levels are generally quite low. My
earlier calculation, where I started with 85% accuracy and moved up to
100% by increasing the income difference requirement, was obviously
incorrect. This data is now correct (I think). The highest level of
accuracy we get is 79% when the average difference in log income for
countries is .33 (a factor of a little over 2 in income levels). So
there you have it. Let me know if you need anything else.

Best

Rick

D IncD Acc NC N
1 0.04 37% 45 121
2 0.08 46% 28 61
3 0.11 49% 20 41
4 0.15 52% 16 31
5 0.19 64% 16 25
6 0.22 57% 12 21
7 0.26 61% 11 18
8 0.29 56% 9 16
9 0.33 79% 11 14
10 0.36 69% 9 13

···

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Rick Marken (2007.08.04.1810)]

Oops. Still not right. I'll have to fix it tomorrow. I know what the
problem is: bad inner loop;-)

Best

Rick

···


--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

> Oops. Still not right. I’ll have to fix it tomorrow. I know what the
> problem is: bad inner loop;-)

[From Bill Powers (2007.08.05.0320 MDT)]
Rick Marken (2007.08.04.1810) –
We’re getting there. As I recall, we were only requiring that the sign of
the difference in incomes be right. A positive difference in income
should go with a negative difference in infant mortality. But there’s a
problem about signs.
Check my reasoning about this. If the difference in log(IM) is zero, that
means that the ratio of infant mortalities is 1: in other words, the
second infant mortality is the same as the first one. If the ratio is
less than 1, that means that the second IM is less than the first one,
and if the ratio is greater than 1, the second is greater than the first.
The dividing line is therefore at a difference in log(IM) of 0, which
corresponds to a ratio of 1, not at a difference of 1.

So I think the criterion of successful prediction of sign should be
log(second) minus log(first) < 0.
Note that requiring only that the sign be predicted correctly will
exaggerate the accuracy, since the slightest slope the right way will be
counted as correct even if it’s only 1% of the slope that should be
there. If we wanted to do this more quantitatively (I’m not sure we do
right now), we might consider calculating the mean squared error instead
of just counting right and wrong predictions. That should probably be
done using the original numbers instead of the logs. These are all logs
to the base 10, and I don’t know if Excel has an antilog base 10
function, so:
antilog10(x) = exp(2.3026*x), which is just 10^x.
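
A sketch of that mean-squared-error check in the original (unlogged) units,
again on the invented stand-in arrays, with np.polyfit standing in for the
regression:

```python
import numpy as np

# Same invented stand-in data as in the earlier sketches.
rng = np.random.default_rng(4)
log_income = rng.uniform(2.5, 4.5, 120)
log_im = 3.0 - 0.9 * (log_income - 2.5) + rng.normal(0, 0.25, 120)

slope, intercept = np.polyfit(log_income, log_im, 1)   # fitted line on logs
predicted_im = 10 ** (slope * log_income + intercept)  # antilog of Y'
actual_im = 10 ** log_im
mse = np.mean((predicted_im - actual_im) ** 2)
print(f"MSE in original units: {mse:.3f}")
```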
This is kind of interesting. What we’re doing is a statistical analysis
directly from the raw data. To be sure, we’re using a least-squares fit
of a straight line to the data, but then we’re turning around and
checking how well that straight line represents the data, and how useful
it is for predictions, without assuming any “normal” or other
distribution. We could use any other way of fitting a line to the data –
for example, a model! However we get a line passing near the data points,
the question then becomes what the errors in using that line for
prediction of the known data will be. Prediction of new data is
not likely to be better than that, though it could be – but we would
never know without re-fitting the model to the new data, so we might as
well just do the model-fitting every time. Fifty years ago that would
have been a huge chore, but now it’s easy.

This is making it clear to me that standard statistics does use a model
of the relationship being investigated. It’s written y = ax + b (not
using statistical conventions for case of letters). All the rest is a
sort of idealization of the data, assuming some easily-represented
distribution such as Gaussian, binomial, or Poisson. If the actual
distribution doesn’t resemble one of the standard ones (it hardly ever
does), the results can only be an approximation and strictly speaking
none of the standard terms like sigma or correlation means
anything.

When we fit a linear equation to log data, we’re actually fitting a
nonlinear model to the original data. So already any assumptions about
normal distributions are out the window.
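
A two-line check of that point on an invented, exact power law: the straight
line fitted to the logs is the power-law model y = a*x^b in the original
units.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])
y = 3.0 * x ** -0.7                       # exact power law, no noise
b, log_a = np.polyfit(np.log10(x), np.log10(y), 1)
print(b, 10 ** log_a)                     # recovers b = -0.7 and a = 3.0
```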

Best,

Bill P.

[From Rick Marken (2007.08.05.1010)]

Bill Powers (2007.08.05.0320 MDT)--

> We're getting there.

Here's the new data:

D IncD Acc NC N
--------------------------------------------------
1 0.02 38% 45 120
2 0.03 45% 53 119
3 0.05 48% 57 118
4 0.06 56% 65 117
5 0.07 59% 69 116
6 0.09 62% 71 115
7 0.10 62% 71 114
8 0.12 68% 77 113
9 0.13 67% 75 112
10 0.15 61% 68 111

> So I think the criterion of successful prediction of sign should be
> log(second) minus log(first) < 0.

I used that to get these data. But the results were exactly the same
as when I looked just for the first to be > the second.
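
That agreement is necessary rather than accidental, since the log is
monotonic; a one-line check:

```python
import math

# for positive numbers, log(a) - log(b) < 0 exactly when a < b
a, b = 12.0, 30.0
print((math.log10(a) - math.log10(b) < 0) == (a < b))   # True
```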

Over to you!

Best

Rick
--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

> D IncD Acc NC N
> 1 0.02 38% 45 120
> 2 0.03 45% 53 119
> 3 0.05 48% 57 118
> 4 0.06 56% 65 117
> 5 0.07 59% 69 116
> 6 0.09 62% 71 115
> 7 0.10 62% 71 114
> 8 0.12 68% 77 113
> 9 0.13 67% 75 112
> 10 0.15 61% 68 111

[From Bill Powers (2007.08.05.1320 MDT)]

Rick Marken (2007.08.05.1010)–

> Here’s the new data:

Since we’re predicting only whether the IM changes in the same direction
as the predicted change, the chance level would be 50% accuracy. So this
looks pretty much like what I expected. The difference, however, goes
only to 10 places. Can we take it on up to half the span? Of course N
will get pretty small for the higher values of D – maybe just go up to
100. For the largest values of D, the accuracy ought to get much closer
to 100%.

Best,

Bill P.