Regression and Misclassification

[From Rick Marken (2007.07.23.0850)]

I have attached a plot of the results of my tests to determine the
relationship between regression and misclassification at the group
level. What the graph shows is that even when the correlation between
predictor and criterion variable is between 0 and 0.1 you still make
fewer misclassifications in the long run using regression prediction
(the square points labeled Regression) than you would by just flipping
a coin (the triangular points labeled Random). These results were
obtained by Monte Carlo simulation. I'll try to give a quick summary
of what I did. I'm willing to send the spreadsheet to whoever wants it
but it is _not_ self explanatory.

I created a set of 30 random predictors between 0 and 800 and 30
associated criterion scores, which were proportional to the predictors
but with an error term added. The average size of the error term added
to the criterion variables varied on each run of the simulation so
that the correlation between predictor and criterion scores vary from
0 to 1.0 (absolute). On each iteration of the program I did a
regression on the 30 scores and derived 30 Y' scores for 30 "new"
individuals who had the same predictor scores but new criterion
scores. So the Y' scores are predictors of the actual scores of the 30
new individuals. I then categorized the new individuals as "Success"
or "Failure" based on their Actual and Y' scores by setting a cutoff
that was in the middle of the range of the predicted scores and saying
that those above the cutoff were a "Success" and a "Failure"
otherwise. I also randomly classified the scores as "Success" or
"Failure" based on a random "coin flip". I then determined the number
of misclassifications by comparing the Y' and Random based
classifications to the actual score based classification. This was
done on each of 20000 iterations. Each iteration yielded a different
correlation between predictors and criterion scores. The graph shows
the average proportion of misclassifications based on Y' (Regression)
and Random coin flip as a function of the _absolute_ size of the
correlation (some correlations were slightly negative).
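
The procedure described above can be sketched in a few lines of Python. This is a minimal reconstruction under stated assumptions, not the original spreadsheet: the slope `b`, the noise levels, and the choice to report results per noise level (rather than binned by observed correlation, as in the plot) are all hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def one_iteration(rng, noise_sd, n=30, b=1.0):
    """Fit on one group, then classify a "new" group with the same predictors."""
    xs = [rng.uniform(0, 800) for _ in range(n)]
    fit_ys = [b * x + rng.gauss(0, noise_sd) for x in xs]   # regression group
    new_ys = [b * x + rng.gauss(0, noise_sd) for x in xs]   # "new" individuals
    slope, intercept = fit_line(xs, fit_ys)
    predicted = [slope * x + intercept for x in xs]          # the Y' scores
    cutoff = (min(predicted) + max(predicted)) / 2           # middle of predicted range
    actual = [y > cutoff for y in new_ys]
    by_regression = [p > cutoff for p in predicted]
    by_coin = [rng.random() < 0.5 for _ in range(n)]
    reg_miss = sum(a != c for a, c in zip(actual, by_regression)) / n
    coin_miss = sum(a != c for a, c in zip(actual, by_coin)) / n
    return reg_miss, coin_miss

def average_miss_rates(noise_sd, iterations=2000, seed=0):
    """Average misclassification proportions over many iterations."""
    rng = random.Random(seed)
    runs = [one_iteration(rng, noise_sd) for _ in range(iterations)]
    reg = sum(r for r, _ in runs) / iterations
    coin = sum(c for _, c in runs) / iterations
    return reg, coin

if __name__ == "__main__":
    # Larger noise means a lower predictor-criterion correlation.
    for sd in (100.0, 1000.0, 5000.0):
        reg, coin = average_miss_rates(sd)
        print(f"noise sd {sd:7.1f}: regression {reg:.3f}, coin flip {coin:.3f}")
```

Raising `noise_sd` lowers the correlation, so the two rates can be compared across the range discussed here.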

These results show that regression based prediction reduces
misclassifications at the group level relative to random guessing as
long as the correlation between predictor and criterion variable is
greater than 0. Using regression does _not_ make things worse relative to
random selection, even when the correlation is zero.

This is rather comforting and is consistent with my intuitions. Using
regression analysis _does_ improve selection at the group level
relative to random guessing; Martin is right to say that some
information -- even of very low quality -- is better than none. So I
think this shows that regression is a reasonable tool for policy
(group level) research, even though it tells you absolutely nothing
about the individuals in the group.

Best regards

Rick

---
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com
Attachment: Regression.JPG

[Martin Taylor 2007.07.23.13.56]

> [From Rick Marken (2007.07.23.0850)]
> ... regression is a reasonable tool for policy
> (group level) research, even though it tells you absolutely nothing
> about the individuals in the group.

Actually, that's not quite right.

Richard Kennaway (reposted by Bill Powers (2007.07.22.1235 MDT)) or at http://www.cmp.uea.ac.uk/~jrk/distribution/corrinfo.pdf quantifies in Table 1 (p16) just how much information you can get about measure Y for an individual when you know X at different levels of correlation. It may not be much (assuming his calculations are correct, it's only 1 bit when the correlation is 0.866), but it's non-zero.

Martin
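
The "1 bit at a correlation of 0.866" figure is consistent with the standard mutual-information formula for two jointly normal variables, I = -0.5 * log2(1 - r^2). The sketch below assumes that formula; the thread does not say how the table in the paper was actually computed, and the function name is hypothetical.

```python
import math

def bits_about_y_given_x(r):
    """Mutual information in bits between jointly normal X and Y with correlation r."""
    return -0.5 * math.log2(1.0 - r * r)

if __name__ == "__main__":
    for r in (0.1, 0.3, 0.5, 0.866, 0.95):
        print(f"r = {r:5.3f}: {bits_about_y_given_x(r):.3f} bits")
```

Under this formula the information is zero only at r = 0 and grows slowly at the low correlations typical of survey data.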

[From Rick Marken (2007.07.23.1130)]

Martin Taylor (2007.07.23.13.56)

>> [From Rick Marken (2007.07.23.0850)]
>> ... regression is a reasonable tool for policy
>> (group level) research, even though it tells you absolutely nothing
>> about the individuals in the group.
>
> Actually, that's not quite right.
>
> Richard Kennaway (reposted by Bill Powers (2007.07.22.1235 MDT)) or
> at http://www.cmp.uea.ac.uk/~jrk/distribution/corrinfo.pdf quantifies
> in Table 1 (p16) just how much information you can get about measure
> Y for an individual when you know X at different levels of
> correlation. It may not be much (assuming his calculations are
> correct, it's only 1 bit when the correlation is 0.866), but it's
> non-zero.

I think it's best to go with the idea that group level statistics are
relevant only to groups and tell you nothing about the individuals in
the group (this is the lesson of Bill's wonderful demonstration of a
positive group-level relationship between reward and effort obtained
from individuals who all put out effort in inverse relation to
reward). Thinking otherwise just leads to prejudice.

Best

Rick

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Bill Powers (2007.07.23.1230 MDT)]

Rick Marken (2007.07.23.1130) –

For the usual range of correlations found in the literature, I agree.
However if the model is good enough the correlations can become very
high, and then the model curve predicts individual scores with higher
accuracy, as your Monty Python – oops – test shows.

Looks like some convergence toward agreement among you, me, and Martin
here.

I have discovered an annoying fact about my spreadsheet program. If you
define an area of cells that doesn’t include the row labels, and sort on
one column, the columns get out of synch with the labels. Can you send me
another copy of your Infant Mortality spread sheet? I ruined all my
copies before I found out what was happening.

Best,

Bill P.

[From Rick Marken (2007.07.23.1200)]

Bill Powers (2007.07.23.1230 MDT)--

> However if the model is good enough the correlations can become very high,
> and then the model curve predicts individual scores with higher accuracy, as
> your Monty Python -- oops -- test shows.

I've been stumped by the Monty Hall problem, ran a Monte Carlo
simulation to measure misclassification rate, and now get mistaken for
Monty Python. Are you running a three-card monte on me? ;-)

> I have discovered an annoying fact about my spreadsheet program. If you
> define an area of cells that doesn't include the row labels, and sort on one
> column, the columns get out of synch with the labels. Can you send me
> another copy of your Infant Mortality spread sheet? I ruined all my copies
> before I found out what was happening.

Here it is.

Best

Rick

health21.xls (87 KB)

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Bill Powers (2007.07.24.0105 MDT)]

Rick Marken (2007.07.23.0850) –

After sleeping on this a while, I realized that this result was built
into the Monte Vista calculation.

> I created a set of 30 random predictors between 0 and 800 and 30
> associated criterion scores, which were proportional to the predictors
> but with an error term added.

This means that every person in the group was actually affected in a way
proportional to the predictor, but that additional positive and negative
effects were present that made the individual score greater or less than
the average effect. If every individual was affected, it’s not surprising
that on the average, the predictors improved the prediction of individual
behavior. This is the unwarranted assumption that is usually made in
statistical studies: the effect is actually there in everyone, but is
masked by noise so it only shows up in some of them. Therefore you can
say you have discovered an effect on “people.”

Now try the same Monte Cristo calculation with each treatment having two
effects: one, an effect on everyone with a negative slope of 1 and
varying amounts of random noise, and two, a clear noise-free effect on a
randomly-selected 20% of the people with a positive slope of 10. This
should make the other case clear.

The positive effect on 20% of the people will outweigh the negative
effect on the group as a whole, yet the majority of the individuals will
experience a negative effect. You can’t tell whether a positive effect in
one person is due to noise or to the treatment. So this is like telling
everyone they should take a small aspirin every day, or vitamins, or eat
beets, or take Plavix, or avoid stress. Some people, for reasons that are
not understood, strongly benefit from doing these things, while most
people do not and in fact suffer random side-effects and of course
expenses and inconvenience. If the good of the few outweighs the harm to
the many, the group statistics will indicate that “people”
benefit from the treatment.
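
The two-effect case can be sketched as follows. This is an illustration under assumptions, not the spreadsheet calculation: the treatment range, noise level, and sample size are hypothetical. With a slope of -1 for everyone plus an extra slope of +10 for a random 20%, the average slope is -1 + 0.2 * 10 = +1, so the pooled regression should come out positive even though about 80% of individuals have a negative slope.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def two_effect_population(n=2000, responder_share=0.2, noise_sd=50.0, seed=1):
    """Return (pooled slope, share of individuals whose true slope is negative)."""
    rng = random.Random(seed)
    xs, ys, true_slopes = [], [], []
    for _ in range(n):
        x = rng.uniform(0, 100)          # treatment level (hypothetical range)
        s = -1.0                         # noisy effect present in everyone
        if rng.random() < responder_share:
            s += 10.0                    # extra noise-free effect on the responders
        xs.append(x)
        ys.append(s * x + rng.gauss(0, noise_sd))
        true_slopes.append(s)
    harmed_share = sum(s < 0 for s in true_slopes) / n
    return slope(xs, ys), harmed_share

pooled_slope, harmed_share = two_effect_population()
print(f"pooled slope: {pooled_slope:.2f}")
print(f"share of individuals with a negative true slope: {harmed_share:.2f}")
```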

I believe that this second case is by far the most common. Many things
are known about “people” that are false for most
individuals.

There is a second version of the second case. Create a random
distribution of slopes which averages -1, and a second distribution of
slopes that averages +10. The first distribution is used to convert the
predictor to an effect on everyone, the second distribution to an effect
on 20% of the people. Now it becomes even harder to distinguish the
positive effects from the negative in any individual, and statistics is
the only way to sort things out. But the positive group effects will
still benefit a minority of individuals.

When we don’t understand the mechanisms, it is highly unlikely that
anything we try in the attempt to improve life, however sincere, will
happen to have a positive effect on every individual. How could it? There
are too many variables and we don’t know what is actually wrong with the
patient. When our treatments are selected essentially at random in
relation to the actual parameters, functions, and interactions in the
real system, we will be lucky to find some that do not violently harm
everybody. Once in a while we will stumble across something that really
does help some people a lot, and they will swear by the treatment and
recommend it to all their friends. Of course with so much random
variability around, it’s hard to assure their friends that the treatment
will always help, but heck, if there’s a chance that it will help, it’s
worth a try.

In fact there is no chance at all that the treatment will help everybody
even a little – it will help only those in which the treatment happens
to correct what is actually wrong with the person. You can’t make up for
a maladjusted carburetor by having everybody change their spark plugs. On
the other hand, everybody whose problem with acceleration was bad spark
plugs will praise you for your advice, while the rest will just go on
enduring poor acceleration. You will pin the grateful letters from people
who had bad plugs on the shop bulletin board, where new customers can see
them. The rest of the letters you will burn, if there are any.

Statistics is not a cure for ignorance.

Best,

Bill P.

[David Goldstein]

Bill and Rick

I think that what you suggested is equivalent to what is called ‘an interaction effect’ with the variable of subjects.

The effect of the predictor variable interacts with the subject variable.

Criterion variable = treatment variable effect + (treatment variable effect x subjects variable effect) + error term

Rick, do you agree with this?

David

[From Rick Marken (2007.07.24.0850)]

David Goldstein writes:

> Bill and Rick
>
> I think that what you suggested is equivalent to what is called 'an
> interaction effect' with the variable of subjects.
>
> Criterion variable = treatment variable effect + (treatment variable effect x
> subjects variable effect) + error term
>
> Rick, do you agree with this?

Yes, I think that's what he's talking about. I'll try to implement
Bill's ideas in a revision. But my prediction is that adding these
changes (which sound a lot like an interaction) will not affect the
result. An interaction just adds "error" variance to the prediction.
It won't affect the fact that knowing the value of the predictor will
improve the group level selection result.

I think what Bill is concerned about is the fact that there may be
severe negative consequences to the individual who is inaccurately
selected or rejected based on the prediction equation. I think this is
a very important consideration when implementing group level policy.

Best

Rick

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Bill Powers (2007.07.24.1030 MDT)]

Rick Marken (2007.07.24.0850) –

Well, be sure you’re testing the situation I’m trying to describe. What
I’m trying to bring out is the inability of population measures to
distinguish between an effect that is present to some average degree in
every individual in the population, and an effect that is never seen
except in certain individuals in the population. To see the effect in its
rawest form, you can set up the simulation so that everyone in the
population is affected by a maximum of 10% at the maximum treatment
level, or so that 10% of the population is affected by a maximum of 100%
at the maximum treatment level with no effect on the other 90%. These two
underlying situations should look the same at the population level. Maybe
at the highest correlations the effect would be obvious, but as the
correlation decreases it would be hidden by increasing amounts of random
variation. I don’t think it would be obvious in the following
scenario:

Suppose one aspirin a day reduces the incidence of heart attacks by 100%
in 1 person out of 10, and has no effect on the incidence of heart
attacks in the other 9. If the incidence of heart attacks in the 90%
remains the same, we will see a total of 10% fewer heart attacks in the
population at maximum effective dose. But this does not mean that every
person has a 10% reduction in the chance of having a heart attack. Ninety
per cent of the people get no benefit at all. Their chance of a heart
attack is unchanged.
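
The arithmetic behind this scenario can be checked directly. The sketch below assumes a hypothetical baseline risk and population size, and assumes the people who benefit have the same baseline risk as everyone else; none of these numbers come from the thread.

```python
def population_incidence(risks):
    """Average individual risk, i.e. the expected share of people having a heart attack."""
    return sum(risks) / len(risks)

baseline = 0.05   # hypothetical untreated risk per person
n = 1000          # hypothetical population size

untreated = [baseline] * n
# Case 1: everyone's risk drops by 10%.
uniform_benefit = [baseline * 0.9] * n
# Case 2: 1 person in 10 is fully protected, the other 9 are unaffected.
concentrated_benefit = [0.0] * (n // 10) + [baseline] * (n - n // 10)

for name, risks in [("uniform", uniform_benefit), ("concentrated", concentrated_benefit)]:
    drop = 1 - population_incidence(risks) / population_incidence(untreated)
    unchanged = sum(r == baseline for r in risks)
    print(f"{name:12s}: population drop {drop:.1%}, people with unchanged risk: {unchanged}")
```

Both cases give the same population-level incidence, yet in the second case nine people in ten have exactly the risk they started with.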

Perhaps this is what is called “interaction.” I wouldn’t know.
But it means that a population effect can be completely
misleading.

See you guys tomorrow!

Best.

Bill P.

[From Rick Marken (2007.07.24.1220)]

Bill Powers (2007.07.24.1030 MDT)--

> Well, be sure you're testing the situation I'm trying to describe. What I'm
> trying to bring out is the inability of population measures to distinguish
> between an effect that is present to some average degree in every individual
> in the population, and an effect that is never seen except in certain
> individuals in the population.

Yes. I think I have it. What I do is have some proportion of the
sample "respond" in the opposite way as the remainder of the sample to
the predictor scores, X. So for some proportion of the sample Y =
bX+error and for the others it is Y = -bX+ error. We can go over this
tomorrow in person, by the way.

If I've got it set up right, then the larger the proportion of people
you have following the opposite "behavioral law", the closer your
misclassification rate based on a regression solution is to the
chance rate. You never seem to do worse than chance.
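
A rough sketch of this mixed-population setup follows. It assumes each person in both the regression group and the new group independently follows Y = -bX + error with probability `opposite_share`; that assignment rule, the noise level, and the function names are hypothetical and may differ from the spreadsheet.

```python
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def regression_miss_rate(opposite_share, iterations=1000, n=30, b=1.0,
                         noise_sd=50.0, seed=2):
    """Average misclassification rate when a share of people follow Y = -bX + error."""
    rng = random.Random(seed)

    def draw(x):
        sign = -1.0 if rng.random() < opposite_share else 1.0
        return sign * b * x + rng.gauss(0, noise_sd)

    misses = 0
    for _ in range(iterations):
        xs = [rng.uniform(0, 800) for _ in range(n)]
        fit_ys = [draw(x) for x in xs]     # group used to fit the regression
        new_ys = [draw(x) for x in xs]     # "new" group to be classified
        slope, intercept = fit_line(xs, fit_ys)
        predicted = [slope * x + intercept for x in xs]
        cutoff = (min(predicted) + max(predicted)) / 2
        misses += sum((y > cutoff) != (p > cutoff)
                      for y, p in zip(new_ys, predicted))
    return misses / (iterations * n)

if __name__ == "__main__":
    for share in (0.0, 0.1, 0.3, 0.5):
        print(f"opposite share {share:.1f}: miss rate {regression_miss_rate(share):.3f}")
```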

I think your point is related to my claim that you really learn
nothing about individuals from group results. This is what is shown so
well in your _American Behavioral Scientist_ article showing that you
can get a group level positive correlation between reward and effort from
individuals who actually have a negative relationship between reward
and effort. You could arrange it so that the group level correlation
between reward and effort is quite high -- I would bet you could get an
r greater than .95 -- but this would still not correctly describe the
relationship between reward and effort for an individual. But it would
still be true that, at the group level, you would predict that people
who get a large reward will also be putting out the most effort. The
relationship would be predictive, but it would not be correct to say that
this is the way individuals work.

Best

Rick

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Rick Marken (2007.07.24.1240)]

I have to run off for a second but I just wanted to quickly let you
know that Bill Powers is right again!! Brilliant!! If there are more
people in the "New" group who have the -b behavioral function than had
it in the group on which the regression equation is based, then the
misclassification rate using the regression equation will be greater
than it is by chance when the correlation between predictor and
criterion is sufficiently low. How low depends on how many more
"deviants" are in the New group compared to the original regression
group.

This is a VERY cool discovery and it really convinces me that
regression is not even good for group level prediction!! Wow!! Talk
about spadework at the foundations.

Best

Rick

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Bill Powers (2007.07.24.1339 MDT)]

Rick Marken (2007.07.24.1220) –

To reproduce my simplest scenario you have to say

Y1 = b*X + random error, and

Y2 = 0 + random error

Population 2 is the larger of the two that make up the whole
sample.

Remember, the treatment does not have any effect on the majority of the
population. The correlation of X with the response of population 2 will
be zero. But since we don’t know which responses are Y1s and which are
Y2s, we can only correlate X with Y for the whole population. This
will yield a net positive correlation and regression line, so a positive
response will be predicted for everyone, including those that don’t
respond to X but merely show random behavior in terms of Y2.
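
A minimal sketch of this simplest scenario, assuming a 20% responder share and hypothetical values for `b`, the noise level, and the sample size: the pooled slope comes out near `responder_share * b`, while the slope among the non-responders alone is near zero.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

rng = random.Random(3)
b, responder_share, noise_sd, n = 1.0, 0.2, 20.0, 5000   # all hypothetical

xs, ys, responds = [], [], []
for _ in range(n):
    x = rng.uniform(0, 100)
    is_responder = rng.random() < responder_share
    # Y1 = b*X + error for responders, Y2 = 0 + error for everyone else.
    y = (b * x if is_responder else 0.0) + rng.gauss(0, noise_sd)
    xs.append(x)
    ys.append(y)
    responds.append(is_responder)

pooled = slope(xs, ys)
non_x = [x for x, r in zip(xs, responds) if not r]
non_y = [y for y, r in zip(ys, responds) if not r]
within_nonresponders = slope(non_x, non_y)

print(f"pooled slope (whole sample):   {pooled:.3f}")
print(f"slope among non-responders:    {within_nonresponders:.3f}")
```

The pooled regression line therefore predicts a positive response for everyone, including the majority whose scores do not depend on X at all.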

> I think your point is related to my claim that you really learn
> nothing about individuals from group results. This is what is shown so
> well in your American Behavioral Scientist article showing that you
> can get a group level positive correlation between reward and effort from
> individuals who actually have a negative relationship between reward
> and effort.

I really don’t call that “learning nothing”, because you are
learning something, and it’s wrong. You learn that more reward goes with
more effort, but for individuals with the same reference level for
reward, more reward goes with less effort.

> You could arrange it so that the group level correlation
> between reward and effort is quite high -- I would bet you could get an
> r greater than .95 -- but this would still not correctly describe the
> relationship between reward and effort for an individual. But it would
> still be true that, at the group level, you would predict that people
> who get a large reward will also be putting out the most effort.
> The relationship would be predictive, but it would not be correct to say that
> this is the way individuals work.

Strictly at the group level, the observed acausal relationship is that
reward and effort covary. But at the individual level, reward and effort
vary in opposite directions for all sets of individuals with the same
reference level for reward. So the predicted relationship for individuals
has the wrong sign. The statistical analysis doesn’t know anything about
reference levels, so it can’t put individuals into the appropriate
subgroups with the same reference level.

You say “at the group level, people who get a large reward put out
the most effort.” Surely, if I say that “people have two
legs” you will hear that as “ALL [normal] people have two
legs.” “People need to eat” means “ALL people need to
eat.”

“People” actually means “the average over all the
people.” It doesn’t mean “any person” and certainly
doesn’t mean “every person.” Nobody behaves as the average
behaves. But facts about “people” in the sense you’re using
always turn out to be facts about a few people, and definitely not all
people.

Richard Kennaway carried this even farther, proving that group data can
be made up of subsets of individual data organized in an infinity of ways
that have nothing to do with the group data. There is simply no way to
deduce an individual’s internal organization from group data: Kennaway
put the QED on that.

As I understand what happened, that paper has been rejected and was never
published.

See you soon.

Bill P.

[Martin Taylor 2007.07.24.17.19]

> [From Bill Powers (2007.07.23.1230 MDT)] to Rick Marken (2007.07.23.1130) --
>
> I have discovered an annoying fact about my spreadsheet program. If you define an area of cells that doesn't include the row labels, and sort on one column, the columns get out of synch with the labels. Can you send me another copy of your Infant Mortality spread sheet? I ruined all my copies before I found out what was happening.

That's exactly what happened to me when I did the GDP-GINI-health spreadsheet back in 2001 or 2002. It's why I wasn't able to do more than rely on my often faulty memory when the subject arose and Rick wanted real data.

However, I was finally spurred to go back to the CIA World Factbook and extract a lot of data for about twice as many countries as Rick's spreadsheet has (not many have GINI data, though). I'm in the process of filling in a few more fields, and then I'll post it. There are about 220 countries that have enough data to be worth including.

Martin

[From Bill Powers (2007.07.24.1530 MDT)]

Martin Taylor 2007.07.24.17.19 –

That’s wonderful. I suppose you’re aware, but I’ll say it anyway: the
correlation between log income and log mortality rate is 0.895 for the
data I have. When you post your data, I hope you will do it as a
spreadsheet (.xls) because I don’t have any idea how to import data in
other formats.

Best,

Bill P.

[From Rick Marken (2007.07.24.2110)]

> Rick Marken (2007.07.24.1240)
>
> This is a VERY cool discovery and it really convinces me that
> regression is not even good for group level prediction!! Wow!! Talk
> about spadework at the foundations.

Well, you can put back the spade this time. I must have made some
error in my spreadsheet simulation. Now I find that it is just
impossible to get the regression misclassification rate to be worse
(in the long run) than chance misclassification rate. Regression does
_at least_ as well as chance, even when some people behave according
to the behavioral law Y = bX and others behave according to the law Y
= - bX or Y = 0 * X. Attached is one graph of the results where a
random proportion of each sample and prediction group behaves
according to Y = bX and the rest behave according to Y = -X.

So it still looks like regression improves classification (or at least
doesn't make it any worse than it would be using coin flipping) even
if individuals in the sample behave according to two different
behavioral laws.
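
For anyone who wants to try this without the spreadsheet, here is a rough Python sketch of the kind of simulation described above. It is not the spreadsheet itself. The function names, the noise level, the choice of Y = -bX for the second law, and the assumption that each individual follows the same law in the sample and in the "new" group are all illustrative guesses.

```python
import random


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate(iterations=2000, n=30, b=1.0, noise_sd=400.0, frac_pos=None, seed=1):
    """Return (regression error rate, coin-flip error rate).

    frac_pos is the share of individuals following Y = bX; the rest follow
    Y = -bX. None draws a new random share on every iteration.
    """
    rng = random.Random(seed)
    reg_errors = coin_errors = 0
    for _ in range(iterations):
        xs = [rng.uniform(0, 800) for _ in range(n)]
        p = rng.random() if frac_pos is None else frac_pos
        # Assumption: each individual keeps the same law in both groups.
        laws = [b if rng.random() < p else -b for _ in range(n)]
        sample = [k * x + rng.gauss(0, noise_sd) for k, x in zip(laws, xs)]
        new = [k * x + rng.gauss(0, noise_sd) for k, x in zip(laws, xs)]
        slope, intercept = fit_line(xs, sample)
        predicted = [intercept + slope * x for x in xs]
        # Cutoff in the middle of the range of the predicted scores.
        cutoff = (min(predicted) + max(predicted)) / 2
        for y_hat, y in zip(predicted, new):
            actual = y > cutoff
            reg_errors += (y_hat > cutoff) != actual
            coin_errors += (rng.random() < 0.5) != actual
    total = iterations * n
    return reg_errors / total, coin_errors / total


reg_rate, coin_rate = simulate()
print(f"regression: {reg_rate:.3f}  coin flip: {coin_rate:.3f}")
```

Passing frac_pos=1.0 gives the single-law case; leaving it as None mixes the two laws in a different random proportion on every iteration.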

Best

Rick


--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com
RegressAnal.JPG

[Martin Taylor 2007.07.25.09.44]

[From Bill Powers (2007.07.24.1339 MDT)]

To reproduce my simplest scenario you have to say

Y1 = b*X + random error, and
Y2 = 0 + random error

Population 2 is the larger of the two that make up the whole sample.

Remember, the treatment does not have any effect on the majority of
the population. The correlation of X with the response of population
2 will be zero. But since we don't know which responses are Y1s and
which are Y2s, we can only correlate X with Y for the whole
population. This will yield a net positive correlation and
regression line, so a positive response will be predicted for
everyone, including those that don't respond to X but merely show
random behavior in terms of Y2.
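
The scenario above can be sketched in a few lines of Python. This is only an illustration: the range of X, the share of responders, the noise level and all the names are assumptions, with b = 3 picked arbitrarily.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def two_population_slopes(n=1000, responder_frac=0.3, b=3.0, noise_sd=1.0, seed=2):
    """Regression slopes for the pooled sample and for each population alone."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)  # assumed range for the treatment X
        responds = rng.random() < responder_frac
        # Population 1: Y1 = b*X + error. Population 2: Y2 = 0 + error.
        y = (b * x if responds else 0.0) + rng.gauss(0, noise_sd)
        rows.append((x, y, responds))

    def slope(keep):
        kept = [(x, y) for x, y, r in rows if keep(r)]
        return ols_slope([x for x, _ in kept], [y for _, y in kept])

    return {
        "pooled": slope(lambda r: True),
        "population 1": slope(lambda r: r),
        "population 2": slope(lambda r: not r),
    }


for name, value in two_population_slopes().items():
    print(f"{name}: slope = {value:.2f}")
```

The pooled slope comes out positive but well below b, while the slope within population 2 stays near zero, which is the situation being described: the group regression predicts a positive response for everyone.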

You are always advocating that one should look at the actual tracks
in a tracking study, rather than at the summary number such as RMS
error. People do the same with statistical surveys. In this case,
when you look at the scatter plots, it's easy to see that there are
two populations and that the scatter looks nothing like the diagonal
ellipse that would suggest the possibility of drawing conclusions
from the simple regression over the whole population.

I attach an example spreadsheet with your two populations with "b"
set to 1, 3, and 10.

TwoPopulations.xls (50 KB)

The strongest argument against using group-level correlations as
indicators of effects within individuals is your own demo:

         |                *
         |           x      *
         |      a      x      *
  result | b      a      x
         |   b      a
         |     b
         |_______________________
                influence

where the different symbols represent different individuals. The
overall regression has the opposite sign from the direction of
effects within the individuals.

The problem with this is that if you do have repeated measures on
individuals, a statistician would use them and would not be misled.
It's when this situation pertains and you have only one measure per
individual that you would be led astray by relying on the group
regression to give you information about how changing the "influence"
might affect the "result" for any individual.
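
A small numerical version of this demo, as a Python sketch. The coordinates are made up for illustration: four individuals with three readings each, chosen so that every individual's "result" falls as "influence" rises while the individuals themselves are spread along a rising diagonal.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


# Hypothetical (influence, result) readings, three per individual.
individuals = {
    "b": [(0, 3), (2, 2), (4, 1)],
    "a": [(5, 4), (7, 3), (9, 2)],
    "x": [(10, 5), (12, 4), (14, 3)],
    "*": [(15, 6), (17, 5), (19, 4)],
}

# Slope within each individual, using that individual's repeated measures.
within = {
    name: ols_slope([p[0] for p in pts], [p[1] for p in pts])
    for name, pts in individuals.items()
}

# Slope over all points pooled, ignoring who produced them.
pooled_points = [p for pts in individuals.values() for p in pts]
pooled = ols_slope([p[0] for p in pooled_points], [p[1] for p in pooled_points])

print("within-individual slopes:", within)
print("pooled slope:", round(pooled, 3))
```

With these numbers every within-individual slope is -0.5 while the pooled slope is positive. With only one reading per individual, the within-individual slopes could not be computed at all and only the misleading pooled one would be available.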

Martin

[Martin Taylor 2007.07.25.23.29]

[From Bill Powers (2007.07.24.1530 MDT)]

Martin Taylor 2007.07.24.17.19 --

However, I was finally spurred to go back to the CIA World Factbook
and extract a lot of data for about twice as many countries as
Rick's spreadsheet has (not many have GINI data, though). I'm in the
process of filling in a few more fields, and then I'll post it.
There are about 220 countries that have enough data to be worth
including.

That's wonderful. I suppose you're aware, but I'll say it anyway:
the correlation between log income and log mortality rate is 0.895
for the data I have. When you post your data, I hope you will do it
as a spreadsheet (.xls) because I don't have any idea how to import
data in other formats.

Here's the spreadsheet. It's a bit more comprehensive than I had
intended when I wrote yesterday's message. There are more data fields
and I've transformed some of them to linearize some of the scatter
plots, of which I've made several. For me, the interesting thing
isn't so much that measures are correlated with each other, but that
one can look at the individuals that fall off the trend lines to
which most others conform, and then look further at what might
distinguish those individuals from the rest.

Comments: There are a lot of pretty high correlations among measures
that in principle could be independent. It would be interesting to do
a factor analysis or a principal components analysis, which I believe
Excel can do. I don't have time to do any more on this than I have
done.

The highest correlation is between Median Age and birth rate per 1000
population (log transformed). The scattergram shows a curvilinear
relationship, but even so, the correlation is 0.969. Between birth
rate and log infant mortality the correlation is 0.874, suggesting a
causal relation (couples want to have surviving babies, so make
enough that some do survive).

On several scattergrams I've identified individual countries that
fall off the main trends (this is what I think is most useful about
doing the correlational analysis -- looking for mavericks). It's
interesting to see that on several of the graphs S.Africa and its
neighbours (other than Mozambique) are clustered together off the
main trend; another frequent cluster includes many of the European
ex-Soviet countries, and yet another seems to consist largely of
small islands.
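
For anyone who would rather do the maverick-hunting outside a spreadsheet, one possible approach is sketched below in Python: fit a trend line and flag the points whose residuals are unusually large. The data are invented purely for illustration (they are not from the Factbook), and the 2-standard-deviation threshold and the function name are arbitrary choices.

```python
import math


def flag_mavericks(labels, xs, ys, k=2.0):
    """Return labels whose residual from the least-squares line exceeds k residual SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [lab for lab, r in zip(labels, residuals) if abs(r) > k * sd]


# Made-up data: 20 hypothetical countries on a log-log trend, two planted off it.
labels = [f"C{i:02d}" for i in range(20)]
log_gdp = [6.0 + 0.25 * i for i in range(20)]
log_mortality = [
    9.0 - 0.8 * x + (0.1 if i % 2 else -0.1) for i, x in enumerate(log_gdp)
]
log_mortality[5] += 1.5   # planted maverick above the trend
log_mortality[14] -= 1.5  # planted maverick below the trend

mavericks = flag_mavericks(labels, log_gdp, log_mortality)
print(mavericks)
```

Because the residual SD is computed with the mavericks included, a large cluster of off-trend countries would inflate it and partly hide itself, so the scattergram is still worth looking at directly.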

Equatorial Guinea is WAY off on any graph that has GDP as one axis.
That's no typo. According to the CIA World Factbook, it has very
large oil revenues, almost all of which is taken by the President's
family, so for most other measures it is like any other central
African country, despite having one of the world's highest GDP per
capita. The GINI index isn't noted, but it has to be close to 100.

Literacy seems to set a lower bound on infant mortality. You can get
high literacy with high infant mortality, but you can't get low
infant mortality with low literacy. The definition of literacy isn't
really suitable for the scattergrams, since near 100% literacy what
matters isn't the percentage of people over 15 who can read and write
simple stuff, it's the general level of ability to make sense out of
written material. The scattergrams often look as though they would
continue through the 100% limit if the data didn't pile up on that
limit.

The GINI index seems to have two regimes, for countries with GDP
above and below $15,000 per capita. Neither group seems to show much
if any relationship between GINI and GDP. On the high side, GINI
mostly ranges between about 25 and 38; only three countries on the
high side have a GINI index above 40 (high inequality of income):
USA, Singapore, and Hong Kong. Below $15,000 per capita, the GINI
index is all over the lot (to use a technical term), mostly between
about 25 and 65. For the richer countries, most cluster tightly
around a trend line of increasing infant mortality with increasing
GINI, with outliers Hong Kong and Singapore, plus Japan having infant
mortality well below that trend line. I didn't look at it for poorer
countries. Looking at the scattergram, the interocular traumatic test
suggests that there is a real trend apart from these anomalous countries,
but it's always dangerous to excise outliers and then read too much
into the trends for the remaining data.

GINI and Median Age: It seems you can't get a high Median Age with a
high GINI, but you can get low Median Age with low GINI. If there's a
causal relationship here, that would be interesting.

With the same exceptions as for infant mortality (Hong Kong and
Singapore), all the countries with life expectancy after age 1
greater than 79 have a GINI index below 38. For countries with lower
life expectancies (including the USA), GINI doesn't seem to have much
relation to life expectancy, though if there is any trend it's in the
direction of high-GINI low life expectancy.

You can make of these what you will. I wish the CIA had included data
on proportion of GDP or dollar equivalent spent on Health, but they
don't. ...

Since writing that, I've searched for the data and found them on the
World Health Organization site (along with much more). I've extracted
those numbers from the WHO Web site, and I'll add them into the
spreadsheet in a revised version someday (maybe soon). Lots more
correlations and scattergrams if you like that sort of thing! But the
spreadsheet I attach will have to do for now. It's not really PCT,
anyway, is it?

Martin

CIA_2007_Data_v2.xls (334 KB)