Rick's Spreadsheet

[From Bill Powers (2007.07.19.1310 MDT)]

Attached is my version of Rick’s spreadsheet for the log income data,
with a column added to show the percent error of the regression line
relative to the actual mortality for each country. I finally figured out
how to sort it on different columns. The results show that 35 of the 122
predictions of mortality rate, or 28.7% of them, are in error by +/-100%
or more (the maximum error is about 350%), even though the standard error
is “only” 18 units out of a maximum of 158 (and a minimum of
2.3). The United States is in 25th place in actual measured mortality,
and in 3rd place according to the regression equation.
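
A minimal sketch of how a percent-error column like the one described above could be computed: fit a least-squares line to mortality against log income, then express each prediction error as a percentage of the actual value. The function names and the numbers are hypothetical illustrations, not the contents of SXYError.xls, and base-10 logs are an assumption.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def percent_errors(incomes, mortality):
    """Percent error of the regression prediction relative to each actual value."""
    log_inc = [math.log10(v) for v in incomes]  # assumption: base-10 log of income
    a, b = fit_line(log_inc, mortality)
    preds = [a + b * x for x in log_inc]
    return [100.0 * (p - y) / y for p, y in zip(preds, mortality)]

# Hypothetical values for illustration only, NOT the spreadsheet's data.
incomes = [500, 1000, 3000, 10000, 30000, 40000]
mortality = [150.0, 90.0, 40.0, 12.0, 3.0, 6.5]

errs = percent_errors(incomes, mortality)
# Count predictions that are off by 100% or more in either direction.
n_large = sum(1 for e in errs if abs(e) >= 100.0)
print(n_large, "of", len(errs), "predictions are off by +/-100% or more")
```

Because the error is divided by the actual value, countries with very low actual mortality produce the largest percentages, even when the absolute miss is modest.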

The regression equation predicts that the mortality rate in the US is
-8.5 (I think that’s per 1000 per year). In other words, according to the
regression equation, the United States brings 8.5 dead infants per 1000
of population back to life in the first year. However, 18 other countries
also have discovered how to do this, and two of them do it better than we
do.

The United States has the third best income, while being 25th best in
actual infant mortality. Singapore is 17th best in income, and lowest of
all in actual mortality.

Obviously, a linear equation is not the best model for predicting infant
mortality as a function of anything. Yet standard statistics attempts to
fit a straight line to everything. Introducing a log function might help,
but why not look for the best-fit curve, say a second-degree polynomial
at least?

My opinion of statistical studies is still that all they give us is a lot
of completely unexplained facts of very low quality. Of course that’s a
statistical generalization.

Best,

Bill P.

SXYError.xls (34 KB)

[From Rick Marken (2007.07.19.1400)]

Bill Powers (2007.07.19.1310 MDT)--

Attached is my version of Rick's spreadsheet for the log income data

Nice work!

The regression equation predicts that the mortality rate in the US is -8.5
(I think that's per 1000 per year). In other words, according to the
regression equation, the United States brings 8.5 dead infants per 1000 of
population back to life in the first year.

And that resurrection rate is sure to decrease now that Bush has
vetoed the child health care bill ;-)

Obviously, a linear equation is not the best model for predicting infant
mortality as a function of anything. Yet standard statistics attempts to fit
a straight line to everything. Introducing a log function might help, but
why not look for the best-fit curve, say a second-degree polynomial at
least?

Why not, indeed. But you can do that using linear regression along
with variable transformations. I think the problem here (and with all
data) as far as accuracy of prediction of individual cases is that the
data is too noisy. I suppose you could ultimately fit any data
perfectly using an inverse Fourier analysis. So treating per person
income as the "time" variable we could find the "spectrum" of the
infant mortality data. But then what do you know?
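
The point about variable transformations can be sketched as follows: a second-degree polynomial is still linear in its coefficients, so ordinary least squares fits it once the predictors are taken to be x and x squared. This is a rough illustration under stated assumptions. The helper names and the data are hypothetical, and the small Gaussian-elimination solver is just one way to solve the normal equations.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Sketch only: assumes A is square and non-singular."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.
    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ..."""
    cols = [[x ** k for x in xs] for k in range(degree + 1)]
    A = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(p * y for p, y in zip(ci, ys)) for ci in cols]
    return solve(A, b)

def sse(xs, ys, coeffs):
    """Sum of squared residuals for a fitted polynomial."""
    def predict(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical log-income and mortality values, for illustration only.
log_income = [2.7, 3.0, 3.5, 4.0, 4.5, 4.6]
mortality = [150.0, 90.0, 40.0, 12.0, 3.0, 6.5]

linear = fit_poly(log_income, mortality, 1)
quadratic = fit_poly(log_income, mortality, 2)
print("SSE linear:   ", sse(log_income, mortality, linear))
print("SSE quadratic:", sse(log_income, mortality, quadratic))
```

The quadratic can never fit the same data worse than the line, since the line is the special case with a zero squared term. Pushing the degree up to one less than the number of points reproduces the data exactly, which is the same "then what do you know?" situation as the Fourier fit.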

I look at regression and correlation as a way to describe data. You
can also use it to predict individual cases but the prediction is not
going to be very accurate. But it is an improvement over just
guessing. You say the prediction of infant mortality in the US is
absurdly off, but it is not as far off as it would be (in the long run)
if you had just guessed randomly or had used the average infant mortality
as your guess of each country's infant mortality.
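
The comparison with guessing the average can be made concrete with a small sketch: compute the root-mean-square error of the regression predictions and of a baseline that predicts the mean for every country. The data and names below are hypothetical, and the comparison is in-sample, on the same points used to fit the line.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rmse(actual, predicted):
    """Root-mean-square difference between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Hypothetical log-income and mortality values, for illustration only.
log_income = [2.7, 3.0, 3.5, 4.0, 4.5, 4.6]
mortality = [150.0, 90.0, 40.0, 12.0, 3.0, 6.5]

a, b = fit_line(log_income, mortality)
line_pred = [a + b * x for x in log_income]

# Baseline: guess the overall average for every country.
mean_value = sum(mortality) / len(mortality)
mean_pred = [mean_value] * len(mortality)

rmse_line = rmse(mortality, line_pred)
rmse_mean = rmse(mortality, mean_pred)
print("RMSE using regression line:", rmse_line)
print("RMSE using the mean only:  ", rmse_mean)
```

On the data used for the fit, the line's RMSE cannot exceed the mean-only RMSE, because a flat line at the mean is one of the candidates least squares considers. That is an average statement and says nothing about how badly any single country is predicted.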

My opinion of statistical studies is still that all they give us is a lot
of completely unexplained facts of very low quality. Of course that's a
statistical generalization.

I agree. They are unexplained facts; I don't know whether they are low
quality. They just are what they are. One measure of quality is their
reliability: do different collectors of this data get the same
measures of infant mortality and per capita income for each country? I
would imagine that by this measure these measures are surely of lower
quality than physical measures, such as voltage and current.

The infant mortality vs per capita income relationship itself is just
an observation which is _possibly_ relevant to the quality of health
care in different countries. I don't know if it is or it isn't
relevant but there it is. What would you look for to start evaluating
different health care systems? What are the explained facts of very
high quality that you would use for your analysis? Should we just stop
measuring things like infant mortality and per capita income because
they are of such low quality and unexplained? It would be nice if you
could make a positive suggestion about how to evaluate health care
policy. Why do you prefer a single-payer system, by the way?

Best

Rick

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[Martin Taylor 2007.07.19.17.06]

[From Bill Powers (2007.07.19.1310 MDT)]

Obviously, a linear equation is not the best model for predicting infant mortality as a function of anything. Yet standard statistics attempts to fit a straight line to everything.

Why on earth do you say that?

Introducing a log function might help, but why not look for the best-fit curve, say a second-degree polynomial at least?

Indeed, why not? As I pointed out in my tutorial [Martin Taylor 2007.07.17.11.08], you can have any kind of curve or discontinuous function you want.

My opinion of statistical studies is still that all they give us is a lot of completely unexplained facts of very low quality. Of course that's a statistical generalization.

It all depends on what you do with the statistics, doesn't it? I think your generalization applies more to those who blindly use statistics to generate "significance levels" for deviations from null hypotheses they didn't believe in the first place than it does to "statistics".

You really do have some weird ideas about statistics! "Standard statistics attempts to fit a straight line to everything", indeed!

Martin

[From Bill Powers (2007.07.19.1530 MDT)]

Martin Taylor 2007.07.19.17.06 –

I was referring to the regression equation, which is a least-squares best
fit to the relationship between two noisy variables. Of course a person
who has advanced mathematical abilities can do better than that, but do
the standard statistical packages most people use make provision for more
than a few standard transformations, like logarithmic? And who knows
enough to use them?

I don’t really know, of course.

Best,

Bill P.

[From Bill Powers (2007.07.19.1540 MDT)]

Rick Marken (2007.07.19.1400) –

I look at regression and correlation as a way to describe data.

What kind of description is it that shows the US with a negative infant
mortality rate?

Well, I can see that my reservations about statistics are not very
popular around here, so I guess I’ll get interested in something
else.

Warren Mansell came up with a very nifty set of photos of a woman’s face
that spans a range of expressions from delight through neutral to
furious. I have incorporated them into a tracking program that I will
show at the meeting next week(!). Most interesting. It amazes me that the
computer can really show randomly-selected 2 x 3 inch color photographs
on the screen at 60 frames per second. Seems to me we could do some
interesting things with that.

Best,

Bill P.

[From Rick Marken (2007.07.19.1510)]

Bill Powers (2007.07.19.1540 MDT)

Rick Marken (2007.07.19.1400) --

I look at regression and correlation as a way to describe data.
What kind of description is it that shows the US with a negative infant
mortality rate?

It's the relationship that is described reasonably well, I think.

Well, I can see that my reservations about statistics are not very popular
around here, so I guess I'll get interested in something else.

Not unpopular. Just unclear (to me anyway). I have plenty of
reservations about statistics. I would just like to know how you would
go about studying something like health care policy. Is there an
alternative to group level data that you would look at? Or do we just
not deal with the issue? I'd just like to know how you would go about
evaluating the merits of public policy options.

The expression control thing (below) sounds great. Can't wait to see it.

Best

Rick

Warren Mansell came up with a very nifty set of photos of a woman's face
that spans a range of expressions from delight through neutral to furious. I
have incorporated them into a tracking program that I will show at the
meeting next week(!). Most interesting. It amazes me that the computer can
really show randomly-selected 2 x 3 inch color photographs on the screen at
60 frames per second. Seems to me we could do some interesting things with
that.

Best,

Bill P.
