[From Kenny Kitzke (2007.07.29)]
<Bill Powers (2007.07.29.1422 MDT)>
<Actually I’m still in Minneapolis waiting to come home (tomorrow)
from the CSG meeting. Rick and I have talked several times about the
misclassification problem, if that’s the right thing to call it, and
are agreed that we want to be sure we get it right before reaching
any conclusions.
I’ll take the “preventing fatal heart attacks” theme as the basis for
constructing a thought-experiment. Here there is no question of
getting multiple determinations for each individual (as Martin T.
mentioned), since we die only once.>
I made it back from Minneapolis without a hitch and hope you did also. I am going to make some comments on your thought-experiment. They may help, but only you can assess that. 
<Given: a population with a certain incidence of fatal heart attacks
per year, say K per capita per year. In a sample of N individuals,
K*N are expected to die of a heart attack each year.>
First, K is a historic average for a population (population averages are usually written with a capital letter, such as X with a bar over it). What K will be next year is acknowledged to be a probabilistic prediction, usually stated with a confidence interval. So, assuming the heart-attack phenomenon is not fundamentally altered next year by, say, an epidemic, you would say something like, "It is 90% certain that the actual death rate next year will be K plus or minus, say, 0.06." The confidence level and the interval are rigorously determined from the historic distribution of the annual K's.
Statistical laws are powerful like mathematical laws, but they are different in nature. In mathematics, the equation y = x + 4 states that for every x there is one unique y. But in statistics, for every x there is a distribution of y's.
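The contrast can be sketched in a few lines of Python (the noise model here is purely hypothetical, just to illustrate the distinction):

```python
import random

# Mathematical law: y = x + 4 maps each x to exactly one y.
def deterministic(x):
    return x + 4

# Statistical law: each x maps to a distribution of y's.
# (Hypothetical model: normally distributed noise around x + 4.)
def stochastic(x):
    return x + 4 + random.gauss(0, 1)

assert deterministic(3) == 7          # always exactly 7
samples = [stochastic(3) for _ in range(10_000)]
mean = sum(samples) / len(samples)    # near 7, but each draw varies
```

The deterministic function gives the same answer every time; the stochastic one gives a cloud of answers whose average is near 7.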
There are a couple of misleading messages in your statement <In a sample of N individuals, K*N are expected to die of a heart attack each year>. First, it would be more accurate to say that, with 90% confidence, a fraction K (plus or minus 0.06) of the population will die next year. Alternatively, you could rearrange the prediction as the chance of any one person in the population having a fatal heart attack. That would most likely be such a small number that no one would pay much attention to it.
Second, applying population data to a sample of N is essentially as meaningless as applying it to any one individual. A statistical prediction is ALWAYS a distribution, not an individual outcome. Most statistical data analysis works the other way: it tries to estimate population statistics from measured samples.
<Now suppose there is a treatment that is hypothesized to have either
of two effects:
1.
It actually reduces the chance of a fatal heart attack by 10% for
every individual in the population. That is, if the whole population
is given the treatment, the incidence of heart attacks will become
0.9KN per year.
2.
It actually reduces the incidence of heart attacks by 100% for 10%
of the individuals, and has no effect on the incidence of heart
attacks in the other 90% of the population. In this case, too, the
incidence of heart attacks will become 0.9KN per year.>
Your 1. has a strange twist in it. You would not give the treatment to every individual if you did not expect it to reduce each person's chance of a fatal heart attack, would you? The hypothesis is simply that, across the entire population, there will be enough of a reduction in risk to record 10% fewer deaths during the next 12 months. On your 2., while you have posited an attribute variable, die or not die, it does not follow that the treatment had no effect on those who did not die.
<I claim that if the treatment is given to the whole population, these
two cases are statistically indistinguishable.>
I do not see why this matters.
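For what it is worth, your indistinguishability claim is easy to check with a quick Monte Carlo sketch (the numbers below, N = 100,000 and K = 0.01, are hypothetical):

```python
import random

random.seed(1)
N = 100_000   # population size (hypothetical)
K = 0.01      # baseline fatal-heart-attack rate per capita per year

def deaths_case1():
    # Case 1: every individual's risk is reduced to 0.9*K.
    return sum(random.random() < 0.9 * K for _ in range(N))

def deaths_case2():
    # Case 2: 10% of individuals have risk 0; the other 90% keep risk K.
    return sum(random.random() < K for _ in range(int(0.9 * N)))

# Both cases expect 0.9*K*N = 900 deaths per year; the observed totals
# differ only by sampling noise, so the aggregate count cannot tell
# (0.9K)*N apart from K*(0.9N).
d1 = deaths_case1()
d2 = deaths_case2()
```

Both counts come out within ordinary binomial fluctuation of 900, which is your point: the population total alone cannot separate the two mechanisms.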
<All we can say is
that KN of the population will die without the treatment, and that
0.9K*N will die with it.>
No, but you can say with some confidence that the treatment reduced fatal heart attacks by between X and Y percent. That should help one decide whether the treatment is worthwhile.
<We can’t tell if this equation should be
written (0.9K)N, 90% of the previous risk for 100% of the
population, or K(0.9N), 100 percent of the previous risk for for
90% of the population.>
Again, why does how it came about really matter?
<If my conjecture is right, it opens the door to a continuum of
statistically indistinguishable cases in which a treatment appears to
provide a B% reduction of risk to individuals in a population where
N% of the members are at risk. Is this the result of a B% reduction
in risk for the whole population, or of a 100% reduction in risk for
(100 - B)% of the population?>
This depends on who is included in the population. People with healthy hearts would not be expected to show a significant reduction, so would they be included? If only at-risk people receive the treatment and the fatalities fall 10%, that would seem relevant. Even then it is not conclusive; the drop could have happened by chance. Chance must be eliminated by statistical probability laws.
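Eliminating chance is a routine calculation. A sketch with hypothetical numbers (10,000 at-risk people treated, 90 deaths observed, against a historic rate of K = 0.012, i.e., 120 expected deaths), using the normal approximation for a proportion:

```python
import math

# Hypothetical data: 10,000 treated at-risk people, 90 deaths observed,
# versus a historic per-capita rate K = 0.012 (120 expected deaths).
n, deaths, K = 10_000, 90, 0.012
p_hat = deaths / n

# Normal-approximation 95% confidence interval for the treated rate.
se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se

# If the historic rate K lies above the whole interval, the observed
# reduction is unlikely to be a chance fluctuation.
assert hi < K
```

With these numbers the interval tops out below 0.012, so chance alone would be an implausible explanation for the drop.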
<It is also my contention that the most likely case is that a
treatment will benefit some proportion of the population for
physical/physiological reasons, and will not benefit the rest at all
for similar reasons. Problems are not caused by the chances of things
happening, but by specific physical effects that stem from regular
relationships among variables. The appearance of stochastic effects
arises primarily from errors of measurement, from the use of
incorrect models, and from lack of knowledge.>
Well, even if that contention is true, once one has a possible treatment, statistical laws offer a way to verify its efficacy. My guess is there would be monthly data on fatal heart attacks. If you gave the treatment to the entire at-risk population from last year, it might take only a few months to confirm that the treatment is effective. You could then study the particulars of who still died to get a better picture of the characteristics of those who benefited. More likely, however, random samples would be used, which would probably take longer to reach the same confidence but would cost far less.
Though statistical laws can add value, they will not tell you which person in the population will live rather than die that year. It will still be a population-level statistical method that improves the odds that more will not die from heart attacks. Such improvement is still valuable. The bottom line is that statistical analysis is more valuable for separating chance from certainty by actual experiment than for trying to project the future, be it for groups or individuals, where changes in the system that generates the data can produce errant predictions.
Best,
Kenny
Still EATing!