leakage; statistical ballparks

[From Bill Powers (950102.1215 MST)]

Rick Marken (950101.1500 PST) --

There's one source of conflict you didn't consider in explaining your
"reorganization experience" -- simply having understood PCT. I think you
started doing experiments very soon, so you saw the phenomena and how
PCT explained them. Some people can grasp the ideas of PCT and set them
up as a separate systems concept that is kept apart from the older ideas
they already believe. So on Monday, Wednesday, and Friday they can think
PCT, and on Tuesday, Thursday, and Saturday they can think a different
way -- and on Sunday they go to church. Didn't someone once call this
sort of thing "logic-tight compartments"?

Looks like your compartments leaked.


-----------------------------------------------------------------------
Bruce Abbott (941231.1530 EST)--

     For example, consider all the work done by ethologists and
     comparative psychologists over the past 60 years or so on the
     stereotypical patterns of behavior of a large variety of animal
     species. The functions of many of these behaviors in the life of
     the animal, their development and dependence on experience, their
     necessary physiological conditions and sensory "triggers"-- all
     were observed and explored using many of the traditional methods of
     scientific inquiry detailed in my text (which, by the way, includes
     much more than just "IV-DV" methods).

I can't object to searching out regularities in nature -- it's only when
we discover such regularities that we need theories to explain them. My
only objection to traditional methods in this regard is that they are
too often used to justify accepting as facts observations of very low
quality.

There's a spectrum of facts that can be found in any behavioral science.
Some of them are so solid that we would be extremely surprised to find a
counterexample. Male sticklebacks are said to attack any elongated forms
with the right patch of red on them, and as far as I know they all do
this, every one. But most accepted facts are just "trends" and
"tendencies" that exist only in population measures; if they were borne
out by five observations in a row one would consider that a run of good
luck.

To me, the quality of a science depends on the standards that are
applied in separating facts from nonfacts. If you demand that every
statement of fact fit observations 100.000% of the time, you won't have
a science because it won't contain any facts. If you demand that stated
facts fit observations only 50% of the time, you will have scads of
facts but you won't have a science because you won't be able to make
predictions any better than you could do by tossing a coin. I think you
can judge the maturity of a science by seeing where the line is drawn:
what proportion of counterexamples is considered enough to invalidate a
statement of fact. The better sciences place the line not far short of
100%; the least good ones put it close to 50%.

My main objection to the uses of statistics in psychology (aside from
the use of population measures to characterize individuals) is that they
provide powerful methods for allowing unreliable facts to be treated as
if they were reliable. If you guess that a treatment A is going to have
an effect on behavior B, and in fact it has such an effect in 51% of a
population, all you have to do is perform the experiment with a large
enough population and you will prove that the relationship actually
exists, p < 0.05 or better. But if you then use this statement of fact
as if it is true, you will find that for every 51 instances that support
it, there are 49 instances that go against it.
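Here is a quick way to check that arithmetic. This is only a sketch (the plain normal approximation for a one-proportion test; the function names are my own, not from any statistics package):

```python
import math

def z_stat(p_hat, n, p0=0.5):
    """One-proportion z statistic under the normal approximation."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def upper_tail_p(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# The same 51%-vs-49% "effect" at three sample sizes: nothing about
# the effect changes, only the number of subjects does.
for n in (100, 10_000, 100_000):
    z = z_stat(0.51, n)
    print(f"N = {n:>7}: z = {z:5.2f}, one-tailed p = {upper_tail_p(z):.2g}")
```

At N = 100 the effect is nowhere near significance; at N = 100,000 it sails past p < 0.05 -- yet 49 instances out of every 100 still go against it.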

So what do we do with such "provable facts?" Can we reason with them?
Suppose we make a deduction that if A and B are both true, then C must
be true. Obviously, for C to have even a 50% chance of being true, A and
B must individually have a 70% chance of being true (ignoring other
mixes of probabilities). If you're trying to show that D is true when it
depends on A, B, and C all being true, you need A, B, and C to have
truth probabilities of 0.794, and so forth. And that's just to make the
prediction as reliable as tossing a coin. What if you would like your
predictions to have a 90% chance of being true, or 99%? Obviously if you
want that degree of predictability, you are really going to have to be
very selective about what you consider a fact to be, or confine yourself
to conclusions that can be drawn from one or two facts.
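The compounding can be verified in a couple of lines. A sketch, assuming the "facts" are independent so their probabilities multiply (again, the function names are mine):

```python
def conjunction(p, n):
    """Probability that n independent facts, each true with
    probability p, all hold at once."""
    return p ** n

def required_p(n, target=0.5):
    """Per-fact probability needed for n independent facts to make
    their conjunction reach the target probability."""
    return target ** (1.0 / n)

print(required_p(2))       # about 0.707 -- the ~70% figure for A and B
print(required_p(3))       # about 0.794 -- for A, B, and C together
print(required_p(3, 0.9))  # what each fact needs for a 90%-reliable conclusion
```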

I tried to do this calculation once before, but maybe you can help me do
it right. The facts, out of my Handbook of Chemistry and Physics are
these:

  1. The probability of a deviation of 3.5 standard deviations from the
mean is about 0.05.

  2. The probability of a deviation of 6 standard deviations from the
mean is 2e-7.

  3. The second deviation (6 SD) is 1.71 times the first (3.5 SD); 1/1.71 = 0.585.

Here's one way to reason about these facts -- you can probably come up
with a better one:

If we're using two measures like this to make a yes-no decision (is the
measure A different from B), and if the separation of the means between
which we're deciding is equal to 3.5 standard deviations of one mean, we
would be wrong 1 time in 20. If it were 6 standard deviations, we'd be
wrong one time in 5 million. So by reducing the standard deviation by
only 42%, we increase the truth value of saying that A is different from
B from 95% to 99.998%.
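One way to put that decision picture into a formula (a sketch of my own, assuming two equal-variance normal distributions and a cutoff halfway between the means): the chance of a wrong call is the normal tail area beyond half the separation.

```python
import math

def misclassification_rate(separation_sd):
    """Chance of a wrong yes-no call when two equal-variance normal
    distributions sit `separation_sd` standard deviations apart and
    the cutoff is midway between them: P(Z > separation_sd / 2)."""
    return 0.5 * math.erfc((separation_sd / 2.0) / math.sqrt(2.0))

print(misclassification_rate(3.5))  # about 0.04 -- near the 1-in-20 figure
print(misclassification_rate(6.0))  # far smaller still
```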

When I say we need facts that are 99.998% likely to be true if we want
to build a science on them, people often throw up their hands and say
this is impossible in the behavioral sciences. But had I done this
calculation before I said that, I would have said "You need to reduce
the standard deviation of your observations by 40%". And people would
have thought, "Oh, that should be easy."

In fact both reactions are wrong. It is not impossible to achieve
99.998% probability of truth, and it is not easy to reduce the standard
deviation of observations by 40%. The reason it is hard to reduce the
standard deviation that much is that doing so requires completely
rethinking the theories behind the observations.

You can easily get extremely small standard deviations -- on the order
of 1/20 to 1/30 the measure of the variable -- in a tracking experiment,
but only if you know what relationships to measure. If you look at the
ratio of handle position to cursor-target separation, you will see
standard deviations on the order of 20 to 30% of the mean ratio. But if
you deliberately apply an independent disturbance to the cursor, the
same one being affected by the handle, and measure the ratio of the
disturbance to the handle position, you will get standard deviations on
the order of 3 to 5% of the ratio.
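The point about choosing the right relationship to measure shows up even in a toy simulation. What follows is entirely a sketch of my own -- a one-dimensional compensatory tracking loop with an integrating "subject" and made-up parameters, not the actual experimental setup:

```python
import math, random

random.seed(1)

dt, gain = 0.01, 50.0
handle, rows = 0.0, []
for step in range(20000):
    t = step * dt
    # Smooth disturbance, kept away from zero so ratios stay well defined.
    disturbance = 2.0 + math.sin(0.3 * t) + 0.5 * math.sin(0.7 * t + 1.0)
    cursor = handle + disturbance            # cursor-target separation
    noise = random.gauss(0.0, 0.05)          # sensor/motor noise
    handle += -gain * (cursor + noise) * dt  # integrate against the error
    rows.append((disturbance, cursor, handle))

def cv(xs):
    """Coefficient of variation: standard deviation over |mean|."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return sd / abs(m)

d, c, h = zip(*rows[2000:])                  # drop the start-up transient
ratio_handle_error = [hi / ci for hi, ci in zip(h, c) if abs(ci) > 1e-6]
ratio_dist_handle = [di / hi for di, hi in zip(d, h)]
print("CV of handle / cursor-error ratio:", cv(ratio_handle_error))
print("CV of disturbance / handle ratio: ", cv(ratio_dist_handle))
```

The second ratio is tight because in a good control loop the handle nearly mirrors the disturbance; the first is loose because the moment-to-moment error the handle seems to be "responding to" is mostly noise.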

We're talking a ratio of 10:1 in the standard deviations here as a
percentage of the mean. One measure gives you a fact that is like most
of the better-quality psychological facts: saying that two people differ
by 30% in the ratio of handle movement to cursor-target separation has
about a 90% chance of being true on successive tests. If you use two
facts of this quality in an argument, the conclusion is 80% probable. If
you use five facts of the same quality, you'll do nearly as well by
flipping a coin.

The other measure provides a truth value in the 0.99999 range. If you
used five facts of this quality in an argument, the conclusion would
still be 0.99995 probable. If you used 25 facts, the probability would
be 0.99975. This is the kind of fact that you need to build a science.
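Those compounded figures check out directly (once more assuming independent facts whose probabilities multiply):

```python
def chain(p, n):
    """Probability that a conclusion resting on n independent facts,
    each with probability p, is correct."""
    return p ** n

print(round(chain(0.9, 2), 2))       # 0.81 -- two 90% facts
print(round(chain(0.9, 5), 2))       # 0.59 -- five 90% facts, near coin-tossing
print(round(chain(0.99999, 5), 5))   # 0.99995 -- five high-quality facts
print(round(chain(0.99999, 25), 5))  # 0.99975 -- twenty-five high-quality facts
```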

What I've just been through is probably very sloppily put and full of
mistakes, but I hope that the basic approach is clear. What we need are
measures that differ by 6 standard deviations, not 3.5 standard
deviations. That makes the difference between a science and (at best) a
proto-science. And achieving that difference means finding the right
ways to observe a system, not just any old way that gives some sort of
measure of the behavior. When you have found the right way, you'll know
it: your ability to predict will improve by a factor of hundreds of
thousands. You will be playing in a different ballpark, the one where
grownups play.
----------------------------------------------------------------------
Best,

Bill P.