Good data, bad data

[From Bill Powers (920624.1500)]

Martin Taylor and Rick Marken and Bill Powers (920624) --

RE: quality of data

Yeah, let's pick at this scab a little more, but let's try to minimize the
bleeding. Some bleeding, I think, is unavoidable.

Martin Taylor (920624.1300) --

I did an experiment with aftereffects of screens full of moving dots, with
Pat Alfano (last year). Same principle as yours: the subject moved a
control handle, immediately after the adaptation period, to make the dots
appear to stand still for a few minutes. We got beautiful curves showing
the decay of the illusion for lateral, vertical, rotational, and
convergent/divergent movement, showing quantitative effects of speed before
adaptation. I just matched an exponential to the data -- no analysis as
fancy as yours. The point was not so much to do a fine analysis as to look
for differences between people with and without motion disorders.

Just as Pat was about to start her PhD thesis using this experiment, BOTH
The offset of the illusions occurred about a week apart for us, mine
disappearing first. As far as I can tell, nothing about the apparatus
changed, although I beat my brains out trying to find some difference. Pat,
incidentally, came out of this essentially cured of the debilitating motion
disorder that had prompted her to do the study in the first place!

I really think we both ended up reorganizing, through too many hours of
familiarity with the experimental situation. So your warnings about the
hazards of experimentation fall on receptive ears, here. Pat, by the way,
had to go to a completely different approach, poor thing. Graduate students
should stay away from control theorists.

Well, on to the real subject.


Here's my main objection to statistically defined phenomena, in a nutshell.
If someone comes up with an effect and I decide to try to model it, the
model I produce will embody the mechanism by which I propose that this
effect could be produced. So by its nature, the model predicts the behavior
of ALL subjects in a given experiment.

If, however, I then find that in fact, only a majority of subjects show
this effect, while one or two or twenty show some quite different behavior
under the same conditions, where does this leave the model? Now the model,
instead of fitting the data, predicts clearly wrongly for a substantial
proportion of the subjects. The model, in other words, has failed. Fixing
the model to fit all the data might require only a small change -- or more
likely, it may require completely abandoning the basic premise and starting
over. Even one clear counterexample is enough to do in a model, although if
there's really only one, and it can't be reproduced, I'll admit that I
might hang onto the model a little longer.

If, however, a model CONSISTENTLY fails to account for some even small
number of observations that it's supposed to predict, there's no choice but
to track down the reason and modify the model accordingly, or scrap the
model and start again from a different set of assumptions. If the data
themselves are irreproducible to some extent, then modeling is futile.

In your 10:30 post you say:

>Once you get to a reasonably high level of the hierarchy, it is quite
>possible that the experimenter has no ECS that controls exactly the same
>complex environmental variable (CEV) that is being controlled by the
>subject. If so, the experimenter cannot disturb the CEV in a known way,
>and cannot precisely assess the subject's control.

This may be true, although until we actually do PCT experiments it may be
premature to borrow trouble. My question is what one does about such a
situation when it's encountered. Your answer is one I've heard many times:

>Publication of the study provides some data for other scientists, who may
>be better able to control (disturb) the real variable. The study is not a
>blind alley, but a probe into the hills where some gold is found, if not
>the motherlode.

I think this is a highly idealized version of science. If scientists in the
behavioral sciences actually did take equivocal results from other people,
replicate their experiments, and eliminate sources of variability bit by
bit, this might be a reasonable approach. But in fact that's not what
happens. Replication is all but unheard-of -- everyone wants to do his or
her own jazzy experiment, not slog along cleaning up behind someone else a
la Bullwinkle. Even when replications are published, there's always some
critical change in conditions, methods, subjects, or something -- the
temptation to improve on the original is apparently irresistible. And as to
trying to reduce the unaccountable variability -- well, have you EVER seen
a study like that? Have you ever seen a study in which somebody said "Gee,
X's experiment left 20 percent of the variability unaccounted for. So I
tried to find out where it came from, and now only 10 percent is
unaccounted for." Maybe you have; you read more of this stuff than I do.
But my strong impression is that once a study has been published, with its
findings pronounced statistically significant, all variability vanishes
from the description of the phenomenon, and from then on, as far as the
rest of the scientific world is concerned, the phenomenon has been
established as a fact true of all subjects who fit the population
description. The more interesting and striking the result, the less it
matters that large numbers of people don't fit the description. Tell me
this isn't true, if you dare.

That's one objection: elevation of preponderances to universals.

Another objection I have is simply that the levels of correlation accepted
in the literature (Gary Cziko has lots of data on this) are abominably low.
Rick cited a study on VOR in which r = .34 was the basis for reaching a
conclusion about how "people" speak. Yet if the AVERAGE VOR was taken as
the predicted VOR for all people in the group regardless of treatment,
prediction of an individual's behavior would be sqrt(1 - r^2) or 94% as
accurate as would be a prediction based on the supposed effect. I simply
refuse to accept this sort of thing as data. I'm supposed to come up with a
general model of behavior that makes predictions that are 94% useless? This
whole VOR business could be the result of the way people move their mouths
from one configuration to another, and differences in this manner of
producing output could have no importance at all.
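The arithmetic behind the 94% figure can be checked with a short sketch. It
assumes an ordinary linear (least-squares) prediction, for which the scatter
left over after using the predictor is sqrt(1 - r^2) times the scatter around
the group average. The function name `alienation` is only a label chosen for
this illustration; the value r = .34 is the one quoted above.

```python
import math


def alienation(r: float) -> float:
    """Fraction of the original scatter that remains after predicting
    from a variable correlated r with the outcome: sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)


r = 0.34                    # correlation quoted in the text
k = alienation(r)           # remaining scatter, relative to using the mean
explained = r * r           # proportion of variance accounted for

print(f"sqrt(1 - r^2)      = {k:.3f}")          # prints 0.940
print(f"variance explained = {explained:.3f}")  # prints 0.116
```

On these assumptions, a correlation of .34 shrinks the error of an individual
prediction by only about 6% compared with simply guessing the group average.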

Rick Marken said this well, but I'll say it, too. The behavioral sciences
have simply grown into the habit of accepting data that are completely
inadequate for science. They've blamed variability on their subjects
instead of blaming a wrong concept of what's happening.

I know that it's always possible to make excuses for bad data, even in
control theory. You can say "Well, maybe the subject was controlling for
some other things that interfered with the experiment." This may indeed be
true -- but you've offered an hypothesis, and now it's up to you to show
that this is in fact why the data were bad. WHAT other things was the
subject controlling for? Do the experiment, show that this is in fact the
right explanation. And then fix the model so it explains, instead of making
excuses for it. I'm not going to accept even an impeccably phrased HPCT
excuse for poor predictions as a reason for keeping a model of a specific
behavior. I say go away and work on your model until it DOES predict
correctly. And for heaven's sake, don't publish until it does. Why clutter
up the literature with junk before you can convince others that you have it
right? All this stuff about leaving hints for others to follow up sounds
very nice, but the most likely fate of such work is to be forgotten the
instant after it appears in print -- if it's ever read at all.

>(2) For any CEV being controlled by the subject, there are probably other
>CEVs not the object of experiment but also being controlled by the subject
>and being disturbed by the environment. These other CEVs will, in all
>probability, induce conflicts in the hierarchy, which show up as noise in
>the control function. The experiment will show reduced correlation.

You see? Even you do it. What other CEVs? Find them, test them, and get rid
of the variability. Don't just accept a "reduced correlation" as the
inevitable consequence of things we can't help. If there's a reduced
correlation, you can't use the hypothesis as a fact in any grown-up
scientific argument: bad data is bad data no matter how much it's not your
fault. If an hypothesis is not very likely to be true in any given test of
it, it won't get any more true by being used in a deductive argument.

As I said, I think you're borrowing trouble. When you think up a sound PCT
experiment, you're not going to find a lot of interfering variables
reducing your correlations to where you have to explain why they're so low.
When you hit on the right way of doing the experiment, you're going to get
good data. I mean you, generic. If no matter how you try, the data still
come out bad, then you've got the wrong idea or you're into something more
complex than you can deal with at our present stage of understanding. Do
you think Galileo was ready to explain how compasses work? There are some
things that just have to wait a while until we build up a base of solid

My attitude is this: let's explain what we can explain, and not lower our
standards just to appear wise about things we don't understand yet. As Rick
said, it's all right to say "I don't know."

Let's explain simple aspects of behavior with high precision. In that way,
we will leave behind something on which others CAN and WILL build. The
longer we or our descendants stick to this principle, the greater the
cumulative effect will be and the more complex will be the behaviors we can
confidently and accurately explain. The real sin in the behavioral sciences
has been the pretense of knowing what nobody actually knows yet. Go ye and
sin no more.

Bill P.

[Martin Taylor 920625 18:30]
(Bill Powers 920624.1500)

Sorry I can't reply properly to your postings in response to my flood of
yesterday. I appreciate them, and will try to get to them over the weekend.
Today, tomorrow, and Monday I have meetings all day.

But one point quickly...

Just as Pat was about to start her PhD thesis using this experiment, BOTH
The offset of the illusions occurred about a week apart for us, mine
disappearing first. As far as I can tell, nothing about the apparatus
changed, although I beat my brains out trying to find some difference. Pat,
incidentally, came out of this essentially cured of the debilitating motion
disorder that had prompted her to do the study in the first place!

Wonderful! You were controlling the percept, weren't you? Have you ever
had a motion aftereffect after driving a car? Actually, I'm not entirely
sure that controlling is necessary for the aftereffect to disappear, since
non-drivers learn not to see the aftereffect of forward motion. But I
suspect that drivers lose it more quickly (but perhaps the control is of
the zero motion encountered when you get out of the car and have to stand
up; that's more like the experimental condition, and is the same for drivers
and never-drivers). Check and see if you get an aftereffect of motion if
you look out of the back window of the car for the length of a drive (with
someone else driving).

Sorry to leave aside the interesting discussion on experimental method. I do
hope to get to it, because I have an interesting blend of agreement and
disagreement with you.


[From Bill Powers (960905.0800 MDT)]

[Hans Blom, 960903d]

Replying to Mary, you said

What do you mean by GOOD data? Data is data. Data call for
explanation. Collecting only GOOD data implies to me that you only
want to collect data that confirm a model that you already have. That
is bad science. The data themselves should lead you to a model. If
that model confirms your earlier beliefs, great. If you reject the
data as NOT GOOD data, that means that your model is not good,
because it does not fit the data. In that case, you'd better reject
the model than the data, or at least admit that your model cannot
explain THESE data and that a more complex model is needed. And let
the data point the way to a better model.

I'd really like to observe while you do an experiment. If Galileo had had
your attitude toward data, he might have proceeded very differently.

"Let's see -- I want to find out how things roll down inclined
planes. So I need some sort of board-thing; ah, here's a log lying in the
back yard, and a rock to prop it up on. Just snap off a few of these twigs
and we're all set. Now some ball-things to roll down it -- just the ticket,
my wife's darning-egg! Big end down or little end down? Eh, what's the
difference, it will still roll. Now something to time it with. Hmm. Swing
the chandelier? Wait, what's that? A line of ants coming out of a crack in
the paving stones! Just what I wanted. I'll count ants while the egg rolls
down the log. Here we go! One ant, two ants -- oops, the egg fell off the
log when it hit that knot. Well, that's Nature for you, just let me write
down the result of that run: two and a half ants. Better run it again to be
sure: one ant, two ants, three ants, four ants, five ants, six ants, seven
ants .... come on, come on, I know there are more ants in there .... ah,
eight ants ... nine ants ... teneleventwelvethirteen. Guess I'd better
joggle that log a bit, the egg seems to be leaning against something. There.
Twelve ants, done. Gosh, Nature sure is variable, isn't it?"


Bill P.