[From Bill Powers (971208.0249 MST)]
Rick Marken (971207.2210)--
> While this version of the RM design does answer the PCT
> objection to the use of group data (since it is really a
> "testing specimens" approach to research) it still provides
> no useful information about individual behavior (from a PCT
> perspective) since it is still based on the conventional
> IV-DV paradigm. This means that any observed relationships
> between IV and DV -- even those seen for individual subjects --
> reflect characteristics of the subject's environment, not of the
> subject him or herself (that is, if the researcher thinks he or she
> is learning about the behavior of the subjects in his or her
> experiment, he or she is suffering from the behavioral illusion).
I think your use of "IV-DV" is imprecise -- you're using it as a code word
rather than with its strict meaning. As a code word, it means not just
"varying one variable and measuring its effect on another," but much more
narrowly, "Varying something that is actually a disturbance of a controlled
variable, and measuring the output of the control system, and interpreting
the result as if the IV were linked through a lineal causal chain, via the
organism, to the DV." It's primarily that interpretation to which we
object, not the other part of the process. We object to the model, not the
method.
When we speak of getting correlations in the 0.99 range, it is just this
"behavioral illusion" to which we refer. In a control system, the
disturbance is in fact one of the independent variables, and the output is
a dependent variable (the reference signal is the other independent
variable, which we customarily try to hold constant during the experiment
by asking the participant to maintain a constant goal-state). So our
procedure for showing that the output is cancelling the effect of the
disturbance is precisely to hold "all else" (that matters according to the
model) constant, vary an IV, and measure its relation to a DV.
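To make this concrete, here is a minimal sketch of a single integrating
control loop with a constant reference, written in Python; the gain, time
step, and disturbance waveform are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)
    dt, gain, r = 0.01, 50.0, 0.0      # time step, loop gain, constant reference
    # slowly varying disturbance: smoothed white noise (arbitrary waveform)
    d = np.convolve(rng.normal(size=6000), np.ones(300) / 300.0, mode="same")

    o, outputs, cv = 0.0, [], []
    for dn in d:
        qi = o + dn                    # controlled quantity = output + disturbance
        o += gain * (r - qi) * dt      # integrating output driven by error
        outputs.append(o)
        cv.append(qi)

    outputs, cv = np.array(outputs), np.array(cv)
    print("corr(disturbance, output):", np.corrcoef(d, outputs)[0, 1])
    print("sd of controlled quantity:", cv.std(), " sd of disturbance:", d.std())

With a loop that is fast relative to the disturbance, the printed correlation
comes out close to -1 and the controlled quantity's standard deviation is a
small fraction of the disturbance's: the IV-DV relationship is there, but it
reflects the feedback connection through the environment, not a one-way causal
path through the organism.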
The real difference between the PCT and conventional approaches is not in
the use of IV-DV analysis, but in the model. In the conventional analysis,
behavior is assumed to be a one-way function of all external influences
acting on the organism. That model, which we could call the "equilibrium
among forces" model, causes problems because in principle the entire
universe would have to be held constant in order to be sure of eliminating
unknown influences on behavior during an experiment while the one IV is
varied. This is what leads to the use of multiple determinations, multiple
subjects, and statistical averages across populations and
individuals. The assumption is that if you can average across time and use
enough different individuals, the unknown influences will tend to average
to zero, leaving visible whatever measure of the IV-DV relationship can be
extracted from the mean relationships.
The same thing is done in all sciences; for example, when I was an
undergraduate I earned my keep in part by measuring the positions of the
(photographed) components of the double star 61 Cygni. Each measure was
repeated three times while the center of the image was approached from two
directions in each axis, and I measured some 8,000 images during that job.
The total number of images measured over the entire 10-year project by all
the people doing it was over a million. The theory here was that influences
on a single human measurement of position were unknown, but by averaging
over repeated measurements of each point by one individual, and over many
measurements by that individual, and over many results obtained by
different individuals, the random sources of error would average to zero.
And it worked. The orbit of 61 Cygni was determined with a mean position
error of a few thousandths of a second of arc, 100 times better than the
theoretical resolution of the 18.5-inch Clark refractor used to obtain the
images.
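The reason this works is the familiar square-root law: the scatter of the mean
of N independent measurements is the single-measurement scatter divided by the
square root of N, so a million measurements can in principle beat one
measurement by a factor of a thousand. A quick numerical check in Python (the
single-measurement scatter and sample sizes here are made up, purely for
illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    sigma_single = 0.5                 # hypothetical scatter of one measurement (arbitrary units)

    for n in (1, 100, 10_000):
        means = sigma_single * rng.normal(size=(200, n)).mean(axis=1)
        print(f"N = {n:>6}: scatter of mean = {means.std():.4f}  (theory: {sigma_single / np.sqrt(n):.4f})")

    # a million independent measurements would cut the random error by a
    # factor of sqrt(1e6) = 1000 relative to a single measurement
    print("reduction factor for N = 1,000,000:", np.sqrt(1_000_000))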
PCT offers us a great advantage in making predictions of behavior: we do
not need to hold the entire universe constant in order to determine the
effect of the disturbance on the output. All we need to hold constant is
the reference signal, and everything that could materially affect the state
of the controlled variable (except the output, of course). It's the model
that tells us there is a controlled variable, and that shows us how to
isolate and identify it sufficiently well for experimental purposes. If we
can just protect that one variable from all significant disturbances other
than the one we manipulate, and verify from the data that the reference
signal was reasonably constant, we can forget about all other influences on
the whole organism. We don't need to average them out, because according to
the model they can have no effects that matter.
Or we can put that differently: if there are any other influences on the
whole organism that have an important effect, we will soon discover that
when the behavior fails to fit the model with the expected accuracy!
The difference between the PCT and conventional models is obviously highly
important. Under the conventional model, there is no way to say in advance
what relationship should be seen between the IV and the DV, other than the
basic one of "effect" versus "no effect", assuming a linear relationship.
There is no way of selecting manipulations of IVs or ways of measuring DVs
that should reveal particularly clear effects. The basic rationale is "try
it and see what happens." And the result is pretty clear: large variances
in all measured relationships, so that multiple determinations and multiple
subjects MUST be used to see any effects at all.
The PCT model tells what variables to measure (we must identify a
controlled variable or at least propose one), and how to measure them (we
must convert measures of disturbance and output into an equivalent effect
on the controlled variable). And where we are able to satisfy the
conditions demanded by the model, we get relationships with a much lower
variance than those found through the traditional approach.
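In practice that conversion can be a few lines of analysis. Here is a rough
sketch in Python, assuming a compensatory tracking task in which the
controlled variable is cursor position = handle position + disturbance, so
that both conversion functions are just identities; for any other task you
would substitute the actual environmental feedback function and disturbance
function:

    import numpy as np

    def equivalent_effects(output, disturbance, g=lambda o: o, h=lambda d: d):
        """Express output and disturbance in common units of the controlled
        variable.  g is the environmental feedback function, h the disturbance
        function; both default to the identity, as in simple tracking."""
        return g(np.asarray(output, float)), h(np.asarray(disturbance, float))

    def fit_to_control_model(output, disturbance, reference=0.0):
        """Under the model, output effect ~ reference - disturbance effect.
        Return the correlation with, and RMS deviation from, that prediction."""
        go, hd = equivalent_effects(output, disturbance)
        predicted = reference - hd
        rms = np.sqrt(np.mean((go - predicted) ** 2))
        return np.corrcoef(go, predicted)[0, 1], rms

Where the conditions demanded by the model are met, the correlation returned
by this kind of comparison is the 0.99-range figure mentioned earlier, and the
RMS deviation is small compared with the excursions of the disturbance.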
Actually, as I said some years ago and will now repeat, in statistical
terms, PCT only needs to give us two or three more standard deviations of
precision in order to make a qualitative difference in our ability to
predict behavior. In a traditional experiment, a good result is an effect
size that is twice the standard deviation of the measurements. That gives
us (I'm doing this from memory so correct me if I'm wrong) about 1 chance
in 20 that the effect appeared by chance. If we can improve our theories to
the point where effects are of the order of 4 times the standard
deviations, the odds against the result appearing by chance increase to
15,000 to one. At five times the standard deviation, the odds against
become 1.7 million to one.
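These odds can be checked directly from the two-tailed tail area of the normal
distribution (assuming roughly normal measurement error); for instance, in
Python with the scipy library:

    from scipy.stats import norm

    for k in (2, 3, 4, 5):
        p = 2 * norm.sf(k)             # two-tailed probability of a k-sigma result arising by chance
        print(f"{k} sd: p = {p:.3g}, odds against chance ~ {1 / p:,.0f} to 1")

This gives roughly 1 in 22 at 2 standard deviations, about 15,800 to one at 4,
and about 1.7 million to one at 5, in line with the figures above.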
So somewhere between 4 and 5 standard deviations, we cross into a new kind
of territory. A statement that has a 95% chance of being true (the
2-standard-deviation kind of truth) can be combined with no more than 12 or
so other statements if a conclusion that depends on all the statements is
to have a probability of truth equal to 0.5 -- the chance level. At 4
standard deviations, the argument can involve over 10,000 such "facts"
before the truth-probability of the conclusion drops to 0.5. And at 5
standard deviations the number of statements becomes over 400,000.
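The counting rule behind these numbers: if the statements are independent and
the conclusion requires all of them to be true, the probability of the
conclusion is the product of the individual probabilities, so the largest
usable number of statements is ln(0.5)/ln(p). A sketch of the calculation (the
exact counts depend on the independence assumption and on whether one- or
two-tailed areas are used):

    import math
    from scipy.stats import norm

    def max_combinable(p_true):
        """Largest number of independent statements, each true with probability
        p_true, whose conjunction still has probability of at least 0.5."""
        return math.floor(math.log(0.5) / math.log(p_true))

    for k in (2, 4, 5):
        p_true = 1 - 2 * norm.sf(k)    # chance that a k-sigma "fact" is not a fluke
        print(f"{k} sd facts: about {max_combinable(p_true):,} can be combined")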
In short, when the predictions have a 4 to 5 standard deviation fit to the
data, we get the kinds of facts on which we could build a science; that
would permit drawing valid conclusions from logical reasoning of some
degree of complexity. With 2-standard-deviation facts, the scope of logical
reasoning is reduced enormously, to the point where it is hardly possible
to draw a valid conclusion.
The "stability factor" we sometimes use to evaluate control is roughly the
effect size measured in standard deviations. A stability factor of 5 is
quite low; a well-practiced controller can achieve stability factors of
anywhere from 9 to 15, depending on task difficulty. The statement that the
controller is in fact acting like a control system has such a high
probability of being true that the numbers are far off the end of the chart
in my handbook. The highest odds-against given is for 7 standard
deviations: it is 4 times 10^13.
I'm drawing all these figures out of my old Handbook of Chemistry and
Physics, and am not very sure about my reasoning. It would be nice if
others more familiar with such statistical concepts would see what they
come up with in trying to replicate my calculations.
Best,
Bill P.