[From Bill Powers (971120.0430 MST)]
Bruce Abbott (971119.2000 EST)--
It's not a defense of the application of group statistics to individual
characteristics, and that's where you and others on CSGnet are missing the
boat. It's a defense of the application of group statistics to the
identification of influences on behavior (casting nets).
It's an identification of (1) _apparent_ influences on (2) _group_ behavior.
If this apparent influence on group means were as strong as the apparent
influence of cooking temperature on cookie brownness (which you gave as an
example in another post) I would object far less. _All_ of the cookies get
more done than they started out, even at the lower temperature. The normal
result is for the cookies to vary in brownness across temperatures far more
than individuals vary from the mean brownness at each temperature. But this
is not normal for ANOVA results from psychological experiments. What you
end up with in most psychological experiments is the equivalent of some
cookies getting _less_ done than when they were raw dough.
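
To make that contrast concrete, here is a minimal sketch in Python (the
numbers are invented for illustration, not taken from any real study)
comparing a cookie-sized effect with a typical psychology-sized one:

# Hypothetical numbers: compare the between-condition difference
# to the within-condition spread in each case.
import numpy as np

rng = np.random.default_rng(0)

def effect_ratio(mean_low, mean_high, within_sd, n=30):
    """Between-condition mean difference divided by the within-condition SD."""
    low = rng.normal(mean_low, within_sd, n)
    high = rng.normal(mean_high, within_sd, n)
    pooled_sd = np.sqrt((low.var() + high.var()) / 2.0)
    return (high.mean() - low.mean()) / pooled_sd

print("cookies:", effect_ratio(3.0, 8.0, within_sd=0.5))  # conditions barely overlap
print("psych:  ", effect_ratio(5.0, 5.5, within_sd=2.0))  # heavy overlap

With numbers like these, every cookie at the higher temperature is browner
than every cookie at the lower one, while the two "psychological" conditions
are nearly indistinguishable individual by individual.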
In psychological experiments, what we usually observe when the independent
variable is changed is that some subjects show an increase in the dependent
variable, some show no change, and some show a decrease. If there are
slightly more people who show an increase than a decrease, the customary
conclusion is stated as if _all_ of the people showed a _slight_ increase.
If the increase is highly significant, it is simply said that there is an
increase, without mentioning its magnitude. The obvious assumption is that
there was in fact an effect in every person, but that uncontrolled
variables introduced random noise that masked the effect in any individual.
The problem is that there is absolutely no way to prove that that
assumption is justified. It is just as likely that some individuals show
the effect and others show the opposite effect, while the rest are not
measurably affected at all. However, to interpret the results this way
would make any use of the hypothesis being tested untenable, because
invariably the hypothesis is stated as if it were true of _all_ people. If
you conclude that job satisfaction is inversely related to pay, you can't
blame others for interpreting this to mean that the opposite relation
doesn't occur with almost equal frequency. If you see this large variation
in the supposed effect, you have two choices: the hypothesis is true of all
people, but random variations make it difficult to verify, or the
hypothesis is false for about as many people as it is true.
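
Here is a minimal sketch of that second possibility (all counts and noise
levels are invented for illustration): individual effects of mixed sign that
still typically produce a "significant" positive group mean.

# Hypothetical counts and noise, for illustration only.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
true_effect = np.concatenate([
    np.full(400, +1.0),   # individuals who truly increase
    np.full(320, -1.0),   # individuals who truly decrease
    np.full(280,  0.0),   # individuals with no effect at all
])
observed = true_effect + rng.normal(0.0, 0.5, true_effect.size)

t, p = ttest_1samp(observed, 0.0)
print(f"group mean = {observed.mean():.3f}, t = {t:.2f}, p = {p:.4f}")
print(f"fraction whose true effect is negative: {(true_effect < 0).mean():.2f}")

The group mean comes out reliably positive even though the effect is negative
in roughly a third of the simulated individuals and absent in another third.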
Incidentally, I was asking about the actual numbers in that experiment for
real, not rhetorically. Can someone supply them?
I just wanted to
point out that this use of group methods is legitimate, even though it can
be taken only so far. It would be easy to conclude on the basis of Rick's
piece (and your followup to it) that these methods are worthless for this or
any other purpose. "The method that can most reliably produce an unending
output of superstitions is known as Analysis of Variance" is a case in
point. The problem lies not with the method but with the fact that its
results are often misinterpreted.
The method invites and almost demands misinterpretation. How about this
conclusion: when you give low pay for a job, the people who pride
themselves on doing a job for its own sake report higher satisfaction than
others who have no opinion on that subject. When you give high pay, _other_
people who consider monetary reward to be the index of satisfaction report
higher satisfaction than the others report, including those who would
report greater satisfaction for lower pay. So the results are not
indicating any general human characteristics; the differing rates of pay
are selecting different subpopulations. While it may be true that there is
an inverse relation between pay and job satisfaction, that is true only of
some people, and for the rest it is false or the exact opposite of the
truth. And since you have no way to detect these subpopulations in advance,
the hypothesis is, in practice, useless for dealing with individuals.
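
For what it's worth, here is a minimal sketch of that selection story (the
two "types," their response rules, and all numbers are invented for
illustration, not drawn from the actual experiment):

# Hypothetical subpopulations, for illustration only.
import numpy as np

rng = np.random.default_rng(2)
n_intrinsic, n_extrinsic = 120, 80   # 60% of one type, 40% of the other

def group_mean(pay):
    # Intrinsic types report LESS satisfaction as pay rises;
    # extrinsic types report MORE. Purely illustrative response rules.
    intrinsic = 5.0 - pay + rng.normal(0.0, 0.5, n_intrinsic)
    extrinsic = 5.0 + pay + rng.normal(0.0, 0.5, n_extrinsic)
    return np.concatenate([intrinsic, extrinsic]).mean()

print("group mean satisfaction, low pay: ", group_mean(pay=-1.0))
print("group mean satisfaction, high pay:", group_mean(pay=+1.0))

The group as a whole shows an inverse pay-satisfaction relation, yet for 40
percent of the simulated individuals the true relation is exactly the opposite.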
In Rick Marken's special issue of the American Behavioral Scientist, I
published a little simulation study in which 4000 control systems exerted
some degree of effort to obtain some amount of reward in the presence of
varying costs (disturbances). The individuals had reference levels for the
desired amount of reward that were distributed over a range around a mean
value. For the whole population, the result was an apparent increase of
effort with increasing reward. With 4000 samples, this was a highly
significant relationship.
This apparent relationship followed from the fact that individuals with
higher reference levels for reward had to work more to get the higher
levels of reward. But for _every individual_, the actual relationship was a
strong _decrease_ in effort with increasing reward. As the obtained reward
increased toward the reference level (due to random decreases in costs) the
behavior sharply _decreased_. So the apparent group relationship between
independent and dependent variables was as wrong as it could get as an
indicator of individual characteristics.
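
For readers without the paper at hand, here is a minimal sketch of that kind
of simulation: a simple proportional control loop with invented parameters,
not the actual code from the ABS study.

# Each "individual" is a proportional control system exerting effort E
# to obtain reward R = g*E - cost, driving R toward its reference level r.
import numpy as np

rng = np.random.default_rng(3)
g, k = 1.0, 50.0   # environment gain, controller gain (illustrative values)

def steady_state(r, cost):
    """Equilibrium of E = k*(r - R) with R = g*E - cost."""
    effort = k * (r + cost) / (1.0 + g * k)
    reward = g * effort - cost
    return effort, reward

# 4000 individuals, each with its own reference level, sampled once:
r = rng.normal(100.0, 15.0, 4000)       # reference levels for reward
cost = rng.uniform(0.0, 50.0, 4000)     # randomly varying costs
E, R = steady_state(r, cost)
print("between-individual corr(reward, effort):", np.corrcoef(R, E)[0, 1])

# One individual (fixed reference) observed under many different costs:
E1, R1 = steady_state(100.0, rng.uniform(0.0, 50.0, 1000))
print("within-individual corr(reward, effort): ", np.corrcoef(R1, E1)[0, 1])

In this simplified linear loop the between-individual correlation comes out
strongly positive while the within-individual correlation is exactly
negative, which is the point of the paper.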
This paper, besides being about PCT, was intended as a cautionary tale. The
_apparent_ relationship you get from varying IVs and measuring DVs over a
population is NO INDICATOR AT ALL of the actual relationship between IV and
DV for any individual in the group, and can be completely wrong for ALL of
them, as in my example. There is simply no substitute for measuring
individual characteristics, one at a time.
The problem is that what you end up with is a characteristic of a group,
not of an individual. At any given time, some members of the group may show
the effect on the dependent variable while others do not. At a different
time, different members may show the effect while a different subset does not.
There is absolutely no evidence that this effect is shown consistently over
time by any individual. To show that, you would have to study each
individual over time.
What is shown in such an evaluation is that behavior differed on average
across the levels of the independent variable, under the conditions of the
study, when other factors are unlikely to account for the difference. This
indicates that the independent variable very likely has some influence on
the behavior that was observed and recorded, in the sample of individuals
tested. It does not indicate that this variable was effective in every
individual tested, nor does it show that this variable would have an effect
consistently over time in the same individual. Nevertheless, it is a
variable having an influence on this dependent measure in enough
individuals, at the time of testing, to be detected by the procedure.
But there is no reason to think that it is a correct indicator of any
individual's characteristics, as I tried to show in my ABS paper. You're
echoing the standard rationale for statistical studies, and that standard
rationale is simply wrong.
If you're dealing strictly with populations, as would be the case for
educational institutions, insurance companies, government programs, and
market research organizations, then you don't care _why_ any observed
population characteristic exists. You simply take advantage of it, to your
profit. That's where casting nets is a valid approach. But if your success
in predicting population effects leads you to think you've learned
something about human nature, you're simply deluded. You've learned nothing
about any individual. To understand individual characteristics, you have to
use the method of testing specimens, and this means making and testing
models. If people share anything in common, it is at the level of
organization, not behavior. We are all control systems, but what we
control, and how, and to what ends, is almost infinitely variable.
And if you did study the individual over many trials,
you would soon find that _all_ individuals caught on to the fact that a low
evaluation led to high pay.
Let's at least get the experiment right. A low evaluation did not lead to
high pay. Low pay tended to lead to a higher evaluation, after the fact.
I haven't read the report, but even accepting your correction, what you say
is most likely false. Low pay did NOT uniformly lead to a higher evaluation
after the fact; in some subjects it led to a LOWER evaluation. You're simply
ignoring the counterexamples, treating them as meaningless statistical
fluctuations. That's the very assumption to which I'm objecting.
If you had to repeat this experiment with a single individual again and
again, the individual would perceive any pattern that existed. If there
were no pattern, that is, if job performance were equally likely to be
followed by high or low pay, I would expect the individual to abandon any
initial bias. If there were a pattern -- an inverse relationship
predominated -- I would not be surprised if the pattern were discovered and
used to increase pay.
But all that's a minor quibble compared with the most glaring fault of this
supposed "evidence in favor of the theory." In fact, the theory is
disconfirmed by every individual who did not behave as predicted.
In this study there is _no_ individual who can be said _not_ to behave as
predicted. As I pointed out in my post, if a person in the low-pay
condition gave a low liking-rating, there is no way to tell whether it
wouldn't have been even lower in the high-pay condition (barring a floor
effect).
Then there is no individual who could be said to behave as predicted, either.
Again, my argument was not in favor of using group data to predict
individual behavior. What I argued is that group procedures can be used to
identify variables that are effective (in at least most of those tested in
the experiment, under the test conditions). As to whether a theory is a
theory or "no theory at all" if it cannot predict individual behavior, I
suggest that by your definition atmospheric physics offers no theory at all
of the weather.
That's a bad example: there is only one global weather system, and the
predictiveness of the theory can be judged only over many forecasts.
Anyway, modern weather theory is based more and more on models, which makes
it into an example of testing specimens.
I'm not arguing that the discovery of a relationship between levels of a
variable and average performance constitutes a universally-applicable Law of
behavior, although clearly you are imagining that I am. (As a reality check
you might try re-reading my post. You won't find any such argument there.)
What I find are statements like "low pay is followed by a high evaluation."
If that doesn't sound like a universally-applicable law of behavior, I
don't know what does. How would that sound if you described the results
more truthfully? "Sometimes people will give a high evaluation to a job if
the pay is low, and sometimes they won't, and I can't tell you when the
one or the other result will occur."
What came out of this study was a relationship -- confirmed in a large
number of followup studies -- between two variables that can be observed in
more-or-less normal college students when other factors are statistically
equated. A lot more would have to be known about other interacting factors
before one would be in a position to use this information for individual
prediction.
In _all_ more-or-less normal college students? Obviously not. And anyway,
"confirming" this relation for more populations shows only that it exists
in populations. It is quite possible that the actual relationship in every
individual is different from or even opposite to the population effect. And
don't tell me that's impossible: I've proven that it's possible.
What we need to know are the chances of incorrectly predicting a person's
evaluation of a task for which the pay is high or low. To determine this,
we need to know how many people behaved in the expected way, and how many
in the other way. Those who understand statistics can then compute the
chances that a prediction will be wrong for a given individual.
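
A minimal sketch of that computation follows; the counts below are
placeholders, since the actual numbers were requested above and are not yet
in hand.

# The counts here are HYPOTHETICAL placeholders, not the real data.
import math

n_as_predicted = 60   # hypothetical: behaved in the predicted direction
n_opposite = 40       # hypothetical: behaved in the other direction
n = n_as_predicted + n_opposite

p_wrong = n_opposite / n
half_width = 1.96 * math.sqrt(p_wrong * (1.0 - p_wrong) / n)  # ~95% CI
print(f"chance of a wrong individual prediction: {p_wrong:.2f} +/- {half_width:.2f}")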
This assumes that this variable alone would be used for this purpose,
without any further investigation of conditions under which, in the
individual, it would be expected to influence the person's rating of the
task in the predicted way. Geez, Bill, I _agree_ that it probably isn't
much use by itself. I never argued that it would be.
Then can we drop the idea that this experiment somehow "supports" the idea
of cognitive dissonance?
I am not defending any "heinous scientific crimes," Bill. I certainly do
agree, however, that there has been a lot of knee-jerking going on in
response to my post. At the beginning of your post you said you were
disappointed in me. If anyone has a right to feel disappointed, I do. I
expected a more careful reading on your part of what I had to say than you
evidently gave. You merely used my post as an excuse to trot out all your
old arguments against a position I wasn't even defending.
Perhaps I do you an injustice. Are you now agreeing that population studies
can't tell us anything of use about individual characteristics?
Best,
Bill P.