IV-DV

[From Bill Powers (930330.0800)]

Ken Hacker (930329) --

I understand and appreciate the point about rejecting the IV-DV
view for all human behaviors, but are there not some questions
about human behavior for which it is useful -- if we take
causality out of the assumptions? For example, if I test two
groups of students (which I will be doing in the fall), one
with one type of learning program and one with another, and see
what differences there are in knowledge retention, recall,
etc., what is wrong with what I am doing?

I don't want to say there is something wrong with what you're
doing, but I would like to know what you ARE doing. As you
actually do experiments of the kind you describe, you can
probably answer some questions about it (I seem to be in a
question-asking mode left over from talking with Greg Williams).

If you do this experiment and find that there is a significant
difference in learning between the groups, to what will you
attribute this difference?

If one learning program is associated with a significant
improvement in learning, would you recommend using it in the
future, over the other?

My last question would probably be answered best using the
results of some study like this that you've already done. Since I
don't know what it is, I'll frame the question generally.

Assuming that there is some treatment that differs between two
groups, and that the different treatments yielded a significant
difference in performance between the two groups:

1. How many people took part in the study?

For each of the treatments:

2. How many people clearly performed better,

3. how many clearly performed worse,

4. and how many did not clearly perform better or worse?


--------------------------------------------------------------
Best,

Bill P.

FROM KEN HACKER [930330]:

Bill, I sense that you want me to say that I will attribute increased
recall or retention to the learning program. Aha. BUT, I am not doing
that. I am going to attribute whatever significant differences there
are to what the subjects appear to need in order to learn what I am
trying to teach them with the program. Essentially, the program is a
self-paced tutorial where students learn how to use computer
conferencing. I am trying to compare the program method (each student
learns by him or herself with a PC) versus a lecture-by-the-expert
method of training. You see, I am trying to find the best method of
teaching-learning from the point of view that the student has needs
and those needs will best be met by one method over the other (at
least that's my hypothesis). KEN

[FROM: Dennis Delprato (930331)]

KEN HACKER [930330]:

     Bill, I sense that you want me to say that I will attribute increased
     recall or retention to the learning program. Aha. BUT, I am not doing
     that. I am going to attribute whatever significant differences there
     are to what the subjects appear to need in order to learn what I am
     trying to teach them with the program. Essentially, the program is a
     self-paced tutorial where students learn how to use computer
     conferencing. I am trying to compare the program method (each student
     learns by him or herself with a PC) versus a lecture-by-the-expert
     method of training. You see, I am trying to find the best method of
     teaching-learning from the point of view that the student has needs
     and those needs will best be met by one method over the other (at
     least that's my hypothesis).

Profound. You show how actuarial research (necessary for addressing
questions such as which service delivery system will be more
effective over the long haul), although PROCEDURALLY following
the classic IV-DV experimental model and the hypothesis-testing
inferential statistics that behavioral scientists deftly integrated
with it, need not be INTERPRETED in the conventional way.
As far as individuals are concerned, outcomes from actuarial research
tell us only about control systems. You will find out whether or not one
method is any more effective than the other at permitting the
AVERAGE member of your population to adjust to particular
circumstances according to particular criteria. Furthermore,
if your experiment is flawed by confounding (as conventionally
defined), you will not be able to draw a reliable conclusion about how
your manipulated variable alone relates to the adjustments of the
average member of the population.

You will make no claims about learning anything about individual
control systems (fundamental principles). The concern of the
study at issue is actuarial.

Dennis Delprato
psy_delprato@emunix.emich.edu

[From Bill Powers (930428.0700)]

General, on IV-DV:

The term "IV-DV" threatens to degenerate on this net into a
stereotype of an approach to human behavior. All that this phrase
means is that one variable is taken to depend on another and the
degree and form of the dependence is investigated experimentally.
This is a perfectly respectable scientific procedure. Want to
know how the concentration of salt affects the boiling point of
water? Keep the atmospheric pressure constant, carefully vary the
salt content, and carefully measure the boiling point. You can
find relationships like this throughout the Handbook of Chemistry
and Physics, and so far nobody has suggested anything
methodologically wrong with these tables and formulae.
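
For concreteness, here is the sort of lawful relationship involved --
a back-of-the-envelope sketch using the standard dilute-solution
colligative approximation, not the actual Handbook tables (the
function name and the sample molalities are invented for
illustration):

# Boiling-point elevation of water with dissolved NaCl: the IV is the
# molality of salt, the DV is the boiling point. The standard
# approximation is delta_Tb = i * Kb * m, with Kb(water) = 0.512
# K*kg/mol and i of roughly 2 for fully dissociated NaCl.

def boiling_point_elevation(molality, i=2.0, Kb=0.512):
    """Approximate rise in boiling point (deg C) at a given molality."""
    return i * Kb * molality

for m in (0.0, 0.5, 1.0, 2.0):      # mol NaCl per kg of water (the IV)
    print(f"{m:.1f} mol/kg -> boiling point of about "
          f"{100.0 + boiling_point_elevation(m):.2f} C")

The point is only that the dependence is clean, lawful, and
reproducible to the limits of measurement.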

If we're going to object to a procedure for investigating
behavior, let's not indulge in synecdoche, but say exactly what
it is about the method to which we object. There can be no valid
objection to the IV-DV approach itself.

The basic problem with the IV-DV approach as used in the bulk of
the behavioral sciences is that it is badly used; that bad or
inconclusive measures of IV-DV relationships are not discarded,
but are published. The basic valid approach has been turned into
a cookbook procedure that substitutes crank-turning for analysis,
thought, and modeling. The standards for acceptance of an
apparent IV-DV relationship have been lowered to the point where
practically anything that affects anything else, by however
indirect and unreliable a path, for however small a proportion of
the population, under however ill-defined a set of circumstances,
is taken as a real measure of something important, and is
thenceforth spoken of as if it were just as reliable a
relationship as the dependence of the boiling point of water on
the amount of dissolved salt.

While I was in Boulder, I spent some time in the library looking
through a few journals. By chance, I looked first through two
issues of the 1993 volume (29) of the Journal of Experimental
Social Psychology. With few exceptions, the articles were of the
form "the effect of A on B." One article went further: the title
was "Directional questions direct self-conceptions."

All of the articles rested on some kind of ANOVA, primarily F-
tests, and the justification for the conclusions was cited, for
example, as "F(1,82) = 7.88, p < 0.01." No individual data were
given; it was impossible to tell how many subjects behaved
contrary to the hypothesis or showed no effect. There was no
indication, ever, that the conclusion was not true of all Homo
sapiens.

I suppose that a person who understood F-tests (how about some
help, Gary) might be able to deduce the number of people in such
studies who didn't show the effect cited as universal. Even I
could see, in some cases, that there had to be numerous
exceptions. For example, paraphrasing,

Subjects covertly primed rated John less positively (M = 21.32)
than subjects not primed (M=22.78). Ratings were significantly
correlated with the independent "priming" variable: r(118) =
0.35, p < 0.001.

[Skowronski, J.J., "Explicit vs. implicit impression formation: The
differing effects of overt labeling and covert priming on memory and
impressions," J. Exp. Soc. Psychol. _29_, 17-41 (1993)]

When means differ by only 1.46 parts out of 22, it's clear that
many of the 120 students must have violated the generalization, so
the conclusion can hold for only something close to half of the
students. The coefficient of uselessness is 0.94, showing the same
thing. The authors are teasing a small effect out of an almost equal
number of examples and counterexamples. In another study, "When
warning succeeds ...", a rating scale ran from -5 to +5, and the mean
self-ratings were 0.89 in one case and -0.92 in the other. A large
number of the subjects must have given ratings in the opposite order
from the one finally reported.
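
To make the arithmetic behind such figures explicit (this gloss is
added here, not taken from the articles): for a one-degree-of-freedom
F test, r^2 = F/(F + df_error), and taking the coefficient of
uselessness to be sqrt(1 - r^2) -- the share of the DV's standard
deviation that the IV leaves unaccounted for -- reproduces the 0.94
cited above:

import math

def r_from_F(F, df_error):
    """Equivalent correlation for a one-degree-of-freedom F statistic."""
    return math.sqrt(F / (F + df_error))

def coefficient_of_uselessness(r):
    """sqrt(1 - r^2): the fraction of the DV's standard deviation left unexplained."""
    return math.sqrt(1.0 - r ** 2)

# "F(1,82) = 7.88, p < 0.01" corresponds to a correlation of only about 0.30:
print(round(r_from_F(7.88, 82), 2))

# r(118) = 0.35 leaves about 94 percent of the standard deviation
# unexplained -- the 0.94 quoted in the text:
print(round(coefficient_of_uselessness(0.35), 2))

Either way, most of the individual variation is untouched by the
"significant" effect.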

So what we're talking about here is not a bad methodology, but
bad science based on equivocal findings.

The IV-DV approach is not incompatible with a model-based
approach or with obtaining highly reliable data. In the Journal
of Experimental Psychology - General, I found a gem by Mary Kay
Stevenson, "Decision-making with long-term consequences: temporal
discounting for single and multiple outcomes in the future" (JEP-
General, _122_ #1, 3-22 (1993)). Mary Kay Stevenson, 1364 Dept. of
Psychology, Psychological Sciences Building, Purdue University,
W. Lafayette, Indiana 47907-1364:
mksb@psych.purdue.edu.

This paper used old stand-bys like questionnaires and rating
scales, but it had some rationale in the observation that during
conditioning, delaying a consequence of a behavior lowers the
strength of the conditioning. It also freely postulated a
thinking organism making judgements -- this was actually an
experiment with high-level perceptions. Moreover, there was a
systematic model behind the analysis, and an attempt to fit an
analytical form to the data rather than just do a standard ANOVA.

Furthermore -- oh, unheard-of procedure -- Ms. Stevenson actually
replicated the experiment with 5 randomly-selected individuals,
fitting the model to each individual's data and verifying that
the curve for each one was concave in the right direction.

The mathematical model predicted between 97 and 99 percent of the
variance in the data.

I didn't have time to read the article carefully, but it
certainly seemed to show that high standards were applied and
that an IV-DV approach can yield data that anyone would call
scientific. All that's required is that one think like a
scientist. A LOT of work went into this paper. If only papers in
psychology done to this standard were published, all the
different JEPs would fit into a single issue.

In JEP-Human Perception and Performance, there was a good
control-theory experiment:

Viviani, P. and Stucchi, N. Biological movements look uniform:
evidence of motor-perceptual interactions (JEP-HPP _18_ #3, 603-
623, August 1992).

Here the authors presented subjects with spots of light moving in
ellipses and "scribbles" on a PC screen, and had them press the
">" or "<" key to make the motion look uniform (as many trials as
needed). The key altered an exponent in a theoretical expression
used to relate tangential velocity to radius of curvature in the
model. The correlation between the formula's predictions with an
exponent of 2/3 (used as a generative model) and the subjects'
adjustments of the exponent was 0.896, with slope = 0.336 and
intercept = 0.090.
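
To make the procedure concrete, here is a sketch of the kind of
generative model involved -- constructed here for illustration, not
the authors' code, and with invented function names and ellipse
parameters. The spot's speed along the path is tied to the local
radius of curvature by a power law whose exponent the subject nudges
up or down until the motion looks uniform (the classic "two-thirds
power law" is usually written for angular velocity versus curvature;
in the speed-versus-radius-of-curvature form used here the
corresponding exponent is 1/3):

import numpy as np

def ellipse_radius_of_curvature(theta, a=2.0, b=1.0):
    """Radius of curvature of the ellipse x = a*cos(theta), y = b*sin(theta)."""
    return (a**2 * np.sin(theta)**2 + b**2 * np.cos(theta)**2) ** 1.5 / (a * b)

def spot_speed(theta, beta, K=1.0):
    """Power-law speed profile along the path: V = K * R**beta."""
    return K * ellipse_radius_of_curvature(theta) ** beta

# beta = 0 gives physically uniform motion; a positive beta makes the
# spot slow down in the tight curves and speed up on the flatter parts,
# the way natural movement does. The subject's ">" and "<" keypresses
# would adjust beta until the display *looks* uniform.
theta = np.linspace(0.0, 2.0 * np.pi, 200)
for beta in (0.0, 1.0 / 3.0):
    v = spot_speed(theta, beta)
    print(f"beta = {beta:.2f}: speed ranges from {v.min():.2f} to {v.max():.2f}")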

This is just the kind of experiment a PCTer would do to explore
hypotheses about what a subject is perceiving. By giving the
subject control over the perception in a specified dimension, the
experiment allows the subject to bring the perception to a
specified state -- here, uniformity of motion -- and thus reveals
a possible controlled variable (at the "transition" level?). The
authors didn't explain what they were doing in that way, but this
is clearly a good PCT experiment. Even the correlation was
respectable, if not outstanding (the formula was rather
arbitrary, so it should be possible to improve the correlation
considerably by looking carefully at the way the formula
misrepresented the data).

There is a world of difference between the kinds of experiments
reported in J Exp. Soc. Psych and the two described above (and
between the two described above and most of the others in JEP).

From good experiments, even if one doesn't buy the interpretation,
one can go on to better experiments. From bad experiments there is no
place to go: you say "Oh" and go on to something completely different.


------------------------------------------------------------
Best,

Bill P.

[From Dag Forssell (930528 16.20)]

_Scientific American_ June 1993 features an article on "The dubious
link between genes and behavior".

"The century-old idea that genetics can explain, predict and even
modify human behavior is back in vogue. ... But some of the findings
have been retracted, and critics charge that the others are based on
flimsy evidence."

I don't see that the article is against IV-DV as such, but it sure is
critical of _Science_ editor Daniel E. Koshland, Jr. The article includes
"Behavioral Genetics: A Lack-of-Progress Report," which debunks findings
on crime, manic depression, schizophrenia, alcoholism, intelligence,
and homosexuality.

Why not offer Little Man to _Scientific American_? They could do a good
job of exposing PCT to a large audience of people without a vested
interest in being right.

Best, Dag

[From Bill Powers (951112.0900 MST)]

Bruce Abbott (951112.1015 EST) -- (replying to Gary Cziko) --

     If your disturbance produces actions, you have disturbed a
     controlled variable, no? So this method IS telling you something
     valuable.

It's telling you that something about the IV (the disturbance), or what
you are doing while varying it, is affecting a controlled variable, and
that something about the DV (the action that you're measuring) is
affecting the same controlled variable in the other direction. However,
if you're not measuring the IV and the DV in the dimensions that are
actually related to the controlled variable, you may be led to express
the relationship in a way that is irrelevant to both effects on the
controlled variable. You can't tell whether you're measuring the IV and
DV correctly unless you know what the controlled variable is (if there
is one, which I assume here to be provable).

If you were measuring the IV and the DV correctly, and they are actually
related to some controlled variable, you should find that the
correlation comes close to a perfect -1.0. To the extent that your way
of measuring them is not appropriate to the controlled variable, you
will find that the correlation will be considerably less than 1.0 and
may be positive or negative, depending on how you defined the
measurement scale (e.g., assigning a plus sign to pushing instead of
pulling).
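
A quick simulation makes the point -- a sketch constructed for this
illustration, not anything from the original exchange. A simple
integrating control system holds a variable at its reference while a
slowly varying disturbance (the IV) pushes on that same variable; the
system's output (the DV) then correlates with the disturbance at very
nearly -1.0, provided both are measured in the dimension that
actually affects the controlled variable:

import numpy as np

rng = np.random.default_rng(0)

dt, gain, reference = 0.01, 50.0, 0.0
steps = 5000

# The disturbance is a random walk, which changes slowly relative to
# the loop's correction rate.
disturbance = np.cumsum(rng.normal(0.0, 0.1, steps))

action = 0.0
actions, cvs = [], []

for d in disturbance:
    cv = action + d               # environment: action and disturbance add
    error = reference - cv        # comparator
    action += gain * error * dt   # integrating output function
    actions.append(action)
    cvs.append(cv)

actions = np.array(actions)
print("corr(disturbance, action):", round(np.corrcoef(disturbance, actions)[0, 1], 3))
print("variance of controlled variable:", round(float(np.var(cvs)), 4))

Measure the "wrong" aspect of either variable (its sign, say, or only
one component of it) and the same loop yields a much weaker, possibly
sign-reversed correlation, which is the situation described above.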

The IV-DV approach can get you started looking for a controlled variable
if you understand IV and DV as disturbance and action. However, what you
initially define as IV and DV may have to be radically changed before
you can say what the actual controlled variable is. So IV-DV analysis
has no necessary relevance to behavioral organization, especially if you
take your initial definitions as the final ones.

I have used the example of seeing that when a window is opened (IV), a
man puts on a sweater (DV). This might prove to be a statistically
significant relationship, but what does opening a window have to do with
putting on a sweater? This is just an empirical, and as it stands
baffling, fact. If you're selling sweaters you might be able to take
some advantage of it, but if you want to understand behavior it's
useless. It's just another fact to file away against the day when those
future superscientists finally Explain It All.

To make sense of this behavior you have to figure out what it is about
opening the window, and what it is about putting on a sweater, that
matters. What matters is not to be found in either the IV or the DV: it
is to be found in the controlled variable, the temperature of the man's
skin. Once you understand what the controlled variable is, you know what
you need to measure about the IV and the DV. For the IV, you now know
that what matters is not the rate at which the window opens, or the
number of centimeters by which it is open, or the sounds that come
through the opening, or whether it is the front or side window that is
opened, but the effect on the room air temperature. And for the DV you
now know that it is not the weight of the sweater or its color or its
stylishness or the way it buttons or slips over the head that matters,
but its insulating properties. You now understand what matters about the
IV and DV, and you also understand WHY these aspects of each matter. You
also understand the exceptions to the rule, the cases where opening the
window does not result in putting on a sweater, or where opening it
results in the man's closing it again, or putting on a coat, or lighting
a fire in the fireplace -- other "responses" which are seemingly very
different until you understand their effect on the controlled variable.

And finally, you can now predict many other IV-DV relationships that you
have never seen before. Anything that tends to lower the temperature of
the man's skin will be an IV that seems to cause any one of many
responses that tend to raise the temperature of the man's skin. You have
brought a whole family of IV-DV observations, those observed and those
not yet observed, under a single explanatory principle.

What the IV-DV approach gets you is a random collection of apparent
cause-effect relationships. In trying to make sense of these
relationships, the usual procedure is to generalize, to try to find a
class of IVs that tends to result in another class of DVs, as if it is
something that the IVs have in common that makes them effective in
causing something that the DVs have in common. Without control theory it
is very unlikely that the real explanation will be found: that the IVs
all affect the same controlled variable, and the DVs affect the same
controlled variable in the opposite direction.


-----------------------------------------------------------------------
Best,

Bill P.