Using Group Data to Test Models of Individuals (was Re: Bill off net)

[From Rick Marken (2007.08.08.1900)]

Bill Powers (2007.08.08.1350 MDT)--

How about getting out one of those journals and going through a few issues
of it? Maybe Jeff can suggest one. I really think you're right -- it's a
rare paper that doesn't apply group data to individuals -- but we need some
concrete cases before us to focus the discussion.

I went to the APA journal site at:

http://www.apa.org/journals/

and this is the title of an article from Behavioral Neuroscience that
is a feature article at the site:

  "Exercise and Mental Stimulation Both Boost Mouse Memory Late in Life"

If you look at the article you will find that this is not true for
every mouse; this is an average result being pitched as something that
is true of every individual mouse.

Here's the title of the first article listed in this month's JEP:HPP:

  "Initial scene representations facilitate eye movement guidance in
visual search"

I have not read the article but I'll bet that this finding is true
only on average: initial scene representations probably didn't
facilitate eye movement guidance for everyone.

I would be surprised if Jeff could find one example of group research
in psychology where the researchers were careful to make it clear that
their results are true at the group level only; they say nothing about
how individuals work.

Best

Rick

[From Bill Powers (2007.08.09.0150 MDT)]

Rick Marken (2007.08.08.1900) –

Actually, I see no reason to suppose that their conclusions are true of
ANY mouse in the study. The problem is not just the statistics (presented
in as uncommunicative a way as humanly possible), but the leaps of
interpretation about the meaning of the various treatments to a mouse.
Mental stimulation? How do you measure that? Answer: you don’t, you assume
it. Reminds me of a Carver and Scheier study in which they
“measured” self-awareness by the presence or absence of a
mirror in the room. Applying statistics to a study like that is futile;
there’s nothing to study in the first place.

Best,

Bill P.

[From Rick Marken (2007.08.09.0830)]

Bill Powers (2007.08.09.0150 MDT)--

Rick Marken (2007.08.08.1900) --

If you look at the article you will find that this is not true for
every mouse; this is an average result being pitched as something that
is true of every individual mouse.

Actually, I see no reason to suppose that their conclusions are true of ANY
mouse in the study. The problem is not just the statistics (presented in as
uncommunicative a way as humanly possible), but the leaps of interpretation
about the meaning of the various treatments to a mouse.

Hey, I'm just trying to deal with one problem at a time;-)

Jeff said that there are all kinds of examples of psychological
research on groups where the researchers make it clear that the
results are relevant only to the group, not necessarily any individual
therein. So I went to the psychological research literature and tried
to find examples of group level data in psychology used to study only
groups (as is appropriate) or used to study individuals (which is
not). I quickly found 2 examples of studies where the group level data
is taken to tell us something about individual level processes. So
it's now:

Inappropriate use of group data 2
Appropriate use of group data 0

Maybe Jeff can even out the score or tilt the balance toward "appropriate".

Best

Rick

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Jeff Vancouver (2007.08.09.1215 EST)]

> Bill Powers (2007.08.09.0150 MDT)--

>> Rick Marken (2007.08.08.1900) --

>> If you look at the article you will find that this is not true for
>> every mouse; this is an average result being pitched as something that
>> is true of every individual mouse.

I skimmed this article. Given the discussion, I found it interesting that in
the results section the authors were careful to say group instead of mice,
while in the discussion they referred to mice. At the same time, they often
said the results "suggested" this or that. I implore my students not to say
results suggest something; only people do. Nonetheless, I see this
grammatical error all the time. I know what they really mean and I doubt it
really leads to misinterpretation. That is, it is probably no big deal. What
is my point? The argument is about what interpretations researchers and
readers are making of data and analyses. Thus, it seems the question is not
how you and I would interpret the text, because I interpret the researchers
as being largely aware of what the data could mean (i.e., a statistically
significant mean difference does not mean that all the mice in one group
differed from all the mice in another), whereas Rick and Bill seem to
interpret the opposite. That is, the question at hand is how these
statistics are interpreted by those in the target population (e.g.,
psychological researchers). That requires asking the interpreters (not just
us). To study that I might suggest constructing a questionnaire that asks
questions about an article that differentiate these two types of
interpretations (or other types or degrees of interpretational issues). For
example, we might ask a reader of the paper in question: which is the most
accurate interpretation of the results related to Figure 4, panel D?

a. All the young mice in the toy condition swam slower than all the other
young mice.
b. On average, the young mice in the toy condition swam slower than all the
other young mice.

Or
True/False
___ a. No young mouse in the toy condition swam faster than any young mouse
in the control condition.
___ b. Some young mouse in the toy condition might have swum faster than any
young mouse in the control condition.

We would have to agree on the questions and what would be most diagnostic,
but otherwise we are all just extrapolating our belief about how others
would interpret these reports. It is my contention that most (but not all of
course) researchers know better. But that empirical question cannot be
answered by reading what they say because shortcuts (like saying the data
suggest) have permeated scientific writing.
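
The distinction that the sample questionnaire items above are probing can be made concrete with a few made-up numbers. The sketch below is only an illustration: the swim speeds are hypothetical (they are not data from the mouse article), and the function names are invented for this example. It shows a case where the "on average" reading is true while the "all mice" reading is false.

```python
from statistics import mean

# Hypothetical swim speeds (cm/s), invented for illustration only.
# They are NOT taken from the article under discussion.
toy = [18.0, 19.5, 20.0, 21.0, 26.0]
control = [22.0, 23.0, 24.5, 25.0, 25.5]


def group_slower_on_average(a, b):
    """The 'on average' reading: mean of group a is below mean of group b."""
    return mean(a) < mean(b)


def every_member_slower(a, b):
    """The 'all mice' reading: every member of a is slower than every member of b."""
    return max(a) < min(b)


def count_reversals(a, b):
    """Number of (a, b) pairs in which the member of a is the faster one."""
    return sum(1 for x in a for y in b if x > y)


print("slower on average:", group_slower_on_average(toy, control))
print("every toy mouse slower:", every_member_slower(toy, control))
print("pairs where the toy mouse is faster:", count_reversals(toy, control))
```

With these numbers a reader who picks the "all" interpretation is wrong about one toy mouse, even though the group means differ in the reported direction.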

> Actually, I see no reason to suppose that their conclusions are true of ANY
> mouse in the study. The problem is not just the statistics (presented in as
> uncommunicative a way as humanly possible), but the leaps of interpretation
> about the meaning of the various treatments to a mouse.

Hey, I'm just trying to deal with one problem at a time;-)

The leap you refer to is the construct validity problem. That is, we
psychologists are often dealing with the problem of indirect measures or
manipulations of the underlying latent constructs we hypothesize are
involved. Indeed, when one infers the reference level from the TCV, one is
indirectly measuring that parameter. Otherwise, one gets dangerously close
to radical behaviorism, where one is restricted to a science of only the
observable. Now that that perspective has been largely rejected, we have to
be concerned that our indirect measures are contaminated or deficient.
Related to the issue above, writing differs on this. Some are careful to
speak of the variable (measure or manipulation) and not the construct when
discussing the result. Much more common is for researchers to refer to the
construct (like the article example) and maybe (if a reasonable possibility)
refer to construct validity issues in a limitations section. A lot depends
on reviewers and editors. I just had a paper accepted where in an earlier
version I had devoted a lot of discussion to the possible limitations of my
measures and manipulations (largely because reviewers had asked for it).
Yet, for this last journal (versions of it had been rejected from 4 journals
previously), the editor and reviewers thought this discussion was
unnecessary and just made the paper longer than necessary. So I deleted it.
I chalk this up to differences in what the reviewers/editors saw as
"reasonable" possibilities. It would not surprise me if someone questioned
the findings of that study by questioning the construct validity of one or
more of the measures/manipulations. That would lead to a study addressing
the issue with their measures and results. So it goes. We are always
questioning each other's variables, interpretations, generalizability, etc.
That is what we do: consider explanations and alternative explanations for
findings, each trying to promote (or undermine) some theory or another.

Jeff said that there are all kinds of examples of psychological
research on groups where the researchers make it clear that the
results are relevant only to the group, not necessarily any individual
therein.

I did not say it this way. For example, I would say that if one found a mean
difference between groups then it must mean that at least one individual in
the one group differed from at least one individual in the other group (I
think what you meant to say was not necessarily EVERY individual therein,
not ANY). Note that when Bill says he thinks the mice finding might not
refer to ANY mouse, I believe he is talking about the construct issue, not
the variable (group defining) issue. That is, he is not saying that it is
possible that no young mouse in the toy condition swam slower than any young
mouse in any other condition; he is saying that it is possible that no young
mouse who had a more mentally stimulating environment swam slower because
toys might not translate into mental stimulation. This latter case is
possible. How reasonable would be subject to the question of what other
thing in the mouse/environment system might toys be affecting. That is, the
reasonableness of interpretation is judged against alternative explanations.
Unfortunately, alternative explanations are limited to what
reviewers/editors/researchers can think of at the time. But what can you do?

So I went to the psychological research literature and tried
to find examples of group level data in psychology used to study only
groups (as is appropriate) or used to study individuals (which is
not). I quickly found 2 examples of studies where the group level data
is taken to tell us something about individual level processes. So
it's now:

Inappropriate use of group data 2
Appropriate use of group data 0

Maybe Jeff can even out the score or tilt the balance toward
"appropriate".

See above (i.e., I do not think the data you present is very relevant to the
issue at hand).

Although I am hesitant to do this (because it will mean even more of my time
is spent arguing with you all rather than trying to promote PCT to the
psychological community), I am inclined to submit our paper referenced above
as an example. The curious thing about that paper is that I discuss some of
the issues related to this debate within it. My position was similar to
yours. That is, we needed to look at individual models. I had 2 studies
where we examined these models in some detail. We were trying to compare
four possible empirical models of what an individual was doing. But we
rejected many individuals because they did not act consistently during the
study, so none of the models would have applied. This bothered reviewers.
However, there was one study where we had a between-person manipulation. In
such a case, one cannot remove individual subjects because that would create
a mortality threat to internal validity. Thus, we included all subjects. The
paper that was finally accepted is this one where no subjects were dropped.
The effect we found was smaller (because as with the other study, some were
not following instructions/trying to do the task well/or whatever), but
still diagnostic in terms of addressing our research question. Nonetheless, I
think it violates the prescriptions you two are suggesting (we look at
individual empirical [not process] models, but we never present individual
parameters, only average parameters, mostly using analysis and statistics
that the uninitiated will find hard to follow). In other words, you will
find a lot to complain about.

So if you want the paper, please send me your email (it is copyrighted to
APA, so I cannot post it to the list). It is not yet in proof stage, so you
might not want to print it (it is still in double-spaced form).

Jeff V.

[From Bill Powers (2007.08.09.1510 MDT)]

Jeff Vancouver (2007.08.09.1215 EST) –

If you’re just trying to see if there is a phenomenon somewhere in
the group of mice, group statistics can be useful. But a basic question
always has to be asked: why is it so hard to find regular relationships?
Could it be that the researchers are not looking where regular
relationships are to be found?

If I were advising graduate students who need to do a publishable study,
I would tell them to look for things the mice are controlling, not for
ways they react to stimuli. I would tell them that apparent reactions to
stimuli are probably illusions; any reactions that are observed are
almost certainly opposing disturbing effects the so-called stimulus is
having on some controlled variable. And the relationships they
hypothesize are generated out of their imaginations; only a very lucky
guess will get them even to the fringes of a real relationship, if they
don’t know what is being controlled.

I think that every PCT experiment I do produces correlations in at least
the 0.9s, and usually it takes only small adjustments of the conditions
to get them much higher than that (one thing that greatly improved
control data was to hold the mouse so the fingers don’t touch the table.
Sensations from the fingers provide variable amounts and kinds of
feedback that introduce uncertainties and distract attention from the
display).

If PCT is right, and control is the principle behind every behavior, then
why bother looking for anything else? Of course you can always try
something else, but if your correlations then drop from the usual 0.95
down below 0.8 or so, you can well conclude that the paydirt is in
control phenomena.

The final thing I would tell the graduate students is that if they can’t
get their correlations over 0.9, then they’re probably barking up the
wrong tree. They should junk their hypothesis and start over. It’s better
(and easier) to do this early than after committing a month to trying to
make the experiment work. If it’s going to work, it will do so right
away.
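
As a rough illustration of why control phenomena tend to give such high correlations, here is a toy simulation. It is not one of the tracking experiments described above: the integrating output function, the loop gain, the time step and the slow sinusoidal disturbance are all assumptions chosen for the sketch. The simulated system keeps a controlled variable (output plus disturbance) near a reference of zero, and the script then correlates the output with the mirror image of the disturbance.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_control(gain=10.0, dt=0.01, steps=6000, reference=0.0):
    """Toy control loop; all parameter values are assumed for illustration."""
    outputs, disturbances = [], []
    output = 0.0
    for i in range(steps):
        disturbance = math.sin(i * dt)       # slow sinusoidal disturbance
        controlled = output + disturbance    # the controlled variable
        # Integrating output function: output changes in proportion to error.
        output += gain * dt * (reference - controlled)
        outputs.append(output)
        disturbances.append(disturbance)
    return outputs, disturbances


outputs, disturbances = simulate_control()
# How closely does the output mirror (oppose) the disturbance?
r_opposition = pearson(outputs, [-d for d in disturbances])
print("correlation of output with -disturbance:", round(r_opposition, 3))
```

In this idealized loop the output looks like a "response" to the disturbance, but the relationship exists only because the output is cancelling the disturbance's effect on the controlled variable.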

Best,

Bill P.

[From Bill Powers (2007.08.09.1540 MDT)]

Rick Marken (2007.08.09.0830) –

I guess I’ll have to try the library to see some of those journals. All
the on-line journals want to charge me to see the full text.

Best,

Bill P.

[From Rick Marken (2007.08.09.2140)]

Bill Powers (2007.08.09.1510 MDT) --

If you're just trying to see if there is a phenomenon somewhere in the
group of mice, group statistics can be useful. But a basic question always
has to be asked: why is it so hard to find regular relationships? Could it
be that the researchers are not looking where regular relationships are to
be found?

This is a nice way to put it. Psychologists should just start looking
in the right place for regular relationships and that means looking
for the variables organisms control. But even looking in the right
place like this will not reveal regular relationships if the search is
done using groups of subjects.

On that note, I should mention that I realized that psychologists come
to conclusions about individual behavior based on group data because
that's what they are supposed to do based on the model of behavior
they use -- the general linear model of statistics. The model assumes
that the same law of behavior underlies the behavior of all subjects
in an experiment. Any difference in the way individual subjects
respond to different treatment conditions is considered to be the
result of random error.

This was brought home to me while I was teaching one-way within-subjects
ANOVA in statistics. As David Goldstein mentioned, any
difference in the way individual subjects respond to different levels
of an experimental treatment is called a subject X treatment
interaction. In one-way within-subjects ANOVA the subject X treatment
variance estimate is used as the error term in computing the F ratio.
What this means is that any _observed_ difference in the way subjects
respond across levels of the treatment is considered to be a result of
random error -- it's not a result of real differences between
subjects in terms of how they react to the treatment.

So when an ANOVA reveals a "significant" treatment effect over a group
of subjects, the conclusion is that there is a true treatment effect
(behavioral law) that is the same for all subjects; observed
differences in the way subjects behave across treatment conditions
(the subject X treatment interaction) are treated as error variance.
This conclusion is based on the statistical model that is used to
evaluate all psychological research and I believe it is why
researchers will say something like "mice respond to X by doing Y"
even though not all mice are seen to respond to X by doing Y. The
statement about the relationship between X and Y is thought to be true
of each individual mouse because that is basically the conclusion that
is warranted by the "significance" of the statistical test. Each mouse
is assumed to have the same transfer function that converts the
treatment conditions (X) into behavior (Y). But what is observed is
not the same for all mice because Y is also affected by error. The
statistics help you find the true behavioral law relating X to Y for
all individuals.
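
The computation described above can be sketched in a few lines. This is a minimal hand-rolled version of the one-way within-subjects F ratio, written for illustration rather than taken from any statistics package; the scores are made up, and the function and variable names are invented here. The point to notice is that the denominator of F is the subject X treatment mean square, so subject-to-subject differences in the size of the treatment effect are counted as error.

```python
def within_subjects_anova(scores):
    """One-way within-subjects ANOVA.

    scores[i][j] is the score of subject i under treatment level j.
    Returns the sums of squares and the F ratio, with the
    subject X treatment interaction used as the error term.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of treatment levels
    grand = sum(sum(row) for row in scores) / (n * k)
    treat_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Whatever is left over is the subject X treatment interaction.
    ss_sxt = ss_total - ss_treat - ss_subj

    df_treat = k - 1
    df_sxt = (n - 1) * (k - 1)
    f_ratio = (ss_treat / df_treat) / (ss_sxt / df_sxt)
    return {"ss_treat": ss_treat, "ss_subj": ss_subj,
            "ss_sxt": ss_sxt, "F": f_ratio}


# Hypothetical scores for 4 subjects under 2 treatment levels.
# Individual treatment effects (level 2 minus level 1) are 4, 2, 0 and 2.
scores = [[10, 14], [12, 14], [11, 11], [9, 11]]
result = within_subjects_anova(scores)
print(result)
```

In this made-up data set one subject shows no effect at all and another shows twice the average effect; the analysis folds both facts into the error term instead of reporting them as differences between subjects.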

Best

Rick

--
Richard S. Marken PhD
Lecturer in Psychology
UCLA
rsmarken@gmail.com

[From Bill Powers (2007.08.10.0718 MDT)]

Rick Marken (2007.08.09.2140) –

It’s good to see all this said in “official” statistical
language. I suspected the same thing but for different reasons. I had
noticed references to the “Sprague-Dawley Rat” in many papers,
and read how terrible it was when laboratory suppliers managed to
contaminate the genetic lines so they delivered impure rats. It seemed
that the properties of all S-D rats were considered to be the same –
which of course in some respects they are. But even then (this was in
undergraduate days) I doubted the implication that these animals weren’t
at all changed by their life experiences.

I think it’s clear that what you say is correct: the unspoken assumption
behind statistical analyses is that all subjects would show the same
effects of a standardized treatment if it weren’t for random errors that
mask the effects. This is why I was raising the possibility that subjects
could be very different, with a few subjects showing all the effects and
the rest not being affected at all. That’s an extreme case, but all cases
in between are possible: variations in the real effect ranging from small
to large, with truly random effects also ranging from none to very
large.
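
The extreme case mentioned above is easy to put into numbers. The sketch below uses invented values (ten subjects, two of whom show an effect of 10 units while the other eight show none) and an invented helper name; it is meant only to show how a group average can fail to describe any individual in the group.

```python
from statistics import mean


def summarize(effects, tolerance=0.5):
    """Return (average effect, number of subjects whose own effect is
    within `tolerance` of that average, number of subjects with any effect)."""
    average = mean(effects)
    near_average = sum(1 for e in effects if abs(e - average) <= tolerance)
    responders = sum(1 for e in effects if e != 0)
    return average, near_average, responders


# Hypothetical true effects: two responders, eight non-responders.
effects = [10.0, 10.0] + [0.0] * 8
average, near_average, responders = summarize(effects)
print("average effect:", average)
print("subjects whose effect is near the average:", near_average)
print("subjects showing any effect:", responders)
```

The intermediate cases can be explored by replacing the list with any other mix of effect sizes.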

I think this is supported by our experiences with experiments involving
control. As soon as you discover a way in which organisms really are
alike, suddenly the random noise goes away, or shrinks to the levels
we’re used to in other laboratory sciences. The apparent randomness in
other kinds of experiments is probably due to the fact that organisms
learn to control different variables in different ways, so the way they
react to disturbances is going to reflect those differences. Generalizing
about that sort of thing is just futile, at least for human beings in
whom reorganization plays such a large part in determining their
characteristics.

Best,

Bill P.