Repeated-measures ANOVA

[From Mike Acree (971205.1020) PST]

Rick Marken (971204.0825)--

Bruce Abbott (971204.0945 EST)--

A repeated measures design is an individual subject design replicated
as many times as there are subjects. Nothing precludes the
investigator from examining the data for each individual separately,
and large differences in the trends of individual subjects would
show up as a large error term within the ANOVA, as well as big
standard errors around at least some of the individual means.

Everything you say here is wrong.

I was about to jump in and say that there was _nothing_ wrong in this
paragraph from a statistical point of view. But then I turned the page
and found that Bruce (971204.1210 EST) was quite capable of taking care
of himself on statistical issues. Not really surprising: even the
issue I quibbled about a few days ago (the meaning of a significance
test) is one I should have known he was sensitive to, since his textbook
is generally careful on that point, when many are not.

Mike

[From Rick Marken (971206.1510)]

Bruce Abbott (971204.0945 EST)--

A repeated measures design is an individual subject design
replicated as many times as there are subjects. Nothing
precludes the investigator from examining the data for each
individual separately, and large differences in the trends
of individual subjects would show up as a large error term
within the ANOVA, as well as big standard errors around at
least some of the individual means.

Me --

Everything you say here is wrong.

Mike Acree (971205.1020) --

I was about to jump in and say that there was _nothing_ wrong
in this paragraph from a statistical point of view.

What is wrong about it is what is implied. The main incorrect
implication is that the repeated measures (RM) ANOVA has anything
to do with subject "trends". The RM ANOVA is based on the following
model of scores: x(i,j) = kIV + S(i) + e(i,j). This model lets you
get estimates of between-subjects error (S(i)) variance _as well as_
random error (e(i,j)) variance. The only "trend" in the model
is assumed to come from the effect of the IV.

The individual scores of a subject across levels of the IV are
not looked at as an individual _trend_ from the point of view of
the RM ANOVA model. As in any group statistical analysis, the goal
is to detect the "true" effect (k) of the IV by _averaging over
subjects_. The scores of any subject in each condition of the
experiment represent (according to the RM ANOVA model) the effect
of the IV plus any constant subject contribution (Si) _plus_ an
independent error score (e(i,j)) for that condition.

Say that the true responses of subject i at three levels of an
IV are 1, 2, 3. Assume that the subject's contribution (Si) is
10 and that the error (e(i,j)) in each condition is 2, -1, -3.
So what we _observe_ for this subject is: 13, 11, 10. This
"trend" is the opposite of the real effect of the IV on this
subject because of the particular values of the _random_ error
contribution to the scores. We test many subjects in a RM design
to _average out_ these errors and pick up what is presumed
to be the _true_ effect (k) of the IV on each subject -- which
(according to the RM ANOVA model) is the _same_ for each subject!!

What we _gain_ by using the RM (rather than the completely
randomized) design is the ability to _factor out_ variance due
to subject (Si) differences (the fact that Si for each subject
is different). By removing this error variance _plus_ the
random error variance we have a better chance of _detecting_
(via statistical significance) the "true" effect of the IV on
the DV.
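
To make the model concrete, here is a minimal sketch in Python. The
particular numbers (ten subjects, three levels, the sizes of k, S(i)
and the error spread) are made up for illustration, not taken from any
real study:

import numpy as np

rng = np.random.default_rng(0)

n_subj, n_lev = 10, 3
k = np.array([1.0, 2.0, 3.0])               # "true" IV effect, same for every subject
S = rng.normal(10.0, 3.0, n_subj)           # constant subject contributions S(i)
e = rng.normal(0.0, 2.0, (n_subj, n_lev))   # independent error e(i,j)

x = k[None, :] + S[:, None] + e             # observed scores x(i,j) = kIV + S(i) + e(i,j)

# Repeated measures partition of the total sum of squares
grand = x.mean()
ss_iv    = n_subj * ((x.mean(axis=0) - grand) ** 2).sum()  # effect of the IV
ss_subj  = n_lev  * ((x.mean(axis=1) - grand) ** 2).sum()  # between-subjects, factored out
ss_error = ((x - grand) ** 2).sum() - ss_iv - ss_subj      # residual error term

df_iv, df_err = n_lev - 1, (n_subj - 1) * (n_lev - 1)
F = (ss_iv / df_iv) / (ss_error / df_err)
print("F(%d,%d) = %.2f" % (df_iv, df_err, F))

The ss_subj line is the gain described above: the S(i) differences are
pulled out of the error term before F is computed.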

So, while, as Bruce Abbott says, nothing precludes the
investigator from examining the data for each individual
separately and looking for large differences in the trends
over subjects, this would be a pretty stupid thing for the
investigator to do from the point of view of the RM ANOVA
model, since the trends the investigator observes for each
subject do not represent the "true" effect of the IV on the
DV for that subject.

So what Bruce says above is not only statistically wrong
(in what it implies) it also gives the misleading impression
that the RM design is used _with the aim_ of detecting
individual trends. It is not and should not be so used,
even from a _conventional_ perspective. If Bruce recommended
using RM ANOVA in this way in his statistics textbook he would be
reprimanded by other conventional psychologists who have a
better understanding of statistical techniques and the models
on which they are based.

If you (Mike) and Bruce want to defend conventional methodology
I think it's only fair that you defend the methodology that is
actually used in conventional psychology and not one that you
invent on the spot -- one that might seem more compatible with
the PCT focus on the study of individuals.

Bruce Abbott (971204.1605 EST)--

The repeated measures ANOVA was significant,
F(2,18)=4.456, p=.0268

Well, there you jolly well are, aren't you;-)

Oops. I guess not. More BS from BA follows:

However, this ANOVA requires that the variances in the
populations be equal (homogeneity of variance assumption)

The things we do for love of a dream defunct;-) How many
studies with significant results have not been published
because the data failed the homogeneity of variance
assumption? Can you say "log transform"? Get real, Bruce!

The ANOVA results are therefore untrustworthy.

Well, they certainly are! But not because of any problems
with non-homogeneity of variance.

A plot of scores from individual subjects reveals that as the IV
increased, some scores went sharply up, some went up less sharply,
some went up and then down, some stayed approximately level,
some went up a bit and then down sharply, and some declined a
small amount across levels.

How do you know that these differences are not the result of
random error added to the same effect of the IV on the DV for
each subject? Get with it, Bruce! You're supposed to be writing
a _textbook_ on this stuff.

To me it looks easy to spot what is going on in these data,
both overall and for individual subjects. Anyone who plots
the data and connects the dots for same subject (row) can
see what I mean.

Again, how do you know that these differences are not the result
of random error added to the same effect of the IV on the DV for
each subject? It would be easy to create EXACTLY the same
data using an equation like x = kIV + S + rand(). If you think
that, by looking at the plots of the individual data, you are
able to spot "what is going on" in these data, you are (as usual)
fooling yourself.
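
Anyone who doubts this can try the following sketch (Python; the
effect sizes and the error spread are invented for illustration).
Every simulated subject gets exactly the same true effect of the IV,
yet the observed patterns look like a mix of rising, flat and falling
individual "trends":

import numpy as np

rng = np.random.default_rng(1)
k = np.array([1.0, 2.0, 3.0])            # identical true effect of the IV for every subject
for i in range(6):
    S = rng.normal(10.0, 3.0)            # constant subject contribution
    e = rng.normal(0.0, 2.0, 3)          # independent error in each condition
    x = k + S + e                        # x = kIV + S + rand()
    print("S%d observed:" % (i + 1), np.round(x, 1))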

Best

Rick

···

--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[Martin Taylor 971207 16:50]

Rick Marken (971206.1510) --

Bruce Abbott (971204.0945 EST)--

A repeated measures design is an individual subject design
replicated as many times as there are subjects. Nothing
precludes the investigator from examining the data for each
individual separately, and large differences in the trends
of individual subjects would show up as a large error term
within the ANOVA, as well as big standard errors around at
least some of the individual means.

Me --

Everything you say here is wrong.

Mike Acree (971205.1020) --

I was about to jump in and say that there was _nothing_ wrong
in this paragraph from a statistical point of view.

I suppose I ought to leave it to Bruce Abbott to answer the manifest
errors in Rick's message. But Bruce has had to deal with so much
nonsense recently that I fain must jump in where angels do not
leave footprints:-)

The individual scores of a subject across levels of the IV are
not looked at as an individual _trend_ from the point of view of
the RM ANOVA model.

The differences in individual trends ordinarily appear as the error
variance when testing for the reliability of the group trend. That
is to say, they are used to determine whether the group trend can
be used to say anything about the individual trends.

The rest of the commentary is either wrong or pointless, but
I am glad Rick paraphrased my earlier comment to him, except that
I started with something like "If you (Rick) want to attack
conventional methodology":

If you (Mike) and Bruce want to defend conventional methodology
I think it's only fair that you defend the methodology that is
actually used in conventional psychology and not one that you
invent on the spot.

It's a good idea, either way.

Martin

[From Rick Marken (971207.1630)]

Martin Taylor (971207 16:50) --

The differences in individual trends ordinarily appear as the error
variance when testing for the reliability of the group trend. That
is to say, they are used to determine whether the group trend can
be used to say anything about the individual trends.

This is kind of a misleading way of saying it. But it is basically
just what I said. Error variance results, in theory (the ANOVA
model), not from differences in individual subject trends but,
as I said, from error independently affecting individual scores
in each experimental condition.

If the ratio of treatment to error variance (F) is large enough,
the group "trend" (effect of IV on DV) is considered "significant".
A significant group "trend" can then be used (again according
to the ANOVA model) to say that this group trend is the "true"
trend for _each individual_. So if the group trend (the average
response of the group at three levels of an IV) is 10, 20, 30
then the conclusion is that the trend for _each individual_
is _really_ proportional to 10, 20, 30 -- and we would have
seen that trend in each subject if it were not for the obscuring
effect of error variance and between subject differences.

Your statement is misleading because it gives the impression
that the group trend detected in a repeated measures design
can be used to say something about individual trends _that
may be different_. This is, of course, wrong and, if it is
what you intended to say, you are making the same error Bruce
Abbott made. The repeated measures design has nothing to do
with determining individual differences in subject "trends"
(the effect of the IV on the DV for each subject).

The repeated measures design is an efficient way (because
it uses fewer subjects and at the same time increases
statistical power by providing an estimate of the
contribution of between subjects variance to overall error
variance) to estimate the "true" effect of IV on DV (what you
call the "trend" ) -- the one that is presumably true of _every
subject_ -- based on the group results.

The rest of the commentary is either wrong or pointless

Pointless, yes.

Best

Rick

···

--

Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bill Powers (971207.2034 MST)]

Rick Marken (971207.1630)--

[Martin Taylor]

The differences in individual trends ordinarily appear as the error
variance when testing for the reliability of the group trend. That
is to say, they are used to determine whether the group trend can
be used to say anything about the individual trends.

[Rick Marken]

This is kind of a misleading way of saying it. But it is basically
just what I said. Error variance results, in theory (the ANOVA
model), not from differences in individual subject trends but,
as I said, from error independently affecting individual scores
in each experimental condition.

It seems to me that both Martin and Bruce Abbott have decided that your
interpretation of the RM-ANOVA is incorrect, so they are hearing everything
you say in terms of their understanding of what you are referring to. Obviously
what they perceive you to be talking about should not be analyzed the way
you're doing it, since Martin and Bruce can do the calculations correctly.
And since you're probably talking about something else, you will never
agree with them.

I think this is a case where a numerical example is required, followed by a
mathematical treatment that is clear enough so these different
interpretations are not possible. As long as the argument is partly verbal,
there will be no way to resolve it.

The question (as I understand it) is whether the RM-ANOVA, as technically
defined, can predict individual trends from the group trend, when the
individuals show different trends when measured one at a time. Rick appears
to be saying that the differences in individual trends are treated as
random variables with some distribution around the trend of the group -- in
other words, that the individuals are assumed to have the same basic trend
as the group, but with some random scatter in trends. Martin is emphasizing
the reliability of the group trend, with the scatter of individual trends
around the group trend being used to judge that reliability.

What I suggest is that a model be set up in which each person in a set of
individuals is _known_ to respond to the IV in a particular way, with a
collection of positive and negative (linear) trends. By selecting the mix
of positive and negative trends, it should be possible to arrange for the
group trend to be some positive value. Since we know the individual
characteristics exactly, we can then see how well the group trend agrees
with each individual trend.

In terms of the sign of the trend (positive or negative effect of the IV),
clearly the sign of the individual trend will be correctly predicted only
for those individuals whose trend has the same sign as the group trend. In
terms of the variance of the measured trends, we could evaluate how far
from each individual trend (in terms of its variance) the group trend is.
Can that kind of information be obtained from the RM-ANOVA?
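
A bare-bones version of the setup proposed above, sketched in Python
with invented slopes (the particular values don't matter, only that
the mix of signs averages out positive), might look like this:

import numpy as np

# Known individual (noise-free) linear trends: a mix of signs whose average is positive
slopes = np.array([2.0, 1.5, 1.0, 0.5, 0.5, -0.5, -1.0, -1.5])
iv = np.array([0.0, 1.0, 2.0])              # three levels of the IV
scores = slopes[:, None] * iv[None, :]      # each row is one individual's data

group_slope = slopes.mean()
print("group means:", scores.mean(axis=0), " group slope:", group_slope)
for s in slopes:
    same = "same" if np.sign(s) == np.sign(group_slope) else "OPPOSITE"
    print("individual slope %+.1f: %s sign as the group trend" % (s, same))

With slopes like these the group trend comes out positive, so it
predicts the sign correctly only for the five individuals whose trends
are themselves positive.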

From what Bruce Abbott said, I get the impression that if the variance in
the measure of an individual trend is very low (i.e., the points for the
individual lie along a straight line), but the variance in the group trend
is very large (meaning that there is a large range of individual trends),
the group trend would not be seriously considered as a predictor of
individual trends. Is that right?

And would that mean that group trends would be used to predict individual
trends only when it's not possible to say how well they predict individual
trends (that is, when the variance in measures of the individual trends is
large)?

Best,

Bill P.

[Martin Taylor 971208 00:45]

Rick Marken (971207.1630) --

Martin Taylor (971207 16:50) --

The differences in individual trends ordinarily appear as the error
variance when testing for the reliability of the group trend. That
is to say, they are used to determine whether the group trend can
be used to say anything about the individual trends.

This is kind of a misleading way of saying it. But it is basically
just what I said.

If it is what you said before, and I misinterpreted it, I apologize.
But it isn't what you say immediately below.

Error variance results, in theory (the ANOVA
model), not from differences in individual subject trends

Yes, that's exactly the correct error term for the group trend--the
variance of the individual trends. The ANOVA model does not in itself
dictate how the variances are to be partitioned among the N degrees of
freedom corresponding to the N observations. It is based on the idea
that if there are no correlations among the various conditions and
observations, the data will be hyperspherically distributed. You
can slice the dimensions out of the hyperspace however you like, but
if your interpretation is to make any sense, you have to do so in a
sensible way. The usual "sensible" way is to look at what would be
a hypersphere in a subspace if the effect you are interested in were
actually of zero impact. If it is elongated in the subspace of interest,
you get an F-ratio greater than unity. You get the error term from
the sub-subspace orthogonal to the subspace of your interest.

What this means is that the appropriate subspace for determining whether
trends are consistent across individuals is the subspace of measured
trends, _not_ the variation across individuals at specified values of
the IV.
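
One concrete way to work in that subspace is sketched below (invented
data, a simple linear contrast over three equally spaced levels, and
scipy's one-sample t-test standing in for the corresponding
single-degree-of-freedom F test):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subj, levels = 10, np.array([0.0, 1.0, 2.0])
slopes = rng.normal(1.0, 1.5, n_subj)                   # individual trends that genuinely differ
x = slopes[:, None] * levels[None, :] + rng.normal(0.0, 1.0, (n_subj, 3))

c = np.array([-1.0, 0.0, 1.0])                          # linear-trend contrast over the 3 levels
trend_scores = x @ c                                    # one measured trend per subject

# The group trend is judged against the spread of these per-subject trend
# measures, not against the raw spread within each condition.
res = stats.ttest_1samp(trend_scores, 0.0)
print("mean trend %.2f, sd %.2f, t = %.2f, p = %.4f"
      % (trend_scores.mean(), trend_scores.std(ddof=1), res.statistic, res.pvalue))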

but,
as I said, from error independently affecting individual scores
in each experimetnal condition.

Yes, that _is_ what I thought you said before. And it isn't "just
what I said."

Your statement is misleading because it gives the impression
that the group trend detected in a repeated measures design
can be used to say something about individual trends _that
may be different_.

I'll say again what I have said before, many times. If you measure some parameter about
individuals, the best estimate of that parameter for an individual is
a measurement on that individual. If you need to estimate the parameter
for an individual who you can't measure, your best bet is based on
the parameter values for individuals as like the one you are interested
in as you can find, in all apparently important respects you know about.
And the best value for that parameter is (usually) the average value
for those similar individuals. This is true whether the parameter is
a point value such as weight, or a relationship value such as a trend.

Whether the value you get from the average is any use to you is another
matter. That depends on how accurately you need to know it for your
purpose of the moment, and how wide the distribution is of the values
for your "similar" individuals.

The repeated measures design has nothing to do
with determining individual differences in subject "trends"
(the effect of the IV on the DV for each subject).

No, nothing to do with _determining_ the individual differences in
trends. They are just the appropriate source of the error term for
determining the confidence interval or "significance" of the group
trend.

The rest of the commentary is either wrong or pointless

Pointless, yes.

Well, at least we agree on something.

Martin

[From Rick Marken (971207.2210)]

Bill Powers (971207.2034 MST) --

It seems to me that both Martin and Bruce Abbott have decided
that your interpretation of the RM-ANOVA is incorrect

I think the problem might be that there _is_ a (fairly rarely
used) version of the repeated measures design that _does_ give
an estimate of the contribution of differences in individual
"trends" to error variance. In this version of the repeated
measures design there are _at least_ two measures of each
subject's response in each condition. This version of the RM
design provides enough "degrees of freedom" to estimate the
effect of the IV on the DV (the "trend") for each subject.

In this version of the RM design, subjects become a "factor"
in the experiment; each subject is a different "level"
of this factor. The "interaction" between this "subjects" (S)
factor and the IV (the "S x IV" interaction) represents the
degree to which each subject responds to the IV differently;
it is a measure of the degree to which individual "trends"
differ from each other. If there is not a significant S x IV
interaction then the IV affects all individuals in the same way.

This version of the RM design is equivalent to running
several individual experiments in parallel. In each individual
experiment, two or more measures of behavior are obtained from
each subject in each condition. If this is the version of the
RM design that Bruce and Martin were thinking of, then
they are correct in saying that such a design allows an estimate
of differences between subjects in terms of the effect of the IV
on the DV.

But I think it is important to keep in mind that this version of
the RM design is really an example of _individual_ research (in
Runkel's felicitous terminology, it's not really a "net cast";
it's a "specimen test"); if there is a significant S x IV
interaction then the researcher knows that the average effect
of the IV on the group is meaningless. If there is no S x IV
interaction then the researcher knows that the average effect
of the IV on the group is the same as the effect of the IV on
the behavior of each individual. But the researcher knows this
only because he or she collected a sufficient amount of individual
data from each subject.

While this version of the RM design does answer the PCT
objection to the use of group data (since it is really a
"testing specimens" approach to research) it still provides
no useful information about individual behavior (from a PCT
perspective) since it is still based on the conventional
IV-DV paradigm. This means that any observed relationships
between IV and DV -- even those seen for individual subjects --
reflect characteristics of the subject's environment, not of the
subject him or herself (that is, if the researcher thinks he or she
is learning about the behavior of the subjects in his or her
experiment, he or she is suffering from the behavioral illusion).

Best

Rick

···

--

Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken/

[From Bill Powers (971208.0249 MST)]

Rick Marken (971207.2210)--

While this version of the RM design does answer the PCT
objection to the use of group data (since it is really a
"testing specimens" approach to research) it still provides
no useful information about individual behavior (from a PCT
perspective) since it is still based on the conventional
IV-DV paradigm. This means that any observed relationships
between IV and DV -- even those seen for individual subjects --
reflect characteristics of the subject's environment, not of the
subject him or herself (that is, if the researcher thinks he or she
is learning about the behavior of the subjects in his or her
experiment, he or she is suffering from the behavioral illusion).

I think your use of "IV-DV" is imprecise -- you're using it as a code word
rather than with its strict meaning. As a code word, it means not just
"varying one variable and measuring its effect on another," but much more
narrowly, "Varying something that is actually a disturbance of a controlled
variable, and measuring the output of the control system, and interpreting
the result as if the IV were linked through a lineal causal chain, via the
organism, to the DV." It's primarily that interpretation to which we
object, not the other part of the process. We object to the model, not the
method.

When we speak of getting correlations in the 0.99 range, it is just this
"behavioral illusion" to which we refer. In a control system, the
disturbance is in fact one of the independent variables, and the output is
a dependent variable (the reference signal is the other independent
variable, which we customarily try to hold constant during the experiment
by asking the participant to maintain a constant goal-state). So our
procedure in showing that the output is cancelling the effect of the
disturbance is precisely to
hold "all else" (that matters according to the model) constant, varying an
IV, and measuring its relation to a DV.

The real difference between the PCT and conventional approaches is not in
the use of IV-DV analysis, but in the model. In the conventional analysis,
behavior is assumed to be a one-way function of all external influences
acting on the organism. That model, which we could call the "equilibrium
among forces" model, causes problems because in principle the entire
universe would have to be held constant in order to be sure of eliminating
unknown influences on behavior during an experiment while the one IV is
varied. This is what leads to the use of multiple determinations, multiple
subjects, and the use of statistical averages across populations and
individuals. The assumption is that if you can average across time and use
enough different individuals, the unknown influences will tend to average
to zero, leaving visible whatever measure of the IV-DV relationship can be
extracted from the mean relationships.

The same thing is done in all sciences; for example, when I was an
undergraduate I earned my keep in part by measuring the positions of the
(photographed) components of the double star 61 Cygni. Each measure was
repeated three times while the center of the image was approached from two
directions in each axis, and I measured some 8,000 images during that job.
The total number of images measured over the entire 10-year project by all
the people doing it was over a million. The theory here was that influences
on a single human measurement of position were unknown, but by averaging
over repeated measurements of each point by one individual, and over many
measurements by that individual, and over many results obtained by
different individuals, the random sources of error would average to zero.
And it worked. The orbit of 61 Cygni was determined with a mean position
error of a few thousandths of a second of arc, 100 times better than the
theoretical resolution of the 18.5-inch Clark refractor used to obtain the
images.

PCT offers us a great advantage in making predictions of behavior: we do
not need to hold the entire universe constant in order to determine the
effect of the disturbance on the output. All we need to hold constant is
the reference signal, and everything that could materially affect the state
of the controlled variable (except the output, of course). It's the model
that tells us there is a controlled variable, and that shows us how to
isolate and identify it sufficiently well for experimental purposes. If we
can just protect that one variable from all significant disturbances other
than the one we manipulate, and verify from the data that the reference
signal was reasonably constant, we can forget about all other influences on
the whole organism. We don't need to average them out, because according to
the model they can have no effects that matter.

Or we can put that differently: if there are any other influences on the
whole organism that have an important effect, we will soon discover that
when the behavior fails to fit the model with the expected accuracy!

The difference between the PCT and conventional models is obviously highly
important. Under the conventional model, there is no way to say in advance
what relationship should be seen between the IV and the DV, other than the
basic one of "effect" versus "no effect", assuming a linear relationship.
There is no way of selecting manipulations of IVs or ways of measuring DVs
that should reveal particularly clear effects. The basic rationale is "try
it and see what happens." And the result is pretty clear: large variances
in all measured relationships, so that multiple determinations and multiple
subjects MUST be used to see any effects at all.

The PCT model tells what variables to measure (we must identify a
controlled variable or at least propose one), and how to measure them (we
must convert measures of disturbance and output into an equivalent effect
on the controlled variable). And where we are able to satisfy the
conditions demanded by the model, we get relationships with a much lower
variance than those found through the traditional approach.

Actually, as I said some years ago and will now repeat, in statistical
terms, PCT only needs to give us two or three more standard deviations of
precision in order to make a qualitative difference in our ability to
predict behavior. In a traditional experiment, a good result is an effect
size that is twice the standard deviation of the measurements. That gives
us (I'm doing this from memory so correct me if I'm wrong) about 1 chance
in 20 that the effect appeared by chance. If we can improve our theories to
the point where effects are of the order of 4 times the standard
deviations, the odds against the result appearing by chance increase to
15,000 to one. At five times the standard deviation, the odds against
become 1.7 million to one.

So somewhere between 4 and 5 standard deviations, we cross into a new kind
of territory. A statement that has a 95% chance of being true (the
2-standard-deviation kind of truth) can be combined with no more than 12 or
so other statements if a conclusion that depends on all the statements is
to have a probability of truth equal to 0.5 -- the chance level. At 4
standard deviations, the argument can involve over 10,000 such "facts"
before the truth-probability of the conclusion drops to 0.5. And at 5
standard deviations the number of statements becomes over 400,000.

In short, when the predictions have a 4 to 5 standard deviation fit to the
data, we get the kinds of facts on which we could build a science; that
would permit drawing valid conclusions from logical reasoning of some
degree of complexity. With 2-standard-deviation facts, the scope of logical
reasoning is reduced enormously, to the point where it is hardly possible
to draw a valid conclusion.

The "stability factor" we sometimes use to evaluate control is roughly the
effect size measured in standard deviations. A stability factor of 5 is
quite low; a well-practiced controller can achieve stability factors of
anywhere from 9 to 15, depending on task difficulty. The statement that the
controller is in fact acting like a control system has such a high
probability of being true that the numbers are far off the end of the chart
in my handbook. The highest odds-against given is for 7 standard
deviations: it is 4 times 10^13.

I'm drawing all these figures out of my old Handbook of Chemistry and
Physics, and am not very sure about my reasoning. It would be nice if
others more familiar with such statistical concepts would see what they
come up with in trying to replicate my calculations.
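
For anyone who wants to check, here is a short sketch that redoes the
arithmetic under the usual assumptions (a normal error model,
two-tailed tail areas, and independent statements multiplied
together); none of these choices come from the handbook, so the
numbers may differ somewhat from the ones quoted above:

import math
from scipy.stats import norm

for z in (2, 4, 5, 7):
    p = 2 * norm.sf(z)                          # two-tailed chance probability at z SDs
    odds_against = (1 - p) / p
    n_chain = math.log(0.5) / math.log(1 - p)   # statements before P(all true) falls to 0.5
    print("%d SD: p = %.2e, odds against ~ %.3g to 1, chain of ~ %.3g statements"
          % (z, p, odds_against, n_chain))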

Best,

Bill P.

[From Rick Marken (971208.0815)]

Me:

The repeated measures design has nothing to do with determining
individual differences in subject "trends" (the effect of the
IV on the DV for each subject).

Martin Taylor (971208 00:45) --

No, nothing to do with _determining_ the individual differences in
trends. They are just the appropriate source of the error term for
determining the confidence interval or "significance" of the group
trend.

See my post (971207.2210). Individual differences in trends
are a "source of the error term" only when you can estimate the
Subject x IV interaction. And you can only do this when you have
obtained _two or more_ measures of the DV for each subject in
each condition.

In our original discussion of the "repeated measures design" we
were discussing data that looked like this:

S1 12 30 45
S2 1 20 21
.
.
.
SN 3 16 12

where Si is subject i (i = 1...N) and there are three levels of
the IV. So I thought we were discussing the most commonly used
repeated measures (or "within subjects") design in which only
ONE measure of the DV is obtained at each level of the IV for
each subject. I claim that everything you and Bruce Abbott have
said about this version of the repeated measures design is
incorrect or misleading; for example, your claim that the ANOVA
error term that is used to test for significance of the main effect
of the IV is based on "individual trends". This claim is only
correct if you can get a measure of the "Mean Square Error"
(MS error) associated with the Subject x IV interaction. And
you can only get a measure of MS error for this interaction if
you have _more than one_ measure of the DV at each level of the
IV for each subject. That is, you need data that look like this:

S1 12 30 45
    11 32 35
    14 40 39

S2 1 20 21
    10 18 29
    2 25 22
.
.
.
SN 3 16 12
    1 12 32
    5 25 25

Here, there are three measures of the DV at each level of the IV
for each subject. If this is what you mean by a "repeated measures"
design then what you say about individual trends and error variance
is correct. But note that, in this design, you are doing an individual
study on each subject; the group result is only meaningful if all
subjects respond to the IV in the same way (if there is no
Subject x IV interaction).
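
A minimal sketch of that computation (Python, with made-up data laid
out exactly as above: subjects by levels of the IV by replicate
measures per cell):

import numpy as np

rng = np.random.default_rng(3)
n_subj, n_lev, n_rep = 5, 3, 3
x = np.arange(n_lev)[None, :, None] + rng.normal(0.0, 2.0, (n_subj, n_lev, n_rep))

cell  = x.mean(axis=2)                   # cell means (subject x level of the IV)
subj  = cell.mean(axis=1)                # subject means
lev   = cell.mean(axis=0)                # IV-level means
grand = cell.mean()

# Subject x IV interaction: departure of the cell means from additivity
ss_inter = n_rep * ((cell - subj[:, None] - lev[None, :] + grand) ** 2).sum()
df_inter = (n_subj - 1) * (n_lev - 1)

# Within-cell error, available only because there are replicate measures
ss_within = ((x - cell[..., None]) ** 2).sum()
df_within = n_subj * n_lev * (n_rep - 1)

F = (ss_inter / df_inter) / (ss_within / df_within)
print("S x IV interaction: F(%d,%d) = %.2f" % (df_inter, df_within, F))

If that F is not significant, the group effect can reasonably be taken
as describing each subject; if it is significant, the group average
is, as noted above, meaningless as a description of individuals.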

Best

Rick

···

--
Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: rmarken@earthlink.net
http://home.earthlink.net/~rmarken

[Martin Taylor 971208 23:00]

Rick Marken (971207.2210) --

I think the problem might be that there _is_ a (fairly rarely
used) version of the repeated measures design that _does_ give
an estimate of the contribution of differences in individual
"trends" to error variance.
...
While this version of the RM design does answer the PCT
objection to the use of group data (since it is really a
"testing specimens" approach to research) it still provides
no useful information about individual behavior (from a PCT
perspective) since it is still based on the conventional
IV-DV paradigm. This means that any observed relationships
between IV and DV -- even those seen for individual subjects --
reflect characteristics of the subject's environment, not of the
subject him or herself (that is, if the researcher thinks he or she
is learning about the behavior of the subjects in his or her
experiment, he or she is suffering from the behavioral illusion).

Good. I think we may possibly have agreement! If you look back over
my messages on this matter, I think you will find I said the same in
a variety of ways.

However, I do have to object to the cavalier use of IV-DV to represent
"stimulus" and "response", since in legitimate PCT studies we also
independently vary something (IV) and measure another thing (DV). To
use those terms would seem to suggest you disapprove of _all_ experiments.

Martin