# The Test in practice

[From Rick Marken (920929.1000)]

Martin Taylor (920928 16:45) --

Your example of the test is, indeed, interesting.

In your example, it looks like Martin is controlling the position of the
waste basket, then stops. I don't see this as giving up -- it could be just
a change of reference. There really wasn't much testing going on in
the example; only one level and type of disturbance, no revision of the
hypothesis about the CEV when there was a lack of disturbance resistance, etc.

Your comments on conflict seem reasonable. I would just say that I think
the test should not involve setting up a conflict with the testee. The tester
should just be disturbing variables, not trying to prevent the testee from
controlling the variable. The test should cause no inconvenience for the
testee -- or higher order systems are likely to become involved and,
ultimately, reorganization. This would mess up the clarity of the results
of "the test".

My claim is that The Test is always ambiguous. P can tell that Q is controlling
for a percept that incorporates some CEV that P has disturbed, but P can
never tell that the percept Q is controlling for corresponds exactly to the
CEV that P's percept corresponds to. The CEV is in the world, but only
percepts can be controlled. The percepts are private, so in a sense the
CEVs are, too.

I suppose this is ultimately true. But when your guesses about a controlled
(perceptual) variable are allowing you to predict behavior variations with
99% accuracy, I think worrying about whether you have identified the
absolutely true CEV is rather academic. For example, in my "area" vs
"perimeter" control study, I did the test for the controlled variable to
determine whether the subject is controlling x+y vs x*y (where x and y
are height and width of a quadrilateral figure). Using x+y as the
hypothesized controlled variable the error in predicting responses was,
I think, about 2%. With x*y as the hypothesized CEV, the error was halved,
to 1%. You could probably do slightly better with some other hypothesized
CEV -- maybe sqrt(x*x+y*y) -- but clearly you are on the right track
with x*y. I guess you reach a point where you have to determine how much
trouble it's worth to try to improve your model of the behaving system;
it seems to me that, when you are predicting over 99% of the variance, you
are so far ahead of the conventional psychology game that such improvements
are really moot (until PCT science develops to a level at which
such tiny improvements are significant).
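
The logic of comparing x+y with x*y can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reconstruction of the area/perimeter study: the simulated subject, the gain, the reference area and the disturbance are all hypothetical, and the comparison uses relative variability of each candidate variable rather than the response-prediction error quoted above. The idea is simply that the variable actually under control should stay more nearly constant under disturbance than a rival candidate.

```python
import math
import random
import statistics


def simulate_area_controller(steps=2000, gain=0.5, ref_area=100.0, seed=1):
    """Hypothetical subject who adjusts width x to keep the area x*y near
    ref_area while height y is disturbed. All values are illustrative."""
    rng = random.Random(seed)
    x = 10.0
    widths, heights = [], []
    for t in range(steps):
        # Disturbance: slow sinusoidal change in height plus a little jitter.
        y = 10.0 + 4.0 * math.sin(2 * math.pi * t / 400.0) + rng.gauss(0.0, 0.05)
        error = ref_area - x * y
        # Correct a fraction (gain) of the area error by changing the width.
        x += gain * error / y
        widths.append(x)
        heights.append(y)
    return widths, heights


def relative_variability(values):
    """Standard deviation divided by mean (coefficient of variation)."""
    return statistics.pstdev(values) / statistics.mean(values)


widths, heights = simulate_area_controller()
cv_area = relative_variability([x * y for x, y in zip(widths, heights)])
cv_sum = relative_variability([x + y for x, y in zip(widths, heights)])
print(f"relative variability  x*y: {cv_area:.4f}   x+y: {cv_sum:.4f}")
```

In this toy case the wrong hypothesis (x+y) is not wildly variable either, which fits the point that a rival hypothesis may differ from the better one only by a small margin.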

But the correspondences of CEVs can be tested, and that is
what you do when you try variants on moving the wastebasket, such as putting
some other obstruction in the aisle, or moving the wastebasket to some other
unobtrusive place. Such experiments can never give exact answers (pace
Rick Marken and Bill Powers) but can only be evaluated statistically.

The changes in reference level are
an independent reason why experimental tests can only be statistical, and
why one can never be sure to what degree one's percepts coincide with the
percepts of other people.

I agree that there are many difficulties inherent in doing "the test". One
only needs to actually try the "coin game" to see how difficult and
frustrating it can be in practice. But I think that, in this early stage of
the development of PCT science, the attitude that the test can only be
evaluated statistically, is just too defeatist for me. I say this
because 1) we already have some examples of very precise results
of the test -- where there is no need for statistical evaluation at all.
My "area/perimeter" study mentioned above is one; Bill's analysis of the
another. These are examples of the test where the quantitative results are
so precise that the use of inferential statistics (I presume this is the
kind we're talking about) would be nothing more than pompous posturing (the
results are obviously "significant"); 2) we know that people are controlling
perceptual variables and there is no reason to suspect that we cannot get
a pretty accurate description of them -- even if we can't actually perceive
what people are perceiving. So the goal of the test should always be high
quality data -- we should work to improve the test (changing disturbance
techniques, revising hypotheses about CEVs, whatever) until the results
are nearly perfect (as in the examples we have) before throwing in the
towel and starting to turn the statistical crank again; statistics should
be the last resort; but given my understanding of the PCT model I can't
see how they could tell you much about the system you are controlling. I
don't rule out statistics completely -- I have used a statistical measure
of performance in my mind reading demo (the stability factor) and done
an inferential statistical procedure to decide whether an object
on the screen was under control. But this was a crude approach
to doing a version of the test -- if I really wanted to know WHAT variable
was being controlled, I would have done things that obviated the need for
the statistical approach.

I think that this is an important topic and, perhaps by discussing it we
will get some people out there to actually try the test and see what
problems they run into.

I don't want to be "ideologically" anti-statistical, by the way. I am willing
to believe that I don't fully understand your position on statistics, Martin.
I consider it entirely possible that statistical methods could be useful
tools in PCT -- they already have been. I just object to using statistics
the way they are used in conventional psychology -- to see whether or
not anything happened in a study. Maybe it would help if you described
exactly how you see statistics fitting into methodology of "the test".
Perhaps give a concrete example of your idea of using statistics
as part of "the test".

Thanks

Rick


**************************************************************

Richard S. Marken USMail: 10459 Holman Ave
The Aerospace Corporation Los Angeles, CA 90024
E-mail: marken@aero.org
(310) 336-6214 (day)
(310) 474-0313 (evening)

[Martin Taylor 920929 16:00]
(Rick Marken 920929.1000)

I consider it entirely possible that statistical methods could be useful
tools in PCT -- they already have been. I just object to using statistics
the way they are used in conventional psychology -- to see whether or
not anything happened in a study.

I also object to the way statistics are used in conventional psychology. I
started this objection when I was in graduate school, and have maintained it
ever since. Any time I see a significance test in a paper, my first question
is "was it inserted at the demand of the editor, or did the author really not
know what his/her data meant?" A significance test tells you nothing except
that your experiment was sensitive enough to detect an effect that you knew
beforehand had to exist. To quote Ted Nelson, "everything is deeply
intertwingled." To quote myself in a letter refusing to incorporate
significance tests into a paper that satisfied what Ward Edwards called
"the Interocular Traumatic Test" (the test that Rick approves): The inclination
of the rings of Saturn affects the curl of a puppy dog's tail. So what use
is it to demonstrate that you have made enough measurements to show it to
be so? The interesting question is how much influence the rings of Saturn
have on that puppy dog's tail, and how reliable is that estimate under
different circumstances.

Maybe it would help if you described
exactly how you see statistics fitting into methodology of "the test".
Perhaps give a concrete example of your idea of using statistics
as part of "the test".

That's an important question. I'd prefer to answer it in a separate posting,
for two reasons. (1) It needs a lot of background presentation, some of
which I have done in other postings, but which should be pulled together,
and (2) it needs me to think of a mid-level experiment that is unlikely to
satisfy the 99% predictability that can be obtained from lower-level tasks.
My presumption is that you get the 99% prediction because the subsystems
(perhaps ECSs) that are involved in the task are those that support very many
different kinds of behaviour, and so are not readily disturbed by contextual
differences. Since I am at the moment being asked to think of PCT-driven
experiments in planning and decision-making tasks, I may be able to satisfy
condition (2) in that context. (Any help from the CSG-L readership in
devising such an experiment would be most welcome, by the way.)

These are examples of the test where the quantitative results are
so precise that the use of inferential statistics (I presume this is the
kind we're talking about) would be nothing more than pompous posturing (the
results are obviously "significant")

As you see from the above, "inferential statistics" is not what I am talking
about. What I am talking about would be more along the lines of parameter
estimation for the perceptual functions, gain estimates with variance, and
the like.
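
As a rough illustration of what "gain estimates with variance" could look like, here is a minimal sketch. It assumes a simple integrating control model in which each change in output equals gain times error plus noise; that model, the parameter values and the function names are assumptions made for illustration, not anything specified in this thread. The gain is recovered by least squares through the origin, together with a standard error.

```python
import math
import random


def simulate_tracking(true_gain=0.3, steps=5000, noise_sd=0.2, seed=2):
    """Hypothetical integrating controller holding (output + disturbance)
    at a reference of zero, with noise added to each output change."""
    rng = random.Random(seed)
    output = 0.0
    errors, output_changes = [], []
    for t in range(steps):
        disturbance = 5.0 * math.sin(2 * math.pi * t / 100.0)
        error = 0.0 - (output + disturbance)
        change = true_gain * error + rng.gauss(0.0, noise_sd)
        output += change
        errors.append(error)
        output_changes.append(change)
    return errors, output_changes


def estimate_gain(errors, output_changes):
    """Least-squares slope through the origin and its standard error."""
    sxx = sum(e * e for e in errors)
    sxy = sum(e * c for e, c in zip(errors, output_changes))
    gain_hat = sxy / sxx
    residual_ss = sum((c - gain_hat * e) ** 2
                      for e, c in zip(errors, output_changes))
    residual_var = residual_ss / (len(errors) - 1)
    return gain_hat, math.sqrt(residual_var / sxx)


errors, output_changes = simulate_tracking()
gain_hat, gain_se = estimate_gain(errors, output_changes)
print(f"estimated gain = {gain_hat:.3f} +/- {gain_se:.3f}")
```

The statistic here describes a parameter of the model and its uncertainty, rather than answering whether "anything happened".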

we know that people are controlling
perceptual variables and there is no reason to suspect that we cannot get
a pretty accurate description of them -- even if we can't actually perceive
what people are perceiving. So the goal of the test should always be high
quality data -- we should work to improve the test (changing disturbance
techniques, revising hypotheses about CEVs, whatever) until the results
are nearly perfect (as in the examples we have) before throwing in the
towel and starting to turn the statistical crank again; statistics should
be the last resort; but given my understanding of the PCT model I can't
see how they could tell you much about the system you are controlling.

...

For example, in my "area" vs
"perimeter" control study, I did the test for the controlled variable to
determine whether the subject is controlling x+y vs x*y (where x and y
are height and width of a quadrilateral figure). Using x+y as the
hypothesized controlled variable the error in predicting responses was,
I think, about 2%. With x*y as the hypothesized CEV, the error was halved,
to 1%. You could probably do slightly better with some other hypothesized
CEV -- maybe sqrt(x*x+y*y) -- but clearly you are on the right track
with x*y.

(These statements are, as I proposed, in backwards order).

Do you see the contradiction between your two statements? Now I grant that
you followed up with a comment that it isn't worthwhile to try to do better,
and in this case you may well be right. But suppose your intention was to
try to find the form of the perceptual function that defines the CEV. This
form could be fairly critical in determining the internal linkages within
the control net, because if it happened to be based on the sum of squares
rather than a simple product, and you found on another occasion a different
CEV that also used the squares of simple dimensions, you might well suspect
that the individual squares could also be controlled if necessary. But if
the better function was the product, you might ask whether actually it was
the sum of logarithms instead. If that were the case, then your variance
would depend differently on the magnitude than it would if the product were
the actual controlled variable. (I suspect that the sum of logarithms is a
more likely perceptual function than the product, by the way). Having
reduced the extrinsic variance so much gives you an opportunity to determine
where the rest comes from. Perhaps it is mainly due to a low gain function,
perhaps to delays in the feedback loop. You can model that, to see whether
it improves the statistics (and I know you have done so, in other experiments).
I don't find the use of statistics "throwing in the towel" or a "last resort."
I see it as fundamental to finding out what is going on.
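
One way to read the product versus sum-of-logarithms point is sketched below. The sketch assumes that the residual variation comes from constant additive noise in whatever signal is actually controlled, which is an assumption made here for illustration and not something established in this thread; the function name and noise level are likewise hypothetical. Under that assumption, noise added to x*y gives the same spread in area at any target size, while noise added to log(x)+log(y) gives a spread in area that grows in proportion to the target.

```python
import math
import random
import statistics


def area_sd(perceptual_form, target_area, noise_sd=0.02, trials=4000, seed=3):
    """Spread of the achieved area when the controlled signal carries
    additive Gaussian noise (illustrative values throughout)."""
    rng = random.Random(seed)
    areas = []
    for _ in range(trials):
        noise = rng.gauss(0.0, noise_sd)
        if perceptual_form == "product":
            # Noise added directly to the product x*y.
            areas.append(target_area + noise)
        else:
            # Noise added to log(x) + log(y), then mapped back to an area.
            areas.append(math.exp(math.log(target_area) + noise))
    return statistics.pstdev(areas)


ratio_product = area_sd("product", 400.0) / area_sd("product", 100.0)
ratio_log_sum = area_sd("log_sum", 400.0) / area_sd("log_sum", 100.0)
print(f"sd ratio (area 400 vs 100)  product: {ratio_product:.2f}"
      f"   log sum: {ratio_log_sum:.2f}")
```

So repeating a test at several target magnitudes and looking at how the residual spread changes is one possible way to tell the two perceptual functions apart.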

Going on backwards...

The tester
should just be disturbing variables, not trying to prevent the testee from
controlling the variable. The test should cause no inconvenience for the
testee -- or higher order systems are likely to become involved and,
ultimately, reorganization. This would mess up the clarity of the results
of "the test".

Yes, ideally any observer should avoid disturbing the thing observed. But
any disturbance to a controlled variable causes error in the controller, even
if momentarily. The tester is inevitably controlling for perceiving a change
in a variable, or a resistance to change (if the circumstances are appropriate).
The tester may relinquish control very quickly ("give up") if the testee
shows signs of controlling the disturbed variable, but there has been an
effect by the tester on the testee, which has some (we hope vanishingly small)
probability of inducing reorganization. And if the testee happened to have
been on the verge of reorganizing anyway, a vanishingly small disturbance
might tip the balance. When David moves the wastebasket that I had been
controlling for seeing under the table, I might "finally" be tipped into
deciding that I don't want to use a wastebasket at all. (Some people think
that is the case when they see my office, in any case).

In your example, it looks like Martin is controlling the position of the
waste basket, then stops. I don't see this as giving up -- it could be just
a change of reference. There really wasn't much testing going on in
the example; only one level and type of disturbance, no revision of the
hypothesis about the CEV when there was a lack of disturbance resistance, etc.

Yes, sure it could be a change of reference. I hypothesised that it was a
case of "giving up." And your other comments just illustrate the point I
disturbances going on, and you will get near-perfect results only if several
conditions hold: (1) you have identified correctly the controlled CEV, (2)
the controller gain is high compared to the effects of other systems or
sources of disturbance that might affect that CEV, (3) the controlling
reference level is fixed or changes according to a known pattern. There
are probably other conditions, but I can't think of them at the moment. These
three are characteristically different in their effects, but failure to
meet them cannot be distinguished readily from measurement noise without
careful analysis of the statistics, and I'm not sure whether there is any
reliable way to do so even then.
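
Conditions (2) and (3) can be illustrated with a toy loop. This is a sketch only: the integrating controller, the sinusoidal disturbance, the random-walk reference and every number below are assumptions chosen for illustration. It compares how much the correctly identified CEV varies when gain is high and the reference is fixed, when gain is low, and when the reference drifts.

```python
import math
import random
import statistics


def cev_sd(gain, ref_step_sd, steps=3000, seed=4):
    """Standard deviation of the controlled variable for a hypothetical
    integrating controller. ref_step_sd = 0 means a fixed reference;
    larger values make the reference wander as a random walk."""
    rng = random.Random(seed)
    output = 0.0
    reference = 0.0
    values = []
    for t in range(steps):
        disturbance = 5.0 * math.sin(2 * math.pi * t / 200.0)
        reference += rng.gauss(0.0, ref_step_sd)
        value = output + disturbance       # the CEV as the tester sees it
        output += gain * (reference - value)
        values.append(value)
    return statistics.pstdev(values)


sd_high_gain_fixed = cev_sd(gain=0.8, ref_step_sd=0.0)
sd_low_gain_fixed = cev_sd(gain=0.02, ref_step_sd=0.0)
sd_high_gain_drift = cev_sd(gain=0.8, ref_step_sd=0.1)
print(f"high gain, fixed reference:    {sd_high_gain_fixed:.2f}")
print(f"low gain, fixed reference:     {sd_low_gain_fixed:.2f}")
print(f"high gain, drifting reference: {sd_high_gain_drift:.2f}")
```

In both failure cases the tester sees a variable that is poorly stabilized, and a single summary number of this kind does not say which condition was violated.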