Re.: PCT-Specific Methodology

[David Goldstein (2006.12.16.02490)]

Dear Rick and listmates:

Isn’t it a good idea to maintain the usual notation of IV (Independent Variable, experimenter-manipulated) and DV (Dependent Variable, the action, manipulated by the subject) to best communicate with others? Also, the Organism Variable (O) is a term used to refer to some hypothetical variable within the person.

Traditional:

Null Hypothesis: There is no relationship between the IV and the DV.

Alternative Hypothesis: There is a relationship between the IV and the DV.

Evidence to reject the Null Hypothesis: There is a statistically significant correlation between the IV and the DV. The correlation coefficient is nonzero. Any relationship would qualify. It could be directional, however. The researcher may theorize about what happens between the IV and the DV. The IV causes changes in the O, which cause changes in the DV.

PCT:

Null Hypothesis: There is no relationship between the IV (Disturbance) and the DV (Action); there is no control system operating between the IV and the DV.

Alternative Hypothesis: There is a specific relationship between the IV and the DV (-1); there is a control system operating between the IV and the DV.

Evidence to reject the Null Hypothesis: There is a statistically significant correlation between the IV and the DV, and it takes a specific form (-1). The researcher theorizes that the IV would cause changes in the O if the person didn’t nullify it by the changes in the DV.

[From Bill Powers (2006.12.16.0555 MST)]

PCT:

Null Hypothesis: There is no relationship between the IV (Disturbance) and the DV (Action); there is no control system operating between the IV and the DV.

Alternative Hypothesis: There is a specific relationship between the IV and the DV (-1); there is a control system operating between the IV and the DV.

Evidence to reject the Null Hypothesis: There is a statistically significant correlation between the IV and the DV, and it takes a specific form (-1). The researcher theorizes that the IV would cause changes in the O if the person didn’t nullify it by the changes in the DV.

OK, this begins to acknowledge Rick Marken’s point that the CV really
ought to be mentioned.
In the traditional approach, the null hypothesis is simply that there is
NO RELATIONSHIP between the manipulated environmental variable (IV) and
the action of the organism (DV). So rejecting the null hypothesis does
not support any particular relationship – it simply says there is some
kind of relationship, without saying what kind it is.
In the PCT approach, the IV is still the manipulated environmental
variable, now called the disturbance. But the null hypothesis is that
there is NO CONTROL SYSTEM. If there is no control system, there could
still be some kind of relationship between the IV and the DV, so some
other kind of system could still be present (a stimulus-response
system, for example).
I think David’s way of putting this misses the point that there will be a
correlation between IV and DV, as he defines them, whether the system in
question is a control system or an S-R system. Rick points out that the
difference is found in the Controlled Variable, the CV. In traditional
S-R theory it is assumed that the manipulated environmental variable is a
remote manipulandum that acts through a proximal stimulus (PS) on the
organism’s sensory inputs. Therefore the correlation between the remote
manipulandum and the behavior cannot be any less than the correlation
between the proximal stimulus and the behavior. In PCT, we expect a
different situation as shown below together with the traditional
picture.

PCT and S-R theory:

                        |correlation| >= 0.9
   IV -----------------------------------------------------> DV

   Remote             Proximal
   Manipulandum --->  Stimulus  --->  Organism  --->  Behavior

   S-R:   IV --------> PS <-------- DV
            |corr| = 0.95   |corr| = 0.95

   PCT:   IV --------> CV <-------- DV
            corr = 0.2       corr = -0.2
            slope = +k1      slope = -k2
The proximal stimulus, in PCT, is the controlled variable (if there is
one). So the absolute value of the correlation between the IV and the
proximal stimulus (CV) is low in PCT, and high in traditional
theories.
In both PCT and SR theory, observing a significant nonzero correlation
between IV and DV disproves the hypothesis that there is no relationship
between IV and DV as defined above – the null hypothesis. However, as we
see from the diagram, that nonzero correlation could be brought about in
two very different ways. We still can’t say which of the two theories is
preferable just from observing a significant absolute value of
correlation between IV and DV.
The critical difference is in the correlations between the proximal
stimulus (the CV) and the other two variables. For distinguishing control
theory from conventional theory, we must also measure the state of the
proximal stimulus. If the new null hypothesis is that the system is not a
control system, that hypothesis will be disproven by showing that there
is a low correlation between the IV and the CV, while there is a high
absolute value of correlation between IV and DV.
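
A minimal simulation sketch of this pattern, assuming a discrete-time integrating controller with a little internal noise and a band-limited disturbance (the gain, noise level, and bandwidth are illustrative values, not measurements from any experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n = 0.01, 20000
gain = 50.0                        # assumed output-function gain

# Band-limited disturbance (IV): white noise smoothed over 200 samples
d = 10.0 * np.convolve(rng.normal(size=n), np.ones(200) / 200, mode="same")

cv = np.zeros(n)                   # controlled variable (proximal stimulus)
o = np.zeros(n)                    # output / action (DV)
noise = 0.05 * rng.normal(size=n)  # internal noise between CV and output

for t in range(1, n):
    cv[t] = d[t] + o[t - 1]                 # CV = disturbance effect + output effect
    error = 0.0 - (cv[t] + noise[t])        # reference assumed to be zero
    o[t] = o[t - 1] + gain * error * dt     # integrating output function

print("|corr(IV, DV)| =", abs(np.corrcoef(d, o)[0, 1]))   # should come out high, near 1
print("|corr(IV, CV)| =", abs(np.corrcoef(d, cv)[0, 1]))  # should come out much lower
```
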
Note that the sign of the correlation is not considered in the
traditional analysis. Because of changes of units between input and
output, there is no “natural” interpretation of the signs: is a
bar-pressing response in the same direction as the sound of food rattling
in the dish? Is jerking the arm away opposed to the direction of the pain
caused by a needle jab? Directionality makes a difference only in PCT,
where the effect of the behavior on the CV is opposite to the effect of
the remote manipulandum (the disturbance) on the same CV. The sign
attached to the IV and the DV is determined by their respective effects
on the CV, and *in PCT these signs must be opposite if there is a control
system. Note that if the two signs are the same, the hypothesis that this
is a control system is disproven.*

Now I can say something that I didn’t see how to say before, or said
unclearly.

When we model control systems, we normally don’t add any random
fluctuations to variables inside the controller. If we calculated the
correlations between D and CV, and between CV and the output, we would
not find the low correlations indicated in the above diagram. We would
find very high, essentially perfect correlations. But we would still find
something that the standard theory can’t explain: the correlation between
IV and PS would always have a sign opposite to the sign of the
correlation between PS and DV.

The low correlation between IV and CV, or between CV and DV, is not
a fundamental fact about control. It’s caused by the presence of noise in
the controller. Assume that there is an irreducible level of noise in the
control system that lies between CV and DV (between qi and qo in PCT
terms). This means that there will be fluctuations of the output even when
there is nothing disturbing the CV at all. If the controller is a good
one, it will reduce the systematic fluctuations in the CV due to the
disturbance to a very small value – in fact, a value that can be less
than the normal unsystematic noise fluctuations in the controller. When
noise effects predominate, the low correlations in the diagram will be
seen.

So the low correlations we find in experimental data, while a unique
feature of control-system phenomena, are not a fundamental indicator of
control. The fundamental indicator is the equal-and-opposite relationship
in the effects of the IV and the DV on the CV or PS. The proximal
stimulus or CV varies much less than we could predict on physical grounds
from the known effects of the disturbance on the CV.

Instead of basing our conclusions about control on the correlations, we
should base them on the magnitude of the effect of disturbance and output
on the input quantity, the CV or PS. Without control, the input quantity
should vary just as the physical links from the disturbance to the input
quantity would predict. With control, the input quantity should vary much
less, and we should be able to show that the reason for this abnormally
small effect is that the output quantity, the behavior, is having a
systematic effect on the same input quantity, opposing the effect of the
disturbance or IV.

Since we are looking now at systematic effects, the statistical entity we
want to use is not the correlation, but the regression coefficients,
in particular the slope of the least-squares linear approximation. The slope
of the relation between disturbance and CV should be opposite to the
slope of the relation between behavior and the CV. The correlations then
tell us how accurately these slopes are known.

These slopes can be used in almost the same way we have been using the
correlations. A smaller-than-expected correlation goes with a
smaller-than-expected slope. So the basic reasoning doesn’t change. It
just uses a measure that is easier to calculate and grasp.
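
A hedged sketch of that slope-based test, using a simulated loop in which the disturbance affects the CV with slope +k1 and the output affects it with slope -k2 (the gain, k1, k2, and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n = 0.01, 20000
gain, k1, k2 = 40.0, 1.0, 0.8        # assumed loop gain and environmental slopes

d = 5.0 * np.convolve(rng.normal(size=n), np.ones(300) / 300, mode="same")
noise = 0.05 * rng.normal(size=n)    # internal noise, so the loop is not ideal

cv = np.zeros(n)
o = np.zeros(n)
for t in range(1, n):
    cv[t] = k1 * d[t] - k2 * o[t - 1]       # effects of IV and DV on the CV
    error = 0.0 - (cv[t] + noise[t])        # reference assumed to be zero
    o[t] = o[t - 1] - gain * error * dt     # integrating output, negative feedback

# Joint least-squares slopes of CV on disturbance and output (using the
# one-step alignment with which cv was generated above)
X = np.column_stack([d[1:], o[:-1], np.ones(n - 1)])
coef, *_ = np.linalg.lstsq(X, cv[1:], rcond=None)
print("slope of CV on disturbance:", coef[0])   # near +k1 (the simulated link is exactly linear)
print("slope of CV on output:    ", coef[1])    # near -k2: opposite sign

# The simple regression of CV on the disturbance alone comes out much
# smaller than the physical slope k1 -- the "abnormally small effect" above.
print("simple slope of CV on d:  ", np.polyfit(d, cv, 1)[0])
```
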

Best,

Bill P.

[Martin Taylor 2006.12.16.10.15]

Without commenting directly on the discussion (which I think is a bit misinformed) of "traditional" statistical methods, I suggest that if you are interested in the question of correlation between observable disturbance and observable control system output, you have a look at <http://www.mmtaylor.net/PCT/Info.theory.in.control/Control+correl.html>

Martin

[From Bill Powers (2006.12.16.1515 MST)]

Martin Taylor 2006.12.16.1514.

Of course it uses feedback
effects. It’s the usual derivation around the control loop, the same
derivation you used to contradict mine. We both arrive at p = Gr - Gp +
d, which we then develop in two different ways. I simply move the
“G” terms to the other side of the equal sign, giving d = p +
Gp - Gr, whereas you combine the “p” terms, to give

        G           1
p  =  -------  r + -------  d,
       1 + G        1 + G

They are exactly the same thing, aren’t they?

No. Mine implies that p is determined by G, r, and d, which is physically
true. Yours implies that d is determined by p, G, and r, which is
physically false. If you vary d, r, or G, p will change. But if you
vary p, G or r, d will not change, even if your equation says it will…

Your derivation is algebraically correct, but false as a description of
physical relationships. Algebra doesn’t know anything about dependent and
independent variables.

So the perceptual signal is a
dependent variable which depends on just two independent variables, r and
d.

Exactly. You like mathematical derivations… so, given the equation you
arrive at (as one does with the usual derivation that I used) d is
equally a function of p, G, and r. Equally, r is a function of d, G, and
p. You know any three of them and you can derive the
fourth.

I prefer mathematical derivations that reflect the physical situation
properly. What you say is algebraically true. But it is not physically
true. If I inject a perceptual signal into a control system, or change
the reference signal or G, there will be no effect on the value of d as
implied by your equation. If I change d, r, or G, there will be an effect
on p as indicated by my equation. I have now said all of that twice,
which makes it true.

Note that G/(1+G) approaches 1
as G becomes much greater than 1. The 90-degree phase shift which you say
reduces correlations to zero is greatly modified by this expression (see
below for the case in which G is an integrator).

No it isn’t. The ONLY reason for the 90 degree phase shift is the
assumption that the output function is a perfect
integrator.

I was pointing out that the phase shift through the whole control system,
or the one implied by the solved equations, is different from the phase
shift through the output function. You apparently read my remark to mean
that the phase shift in the output function itself is modified, which is
not what I meant to say. Or what I meant to mean.

Even with the perfect
integrator, the output varies so it remains about equal and opposite to
the disturbance, with a phase shift that varies from zero at very low
frequencies to 90 degrees at very high frequencies where the amplitude
response approaches zero.

The phase shift in question is the phase shift between the error signal
and the output signal. A pure integrator gives a 90 degree phase shift at
ALL frequencies. The integral of a cosine is the corresponding sine, and
vice-versa.

Yes, but the next cited part explains what I meant:

The negative feedback makes the
frequency response of the whole system different from the frequency
response of the integrating output
function.

The frequency response of the integrating output function is of the form
1/f, with a 90-degree phase shift over the whole range of frequencies.
The frequency response of the whole system, as determined by varying the
frequency of a sine-wave disturbance or reference signal and observing
the output quantity, is not of that form.

For one thing, if the time
constant of a leaky-integrator output function is T seconds, the time
constant of a response of the whole system to a disturbance is T/(1+G),
where G is the loop gain.

I have a note about the leaky integrator and its effect on the frequency
effects on the correlation near the end. A leaky integrator is not a
perfect integrator.

The leakiness is not the point. Even with a perfect integrator, there
will be a negative correlation between d and o, higher at lower
frequencies but present for any real waveform. What I said would have
been clearer if I had just deleted the side-remark about leaky
integrators.

Separating the real and
imaginary parts, we have

   G         G^2              Gw
 ----- = ----------- - j -----------
 1 + G    G^2 + w^2       G^2 + w^2

From this we can see that as the integrating factor G increases, and as
the frequency decreases (remember that w is 2·pi·frequency), the real
part of the factor G/(1+G) approaches 1. As G increases and w increases,
the imaginary (90-degree phase-shifted) part approaches zero.
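
A quick numeric check of this factor (the specific G and w values below are arbitrary illustrations):

```python
import numpy as np

for G in (1.0, 10.0, 100.0):
    for w in (0.1, 1.0, 10.0):
        factor = G / (G + 1j * w)   # G/(1+G) with the integrator G replaced by G/(jw)
        phase = np.degrees(np.angle(factor))
        print(f"G={G:6.1f}  w={w:5.1f}  real={factor.real:6.3f}  "
              f"imag={factor.imag:6.3f}  phase={phase:7.2f} deg")
# Large G and small w give real part ~ 1 and phase ~ 0: the loop as a whole
# does not show the integrator's fixed 90-degree lag.
```
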

For the loop as a whole, yes. I think my animated diagram illustrates
this. Actually, you don’t need to go into that kind of complex arithmetic
analysis. All you need is the knowledge that the Laplace transform is
linear, and you can operate on the transforms as though they were simple
scalar variables.

Yes, but their meaning is not all that transparent. To me, anyway. They
may look like scalar variables, but they aren’t.

The correlation of the
error signal with the output of the integrator will always be zero.
However, a correlation lagged 90 degrees will be
perfect,

You can’t “lag” the correlation 90 degrees, except at one
frequency. The correlation is time-domain, and you can only lag it by
delta t. There will be a frequency (an infinite set of them, actually)
for which a given delta t gives a lagged cross-correlation of unity, but
that’s a complete red-herring in this discussion.

If you calculate a correlation between sin(wt) and cos(w(t - tau)), where
tau is set to correspond to a phase shift near the low end of the
observed range of frequencies, there will be a nonzero correlation
between those two functions, because the low-frequency amplitudes are
greater than the high-frequency amplitudes. So it’s only a partial red
herring – say a pale pink herring. I did forget that the correlation for
a given lag will not be perfect for other frequencies.

I hope mine doesn’t, too. I
think you have quite misunderstood it. I could be quite wrong, but when I
went through it again this morning, I didn’t find a mistake. Your
comments haven’t (yet) helped me to find a mistake.

Perhaps this comment will clarify what I do and don’t consider to be
mistakes.

Best,

Bill P.

[Martin Taylor 2006.12.17.14.16]

[From Bill Powers (2006.12.16.1515 MST)]

Martin Taylor 2006.12.16.1514.

Of course it uses feedback effects. It's the usual derivation around the control loop, the same derivation you used to contradict mine. We both arrive at p = Gr - Gp + d, which we then develop in two different ways. I simply move the "G" terms to the other side of the equal sign, giving d = p + Gp - Gr, whereas you combine the "p" terms, to give

        G           1
p  =  -------  r + -------  d,
       1 + G        1 + G

They are exactly the same thing, aren't they?

No. Mine implies that p is determined by G, r, and d, which is physically true. Yours implies that d is determined by p, G, and r, which is physically false. If you vary d, r, or G, p will change. But if you vary p, G or r, d will not change, even if your equation says it will.

The equation says nothing of the sort. It can equally be written p - d + Gp - Gr = 0, or p - d = Gr - Gp, or ... All say EXACTLY the same thing. There's no implication whatsoever about causality. You are, in effect, saying that if a = b + c, it's wrong to write c = a - b.

Your derivation is algebraically correct, but false as a description of physical relationships.

Apart from being untrue, isn't that irrelevant when the question addressed by the analysis is the limit on the correlation between o and d?

Algebra doesn't know anything about dependent and independent variables.

Precisely the point !!!

If I change d, r, or G, there will be an effect on p as indicated by my equation. I have now said all of that twice, which makes it true.

I thought three times was the requisite number :-)

Note that G/(1+G) approaches 1 as G becomes much greater than 1. The 90-degree phase shift which you say reduces correlations to zero is greatly modified by this expression (see below for the case in which G is an integrator).

No it isn't. The ONLY reason for the 90 degree phase shift is the assumption that the output function is a perfect integrator.

I was pointing out that the phase shift through the whole control system, or the one implied by the solved equations, is different from the phase shift through the output function.

But it is the 90 degree phase shift through the integrator that is relevant to the analysis. Any other is of no interest that I can see.

Actually, you don't need to go into that kind of complex arithmetic analysis. All you need is the knowledge that the Laplace transform is linear, and you can operate on the transforms as though they were simple scalar variables.

Yes, but their meaning is not all that transparent. To me, anyway. They may look like scalar variables, but they aren't.

They are, however, linear. As I remember, it was Oliver Heaviside who started treating derivative and integration operators as though they were algebraic quantities. His only justification was that it worked. I don't know who proved that it was in fact justified, and was true for Laplace operators. All that really matters is that it works, and that it makes analyses like these an incredible amount easier than when you try actually working out the complex number calculus of the operators individually.

Perhaps this comment will clarify what I do and don't consider to be mistakes.

Perhaps. But I can't agree that it is in any sense a mistake to say a = b + c also means b = a - c. That "mistake" is the crux of your comment.


=====================

It is, however, true that the original thread treated of other correlations than the one discussed in <http://www.mmtaylor.net/PCT/Info.theory.in.control/Control+correl.html>, which was the correlation between the disturbance and the perceptual signal. It would be worthwhile to address the correlations between p and o, and between o and d. These three are the signals of concern in the loop.

In the cited analysis, when the reference signal is uniformly zero, the maximum correlation between the perceptual signal and the disturbance is 1/CR, where CR is the control ratio -- (fluctuations in the disturbance)/(fluctuations in the perceptual signal).

The three signals p, o, and d are all shown in the animated image ( o = Gp when r is uniformly zero). The correlation between two signals is the cosine of the angle between their vectors. (With a pure integrator output function, the correlation between p and o is zero). Since sin^2(x) + cos^2(x) = 1, and sin (90-x) = cos (x), then we can compute from the diagram the correlation between d and o given the correlation between d and p.

The maximum correlation between d and p is 1/CR, which means that the absolute value of the minimum correlation between d and o must be sqrt(1 - 1/CR^2).

When p and Gp are not orthogonal (e.g. when the output function is not a perfect integrator) this analysis fails and a more complete analysis is required.
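
A small numeric check of that trigonometric step, using a sine and a cosine as a convenient orthogonal pair to stand in for p and o, and an assumed control ratio of 10; nothing here is derived from the loop itself, it only verifies the geometry:

```python
import numpy as np

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

t = np.linspace(0.0, 2.0 * np.pi, 10000, endpoint=False)
p = np.sin(t)                    # stands in for the perceptual signal
o = np.cos(t)                    # orthogonal to p, stands in for the output

CR = 10.0                        # assumed control ratio (amplitude ratio)
# Build a "disturbance" whose correlation with p is exactly 1/CR:
d = (1.0 / CR) * p + np.sqrt(1.0 - 1.0 / CR**2) * o

print("corr(d, p)        =", corr(d, p))                    # ~ 1/CR = 0.1
print("corr(d, o)        =", corr(d, o))                    # ~ sqrt(1 - 1/CR^2)
print("sqrt(1 - 1/CR^2)  =", np.sqrt(1.0 - 1.0 / CR**2))    # ~ 0.995
```
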

It's always possible that there's a gross mistake in the analysis, but I can't see it yet.

Martin

[From Bill Powers (2006.12.17.1530 MDT)]
Martin Taylor 2006.12.17.14.16 –
Not to drag this out, but I’m very confused about something you
seem to be saying. In a previous post you said the correlation between
two vectors was the cosine of the angle between them. In this post you
say

In the cited analysis, when the
reference signal is uniformly zero, the maximum correlation between the
perceptual signal and the disturbance is 1/CR, where CR is the control
ratio – (fluctuations in the disturbance)/(fluctuations in the
perceptual signal).

This makes it sound as if a correlation is just a partial derivative. My
impression has always been that that is what a regression line’s slope
indicates, while the correlation tells us how much scatter there is
around that mean relationship. Can’t you have a correlation between x and
y of 0.9 even when the slope of the relationship is y = 0.1x? Have we
been talking about different things again?

Best,

Bill P.

[From Bill Powers (2006.12.17.1125 MST)]

Martin Taylor 2006.12.17.22.22 –

[From Bill Powers
(2006.12.17.1530 MDT)]
Martin Taylor 2006.12.17.14.16 –
Not to drag this out, but I’m very confused about something you
seem to be saying. In a previous post you said the correlation between
two vectors was the cosine of the angle between them. In this post you
say

In the cited analysis, when the
reference signal is uniformly zero, the maximum correlation between the
perceptual signal and the disturbance is 1/CR, where CR is the control
ratio – (fluctuations in the disturbance)/(fluctuations in the
perceptual signal).

This makes it sound as if a correlation is just a partial
derivative.

How do you get such an implication out of this?

I thought a partial derivative is the ratio of a change in one variable
to a change in another variable that covaries with the first one. You’re
describing a ratio of a (change in a disturbance) to a (change in a
perceptual signal), which is my understanding of what a partial
derivative is.

CR is an amplitude ratio that
has a potential range of between 1 and infinity, and you know the
derivation of it quite precisely, since you’ve been criticizing that same
derivation with some authority over the last couple of
days.

But suppose this amplitude ratio CR is 10 – that is, fluctuations of the
disturbance correspond to fluctuations in the perceptual signal one tenth
as large. If there is no noise in the system, the correlation between the
perceptual signal and the disturbance will be 1.0, not 0.1 as you imply
(1/CR = 0.1). A correlation is not just a ratio of amplitudes, is it?
That’s why I found the reference to the cosine of the angle between two
vectors confusing – that’s just an amplitude ratio, and would not seem
to have anything to do with the correlation coefficient. When you say
“maximum correlation” are you talking about Pearson’s r, or
just about a covariation?

Can’t you have a
correlation between x and y of 0.9 even when the slope of the
relationship is y = 0.1x? Have we been talking about different things
again?

Of course you can. I fail to see where you are coming from.

Are you OK with the result that “the absolute value of the minimum
correlation between d and o must be sqrt(1 - 1/CR^2)?” if there’s no
loop transport delay, no noise, and the output function is a pure
integrator?

No, I’m not. Without any noise or delay, the correlation between d and o
will be 1.000, the maximum possible, regardless of their relative
amplitudes. I must be missing something here. The phase shift has nothing
to do with the correlation, either – only with amplitude ratios.
Correlation describes the degree of randomness in a relationship, not a
ratio of amplitudes.
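
The distinction being drawn here is easy to check numerically; in this sketch the slope of 0.1 and the noise level are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=5000)

y_clean = 0.1 * x                                    # tiny slope, no scatter
y_noisy = 0.1 * x + 0.05 * rng.normal(size=5000)     # same slope, some scatter

print("r(x, 0.1*x)         =", np.corrcoef(x, y_clean)[0, 1])  # exactly 1.0
print("r(x, 0.1*x + noise) =", np.corrcoef(x, y_noisy)[0, 1])  # less than 1.0
print("slope in both cases =", np.polyfit(x, y_clean, 1)[0],
      np.polyfit(x, y_noisy, 1)[0])                            # both ~ 0.1
```
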

Is it that you’re using “correlation” not in the statistical
sense but in an informal sense? I must be suffering some serious
misapprehension here.

Best,

Bill P.

[From Rick Marken (2006.12.18.1255)]

Bill Powers (2006.12.18.1145 MST) --

Rick Marken (2006.12.18.0900) --

So Martin's formula is correct: the minimum correlation between d and o is sqrt(1 - 1/CR^2)

assuming CR is var(d)/var(p)

I don't see how he could mean "variance" when he specifically says "CR is an amplitude ratio that has a potential range of between 1 and infinity, and you know the derivation of it quite precisely, since you've been criticizing the same derivation with some authority over the last couple of days." That derivation had no random variables in it.

Yes, but if you ignore that it seems to work OK;-)

Also, he said "Are you OK with the result that "the absolute value of the minimum correlation between d and o must be sqrt(1 - 1/CR^2)?" if there's no loop transport delay, no noise, and the output function is a pure integrator?" That, too, implies that there are no random variables. If nothing is random, the correlations all have to be 1.0 if there is any variation at all. All relations are completely systematic, with no uncertainties.

I'm sure Martin will explain.

Yes, I assume so.

By the way, on an unrelated note, I've been reading a wonderful book by Bill Bryson called "A short history of nearly everything". It's a wonderful overview of the history of science, extraordinarily well written with wonderful anecdotal stories. I was just reading about Einstein and was struck by how much his situation was like yours. Like you, Einstein had just an undergraduate degree, was not part of the physics "establishment", was working in the patent office (as you were working in medical physics), developed his ideas pretty much on his own (as you did) and then published them in major journals (as you published in _Science_ and _Psych Review_). The only difference is that the value of Einstein's ideas was seen almost immediately. I think that's because physics is a much more mature science than psychology.

Another fun thing I noticed in the book was a mention of Bruce Gregory (p. 129). He gets a parenthetical mention as an astrophysicist who determined that if galaxies were frozen peas there would be enough to fill old Boston Garden. Whatever happened to ol' Bruce?

Best regards

Rick


----

Richard S. Marken Consulting
marken@mindreadings.com
Home 310 474-0313
Cell 310 729-1400

[Martin Taylor 2006.12.18.17.05]

[From Bill Powers (2006.12.17.1125 MST)]

Martin Taylor 2006.12.17.22.22 --

Without any noise or delay, the correlation between d and o will be 1.000, the maximum possible, regardless of their relative amplitudes.

OK. That's a direct contradiction of my analysis. I'd appreciate seeing your derivation. If you can't find a problem in my derivation, and I can't find a problem in yours, there must be a subtle problem somewhere. If your claim is true for all values of the gain, then my derivation must be wrong. I'd really, really, appreciate it if you could show me where.

To be precise, we are talking about a loop in which there's no transport lag, the output function is C*integral (error) dt, and you are saying that for all values of C, the correlation between d and o is 1.000.

I presume, although you did not mention it, that your claim does not include C<=0, but that for any non-zero and non-negative value of C, even 0.00001, the perfect correlation must hold.

If what you say is true, then of course my derivation is simply wrong, and I'd like to find the flaw.

I must be missing something here. The phase shift has nothing to do with the correlation, either -- only with amplitude ratios. Correlation describes the degree of randomness in a relationship, not a ratio of amplitudes.

I don't know why you would assert that phase shift has nothing to do with correlation. Just imagine a trivial situation, and ask what is the correlation between cos(x) and cos(x + phi). Phi is the phase shift, and when it's zero the correlation is unity, and when it's 90 degrees the correlation is zero. In the more complex situation we are talking about, the phase shift across the integrator has everything to do with the correlations between d and o and between d and p. At least it does in my analysis. If you want to demonstrate the error in my analysis, you have to do more than simply assert that "phase shift has nothing to do with correlation".
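
That trivial case is easy to verify numerically (the grid size and the particular phase angles are arbitrary choices):

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100000, endpoint=False)
for phi_deg in (0, 30, 60, 90):
    phi = np.radians(phi_deg)
    r = np.corrcoef(np.cos(x), np.cos(x + phi))[0, 1]
    # Over whole cycles the correlation equals cos(phi), so 90 degrees gives ~0
    print(f"phi = {phi_deg:3d} deg   corr = {r:+.4f}   cos(phi) = {np.cos(phi):+.4f}")
```
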

Of course correlation describes the precision with which one variable tracks variations in another, as you say. Correlation is equal to the cosine of the angle between the vectors representing two waveforms, isn't it? Isn't that a representation of the degree of randomness in a relationship? That the ratio of amplitudes between p and d turns out to be equal to their maximum correlation is an interesting result, not a definition.

I do wish you would _read_ the derivation you so severely criticise. All that the derivation produces as a result is that the MAXIMUM correlation between disturbance and perception signals is equal to the ratio of their fluctuation amplitudes, which will be between zero and 1.0. It's just what falls out of the derivation.

I've said over and over that the derivation seems too simple, and I'm hoping someone will demonstrate the flaw that may well exist, but all I get is commentary on my explanatory e-mails, which seem after all not to be too explanatory. Please, please, refer to the original <http://www.mmtaylor.net/PCT/Info.theory.in.control/Control+correl.html> and tell me what's wrong with it.

(For Rick's benefit, I should say that I used the term Control Ratio differently from him, taking amplitude rather than variance).

Using the seemingly straightforward derivation referenced, I then used the expression sin^2(x) + cos^2(x) = 1 to derive the minimum correlation between d and o. Again, that might be wrong, but I can't see why it should be, and if it is, I'd like to be shown wherein it is wrong.

The maximum correlation between d and p and the minimum correlation between d and o both should be the actual correlation in the absence of noise, transport lag, and reference variation, and that ought to be testable in a simulation that uses a very fine value of dt.

Is it that you're using "correlation" not in the statistical sense but in an informal sense?

No.

I must be suffering some serious misapprehension here.

Apparently. I think it may be a bit like my misapprehension in the first few days after I heard of PCT. I couldn't believe that we were talking about actual, analyzable control systems in the engineering sense. Once I got over that misapprehension, analyses such as the one on correlation we are talking about became possible. Why don't you just assume that I'm trying to be moderately mathematically rigorous, as you have recently urged? I may well be wrong, but I'm not being excessively informal (even though I drank 2/3 of a bottle of nice wine a few hours ago :-) )

[From Bill Powers (2006.12.18.0020 MST)]
I found an old "track-analyze program" and set it so the model uses no delay and a perfect output integrator instead of the optimum delay and damping ("leakage") settings.

Unfortunately, you included only a trace of a person's track and a model handle track with a record of the deviation between them, rather than a disturbance track along with the model handle track and a record of the perceptual signal. We haven't been discussing how well the model fits what a person does. We are dealing with the correlations in the signals within a model. We need to see the correlations among d, o and p signals in the model. You probably could get them from the same run of the model as the one you graphed with the person's track.

I know you can't create a computer-based model with zero transport lag, but you ought to be able to provide a reasonable simulation test of the analysis by using dt very small compared to 1/W (where W is the bandwidth of the disturbance signal). We should be able to test the correctness of the analysis, at least to a first approximation, by using a very small dt, a low value of c, and a band-limited noise as a disturbance.

The analysis says that the correlation between perceptual and disturbance signals should be (close to) p/d, the ratio of perceptual signal fluctuation amplitude (p) to disturbance signal fluctuation amplitude (d), and the correlation between disturbance and output signals should be (close to) sqrt(1-(p/d)^2). I say "close to" because of the inevitable artefacts of digital simulation.

Just putting an example number on it, if control reduces the perceptual RMS fluctuation to 1/10 of the disturbance RMS, correlation between the perceptual and disturbance signal should be no greater than 0.1, and that between disturbance and output should be no less than approximately 0.98 (sqrt 0.99).
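
A sketch of the kind of simulation test proposed here, assuming a pure-integrator output function, zero reference, no added noise, and dt small compared with 1/W; the gain, dt, and disturbance bandwidth are assumptions chosen only to satisfy those conditions:

```python
import numpy as np

rng = np.random.default_rng(3)
dt, n, gain = 0.001, 200000, 20.0

# Band-limited disturbance: white noise smoothed over ~1 s, so W << 1/dt
d = np.convolve(rng.normal(size=n), np.ones(1000) / 1000, mode="same")

p = np.zeros(n)       # perceptual signal (identical to the CV here)
o = np.zeros(n)       # output quantity
for t in range(1, n):
    p[t] = d[t] + o[t - 1]               # input quantity = disturbance + output effect
    o[t] = o[t - 1] - gain * p[t] * dt   # pure integrator, reference = 0

ratio = np.std(p) / np.std(d)            # fluctuation-amplitude ratio, p/d above
print("p/d amplitude ratio =", ratio)
print("corr(p, d)          =", np.corrcoef(p, d)[0, 1])   # analysis: close to the ratio
print("corr(d, o)          =", np.corrcoef(d, o)[0, 1])   # negative; |value| close to the line below
print("sqrt(1 - (p/d)^2)   =", np.sqrt(1.0 - ratio**2))
```
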

I'd be quite happy to be shown to be wrong, and even happier to be shown analytically where the derivation is wrong.

Martin

[From Bill Powers (2006.12.19.0916 MST)]

Martin Taylor 2006.12.18.17.05 –

If what you say is true, then of
course my derivation is simply wrong, and I’d like to find the
flaw.

I must be missing
something here. The phase shift has nothing to do with the correlation,
either – only with amplitude ratios. Correlation describes the degree of
randomness in a relationship, not a ratio of
amplitudes.

I don’t know why you would assert that phase shift has nothing to do with
correlation.

Well, right at this moment, I don’t, either. When I said I might be
suffering some misapprehensions, I wasn’t being modest or tactful. What’s
happening is that I’m uncovering some assumptions about statistics that
I’ve been relying on without really understanding the subject. I had
thought there was something special about the presence of random
variables in the definitions of things like sigma and the correlation
coefficient, but I’m now suspecting that this thought was erroneous. I’m
looking right now in my mathematics manual’s statistics section, and the
equations I see don’t say a thing about random variables.
In the formula for the correlation coefficient, we find three standard
deviations (sigmas, after the greek letter usually used – in my field of
electronics, we call standard deviations the “RMS” or
root-mean-square value of the variability). For the correlation of X and
Y, we have sigma(X), sigma(Y), and sigma(XY). It is the third one, the
covariance, that we are talking about here. If X = A·sin(t) and Y = B·cos(t), then over an integer number of cycles,

    sigma(XY) = Sum[(X - averageX)(Y - averageY)]/N

is zero.
The correlation coefficient is simply

               sigma(XY)
    r  =  ---------------------    (the A and B factor out)
           sigma(X) · sigma(Y)

Therefore if sigma(XY) is zero, r is zero. QED.
Now I can inch a little closer to the problem. What if sigma(X) and sigma(Y)
are also zero? Then we have zero divided by zero, and the value of this
ratio is not necessarily zero. It could actually be any value. This
is more or less what I was thinking, probably less. And that’s not
really the answer.
How could sigma(X) or sigma(Y) be zero, when they are sine or cosine
waves with nonzero amplitudes? This is where the idea of randomness
figured in my thinking. I was thinking that sigma for a systematic
relationship would be zero, because I was somehow thinking that sigma was
a measure of random scatter not relative to zero, but relative to some
expected systematic relationship. In other words, I was thinking that Y
was some systematic, completely predictable function of X, plus a
random variable that reduced the predictability. For example, if X =
sine(t), a noisy integrator would make

Y(t) = cosine(t) + random(t).

If Y is a systematic function of X, any systematic function whatsoever,
I was assuming that the correlation of X and Y would be 1.000. In
other words, I was assuming, as I said in my last post, that a
correlation was a measure of predictability, with a perfect correlation
meaning that Y is perfectly predictable from X. I see now that this is
not true at all.

Or maybe it’s true, but not when the true relationship between Y and X is
anything but a straight line, Y = aX + b + random(t). Is that it? Are
these statistical measures based on the idea that the only relationship
we can assume is a simple proportionality plus an offset?

If that really is the problem, then these elementary statistical
calculations are not appropriate for a model in which there are
systematic relationships other than simple proportionalities. If we have
an integrator as an output function, then obviously trying to represent
it with the equation qo = a*e + b is simply a mistake, and that has
nothing to do with statistics. We are using the wrong model, not finding
that the output is unpredictable from the error signal. In the absence of
random noise, the output is perfectly predictable from the error signal
– it’s just not predictable from a linear algebra equation. So this
“zero correlation” does not mean lack of
predictability!

Going back to the manual, I do find one measure that resembles what I was
thinking pretty closely: Chi-squared. Chi-squared is defined as the sum
of the squares of departures of X from the expected value of X, E(X),
divided by the expected value of X.

              SUM[X - E(X)]^2
    Chi^2  =  ----------------
                    E(X)

Now we see that if Y is the time integral of X, and if X is cos(t), then
the expected value of Y is sin(t) and, for a perfect integrator in
the absence of noise, Y is also sin(t), so Chi^2 is zero, with some
singularities at each crossing of the time axis. The singularities can be
avoided by eliminating the division by E(X).

In fact, when I calculate how the control-system model fits the data from
a tracking experiment, I use exactly the numerator of the expression
above, where E(X) is the value of a variable predicted by a model, and
instead of dividing by E(X) I divide by the maximum value of X, so the
departures from the predicted value are given as a fraction of the range
of variation of the variable being predicted. I add the time variable t
to indicate that X is a continuous function of time, and take the square
root: Call this measure Q:

           sqrt( SUM[X(t) - E(X(t))]^2 )
    Q  =  -------------------------------
                  Range(X(t))

where Range(X(t)) is defined as (Xmax - Xmin).

As it happens this is exactly the reciprocal of the definition of
signal-to-noise ratio as used in electronics.
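
For concreteness, here is a hedged sketch of this Q measure, with made-up series standing in for the data X(t) and the model prediction E(X(t)); a closely related form divides the sum by N before taking the square root, which gives the familiar "RMS error as a fraction of the range":

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0.0, 10.0, 2000)

predicted = np.sin(t)                                   # E(X(t)): the model's prediction
observed = np.sin(t) + 0.03 * rng.normal(size=t.size)   # X(t): data = prediction + small noise

def q_measure(x, expected):
    """Literal form above: sqrt of the summed squared departures, over the range of x."""
    return np.sqrt(np.sum((x - expected) ** 2)) / (x.max() - x.min())

def rms_fraction(x, expected):
    """RMS departure as a fraction of the range (an assumed variant; divides by N)."""
    return np.sqrt(np.mean((x - expected) ** 2)) / (x.max() - x.min())

print("Q (as written above) =", q_measure(observed, predicted))
print("RMS / range          =", rms_fraction(observed, predicted))  # about 0.015 here
```
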

So, Martin, it turns out that you made no mistakes. I simply had a wrong
idea of what a correlation is. In fact, I’m seeing that I had a wrong
idea about a lot of things in statistics. It’s a whole lot simpler than I
thought it was, except for the parts about computing probabilities from
the properties of different distributions like the Gaussian and Poisson
distributions. I’ve been using some of the standard calculations, like
the t-test, without even knowing that the calculations had names. David
Goldstein informed me this morning that even my so-called “stability
factor” is simply a version of the F-test. Nothing new under the
Sun, I guess.

Over to you.

Best.

Bill

[From Rick Marken (2006.12.19.1730)]

Bill Powers (2006.12.19.0916 MST)

Is that it? Are these statistical measures based on the idea that the only relationship we can assume is a simple proportionality plus an offset?

The Pearson r correlation is a measure of degree of linear relationship between two variables, yes.

If that really is the problem, then these elementary statistical calculations are not appropriate for a model in which there are systematic relationships other than simple proportionalities.

Actually, unless you are using it for statistical purposes (for example, for deciding whether the observed r could have come from a population where the correlation between variables is actually 0), r provides a reasonable measure of the degree of relationship between two variables, assuming the relationship is monotonic. But a better statistic for measuring the degree to which there is a monotonic relationship between variables is Spearman's rho. But the best way to evaluate non-linear relationships is to guess what the nature of the relationship might be and then use regression analysis to see which hypothetical relationship gives the best fit to the data.

So, Martin, it turns out that you made no mistakes. I simply had a wrong idea of what a correlation is. In fact, I'm seeing that I had a wrong idea about a lot of things in statistics. It's a whole lot simpler than I thought it was, except for the parts about computing probabilities from the properties of different distributions like the Gaussian and Poisson distributions.

Right, it's deriving the _sampling distributions_ for sample statistics like r, t, and F, that is difficult.

I've been using some of the standard calculations, like the t-test, without even knowing that the calculations had names. David Goldstein informed me this morning that even my so-called "stability factor" is simply a version of the f-test. Nothing new under the Sun, I guess.

The stability measure, S, is similar to the F statistic inasmuch as both are variance ratios. I noticed this way back when I was first doing my "mind reading" studies, using the S statistic. But your (our) use of the S statistic is not really like an F test. In the F test, the F statistic is used as a _decision variable_ for hypothesis testing. We use the S value, not to test hypotheses but, rather, as a _measure_ of quality of control (bigger S = better control, at least the way I calculate it).

Use of the F statistic for hypothesis testing requires knowing the sampling distribution of F (when the null hypothesis is true) and the degrees of freedom used in computing it. When you know these things, then you can use the observed value of F to decide whether to reject the null hypotheses (with a sufficiently low probability that the decision is a Type I error -- the error of rejecting the null when it's true).

You could use S as a decision variable as well. The null hypothesis would probably be that the system is _not_ a control system (population S = 1 because expected and observed variance of the CV are equal). In order to use S as a decision variable you would have to derive the sampling distribution of S assuming the null hypothesis is true. This sampling distribution would probably look a lot like the F distribution. Deriving the actual sampling distribution of S is a job for a real statistician. But this would be a way to bring statistical hypothesis testing into control theory research. And we know how psychologists love statistics. Maybe having a statistician derive the S distribution would be just the thing we need to get PCT research accepted.
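
A hedged Monte Carlo sketch of that last idea. The definition of S used here (a ratio of two independent sample variances, which is what the null of "no control" makes the expected and observed CV variances) is a simplification for illustration, not the exact computation used in the tracking programs:

```python
import numpy as np

rng = np.random.default_rng(5)
n_samples, n_trials = 200, 20000

s_null = np.empty(n_trials)
for i in range(n_trials):
    expected_cv = rng.normal(size=n_samples)   # variance predicted from the disturbance
    observed_cv = rng.normal(size=n_samples)   # what the CV actually did (null: same variance)
    s_null[i] = np.var(expected_cv, ddof=1) / np.var(observed_cv, ddof=1)

# Under the null, S clusters around 1; a critical value for rejecting the
# null could be read off the upper tail of this simulated distribution.
print("median S under null  :", np.median(s_null))
print("95th percentile of S :", np.percentile(s_null, 95))
```
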

Best regards

Rick


----

Richard S. Marken Consulting
marken@mindreadings.com
Home 310 474-0313
Cell 310 729-1400

[From Bill Powers (2006.12.20.0400 MST)]

Rick Marken (2006.12.19.1730) --

Thanks for the added info about statistics. A couple of questions that my math manual doesn't answer --

But a better statistic for measuring the degree to which there is a monotonic relationship between variables is Spearman's rho.

Do you really mean "monotonic" here? Monotonic means just not changing between positive and negative slope, so y = ax^2 is monotonic for positive x, and also for negative x, but not for both in one expression. A cubic or higher-order relationship can change between positive and negative slopes more often, while sines and cosines change sign of slope repeatedly. A LINEAR relationship never changes slope, so is always monotonic.

Anyway, what is Spearman's rho?

But the best way to evaluate non-linear relationships is to guess what the nature of the relationship might be and then use regression analysis to see which hypothetical relationship gives the best fit to the data.

Is regression analysis what we do with our modeling?

Best,

Bill P.

[From Rick Marken (2006.12.20.0820)]

Slouching towards the solstice.

Bill Powers (2006.12.20.0400 MST)--

Rick Marken (2006.12.19.1730) --

Thanks for the added info about statistics. A couple of questions that my math manual doesn't answer --

But a better statistic for measuring the degree to which there is a monotonic relationship between variables is Spearman's rho.

Do you really mean "monotonic" here? Monotonic means just not changing between positive and negative slope, so y = ax^2 is monotonic for positive x, and also for negative x, but not for both in one expression. A cubic or higher-order relationship can change between positive and negative slopes more often, while sines and cosines change sign of slope repeatedly. A LINEAR relationship never changes slope, so is always monotonic.

Yes, I mean monotonic in that sense. If there is strong non-monotonicity in a relationship between X and Y variables, as in a quadratic function over the domain from minus to plus infinity (for X), then obviously any correlation measure would be very low.

Anyway, what is Spearman's rho?

It's a correlation coefficient that uses the ranks of the X and Y values being correlated rather than the X, Y values themselves. Small deviations from monotonicity will have less effect on rho than on r. Obviously, you have to use these correlation measures sensibly; if the plotted relationship between X and Y is clearly quadratic, say, then using a correlation measure would be ridiculous.
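
A small illustration of the difference, using a monotonic but non-linear relationship (the cubic and the noise level are arbitrary choices):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(6)
x = rng.uniform(-2.0, 2.0, size=2000)
y = x**3 + 0.2 * rng.normal(size=x.size)     # monotonic, but not linear

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)
print("Pearson r    =", r)     # reduced by the curvature
print("Spearman rho =", rho)   # closer to 1, since only the ranks matter
```
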

But the best way to evaluate non-linear relationships is to guess what the nature of the relationship might be and then use regression analysis to see which hypothetical relationship gives the best fit to the data.

Is regression analysis what we do with our modeling?

No. I think what we do is evaluate goodness of fit using the inter-ocular trauma test. At least that's what I do, though I have published statistical tests in order to make it into journals more easily. For me, if the RMS deviation of model from data is less than 3%, say, then I think I'm safe in saying that I'm probably on the right track. No fancy statistical tests necessary;-)

Best

Rick


---

Richard S. Marken Consulting
marken@mindreadings.com
Home 310 474-0313
Cell 310 729-1400

[Martin Taylor 2006.12.20.17.35]

[From Bill Powers (2006.12.19.0916 MST)]

Going back to the manual, I do find one measure that resembles what I was thinking pretty closely: Chi-squared. Chi-squared is defined as the sum of the squares of departures of X from the expected value of X, E(X), divided by the expected value of X.

You are getting very close to thinking about information theory here.

It's ironic that now, when I am talking about the correlations among the signals associated with the control loop, you have become interested in the uncertainties of the relations among the signals, the topic of information-theoretic analyses.

... the definition of signal-to-noise ratio as used in electronics.

Again, one of the variables that often appears in information-theoretic analyses.

So, Martin, it turns out that you made no mistakes.

I'm still not betting on that. My analysis still seems to me to be too simple, and to come up with a result that is too coincidental. It may be correct, but I don't trust it, which is why I pointed people to it in the first place.

[From Rick Marken (2006.12.19.1730)]

In the F test, the F statistic is used as a _decision variable_ for hypothesis testing. We use the S value, not to test hypotheses but, rather, as a _measure_ of quality of control (bigger S = better control, at least the way I calculate it)

You may be amused to know that in one of my early papers, I used a set of F-test values as measures of quality, which were themselves subject to statistical testing to estimate how an effect changed with variation in some parameter. I can't remember which paper it was, but if it's of any interest I could try to dig it up. It was a long time ago :-(

F-tests are in themselves no more than measures. If someone wants to use the measure as a decision statistic involving "significance levels", they lay themselves open to all the criticisms I mentioned the other day. Actually, the whole Analysis of Variance procedure has an exact information-theoretic analogue, developed by McGill and Garner in the late 40s or early 50s. The same applies. The number emerging from the calculation can be useful, but if it's used in an unwarranted manner, its usefulness vanishes.

Martin