Re: PCT-Specific Methodology, and vectors

[Martin Taylor 2006.12.22.20.58]

[From Bill Powers (2006.12.22.1630 MST)]

I think I've forgotten why we got into this in the first place.

It was triggered by my response to [From Bill Powers (2006.12.16.0555 MST)] in which you were following up on questions of correlation in behaviourist and PCT approaches to data analysis. I thought the analysis of maximum and minimum correlations as a function of control quality might prove useful in the discussion.

And that's pretty close to all I know about Laplace transforms. It now seems compatible with what you were saying. I still feel that the "vectors" in question need some further investigation, but you seem to think so, too, so I'll leave it there.

I suppose you wrote this before seeing my response to Bruce earlier today. If not, then we can pursue that if you want, but it probably should be in a thread with a new subject line!

All the same, it might not hurt for me to lay out the basis for thinking of a waveform as a vector, since doing so makes thinking about all these transforms so much more intuitive.

We can start with the Nyquist limit, which says that for a signal of bandwidth W Hz, the signal can be recovered completely from a set of samples placed epsilon more closely than 1/(2W) seconds apart. So, for a signal of duration T seconds and bandwidth W, there are 2TW independent and equally spaced time samples. The signal value at any other moment can be recovered exactly from those samples.
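A minimal sketch of that claim in Python, assuming an ideal periodic test waveform whose highest component is W = 3 Hz, observed over one period of T = 1 s. For a periodic signal the exact reconstruction is the periodic form of sinc interpolation, and using an odd count of 2TW + 1 samples avoids the edge case of a component sitting exactly at the Nyquist frequency. The waveform and the helper names (signal, kernel, reconstruct) are illustrative assumptions.

```python
import math

T = 1.0        # duration of one period, seconds (assumed)
W = 3          # highest harmonic in Hz, i.e. the bandwidth (assumed)
N = 2 * W + 1  # 2TW + 1 equally spaced samples for a periodic signal

def signal(t):
    # Hypothetical band-limited test waveform: components at 1, 2 and 3 Hz only.
    return (math.sin(2 * math.pi * 1 * t)
            + 0.5 * math.cos(2 * math.pi * 2 * t + 0.3)
            - 0.8 * math.sin(2 * math.pi * 3 * t + 1.1))

def kernel(u):
    # Periodic sinc (Dirichlet) kernel for odd N; equals 1 at u = 0.
    s = math.sin(math.pi * u / T)
    if abs(s) < 1e-12:
        return 1.0
    return math.sin(N * math.pi * u / T) / (N * s)

samples = [signal(n * T / N) for n in range(N)]

def reconstruct(t):
    # Value at any instant, computed only from the N stored samples.
    return sum(x * kernel(t - n * T / N) for n, x in enumerate(samples))

for t in (0.05, 0.37, 0.81):
    print(f"t={t:.2f}  true={signal(t):+.6f}  from samples={reconstruct(t):+.6f}")
```

The printed pairs should agree to within floating-point rounding, even at instants that fall between the samples.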

Of course, real signals are not hard-limited to a particular band of width W, and there are lots of theoretical analyses that provide equivalent W values for signals whose frequency envelope has any specified shape. The end result of all of them is that one can describe a waveform exactly using 2TW scalar numbers. These numbers can be taken to be the values of the components of a vector. The values of the components of any vector are its projections on the basis vectors of the space within which it is specified. In the case of the waveform, the space has 2TW dimensions.

The set of basis vectors can be freely rotated and translated without moving the vector. All that happens is that the vector becomes represented by a new set of component values in the newly redefined space. It's still the same vector, though:

[Figure: a 2D vector x drawn against horizontal and vertical axes, and the same vector x drawn against a rotated pair of axes (imagine these axes to be at right angles).]

The arbitrary rotation and translation of the axes forms a "linear transformation" of the space. When one does it in practice, one usually thinks of the vector being transformed, when really it's the space that changes, not the vector.
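A small Python sketch of the same picture in 2D, assuming rotation only (no translation): the components of the vector (3, 4) change as the axes rotate, yet rebuilding from the new components returns the same vector, and its length stays 5. The helper names are hypothetical.

```python
import math

def rotated_axes(theta):
    # Unit basis vectors after rotating the axes by theta radians.
    return (math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))

def components(v, theta):
    # Project v onto each rotated axis (dot products).
    e1, e2 = rotated_axes(theta)
    return (v[0] * e1[0] + v[1] * e1[1], v[0] * e2[0] + v[1] * e2[1])

def rebuild(c, theta):
    # A weighted sum of the rotated axes recovers the vector itself.
    e1, e2 = rotated_axes(theta)
    return (c[0] * e1[0] + c[1] * e2[0], c[0] * e1[1] + c[1] * e2[1])

v = (3.0, 4.0)
for theta in (0.0, 0.4, 1.2):
    c = components(v, theta)
    print(f"theta={theta}: components=({c[0]:+.3f}, {c[1]:+.3f})  "
          f"length={math.hypot(*c):.3f}  rebuilt={rebuild(c, theta)}")
```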

A Fourier Transform is an example of applying a linear operator to the space of description of a waveform. Before the transform, there are 2WT samples in the "time domain". After the transform, we still have 2WT values, but now they are projections onto a new set of axes (basis vectors) that we call the "frequency domain". It's the same waveform, described equally accurately in either domain, and there is an infinite set of other rotations of the space that would provide 2WT values for the vector components.
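A sketch of the Fourier transform as a change of basis, assuming an odd number of samples N = 7 and a real orthonormal cosine/sine basis (a real-valued counterpart of the usual complex DFT, chosen here so every number stays real). The sample values are arbitrary; the point is that N time samples become exactly N frequency-domain components and convert back without loss.

```python
import math

N = 7  # number of time samples (assumed; odd keeps the basis simple)

def fourier_basis(n_samples):
    # Real orthonormal Fourier basis for odd n_samples: one constant vector
    # plus a cosine and a sine vector for each harmonic k = 1..M.
    m = (n_samples - 1) // 2
    scale = math.sqrt(2 / n_samples)
    basis = [[1 / math.sqrt(n_samples)] * n_samples]
    for k in range(1, m + 1):
        basis.append([scale * math.cos(2 * math.pi * k * n / n_samples) for n in range(n_samples)])
        basis.append([scale * math.sin(2 * math.pi * k * n / n_samples) for n in range(n_samples)])
    return basis

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

basis = fourier_basis(N)
x = [0.2, 1.5, -0.7, 0.0, 2.1, -1.3, 0.4]          # arbitrary time samples
coeffs = [dot(b, x) for b in basis]                  # frequency-domain components
back = [sum(c * b[n] for c, b in zip(coeffs, basis)) for n in range(N)]

print(len(x), len(coeffs))                            # same number of numbers
print(max(abs(a - b) for a, b in zip(x, back)))       # round-trip error, tiny
```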

If the linear transform is just a rotation of the basis space, as is the Fourier transform, then the point that represents the vector stays at the same Euclidean distance from the origin of the space. In other words, the sum of the squares of the component values remains unchanged by the transformation, which is why you can find the energy of a waveform by summing the squares of the amplitudes of the time samples or of the frequency components.
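A sketch of the energy statement in Python, assuming the complex DFT scaled by 1/sqrt(N), which is the scaling that makes it unitary (the complex analogue of a rotation). The sample values are arbitrary. The sum of squared sample values equals the sum of squared magnitudes of the frequency components (Parseval's theorem).

```python
import cmath
import math

def unitary_dft(x):
    # Complex DFT scaled by 1/sqrt(N); with this scaling it preserves length.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
            for k in range(n)]

x = [0.2, 1.5, -0.7, 0.0, 2.1, -1.3, 0.4, 0.9]   # arbitrary real samples
X = unitary_dft(x)

energy_time = sum(v * v for v in x)
energy_freq = sum(abs(c) ** 2 for c in X)
print(f"time-domain energy {energy_time:.6f}, frequency-domain energy {energy_freq:.6f}")
```

For real samples the N complex outputs come in conjugate pairs, so they carry only N independent real numbers, the same count as the input.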

I find that kind of visualization really helpful in dealing with these transform questions. The nitty-gritty algebra and calculus may be necessary in order to find the actual components in each particular case, but for me it gets in the way of understanding what is going on.

Martin

[From Bill Powers (2006.12.23.0930 MST)]

Martin Taylor 2006.12.22.20.58 –

The set of basis vectors can be
freely rotated and translated without moving the vector. All that happens
is that the vector becomes represented by a new set of component values
in the newly redefined space. It’s still the same vector,
though:

[Figure: a 2D vector x drawn against horizontal and vertical axes, and the same vector x drawn against a rotated pair of axes (imagine these axes to be at right angles).]
I won’t argue with this way of looking at things, because I’m not sure
how it applies to Laplace transforms. I think it is a good example of the
way the human brain is equipped to generalize, to find similarities
behind all the differences. My own inclination is more aimed at finding
differences among apparent similarities, which is also something useful
that the brain can do. Perhaps that’s why it’s hard for me to see a
similarity (other than a formal one) between a spatial vector and a list
of numbers representing frequencies… While you can represent
things like that in hyperspace, and I sometimes do, I don’t see that it’s
required.

I find that kind of
visualization really helpful in dealing with these transform questions.
The nitty-gritty algebra and calculus may be necessary in order to find
the actual components in each particular case, but for me it gets in the
way of understanding what is going on.

Implying, I suppose, that anyone who doesn’t see it this way doesn’t
understand what is going on. The “nitty-gritty algebra and
calculus,” on the other hand, however grubby and lowbrow it may be,
gives me more of an illusion of understanding than does looking for the
most general generalizations. Of course all of that’s a matter of taste.
I don’t even like matrix algebra because if I just do the abstract
manipulations according to the rules, I lose all track of what the
equations mean at the “nitty-gritty” level where things
actually happen. Also, I notice that others lose track, too, in the few
cases where I have been able to see it both ways. However, I’ve used
matrix calculations in my multidimensional models because it’s easier
that way to make sure everything is being done right. It’s not that I’m
against matrix algebra or transform methods. It’s just that I like to
understand things closer to the level where the action is taking place,
and I can’t usually work at both levels.

I really think that something went astray when, in the late 40s and the
1950s, everyone started doing frequency-domain analysis of control systems
(which I learned but didn’t like). If you’ll remember, I got into that
with Flach, and was irritated enough to develop that “non-adapting
adaptive system” demo showing that McRuer and Jex were victims of
an illusion when they thought they had proven that the human system
“adapted” just because the Bode plot changed when the
characteristics of the load were changed. There are a few disadvantages
in dealing with phenomena at too abstract a level, the main one being a
temptation to put too much faith in what the mathematics seems to be
saying, and worse, to miss seeing that with a different underlying model,
everything might work differently, and better. If you fall in love with
the mathematics, I think you get stuck with bad models.

I like your suggestion in another post that it would be good for someone
(you or me) to expand the tracking model to include more data-taking. No
reason we couldn’t include calculating some correlations, too. Your
concerns about getting sufficient time resolution using discrete analysis
are unnecessary – we just keep reducing the size of dt until further
reductions make no material difference in the results. That’s what I’ve
been doing all along. The 1/60 second size of dt that I have settled on
gets us within a very small epsilon of what we would see in a tracking
experiment if dt were one microsecond or one nanosecond. The Nyquist
criterion was of the most interest for modelers (if not telephone
equipment designers) back when one had to limit the number of data points
accumulated during an experimental run because of slow computing speeds
and limited memory, neither of which is a problem now. Nyquist defined
the minimum necessary sampling rate, but there’s no reason to avoid
higher sampling rates if they’re easy to attain. And with higher sampling
rates, one can stop making excuses for throwing away sharp edges in the
picture on the grounds that they’re not important, a decision that is
often made after one discovers that the required sampling rate
can’t be achieved.
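The dt-refinement procedure can be sketched in Python, assuming a hypothetical compensatory-tracking loop (perception p = o + d, reference r = 0, integrating output do/dt = k*e with k = 10 per second) and a smooth two-sine disturbance. The run is repeated with dt halved until the RMS of p changes by less than 0.1%. Whether 1/60 s is fine enough for a real experiment depends on the loop gain and disturbance bandwidth; the parameters here are assumptions, not the values from Bill's models.

```python
import math

def run_tracking(dt, k=10.0, duration=10.0):
    # Hypothetical compensatory-tracking loop with Euler steps of size dt:
    # perception p = o + d, error e = r - p (r = 0), output do/dt = k * e.
    o, r = 0.0, 0.0
    steps = int(round(duration / dt))
    sq_sum = 0.0
    for i in range(steps):
        t = i * dt
        d = math.sin(2 * math.pi * 0.2 * t) + 0.5 * math.sin(2 * math.pi * 0.7 * t)
        p = o + d
        o += k * (r - p) * dt
        sq_sum += p * p
    return math.sqrt(sq_sum / steps)   # RMS of the controlled perception

dt, previous = 1 / 15, None
while True:
    current = run_tracking(dt)
    print(f"dt = 1/{round(1 / dt)} s: RMS(p) = {current:.6f}")
    if previous is not None and abs(current - previous) < 1e-3 * previous:
        break                          # further halving makes no material difference
    previous, dt = current, dt / 2
```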

I should add that in the present case, a control system with a simple
integrator as an output function, all the details of the behavior can be
calculated without getting out of our depth. All it needs is for someone
to sit down and do that (ugh) nitty gritty calculus and algebra stuff.
Maybe we could hire an illegal immigrant who doesn’t mind getting his or
her hands dirty.

Or look up the answers in a textbook.

Best,

Bill P.

Re: PCT-Specific Methodology, and vectors
[Martin Taylor 2006.12.23.14.24]

[From Bill Powers (2006.12.23.0930 MST)]

Martin Taylor 2006.12.22.20.58 –

The set of basis vectors can be freely
rotated and translated without moving the vector. All that happens is
that the vector becomes represented by a new set of component values
in the newly redefined space. It’s still the same vector, though:

[Figure: a 2D vector x drawn against horizontal and vertical axes, and the same vector x drawn against a rotated pair of axes (imagine these axes to be at right angles).]

I won’t argue with this way of looking at things, because I’m not sure
how it applies to Laplace transforms.

Laplace transform is just another linear operation, meaning a
different rotation of the space. To take a Laplace transform neither
increases nor decreases the effective number of sample dimensions
(2WT).

I think it is a good example of the
way the human brain is equipped to generalize, to find similarities
behind all the differences. My own inclination is more aimed at
finding differences among apparent similarities, which is also
something useful that the brain can do.

In the “Psychology of Reading” (Academic Press, 1983)
that Ina and I wrote, we devoted quite a lot of space to this
distinction. Seeing similarities seems to be a different operation
from seeing differences, and in most brains the two work to complement
each other. Some people favour one more than the other. The one
(similarity-seeking) process gets a quick and dirty answer that is
usually right, whereas the other (analytic) may not get an answer,
takes a long time, but is precise when it does get an answer. The
analytic process, according to our theory, often double-checks what
the similarity-seeking process comes up with. They both happen in
parallel in the same brain.

Whether or not the brain really works that way, I think it’s a
good way for people to work. Quick and dirty back-of-the-envelope
answers checked by formal analysis avoid the big errors that you refer
to in connection with Flach, while avoiding the imprecision inherent
in the quick-and-dirty approach.

Perhaps that’s why it’s hard for me
to see a similarity (other than a formal one) between a spatial vector
and a list of numbers representing frequencies… While you can
represent things like that in hyperspace, and I sometimes do, I don’t
see that it’s required.

No, it’s not required. It’s just helpful (to me). However, the
connection is more than formal. It extends to the operations you can
perform on the numbers or in the space, and the results in one
representation are exact mirrors of results in the other. A vector is
just another way of representing a list of numbers, which might
specify a signal waveform of bandwidth W and duration T if there are
2WT of them. People who tend to see similarities often also tend to
like pictorial ways of representing things, whereas people who tend to
see differences often also prefer to work with symbolic analyses.
Neither is better nor worse, but usually if both are appropriate, they
are more powerful when working together.

For me, a visual vector representation makes it easier to see
what is likely to happen when you operate on the list of numbers. If
an operator is linear, like a Fourier transform, and it conserves the
“energy” (sum of squares) of the components, then the
operator simply rotates the space. The number of numbers (dimensions)
won’t change with a space rotation, but if the result has more or
fewer numbers, then we know that either some of the extra ones are
redundant (more) or some information about the signal has been lost
(fewer).
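A toy illustration of that counting argument in Python, assuming 3-component vectors in place of a 2WT-component waveform: a rotation keeps both the number of components and the sum of squares; dropping a component lets two different inputs give the same output, so information is lost; appending a component computed from the others adds only redundancy. The function names are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def rotate_xy(v, theta):
    # Orthogonal map: rotates the first two coordinates, leaves the third alone.
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])

def drop_last(v):
    # Fewer outputs than inputs: a projection onto the first two axes.
    return (v[0], v[1])

def add_sum(v):
    # More outputs than inputs: the extra number is fixed by the others.
    return v + (sum(v),)

a = (1.0, 2.0, 3.0)
b = (1.0, 2.0, -5.0)

print(norm(a), norm(rotate_xy(a, 0.7)))   # rotation keeps the "energy"
print(drop_last(a) == drop_last(b))       # two different inputs, same output
extended = add_sum(a)
print(extended[3] == sum(extended[:3]))   # the fourth number is redundant
```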

I find that kind of visualization really
helpful in dealing with these transform questions. The nitty-gritty
algebra and calculus may be necessary in order to find the actual
components in each particular case, but for me it gets in the way of
understanding what is going on.

Implying, I suppose, that anyone who doesn’t see it this way doesn’t
understand what is going on.

Absolutely not! I said “I find”, because it applies to
me. I know very well that other people find understanding comes
through painstaking analysis. Sometimes it does for me, too, but more
often than not I make a silly mistake like a sign inversion when I do
the analysis. If the analysis fails to confirm the visual image, I
usually look more carefully for an analysis error than for an error in
visualisation. Someone whose analytic performance is more precise
might well do the reverse.

It’s not that I’m against matrix
algebra or transform methods. It’s just that I like to understand
things closer to the level where the action is taking place, and I
can’t usually work at both levels.

I think a lot of people would agree with you on that. I don’t
know the proportion, but I would guess around 70%.

The Nyquist criterion was of the most
interest for modelers (if not telephone equipment designers) back when
one had to limit the number of data points accumulated during an
experimental run because of slow computing speeds and limited memory,
neither of which is a problem now. Nyquist defined the minimum
necessary sampling rate, but there’s no reason to avoid higher
sampling rates if they’re easy to attain.

That’s not at all the reason I mentioned Nyquist. The original
point (I think, and certainly the point quite early) was that you need
no more than 2WT+1 points to specify the waveform exactly. Any
extra points are redundant, as their values could be determined from
the values at the other sample points. The main reason we sample more
often is that we can then use much simpler techniques to recover a
waveform that has a non-rectangular bandpass than would be needed if
we sample at the Nyquist rate.
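A sketch of why oversampling makes recovery simpler, assuming a hypothetical waveform with components at 1 Hz and 3 Hz (so W = 3 Hz and the Nyquist rate is 6 samples/s) and plain straight-line interpolation between samples in place of the ideal sinc reconstruction. The worst-case error of straight-line interpolation shrinks roughly with the square of the sample spacing, so it falls rapidly as the sampling rate rises above the Nyquist rate.

```python
import math

def signal(t):
    # Hypothetical band-limited waveform, highest component 3 Hz (W = 3).
    return math.sin(2 * math.pi * 1 * t) + 0.6 * math.sin(2 * math.pi * 3 * t + 0.5)

def linear_interp_error(rate, duration=2.0, probes=2000):
    # Sample at `rate` Hz, reconstruct by straight lines between samples,
    # and return the worst error seen at many in-between instants.
    dt = 1.0 / rate
    worst = 0.0
    for i in range(probes):
        t = duration * i / probes
        n = int(t // dt)
        t0, t1 = n * dt, (n + 1) * dt
        frac = (t - t0) / dt
        estimate = (1 - frac) * signal(t0) + frac * signal(t1)
        worst = max(worst, abs(estimate - signal(t)))
    return worst

for factor in (1.2, 4, 16, 64):
    rate = factor * 2 * 3   # multiples of the Nyquist rate 2W
    print(f"{factor:>5} x Nyquist: max linear-interp error = {linear_interp_error(rate):.5f}")
```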

And with higher sampling rates, one
can stop making excuses for throwing away sharp edges in the picture
on the grounds that they’re not important, a decision that is often
made after one discovers that the required sampling rate can’t
be achieved.

Sharp edges mean high bandwidth, which requires high sampling
rate.
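A sketch of that relation in Python, assuming unit-amplitude square and triangle waves described by their textbook Fourier series (odd harmonics only, amplitudes 4/(πk) and 8/(π²k²)). The square wave has jumps and keeps a few percent of its power above the 9th harmonic, while the triangle wave, which has corners but no jumps, keeps well under 0.1%.

```python
import math

def power_fraction_above(amplitude, max_harmonic, n_terms=20001):
    # Share of a periodic wave's power in odd harmonics above max_harmonic.
    # Each sinusoid of amplitude A contributes mean-square power A**2 / 2.
    ks = range(1, n_terms, 2)
    power = [amplitude(k) ** 2 / 2 for k in ks]
    total = sum(power)
    low = sum(p for k, p in zip(ks, power) if k <= max_harmonic)
    return (total - low) / total

def square(k):
    return 4 / (math.pi * k)              # jumps: amplitudes fall as 1/k

def triangle(k):
    return 8 / (math.pi ** 2 * k ** 2)    # corners, no jumps: fall as 1/k**2

for kmax in (1, 9, 99):
    print(f"above harmonic {kmax:>2}: square {power_fraction_above(square, kmax):.5f}, "
          f"triangle {power_fraction_above(triangle, kmax):.7f}")
```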

I should add that in the present case, a
control system with a simple integrator as an output function, all the
details of the behavior can be calculated without getting out of our
depth.

Of course. However, one of the benefits of the vector approach is
that it is relatively easy to see qualitatively what would happen if
the output function changes, without having to do all the algebra. You
need the algebra to see what would happen quantitatively, but the
qualitative result nevertheless serves as a check on the algebraic
exact result. For example, if the integrator is leaky, the algebra
gets a bit harder. Qualitatively, Gp becomes more nearly aligned with
p if p has much low-frequency energy. In other words, the angle
between p and Gp, which sum to d, is less than 90 degrees, so that p
is more correlated with d than it would be with a perfect integrator.
You can get that result by analysis, I’m sure, but the vector
representation allows you to see it directly.
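The qualitative claim can be checked numerically with a sketch like the following, assuming, as described, that p and Gp sum to d (r = 0), that p is a sum of three sinusoids completing whole cycles in a 10 s window, and illustrative values K = 5 for the integrator gain and a = 0.5 per second for the leak. Each sinusoid is passed through either a perfect integrator K/(jω) or a leaky one K/(a + jω). Because the test signals have zero mean over the window, the cosine of the angle between two of them is also their correlation.

```python
import math

T, N = 10.0, 1000                      # 10 s window, 1000 samples (assumed)
times = [T * n / N for n in range(N)]
# (frequency in Hz, amplitude, phase): whole cycles in the window, zero mean
harmonics = [(0.1, 1.0, 0.0), (0.3, 0.6, 1.0), (0.5, 0.4, 2.0)]
K, leak = 5.0, 0.5                     # integrator gain and leak rate (assumed)

def perfect(w):
    # Ideal integrator K/(jw): gain K/w, phase lag of exactly 90 degrees.
    return K / w, -math.pi / 2

def leaky(w):
    # Leaky integrator K/(a + jw): phase lag less than 90 degrees.
    return K / math.hypot(leak, w), -math.atan2(w, leak)

def through(gain_phase):
    # Apply a linear filter to each sinusoid of p, then sample the sum.
    shaped = []
    for f, a, ph in harmonics:
        g, shift = gain_phase(2 * math.pi * f)
        shaped.append((f, g * a, ph + shift))
    return [sum(a * math.sin(2 * math.pi * f * t + ph) for f, a, ph in shaped) for t in times]

def cosine(u, v):
    # Cosine of the angle between two sampled waveforms.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))

p = through(lambda w: (1.0, 0.0))      # p itself: unit gain, no phase shift
for name, filt in (("perfect", perfect), ("leaky", leaky)):
    gp = through(filt)
    d = [x + y for x, y in zip(p, gp)]  # d = p + Gp when r = 0
    print(f"{name:8s} cos(p, Gp) = {cosine(p, gp):+.4f}   corr(p, d) = {cosine(p, d):.4f}")
```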

Similarly, Rick asks what variation in the reference signal
would do… The signal r is orthogonal to Gp and to p, and it adds to
them to equal d. It therefore must decrease the correlation between d
and the other signals (p and o), by an amount that depends on the
relative RMS amplitudes of d and r. How much? $(Gr)^2 + (Gp)^2 + p^2 = d^2$,
where each term is the squared RMS amplitude (squared vector length) of
that signal. Noise, likewise, is orthogonal to all the other signals, and
decreases all the relevant correlations.
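A sketch of that sum of squares, assuming hypothetical sinusoids that are exactly orthogonal over a 10 s window: p at 0.1 Hz, Gp as a 90-degree-shifted copy at 0.1 Hz (the perfect-integrator case), and the reference contribution Gr at 0.7 Hz. The sign of the Gr term does not affect the squared lengths, so it is simply added here. As the amplitude of Gr grows, |d|^2 stays equal to the sum of the three squared lengths and the correlation of p with d falls.

```python
import math

T, N = 10.0, 1000                      # 10 s window, 1000 samples (assumed)
times = [T * n / N for n in range(N)]

def sine(freq, amp, phase=0.0):
    return [amp * math.sin(2 * math.pi * freq * t + phase) for t in times]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def corr(u, v):
    # Pearson correlation; these test signals all have zero mean.
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

p = sine(0.1, 1.0)
gp = sine(0.1, 3.0, -math.pi / 2)      # 90 degrees from p (perfect integrator)
results = []
for r_amp in (0.0, 2.0, 5.0):
    gr = sine(0.7, r_amp)              # reference contribution, another frequency
    d = [a + b + c for a, b, c in zip(p, gp, gr)]
    parts = dot(p, p) + dot(gp, gp) + dot(gr, gr)
    results.append((dot(d, d), parts, corr(p, d)))
    print(f"Gr amplitude {r_amp}: |d|^2 = {dot(d, d):.1f}, "
          f"sum of parts = {parts:.1f}, corr(p, d) = {corr(p, d):.3f}")
```

Adding a fourth orthogonal component standing in for noise would lower the correlations in the same way.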

Martin

[From Bill Powers (2006.12.27.1700 MST)]

Rick Marken (2006.12.23.1111)–

What would be nice is for
someone (you?) to develop a powerful method for determining a controlled
variable using the basic principles of “the test” (looking for
lack of correlation between d and hypothetical qi) but taking into
account the “wild card” in this research: secular variations in
r that are completely unknown. It seems to me that this might require the
development of some new statistical analysis (since the r waveform can
only be known probabilistically) or, perhaps, some clever mathematical
scheme that can factor out possible variations in qi due to variations in
r.

While not directly relevant, consider the strategy I used in the paper in
Hershberger’s Volitional Action volume, “Quantitative
measurement of volition.” The experiment involves two periods of
tracking, with a period between them in which the subject is volitionally
moving the cursor in a preselected pattern instead of making it follow
the target. A model is fit to the first and third segments, to determine
the gain and time constant of the output function (the other functions
are reduced to unity multipliers by choosing the appropriate scaling
factors). Then the inverse of the output function is used to determine
the error signal, and finally the error signal and assumed perceptual
signal are used to deduce the behavior of the reference signal: (r - p) +
p = r.
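The inversion step can be sketched in Python under idealized assumptions that are not from the paper: the fitted model is exactly an integrating output do/dt = k(r - p) with k = 8 per second, sampled at dt = 1/60 s; the cursor is p = o + d with a known two-sine disturbance; the handle position is measured with a little Gaussian noise; and the intended pattern is a stair-step rising one unit every 2 s. Differencing the measured output gives e = (do/dt)/k, adding p gives r, and a centered 31-sample moving average smooths the noise that differencing amplifies.

```python
import math
import random

random.seed(1)
dt, k = 1 / 60, 8.0      # sample interval (s) and output gain, assumed already fitted
steps = 12 * 60          # 12 s middle segment

def intended(t):
    # Hypothetical stair-step reference: one unit higher every 2 s.
    return float(int(t // 2.0))

def disturbance(t):
    return 2.0 * math.sin(2 * math.pi * 0.15 * t) + math.sin(2 * math.pi * 0.4 * t + 1.0)

# Simulated subject: p = o + d, integrating output do/dt = k * (r - p).
o, handle = 0.0, []
for n in range(steps + 1):
    t = n * dt
    handle.append(o + random.gauss(0.0, 0.01))   # measured handle, small noise
    o += k * (intended(t) - (o + disturbance(t))) * dt

# Deduce r: invert the output function, then add back the perception.
r_est = []
for n in range(steps):
    p = handle[n] + disturbance(n * dt)          # cursor = handle + disturbance
    e = (handle[n + 1] - handle[n]) / (k * dt)   # e = (do/dt) / k
    r_est.append(e + p)                          # (r - p) + p = r

half = 15                                        # centered moving average, 31 samples
smooth = [sum(r_est[n - half:n + half + 1]) / (2 * half + 1)
          for n in range(half, steps - half)]

for sec in (1, 3, 5, 7, 9):
    n = sec * 60
    print(f"t = {sec} s  intended r = {intended(n * dt):.1f}  deduced r = {smooth[n - half]:.2f}")
```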

Because this experiment is done in the presence of substantial
disturbances, the regular behavior of the cursor which the subject tries
to produce in the middle segment is far from obvious. Yet the deduced
reference signal, after low-pass filtering, moves in a very clear set of
stair-steps, which was the intended pattern of the cursor.

As a check, the subject is asked to do the experiment again but with the
eyes closed during the middle third, the pattern being produced by feel
alone. What we get under those conditions is a very clear stair-step
in the mouse position, but of course nothing recognizable
(because of the disturbances) in the cursor positions, and a nonsensical
random fluctuation as the deduced state of the reference signal. So it
seems fairly clear that this method actually does let us measure the
variations in the reference signal when the visual relationship between
cursor and target is being controlled.

I think that in an indirect way this is a kind of test for the controlled
variable, in that a model that assumes the existence of a control system
is used to obtain a measure that would not be seen if a control system
were not in fact present. That’s the general idea of the usual TCV,
too.

This doesn’t involve statistics per se, but it does use moving averages
to filter out noise, and I suppose it might be possible to devise some
kind of statistical test for the deduced pattern of reference signals.
I’m not sure what you’d compare them to, but maybe there is something we
could use.

Best,

Bill P.

[From Rick Marken (2006.12.27.1805)]

Bill Powers (2006.12.27.1700 MST) --

Rick Marken (2006.12.23.1111)--

What would be nice is for someone (you?) to develop a powerful method for determining a controlled variable using the basic principles of "the test" (looking for _lack of correlation_ between d and hypothetical qi) but taking into account the "wild card" in this research: secular variations in r that are completely unknown. It seems to me that this might require the development of some new statistical analysis (since the r waveform can only be known probabilistically) or, perhaps, some clever mathematical scheme that can factor out possible variations in qi due to variations in r.

While not directly relevant, consider the strategy I used in the paper in Hershberger's Volitional Action volume, "Quantitative measurement of volition."

It is directly relevant and I was thinking of your approach when I suggested finding a clever mathematical scheme that can factor variations in qi due to r. I think it could be used to great effect in my Mind Reading demo. The demo could start with a simple compensatory tracking phase during which a control model of the subject is built. Then the subject makes a pattern with one of the three squares and the computer not only shows which square is being moved but also prints the derived reference pattern on the screen. When the subject switches to a new square, the screen pattern is cleared until the pattern for the square that is now being moved is shown. The subjects themselves can evaluate the goodness of fit of the model-based estimate of the reference pattern to their actual reference pattern. Now that would be cool, no?

Best

Rick

Richard S. Marken Consulting
marken@mindreadings.com
Home 310 474-0313
Cell 310 729-1400