A delurk

[From Richard Kennaway (970415.1650 BST)]

I've been reading CSGNET for a few years, and I've been through its
archives from before I joined, but so far I've never posted. There are a
couple of things that have given me reason to delurk now, but first a few
words about who I am and what I do.

I'm a researcher in computer science, mostly in very abstract, theoretical,
mathematical foundations (term rewriting, graph rewriting, and functional
languages, for those who know what those are). No relevance to PCT I can
think of. More recently, I've also been involved in a project to develop
tools for analysing streams of events (e.g. log files generated by military
combat simulators) for whatever the analyst considers "interesting". There
may be potential for PCT relevance there, but I've not thought about it. I
could tell you stories of the use of statistics (correlations and analysis
of variance), but only off the public record. (I'll just pose the purely
hypothetical question: if you had a helicopter combat simulator, and wanted
to decide which of several equipment configurations to put on the
helicopters, and you could have teams of pilots fly as many simulated
missions as you wanted, how might you go about the task?)

PCT background: I've read B:CP and the archives of CSGNET, and I have Mind
Readings and Living Control Systems I and II. My attitude to PCT might be
summarised as "starry-eyed lay convert" (the last sort of person you want
proselytizing for you, so I try not to). I have no background in
psychology.

On to topic A:

I don't recall much discussion about neural nets here. Has anyone looked
at the possibility of neural nets in which the "neurons" are little control
loops, arranged hierarchically, with some sort of rules for reorganising
the network? It seems an obvious thing to do. At the moment I'm casting
around for new research to do, and this looks like something worth
considering.
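
To make the idea slightly more concrete, here is a toy sketch of the
kind of unit I have in mind -- every name, gain, and environment model
below is invented purely for illustration. Each "neuron" is a little
proportional control loop, and a higher-level unit supplies the
reference signal for a lower-level one:

    # Toy sketch: a "neuron" that is a small control loop.  All gains
    # and dynamics are invented for illustration.
    class ControlUnit:
        def __init__(self, gain=5.0, slowing=0.05):
            self.gain = gain        # loop gain
            self.slowing = slowing  # leaky-integrator output smoothing
            self.output = 0.0

        def step(self, perception, reference):
            error = reference - perception
            # the output slews toward gain * error rather than jumping
            self.output += self.slowing * (self.gain * error - self.output)
            return self.output

    # Two-level hierarchy: the higher unit's output is the lower
    # unit's reference signal; only the lower unit acts on the world.
    high, low = ControlUnit(), ControlUnit()
    env = 0.0
    for _ in range(2000):
        ref_low = high.step(perception=env, reference=1.0)
        action = low.step(perception=env, reference=ref_low)
        env += 0.05 * action - 0.01 * env   # crude environment dynamics
    print(round(env, 2))  # climbs toward the top-level reference of 1.0
                          # (pure proportional control leaves some error)

The open question, of course, is what the reorganising rules should
be -- which is exactly the part that would need researching.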

and topic B:

A few years ago, when the subject of correlations came up here, Bill Powers
posted a long message about how large a correlation needs to be to actually
be useful for prediction. I was sufficiently interested to sit
down and work out the mathematics behind his verbal description. I was
very impressed. If he knows the mathematics, he has a rare gift for
explaining it in English, and if he doesn't, he has an even rarer
mathematical intuition. Anyway, I wrote up the maths in a note available
at ftp://ftp.sys.uea.ac.uk/pub/kennaway/drafts/correlationinfo.{dvi,tex}.
All the maths in it can be found in standard stats textbooks, but I've
never seen it collected together. It's incomplete, in that I repeat
critical remarks I've seen on CSGNET about 20% correlations being published
as meaningful results, but I'm just taking that on trust without any actual
references. Can anyone suggest any? The library here doesn't get JEAB, so
I can't browse through it, and I've no experience in reading such material.
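
For anyone who does not want to fetch the note, the key fact -- a
standard result for jointly normal variables, and the source of the
figures quoted later in this thread -- is that knowing one variable
gives this much information about the other:

    I(X;Y) = -(1/2) log2(1 - c^2)  bits

so c = 0.866 gives exactly one bit, c = 0.5 gives about 0.21 bits, and
c = 0.2 gives less than 1/30th of a bit.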

Finally, a footnote to a debate here from a year back: some people
(including me) *are* able to tickle themselves. Dealing with an itch on
the sole of my foot raises an interesting conflict of control systems.


__
\/__ Richard Kennaway, jrk@sys.uea.ac.uk, http://www.sys.uea.ac.uk/~jrk/
  \/ School of Information Systems, Univ. of East Anglia, Norwich, U.K.

[From Bill Powers (970415.1648 MST)]

Richard Kennaway (970415.1650 BST)--

When you delurk, you debark with a devastating detonation. I hope very much
that you will stay with us, and not bark, vastate, or tonate again.

I have had a look at your paper in the TeX format, and it contains
everything I had dreamed that someone would do. It answers every question I
have wondered about, and in a most gratifying way reassures me that my
mathematical intuition (and not, I assure you, my mathematical knowledge)
has not led me astray. Your analysis, of course, is by far the more
valuable, because it is founded on basic principles. When I get a copy of
the paper I can read in a more friendly format than TeX, I will keep it by
my bedside until at least some of its details sink in. Any chance of a paper
copy when it's done?

I hope you are planning to publish this paper in a conspicuous place such as
Nature or Science, and not bury it in a mathematical journal where those who
really need it will never read it.

Gary Cziko may be able to supply you with a reference to a survey article
which reported that the average correlation in a considerable sample of social
science papers was 0.28 -- I think.

I hope that you will be contributing your mathematical (and writing)
abilities to all of the threads that appear on CSGnet. Welcome aboard.

Best,

Bill P.

P.S. to all PCTers. I repeat the reference to Richard's article-in-progress
and strongly urge you to read it:

ftp://ftp.sys.uea.ac.uk/pub/kennaway/drafts/correlationinfo.{dvi,tex}

If someone could post a PostScript version, perhaps more of us could see it
in all its typeset beauty. Is that OK, Richard?

[From Bruce Gregory (970416.1520 EST)]

Richard Kennaway (970415.1650 BST)

My attitude to PCT might be
summarised as "starry-eyed lay convert" (the last sort of person you want
proselytizing for you, so I try not to).

I'm not as reticent.

I have no background in psychology.

Bragging is O.K., as you've no doubt garnered from following the
postings. A background in psychology does not _totally_
disqualify someone, but it is a difficult handicap to
overcome ;-)

A few years ago, when the subject of correlations came up here, Bill Powers
posted a long message about how large a correlation needs to be to actually
be useful for prediction. I was sufficiently interested to sit
down and work out the mathematics behind his verbal description. I was
very impressed. If he knows the mathematics, he has a rare gift for
explaining it in English, and if he doesn't, he has an even rarer
mathematical intuition. Anyway, I wrote up the maths in a note available
at ftp://ftp.sys.uea.ac.uk/pub/kennaway/drafts/correlationinfo.{dvi,tex}.

I agree with Bill: a very nice exposition of a _very_ important
point.

Finally, a footnote to a debate here from a year back: some people
(including me) *are* able to tickle themselves.

An invaluable skill for a PCT expert. Keep us posted ;-)

Bruce

[Hans Blom, 970417]

(Richard Kennaway (970415.1650 BST))

ftp://ftp.sys.uea.ac.uk/pub/kennaway/drafts/correlationinfo.{dvi,tex}.

Thanks for the nice review. Useful. I have just a few quibbles where
you relate statistics to its practical usefulness.

In the social, psychological, medical, and biological sciences,
statistical data showing correlations between random variables of
80% are considered interesting, and correlations as low as 20% have
been considered publishable.

Much depends on what the correlation info is _used for_. Even low
correlations may be important to indicate that two variables are
somehow related, i.e. vary together, if only on average. Knowing this
may prompt other researchers to study the presumed relationship in
more depth, e.g. to design physical or physiological models rather
than statistical ones.

We demonstrate the physical meaning of such correlations for the
special case of a bivariate normal distribution of variables X and Y
with product-moment correlation coefficient c.

This I doubt. Physics is concerned with _how_ variables covary. In
statistics, the usual assumption is that of a linear relationship
between variables. That makes excellent sense in some situations: if
physical (or any other type of deeper) knowledge is unavailable, a
linear model may be the best one can do. The same is true when very
little information is available, e.g. only a few observations or
large levels of "noise" ("unexplainable variance"). In such cases, too
little data may be available to find the f that gives the best fit in
y = f(x), and anything beyond a linear relation fails to improve
matters -- or makes them worse (overfitting).
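
A small synthetic illustration of the overfitting point (every number
below is invented; the true relation is linear, observed only six
times with noise):

    # With few noisy observations, a straight line usually
    # generalizes better than a more flexible curve.
    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0.0, 1.0, 6)
    y_train = x_train + rng.normal(0.0, 0.1, 6)  # linear plus noise
    x_test = np.linspace(0.0, 1.0, 100)
    y_test = x_test                              # noise-free truth

    for degree in (1, 5):
        coeffs = np.polyfit(x_train, y_train, degree)
        mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: test MSE = {mse:.4f}")
    # the degree-5 polynomial passes through every training point,
    # yet typically does worse between and beyond them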

The questions of statistics and physics are different, I think. Much
simplified: Statistics wants to know _whether_ there is a relation,
physics what the _form_ of the relation is. Statistics may, of
course, assist physics in establishing whether (!) a certain relation
is better (gives a smaller average squared error, say) than a
different one.

The calculations show that a correlation of 50% or less is useless
by all four criteria.

I would not want to use the word "useless": a little information may
be better than none at all. Here, for example:

Statistically, such low correlations can be of use when taking some
action over a large population --- casinos stay in business on a
much smaller edge than that --- but in any individual case, knowing
X tells one virtually nothing about Y.

It does tell me, an individual, that _on average_ I have nothing to
gain in a casino. So I take what you assume to be hardly any
information at all as a full "bit" of information that lets me decide
the question "should I gamble?" with an unambiguous "no". Except,
maybe, in exceptional circumstances: if I had to pay a ten million
dollar ransom to free a loved one, knowing that this was the only way
to ever see her alive again, I might not hesitate to go to the casino
with all the money I could scrape together. In such cases economics'
utility theory would be a better approach to describing the situation
than statistics, I guess.

To draw a line through a set of points does not in itself constitute
an explanation of anything.

In physics, this is viewed differently: it is the _form_ of the line
that we want. Is the line straight or parabolic, or is it impossible
to establish that? The latter, probably, if the correlation is low.
Like statistics, physics does not think of "explanation" in a causal
sense -- it is a relation. Given F = m * a, F may "cause" a or a may
"cause" F. But even F = m * a cannot be "explained"; it just happens
to be that way (approximately).

In other words, a correlation of 86.6% between two jointly normally
distributed random variables means that knowing one variable gives
only one bit of information about the other.

Yet, if a decision is _required_, we'd better take even far less than
one bit of information into account: we cannot be sure, but we can
surely improve the odds. Life is a long sequence of gambles :-).

That is, less than 1/30th of a bit --- for practical purposes, no
information at all.

Depends on what the "practical purpose" is. See above.

Yet because a sufficient quantity of data has been amassed to be
statistically sure that the correlation differs from 0, perhaps even
at the 99% confidence level, such a result can be published even
though it is totally meaningless.

Yes and no. Many medical studies, for instance, report that drug X is
without any doubt (> 99.9%) better than drug Y in a certain category
of patients. Doctors take this to mean that they should prescribe
drug X -- doing so is state of the art, giving drug Y is simply
inferior. Yet we may know that the large scale study actually
demonstrated that X was better for 50.1% of the patients and worse
for 49.9%. _Lacking any other knowledge_, would the medical decision
be different from the logical one? "It does not matter" is no good:
you have to give either X or Y, because anything else is far worse.
Thus, 0.01 bit of information may become "amplified" and used as if
it were 1 bit. In fact, the decision _is_ constrained to be a 1-bit
one. And that would be entirely reasonable and logical in such cases,
I presume.

This may be comparable to an on-off control system, such as most home
heating systems where the furnace is necessarily either on or off and
where the control "algorithm" has to decide whether on or off is more
appropriate, even when the "error" (difference between temperature
and temperature setpoint) is zero or negligible.
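
In code, such an on-off rule might look like this (a minimal sketch;
the width of the hysteresis band is invented, but some dead band is
needed to keep the furnace from chattering around the setpoint):

    # Minimal on-off ("bang-bang") control rule for a home furnace.
    def furnace_on(temperature, setpoint, currently_on, hysteresis=0.5):
        if temperature < setpoint - hysteresis:
            return True               # clearly too cold: switch on
        if temperature > setpoint + hysteresis:
            return False              # clearly too warm: switch off
        return currently_on           # in the dead band: leave it alone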

But such considerations belong more to decision theory than to
"basic" statistics, I guess.

Greetings,

Hans

[From Bill Powers (970417/0855 MST)]

[Hans Blom, 970417] --
RE:(Richard Kennaway (970415.1650 BST))

ftp://ftp.sys.uea.ac.uk/pub/kennaway/drafts/correlationinfo.{dvi,tex}.

Much depends on what the correlation info is _used for_. Even low
correlations may be important to indicate that two variables are
somehow related, i.e. vary together, if only on average. Knowing this
may prompt other researchers to study the presumed relationship in
more depth, e.g. to design physical or physiological models rather
than statistical ones.

I wonder whether using low correlations as indicators of variables being
"somehow related" is ever wise, however mathematically justifiable it may
be. There must be many ways in which a correlation can be generated without
any actual physical or psychological relationship between the correlated
variables. One that comes to mind is the situation in which X and Y are both
affected by Z. If the correlation between X and Y is high, this sort of
situation is probably easy to discover, because the effect of Z is almost
always there, making it relatively easy to detect if you remember to look
for it. However, as the correlation of X and Y becomes lower and lower, any
effect of Z becomes harder to detect, because it's not always there, or not
always visible. Also, there may be several Zs, so many that no one of them
would show a detectable relationship with X or Y. The chances of FALSELY
concluding that there is no common factor must rise as the correlation
falls. Of course, if the correlation is due entirely to a common factor, then
it is literally useless to try to affect Y by varying X, or to predict Y from
observing X.
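
A quick simulation makes the point (all numbers invented): X and Y
below never influence each other at all, yet a "publishable"
correlation of about 0.2 appears because both are partly driven by Z.

    # X and Y share no direct connection; both are driven by a common
    # factor Z plus independent noise.
    import math
    import random

    def corr(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return sxy / (sx * sy)

    random.seed(1)
    z = [random.gauss(0, 1) for _ in range(10000)]
    x = [zi + random.gauss(0, 2) for zi in z]   # X = Z + heavy noise
    y = [zi + random.gauss(0, 2) for zi in z]   # Y = Z + heavy noise
    print(round(corr(x, y), 2))   # about 0.2; X never affects Y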

If you're using X to predict or affect Y, I think this "common-factor"
effect puts a lower limit on the correlation that is usable in this way.

A simple example comes to mind. Suppose you use the score on a
screening test to predict a student's success in school, and scores on
standardized academic tests to measure that success. If any
significant number of students tends to do worse (or better) on formal
tests than on practical ones, this will create the appearance of a
correlation between the screening test score and success in school,
when it is actually only a correlation between the effects of a
particular way of testing: the same formal-test-taking ability shows
up in both the screening score and the performance measure.

Another example -- of a different confounding effect -- would also tend to
be more significant at lower correlations: misidentifying the manipulated
variable. It's possible that the effect of X on Y is not due to X, but to
_something you do while manipulating X_ -- a side-effect of manipulating X.
When relationships are clear and correlations are high, this sort of thing
may be much more easily caught than it would be if the effect is small and
infrequent, and the correlations are low.

Richard:

Statistically, such low correlations can be of use when taking some
action over a large population --- casinos stay in business on a
much smaller edge than that --- but in any individual case, knowing
X tells one virtually nothing about Y.

Hans:

It does tell me, an individual, that _on average_ I have nothing to
gain in a casino. So I take what you assume to be hardly any
information at all as a full "bit" of information that lets me decide
the question "should I gamble?" with an unambiguous "no".

Clever! But it's making the same point that Richard is making. Anyway, as an
individual in a casino, you have the great disadvantage that you will run
out of money before the casino does: even at 50-50 odds, you will eventually
go broke.
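
A short simulation of the classical "gambler's ruin" (stakes and
bankroll invented): even at perfectly fair odds, a finite bankroll
played against an effectively unlimited house is lost almost every
time.

    # Bet one unit at a time at fair odds, starting from a finite
    # bankroll, against a house that can absorb any winning streak.
    import random

    def goes_broke(bankroll=20, p_win=0.5, max_bets=1_000_000):
        for _ in range(max_bets):
            bankroll += 1 if random.random() < p_win else -1
            if bankroll == 0:
                return True
        return False    # still solvent when we stopped counting

    random.seed(0)
    trials = 100
    print(sum(goes_broke() for _ in range(trials)) / trials)  # ~1.0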

Except, maybe, in exceptional circumstances: if I had to pay a ten
million dollar ransom to free a loved one, knowing that this was the
only way to ever see her alive again, I might not hesitate to go to
the casino with all the money I could scrape together.

The only beneficial effect of doing this would be on your state of mind: at
least you'd feel, however erroneously, that you're doing SOMETHING.

In such cases economics' utility theory would be a better
approach to describing the situation than statistics, I guess.

Here's the one place where I can see a use for the concept of "utility." If
you define utility in terms of a payoff matrix, a lot of statistical
decisions become much easier to make. Suppose you have $10,000, and the
ransom for your sister is $10,000,000. The question is, what is the probable
payoff for using the $10,000 in different ways? If you go to the casino,
your chances of making a million dollars out of $10,000 are calculable, and
vanishingly small. However, if you offer the $10,000 as a reward for
information leading to the arrest of the kidnappers and safe return of your
sister, you probably have a higher chance of a payoff. Of course both
probabilities of a payoff may be so low that you're still just taking action
for the sake of feeling that you're doing something, but at least now you
have a quantitative way of deciding between two courses of action. The best
thing about looking at payoff matrices is that doing so gets you to thinking
in terms of alternatives instead of being locked into one definition of the
problem. It might, for example, occur to you to offer the $10,000 to the
kidnappers, saying that this is the maximum you could raise, so they can
either return your sister and take it, or kill your sister and get nothing.
That has some chance of working, probably higher than either of the first
two approaches.
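
Put as a toy payoff matrix, with every probability pulled out of the
air (which is just the sort of estimating you would have to do):

    # Toy payoff comparison for the ransom example.  All probabilities
    # are invented; the point is only that listing the alternatives
    # side by side makes the choice explicit.
    V = 10_000_000   # rough "utility" of a safe return, in dollars

    options = {
        # option: (guessed probability of success, dollars at risk)
        "gamble at the casino":   (0.0001, 10_000),
        "offer reward for tips":  (0.05,   10_000),
        "offer it as the ransom": (0.20,   10_000),
    }

    for name, (p, cost) in options.items():
        expected = p * V - cost
        print(f"{name:24s} expected payoff = ${expected:>12,.0f}")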

Suppose a study is done in which it is found that taking
1-2-alphahexameraldehyde hydrate (12H) once a day, starting at age 30, for
the next 50 years of your life will insure completely against getting
multiple sclerosis. The pills cost $200 each, or about $70,000 per year.
What the drug company wants you to think is "Ohmygod, I don't want to get
multiple sclerosis, I'd better figure out where to earn, borrow, or steal
$70,000 a year to buy my 12H pills." But if you think in terms of a payoff
matrix, the first question you'll ask is what the chances of getting MS are
if you _don't_ take the pills. You'll also consider what kind of life you'll
be living if you have this enormous drain on your income essentially
forever, and what you will do for money if you get sick with some other much
more likely life-threatening disease, and lots of other questions like that.
You'll probably end up taking your chances with MS just as if the pill
didn't exist.
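
The arithmetic of the payoff-matrix view, with the disease risk left
as the number the advertisement hopes you will never ask about (the
base rate and dollar figure below are purely hypothetical):

    # Certain cost of the (fictitious) 12H regimen versus the expected
    # cost of the disease itself.
    pill_cost_per_year = 70_000
    years = 50
    certain_cost = pill_cost_per_year * years   # $3,500,000, paid for sure

    p_ms = 0.002             # hypothetical lifetime risk without the pill
    ms_cost = 1_000_000      # hypothetical dollar-equivalent of getting MS
    expected_benefit = p_ms * ms_cost           # $2,000

    print(certain_cost, expected_benefit)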

Richard:

In other words, a correlation of 86.6% between two jointly normally
distributed random variables means that knowing one variable gives
only one bit of information about the other.

Hans:

Yet, if a decision is _required_, we'd better take even far less than
one bit of information into account: we cannot be sure, but we can
surely improve the odds. Life is a long sequence of gambles :-).

There are two things wrong with this. First, it's very seldom that we have
to make a single decision, after which we are helpless to affect the
outcome. Normally, whether we do anything or not, we will still be present
and capable of acting to affect the outcome or the events leading to the
outcome. Life is not open-loop. The rules of the laboratory, which say it's
cheating to interfere with the experiment once you've set it in motion, do
not normally apply.

Second, when the odds of success are very low, the fact that the outcome is
important is totally irrelevant with respect to your likelihood of success.
If your decision is 95% likely not to work, it is 95% likely to be wrong
even if it is vital that it be right. Your efforts will be far better spent
in trying to change the situation so you don't have to make such an
important decision with so little chance of being right. Plead illness, go
crazy, ask for a postponement, punch a cop, faint, offer a huge bribe,
anything to avoid having to make a decision that is all but certain to be wrong.

Richard:

Yet because a sufficient quantity of data has been amassed to be
statistically sure that the correlation differs from 0, perhaps even
at the 99% confidence level, such a result can be published even
though it is totally meaningless.

Hans:

Yes and no. Many medical studies, for instance, report that drug X is
without any doubt (> 99.9%) better than drug Y in a certain category
of patients. Doctors take this to mean that they should prescribe
drug X -- doing so is state of the art, giving drug Y is simply
inferior. Yet we may know that the large scale study actually
demonstrated that X was better for 50.1% of the patients and worse
for 49.9%. _Lacking any other knowledge_, would the medical decision
be different from the logical one? "It does not matter" is no good:
you have to give either X or Y, because anything else is far worse.
Thus, 0.01 bit of information may become "amplified" and used as if
it were 1 bit. In fact, the decision _is_ constrained to be a 1-bit
one. And that would be entirely reasonable and logical in such cases,
I presume.

But that doesn't make 1/30 bit into 1 bit. Furthermore, doctors don't care
about individual patients; they can't, if they're basing their treatments on
the odds. What they hope for is to be right more often than they are wrong,
over many patients. This gets us into the realms of population statistics,
where everything you say is valid. A "category of patients" is not one
individual; it's a population. If the treatment has any cost at all, the
patient may well elect to skip it, and rationally so, if there is almost the
same chance that it will harm as help. The doctor gets many tries; the
patient only one.

Best,

Bill P.

[Hans Blom, 970421]

(Bill Powers (970417/0855 MST))

Yet, if a decision is _required_, we'd better take even far less
than one bit of information into account: we cannot be sure, but
we can surely improve the odds. Life is a long sequence of gambles
:-).

There are two things wrong with this. First, it's very seldom that
we have to make a single decision, after which we are helpless to
affect the outcome. Normally, whether we do anything or not, we will
still be present and capable of acting to affect the outcome or the
events leading to the outcome. Life is not open-loop. The rules of
the laboratory, which say it's cheating to interfere with the
experiment once you've set it in motion, do not normally apply.

You discuss a different situation from the one I considered: do X or
do not-X _now_; no other alternative, and no time to think and/or
collect more "evidence". You could be right: the situation may not
occur that often. Or you could be wrong: we have to act at all times,
even though frequently we'd like to think (or experiment) longer.

Second, when the odds of success are very low, the fact that the
outcome is important is totally irrelevant with respect to your
likelihood of success. If your decision is 95% likely not to work,
it is 95% likely to be wrong even if it is vital that it be right.
Your efforts will be far better spent in trying to change the
situation so you don't have to make such an important decision with
so little chance of being right. Plead illness, go crazy, ask for a
postponement, punch a cop, faint, offer a huge bribe, anything to
avoid having to make a decision that is all but certain to be wrong.

All good advice, except if I have to decide _now_ and cannot try out
other methods. One -- often neglected -- aspect of control is that
frequently actions must be computed on the basis of missing evidence,
e.g. when in some industrial process a sensor becomes disconnected,
hopefully temporarily. Two approaches are then possible: act, even
though not all the evidence is available; or do not act yet, and wait
to collect more evidence. Sometimes, regrettably, only the first is
reasonable. Not acting is another way of acting. So it all boils down
to what the "best" action is, regardless of whether full or partial
evidence is available. If in a nuclear reactor a vital sensor starts
to malfunction, the best action would probably be to shut down the
reactor in a safe way.

In control, the situation of "partial evidence" is normal. It arises,
for instance, when sensors are themselves noisy or disturbance-prone,
which happens all the time.
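
In code, the choice between the two responses to a sensor dropout
might be sketched like this (names and thresholds invented):

    # Acting on partial evidence: when the reading is missing, hold
    # the last good value for a while; if the outage persists, signal
    # a safe shutdown instead of pretending to know.
    def control_action(reading, last_good, outage, setpoint,
                       gain=1.0, max_outage=10):
        # returns (action, last_good, outage); action None = shut down
        if reading is not None:                     # sensor healthy
            return gain * (setpoint - reading), reading, 0
        if outage < max_outage:                     # act on old evidence
            return gain * (setpoint - last_good), last_good, outage + 1
        return None, last_good, outage + 1          # fail safe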

Greetings,

Hans