Subjective probability intro

[Martin Taylor 930326 16:20]

The following was drafted some months ago as an
introduction to the "Information leads to PCT" paper I am
trying to draft. But that has been stalled by higher
priority (i.e. paid for) work, so I thought it would be a
reasonable idea to post this part. There has been some
discussion involving probability, and will be more. In
this, I try to describe my approach to probability, and
why I think something like it must be necessary. This
document, or something like it, will be incorporated into
the final paper.

Martin

···

------------------------------

Information, Perception and Control

M. M. Taylor, DCIEM, Box 2000, North York, Ontario,
Canada, M3M 3B9

(Epigraph)
The only justification for our concepts and system of
concepts is that they serve to represent the complex of
our experiences; beyond this they have no legitimacy. I
am convinced that the philosophers have had a harmful
effect upon the progress of scientific thinking in
removing certain fundamental concepts from the domain of
empiricism, where they are under our control, to the
intangible heights of the a priori. For even if it should
appear that the universe of ideas cannot be deduced from
experience by logical means, but is, in a sense, a
creation of the human mind, without which no science is
possible, nevertheless this universe of ideas is just as
little independent of the nature of human experience as
clothes are of the form of the human body. (A. Einstein,
1922, The Meaning of Relativity, 4th Edition, 1950,
London: Methuen, p2).

         Prologue: Probability and Perception

Newton and Einstein

Before the middle of the 17th century, Physics consisted
of a great many aphorisms and folk truths, and a few
numerical descriptions of experimental or observational
results. The motions of the planets were well described
by Ptolemaic epicycles, and if a description failed in
some minor detail, another epicycle could always be added
to correct the error.

Then came Newton. Newton imagined a consistent physical
world, in which the interactions among the parts could be
described by a few simple rules that would apply whether
the objects were very large, like the sun, or very small,
like a grain of sand. We might not be able to use these
rules to predict all the motions of the universe, but our
limitation was only our inability to observe all of the
elements in the universe and make the necessary
calculations. Perhaps there were a few rules that had not
yet been discovered, but an omniscient observer who knew
all the rules and could see the state of all the
interacting elements would be able to determine the fate
of the universe for evermore.

Newton's simple laws worked very well, provided that they
were applied to elements that were not too big or small,
and didn't move too fast. But there were some nagging
problems that they could not account for, such as the
advance of the perihelion of Mercury, or the shape of the
radiation spectrum of a black body. It took some 250
years before a new advance occurred in the way we look at
the physical universe; in fact two advances, both
eventually based on the same core concept, and in apparent
contradiction with each other. Einstein and Heisenberg
both pointed out that the universe is intrinsically
unknowable to any single observer, and developed the
consequences of that truth.

Einstein considered the consequences of the finite speed
of signal transmission, and developed the Theory of
Relativity out of the fact that there cannot be an
omniscient observer who can see all of the Universe at the
same moment. Things that happen in one place cannot have
any effect on things that happen at another until signals
have traversed the intervening space, and this takes a
finite time. At point A, an event EA happens, and it is
observed at point B some time after an event EB happens at
point B. To an observer at B, the sequence is EB then EA.
But it can happen that an observer at A does not get the
signal from EB until after EA has happened. For the
observer at A, the sequence is EA then EB. The two
observers, who may later exchange communications, might
disagree about which event came "first." If the two
observers are stationary relative to each other, they can
resolve the disagreement by factoring in the speed of the
light signal, and they will then agree on which event
"really" came first. The reason this agreement is possible
is that the two observers share a common "frame of
reference." They can get away with pretending that their
own frame of reference is the real, absolute point of
view--the point of view that would be held by a God-like
observer.

But what if the observers are in significant motion
relative to each other? A might think that EB preceded
EA, while B might equally legitimately claim that EA came
first, even when the speed of light effect is factored
out. Neither would be wrong. Two observers in relative
motion can no longer settle the argument. They will
disagree about the temporal ordering of events, just as we
on Earth can legitimately disagree about spatial ordering.
Two people standing at different locations on the shore
might disagree on whether one ship in a harbour is to the
left or right of another ship, since they observe the
ships from different angles. Since there is no absolute
spatial perspective from which to view the ships, there is
no absolute answer as to which is to the right or left of
the other. Likewise, since A and B view the timing of
events from different angles in space-time, they will
disagree on which came first. In such a universe, there
can be no simultaneity, except as defined by some
particular observer. There is no one true frame of
reference, no absolute God-like point of view.

Einstein's way of solving this puzzle was to consider as
legitimate only those aspects of the Universe that were
potentially accessible to any single all-powerful observer
limited only by the signal velocity. The result was a
unifying view of the interactions in the world that
provided the same numerical results as Newton's laws for
middle-scale slow-moving objects and that corrected the
errors of Newton's predictions for interactions involving
very large or very fast objects. The central core of the
theory was that even if there were a God-like observer not
limited by the speed of signal transmission, no real
observer could use those observations, and real observers
would see a world different from the one the God-like
observer would see. We speed-limited people would see a
relativistic world.

Einstein's world is simpler than Newton's. It contains
fewer arbitrary laws. The really essential law is the one
that limits the access of an observer to information from
distant places. All the rest follows with few essential
added assumptions.

Heisenberg also developed a theory that we now see as
based on the idea that there is a limit on the amount of
information we can acquire from the world. In his case,
the limitations come when the interacting elements of the
world are very small. If we obtain a very accurate
measure of some parameter of an element, we cannot obtain
an accurate measure of a dual parameter of that element at
the same time. It is not possible, for example, to
determine simultaneously the position and the momentum of
a particle. Time itself is a member of such a pair of
dual parameters; we cannot determine both the time and the
frequency of an oscillatory event.

There is no need to dwell on the theories of Einstein and
Heisenberg. Their detail is irrelevant to the theme of
this paper. The moral to be drawn is that the great
advances in science and technology of the 20th century are
based on consideration of one simple truism, that we can
work with only what we can observe, not with what God
might observe. And that is the theme of this paper.
There are two ways of looking at the world: a fantasy way,
in which we imagine we are God, or a realistic way, in
which we imagine what we might be able to observe. The
former is Newtonian, the latter Einsteinian.

Probability: Frequentist or Subjective?

What does the phrase "the probability of event E" mean?
Most people are taught that if there are an infinite
number of opportunities for E to happen, and E does happen
on a certain fraction of them, then that fraction is the
probability of the event. If a coin is tossed an infinite
number of times, it should fall heads half the time, so
the probability of a head is 0.5. It does not take long
to realize that such a definition cannot be used to
measure the probability of event E, so this "ideal"
definition is taken as a target for more practical
measurement techniques--"an infinite number" is replaced by
"many," for example, or a mechanism is described that
would inherently result in E occurring on some specified
fraction of the opportunities. An ideal coin would fall
heads exactly half the time in an infinite number of
tosses, because that is the definition of an ideal coin.
But it doesn't help in determining the probability that
the next toss of this coin will be a head.

Even to ask the question "What is the probability that the
next toss will result in a head" is to deny the validity
of the definition of probability most people are taught.
What does one toss have to do with infinity? It comes out
heads or it does not. There must be some other way of
looking at the notion of probability, because we certainly
feel that there is some value in thinking about whether
one event is more probable than another. Is it more
probable that the next time we see Joe he will be wearing
brown shoes than that it will be raining in Toronto at
noon on July 12, 1997? One feels that there is some sense
in asking such questions, even though neither event has
more than one opportunity of occurring.

The notion that probability has to do with the fraction of
opportunities for an event on which it actually happens
can be called "frequentist." It is a Newtonian view of a
world in which situations are indefinitely repeatable,
observations can be carried on for infinite time, and can
be infinitely precise. The Newtonian world is not a world
accessible to ordinary mortals, and thinking about a
frequentist probability that can occur only in a Newtonian
world can lead one into great confusion and paradox.

A real observer can only go on what is observable. The
real, observable world, we can call Einsteinian. In this
world, probability depends only on what has been observed.
If Joe's friend has just telephoned to say that Joe bought
new brown shoes and is coming round to show them off, we
might say that there is a high probability that the next
time we see Joe he will be wearing brown shoes, even
though we have never before seen him wear brown shoes.
Observations include all sorts of things related to the
uncertain event in question, not simply observations of
the critical factor as it occurred or did not occur on
past occasions we deem to have been similar. If we
believe that a coin has been made with fair balance and
evenly milled edges, we will judge it to have a
probability 0.5 of landing heads, even though we have
never tossed it before. But if we believe that the coin
tosser has a special skill in how high and with what spin
to toss the coin, we may alter that judgment depending on
our belief as to which result the tosser wants to see.

Probability is a subjective matter. The only consistent
and reliable way to deal with probability is to treat it
as a property of an observer, not of the world. There may
be probabilities in the world, but real observers can no
more detect them than they can assert a correct, universal
sequence of events in the world.

To be subjective does not mean to be arbitrary. If one
believes that two events are mutually exclusive and that
one of the two has to happen, then to be consistent, the
subjective probabilities of the two events must sum to
unity. One cannot arbitrarily say that p(A) = 0.8 and
p(B) = 0.8 and p(A xor B) = 1. Subjective probability has
constraints, if it is to be dignified with the name of
probability rather than wantedness or hope, or some
such word. For example, the subjective probabilities of
all mutually exclusive events that could happen in
particular circumstances must sum to unity. If one does
not believe that A can happen unless B does, then the
subjective probability of A cannot be greater than that of
B. All of the usual arithmetic applied to frequentist
probability is appropriate in dealing with subjective
probability.
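
These consistency constraints can be checked mechanically.
The following is a minimal sketch in Python, offered only
as an illustration; the function names and the tolerance
are assumptions introduced here, and the numbers are those
of the p(A) = 0.8, p(B) = 0.8 example above.

```python
def is_coherent_partition(probs, tol=1e-9):
    """Check beliefs over mutually exclusive events, one of which must happen.

    Coherence requires each value to lie in [0, 1] and the values to sum to 1.
    """
    values = list(probs.values())
    in_range = all(0.0 <= p <= 1.0 for p in values)
    return in_range and abs(sum(values) - 1.0) <= tol


def respects_implication(p_a, p_b):
    """If A cannot happen unless B does, coherence requires P(A) <= P(B)."""
    return p_a <= p_b


# A and B are believed mutually exclusive and exhaustive.
print(is_coherent_partition({"A": 0.8, "B": 0.8}))  # False: sums to 1.6
print(is_coherent_partition({"A": 0.8, "B": 0.2}))  # True

# A is believed impossible without B.
print(respects_implication(0.3, 0.5))  # True
print(respects_implication(0.6, 0.5))  # False
```

The check says nothing about whether the values are
well-founded, only whether they are consistent with one
another.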

One of the constraints is that all probability is
conditional. It makes no sense simply to say of an event
that its probability is P. As with the frequentist kind
of probability, one must think of what might be the
alternatives to the event--what counts as the event not
happening, what occasions might be considered as
opportunities for the event to happen, on which the
probability is based. Would Joe appearing barefoot count
against the event "Joe wearing brown shoes the next time,"
or would he have to be wearing some non-brown shoes? We
notate a conditional probability with a vertical bar:
P(A|B) is the probability that A will be true, given that
B is true. B represents the occasion, A the event. B,
like A, must be something observable. The observer must
be able to determine whether B is true, whether this is an
occasion on which it is interesting to observe whether A
is true. We may, for example, be interested in whether
Joe is wearing brown shoes only on condition that he is
wearing some kind of shoes. Or, conversely, Joe may
usually wear no shoes, but when he does, they are usually
brown, so we are interested in whether he is wearing brown
shoes given that we see him, shod or no.
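
The effect of the choice of condition can be made concrete
with a small sketch in Python. The belief values assigned
to Joe's footwear below are invented purely for
illustration, and the helper name `conditional` is an
assumption of this sketch, not notation from the text.

```python
def conditional(beliefs, event, condition):
    """Return P(event | condition) from beliefs over mutually exclusive outcomes.

    `event` and `condition` are sets of outcome names.
    """
    p_condition = sum(p for outcome, p in beliefs.items() if outcome in condition)
    if p_condition == 0:
        raise ValueError("P(event | condition) is undefined when P(condition) = 0")
    p_both = sum(
        p for outcome, p in beliefs.items()
        if outcome in event and outcome in condition
    )
    return p_both / p_condition


# Hypothetical subjective beliefs about how Joe will appear next time.
beliefs = {"barefoot": 0.70, "brown shoes": 0.25, "other shoes": 0.05}

brown = {"brown shoes"}
shod = {"brown shoes", "other shoes"}  # condition: he is wearing some shoes
seen = set(beliefs)                    # condition: we see him, shod or no

print(round(conditional(beliefs, brown, shod), 3))  # about 0.833
print(round(conditional(beliefs, brown, seen), 3))  # about 0.25
```

The same event, "Joe wearing brown shoes," receives very
different values under the two conditions, which is why
the condition B has to be stated.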

How one defines the condition has a great impact on the
subjective probability of an event. Given that one has
observed a coin that has nearly come to rest after being
tossed, one can put a high value on the probability that
it will lie "head" (or not, as the case may be), when it
finally does stop. The probability will be even higher if
the condition is added that there be no earthquake or
other disturbance of the floor before the final
observation, and higher yet if another condition is added
that no person disturb the coin's movement. This example
may seem extreme, but such conditions apply to every
judgment of probability, and in many cases they are both
less obvious and more important than these.

Probability depends on knowledge. If one had not observed
the coin since determining that it had a head side and a
tail side just before it was tossed, one would probably
judge that it had a probability of 0.5 of landing head or
tail, regardless of the earthquake or interference
conditions. After all, we model those events as being
equally likely to end up with it lying either way, and
that is the same as our judgment of the result of the
undisturbed toss, so those conditions have no effect on
the subjective probability judgment. But notice that it
is our model that allows us to make that assessment. If
we had knowledge that a certain unscrupulous person with
skill in magical tricks had a vested interest in seeing
the coin land heads, we might change our subjective
probability that it would do so. A condition that we did
not observe interference would not be sufficient to bring
the probability back to 0.5. In our Einsteinian universe,
we cannot impose a condition that no interference occur,
only that we do not observe it to occur.

In everything that follows, whenever the notation P(x)
occurs, it must always be remembered that in the
background there is a condition Y, so that the correct
notation would have been P(x|Y). The omission of Y may be
justified on the grounds that Y is obvious, but sometimes
it is not so obvious. The notation P(x) can sometimes be
seriously misleading, if the background conditions are not
intuitively apparent.

Probability and replication

Whatever the condition Y, an observer can assess P(x|Y)
for any event x. Of course, for most cases of x and Y the
observer will have no reason to do so. Y may be very
unlikely to happen, given the current condition C, or the
observer may deem Y to be irrelevant to x, given C (i.e.
P(x|Y,C) = P(x|C)). But in other cases, the observer has
some reason for being interested in P(x|Y). Let us
suppose that the observer knows nothing about the relation
between x and Y, other than that it might be interesting.
To find out something of the relation, the observer has to
see what happens when Y occurs.

When Y occurs, there is an opportunity for x to occur. If
it does, the observer can make a note "Yes," and if not, a
note "No." After N occurrences of Y (replications of an
experimental observation), there will be a certain number
X of "Yes" notations. Since we hypothesized that the
observer knew nothing beforehand about the relationship of
x and Y, what the observer now knows is that x occurred on
X/N of the times that Y was true. A rational observer
would use that knowledge to set a value for P(x|Y) close
to X/N. If X/N differed appreciably from P(x|C) averaged
over the various current conditions C prevailing when Y
happened to be true in addition to C, then the observer is
likely to say that Y probably affects the probability of
x. The observer can also assign a value to P(Y affects
x, given C), which we can notate as P(M|C), where M is the
model (Y affects x). If X/N was not very different from
P(x|C), then P(M|C) would be low; the observer would not
believe very strongly that Y affects x.
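
One way, among many, for an observer to set P(x|Y) "close
to X/N" is Laplace's rule of succession, which gives
(X+1)/(N+2). The sketch below, in Python, assumes that
"knowing nothing beforehand" is represented by a uniform
prior over the unknown proportion; that assumption, the
function name, and the list of notes are all introduced
here for illustration and are not prescribed by the
argument above.

```python
from fractions import Fraction


def rule_of_succession(yes_count, n_occasions):
    """Posterior mean of the proportion after yes_count "Yes" notes in
    n_occasions occurrences of Y, starting from a uniform prior."""
    if not 0 <= yes_count <= n_occasions:
        raise ValueError("yes_count must lie between 0 and n_occasions")
    return Fraction(yes_count + 1, n_occasions + 2)


# Hypothetical record of notes made on successive occurrences of Y.
notes = ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes"]
X = notes.count("Yes")  # number of "Yes" notations
N = len(notes)          # number of occurrences of Y

print(Fraction(X, N))            # 3/4, the observed proportion X/N
print(rule_of_succession(X, N))  # 7/10, a subjective P(x|Y) close to X/N
```

With no observations at all the rule gives 1/2, and it
moves toward X/N as N grows, so the observed proportion
informs the subjective value without being identical to
it.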

X/N is a proportion, not a probability. If in future more
occasions occur in which condition Y is true, X and N will
change. A frequentist view of probability asserts that
the "true" probability of x given Y is the value of X/N
that would be observed after an infinite number of
occurrences of Y (replications). A practical approach to
the frequentist view is less dramatic: X/N will approach
an unknown but true ideal limit ever more closely as the
number of occurrences of Y increases. There exist
theories, which we have no reason to dispute, about the
probability that X/N would take on any particular value
after N occurrences of Y, if the ideal has a certain value
Z. A rational observer would be likely to incorporate
these theories into the subjective judgment of P(M|C),
by, for example, comparing the probability P(M1|C), where
M1 is that X "Yes" events would be observed in N
opportunities if Y had no effect, with P(M2|C), where M2
is that P(x|Y,C) = X/N.
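
The theories referred to here are, in the simplest case,
those of the binomial distribution. The following Python
sketch assumes independent occurrences of Y and a
hypothetical baseline value of 0.5 for P(x|C); it computes
how probable the observed count would be under M1 and
under M2. These are probabilities of the data given each
model, which the observer may weigh in forming P(M|C);
they are not themselves P(M1|C) and P(M2|C).

```python
from math import comb


def binomial_probability(yes_count, n_occasions, p):
    """Probability of exactly yes_count "Yes" events in n_occasions
    independent opportunities, each with probability p."""
    return (
        comb(n_occasions, yes_count)
        * p ** yes_count
        * (1 - p) ** (n_occasions - yes_count)
    )


X, N = 6, 8       # hypothetical counts: 6 "Yes" notes in 8 occurrences of Y
p_baseline = 0.5  # hypothetical P(x|C), i.e. the value if Y had no effect

data_given_m1 = binomial_probability(X, N, p_baseline)  # Y has no effect
data_given_m2 = binomial_probability(X, N, X / N)       # P(x|Y,C) = X/N

print(data_given_m1)                  # 0.109375
print(round(data_given_m2, 3))        # about 0.311
print(data_given_m2 / data_given_m1)  # ratio favouring M2 on these data
```

Because M2 is fitted to the observed proportion, the data
can never be less probable under M2 than under M1, so the
ratio alone does not settle the matter; how much weight to
give it remains a judgment of the observer.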

There is a hidden assumption in the foregoing: that
condition Y can be repeated (that replications are
possible). What does it mean to say that a condition
recurs? If everything in the observable universe is taken
into account, no condition can ever recur. If nothing
else, the universe has aged since the first occurrence of
the condition. Its original age can never be recovered.
But the observer may well think that this doesn't matter,
given that the difference in age is probably only a few
parts in 10^10. Such an observer will accept that two
occurrences in which Y is the same except for a small
increase in the age of the universe can be considered
together in evaluating P(x|Y). But the age of the
universe is not the only difference in conditions between
the two occasions. The position of the sun, moon and
planets in the sky will have changed as well. If their
arrangement ever recurs, it is at intervals long enough
that the positions of the stars will have changed
significantly. But if our observer is not astrologically
inclined, this difference may not matter either. So the
observer may say that the condition Y recurs even though
the age of the universe is different, and the planets have
moved in the sky. That is strictly a personal judgment by
the observer who is trying to get data that will allow a
reasonable value to be developed for the subjective
P(x|Y), for subjective probabilities are greatly affected
by observation. The value determined with the aid of
observation, of course, applies to some future occasion of
condition Y, not to any that has already occurred, because
for those occurrences of condition Y the observer knows
whether the answer was "Yes" or "No."

Usually, what the observer considers to be an occurrence
of condition Y is specified not by listing all the
irrelevant conditions, but by listing some of the
conditions that would change the situation into something
other than Y if they were varied. All events that happen
when the condition does not include those falsifying
conditions ought to be counted as opportunities for x to
occur--as replications. Of course, in practice, they are
not. The observer notices that x tends to occur less often
when (Y but not Z) than when (Y including Z), and so
changes the definition of Y so that it necessarily
includes Z; a failure of Z to be true then makes the
situation unsuited to an observation relevant to
determining P(x|Y).

The important point about this discussion is that what
constitutes a replication of an observation is a matter
totally subjective to the observer. For example, in an
experiment in psychophysics, one observer may say that the
presentation to a trained listener of a particular
waveform to which is added noise from a well-controlled
noise generator constitutes a replication of an
observation to determine how well people can detect that
signal in that noise. Another observer may note that the
detection probability depends on whether the trial is
presented early or late in a series of trials, and that
therefore early trials should be considered separately
from late ones. Yet another observer may note that it
depends on whether the observer has just eaten lunch, or
on the particular waveform obtained from the noise
generator, and so on and so forth. It is never possible
to specify objectively what constitutes a replication of
an observation.

So far, I have tried to make the point that the term
"probability" refers only to an attribute of an observer,
who should apply it only to unique events. But the
observer must have some reason to evaluate the probability
of the event, and this reason may come from anywhere. It
could be based on being told something by a trusted (or
untrustworthy) source, it could come from a belief that
some known mechanism causes the event to occur or not to
occur, or it could come from observations of whether a
similar event occurred under apparently similar
circumstances on one or more other occasions. It is up to
the observer to determine how to relate the various
sources of information so as to adjust the subjective
probability of the event. And most importantly, it is up
to the observer to determine what constitutes a
replication of an observation in case the occurrence of
similar events under similar circumstances is a
contributor to that probability estimate.

Probability and measurement

What is the probability that the width of this page is 8.5
inches (assuming you are reading this on North American
letter-size paper)? What is the probability that the
width is 0.4 inches? Technically, the answer to each
question is "zero." The page may be roughly 8.5000001 or
8.4999999 inches wide, but that is not 8.5. Nevertheless,
one feels that the probability of it being 8.5 inches
should be higher than of being 0.4 inches. We "know" that
the width is not 0.4 inches, but we do not "know" in the
same way that it is not 8.5 inches. Indeed, nominally,
the width is 8.5 inches, so if someone asks "Are you
reading something on 8.5 inch paper or 6 inch paper?" we
could confidently answer "8.5 inch," and would not have to
say "neither" as we would if the question were "Are you
reading something on 1 inch paper or 6 inch paper." So,
there is a range of width over which we are satisfied that
this is 8.5 inch paper, more or less.

How well must we measure, to agree that this paper is 8.5
inches wide? That depends on what else we "know." If we
are accustomed to North American standard sizes, we need
only see well enough to say "this is letter paper, not
some other size," because paper does not come in other
widths near 8.5 inches. But if the paper might have come
from some place more in tune with world standards, then it
might be A4, and thus a little taller and narrower than
8.5 x 11. What is the probability that the paper might be
A4? Is it enough to require us to measure the paper more
precisely than by a quick glance?

A quick glance suffices to assure us that the paper is 8.5
x 11 rather than, say, 11 x 14 inches. But a quick glance
is insufficient to assure us that the paper is not A4. A
closer look to get more information about the page is
needed for that judgment. But if the question is whether
the paper-cutting apparatus that was supposed to cut the
sheet to be 8.5 x 11 is properly adjusted, no amount of
looking will be enough. The paper must be carefully
compared with some standard, such as a rod that is
asserted to be 8.5 inches long. Ignoring for the moment
the trust that has to be placed in the assertion about the
rod length, what is the next step in the determination?
One looks to see whether the rod is longer than the paper
is wide. Maybe there is a clear difference, and one can
say something like "On the condition that I am not drunk
or hallucinating or something like that, I am quite sure
this paper is less than 8.5 inches wide." But maybe the
difference is very small, and the best one can say is "I'd
bet 3 to 1 that this paper is less than 8.5 inches wide,"
or even "I have no idea whether this paper is more or less
than 8.5 inches wide."

What is the next step? Get a microscope? That may answer
the question with respect to a specific piece of paper for
one observer, but for another piece or a different
observer, it may not. The same set of possibilities will
exist. There will always be some width of paper for which
you cannot say whether it is greater or less than the
specified width, no matter how precise the measuring
instrument. Your measurement is always a probability
distribution. Using the naked eye and the measuring rod,
you may say that the probability is essentially zero that
the paper is less than 8.48 inches wide (on the condition
that you are not hallucinating) or that it is wider than
8.52 inches. Perhaps you would bet even odds that it is
between 8.495 and 8.505 inches. But this is a lot better
than you could have done before you got the measuring rod.
Without it, you might have given even odds that the paper
was between 8 and 9 inches wide. The measuring rod has
reduced your uncertainty 100-fold, and if you used a
microscope you might be able to reduce it another 100- or
1000-fold. You cannot reduce it to zero, no matter what
measuring instrument you use. All you can do is to get
more and more information by closer observation.
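
The arithmetic behind the "100-fold" figure, and one way
of expressing the gain as information, can be sketched in
Python. Treating the width of the even-odds interval as
the measure of uncertainty, and the base-2 logarithm of
the reduction factor as "bits gained," are assumptions of
this sketch (they presume distributions of similar shape
before and after); the interval endpoints are those given
in the paragraph above.

```python
from math import log2


def uncertainty_reduction(before, after):
    """Compare two even-odds intervals, each given as (low, high).

    Returns the factor by which the interval narrowed and the base-2
    logarithm of that factor.
    """
    width_before = before[1] - before[0]
    width_after = after[1] - after[0]
    if width_before <= 0 or width_after <= 0:
        raise ValueError("intervals must have positive width")
    factor = width_before / width_after
    return factor, log2(factor)


glance = (8.0, 9.0)    # even odds without the measuring rod
rod = (8.495, 8.505)   # even odds with the measuring rod

factor, bits = uncertainty_reduction(glance, rod)
print(round(factor))   # 100
print(round(bits, 2))  # 6.64
```

A further 100-fold reduction with a microscope would add
about the same number of bits again, but no finite number
of bits brings the interval width to zero.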

Any number that is the result of a measurement is a
discrete approximation to the "real" value of the thing
being measured. Indeed, all numbers that we can use in
any way at all are either rational with a finite number of
digits in their fractional numerator and denominator, or
are describable by an algorithm of finite length. Such
numbers are useful, since they are the only means we have
of representing values in the world, but one must always be
aware that they are not reality. Any such number
represents only a distribution of subjective probability
about reality.

(More to come later -- Much later).