[Martin Taylor 930326 16:20]

The following was drafted some months ago as an
introduction to the "Information leads to PCT" paper I am
trying to write. But that has been stalled by higher
priority (i.e. paid-for) work, so I thought it would be a
reasonable idea to post this part. There has been some
discussion involving probability, and there will be more.
In this part, I try to describe my approach to probability,
and why I think something like it must be necessary. This
document, or something like it, will be incorporated into
the final paper.

Martin

------------------------------

Information, Perception and Control

M. M. Taylor, DCIEM, Box 2000, North York, Ontario,
Canada, M3M 3B9

(Epigraph)

The only justification for our concepts and system of
concepts is that they serve to represent the complex of
our experiences; beyond this they have no legitimacy. I
am convinced that the philosophers have had a harmful
effect upon the progress of scientific thinking in
removing certain fundamental concepts from the domain of
empiricism, where they are under our control, to the
intangible heights of the a priori. For even if it should
appear that the universe of ideas cannot be deduced from
experience by logical means, but is, in a sense, a
creation of the human mind, without which no science is
possible, nevertheless this universe of ideas is just as
little independent of the nature of human experience as
clothes are of the form of the human body. (A. Einstein,
1922, The Meaning of Relativity, 4th Edition, 1950,
London: Methuen, p. 2.)

Prologue: Probability and Perception

Newton and Einstein

Before the middle of the 17th century, physics consisted
of a great many aphorisms and folk truths, and a few
numerical descriptions of experimental or observational
results. The motions of the planets were well described
by Ptolemaic epicycles, and if a description failed in
some minor detail, another epicycle could always be added
to correct the error.

Then came Newton. Newton imagined a consistent physical
world, in which the interactions among the parts could be
described by a few simple rules that would apply whether
the objects were very large, like the sun, or very small,
like a grain of sand. We might not be able to use these
rules to predict all the motions of the universe, but our
limitation was only our inability to observe all of the
elements in the universe and make the necessary
calculations. Perhaps there were a few rules that had not
yet been discovered, but an omniscient observer who knew
all the rules and could see the state of all the
interacting elements would be able to determine the fate
of the universe for evermore.

Newton's simple laws worked very well, provided that they
were applied to elements that were neither too big nor too
small, and did not move too fast. But there were some
nagging problems that they could not account for, such as
the advance of the perihelion of Mercury, or the shape of
the radiation spectrum of a black body. It took some 250
years before a new advance occurred in the way we look at
the physical universe; in fact two advances, both
eventually based on the same core concept, and in apparent
contradiction with each other. Einstein and Heisenberg
both pointed out that the universe is intrinsically
unknowable to any single observer, and developed the
consequences of that truth.

Einstein considered the consequences of the finite speed
of signal transmission, and developed the Theory of
Relativity out of the fact that there cannot be an
omniscient observer who can see all of the Universe at the
same moment. Things that happen in one place cannot have
any effect on things that happen at another until signals
have traversed the intervening space, and this takes a
finite time. At point A, an event EA happens, and it is
observed at point B some time after an event EB happens at
point B. To an observer at B, the sequence is EB then EA.
But it can happen that an observer at A does not get the
signal from EB until after EA has happened. For the
observer at A, the sequence is EA then EB. The two
observers, who may later exchange communications, might
disagree about which event came "first." If the two
observers are stationary relative to each other, they can
resolve the disagreement by factoring in the speed of the
light signal, and they will then agree on which event
"really" came first. The reason this agreement is possible
is that the two observers share a common "frame of
reference." They can get away with pretending that their
own frame of reference is the real, absolute point of
view: the point of view that would be held by a God-like
observer.
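The bookkeeping that lets two stationary observers agree can be sketched in a few lines of Python. This is an illustration under assumed values, not part of the original argument: the positions and times are hypothetical, and distances are measured in light-seconds so that a signal travels one unit per second.

```python
# Two observers at rest relative to each other, on a line.
# Assumed units: light-seconds and seconds, so signals travel 1 unit per second.
SIGNAL_SPEED = 1.0

A_POS, B_POS = 0.0, 10.0          # hypothetical positions of observers A and B
EVENTS = {"EA": (0.0, A_POS),     # EA happens at A at time 0
          "EB": (3.0, B_POS)}     # EB happens at B at time 3


def delay(event_pos, observer_pos):
    """Time a signal needs to cross the intervening space."""
    return abs(event_pos - observer_pos) / SIGNAL_SPEED


def in_order(times):
    """Event names sorted from earliest to latest."""
    return sorted(times, key=times.get)


def views(observer_pos):
    """(order in which the observer sees the events, order after removing the delay)."""
    seen = {name: t + delay(x, observer_pos) for name, (t, x) in EVENTS.items()}
    corrected = {name: seen[name] - delay(x, observer_pos)
                 for name, (_, x) in EVENTS.items()}
    return in_order(seen), in_order(corrected)


for name, pos in [("A", A_POS), ("B", B_POS)]:
    seen_order, corrected_order = views(pos)
    print(f"observer {name}: sees {seen_order}, after correction {corrected_order}")
```

Before the correction, A sees EA first and B sees EB first; once each subtracts the known signal delay, both agree that EA came first.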

But what if the observers are in significant motion
relative to each other? A might think that EB preceded
EA, while B might equally legitimately claim that EA came
first, even after the travel time of the light signals has
been factored out. (This can happen only when the two
events are far enough apart, relative to the time between
them, that no signal could have passed from one to the
other.) Neither would be wrong. Two observers in relative
motion can no longer settle the argument. They will
disagree about the temporal ordering of events, just as we
on Earth can legitimately disagree about spatial ordering.
Two people standing at different locations on the shore
might disagree on whether one ship in a harbour is to the
left or right of another ship, since they observe the
ships from different angles. Since there is no absolute
spatial perspective from which to view the ships, there is
no absolute answer as to which is to the right or left of
the other. Likewise, since A and B view the timing of
events from different angles in space-time, they will
disagree on which came first. In such a universe, there
can be no simultaneity, except as defined by some
particular observer. There is no one true frame of
reference, no absolute God-like point of view.
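For observers in relative motion, the corrected times are given by the Lorentz transformation of special relativity, which the text does not spell out. The sketch below assumes hypothetical event coordinates and units in which the speed of light is 1. It shows the order of two widely separated events flipping as the observer's velocity changes, while two events close enough for a signal to connect them keep their order for every observer.

```python
import math


def frame_time(t, x, v):
    """Time of event (t, x) for an observer moving at velocity v along x.

    Standard Lorentz transformation in units where the speed of light is 1,
    so |v| must be less than 1.
    """
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x)


def first_of(events, v):
    """Name of the event that comes first for an observer moving at velocity v,
    or 'simultaneous' if that observer assigns both the same time."""
    (n1, e1), (n2, e2) = events.items()
    t1, t2 = frame_time(*e1, v), frame_time(*e2, v)
    if t1 == t2:
        return "simultaneous"
    return n1 if t1 < t2 else n2


# Hypothetical (t, x) coordinates: seconds and light-seconds.
far_apart = {"EA": (0.0, 0.0), "EB": (1.0, 10.0)}   # no signal could link these
close = {"EA": (0.0, 0.0), "EC": (20.0, 10.0)}      # a signal from EA could reach EC

for v in (-0.5, 0.0, 0.05, 0.1, 0.5):
    print(f"v = {v:+.2f}: far apart -> {first_of(far_apart, v)}, "
          f"close -> {first_of(close, v)}")
```

For the far-apart pair, one observer says EA came first, another calls the events simultaneous, and a third says EB came first; for the close pair, every observer agrees.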

Einstein's way of solving this puzzle was to consider as
legitimate only those aspects of the Universe that were
potentially accessible to any single all-powerful observer
limited only by the signal velocity. The result was a
unifying view of the interactions in the world that
provided the same numerical results as Newton's laws for
middle-scale slow-moving objects and that corrected the
errors of Newton's predictions for interactions involving
very large or very fast objects. The central core of the
theory was that even if there were a God-like observer not
limited by the speed of signal transmission, no real
observer could use those observations, and real observers
would see a world different from the one the God-like
observer would see. We speed-limited people would see a
relativistic world.

Einstein's world is simpler than Newton's. It contains
fewer arbitrary laws. The really essential law is the one
that limits the access of an observer to information from
distant places. All the rest follows with few essential
added assumptions.

Heisenberg also developed a theory that we now see as
based on the idea that there is a limit on the amount of
information we can acquire from the world. In his case,
the limitations come when the interacting elements of the
world are very small. If we obtain a very accurate
measure of some parameter of an element, we cannot obtain
an accurate measure of a dual parameter of that element at
the same time. It is not possible, for example, to
determine simultaneously and with arbitrary precision both
the position and the momentum of a particle. Time itself
is a member of such a pair of dual parameters; we cannot
determine precisely both the time and the frequency of an
oscillatory event.

There is no need to dwell on the theories of Einstein and
Heisenberg. Their detail is irrelevant to the theme of
this paper. The moral to be drawn is that the great
advances in science and technology of the 20th century are
based on consideration of one simple truism, that we can
work with only what we can observe, not with what God
might observe. And that is the theme of this paper.

There are two ways of looking at the world: a fantasy way,
in which we imagine we are God, or a realistic way, in
which we imagine what we might be able to observe. The
former is Newtonian, the latter Einsteinian.

Probability: Frequentist or Subjective?

What does the phrase "the probability of event E" mean?
Most people are taught that if there are an infinite
number of opportunities for E to happen, and E does happen
on a certain fraction of them, then that fraction is the
probability of the event. If a coin is tossed an infinite
number of times, it should fall heads half the time, so
the probability of a head is 0.5. It does not take long
to realize that such a definition cannot be used to
measure the probability of event E, so this "ideal"
definition is taken as a target for more practical
measurement techniques: "an infinite number" is replaced
by "many," for example, or a mechanism is described that
would inherently result in E occurring on some specified
fraction of the opportunities. An ideal coin would fall
heads exactly half the time in an infinite number of
tosses, because that is the definition of an ideal coin.
But it doesn't help in determining the probability that
the next toss of this coin will be a head.
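The gap between a finite run of tosses and the infinite-limit definition can be seen in a small simulation. Python's pseudo-random generator stands in for an "ideal" coin here (an assumption; no real coin is involved), and the checkpoints and seed are arbitrary choices.

```python
import random


def head_proportions(n_tosses, checkpoints, p_heads=0.5, seed=1993):
    """Toss a simulated coin n_tosses times.

    Returns the list of outcomes (1 = head, 0 = tail) and a dict mapping each
    checkpoint N to the proportion X/N of heads seen so far.
    """
    rng = random.Random(seed)          # fixed seed so the run can be repeated
    heads = 0
    outcomes = []
    proportions = {}
    for n in range(1, n_tosses + 1):
        outcome = 1 if rng.random() < p_heads else 0   # a toss is a head or it is not
        outcomes.append(outcome)
        heads += outcome
        if n in checkpoints:
            proportions[n] = heads / n
    return outcomes, proportions


outcomes, props = head_proportions(100_000, {10, 100, 1_000, 10_000, 100_000})
for n, proportion in sorted(props.items()):
    print(f"after {n:>7,} tosses: X/N = {proportion:.4f}")
```

The proportion X/N drifts toward 0.5 as N grows, but no finite run reaches the limit, and every individual toss is still simply a 0 or a 1.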

Even to ask the question "What is the probability that the
next toss will result in a head?" is to deny the validity
of the definition of probability most people are taught.
What does one toss have to do with infinity? It comes out
heads or it does not. There must be some other way of
looking at the notion of probability, because we certainly
feel that there is some value in thinking about whether
one event is more probable than another. Is it more
probable that the next time we see Joe he will be wearing
brown shoes than that it will be raining in Toronto at
noon on July 12, 1997? One feels that there is some sense
in asking such questions, even though neither event has
more than one opportunity of occurring.

The notion that probability has to do with the fraction of
opportunities for an event on which it actually happens
can be called "frequentist." It is a Newtonian view of a
world in which situations are indefinitely repeatable,
observations can be carried on for infinite time, and can
be infinitely precise. The Newtonian world is not a world
accessible to ordinary mortals, and thinking about a
frequentist probability that can occur only in a Newtonian
world can lead one into great confusion and paradox.

A real observer can only go on what is observable. The
real, observable world we can call Einsteinian. In this
world, probability depends only on what has been observed.
If Joe's friend has just telephoned to say that Joe bought
new brown shoes and is coming round to show them off, we
might say that there is a high probability that the next
time we see Joe he will be wearing brown shoes, even
though we have never before seen him wear brown shoes.

Observations include all sorts of things related to the
uncertain event in question, not simply observations of
the critical factor as it occurred or did not occur on
past occasions we deem to have been similar. If we
believe that a coin has been made with fair balance and
evenly milled edges, we will judge it to have a
probability 0.5 of landing heads, even though we have
never tossed it before. But if we believe that the coin
tosser has a special skill in how high and with what spin
to toss the coin, we may alter that judgment depending on
our belief as to which result the tosser wants to see.

Probability is a subjective matter. The only consistent
and reliable way to deal with probability is to treat it
as a property of an observer, not of the world. There may
be probabilities in the world, but real observers can no
more detect them than they can assert a correct, universal
sequence of events in the world.

To be subjective does not mean to be arbitrary. If one
believes that two events are mutually exclusive and that
one of the two has to happen, then to be consistent, the
subjective probabilities of the two events must sum to
unity. One cannot arbitrarily say that p(A) = 0.8 and
p(B) = 0.8 and p(A xor B) = 1: for mutually exclusive
events, p(A xor B) = p(A) + p(B), which would be 1.6.
Subjective probability has constraints, if it is to be
dignified with the name of probability rather than
wantedness or hope, or some such word. For example, the
subjective probabilities of all mutually exclusive events
that could happen in particular circumstances must sum to
unity. If one does not believe that A can happen unless B
does, then the subjective probability of A cannot be
greater than that of B. All of the usual arithmetic
applied to frequentist probability is appropriate in
dealing with subjective probability.
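The consistency constraints can be written as simple checks. The betting function is added for illustration only and is not part of the original text: it uses the well-known "Dutch book" argument, in which someone whose prices for bets on mutually exclusive, exhaustive events do not sum to 1 is bound to come out behind whatever happens. The function names are hypothetical.

```python
def is_coherent_partition(probs, tol=1e-9):
    """Mutually exclusive events, one of which must happen: each probability
    lies in [0, 1] and together they sum to 1."""
    return all(0.0 <= p <= 1.0 for p in probs) and abs(sum(probs) - 1.0) <= tol


def respects_implication(p_a, p_b):
    """If A cannot happen unless B does, P(A) may not exceed P(B)."""
    return p_a <= p_b


def net_for_each_outcome(prices):
    """Buy, at the stated prices, a ticket paying 1 unit on each of several
    mutually exclusive, exhaustive events. Return the buyer's net gain for
    each possible outcome (exactly one ticket pays)."""
    cost = sum(prices)
    return [sum(1.0 if i == winner else 0.0 for i in range(len(prices))) - cost
            for winner in range(len(prices))]


print(is_coherent_partition([0.8, 0.8]))   # the p(A) = p(B) = 0.8 example: False
print(net_for_each_outcome([0.8, 0.8]))    # loses about 0.6 whichever event happens
print(is_coherent_partition([0.3, 0.7]), net_for_each_outcome([0.3, 0.7]))
print(respects_implication(0.6, 0.4))      # P(A) > P(B) although A requires B: False
```

With prices of 0.8 and 0.8 the buyer pays 1.6 and collects exactly 1, so the loss is certain; with 0.3 and 0.7 the book balances.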

One of the constraints is that all probability is
conditional. It makes no sense simply to say of an event
that its probability is P. As with the frequentist kind
of probability, one must think of what might be the
alternatives to the event: what counts as the event not
happening, and what occasions might be considered as
opportunities for the event to happen, on which the
probability is based. Would Joe appearing barefoot count
against the event "Joe wearing brown shoes the next time,"
or would he have to be wearing some non-brown shoes? We
notate a conditional probability with a vertical bar:
P(A|B) is the probability that A will be true, given that
B is true. B represents the occasion, A the event. B,
like A, must be something observable. The observer must
be able to determine whether B is true, that is, whether
this is an occasion on which it is interesting to observe
whether A is true. We may, for example, be interested in
whether Joe is wearing brown shoes only on condition that
he is wearing some kind of shoes. Or, conversely, Joe may
usually wear no shoes, but when he does, they are usually
brown, so we are interested in whether he is wearing brown
shoes given that we see him, shod or not.
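How much the choice of condition matters can be sketched with a tally of sightings. The counts below are invented for illustration, and the proportion in the tally is used as a stand-in for the observer's subjective probability (the text later distinguishes the two).

```python
from fractions import Fraction

# Hypothetical record of twelve sightings of Joe (invented numbers).
sightings = ["barefoot"] * 8 + ["brown shoes"] * 3 + ["other shoes"] * 1


def conditional(event, condition, records):
    """P(event | condition), estimated as a proportion of the records that meet the condition."""
    meeting = [r for r in records if condition(r)]
    return Fraction(sum(1 for r in meeting if event(r)), len(meeting))


def is_brown(record):
    return record == "brown shoes"


def is_shod(record):
    return record != "barefoot"


def always(record):
    return True   # the condition "we see him, shod or not"


print("P(brown | shod) =", conditional(is_brown, is_shod, sightings))   # 3/4
print("P(brown | seen) =", conditional(is_brown, always, sightings))    # 1/4
```

The same record gives 3/4 under one condition and 1/4 under the other; neither number means anything without its condition.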

How one defines the condition has a great impact on the
subjective probability of an event. Given that one has
observed a coin that has nearly come to rest after being
tossed, one can put a high value on the probability that
it will lie "head" (or not, as the case may be) when it
finally does stop. The probability will be even higher if
the condition is added that there be no earthquake or
other disturbance of the floor before the final
observation, and higher yet if another condition is added
that no person disturb the coin's movement. This example
may seem extreme, but such conditions apply to every
judgment of probability, and in many cases they are both
less obvious and more important than these.

Probability depends on knowledge. If one had not observed
the coin since determining, just before it was tossed,
that it had a head side and a tail side, one would
probably judge that it had a probability of 0.5 of landing
head or tail, regardless of the earthquake or interference
conditions. After all, we model those events as being
equally likely to leave it lying either way, and that is
the same as our judgment of the result of the undisturbed
toss, so those conditions have no effect on the subjective
probability judgment. But notice that it is our model
that allows us to make that assessment. If we knew that a
certain unscrupulous person with skill in magic tricks had
a vested interest in seeing the coin land heads, we might
change our subjective probability that it would do so. A
condition that we did not observe interference would not
be sufficient to bring the probability back to 0.5. In our
Einsteinian universe, we cannot impose a condition that no
interference occur, only that we do not observe it to
occur.
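The effect of the trickster can be sketched with Bayes' rule, using exact fractions. All the numbers are hypothetical: an even chance that the trickster interferes, heads on 95% of interfered tosses, and a 90% chance that a skilled interference goes unnoticed. Treating the observer's belief as a mixture of a "fair toss" model and a "trick" model is an assumption of this sketch.

```python
from fractions import Fraction as F


def p_heads(p_trick, p_heads_if_trick, p_heads_if_fair=F(1, 2)):
    """Mix the two models of the toss according to how strongly each is believed."""
    return p_trick * p_heads_if_trick + (1 - p_trick) * p_heads_if_fair


def p_trick_given_unseen(p_trick, p_unseen_if_trick, p_unseen_if_fair=F(1)):
    """Bayes' rule: belief in a trick after watching and seeing no interference."""
    joint_trick = p_trick * p_unseen_if_trick
    joint_fair = (1 - p_trick) * p_unseen_if_fair
    return joint_trick / (joint_trick + joint_fair)


prior_trick = F(1, 2)
before = p_heads(prior_trick, F(19, 20))
after = p_heads(p_trick_given_unseen(prior_trick, F(9, 10)), F(19, 20))
print(before, float(before))   # 29/40 0.725
print(after, float(after))     # 271/380, about 0.713
```

Watching and seeing nothing lowers the probability of heads only from 29/40 to 271/380; under these assumptions it stays well above 0.5.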

In everything that follows, whenever the notation P(x)
occurs, it must always be remembered that in the
background there is a condition Y, so that the correct
notation would have been P(x|Y). The omission of Y may be
justified on the grounds that Y is obvious, but sometimes
it is not so obvious. The notation P(x) can sometimes be
seriously misleading, if the background conditions are not
intuitively apparent.

Probability and replication

Whatever the condition Y, an observer can assess P(x|Y)
for any event x. Of course, for most cases of x and Y the
observer will have no reason to do so. Y may be very
unlikely to happen, given the current condition C, or the
observer may deem Y to be irrelevant to x, given C (i.e.
P(x|Y,C) = P(x|C)). But in other cases, the observer has
some reason for being interested in P(x|Y). Let us
suppose that the observer knows nothing about the relation
between x and Y, other than that it might be interesting.
To find out something of the relation, the observer has to
see what happens when Y occurs.

When Y occurs, there is an opportunity for x to occur. If
it does, the observer can make a note "Yes," and if not, a
note "No." After N occurrences of Y (replications of an
experimental observation), there will be a certain number
X of "Yes" notations. Since we hypothesized that the
observer knew nothing beforehand about the relationship of
x and Y, what the observer now knows is that x occurred on
X/N of the times that Y was true. A rational observer
would use that knowledge to set a value for P(x|Y) close
to X/N. If X/N differed appreciably from P(x|C), averaged
over the various conditions C that prevailed on the
occasions when Y was also true, then the observer is
likely to say that Y probably affects the probability of
x. The observer can also assign a value to
P(Y affects x | C), which we can notate as P(M|C), where
M is the model "Y affects x." If X/N were not very
different from P(x|C), then P(M|C) would be low; the
observer would not believe very strongly that Y affects x.

X/N is a proportion, not a probability. If in future more
occasions occur in which condition Y is true, X and N will
change. A frequentist view of probability asserts that
the "true" probability of x given Y is the value of X/N
that would be observed after an infinite number of
occurrences of Y (replications). A practical approach to
the frequentist view is less dramatic: X/N will approach
an unknown but true ideal limit ever more closely as the
number of occurrences of Y increases. There exist
theories, which we have no reason to dispute, about the
probability that X/N would take on any particular value
after N occurrences of Y, if the ideal has a certain value
p. A rational observer would be likely to incorporate
these theories into the subjective judgment of P(M|C),
by, for example, comparing the probability P(M1|C), where
M1 is that X "Yes" events would be observed in N
opportunities if Y had no effect, with P(M2|C), where M2
is that P(x|Y,C) = X/N.
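The comparison at the end of this paragraph can be sketched with the binomial distribution, one of the standard theories of how X/N varies when the ideal value is fixed. The sketch computes how probable the observed tally is under M1 (x keeps its background rate P(x|C)) and under M2 (P(x|Y,C) = X/N), and reports their ratio. Reading the comparison as a likelihood ratio is an assumption of this sketch, and the numbers are hypothetical. Because M2 is fitted to the very tally it is judged on, the ratio can never fall below 1, so a careful observer would discount it somewhat.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x 'Yes' notes in n occurrences of Y if each has chance p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def likelihood_ratio(x, n, p_background):
    """How much better M2 (P(x|Y,C) = X/N) accounts for the tally than
    M1 (Y has no effect, so x keeps its background rate P(x|C))."""
    return binomial_pmf(x, n, x / n) / binomial_pmf(x, n, p_background)


# Hypothetical tally: background rate P(x|C) = 0.3; x occurred on 20 of 40 occasions of Y.
print(binomial_pmf(20, 40, 0.3))       # chance of such a tally if Y had no effect
print(likelihood_ratio(20, 40, 0.3))   # roughly 33: the tally favours M2
print(likelihood_ratio(12, 40, 0.3))   # 1.0: X/N equals the background rate
```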

There is a hidden assumption in the foregoing: that
condition Y can be repeated (that replications are
possible). What does it mean to say that a condition
recurs? If everything in the observable universe is taken
into account, no condition can ever recur. If nothing
else, the universe has aged since the first occurrence of
the condition. Its original age can never be recovered.
But the observer may well think that this doesn't matter,
given that the difference in age is probably only a few
parts in 10^10. Such an observer will accept that two
occurrences in which Y is the same except for a small
increase in the age of the universe can be considered
together in evaluating P(x|Y). But the age of the
universe is not the only difference in conditions between
the two occasions. The position of the sun, moon and
planets in the sky will have changed as well. If their
arrangement ever recurs, it is at intervals long enough
that the positions of the stars will have changed
significantly. But if our observer is not astrologically
inclined, this difference may not matter either. So the
observer may say that the condition Y recurs even though
the age of the universe is different, and the planets have
moved in the sky. That is strictly a personal judgment by
the observer who is trying to get data that will allow a
reasonable value to be developed for the subjective
P(x|Y), for subjective probabilities are greatly affected
by observation. The value determined with the aid of
observation, of course, applies to some future occasion of
condition Y, not to any that has already occurred, because
for those occurrences of condition Y the observer knows
whether the answer was "Yes" or "No."

Usually, what the observer considers to be an occurrence
of condition Y is specified not by listing all the
irrelevant conditions, but by listing some of the
conditions that would change the situation into something
other than Y if they were varied. All events that happen
when the condition does not include those falsifying
conditions ought to be counted as opportunities for x to
occur, that is, as replications. Of course, in practice,
they are not. The observer notices that when Y holds but
Z does not, x tends to occur less often than when Y and Z
both hold, and so changes the definition of Y so that it
necessarily includes Z; a failure of Z to be true then
makes the situation unsuited to an observation relevant to
determining P(x|Y).
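The redefinition of Y can be sketched by splitting a log of occasions according to Z. The log is invented for illustration; each entry records whether Z also held and whether x occurred.

```python
from fractions import Fraction

# Hypothetical log of occasions on which Y held: (was Z also true?, did x occur?)
occasions = ([(True, True)] * 9 + [(True, False)] * 1 +
             [(False, True)] * 2 + [(False, False)] * 8)


def proportion(records, counts_as_replication):
    """X/N over the occasions the observer chooses to count as replications."""
    chosen = [x for z, x in records if counts_as_replication(z)]
    return Fraction(sum(chosen), len(chosen))


print("Y, counting every occasion:", proportion(occasions, lambda z: True))    # 11/20
print("Y redefined to include Z:  ", proportion(occasions, lambda z: z))       # 9/10
print("occasions of Y without Z:  ", proportion(occasions, lambda z: not z))   # 1/5
```

Which of these proportions the observer uses depends entirely on what that observer decides counts as a replication.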

The important point about this discussion is that what
constitutes a replication of an observation is a matter
totally subjective to the observer. For example, in an
experiment in psychophysics, one observer may say that the
presentation to a trained listener of a particular
waveform to which is added noise from a well-controlled
noise generator constitutes a replication of an
observation to determine how well people can detect that
signal in that noise. Another observer may note that the
detection probability depends on whether the trial is
presented early or late in a series of trials, and that
therefore early trials should be considered separately
from late ones. Yet another observer may note that it
depends on whether the listener has just eaten lunch, or
on the particular waveform obtained from the noise
generator, and so on and so forth. It is never possible
to specify objectively what constitutes a replication of
an observation.

So far, I have tried to make the point that the term
"probability" refers only to an attribute of an observer,
who should apply it only to unique events. But the
observer must have some reason to evaluate the probability
of the event, and this reason may come from anywhere. It
could be based on being told something by a trusted (or
untrustworthy) source, it could come from a belief that
some known mechanism causes the event to occur or not to
occur, or it could come from observations of whether a
similar event occurred under apparently similar
circumstances on one or more other occasions. It is up to
the observer to determine how to relate the various
sources of information so as to adjust the subjective
probability of the event. And most importantly, it is up
to the observer to determine what constitutes a
replication of an observation, whenever the occurrence of
similar events under similar circumstances contributes to
that probability estimate.

Probability and measurement

What is the probability that the width of this page is 8.5
inches (assuming you are reading this on North American
letter-size paper)? What is the probability that the
width is 0.4 inches? Technically, the answer to each
question is "zero." The page may be roughly 8.5000001 or
8.4999999 inches wide, but that is not 8.5. Nevertheless,
one feels that the probability of its being 8.5 inches
should be higher than that of its being 0.4 inches. We
"know" that the width is not 0.4 inches, but we do not
"know" in the same way that it is not 8.5 inches. Indeed,
nominally, the width is 8.5 inches, so if someone asks
"Are you reading something on 8.5 inch paper or 6 inch
paper?" we could confidently answer "8.5 inch," and would
not have to say "neither" as we would if the question were
"Are you reading something on 1 inch paper or 6 inch
paper?" So, there is a range of width over which we are
satisfied that this is 8.5 inch paper, more or less.
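The difference between a single exact width and a range of widths can be sketched by treating a belief about the width as a continuous probability distribution. The normal shape, the 0.1-inch spread and the ±0.25-inch tolerance standing in for "more or less" are all assumptions made for illustration.

```python
from statistics import NormalDist

# Hypothetical belief about the page width after a glance: centred on 8.5 inches,
# standard deviation 0.1 inch. The normal shape is an assumption of this sketch.
width = NormalDist(mu=8.5, sigma=0.1)


def p_between(belief, low, high):
    """Probability the belief assigns to widths between low and high inches."""
    return belief.cdf(high) - belief.cdf(low)


print(p_between(width, 8.5, 8.5))     # exactly 8.5: 0.0
print(p_between(width, 8.25, 8.75))   # "8.5 inch paper, more or less": about 0.988
print(p_between(width, 0.15, 0.65))   # "0.4 inch paper, more or less": effectively 0
```

Any single exact value gets probability zero, yet a range around 8.5 inches gets almost all of the probability and a range around 0.4 inches gets essentially none.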

How well must we measure, to agree that this paper is 8.5
inches wide? That depends on what else we "know." If we
are accustomed to North American standard sizes, we need
only see well enough to say "this is letter paper, not
some other size," because paper does not come in other
widths near 8.5 inches. But if the paper might have come
from some place more in tune with world standards, then it
might be A4, and thus a little taller and narrower than
8.5 x 11. What is the probability that the paper might be
A4? Is it enough to require us to measure the paper more
precisely than by a quick glance?

A quick glance suffices to assure us that the paper is 8.5
x 11 rather than, say, 11 x 14 inches. But a quick glance
is insufficient to assure us that the paper is not A4. A
closer look to get more information about the page is
needed for that judgment. But if the question is whether
the paper-cutting apparatus that was supposed to cut the
sheet to be 8.5 x 11 is properly adjusted, no amount of
looking will be enough. The paper must be carefully
compared with some standard, such as a rod that is
asserted to be 8.5 inches long. Ignoring for the moment
the trust that has to be placed in the assertion about the
rod length, what is the next step in the determination?
One looks to see whether the rod is longer than the paper
is wide. Maybe there is a clear difference, and one can
say something like "On the condition that I am not drunk
or hallucinating or something like that, I am quite sure
this paper is less than 8.5 inches wide." But maybe the
difference is very small, and the best one can say is "I'd
bet 3 to 1 that this paper is less than 8.5 inches wide,"
or even "I have no idea whether this paper is more or less
than 8.5 inches wide."
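Betting odds translate directly into a subjective probability: offering odds of a to b in favour of a proposition expresses a probability of a/(a + b). The helper names below are hypothetical; the conversion itself is the standard one.

```python
from fractions import Fraction


def odds_to_probability(stake_for, stake_against):
    """Betting stake_for to stake_against that something is true expresses
    P = stake_for / (stake_for + stake_against)."""
    return Fraction(stake_for, stake_for + stake_against)


def probability_to_odds(p):
    """Odds in favour, expressed as a single ratio p / (1 - p)."""
    p = Fraction(p)
    return p / (1 - p)


print(odds_to_probability(3, 1))               # "I'd bet 3 to 1": 3/4
print(odds_to_probability(1, 1))               # "no idea", even odds: 1/2
print(probability_to_odds(Fraction(3, 4)))     # back to odds of 3 (to 1)
```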

What is the next step? Get a microscope? That may answer
the question with respect to a specific piece of paper for
one observer, but for another piece or a different
observer, it may not. The same set of possibilities will
exist. There will always be some width of paper for which
you cannot say whether it is greater or less than the
specified width, no matter how precise the measuring
instrument. Your measurement is always a probability
distribution. Using the naked eye and the measuring rod,
you may say that the probability is essentially zero that
the paper is less than 8.48 inches wide (on the condition
that you are not hallucinating) or that it is wider than
8.52 inches. Perhaps you would bet even odds that it is
between 8.495 and 8.505 inches. But this is a lot better
than you could have done before you got the measuring rod.
Without it, you might have given even odds that the paper
was between 8 and 9 inches wide. The measuring rod has
reduced your uncertainty 100-fold, and if you used a
microscope you might be able to reduce it another 100- or
1000-fold. You cannot reduce it to zero, no matter what
measuring instrument you use. All you can do is to get
more and more information by closer observation.
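The arithmetic of the 100-fold reduction can be sketched by turning each even-odds interval into a belief distribution. Assuming, for illustration only, that the belief is normal, an even-odds interval is the central 50% of the distribution, so its half-width fixes the standard deviation.

```python
from statistics import NormalDist


def belief_from_even_odds(centre, half_width):
    """Normal belief whose central 50% interval is centre +/- half_width.
    (Treating the belief as normal is an assumption made for this sketch.)"""
    z = NormalDist().inv_cdf(0.75)          # about 0.674
    return NormalDist(mu=centre, sigma=half_width / z)


naked_eye = belief_from_even_odds(8.5, 0.5)    # even odds: between 8 and 9 inches
with_rod = belief_from_even_odds(8.5, 0.005)   # even odds: between 8.495 and 8.505

print(naked_eye.stdev / with_rod.stdev)        # the 100-fold reduction
outside = with_rod.cdf(8.48) + (1 - with_rod.cdf(8.52))
print(outside)                                 # small but not zero: about 0.007
```

Under that assumption, the rod shrinks the spread exactly 100-fold, yet the belief still leaves roughly 0.7% of the probability outside 8.48 to 8.52 inches: "essentially zero," but not zero.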

Any number that is the result of a measurement is a
discrete approximation to the "real" value of the thing
being measured. Indeed, all numbers that we can use in
any way at all are either rational, with a finite number
of digits in their numerator and denominator, or are
describable by an algorithm of finite length. Such
numbers are useful, since they are the only means we have
of representing values in the world, but one must always
be aware that they are not reality. Any such number
represents only a distribution of subjective probability
about reality.
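Both kinds of number can be seen in a programming language. In Python, for instance, every floating-point number is a rational number with a power-of-two denominator, and an irrational value such as the square root of 2 is available only through a finite algorithm that yields as many digits as are asked for. The digit function below is a hypothetical helper written for this illustration.

```python
import math
from fractions import Fraction

# A floating-point number is a rational with a power-of-two denominator.
print(Fraction(8.5))        # 17/2: this value happens to be represented exactly
print(Fraction(0.1))        # 3602879701896397/36028797018963968, not 1/10
print(Fraction(math.pi))    # a rational stand-in for pi


def sqrt2_digits(n):
    """The first n + 1 significant digits of sqrt(2), as an integer.

    A finite algorithm that can be run for any n, though never to the end.
    """
    return math.isqrt(2 * 10 ** (2 * n))


print(sqrt2_digits(10))     # 14142135623
```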

(More to come later -- Much later).