The Experiment

[Allan Randall (930331.1730 EST)]

This is a response to various posts from Rick Marken and Bill Powers.

Rick Marken (930329.2000) (930330.1130) (930330.1500)

I think you have understood most of the experiment correctly, but a
few points need to be clarified:

> Let me see if I have this right. I assume that cutting the "perceptual
> line" means that the system is now operating open loop.

Yes. The system is now simply getting percepts and producing outputs.
The output no longer has any effect on the input. The whole objective
is to replicate the closed-loop result in an open loop, using only
the percept.

> Also, what is the length of a perceptual
> input? Are you referring to the length of the vector of values that
> make up Pi?

Basically, yes. The length is the number of bits actually used to specify
the percept in the computer simulation. This is not a single value of
the percept at one moment in time, but a vector of values over a time
period.
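For example, if each sample were quantized to 8 bits (my assumption
for illustration only; the actual encoding in the simulation may
differ), the length works out like this:

    # Hypothetical encoding: a percept is a time series of samples,
    # each quantized to BITS_PER_SAMPLE bits, so its "length" is
    # simply samples * bits-per-sample.
    BITS_PER_SAMPLE = 8   # assumed quantization, not fixed by the proposal

    def percept_length_bits(samples):
        """Number of bits used to specify the percept vector."""
        return len(samples) * BITS_PER_SAMPLE

    print(percept_length_bits([0.0] * 1000))   # 1000 samples -> 8000 bits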

> ...So if D had 1000 samples and I could recover all
> 1000 values by plugging in a Pi with 500 samples, then the entropy of
> D is 500? Is that right?

Yes, provided that 500 is the *minimum* required. (Strictly, the
entropy is the length in bits of that minimal Pi; with a fixed
number of bits per sample, samples and bits differ only by a
constant factor.) That is why we need to start with the shortest
and work up.

> Is this right? ... Is P coming in via the usual closed
> loop perceptual input or "for free" as an open loop input?

Open loop, for free. Your intuition here was exactly right. The
procedure for H(D|P) is identical to that for H(D) except that we are
given P for free. Both entropy calculations are done strictly open
loop, since the algorithmic definition of entropy we are using is
defined in open loop terms: input, program, language and output.
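To spell out what "given P for free" means here, a minimal sketch
(Python; `program` stands for whatever fixed open-loop machine we
agree on, and `candidates` is assumed to be ordered shortest-first):

    def H(program, target, candidates):
        """Plain entropy: length in bits of the shortest input
        that makes the open-loop program produce the target."""
        for i in candidates:              # ordered shortest-first
            if program(i) == target:
                return len(i)

    def H_given(program, target, P, candidates):
        """Conditional entropy H(target|P): identical search, except
        P is handed to the program at no cost -- only the bits of
        the extra input i are counted."""
        for i in candidates:
            if program(P, i) == target:
                return len(i)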

> I know you
> expect to need only one P0 value -- that any P0 value of length
> 0 can be added to P' and maintain the ability of P' to produce O=D.
> But you do have to try adding at least ONE P0 string (other than
> the zero length one) to show that this is true.

I think we have a notational confusion here. Pi, *not* P0, is my
notation for an arbitrary value in the set {P0,P1,P2...}. There is
only one P0, and it is of zero length by definition. It is the Pi
values that can be of arbitrary length. Since the procedure is to
look for the shortest successful string, there is no need to try
*any* non-zero-length Pi's *if* P0 is successful. Otherwise, we
try as many as needed to find a successful contender. The procedure
is simple: try all Pi's in order of shortest to longest, and STOP
as soon as one is successful. Also, I assume your P' is the same
as my Pk -- it represents the first successful Pi in an open-loop
search.
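A minimal sketch of this search in Python, under my assumptions
(candidates are plain bit strings; `system` stands for the open-loop
simulation, percept in, output out):

    from itertools import product

    def shortest_successful_pi(system, target, max_len=20):
        """Try candidate percepts Pi in order of length, P0 (the
        empty string) first, and STOP at the first one whose
        open-loop output matches the target. The length of that Pi
        is the entropy estimate."""
        for length in range(max_len + 1):        # length 0 is P0
            for bits in product((0, 1), repeat=length):
                if system(bits) == target:       # one open-loop run
                    return bits                  # first success: stop
        return None                              # nothing found up to max_len

If P0 succeeds, the search ends immediately with a length of zero,
which is the outcome I am predicting for the H(D|P) phase.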

> Do you disagree about the results that I have assumed we would get
> from this experiment?

Yes. But you'll see when you actually do the experiment.

As I said in a message to Bill, I think this experiment is almost
redundant, since P' will turn out to be zero length (P0) for H(D|P),
and this is the very first Pi we will try. This part of the
experiment is essentially identical to Martin's proposal for a
"mystery function." If I am right, the rest of the experiment
will be unnecessary. If you are right, I will have to continue with
the search. So I guess I'll just have to do the experiment.

Do you understand my assumption that we will not actually be looking
for D in the open-loop exponential search? Since you have already agreed
that nearly 100% of the information about D is in the closed-loop
output, I have assumed it will be sufficient to look for a replication
of this closed-loop output in the open-loop phase (without using the
closed loop, of course). I am trying to show that your claim -- that
the output has 100% of the information about D while the input has 0%
of it -- is inconsistent, so I think this assumption is valid for now.

Rick ("There's no information about disturbances in controlled
perceptions") Marken

Ashby's whole point was that perfect control was not possible, and
that error control relies on the detection of minor imperfections
to prevent major imperfections. This is why it's called "error
control," after all. If perfect control were possible, then yes,
there would be no information about the disturbance in the
perception. This is the goal of the control system. But it will
always fall short of this, since it controls via the detection of
imperfections. As Ashby said, perfect control would perfectly block
the very channel through which the control system gets its useful
information. Thus, error control is inherently imperfect. Its very
power lies in its inherent imperfection.

> ...The length of the first candidate i vector that produces
> o values that match d perfectly is H(D). (There is a problem here:
> what if you don't get any perfect matches to d, Allan? Nothing
> but an infinite loop gain system could produce outputs that
> match d perfectly anyway.) How about doing this just until a
> candidate i vector produces o values that match the disturbance
> to the same degree as did the o values generated in the closed-loop
> case?

This is basically what I did -- I specified that, since we agree
that the closed-loop output has near-100% of the information about
the disturbance, we are simply going to look for a replication
of the closed-loop output, rather than the disturbance.
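If we wanted Rick's looser criterion instead, the acceptance test
might look something like this (a sketch only; using correlation as
the "degree of match" is my own placeholder, not part of the
proposal):

    from statistics import mean

    def corr(x, y):
        """Ordinary product-moment correlation of two equal-length series."""
        mx, my = mean(x), mean(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x)
               * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den if den else 0.0

    def matches_closed_loop(candidate_o, closed_loop_o, d):
        """Accept a candidate open-loop output once it matches the
        disturbance at least as well as the closed-loop output did."""
        return abs(corr(candidate_o, d)) >= abs(corr(closed_loop_o, d))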

> Assuming you can find H(D) using this decidedly peculiar technique
> (why not just measure the variance of d?)

Because what I have described, as "peculiar" as it may seem,
corresponds directly to the technical definitions you will find
in the literature on algorithmic information theory. Variance does
not. Information is defined in terms of probability, not variance.

> Allan is
> assuming that the original i vector, along with the "null"
> (0-length) candidate i vector, will produce an output that
> perfectly matches the disturbance (or, at least, matches
> the disturbance as well as the output did in the original run).
>
> Is this a correct description of the experiment, Allan?

Yes, exactly.

Bill Powers (930330.2000 MST)

> Allan is using information about o in his method,
> as I vaguely understand it now, with your help. It shouldn't be
> surprising if he can also reconstruct d.

No, I am not using information about o to reconstruct d (or in this
case the original o).

> By the way, I think this notation H(p|d) is not just an ordinary
> function, but represents some sort of probability calculation
> with base-2 logs and all that. Allan?

H(P|D) = -log prob(P|D), with logs taken base 2, so the result is
in bits (see my original posting).
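For instance, an outcome with probability 1/1024 carries
-log2(1/1024) = 10 bits of information. A one-liner, for the record:

    from math import log2

    def surprisal_bits(p):
        """Information content, in bits, of an outcome with probability p."""
        return -log2(p)

    print(surprisal_bits(1 / 1024))   # -> 10.0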

Rick Marken (930330.2100) responding to Bill Powers:

>> Allan is using information about o in his method,
>> as I vaguely understand it now, with your help. It shouldn't be
>> surprising if he can also reconstruct d.

> I don't think Allan is going to use information about o (the
> o generated in the real, closed loop run) at all. In fact, the
> way I conceive of Allan's study, you don't need to save o at
> all; all you need is d and p from the closed loop run.

Your description is essentially accurate, except that I suggested
looking for the original closed-loop o, rather than an equally good
correlation with d, as you suggested earlier. Either strategy would
be fine as far as I am concerned. Given that we assume the
original o has 100% of the information about d, it doesn't much
matter which strategy we use.

> ...are you with us, Allan?

I think so. As far as I can tell, your description is accurate.

Bill Powers (930331.1030 MST) responds:

> As I understood it, Allan was going to assume that control was
> perfect, so o = -d with a constant reference signal. So the
> procedure depends on knowing o, which amounts to knowing d.
>
> But let's wait to see what Allan says.

No, the procedure that replicates the closed-loop o in the open
loop phase uses only the Pi value, not o. The closed-loop o
value (or d if we choose Rick's version) is used only for
comparison purposes to check the result. It is not used in any
capacity at all to produce the open-loop result. I believe Rick
has understood the proposal, and so I will now attempt to
implement it. I probably will not post anything else on this
subject until I have the results.
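For the record, the overall shape of the test as I plan to run it,
using the search sketch given earlier (all names are placeholders
for the real simulation):

    def run_experiment(closed_loop_run, open_loop_system):
        """Record p and o from a closed-loop run, then search open
        loop for the shortest Pi that replicates o. The recorded o
        enters only the comparison; it is never fed into the system."""
        p, o = closed_loop_run()                       # step 1: closed-loop traces
        pk = shortest_successful_pi(open_loop_system,  # steps 2-3: open-loop search
                                    target=o)
        return len(pk) if pk is not None else None     # entropy estimate in bits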


-----------------------------------------
Allan Randall, randall@dciem.dciem.dnd.ca
NTT Systems, Inc.
Toronto, ON