Levels of perception (Re: PCT researcher who doesn't talk)

[From Bill Powers (2009.03.11.0745)]

I’m supposed to go to a conference at Pam Fox’s “Hotel Villa
Nirvana” with some others from IAACT on Monday. At the moment I’m on
antibiotics for a nasty bronchitis thing and don’t know if I’m up to the
trip. This sounds like a blog, but stay tuned.

Martin Taylor 2009.03.08.14.39 –

I’m interested in more than just
the search for the controlled variable. I want to know something about
the properties of the control loops, and in particular the properties of
their component pathways. That’s one reason for continuing to concentrate
on the Schouten experiment. I’ll use conventional research when it seems
to have something useful to say, which I think is the case more often
than you think is the case.

I think that’s a reasonable objective (in case anyone asks what I think
is reasonable). Just to stick with the basic PCT methodology, however, I
think that the first step in any such investigation is to establish
beyond a reasonable doubt what it is that you’re investigating. Exactly
what are the control loops involved in the Schouten experiment? The
evidence on that score is pretty sparse, since no attempt was (or could
have been) made to identify the controlled variable, the means of sensing
it, and the means of affecting it. Before you can measure the properties
of a control loop, or convince anyone that that’s what you’re measuring,
you must identify the control loop with whatever tests are appropriate. I
don’t care whether you call the approach the Test for the Controlled
Variable, but to preserve the chain of evidence, so to speak, you have to
provide some formal reason to suppose that the control system has been
identified as far as possible. It’s easy to investigate the properties of
phlogiston if you don’t have to demonstrate that there’s such a thing as
phlogiston. I don’t want other scientists to think of PCT as
phlogiston.

In the opposite direction, PCT
ought to be able to say something useful about most observations of the
behaviour of a person, no matter how the data were collected – by casual
observation, conventional experiment, or experiment based on PCT
analysis.

I really don’t agree with that. There are certain things you need to know
before you’re justified in saying anything about human control systems.
At least you need to know what the behavior is controlling, and to find
that out you have to do things that aren’t ordinarily done in an
experiment designed under a different theory. For example, I suspect that
in an operant-conditioning experiment, what is being controlled by the
subject organism is some aspect of the reinforcer delivered by pressing a
lever, or some effect of doing that. The simplest way to rule out that
possibility is to apply a disturbance directly to the variable being
controlled, as you define it. If the effect is exactly what you would
predict from knowing the disturbance, and if nothing the subject does
tends to reduce that effect rather strongly, you can give up that
hypothesis without any strain. If that test is passed, you can go on to
identify the particular inputs and outputs that go in and out of the
hypothesized control system, or if you have a publication deadline you
can assume with at least some confidence that there’s a control system
involved somehow, somewhere. But who would think of adding and
subtracting food from the cup that is normally filled by the rat’s
pressings of a lever? (Fortunately, some people studying obesity did that
with animals who fed themselves entirely by pressing the bar, with the
results we would expect.)

In fact, Bruce Abbott found that our initial assumptions about a rat’s
control systems were wrong: by subtracting out the feeding time from the
total experimental time, Bruce showed that the rats did NOT vary their
rate of bar-pressing as the rate of reinforcement changed. They
apparently just pressed as fast as they could or else didn’t press at
all. We would have looked pretty silly if we had reported our results
before realizing that. Later we realized that the standard method of
maintaining body weight at 80% of the free-feeding level was an
experimenter-driven control process that competed with the rats’ own
control systems, if any, so we were preventing any normal control process
in the rats based on body weight from working.

The main thing I don’t recommend is actually a very bad habit
psychologists often get into: making a fundamental assumption, and then
simply proceeding as if it’s right without pausing for any kind of test.
One particularly bad example was something I read in Carver &
Scheier. They were looking for some effect of self-awareness on something
or other. What they did to manipulate self-awareness was to place a
mirror in the same room with the subject. This is the same principle used
in designing the Three Mile Island reactor controls: measure the command
that was supposed to open or close a cooling-water valve, instead of
verifying the effect on water flow. Does placing a mirror in a room
increase the self-awareness of a subject in the same room? Maybe. Maybe
not. Since we have no way to detect self-awareness itself, there is no
way to test this assumption. Of course that renders the rest of the
experiment as useless as the critical light on the panel of indicators at
Three Mile Island, which showed only that a switch was in the
“on” position. So there was a mirror in the room when the
person did something. So what?

If S-R psychologists had been required to demonstrate that everything
called a “response” was a response to a specific stimulus
as predicted by S-R theory, S-R psychology wouldn’t have lasted
long.

The question really is
whether any particular researcher wants to consider particular aspects
either of data or of mechanism. And that’s a question of the particular
researcher’s reference values for controlled perceptions, something that
can be influenced by, but not easily controlled by other
people.

I think that in any scientific culture, we place requirements on each
other to build our ideas on firm foundations, not just on assumptions.
People can differ on what they call a firm foundation, but I don’t think
that any scientific journal worthy of that name would accept a paper in
which the primary facts needed to support the conclusions were simply
made up by the author. It’s not just a matter of personal preferences and
academic freedom. Think of the 150 years science wasted on phlogiston,
just because nobody asked how we could find out if it really
existed.

I have no objection to measuring the properties of control systems to any
depth one finds interesting. But I’m not going to believe a word of it if
you don’t first show that you’re actually studying a control process. We
have to prove that we are studying something every time we study it, or the
results are simply worthless. It really doesn’t take that long to check
out a few assumptions. A chemist doesn’t simply assume that his measuring
scales are calibrated right, or that the liquid in a bottle is really
sulphuric acid. We learn mainly from the times when the assumptions
prove, to our surprise and edification, to be wrong – but that will
never happen if we don’t test them.

Best,

Bill P.

[From Rick Marken (2009.03.11.1030)]

Bill Powers (2009.03.11.0745) –

I’m supposed to go to a conference at Pam Fox’s “Hotel Villa
Nirvana” with some others from IAACT on Monday. At the moment I’m on
antibiotics for a nasty bronchitis thing and don’t know if I’m up to the
trip. This sounds like a blog, but stay tuned.

Please take care of yourself. Mexico is having a drug war anyway so it’s probably not a good idea to go down there until the US ends Prohibition II, which will probably be never.

Martin Taylor 2009.03.08.14.39 –

I’m interested in more than just
the search for the controlled variable. I want to know something about
the properties of the control loops, and in particular the properties of
their component pathways.

I think that’s a reasonable objective (in case anyone asks what I think
is reasonable). Just to stick with the basic PCT methodology, however, I
think that the first step in any such investigation is to establish
beyond a reasonable doubt what it is that you’re investigating. Exactly
what are the control loops involved in the Schouten experiment? The
evidence on that score is pretty sparse, since no attempt was (or could
have been) made to identify the controlled variable, the means of sensing
it, and the means of affecting it.

Precisely my point, though, as usual, made more cogently and respectfully.

Before you can measure the properties
of a control loop, or convince anyone that that’s what you’re measuring,
you must identify the control loop with whatever tests are appropriate.

Yea!!

I
don’t care whether you call the approach the Test for the Controlled
Variable

Me neither, though it’s hard to think of what else to call it. Maybe we could call it “Bill”:wink:

but to preserve the chain of evidence, so to speak, you have to
provide some formal reason to suppose that the control system has been
identified as far as possible. It’s easy to investigate the properties of
phlogiston if you don’t have to demonstrate that there’s such a thing as
phlogiston. I don’t want other scientists to think of PCT as
phlogiston.

Right on!

Take care of yourself. I hope you feel better soon. The warm weather in Mexico might actually be good for you.

Best

Rick



Richard S. Marken PhD
rsmarken@gmail.com

[From Bill Powers (2009.03.14.1703 MST)]

Martin Taylor 2009.03.14.17.04

What I thought I had shown was that the measurement of perceptual information in the first fraction of a second after the light goes on does not treat the perceptual signal as a continuing analog measure.

I could see no connection between what you discussed and the rate of information transfer.

I know you can't, but I see one. We're still not talking about the same thing.

I mentioned that if this same light perception were used during continuous control of light intensity, its bandwidth, and hence its formal maximum rate of information transfer,

The rate of information transfer depends not only on bandwidth but also on SNR.

Yes, and the SNR I am talking about is essentially infinite during most of the time interval I am talking about. The SNR is apparently zero until about 200 milliseconds after one light has turned on, because there is no signal. The guesses are random until then. Starting the time clock at the (approx) 200 millisecond mark where the proportion of correct guesses is starting to rise above 50%, we see that the probability of a correct guess rises quickly by about 3 standard deviations in 50 milliseconds, or 17 milliseconds per standard deviation, at which delay the probability of guessing wrong has dropped (from my tables for a Gaussian distribution) to 0.0027, for an SNR of around 370:1. By this time the rate of information transfer must be almost zero, since there is essentially no uncertainty left (if you wait for the signal to rise by one more standard deviation, the probability of error goes to 0.000063).
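
For readers without Gaussian tables at hand, here is a minimal check of the tail values quoted above (two-sided tails of a standard Gaussian; scipy stands in for the printed tables):

```python
# Two-sided Gaussian tail probabilities for the table values quoted above.
from scipy.stats import norm

for z in (3.0, 4.0):
    p = 2 * norm.sf(z)                      # probability mass beyond +/- z SD
    print(f"z = {z:.0f} SD: tail ~ {p:.6f}, 1/tail ~ {1 / p:.0f}")

# z = 3 SD: tail ~ 0.002700, 1/tail ~ 370      (the 0.0027 and ~370:1 above)
# z = 4 SD: tail ~ 0.000063, 1/tail ~ 15787    (the 0.000063 above)
# And 3 standard deviations in 50 ms works out to 50/3, about 17 ms per SD.
```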

But there is no reason to think that the relationship signal has reached its maximum, or even half or a quarter of its maximum, 50 or 60 milliseconds after the signal begins to rise. If you were to ask the person to react to the light by adjusting its brightness to match some standard, that adjustment would be going on for many times the initial 50-millisecond mark.

If control depended on information flow, it would not be possible to do the task I describe, because that flow would drop rapidly to zero long before the match within some criterion is achieved (as I understand you to be measuring information flow).

What the Schouten experiment does is to examine the probability of perceiving the correct light at a time when the perceived difference in brightness of the two lights is comparable to the noise level in perceiving the brightness relationship. That noise level can be very small, however, in comparison with the final perceived brightness difference. If we look at relationship signals a hundred or more milliseconds further delayed, there is essentially no noise in them. No further information is being transmitted, yet I would be surprised if control were not even better than it was earlier.

Ninety-five percent of the uncertainty is removed in the first 50 milliseconds after the initial rise,

You have no measure of that. From the data, there is nothing from which you could estimate what percentage of the uncertainty is removed in 50 msec.

I just showed you that measure. You get it by looking at the measured fraction of correct guesses for a given delay in making the guess. That can be taken as the probability of making a correct guess, and given that probability you can use the statistical tables to find out how many standard deviations above the noise level the relationship signal is at that delay. You don't know the actual signal level or noise level, but you know their ratio. This assumes a
Gaussian distribution, but if you assume a different distribution you can do a similar calculation using it, or you can simply measure the distribution and use that.

This is why I asked earlier if the basic data taken during this experiment is the fraction of correct guesses at each delay. If it is, then you have the probability of a correct guess right away as a function of delay. Given that probability and a Gaussian distribution, you can determine the relative standard deviation of the noise and the signal-to-noise ratio. The linear relationship between delay and d'^2 (which you said was the standard deviation) is simply what you would expect if the distribution were Gaussian. I suspect that d'^2 is calculated from the fraction of correct guesses in a way that uses the probability tables the way I am doing.
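
As a sketch of the procedure described here (the delay/fraction pairs below are invented, not Schouten's data, and the simplest Gaussian reading of “fraction correct” is assumed):

```python
# Convert fraction-correct at each delay into "standard deviations above the
# noise", i.e. the inverse-Gaussian reading of the probability tables.
# The (delay, fraction_correct) pairs are invented for illustration only.
from scipy.stats import norm

observations = [(210, 0.55), (230, 0.75), (250, 0.92), (270, 0.99)]

for delay_ms, p_correct in observations:
    z = norm.ppf(p_correct)   # signal level in units of the noise SD
    print(f"{delay_ms} ms: {p_correct:.2f} correct -> signal ~ {z:.2f} SD above noise "
          f"(z^2 = {z * z:.2f})")

# z^2 is proportional to d'^2 up to a convention-dependent constant, so it can
# be plotted against delay to examine the linearity discussed in the thread.
```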

I think that's enough to get into for now.

Best,

Bill P.

[Martin Taylor 2009.03.04.17.42]

[From Bill Powers (2009.03.04.0237 MST)]

Rick Marken (2009.03.02.2230) –

Where the timing results matter is that they provide a measure of the
channel capacity of the pathway marked in red on most of my diagrams –
from sensing the light to the output of the perceptual function that
creates the interpreted category for the location perception.

What you call “channel capacity” is what I would call “time required to
compute a perception of that type” (whatever the type is). So, again, I
think we are in complete agreement, we’re just using different words.

I agree. Channel capacity is a theoretical concept somewhat removed from
the actual situation (the channel capacity is being measured only in
short bursts of information so there is no way to know what the
steady-state capacity would be).

Technically, you are correct. However, since the data fall very
precisely on a straight line, they are consistent both with the
hypothesis that the measured capacity is the same as the steady state
channel capacity and with the hypothesis that what is measured is only
a burst capacity.

Other experiments, though, may argue for this being a burst rate. Even
if this were true for those experiments, all of them use static signal
presentations, and the answer might be different for a varying
presentation. There’s really no point in continuing to gather data when
you have enough and the situation isn’t changing. Since we are asking
about the channel capacity for perception, we can’t use control
bandwidth as a useful measure of the steady state channel capacity,
control bandwidth being possibly limited by other parts of the control
circuit, not least by the transport lag that would cause oscillation
for too high a gain at too high a frequency.

And the forced-choice format is somewhat artificial, too: I would
probably be a bad subject who resists making a choice when I am too
uncertain to believe I know which light went on. My effective channel
capacity, in normal circumstances, would be considerably lower than what
this experiment would suggest.

Your effective channel capacity would not be affected by your
reluctance to answer until you acquired sufficient data. What would be
affected would be the apparent transport lag, or the d’^2 you require
before making a free-timed decision (the usual form of a reaction time
experiment).

As for the forced-choice with deadline being an artificial situation, I
just ask whether you have never found yourself having to choose between
two alternative courses of action when you would prefer to wait to get
more information about the situation. If you have never had to make a
choice before getting all the information you might have wanted, you
haven’t lived!

As a circuit-oriented theoretician I view this experiment as a clever
way of estimating the rise-time of a perceptual signal. The 1/e time
would seem to be about 50 milliseconds.

I have seen no evidence of an exponential curve in the data presented,
so I would certainly not try to venture a guess such as that. I might,
however, guess that by 50 msec the subject had acquired half the
information needed in an ordinary “as soon as possible” reaction time
experiment. In our other experiments (about which I include a quote
below), it seemed that the rise of d’^2 fits two straight lines better
than an exponential.

As a circuit-oriented theoretician, you are probably interested in the
bandwidths of different connections in the control loop, because they
affect the stability of feedback circuits. “Channel Capacity” is a
generalization of the linear concept “Bandwidth”, applicable in linear
and non-linear situations.

Then there is another reaction time, less well-defined, which involves
the rise time of the higher perceptual signal measured in the Schouten
experiment. At what time should we say that the perceptual signal
exists?

I thought perceptual signals always existed, but just had different
magnitudes at different times. Above, you talk about “the rise time of
a perceptual signal” as though you subscribe to this view. Here, you
seem to deny it.

As I see it, you can ask “At what time should we say the perceptual
signal exists” only when you are up to at least the category level,
since only at and above that level is it possible to say categorically
“yes” or “no” to “does the perceptual signal exist?” That’s the level
at which an appropriate answer can be given by a button push rather
than by a continuously variable handle.

When half the responses are correct? 95% of them (2-sigma)? The latter
would occur 50 milliseconds after the shortest reaction time deduced
from the data. But that is smaller than our uncertainty about how long
it takes after getting a correct identification for the contact under
the button to be closed. I think we’re exceeding the discriminative
power of this experiment (and also of the concept of reaction time).

I take it that you are expressing an opinion based on faith, in
opposition to the precise straight-line data that Schouten obtained,
but that you seem to be saying is impossible?

Back to channel capacity. Here’s a quote from a different section of
the same paper from which I took the Schouten diagram, about my own
experiments (“Quantification of Shared Capacity Processing”. Taylor,
Lindsay and Forbes, Acta Psychologica 27, (1967) 223-229). The data
were reported elsewhere. If it’s of interest I might be able to find
the paper with the data:
"Among our experiments on audio-visual capacity sharing have been
some in which presentation time was an experimental variable. … For
the immediate purposes it suffices to remark that the subjects had to
make one or more of four discriminations: left-right discrimination of
a dot on a TV screen, up-down position of the same dot, pitch of a tone
burst presented simultaneously with the dot, and intensity of the same
tone burst. The stimulus duration in various experiments has been
varied from as low as 33 msec to as high as 4 sec. For all four aspects
of the stimulus (we consider the dot and the tone to constitute a
single stimulus) we have consistently found that d’^2 rises linearly
with presentation time up to 130 msec or so, and that for longer
intervals the rise, while possibly linear, is very much slower. These
results are consistent with the proposition that d’^2 is an additive
measure of discrimination performance, while suggesting that the
processor can work effectively on only the first 130 msec of the signal."

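A side note on what “d'^2 is an additive measure” means in practice, using the standard signal-detection result (not taken from the quoted paper) that independent observations add their d'^2 values:

```python
import math

# Two independent looks at the same signal, each giving d' = 1.0.
d1, d2 = 1.0, 1.0
d_combined = math.sqrt(d1 ** 2 + d2 ** 2)   # d'^2 values add for an ideal combiner
print(d_combined)                            # ~1.414, not 2.0
# So doubling the usable observation time doubles d'^2, not d'.
```
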
As an aside, an interesting result from that set of experiments was
uniformly that when a person had to attend to two or more
discriminations (even though answering about only one) the total
information capacity for the discriminations was 85% of that available
for one of them, whether the multiple discriminations were in the same
modality or across modalities, which suggests that the controlled
perception involved is at some level where the actual modality has been
transformed into a more abstract representation.

Martin

[Martin Taylor 2009.03.07.00.19]

[From Bill Powers (2009.03.06.1035 MST)]

I find myself completely flabbergasted at your comments on information and channel capacity, to the extent that I have no idea where to begin to continue the discussion. "Flabbergasted" at the truly fundamental misunderstandings expressed is perhaps rather too weak for the astonishment I experienced on reading your message! We clearly don't speak the same language, and for me to comment further would be obviously pointless.

So I won't. I'll just continue to use information theory and channel capacity correctly, and hope that something of its value will come across on occasion.

However, I am glad that the discussion to this point has at least made it clear that not all data acquired by psychologists unaware of PCT is invalid.

Martin

[Martin Taylor 2009.03.07.00.28]

[From Rick Marken (2009.03.04.2230)]

Martin Taylor
(2009.03.04.16.05) –

“Time required to compute a perception of that type” could be translated
as “time required to reach d’ = (say) 2.5”. That is equivalent to “Time
required to acquire X bits of information about the perception”…

Does this tell you we are in agreement, or that we are not? I can’t say.

Well, then, I’m going to have to guess that we’re not in agreement
about channel capacity because, if channel capacity has something to do
with acquiring bits of information about a perception, then it doesn’t
exist in my PCT view of things.

Nor does “green” exist in a colour-blind person’s view of things.

Martin

[From Rick Marken (2009.03.08.1810)]

Martin Taylor (2009.03.07.00.19) –

However, I am glad that the discussion to this point has at least made it clear
that not all data acquired by psychologists unaware of PCT is invalid.

My argument (the one made in my “revolution” paper) is not that the data from psychological experiments is invalid. Data is just data. What is invalid (I argue) is the interpretation of this data. What the discussion so far has made clear (to me, anyway) is that it is possible to look carefully at what was actually done in a conventional experiment and understand it in terms of controlled variables and the effects of disturbances on these variables. So the results of such experiments can be useful, to the extent that they suggest further research using methods that are appropriate to the study of closed loop systems, such as the test for the controlled variable.

I never doubted that the results of conventional research might be useful or suggestive; indeed, I said just that in the last section of my paper (on “How to Have a Revolution”). In that section I said:

The move to closed-loop psychology, when it happens, will be like starting psychology all over again, based on a new foundation: the closed-loop control model of behavioral organization. If, while pursuing the new psychology, we find useful or suggestive results obtained from the old one, so much the better.

My focus, however, is on trying to encourage researchers to drop what they are doing and to start doing research on closed loop systems the right way. My concern is that this move will be inhibited if researchers see too much value in what has already been done using conventional methods. That was actually the whole point of my paper. I know that researchers want to stick with what they know and see value in it. But the control theory revolution won’t happen (I believe) until researchers are willing to start off in a completely new direction, looking only at the data collected using the old methods as, at best, suggestive of what to look at using the new ones.

Best

Rick


Richard S. Marken PhD
rsmarken@gmail.com

[Martin Taylor 2009.03.08.17.53]

referring back to [From Bill Powers (2009.03.06.1035 MST)]

[From Bill Powers (2009.03.07.1144 MST)]

Martin Taylor 2009.03.07.00.19 –

I find myself completely flabbergasted at your comments on information
and channel capacity, to the extent that I have no idea where to begin
to continue the discussion. “Flabbergasted” at the truly fundamental
misunderstandings expressed is perhaps rather too weak for the
astonishment I experienced on reading your message! We clearly don’t
speak the same language, and for me to comment further would be
obviously pointless.

Perhaps instead of thinking about everything that needs to be said, you
could think about the first misunderstanding on my part that needs to
be corrected before anything else can be fixed. If just one fundamental
point could be settled, perhaps the next one would yield more easily.
Obviously, what I said about information theory and channel capacity
makes sense to me, but not to you. If there is a “truly fundamental
misunderstanding,” wouldn’t it be important to clear it up? If it’s so
obvious to you that the error lies completely on my side, you should be
able to see where I went off the tracks and set me straight. I,
obviously, can’t see where the mistake is. Don’t I usually change my
tune if I see I’ve said something wrong?

I haven’t been ignoring you. I simply don’t know how to sort out the
misunderstanding. When I said “We clearly don’t speak the same
language”, I meant a higher-level “language” than just English. Up
until your message, I had thought we were talking much the same
language and just had different understandings of the technical aspects.
But now I see that I was wrong, and I’m completely at a loss to figure
out how to generate a translation mechanism. You use English words in a
way that seems as though they should mean something, but to me they
don’t. Reading your words is like making sense of a surrealist
painting; it makes the head spin. I don’t know if you are making a
mistake or not, in your own language.
For just one example, take “I showed that the same model will yield the
same information content as you measure it even when the actual channel
capacity varies enormously.” This, to me, is like saying “I showed that
you get the same amount of water in the bathtub per minute, whether the
tap is wide open or nearly shut”. It just doesn’t make any sense, but
since it appears to make sense to you, it must be that some of the words
mean something different to you than they mean to me.

For another example: “I just don’t think that ‘Your girlfriend is 18
years old’ contains as much information as ‘Your girlfriend is 17 years
old,’ if by ‘information’ you mean ‘important implications.’” Who, in
natural language or in scientific theory, ever considered “information”
to mean “important implications”? I’ve certainly never heard
“information” used to mean “implications of action” before. But this
seems to be part of your meaning of “information”, even when you are
critiquing my mathematical analyses.

There are many other examples in which the sequences of words you use
just make no sense to me, and so I really don’t know whether you are
self-consistent or not (I think self-consistency is a necessary but not
sufficient condition for avoiding error). I do know that very little in
your message says things I would say about information and its
measurement. But then it seems apparent that by “information” you mean
something utterly different from what I would mean if I used the term
either in casual conversation or technically.

Technically, for example, assuming the Shannon way of using the term (as
reduction of uncertainty), you would be quite wrong to say: “‘John hit
Mary’ has the same number of bits of information as ‘Mary hit John’”,
since the listener’s prior probabilities of the two messages would very
likely be different for the two sentences. Shannon makes it quite clear
that the distribution of probabilities over what you expect is a major
component of the information in a message. If someone had already told
you that John hit Mary, then “John hit Mary” has nearly zero bits of
information about John and Mary (although it may inform you that the
speaker did not know you already knew), whereas “Mary hit John” may have
quite a few, depending on what you believe about Mary’s propensity to
hit back. At least that would be so according to Shannon. To put it in
everyday language, the less surprising the message, the less information
it carries about the matter concerned.
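
To put that last point in numbers (the prior probabilities here are invented purely for illustration), Shannon's information in a message is the negative log of its prior probability to the listener, so the less expected sentence carries more bits:

```python
import math

# Hypothetical listener expectations for the two sentences discussed above.
priors = {"John hit Mary": 0.20, "Mary hit John": 0.01}

for sentence, p in priors.items():
    surprisal = -math.log2(p)   # bits of information if this sentence arrives
    print(f"{sentence!r}: prior {p:.2f} -> {surprisal:.2f} bits")

# 'John hit Mary': prior 0.20 -> 2.32 bits
# 'Mary hit John': prior 0.01 -> 6.64 bits
```
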
One thing you say does coincide with what most writers on information
claim: “Information theory has nothing to do with meaning;”.
I’ve always disagreed with that claim, even though it is generally
accepted. Because the information gained from a particular physical
message depends entirely on what you knew about the subject beforehand,
it seems to me that information theory has everything to do with
meaning. But channel capacity does not. It represents the maximum rate
at which information could be transmitted through that medium,
independent of the circumstances and the prior knowledge of the
receiver.

The central concept of the Layered Protocol Theory of dialogue is the
hierarchy of one’s perceptions of the dialogue partner at many levels
of abstraction, and their relations to the corresponding reference
perceptions of the partner. Misunderstandings occur when there is
uncertainty about the perception of the partner, or more particularly
when the perception of the partner is a bad representation of the
facts. If your perception of the partner doesn’t conform to the facts,
attempts to correct the error are likely to work in an ineffective
direction, and won’t reduce it. If you speak Chinese and I speak
Albanian, our communication won’t improve until either I learn Chinese
or you learn Albanian. Likewise, if we are to have a meaningful
discussion about information, it can’t happen if it is true both that I
have no concept of what you mean by the word, and that you have no
concept of what I mean by it. Judging by your previous message, both of
these are in fact true.

Martin

[Martin Taylor 2009.03.08.11.23]

[From Rick Marken (2009.03.08.1810)]

Martin Taylor (2009.03.07.00.19) –

However, I am glad that the discussion to this point has at least
made it clear

that not all data acquired by psychologists unaware of PCT is
invalid.

My argument (the one made in my “revolution” paper) is not that the
data from psychological experiments is invalid. Data is just data. What
is invalid (I argue) is the interpretation of this data. What the
discussion so far has made clear (to me, anyway) is that it is possible
to look carefully at what was actually done in a conventional
experiment and understand it in terms of controlled variables and the
effects of disturbances on these variables. So the results of such
experiments can be useful, to the extent that they suggest further
research using methods that are appropriate to the study of closed loop
systems, such as the test for the controlled variable.

I never doubted that the results of conventional research might be
useful or suggestive; indeed, I said just that in the last section of
my paper (on “How to Have a Revolution”). In that section I said:

The move to closed-loop psychology, when it
happens, will be like starting psychology all over again, based on a
new foundation: the closed-loop control model of behavioral
organization. If, while pursuing the new psychology, we find useful or
suggestive results obtained from the old one, so much the better.

My focus, however, is on trying to encourage researchers to drop what
they are doing and to start doing research on closed loop systems the
right way. My concern is that this move will be inhibited if
researchers see too much value in what has already been done using
conventional methods. That was actually the whole point of my paper. I
know that researchers want to stick with what they know and see value
in it. But the control theory revolution won’t happen (I believe) until
researchers are willing to start off in a completely new direction,
looking only at the data collected using the old methods as, at best,
suggestive of what to look at using the new ones.

I think I am in complete agreement with what you say here, and I have
said that I thought you had written a good paper.

If I have any problem in this area it is based on our historical
interactions, in which you have often suggested I am too open to
accepting results obtained in conventional manners, and I have said you
dismiss them too readily. I don’t expect that difference of opinion to
change much simply because we agree on what you say above.

Martin

[From Rick Marken (2009.03.08.1040)]

Martin Taylor (2009.03.08.11.23)–

Rick Marken (2009.03.08.1810)–

I never doubted that the results of conventional research might be
useful or suggestive…

My focus, however, is on trying to encourage researchers to drop what
they are doing and to start doing research on closed loop systems the
right way…

I think I am in complete agreement with what you say here, and I have
said that I thought you had written a good paper.

Great. So let’s start talking about how to create some research jewels based on the closed loop model of behavior and set aside (for now) the attempts to see what kinds of silk purses can be made out of the sow’s ear of conventional research based on the open loop model. I’m just worried that if we keep focusing on what can be salvaged from causal research we’ll never get started on control research.

Best

Rick



Richard S. Marken PhD
rsmarken@gmail.com

[Martin Taylor 2009.03.08.14.39]

[From Rick Marken (2009.03.08.1040)]

Martin Taylor
(2009.03.08.11.23)–

Rick Marken (2009.03.08.1810)–

I never doubted that the results of conventional research might be
useful or suggestive…

My focus, however, is on trying to encourage researchers to drop what
they are doing and to start doing research on closed loop systems the
right way…

I think I am in complete agreement with what you say here, and I have
said that I thought you had written a good paper.

Great. So let’s start talking about how to create some research jewels
based on the closed loop model of behavior and set aside (for now) the
attempts to see what kinds of silk purses can be made out of the sow’s
ear of conventional research based on the open loop model. I’m just
worried that if we keep focusing on what can be salvaged from causal
research we’ll never get started on control research.

I’m interested in more than just the search for the controlled
variable. I want to know something about the properties of the control
loops, and in particular the properties of their component pathways.
That’s one reason for continuing to concentrate on the Schouten
experiment. I’ll use conventional research when it seems to have
something useful to say, which I think is the case more often than you
think is the case.

In the opposite direction, PCT ought to be able to say something useful
about most observations of the behaviour of a person, no matter how the
data were collected – by casual observation, conventional experiment,
or experiment based on PCT analysis. The question really is whether any
particular researcher wants to consider particular aspects either of
data or of mechanism. And that’s a question of the particular
researcher’s reference values for controlled perceptions, something
that can be influenced by, but not easily controlled by other people.

Martin

[From Rick Marken (2009.03.08.2225)]

Martin Taylor (2009.03.08.14.39)–

I’m interested in more than just the search for the controlled
variable. I want to know something about the properties of the control
loops, and in particular the properties of their component pathways.
That’s one reason for continuing to concentrate on the Schouten
experiment. I’ll use conventional research when it seems to have
something useful to say, which I think is the case more often than you
think is the case.

OK, but have you ever used control theory based research? Ever done research using control methods like the test for the controlled variable? Might you be finding conventional research (like Schouten’s) to be useful because it’s more comfortable to stick with what is familiar rather than try something new?

In the opposite direction, PCT ought to be able to say something useful
about most observations of the behaviour of a person, no matter how the
data were collected – by casual observation, conventional experiment,
or experiment based on PCT analysis.

Sure And I think that was shown, for example, with the Schouten experiment where it became clear (to me, anyway) that the behavior in that experiment could be explained as the control of a relationship perception (light/press) and that the quality of control was determined by the timing of the elements of the perception, which affects the ability to perceive the variable under control. But just “saying things” like this about this kind of data is not the same as actually doing the kind of research that could tell you what is really going on – what variables are under control and how they are being controlled.

The question really is whether any
particular researcher wants to consider particular aspects either of
data or of mechanism.

It seems to me that the question is whether any particular researcher wants to take seriously the possibility that organisms – the subjects of their research – are organized as closed loop control systems. A researcher who does take it seriously will (I believe) then have the courage to try something new and concentrate on doing research using control theory based methods.

Best

Rick



Richard S. Marken PhD
rsmarken@gmail.com

[Martin Taylor 2009.03.14.16.33]

[From Rick Marken (2009.03.08.2225)]

Martin Taylor
(2009.03.08.14.39)–

I’m interested in more than just the search for the controlled
variable. I want to know something about the properties of the control
loops, and in particular the properties of their component pathways.
That’s one reason for continuing to concentrate on the Schouten
experiment. I’ll use conventional research when it seems to have
something useful to say, which I think is the case more often than you
think is the case.

OK, but have you ever used control theory based research? Ever done
research using control methods like the test for the controlled
variable? Might you be finding conventional research (like Schouten’s)
to be useful because it’s more comfortable to stick with what is
familiar rather than try something new?

I thought about your questions and wondered whether you might be right.
So I went back through my research history, and what I found made me
think you probably are not right.

Item: My thesis predicted a well-defined pattern of errors when
locating a dot on an index card or in reproducing a simple drawn
figure. It was based on the notion that one’s perception of a distance
differed when one was controlling one end of the distance as compared
to when one is presented with both ends, and following this up I did a
fair amount of magnitude matching based on the idea that control made a
difference. In fact, my first presentation to a scientific conference
was based on this, with S.S.Stevens in the audience. Since I was
essentially arguing that his methods of magnitude estimation were
misguided because he didn’t take into account the effects of control,
he asked a lot of questions, and in fact I had tea with him and his
wife afterwards to try to get some of the concepts reconciled. I don’t
think we did, but he was a nice person to talk with.

Item: When I was studying aftereffects, one technique I used was to
allow the subject to control to oppose the after effect – making the
perception be “normal” (i.e. keep a display perceptually stationary
after observing a moving one).

Item: Most of my research was on discrimination under different
conditions, studying the limits on people’s capability, using
methodology that our recent discussion has shown to be valid in a PCT
framework.

Item: My Layered Protocol Theory of dialogue was used in analyzing both
natural dialogues and the design of computer interfaces, and was found
to be a special case of PCT.

In addition to these, you know I have done specifically PCT research to
look at what aspects of control are affected by sleep loss.

So, though your questions may be quite reasonable to ask, I think their
implications are misguided, even though I cannot, for obvious reasons,
evaluate them by introspection. I have never had any interest in “the
test for the controlled variable” because that was never the question
at issue for me (and still has not been). One always assumed that the
subject was trying to control what the experimenter asked her to
control. In everyday life, at any one moment we are usually controlling
many low-level variables, and are changing whatever high-level
variables are of momentary interest. Testing for the controlled
variable is of interest when one wants to figure out, as you have,
whether a perception of “size” refers to area, perimeter, or some other
function of the shape, or what it is that a fielder controls when
trying to catch a ball, and it is very important when one is interested
in clinical psychology. I’m more interested in “how” than in “what”. I
hope that’s permissible?

I do, however, continue to believe that a lot of “conventional”
research has provided data that is and continues to be useful in a PCT
framework. In [Martin Taylor 2009.02.17.11.23] I presented one
situation where it can be valid (on which nobody has commented), and
that is a situation that I think covers quite a lot of “conventional”
research, including most of my own.

Martin

[Martin Taylor 2009.03.14.17.04]

[From Bill Powers (2009.03.09.0759 MST)]

Martin Taylor 2009.03.08.17.53 –

For just one example, take “I showed that the same model will yield the
same information content as you measure it even when the actual channel
capacity varies enormously.” This, to me, is like saying “I showed that
you get the same amount of water in the bathtub per minute, whether the
tap is wide open or nearly shut”.

What I thought I had shown was that the measurement of perceptual
information in the first fraction of a second after the light goes on
does not treat the perceptual signal as a continuing analog measure.

I could see no connection between what you discussed and the rate of
information transfer.

I mentioned that if this same light perception were used during
continuous control of light intensity, its bandwidth, and hence its
formal maximum rate of information transfer,

The rate of information transfer depends not only on bandwidth but also
on SNR.

would depend on the long-term rise time which could be many times
slower than the initial increase in signal-to-noise ratio. Ninety-five
percent of the uncertainty is removed in the first 50 milliseconds
after the initial rise,

You have no measure of that. From the data, there is nothing from which
you could estimate what percentage of the uncertainty is removed in 50
msec.

What you’re saying, it seems to me, is that all
you have to do is observe the water level in the tub for the first 10
seconds, and from that you can infer how full it is going to
get.

You have no idea how full it is going to get, when you are measuring
just the flow from the tap. The channel capacity is that flow rate –
or rather, the flow rate provides a lower bound on the channel
capacity, and is a measure of the effective channel capacity.

This is actually taking us back to the discussions of “subjective
probability.” If probability is something that has existence outside
a person, which is what we tacitly assume when we calculate it from
observations of external events,

I’ve tried to point out that this tacit assumption should be dropped,
have I not?

then it can’t also be subjective and different for each person. The
same holds for information – if the maximum information capacity of a
channel can be calculated by a formula involving bandwidth and
signal-to-noise ratio, it can’t also vary from person to person
depending on what surprises that person.

Why not? The channel capacity is the maximum that could be transmitted
over the channel. If the recipient knew beforehand what would be sent,
no information would be transmitted over the channel, but that would
not affect the channel capacity. If, in the Schouten experiment, the
subject was warned beforehand which light would go on, then he could
press the correct button no matter what the delay. As the experiment
was actually run, the subject had only a 50-50 chance of hitting the
right button if the delay was less than about 200 msec, so clearly that
information was not available, and all the information relevant to the
choice of button was passed through the channel in question.
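
A small illustration of this point (my own numbers, not from the experiment): with two equally likely lights the choice carries one bit, and the part of that bit actually transmitted when the subject is correct with probability p is the binary-symmetric-channel value 1 − H(p):

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary outcome with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

# Information about the 1-bit light/button choice as accuracy improves.
for p_correct in (0.5, 0.75, 0.9, 0.99):
    bits = 1.0 - binary_entropy(p_correct)
    print(f"P(correct) = {p_correct}: ~{bits:.3f} bits of the choice transmitted")

# 0.5 -> 0.000 bits (pure guessing, as before ~200 ms); 0.99 -> ~0.919 bits
```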

We must be talking about two different things. If the same
maximum-information-capacity message can convey knowledge to one person
and nothing to another person, then clearly we have to look inside the
person to see what is really happening – the message is only part of the
story.

OK. You are getting the idea!

The idea that the message is carrying something from one person to
another, other than photons or sound waves, is simply a
misunderstanding, like the “looking rays” that came out of people’s eyes
in mediaeval drawings.

Oh, it’s carrying something, all right, but what it is carrying depends
on the recipient as much as on the sender.

I think that’s really the most basic of my problems with the idea of
information transfer. I don’t think there is any information
“in” a message. All the information that seems to be
transmitted is really coming from inside the person receiving the
message.

Yes, you do seem to be getting it. That’s the essence of LPT. No third
party can possibly tell for sure what messages are passing between two
parties in a dialogue.

One thing you say does coincide with what most writers on information
claim: “Information theory has nothing to do with meaning;”. I’ve always
disagreed with that claim, even though it is generally accepted. Because
the information gained from a particular physical message depends
entirely on what you knew about the subject beforehand, it seems to me
that information theory has everything to do with meaning. But channel
capacity does not. It represents the maximum rate at which information
could be transmitted through that medium, independent of the
circumstances and the prior knowledge of the receiver.

But the same channel capacity would be computed even though what is
transmitted means something to one person and nothing to another. How
can
you say that channel capacity has something to do with information? To
me, that is a self-contradiction.

Think of the bathtub again, but have it filled by a hose rather than a
faucet. I can measure the rate of flow out of the hose nozzle, but that
only limits how much of the water gets into the bathtub. If the hose is
pointed in the other direction, the bathtub stays empty; or if the
bathtub is already overflowing, the level doesn’t change even when all
the water from the hose is aimed at the tub. The hose flow is channel
capacity; the change in bathtub level is information received.

The central concept of the Layered Protocol Theory of dialogue is the
hierarchy of one’s perceptions of the dialogue partner at many levels of
abstraction, and their relations to the corresponding reference
perceptions of the partner. Misunderstandings occur when there is
uncertainty about the perception of the partner, or more particularly
when the perception of the partner is a bad representation of the facts.

While I agree in general with LPT, I have reservations about some
aspects of it. It seems to me that you have too much faith in the
ability of one person to grasp, simply through informal interactions,
what another person is perceiving.

That’s why LPT is based on feedback loops in both parties. The sender
controls for seeing aspects of the recipient that would suggest that the
recipient has “got the message” (done the sender’s bidding, understood
what the sender said,…), while the recipient controls for perceiving
that the sender believes the recipient to have “got the message”. Both
act to influence the other’s perceptions until both arrive at
sufficiently low values of the error signal. Unlike a lecture or a
written book, a dialogue is the interplay of control systems.

I don’t think it’s that easy or that we do it that well. It’s not that
we’re “uncertain” about the perception of the partner – we’re quite
certain, but about the wrong thing. We don’t, or I should say I don’t
since I may be a freak, look over the field of all the things the
partner might mean and try to fill in the blanks. We do a certain amount
of that, but mostly we just assume that the meaning that springs
instantly to mind when the partner speaks is the meaning the partner
intended. The problem is not too much uncertainty; it’s too little.

When you are talking with someone, do you never ask the equivalent of
“how’s that again?” I have observed, of course, that in our own
interactions you are much more liable to assume I have meant something
quite other than what I did mean than you are to ask, so I can well
believe that you do as you say. But different people have different
biases in this respect.

I went so far as to find Schouten’s paper in the Library (after which I
found it on my own shelf – go figure), and I note that he did some
preliminary “free reaction” (as fast as you can) runs. He found great
differences among the subjects in the error rate (between 0.4% and 4.0%
on average between his subjects A and B) and the speed of the responses
(average 335 and 290 msec respectively), but when he plotted the error
rate against actual reaction time, they seemed to fall near a single
curve (it’s hard to see precisely because he plotted on a linear
scale). Some people are happy making a reasonable guess, while others
want to be more certain before they commit.

If your perception of the partner doesn’t conform to the facts,
attempts to correct the error are likely to work in an ineffective
direction, and won’t reduce it. If you speak Chinese and I speak
Albanian, our communication won’t improve until either I learn Chinese
or you learn Albanian.

But even then, the word “family” will probably not be given the same
meanings by the two parties. My point is that they will give meanings to
the word they perceive, but the meanings will come from their own
memories, not from the word or the other person’s memories. I repeat,
this is not a matter of decreasing uncertainty; it is being certain and
at the same time mistaken.

I thought about this, and I agree that in some cases one will
mistakenly believe one has a correct understanding, but I don’t think
that is the usual situation. More commonly, I think, one either has a
correct understanding or one remains uncertain and asks for
clarification.

When one (as recipient) believes wrongly that one understands, the
result is likely to be error in some related controlled perception. The
partner (the sender) is most likely to be the one whose controlled
perception experiences increased error, as when you close the door
after I ask you to close the window and you signal that you have
understood.

The result of this is likely to be more actions on the part of the
sender to reduce the error in the controlled variable (door open,
window closed). Or else actions to close off the conversation, such as
by getting up to open the door and close the window.

[Attached image: Schouten_accuracy-8.jpg – Schouten’s accuracy diagram referred to below]


On the Schouten experiment, I now have more details to answer some past
questions: The bips were 20 msec long, separated by 75 msec. Schouten
measured the actual response time and used that in plotting the data,
not the requested response time. He noted that for the shorter
requested response times the subjects tended to delay their actual
responses.

In [From Bill Powers (2009.03.09.0215 MST)] you say:

------------quote---------

Perhaps if I boil my proposition down to its minimum form something
useful may yet emerge from it.

Let’s say that in this channel there is a signal that begins to rise
about 200 milliseconds after a light turns on, with an amplitude A
described by

A = Ao(1 - e^-k(t - To)) for t >= To and To = 200 msec.

This signal represents the difference in perceived intensity between
two lights.

Let the net difference signal s be the sum of A and a zero-average
random variable R with a Gaussian distribution:

s = A + R.

Now we can ask, what is the probability that s will be >= zero,
indicating (arbitrarily) that the right-hand light is lit when it is
actually lit? It is simply the probability that A will be greater than
-R. I am skipping over scaling factors and the necessary summing of
effects in two directions.

The Schouten experiment arranges for the perceived brightness
difference between the two lights to be sampled at various times t
after To, which means sampling it at various values of A[t]. The
fraction of correct guesses to total guesses for a given time delay t
will depend on the probability that A[t] > -R, which can be
calculated from formulae or tables. If A[t] is plotted in units of
standard deviations of R, and the distribution of R is Gaussian, the
result will be a straight line during the linear rise portion of the
perceptual signal.

Obviously a number of details need to be cleaned up, but that is the
structure of the model I proposed for the subject in the Schouten
experiment.

----------end quote-----------

This is exactly how Schouten saw it, and I have and had no reason to
disagree. It’s where I started. Here’s one of his diagrams to
illustrate the point:

It was my knowledge that d’^2 was a measure of the information
available from a discrimination of such accuracy that allowed me to
plot the information gain over time by subject B and by the “average
subject” for the figure I showed in several earlier messages. There’s
no conflict between his model (which is also yours) and my calculation
of information gain rate. I don’t remember whether Schouten gave me his
data to make my calculations – I suspect he did, because the meeting
was close to his laboratory – or whether I read off his graphs (which
would have been rather hard to do accurately). Bandwidth doesn’t come
into it explicitly; as I said in an earlier message, channel capacity
is a generalized version of bandwidth, which includes sampling accuracy
in addition to bandwidth.
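
As a minimal numerical sketch of the model quoted above (the values of Ao, k and the noise standard deviation are invented; this only illustrates the shape the model produces, not a fit to Schouten's data):

```python
import math
from scipy.stats import norm

Ao, k, To, sigma = 1.0, 1.0 / 50.0, 200.0, 0.25   # invented parameters (To in ms)

def p_correct(t_ms: float) -> float:
    """Probability that s = A(t) + R lands on the correct side of zero."""
    A = 0.0 if t_ms < To else Ao * (1.0 - math.exp(-k * (t_ms - To)))
    return norm.cdf(A / sigma)   # P(R > -A) for zero-mean Gaussian noise R

for t in (150, 200, 225, 250, 300, 400):
    print(f"t = {t} ms: fraction correct ~ {p_correct(t):.3f}")

# Stays at 0.5 (pure guessing) until To, then rises toward 1.0 as A(t) grows.
```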

The reason I did the recomputation in terms of d’^2 was that it is
impossible to measure differences in discriminability when d’ gets much
above 2.5 or 3. At the time I was working on a theory that perceived
magnitude (S.S.Stevens, if you like) was proportional to the
discriminability of the difference between the ends of an interval. In
my thesis work it would have been the difference between the
discriminability of a one inch distance between two dots when the
subject was or was not controlling the position of one of the dots.

In my first published paper I predicted the existence and parametric
form of an illusion that I had never seen mentioned, simply on the
basis that a path perceived to be straight would be a geodesic in a
space in which the discriminability of different possible paths was
affected by the proximity of a nearby disk. Given that, the properties
of partial differential equations allowed for the computation of the
parametric form of the illusion (a straight line as a function of disk
diameter, which was fitted very closely by the data). I also made a
correct counter-intuitive prediction that subjects who were more
precise would show more illusion. So I was interested in measuring the
differences in discriminabilities at levels far, far, higher than can
be measured by conventional means. The rate of information gain from
the discrimination seemed to be a reasonable way of differentiating
such big differences.
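
A rough numerical note on why differences in d' much above 2.5 or 3 are hard to measure directly (a simple Φ(d') convention is assumed here just to show the scale of the problem; the numbers are not from the experiments described):

```python
from scipy.stats import norm

for d_prime in (1.0, 2.0, 2.5, 3.0, 4.0):
    p_error = norm.sf(d_prime)   # upper Gaussian tail under the assumed convention
    print(f"d' = {d_prime}: error rate ~ {p_error:.5f} "
          f"(~1 error per {1 / p_error:,.0f} trials)")

# Beyond d' of about 3 the error rates are a small fraction of a percent, so
# telling two such conditions apart needs impractically many trials.
```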

To Rick: As for what the subjects were controlling, I asked them to say
whether a straight line between the dots would go through the disk or
bypass it. I assume they were controlling for telling me what they saw,
but beyond that, I can’t guess. But I don’t think that makes the data
any the less reliable.

Martin

[Martin Taylor 2009.03.14.23.17]

[From Bill Powers (2009.03.14.1703 MST)]

Martin Taylor 2009.03.14.17.04

What I thought I had shown was that the measurement of perceptual information in the first fraction of a second after the light goes on does not treat the perceptual signal as a continuing analog measure.

I could see no connection between what you discussed and the rate of information transfer.

I know you can't, but I see one. We're still not talking about the same thing.

From the rest of your message, we most decidedly are not. I'm trying to discuss Shannon's information and uncertainty measures, and you seem determined that this should not be discussed, by substituting your own definitions for Shannon's, and then making inferences based on your definitions rather than his, which makes discussion rather difficult. You have a habit of talking about very small things as though they are zero, which makes discussion rather difficult. You talk about signal to noise ratio as though it changes during the presentation of an ongoing signal, and that also makes discussion difficult. You use your own definition of "uncertainty" to dispute the calculations based on the technical definition, and that makes discussion difficult. And from all that you then derive the preposterous statement

If control depended on information flow, it would not be possible to do the task I describe, because that flow would drop rapidly to zero long before the match within some criterion is achieved (as I understand you to be measuring information flow).

As you misunderstand me to be measuring information flow.

What the Schouten experiment does is to examine the probability of perceiving the correct light at a time when the perceived difference in brightness of the two lights is comparable to the noise level in perceiving the brightness relationship. That noise level can be very small, however, in comparison with the final perceived brightness difference. If we look at relationship signals a hundred or more milliseconds further delayed, there is essentially no noise in them.

That the integrated noise level becomes very small in comparison with the final perceived brightness difference is the very reason that I wanted to calculate the information rate rather than the uncomputable residual uncertainty at long delay, so I agree with the second sentence. I explained that in the message to which you are responding, and gave examples in which the discrimination was of location separation between dots over separations of inches seen at normal reading distance. To equate "essentially no noise" with "no noise" is a real sticking point. You assert "essentially no" is equal to zero, and then say:

No further information is being transmitted,

which does follow from your equation of "very small" with "zero", but does not follow from the facts of the situation.

Ninety-five percent of the uncertainty is removed in the first 50 milliseconds after the initial rise,

You have no measure of that. From the data, there is nothing from which you could estimate what percentage of the uncertainty is removed in 50 msec.

I just showed you that measure.

You did no such thing. So far as I can see, you seem to take "proportion of uncertainty" to be the proportion of the range between 50% and 100% correct, so that by the time the subject is getting 75%, half the uncertainty has been removed. Except that can't be it, because you say 95% is removed in 50 msec, when by eye it looks to me as though the percentage correct is around 80-85%, so I am quite uncertain as to what you must mean. Either way, it's Humpty-Dumpty language in a discussion based around Shannon uncertainty measures. If I "define" a step to be roughly a quarter million miles, I can get to the moon in one step. But that doesn't much help NASA.

You get it by looking at the measured fraction of correct guesses for a given delay in making the guess. That can be taken as the probability of making a correct guess, and given that probability you can use the statistical tables to find out how many standard deviations above the noise level the relationship signal is at that delay. You don't know the actual signal level or noise level, but you know their ratio. This assumes a
Gaussian distribution, but if you assume a different distribution you can do a similar calculation using it, or you can simply measure the distribution and use that.
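
In Python, that table lookup might look like this small sketch (the
fractions correct are invented examples; SciPy assumed):

    from scipy.stats import norm

    for fraction_correct in (0.55, 0.75, 0.90, 0.99):
        z = norm.ppf(fraction_correct)  # inverse Gaussian: SDs above the noise
        print(f"{fraction_correct:.2f} correct -> signal about {z:.2f} noise SDs")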

This is why I asked earlier if the basic data taken during this experiment is the fraction of correct guesses at each delay. If it is, then you have the probability of a correct guess right away as a function of delay. Given that probability and a Gaussian distribution, you can determine the relative standard deviation of the noise and the signal-to-noise ratio. The linear relationship between delay and d'^2 (which you said was the standard deviation) is simply what you would expect if the distribution were Gaussian.

You can't derive the linear relation over time from an assumption that the noise distribution is Gaussian. The linearity of the relation comes about from the very particular way that the standard deviation of the distribution (if it is Gaussian) changes over time -- the inverse of its square varies linearly with time. That "particular way" can be derived from an assumption that the noise contributions on successive samples are of equal variance, the sample values are independent of each other, and all samples contribute equally to the reduction in the overall variance (where the separation of samples is 1/2W, and W is the bandwidth), whether the distribution is Gaussian or not. That "sampling" notion is an intellectual convenience, not a proposal that the sensory systems actually do sample. Shannon shows how this intellectual convenience applies to continuous signals.
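
A quick simulation of that point, using deliberately non-Gaussian
(uniform) noise and arbitrary numbers (NumPy assumed): averaging n
independent equal-variance samples makes 1/variance, and hence a
d'^2-like measure, grow linearly with n.

    import numpy as np

    rng = np.random.default_rng(0)
    trials = 200000
    single_sample_var = 1.0 / 3.0  # variance of uniform(-1, 1) noise
    for n_samples in (1, 2, 4, 8, 16):
        noise = rng.uniform(-1.0, 1.0, size=(trials, n_samples))
        inv_var = 1.0 / noise.mean(axis=1).var()  # 1/variance of the running mean
        expected = n_samples / single_sample_var
        print(f"n = {n_samples:2d}  1/var(mean) = {inv_var:6.1f}  expected = {expected:6.1f}")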

I suspect that d'^2 is calculated from the fraction of correct guesses in a way that uses the probability tables the way I am doing.

Of course it is. I told you it was. And it keeps growing as the proportion correct goes from 90% to 99% to 99.9% to 99.9999% .... You have in these data absolutely zero indication of what proportion of the uncertainty is removed in 50 msec, or in 100 msec, because you have absolutely nothing in the data to tell you what the eventual limiting uncertainty might be. The only thing these data show is that the rate of information gain is constant over the range of delays measured, after the initial transport lag (and gives a measure of what that rate is).

Of course, we presume that there would be a time after which there would be no point in continuing to gain information, because for control one usually doesn't need micrometric precision, but we have no indication of it in these data. I suggested in an earlier message that this point might come after about 130 msec, based on a fallible memory of long ago experiments. In the case of a binary choice experiment, for all practical purposes there wouldn't be much point in waiting until you were sure of making less than one error in a billion trials -- indeed, Schouten mentions that a good part of the training with the bips was needed in order to get subjects to withhold their responses for long delays (he went up to 800 msec, but the results for the long delays aren't reported because there were too few errors to allow a reliable (or any) calculation of d'). People just don't want to wait that long to make their decision.

Martin

[Martin Taylor 2009.03.15.10.02]

[Martin
Taylor 2009.03.14.23.17]

[From Bill Powers (2009.03.14.1703 MST)]

Martin Taylor 2009.03.14.17.04

What I thought I had shown was that the measurement of perceptual
information in the first fraction of a second after the light goes on
does not treat the perceptual signal as a continuing analog measure.

I could see no connection between what you discussed and the rate of
information transfer.

I know you can’t, but I see one. We’re still not talking about the same
thing.

From the rest of your message, we most decidedly are not.

I think I have to apologise to you. I’ve been blind to what should have
been apparent to me about where you have been coming from. This morning
I awoke with a possible insight as to why we seem to be talking past
one another. It has to do with information always being about
something. We have been thinking about different “somethings”.

I’ve been working on the assumption that you had read the relevant
portion of my Bayesian notes, and without thinking further about it,
I’ve been assuming that you understood that the information being
transmitted was about the lights. That was always my background
assumption throughout the discussion, since I have been assuming that
you had read the relevant part of my Bayesian Seminar 2, as I asked
some time ago. But now, this morning, I think maybe you have been
thinking of the information transmitted only about the location of the
lit light, a one-bit discrimination. That would fit your talk about
“half the information”, a concept that baffled me in the context of the
information being about the lights rather than only about the location
of the lit light.

Here’s a graph of the uncertainty of a one-bit choice as a function of
the prior probability of one of the choices being correct
(uncertainty = -p*log2(p) - (1-p)*log2(1-p)). To make this concrete, let’s say
that if the subject already had a probability x of choosing the lit
light, her remaining uncertainty would be y bits. Flip the chart upside
down to get the information already received. The right panel is just a
magnification of the high-probability part above 95% correct.

When the subject gets 89% correct, the remaining uncertainty is 0.5
bits, and the subject has gained half the information that could be
eventually available about the location of the lit light. When the
subject has obtained 95% of the available information about the
location of the light, she can get about 99.45% correct.
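
Those two figures can be checked directly from the binary-entropy
formula; a minimal sketch:

    import math

    def binary_entropy(p):
        # uncertainty, in bits, of a two-way choice with probability p
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    print(binary_entropy(0.89))    # about 0.50 bits still missing at 89% correct
    print(binary_entropy(0.9945))  # about 0.05 bits missing at 99.45% correct,
                                   # i.e. roughly 95% of the one bit received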

I think this may be what you have been thinking of, while I have been
thinking of the steady availability of information about the lights,
which is about much more than just the choice of which of the two is
lit.

Does this seem right? Are our views reconciled, or at least nearer to
being so?

Martin

[From Bill Powers (2009.03.15.0720 MDT)] –

Martin Taylor 2009.03.14.23.17 –

I’m trying to discuss Shannon’s
information and uncertainty measures, and you seem determined that this
should not be discussed, by substituting your own definitions for
Shannon’s, and then making inferences based on your definitions rather
than his, which makes discussion rather difficult.

I’m not intentionally changing Shannon’s definitions. We are simply
imagining the physical situation differently – using different models of
what may be going on, prior to any application of information
theory.
I am seeing a perceptual input function that receives inputs, via
intervening lower-level input functions, from two light sources that turn
on and off. It reports, as a signal, the unbalance between the lower
signals. This higher PIF contains a constant Gaussian noise source of
unspecified but unchanging magnitude that is always present. The
noise does not vary with magnitudes of the input signals. It’s the
background hiss in the receiver.

As I imagine this system, the presence of a constant noise signal means
that with no light on, sampling the relationship signal at any fixed time
will generate a signal equally likely to indicate “left” or
“right”. With continued repetitions, the proportion of correct
guesses to total guesses will approach 50%. As the samples are delayed
more and more after a light turns on, this long-term average will remain
50% until the sample delay equals the transport lag.

When the delay exceeds the transport lag, the noise-free component of the
left-right signal begins to rise in the correct direction (separate left
and right signals would be needed in a nervous system). For small added
sample delays the noise component still dominates. As the sample delay
increases more, the systematic component increases, and we observe (over
many repetitions at the same sample delay) a bias in the guesses toward
the correct direction that increases with sample delay. The probability
of a correct guess begins to increase. At a sampling delay such that the
systematic signal is twice the standard deviation of the (constant)
noise signal, the long-term fraction of correct guesses has become
95%.

In the Schouten data, we can see that the transport lag for subject B is
about 230 milliseconds. That is where the fraction of correct guesses
begins to rise above 50%. Had the data been plotted as percent correct
guesses versus sampling delay, we would have seen a negatively accelerated
curve that approaches asymptote very soon after the initial rise above
50% – it would be 99% in less than 50 milliseconds after the transport
lag. Actually, the original data would have been the fraction of correct
guesses at each delay, with the value of d’ or d’^2 being calculated from
that.

A different assumption would be that the noise level rather than being
constant is Poisson-distributed, proportional to the square root of
signal level. In that case the signal-to-noise ratio would increase as
the square root of the signal rather than linearly as in the above, and
the approach to asymptote would be slower. So there is some possibility
of deducing the distribution from knowledge of the probability of a
correct guess.
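
To see how the two assumptions would separate, here is a small
numerical sketch (arbitrary signal units, SciPy assumed):

    import numpy as np
    from scipy.stats import norm

    signal = np.linspace(0.0, 9.0, 10)              # systematic signal level
    p_constant = norm.cdf(signal / 1.0)             # noise SD fixed at 1
    p_poisson_like = norm.cdf(np.sqrt(signal))      # noise SD = sqrt(signal)

    for a, p1, p2 in zip(signal, p_constant, p_poisson_like):
        print(f"signal = {a:3.1f}  constant noise: {p1:.4f}  Poisson-like: {p2:.4f}")

With constant noise the fraction correct climbs toward its asymptote
much faster than with signal-dependent noise.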

As you can see, this is a purely physical analysis based on a model, and
says nothing about information. You can use the Shannon formulation if
you like, but it’s not necessary for characterizing the performance of
this model.

You have a habit of
talking about very small things as though they are zero, which makes
discussion rather difficult.

Funny, I was noticing a similar tendency in you, though in the opposite
direction. It seems that when differences get smaller, you crank up the
gain so they look just as large as before, forgetting that their
importance is getting smaller each time you increase the magnification.
It’s true that 99.99% correct is not the same as 100% correct, but the
difference is not the kind that makes a difference to any normal
person.

You talk about signal to
noise ratio as though it changes during the presentation of an ongoing
signal, and that also makes discussion difficult.

What makes it difficult is that I don’t think you have understood my
model yet. I assume a constant amount of background noise, with the
signal changing magnitude relative to the noise. I see the signal as
increasing with time after the initial transport lag, while the noise
level remains the same. This means that there is less relative random
error in a large signal than in a small one, and I treat errors of less
than a few tenths of a percent as negligible, as any good engineer
would.

You use your own
definition of “uncertainty” to dispute the calculations based
on the technical definition, and that makes discussion difficult. And
from all that you then derive the preposterous statement

If control depended on information flow, it would not be possible to do
the task I describe, because that flow would drop rapidly to zero long
before the match within some criterion is achieved (as I understand you
to be measuring information flow).

What you haven’t seen here is that I am assuming some small constant RMS
noise which matters only when the systematic component of signal is of
comparable size, a few milliseconds after the transport lag is finished.
As the systematic part of the signal increases after that point,
snapshots of the total signal at the same time delay will record more
signal but the same noise, so signal-to-noise ratio will increase as we
look at points farther to the right. It also increases during each trial,
because the noise is constant but the signal is increasing as time
progresses.

I just showed you that measure.

You did no such thing.

Well I did so, and nyaa nyaa to you.

So far as I can see, you seem to
take “proportion of uncertainty” to be the proportion of the
range between 50% and 100% correct, so that by the time the subject is
getting 75%, half the uncertainty has been removed.

No. I take the uncertainty to be 10% when 90% of the guesses are right,
1% when 99% of the guesses are right, and so on.

Except that can’t be it,
because you say 95% is removed in 50 msec, when by eye it looks to me as
though the percentage correct is around 80-85%,

You’re not reading the figure right, or I have it scaled wrong. The
ordinate is not percentage correct, it’s that measure d’^2, which you say
is the standard deviation of the noise. In 50 milliseconds, d’^2
increases by about 3 standard deviations, showing that the signal
increases to 3 SD above the chance level, which, my statistical tables
say, means that the fraction correct must be around 99.73%. By my count
that leaves an uncertainty of only about 0.27%. Maybe I misunderstood
what you said about the scaling of the ordinate. In that case, you can
substitute the correct numbers. They won’t be drastically different from
mine, since the probability of an error drops so fast with each added
standard deviation.

You can’t derive the linear
relation over time from an assumption that the noise distribution is
Gaussian.

I’m assuming a perceptual signal that rises linearly after the initial
transport lag, not deriving it. That signal will lead to a straight line
on the Schouten diagram if the distribution is Gaussian.

The linearity of the relation
comes about from the very particular way that the standard deviation of
the distribution (if it is Gaussian) changes over time – the inverse of
its square varies linearly with time. That “particular way” can
be derived from an assumption that the noise contributions on successive
samples are of equal variance, the sample values are independent of each
other, and all samples contribute equally to the reduction in the overall
variance (where the separation of samples is 1/2W, and W is the
bandwidth), whether the distribution is Gaussian or not. That
“sampling” notion is an intellectual convenience, not a
proposal that the sensory systems actually do sample.

I think you’re just describing the same relationships in my statistical
tables that I use, which show the probable occurrence of a deviation as a
function of the ratio of that deviation to a standard deviation. In the
Schouten experiment, the probable occurrence is measured, and from that
you deduce how many standard deviations above noise that is, assuming a
Gaussian distribution. I am taking the ordinate in the diagram as the
computed standard deviation, and using the tables the other way to
compute what the percent correct over a large number of trials must have
been.

My hypothetical perceptual signal has a certain magnitude as a function
of time: constant at zero, then rising linearly at least at first. If you
measure its amplitude in units of standard deviation of the noise, which
is just a scalar number, it would lie on top of the Schouten data (or it
would show up as a fuzzy cloud of curves with an average value
corresponding to the Schouten data).

I suspect that d’^2 is
calculated from the fraction of correct guesses in a way that uses the
probability tables the way I am doing.

Of course it is. I told you it was. And it keeps growing as the
proportion correct goes from 90% to 99% to 99.9% to 99.9999% … You
have in these data absolutely zero indication of what proportion of the
uncertainty is removed in 50 msec, or in 100 msec, because you have
absolutely nothing in the data to tell you what the eventual limiting
uncertainty might be.

Well, that one really escapes me. In your series, it looks to me as if
the uncertainty is decreasing from 10% to 1% to 0.1% to 0.0001% …, and
it looks as if the eventual limiting uncertainty is going to be, well,
just roughly speaking, zero.

The only thing these data show
is that the rate of information gain is constant over the range of delays
measured, after the initial transport lag (and gives a measure of what
that rate is).

I think that makes no sense at all. At a constant rate of information
gain, you’re saying, the rate at which we approach the final answer gets
slower and slower without limit. I just can’t see that in going from 90%
right to 99% right, we are gaining the same amount of information that we
gain in going from 99% right to 99.9% right. If that is what information
theory says, then the word “information” in that context means
something different from what it means to me.

Of course, we presume that there
would be a time after which there would be no point in continuing to gain
information, because for control one usually doesn’t need micrometric
precision, but we have no indication of it in these
data.

What you’re saying is that as we gain in precision, information becomes
rapidly less useful in moving us toward the exact answer. I suspect that
you’re mentally viewing this situation on a logarithmic scale, which
makes the shrinking uncertainty into a constant ratio. In that sort of
presentation you can never get to zero error, and it always looks as if
you still have as much uncertainty as you had before. Well, come to think
of it, information is defined as a log function, isn’t it? That may
explain your inability ever to admit that some differences are too small
to matter.

I suggested in an earlier
message that this point might come after about 130 msec, based on a
fallible memory of long ago experiments. In the case of a binary
choice experiment, for all practical purposes there wouldn’t be much
point in waiting until you were sure of making less than one error in a
billion trials – indeed, Schouten mentions that a good part of the
training with the bips was needed in order to get subjects to withhold
their responses for long delays (he went up to 800 msec, but the results
for the long delays aren’t reported because there were too few errors to
allow a reliable (or any) calculation of d’). People just don’t want to
wait that long to make their decision.

OK, so my understanding is confirmed: the experimental data are in units
of fraction of correct responses (or fraction of errors). If you showed
plots with that as the ordinate, you would get a curve approaching an
asymptote (100%), not a straight line. The “discriminability”
measure introduces the Gaussian distribution which has a form such that
one unit of discriminability is proportional to one unit of signal
amplitude relative to noise level. My model says that there is a
perceptual signal that increases linearly with time after the transport
delay, and that with a constant noise level this means that the signal
increases a certain number of standard deviations above the noise, at the
rate of about 1 standard deviation in 17 milliseconds. By requiring
subjects to report the unbalance of light intensities at specific delays,
the experiment effectively records the signal magnitude at specific
delays after onset of the light. The signal magnitude is deduced from the
percent correct guesses, plus the assumed distribution.
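
As a sketch of that deduction (the delays and fractions correct below
are invented, not Schouten's data; NumPy and SciPy assumed), each
fraction correct is converted to a signal level in noise-SD units and a
straight line is fitted to estimate the rise rate:

    import numpy as np
    from scipy.stats import norm

    delays_ms = np.array([240.0, 260.0, 280.0, 300.0])      # invented delays
    fraction_correct = np.array([0.72, 0.88, 0.96, 0.99])   # invented data

    signal_in_sd = norm.ppf(fraction_correct)   # SDs above the noise level
    slope, intercept = np.polyfit(delays_ms, signal_in_sd, 1)
    print(f"fitted rise of about {slope:.3f} SD per ms, "
          f"i.e. 1 SD per {1.0/slope:.0f} ms")

With real data the fitted slope would be the quantity of interest; with
these invented numbers it only shows the procedure.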

The linearity of the response as plotted in Fig. 1 depends entirely on
the assumed form of the noise distribution. If the noise is
Poisson-distributed, signal-to-noise ratio will be the square root of
signal magnitude, whereas with constant Gaussian noise it will be
proportional to signal magnitude. Maybe that’s the difference between
using d’ and using d’^2. Of course it may be that in computing the
measure each way, you get a straight line but of different
slope.

I’m sure our onlookers are getting terribly bored with this, but they
don’t have to read it, do they?

Best,

Bill P.

[From Bill Powers (2009.03.15.1112 MDT)]

Martin Taylor 2009.03.15.10.02 --

I've been working on the assumption that you had read the relevant portion of my Bayesian notes, and without thinking further about it, I've been assuming that you understood that the information being transmitted was about the lights. That was always my background assumption throughout the discussion, since I have been assuming that you had read the relevant part of my Bayesian Seminar 2, as I asked some time ago. But now, this morning, I think maybe you have been thinking of the information transmitted only about the location of the lit light, a one-bit discrimination. That would fit your talk about "half the information", a concept that baffled me in the context of the information being about the lights rather than only about the location of the lit light.

Sorry, Martin, that gets us nowhere. I don't understand what "information about the lights" you mean, if it's not which one is lit. The manufacturer, the price, the difference in brightness, the distance between them, the part number ...?

As to the idea of a prior probability of choosing the lit light, I don't see how that can be anything other than 0.5, since either light can come on and there is no way to predict which one. If there is a subjective probability other than 0.5, it's a delusion, and makes no difference in the chances of picking the next light correctly. You'll lose at roulette playing rouge et noir at the same rate no matter what you think your chances are.

I'm afraid that there is something about the whole Bayesian approach that I simply don't get -- or with which I disagree so thoroughly that I can't believe it's what is actually proposed.

I have to spend most of today that remains getting ready for my trip starting tomorrow. I'll have my laptop but have no idea whether there will be time or facilities for pursuing this.

re my just previous post:
Measuring uncertainty in bits does make it a log measure, so if the same percentage error remains after every improvement, there is no decrease in uncertainty no matter how close to zero the error in ordinary units is. Didn't someone named Zeno come up with this one a long time ago?

Best,

Bill P.

[Martin Taylor 2009.03.15.13.38]

[From Bill Powers (2009.03.15.1112 MDT)]

Martin Taylor 2009.03.15.10.02 –

I've been working on the assumption that you had read the relevant
portion of my Bayesian notes, and without
thinking further about it, I’ve been assuming that you understood that
the information being transmitted was about the lights. That was always
my background assumption throughout the discussion, since I have been
assuming that you had read the relevant part of my Bayesian Seminar 2,
as I asked some time ago. But now, this morning, I think maybe you have
been thinking of the information transmitted only about the location of
the lit light, a one-bit discrimination. That would fit your talk about
“half the information”, a concept that baffled me in the context of the
information being about the lights rather than only about the location
of the lit light.

Sorry, Martin, that gets us nowhere.

Oh well, I tried, and I still think this is where we have a
miscommunication.

I don’t understand what “information about the lights” you
mean, if it’s not which one is lit. The manufacturer, the price, the
difference in brightness, the distance between them, the part number
…?

It means anything that could be perceived about the lights as a
consequence of one of them being lit that could not be perceived about
them when neither was lit. The choice of which was lit is only one
possibility. It’s impossible to enumerate the potentially infinite
variety of others, but some possibilities might include the sparkle
pattern due to the cover glass, the distance between them (which you
mentioned), the colour, the shape of the cover glass, …

As to the idea of a prior probability of choosing the lit light, I
don’t see how that can be anything other than 0.5, since either light
can come on and there is no way to predict which one.

I must have been unclear. The prior probability at delay zero is indeed
50%, but at delay 250 msec it has risen a bit, and by 300 msec there
isn’t much uncertainty left. The prior at any particular moment depends
on all the data gathered up to that moment.

I’m afraid that there is something about the whole Bayesian approach
that I simply don’t get – or with which I disagree so thoroughly that
I can’t believe it’s what is actually proposed.

I get the feeling you are looking for complication when what exists is
clean and simple. There just isn’t any complexity in the Bayesian
approach beyond the everyday observation that if you have some notion
why a particular event should occur and it does, you feel confirmed in
your reasoning, whereas if it doesn’t, you feel your reasoning must
have some flaw. Example: you see a big black cloud that seems to be
moving your way, and you reason that it should start raining soon. If
it doesn’t, you might consider another possibility, such as perhaps
that it was a cloud of smoke rather than a raincloud. If you then
observe that the cloud continues to be big and black in the direction
it was at first, but seems lighter and wispier in the opposite
direction, you feel more assured that it was smoke and not a raincloud.

If you are considering two lines of reasoning and you might be prepared
to accept either (such as raincloud or smoke), you look for some
prediction made by one and not the other, and see which more nearly
agrees with your observations. If both are reasonably close, but one is
better, you look for more observations that might distinguish them, but
this time you are a bit biased toward accepting the one that better fit
your first observation.

That’s really all there is to it, set in quantitative form.
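
For instance, with invented numbers, the cloud example might run like
this in Bayesian form:

    # Prior belief in "raincloud" vs "smoke", then an observation:
    # no rain arrives within the expected time.  All numbers invented.
    prior_rain, prior_smoke = 0.7, 0.3
    p_norain_given_rain = 0.2    # a raincloud, yet it didn't rain soon
    p_norain_given_smoke = 0.9   # smoke, so of course no rain

    evidence = (prior_rain * p_norain_given_rain
                + prior_smoke * p_norain_given_smoke)
    posterior_rain = prior_rain * p_norain_given_rain / evidence
    posterior_smoke = prior_smoke * p_norain_given_smoke / evidence
    print(posterior_rain, posterior_smoke)   # belief shifts toward "smoke"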

Measuring uncertainty in bits does make it a log measure, so if the
same percentage error remains after every improvement, there is no
decrease in uncertainty no matter how close to zero the error in
ordinary units is.

What do you mean? I can’t relate this to anything I have said, or that
you have said previously. Did the curves I showed look like logarithms
of y against x to you? They don’t to me. Uncertainty is quantified as
the sum over all the non-overlapping possibilities of -p*log2(p).

From your earlier message today [From Bill Powers (2009.03.15.0720
MDT)], we are indeed imagining exactly the same physical situation, and
presumably have been all along:
----------quote--------
We are simply imagining the physical situation differently – using
different models of what may be going on, prior to any application of
information theory. I am seeing a perceptual input function that
receives inputs, via intervening lower-level input functions, from two
light sources that turn on and off. It reports, as a signal, the
unbalance between the lower signals. This higher PIF contains a
constant Gaussian noise source of unspecified but unchanging magnitude
that is always present. The noise does not vary with magnitudes of the
input signals. It’s the background hiss in the receiver.

--------end quote-------

Good. I had been imagining you to be thinking that the background noise
magnitude was indeed diminishing with time. Your language certainly
suggested so. So that misunderstanding, at least, can be cleared up.

But this is not:

----------quote--------

A different assumption would be that the noise level rather than being
constant is Poisson-distributed, proportional to the square root of
signal level. In that case the signal-to-noise ratio would increase as
the square root of the signal rather than linearly as in the above, and
the approach to asymptote would be slower. So there is some possibility
of deducing the distribution from knowledge of the probability of a
correct guess.

---------end-quote------

You don’t have to say Poisson-distributed. You just have to say “with a
variance dependent on the signal level”. You can have
Poisson-distributed noise that is independent of the signal level, just
as you can have Gaussian noise that depends on the signal level.

Going back to a discussion earlier this year or late last year (I’m not
going to search for it) on the fine structure of the neural impulses,
the most likely distribution seems to me to be Poisson, but dependent
on the intensity of the light source, not on the amount of information
so far transmitted about the decision. However, since there are many
affected nerve fibres, the additive property would make the overall
distribution indistinguishable from Gaussian, so the distinction seems
unimportant. It would, however, suggest that the rate of information
gain (rise in d’^2) implicit in Schouten’s data would be lower if the
lights were appreciably dimmer.
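
A quick simulation of that additivity point (fibre count, firing rate,
and trial count are invented; NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(1)
    n_fibres, rate_per_fibre, trials = 200, 3.0, 20000
    totals = rng.poisson(rate_per_fibre, size=(trials, n_fibres)).sum(axis=1)

    mean, sd = totals.mean(), totals.std()
    # For a Poisson sum the variance should be close to the mean; for a
    # Gaussian-looking shape, about 68% of totals fall within one SD.
    within_1sd = np.mean(np.abs(totals - mean) < sd)
    print(mean, sd**2, within_1sd)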

It is, of course, possible that later processing stages might have
noise variances that depended on the levels of their own perceptual
signals, but Schouten’s data suggest that if they do, the effect is
negligible under the conditions of the experiment.

Martin
