Levels of perception (Re: PCT researcher who doesn't talk)

[From Bill Powers (2009.02.28.0943 MST)]

Martin Taylor 2009.02.28.10.46 --

[to Rick Marken] You question two of these control systems because they are "open". In past discussions, such systems have been called "fire and forget". There's nothing a gunner can do to make a shell more or less likely to hit its target after it has been launched. But the gunner can observe the fall of shot and correct the aim for the next shot. You could call this a kind of "learning".

You could also call it a control loop that uses discrete sampled variables rather than continuous ones. A slowing factor is needed for stability, which is why the aiming point is changed only in small increments on each trial. The size of the increments has to be such that the absolute error is not greater on each successive trial. Ideally it should be just large enough to correct the error, but normal variations and disturbances make it necessary to use smaller steps than the maximum ones possible for stable operation.

A different control loop that controls the perception of fall of shot affects the parameters of the "shooting" control loop, not the signal values during the execution of a single shot.

You're just describing different levels of control.

At the higher level, perceptual signals do not represent the signal values during execution of a single shot. They represent some sort of running average of the perceived fall of the shot. The gunner's mate says "We're getting closer," showing that even though the last splash has disappeared, he's still perceiving the distance to the target, in fact a series of distances ("getting closer").

I think you're overlooking a simple analog solution to this control problem. Let the perception of the fall of the shot be the running average of, say, the last three shots, computed once per shot. At the same time this average is taken, the position of the target is also perceived, and the perception represents the average distance between the average splash and the target. The error signal (or possibly two error signals, azimuth and range) causes the next aiming point to shift to reduce the distance to the target, and the next shell is fired. A simple fixed algorithm will thus gradually reduce the average distance of the splash from the target until the pattern of splashes is centered on the target or a hit is registered. This is a control system, not an open-loop system. I think this kind of control system works at the logic level -- it's a program as I've described it.
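As an aside, here is a minimal simulation sketch of the sampled loop described above: the perceived fall of shot is the running average of the last three splashes, and the aiming point is shifted each shot by a fraction of the averaged error. All of the numbers (initial aim, target position, gain, scatter) are made up purely for illustration; the fractional gain below one plays the role of the slowing factor.

import random

def simulate_shots(target=100.0, shots=15, gain=0.6, scatter=3.0, seed=1):
    rng = random.Random(seed)
    aim = 60.0                     # arbitrary initial aiming point
    splashes = []                  # recent fall-of-shot positions
    for shot in range(1, shots + 1):
        splash = aim + rng.gauss(0.0, scatter)               # ballistic scatter (disturbance)
        splashes.append(splash)
        perceived = sum(splashes[-3:]) / len(splashes[-3:])  # running average of last three splashes
        error = target - perceived                           # reference minus perception
        aim += gain * error                                  # small corrective step per trial
        print(f"shot {shot:2d}: splash={splash:7.2f}  avg={perceived:7.2f}  new aim={aim:7.2f}")

if __name__ == "__main__":
    simulate_shots()

Run as written, the average splash converges on the target within a few shots, which is the closed-loop behavior at issue.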

Best,

Bill P.

[Martin Taylor 2009.02.28.14.08]

[From Bill Powers (2009.02.28.0943 MST)]

Martin Taylor 2009.02.28.10.46 --

[to Rick Marken] You question two of these control systems because they are "open". In past discussions, such systems have been called "fire and forget". There's nothing a gunner can do to make a shell more or less likely to hit its target after it has been launched. But the gunner can observe the fall of shot and correct the aim for the next shot. You could call this a kind of "learning".

You could also call it a control loop that uses discrete sampled variables rather than continuous ones. A slowing factor is needed for stability, which is why the aiming point is changed only in small increments on each trial. The size of the increments has to be such that the absolute error is not greater on each successive trial. Ideally it should be just large enough to correct the error, but normal variations and disturbances make it necessary to use smaller steps than the maximum ones possible for stable operation.

A different control loop that controls the perception of fall of shot affects the parameters of the "shooting" control loop, not the signal values during the execution of a single shot.

You're just describing different levels of control.

At the higher level, perceptual signals do not represent the signal values during execution of a single shot. They represent some sort of running average of the perceived fall of the shot. The gunner's mate says "We're getting closer," showing that even though the last splash has disappeared, he's still perceiving the distance to the target, in fact a series of distances ("getting closer").

I think you're overlooking a simple analog solution to this control problem. Let the perception of the fall of the shot be the running average of, say, the last three shots, computed once per shot. At the same time this average is taken, the position of the target is also perceived, and the perception represents the average distance between the average splash and the target. The error signal (or possibly two error signals, azimuth and range) causes the next aiming point to shift to reduce the distance to the target, and the next shell is fired. A simple fixed algorithm will thus gradually reduce the average distance of the splash from the target until the pattern of splashes is centered on the target or a hit is registered. This is a control system, not an open-loop system. I think this kind of control system works at the logic level -- it's a program as I've described it.

I have no real comment on this, other than to say that I think you offer possible explicit mechanisms to do what I said must be done in a higher-level control system whose output changes the parameters of the "shooting" system, as opposed to being done within the "fire and forget" shooting control loop itself. I don't think I overlooked either an analogue or a program-level mechanism. I ignored mechanism entirely, just noting that changing the gunner's aiming point is done by a control system other than the one that releases the shot.

Martin

[From Rick Marken (2009.03.01.1020)]

Sorry, had to take time off to celebrate the birthdays of two
remarkable economists who were, coincidentally, born on the same day,
February 28: Paul Krugman and myself.

Martin Taylor (2009.02.28.10.46) --

I should have said "a
single scalar variable for which the control has some influence on the
probability that the subject's button push corresponds to the light that
turns on."

Before we go on I wonder if you could clarify, again, what the data in
the Schouten study show. I am under the (perhaps incorrect) impression
that the graph you posted shows a measure of response accuracy (d') as
a function of the delay between the onset of a light and the third of
a sequence of beeps with which subjects are supposed to synchronize
their responses. Most importantly, I am under the impression that the
measure of response accuracy (d') measures accuracy only in terms of
how often the subject presses the right (and wrong) lighted button.

The d' measure with which I am familiar is a function of the
proportion of hits and false alarms in the experiment. In the Schouten
experiment I would imagine that a hit occurs when pressing the right
button (Pr) occurs after the right button is illuminated (Lr) or when
pressing the left button (Pl) occurs after the left button is
illuminated (Ll). So the proportion of hits, p(H), is
p(Pr|Lr)+p(Pl|Ll). False alarms occur when the opposite button to the
one illuminated is pressed. So p(FA) = p(Pr|Ll)+p(Pl|Lr). d' = f(p(H),
p(FA)), the function, as I recall, being the difference between the
standard normal deviates corresponding to p(H) and p(FA).
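For concreteness, a minimal sketch of the d' calculation as just described -- the difference between the standard normal deviates (z-scores) of the hit and false-alarm proportions. The proportions below are purely illustrative, not data from the Schouten study.

from scipy.stats import norm

def d_prime(p_hit, p_fa):
    # d' as the difference of the z-transformed hit and false-alarm proportions
    return norm.ppf(p_hit) - norm.ppf(p_fa)

print(d_prime(0.86, 0.14))   # about 2.16 for these illustrative proportions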

So I understand the graph of the Schouten data to show nothing about
the relative timing of the third beep and the press, unless the d'
measure is somehow scaled in terms of that relative timing. If it is,
please explain how. Otherwise, I am assuming that the only trials
included in the d' measure shown in your graph are those on which the
subject's button presses were synchronous with the third beep, at
least to the experimenter's satisfaction.

Is this right?

Best

Rick

···

--
Richard S. Marken PhD
rsmarken@gmail.com

[Martin Taylor 2009.03.01.22.59]


[From Rick Marken (2009.03.01.1020)]
Sorry, had to take time off to celebrate the birthdays of two
remarkable economists who were, coincidentally, born on the same day,
February 28: Paul Krugman and myself.

Well, belated happy birthday, and may your joint endeavours speed the
economic recovery :-)


Martin Taylor (2009.02.28.10.46) --

I should have said "a
single scalar variable for which the control has some influence on the
probability that the subject's button push corresponds to the light that
turns on."
Before we go on I wonder if you could clarify, again, what the data in
the Schouten study show. I am under the (perhaps incorrect) impression
that the graph you posted shows a measure of response accuracy (d') as
a function of the delay between the onset of a light and the third of
a sequence of beeps with which subjects are supposed to synchronize
their responses.

Yes, but I don’t usually think of d’ as a measure of response accuracy,
since response accuracy is the raw data and d’ the derived measure.

Most importantly, I am under the impression that the
measure of response accuracy (d') measures accuracy only in terms of
how often the subject presses the right (and wrong) lighted button.

What is the thought behind that word “only”? The choice of button press
is indeed the only output the subject makes that is related to which
light the subject perceives to have been lit, but your use of “only”
suggests that you think the d’ measure should include some other data,
and I can’t guess where that other data might come from.

The d' measure with which I am familiar is a function of the
proportion of hits and false alarms in the experiment.

That’s an inferior type of experiment, and not the kind of experiment
Schouten did. It’s inferior because the calculation of d’ depends quite
nastily on the prior knowledge the subject has on exactly what form the
signal will have if it is present, and on the subject’s criterion for
saying “yes I saw it”. It’s only in the case of what Tanner and
Birdsall called SKE (Signal Known Exactly) that the assumptions of
equal variance for noise alone (N) and signal plus noise (SN) apply,
and when they don’t apply, the ROC curve changes shape. It becomes
asymmetric and approaches a threshold-like form. This isn’t just a
theoretical problem, but a real-life one, as we demonstrated in a
series of auditory detection studies (we provided waveform shape
information in the ear opposite to the one that might contain the
signal to create the SKE situation; by doing that, the SKE case could
give detection levels as much as 10-15 dB better than the normal “did
you hear it” situation).

This kind of problem is largely avoided in the two alternative forced
choice (2AFC) experiment, because the asymmetries in the two
alternatives balance one another. It is all discussed in the second of
the two handwritten Bayesian seminar papers I put on my web site a few
weeks ago (linked from http://www.mmtaylor.net/Academic.html).
Schouten’s experiment is 2AFC, so the d’ values are more likely to be
reliable than they would be for a hit-and-false-alarm kind of
experiment.
I assume that Schouten did one of three things: (1) binned the results
according to when the button was actually pressed, or (2) trained the
subjects so that they were pretty accurate with their synchrony to the
third button, or (3) recorded the results according to when the third
bip actually happened, and assumed the subjects were accurate enough
for his purposes. The time markers are separated by only about 20 msec,
so I don’t know which of these was the case. It probably says in
Schouten’s paper in “Attention and Performance 1”, which I have
somewhere – probably in one of about 20 cardboard boxes into which my
stuff was packed in a series of office downsizings over the last
decade. So the graph does show the relative timing of the third beep
and the press in one way or another, but I don’t know precisely in
which of the possible ways.
I very much doubt that Schouten would have dropped any data, which is
the implication I get from your “the only trials”. But I don’t know at
first hand.
For the purposes of continuing to correct our misunderstandings about
control systems and the analysis of component pathways, the important
thing is still that the button presses are assumed to result from the
subject’s best effort to control the match between the light perceived
and the answer that the experimenter has told the subject is
appropriate for each light. The timing results really have no bearing
on that aspect. Where the timing results matter is that they provide a
measure of the channel capacity of the pathway marked in red on most of
my diagrams – from sensing the light to the output of the perceptual
function that creates the interpreted category for the location
perception.
Martin


Hi, Denny --

I'm trying these two papers out on you first. One is about economics,
the other is about astronomy. The one on astronomy grabs me -- I
think it could be done using my little star-tracking mount and a
computer-controllable camera with a fast telephoto lens. Got a
dark-sky location somewhere nearby?

Love,

Dad

StarPhotometry.doc (15 KB)

Adversarial.doc (30.5 KB)

[From Bill Powers (2009.03.02.0237 MST)]

Sorry about that last post with the two attachments. I don't know how
I managed to send it to CSGnet. Of course any comments are welcome.

Bill P.

StarPhotometry.doc (15 KB)

Adversarial.doc (30.5 KB)

[From Dick Robertson, 2009.03.02.1950CDT]

Happy belated birthday Rick

Best,

Dick R

[From Rick Marken (2009.03.02.2230)]

Martin Taylor (2009.03.01.22.59)--

Rick Marken (2009.03.01.1020)--

Before we go on I wonder if you could clarify, again, what the data in
the Schouten study show. I am under the (perhaps incorrect) impression
that the graph you posted shows a measure of response accuracy (d') as
a function of the delay between the onset of a light and the third of
a sequence of beeps with which subjects are supposed to synchronize
their responses.

Yes

Good, that's what I thought.

Most importantly, I am under the impression that the
measure of response accuracy (d') measures accuracy only in terms of
how often the subject presses the right (and wrong) lighted button.

What is the thought behind that word "only"?

I realized that there were two variables that the subject was asked to
control in this experiment: 1) synchrony of a button press with the
3rd bip and 2) match of button press to button light. And then I also
realized that only the ability to control variable 2, the match, is
measured by d'. Assuming that Schouten would only include trials in
the d' measure when the press was synchronous with the 3rd bip, then
at all delays between light and 3rd bip for which d' was measured, the
subject was pressing the button (right or wrong) in synchrony with the
light. This suggests to me that the difference in ability to control
variable 2 (as measured by d') as a function of light/3rd bip delay is
a result of perceptual processing rather than comparison or response
production time, just as in my perceptual hierarchy demo. So the fact
that a delay of about 350 msec is needed for nearly perfect control of
variable 2 (the light/press match) suggests that the variable
controlled in this experiment is, indeed, at an intermediate level
between transition (which in my study took a frame rate of 250
msec/frame) and sequence (which in my demo takes about 500 msec/frame).

For the purposes of continuing to correct our misunderstandings about
control systems and the analysis of component pathways, the important thing
is still that the button presses are assumed to result from the subject's
best effort to control the match between the light perceived and the answer
that the experimenter has told the subject is appropriate for each light.

Yes, I agree. And control of this variable is nearly perfect (for most
subjects) when the delay between the elements of this perception
(light and press) is about 350 msec, just as control of a transition
(apparent movement clockwise or counterclockwise) is nearly perfect
when the delay between the appearance of one element of movement and
another is 250 msec, or just as control of a sequence is nearly
perfect when the delay between the appearance of each element of the
sequence is 500 msec.

The timing results really have no bearing on that aspect.

But in the graph you presented the timing results do seem to have a
lot to do with how well the subject controls the match between light
and press. At a small delay between light and 3rd bip (which coincides
with the press) there is almost no control; d' is nearly 0. At a delay
of 350 msec between light and press (3rd bip) control is nearly
perfect; the subject is always pressing only the button that was lit.

Where the timing
results matter is that they provide a measure of the channel capacity of the
pathway marked in red on most of my diagrams -- from sensing the light to
the output of the perceptual function that creates the interpreted category
for the location perception.

What you call "channel capacity" is what I would call "time required
to compute a perception of that type" (whatever the type is). So,
again, I think we are in complete agreement, we're just using
different words.

Best

Rick

···

--
Richard S. Marken PhD
rsmarken@gmail.com

[From Rick Marken (2009.03.02.2232)]

Dick Robertson, 2009.03.02.1950CDT]

Happy belated birthday Rick

Thanks, Dick!

Best

Rick

···


Richard S. Marken PhD
rsmarken@gmail.com

[From Bill Powers (2009.03.04.0237 MST)]

Rick Marken (2009.03.02.2230) –

So the fact that a delay of about 350 msec is needed for nearly perfect control of variable 2 (the light/press match) suggests that the variable controlled in this experiment is, indeed, at an intermediate level between transition (which in my study took a frame rate of 250 msec/frame) and sequence (which in my demo takes about 500 msec/frame).

I see your reasoning. This suggests that relationships are the main level
involved in the timing, which makes sense. Other levels, however, are
also involved because of having to recognize the presentation, and there
is uncertainty in how long it takes for the output to be generated and to
bring the force on the button to the level needed to close the
contact.

The timing results really have no bearing on that aspect.

But in the graph you presented the timing results do seem to have a lot to do with how well the subject controls the match between light and press. At a small delay between light and 3rd bip (which coincides with the press) there is almost no control; d’ is nearly 0. At a delay of 350 msec between light and press (3rd bip) control is nearly perfect; the subject is always pressing only the button that was lit.

Where the timing results matter is that they provide a measure of the channel capacity of the pathway marked in red on most of my diagrams – from sensing the light to the output of the perceptual function that creates the interpreted category for the location perception.

What you call “channel capacity” is what I would call “time required to compute a perception of that type” (whatever the type is). So, again, I think we are in complete agreement, we’re just using different words.

I agree. Channel capacity is a theoretical concept somewhat removed from
the actual situation (the channel capacity is being measured only in
short bursts of information so there is no way to know what the
steady-state capacity would be). And the forced-choice format is somewhat
artificial, too: I would probably be a bad subject who resists making a
choice when I am too uncertain to believe I know which light went on. My
effective channel capacity, in normal circumstances, would be
considerably lower than what this experiment would suggest.

Of course that might be of interest to a geriatric psychologist, and I
leave the ambiguity of that one unresolved.

As a circuit-oriented theoretician I view this experiment as a clever way
of estimating the rise-time of a perceptual signal. The 1/e time would
seem to be about 50 milliseconds.

One point: in judging level in the hierarchy on the basis of reaction
times, it’s essential to look only for the shortest reaction
times. A task at any level can involve longer reaction times depending on
the complexity of the task and the properties of the comparator, output
function, and environmental feedback function: just suppose that the
button is on a lever with a damper connected to it (like a screen door
closer). Any level from transitions upward can involve very long times,
like the length of the event called Beethoven’s 5th Symphony, or getting
a college degree. Only the simplest tasks can reveal the actual minimum
reaction time. The Schouten experiment is not the simplest possible
task.

When I measured reaction times as the delay between a mechanical
disturbance and the first appearance of electromyograph responses in
pronator teres, I came up with about 50 milliseconds (back in the
1950s). Of course that was much shorter than what would be measured by
looking for movement of the hand or a contact closure of a switch. It
showed that some rather large portion of reported reaction times was
determined mainly by Newton’s laws of motion rather than delays in the
nervous system.

On that basis, I would measure the first reaction time in the Schouten
experiment at about 210 milliseconds, the intercept of the one-subject
plot with the time axis. That is when the perceptual input function
begins to respond to inputs from below, which implies that this is how
long it takes for the lower levels to start providing those inputs after
the light goes on.

Then there is another reaction time, less well-defined, which involves
the rise time of the higher perceptual signal measured in the Schouten
experiment. At what time should we say that the perceptual signal exists?
When half the responses are correct? 95% of them (2-sigma)? The latter
would occur 50 milliseconds after the shortest reaction time deduced from
the data. But that is smaller than our uncertainty about how long it
takes after getting a correct identification for the contact under the
button to be closed. I think we’re exceeding the discriminative power of
this experiment (and also of the concept of reaction time).

Best,

Bill P.

[Martin Taylor 2009.03.04.16.05]

[From Rick Marken (2009.03.02.2230)]
Martin Taylor (2009.03.01.22.59)--
Rick Marken (2009.03.01.1020)--
Most importantly, I am under the impression that the
measure of response accuracy (d') measures accuracy only in terms of
how often the subject presses the right (and wrong) lighted button.
What is the thought behind that word "only"?
I realized that there were two variables that the subject was asked to
control in this experiment: 1) synchrony of a button press with the
3rd bip and 2) match of button press to button light. And then I also
realized that only the ability to control variable 2, the match, is
measured by d'. Assuming that Schouten would only include trials in
the d' measure when the press was synchronous with the 3rd bip, then
at all delays between light and 3rd bip for which d' was measured, the
subject was pressing the button (right or wrong) in synchrony with the
light. This suggests to me that the difference in ability to control
variable 2 (as measured by d') as a function of light/3rd bip delay is
a result of perceptual processing rather than comparison or response
production time, just as in my perceptual hierarchy demo.

I agree. At delays for which the straight line increase of d’^2 is
valid, the differences in d’^2 are a result of perceptual processing.
Delays less than about 230 msec (for Subject B – it’s different for
others, according to the graph) can be thought of as transport lag,
which could include several different things: the “button-press”
control system processing, physical dynamics of finger motion, neural
signal time … and not least, possibly a fixed lag in addition to the
differential processing in the “red” path from lights to the
match-control loop input.
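To make the transport-lag reading concrete, here is a sketch of fitting a straight line to d'^2 against delay and taking the x-intercept as the lag before the differential processing begins. The points are invented to resemble the kind of plot under discussion; they are not Schouten's data.

import numpy as np

delays = np.array([0.25, 0.27, 0.29, 0.31, 0.33])   # seconds, hypothetical
d_prime_sq = np.array([1.2, 2.5, 3.7, 5.1, 6.2])    # hypothetical d'^2 values

slope, intercept = np.polyfit(delays, d_prime_sq, 1)
transport_lag = -intercept / slope        # x-intercept: where d'^2 extrapolates to zero
print(f"slope = {slope:.0f} (d'^2 per second), transport lag ~ {transport_lag * 1000:.0f} msec")

With these made-up points the fitted intercept comes out near 230 msec, matching the way the number is read off the graph in the discussion above.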

So the fact
that a delay of about 350 msec is needed for nearly perfect control of
variable 2 (the light/press match) suggests that the variable
controlled in this experiment is, indeed, at an intermediate level
between transition (which in my study took a frame rate of 250
msec/frame) and sequence (which in my demo takes about 500 msec/frame).

You may well be right, but I think I need a more detailed analysis of
the control systems and pathways that model your data before I accept
that you must be. To me, this equivalence is suggestive, no more. One
issue is that Schouten’s experiment was done only with one brightness
level of the lights. I’d be willing to bet that if the experiment were
redone with lights brighter or dimmer than used by Schouten, or with
patches that had a clear demarcation or no demarcation, or with a
single patch that lit with different colours, one would still get
straight lines, but with quite varying slopes for the different
conditions. The level of the perception would be the same in a diagram
(possibly not for the colours), but the time to a sufficiently high
value of d’ (say 2.5) would be very different.

Where the timing
results matter is that they provide a measure of the channel capacity of the
pathway marked in red on most of my diagrams -- from sensing the light to
the output of the perceptual function that creates the interpreted category
for the location perception.
What you call "channel capacity" is what I would call "time required
to compute a perception of that type" (whatever the type is). So,
again, I think we are in complete agreement, we're just using
different words.

Maybe, but I’m not sure yet. “Time required to compute a perception of
that type” could be translated as “time required to reach d’ = (say)
2.5”. That is equivalent to “Time required to acquire X bits of
information about the perception”. If the transport lag did not come
into it, then dividing X by the time in question (about 330 msec for
subject B) would indeed be the channel capacity. The channel capacity
would be X/0.33 or 3X. But transport lag does come into it, and the
time that enters the channel capacity calculation has to be measured
from the start of the rising straight line, or around 230 msec. That
being the case, the channel capacity would be X/0.1, or 10X. The question is
“what is X?”

The argument in my “BayesianSeminar_2” gives d’^2 = 8ln(2) * U, where U
is the information that could be carried by this channel regarding the
differences between the two most discriminable members of an ensemble
that contained entities all equally discriminable from some signal half
the size of the entity in question. In this case, the two most
discriminable members are “left on - right off”, and “left off - right
on”. For such an ensemble the information about the differences between
any one ensemble member and the null state is U = d’^2/2ln(2) =
1.39*d’^2. For d’ = 2.5, U = 8.8, and the channel capacity would be 88
bits/second. (In the paper, we said that these same data, using the
same analysis gave 140 bits/sec. I don’t know where the difference
comes from; either I must have miscalculated or we miscalculated in
1967. Perhaps my by-eye reading off the graph is inaccurate).
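Spelling out the arithmetic in the two paragraphs above (taking the quoted relation U = 1.39*d'^2 and the roughly 100 msec of rising line at face value): at d' = 2.5, U = 1.39 x 6.25, or about 8.7 bits, and C = U / (0.33 s - 0.23 s), or about 8.7/0.1 = 87 bits/second, which is the same ballpark as the 88 bits/second quoted.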

Does this tell you we are in agreement, or that we are not? I can’t say.

Martin

[From Rick Marken (2009.03.04.2210)]

Bill Powers (2009.03.04.0237 MST)–

I see your reasoning. This suggests that relationships are the main level
involved in the timing, which makes sense. Other levels, however, are
also involved

Of course. But the main controlled variable is the light/press relationship because that’s the one that was measured.

I agree. Channel capacity is a theoretical concept somewhat removed from
the actual situation

That’s the polite way to put it;-)

As a circuit-oriented theoretician I view this experiment as a clever way
of estimating the rise-time of a perceptual signal. The 1/e time would
seem to be about 50 milliseconds.

I’m not sure that I would call it “clever” myself. I think an experiment is clever if the cleverness was intentional. I don’t think Schouten conceived of the experiment as a way of estimating the rise time of a perceptual signal corresponding to the state of the relationship between light and press. The cleverness is in the minds of those who were able to piece through the experiment and figure out that it could be seen that way.

One point: in judging level in the hierarchy on the basis of reaction
times, it’s essential to look only for the shortest reaction
times. A task at any level can involve longer reaction times depending on
the complexity of the task and the properties of the comparator, output
function, and environmental feedback function

Yes, that’s why I use the same output (mouse click) to control the configuration, transition and sequence in my hierarchy demo. But the fact is that reaction time can be taken out of play completely by just measuring the frame rate at which each perception becomes controllable. The demo is currently set up so that the frame rates are above the threshold for perceiving each type of perception. But I would like to re-write the demo so that frame rate can be adjusted until a particular perception can just be controlled at some level.

just suppose that the
button is on a lever with a damper connected to it (like a screen door
closer). Any level from transitions upward can involve very long times

Right. That’s why frame rate (the equivalent of the light-press delay in the Schouten study) is a nice way to measure the time constant for different levels of perception; reaction times are not involved. In my demo I shouldn’t have even measured reaction times; it’s the frame rate that gives the best measure of the time constant for a perceptual type, I think.

Best

Rick

···


Richard S. Marken PhD
rsmarken@gmail.com

[From Rick Marken (2009.03.04.2230)]

Martin Taylor (2009.03.04.16.05) –

I agree.

Great.

So the fact
that a delay of about 350 msec is needed for nearly perfect control of
variable 2 (the light/press match) suggests that the variable
controlled in this experiment is, indeed, at an intermediate level
between transition (which in my study took a frame rate of 250
msec/frame) and sequence (which in my demo takes about 500 msec/frame).

You may well be right, but I think I need a more detailed analysis of
the control systems and pathways that model your data before I accept
that you must be.

Me too.

One
issue is that Schouten’s experiment was done only with one brightness
level of the lights. I’d be willing to bet that if the experiment were
redone with lights brighter or dimmer than used by Schouten, or with
patches that had a clear demarcation or no demarcation, or with a
single patch that lit with different colours, one would still get
straight lines, but with quite varying slopes for the different
conditions.

I don’t think the slopes are what matters; it’s the delay (frame rate) where control becomes possible – ~350 msec in this case – that’s important (to me, anyway). And if the variable controlled is the light-press relationship then I think neither light intensity (as long as the light were visible), nor hue nor patchiness will affect the frame rate necessary for control of the relationship. But that’s just a prediction.

What you call "channel capacity" is what I would call "time required
to compute a perception of that type" (whatever the type is). So,
again, I think we are in complete agreement, we're just using
different words.

Maybe, but I’m not sure yet.

I’m not sure either; I was just trying to be agreeable;-)

“Time required to compute a perception of
that type” could be translated as “time required to reach d’ = (say)
2.5”. That is equivalent to “Time required to acquire X bits of
information about the perception”…

Does this tell you we are in agreement, or that we are not? I can’t say.

Well, then, I’m going to have to guess that we’re not in agreement about channel capacity because, if channel capacity has something to do with acquiring bits of information about a perception, then it doesn’t exist in my PCT view of things.

Best

Rick

···


Richard S. Marken PhD
rsmarken@gmail.com

[From Bill Powers (2009.03.05.1005 MST)]

Martin Taylor 09.03.04.17.42 –

Your effective channel capacity
would not be affected by your reluctance to answer until you acquired
sufficient data. What would be affected would be the apparent transport
lag, or the d’^2 you require before making a free-timed decision (the
usual form of a reaction time experiment).

But there would still be a rising d’^2 after the minimum time lag, in a
forced-choice experiment. It might have a steeper slope, but there would
still be a range from “just barely willing to make a choice” to
“no problem, absolutely sure.” Well, maybe. Of course the
choices, when made, would give d’^2 some distance above the origin of the
Schouten data, so you’re probably right.

As for the forced-choice with
deadline being an artificial situation, I just ask whether you have never
found yourself having to choose between two alternative courses of action
when you would prefer to wait to get more information about the
situation. If you have never had to make a choice before getting all the
information you might have wanted, you haven’t lived!

I usually wait until I’m pretty sure, and devote my efforts before then
to making the decision unnecessary just yet. The main thing is that I
know what the outcome is quite likely to be if I decide prematurely:
wrong. If you have only a 60% chance of choosing right, you might as well
not choose, because your estimate of the chances is probably off by a
large amount, too. Of course if I get to make the same choice over and
over, the picture would be different, more like the Schouten data, I
imagine. Knowing I had a lot of chances would definitely change my
tolerance for being wrong. I’m not a gambler; knowing that the dice have
been weighted a little in my favor does not tempt me to bet my bankroll
on a few throws. Twenty-five cents, maybe.

As a circuit-oriented
theoretician I view this experiment as a clever way of estimating the
rise-time of a perceptual signal. The 1/e time would seem to be about 50
milliseconds.

I have seen no evidence of an exponential curve in the data presented, so
I would certainly not try to venture a guess such as
that.

You’re right, and I didn’t state the case correctly. What I should have
said was that the signal rises fast enough to be clear of the noise by 2
or 3 standard deviations in about 50 milliseconds. We don’t know how large
the signal would eventually get or what its 1/e rise time would be,
because (a) that depends on the light intensity, and (b) the measurements
plotted extend only to about 2.2 standard deviations (square root of 5),
and anyway the probability of a correct guess is so close to 1 by then
that it can’t really be measured in any reasonable number of trials.

[Monospaced sketch, garbled in the archive: a perceptual signal rising over time toward its asymptote, with a horizontal line marking the 1-SD noise level. The approximate 1/e rise time is marked near the signal’s asymptotic magnitude, and the point labeled “50 msec, 95% probability correct” falls shortly after the threshold time at which the perceptual signal starts to rise above the noise.]

I might, however, guess
that by 50 msec the subject had acquired half the information needed in
an ordinary “as soon as possible” reaction time experiment. In
our other experiments (about which I include a quote below), it seemed
that the rise of d’^2 fits two straight lines better than an
exponential.

Whatever the case, you can see that my hypothesis might give a different
answer for the channel capacity. If the magnitude of the perceptual
signal is as shown in the above diagram, the 1/e rise time, which is
related to bandwidth, can be much longer than the time it takes to arrive
at 95% correct choices. Bandwidth is usually stated as 0.35/T where T is
the rise-time (10% to 90%).

From a Wiki page we have this, for C = channel capacity in bits/sec, B =
single-pole bandwidth in Hertz, S/N = signal to (Gaussian) noise ratio
(in standard deviations?):
C = B log2(1 + S/N)

Anyhow, it’s clear that the linear rise seen in the Schouten data
could represent just a small part of a 1 - e^-n rise in perceptual signal
after the initial detection of a light, with the actual bandwidth of the
perceptual signal being unknown. The same data would be observed for all
combinations of rise time and asymptotic magnitude of the perceptual
signal that give the same slope of the data near the origin relative to
the noise level. This means that you can’t calculate the channel capacity
of the perceptual input function, if my model is correct, without knowing
the magnitude of the perceptual signal at asymptote and its 1/e rise
time.
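To illustrate the point numerically, here is a sketch showing that two signals with the same initial slope relative to the noise (asymptote divided by time constant held fixed) imply different bandwidths and different Shannon capacities. Every number in it is an assumed, illustrative value; treating the squared asymptotic amplitude (in noise-SD units) as the power S/N is also an assumption made only for the illustration.

import numpy as np

def single_pole_bandwidth(rise_time_10_90):
    # bandwidth (Hz) from the 10%-90% rise time, using the usual B ~ 0.35/T
    return 0.35 / rise_time_10_90

def shannon_capacity(bandwidth_hz, snr_power):
    # Shannon-Hartley capacity in bits/second: C = B * log2(1 + S/N)
    return bandwidth_hz * np.log2(1.0 + snr_power)

# Two hypothetical signals with the same initial slope (asymptote / tau):
for amplitude_sd, tau_s in [(5.0, 0.05), (10.0, 0.10)]:
    rise_10_90 = 2.2 * tau_s          # 10%-90% rise time of a single-pole response
    bandwidth = single_pole_bandwidth(rise_10_90)
    capacity = shannon_capacity(bandwidth, amplitude_sd ** 2)
    print(f"asymptote {amplitude_sd} SD, tau {tau_s * 1000:.0f} msec -> "
          f"B = {bandwidth:.1f} Hz, C = {capacity:.0f} bits/s")

The early part of both curves looks the same against the noise, yet the computed capacities differ, which is the ambiguity being described.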

As a circuit-oriented
theoretician, you are probably interested in the bandwidths of different
connections in the control loop, because they affect the stability of
feedback circuits. “Channel Capacity” is a generalization of
the linear concept “Bandwidth”, applicable in linear and
non-linear situations.

Then there is another reaction
time, less well-defined, which involves the rise time of the higher
perceptual signal measured in the Schouten experiment. At what time
should we say that the perceptual signal exists?

I thought perceptual signals always existed, but just had different
magnitudes at different times.

I meant, at what time would the observer start to react as if it exists?
I would not say it exists until it reached somewhere between the 1- and
2-sigma level, though of course an oscilloscope connected in the right
place would show a signal before then.

Above, you talk about
“the rise time of a perceptual signal” as though you subscribe
to this view. Here, you seem to deny it.

Just more verbal miscommunication.

I take it that you are
expressing an opinion based on faith, in opposition to the precise
straight-line data that Schouten obtained, but that you seem to be saying
is impossible?

Well, thanks. I hope you can see now what I was talking about.

Back to channel capacity. Here’s
a quote from a different section of the same paper from which I took the
Schouten diagram, about my own experiments (“Quantification of
Shared Capacity Processing”. Taylor, Lindsay and Forbes, Acta
Psychologica 27, (1967) 223-229). The data were reported elsewhere. If
it’s of interest I might be able to find the paper with the
data:
"Among our experiments on audio-visual capacity sharing have been
some in which presentation time was an experimental variable. … For the
immediate purposes it suffices to remark that the subjects had to make
one or more of four discriminations: left-right discrimination of a dot
on a TV screen, up-down position of the same dot, pitch of a tone burst
presented simultaneously with the dot, and intensity of the same tone
burst. The stimulus duration in various experiments has been varied from
as low as 33 msec to as high as 4 sec. For all four aspects of the
stimulus (we consider the dot and the tone to constitute a single
stimulus) we have consistently found that d’^2 rises linearly with
presentation time up to 130 msec or so, and that for longer intervals the
rise, while possibly linear, is very much slower. These results are
consistent with the proposition that d’^2 is an additive measure of
discrimination performance, while suggesting that the processor can work
effectively on only the first 130 msec of the
signal."

I think that the interpretation you put on this kind of data will depend
very critically on the model you’re assuming in the background. To
“make a discrimination” suggests some sort of threshold effect,
where you get a signal when the input is above the threshold and no
signal when it’s not. Just what sort of “processor” were you
envisioning? When you say “the subject had acquired half the
information needed in an ordinary “as soon as possible”
reaction time”, I wonder what sort of process could use half of the
information needed and make any sense of it at all: “Four and years
our” doesn’t help me at all toward “Four score and seven years
ago our forefathers…”

While the concept of channel capacity is relevant to analog systems as
well as digital ones, its meaning is rather different when amplitude
measures matter as much as durations and times of occurrence. In the
Schouten case above, the amplitude of the perceptual response to the
light matters because the part of the curve that is near the noise level
does not distinguish between members of a family of curves having the
same measured initial slope, and the channel capacity that actually
exists can’t be calculated without knowing which member of that family is
present. It is certainly not the case that an analogue circuit could
“work effectively only on the first 130 milliseconds of the
signal”, especially when that part of the signal gives no
information about what the signal does after reaching that level. To the
contrary, most of what happens depends on the behavior of the signal
through its whole course – unless you arrange the task so that there is
no measured consequence of the signal after the response has been
made.

Best,

Bill P.

[Martin Taylor 2009.03.06.00.04]

[From Bill Powers (2009.03.05.1005 MST)]

Martin Taylor 09.03.04.17.42 –

As for the forced-choice with deadline being an artificial situation, I just ask whether you have never found yourself having to choose between two alternative courses of action when you would prefer to wait to get more information about the situation. If you have never had to make a choice before getting all the information you might have wanted, you haven’t lived!

I usually wait until I’m pretty sure, and devote my efforts before then to making the decision unnecessary just yet. The main thing is that I know what the outcome is quite likely to be if I decide prematurely: wrong. If you have only a 60% chance of choosing right, you might as well not choose, because your estimate of the chances is probably off by a large amount, too.

That’s your privilege, of course, but if it happened to be a tiger
rather than your friend who made that leaf rustle, you’d be dead by
then. Think about your everyday life. Do you never have to make choices
by deadlines, whether you have got enough data to be quite sure of the
best choice? Do you never vote (I assume you never have much really
solid information about most of the candidates for office). One of the
few things I remember from my short days as a student of Industrial
Engineering was the professor pointing out that “Inaction can be
extreme action”.

As a circuit-oriented theoretician I view this experiment as a clever way of estimating the rise-time of a perceptual signal. The 1/e time would seem to be about 50 milliseconds.

I have seen no evidence of an exponential curve in the data presented, so I would certainly not try to venture a guess such as that.

You’re right, and I didn’t state the case correctly. What I should have said was that the signal rises fast enough to be clear of the noise by 2 or 3 standard deviations in about 50 milliseconds. We don’t know how large the signal would eventually get or what its 1/e rise time would be, because (a) that depends on the light intensity, and (b) the measurements plotted extend only to about 2.2 standard deviations (square root of 5), and anyway the probability of a correct guess is so close to 1 by then that it can’t really be measured in any reasonable number of trials.

That was precisely what made me interested in the Schouten study in the
first place. It is indeed extremely difficult to determine the
detectability or discriminability of obvious things like bright lights
or the locations of well separated things. I saw the Schouten technique
as a possible way to get at these very detectable or very discriminable
signals, which orthodox techniques cannot touch for the reason you
mention.

Back to channel capacity. Here’s a quote from a different section of the same paper from which I took the Schouten diagram, about my own experiments (“Quantification of Shared Capacity Processing”. Taylor, Lindsay and Forbes, Acta Psychologica 27, (1967) 223-229). The data were reported elsewhere. If it’s of interest I might be able to find the paper with the data:

“Among our experiments on audio-visual capacity sharing have been some in which presentation time was an experimental variable. … For the immediate purposes it suffices to remark that the subjects had to make one or more of four discriminations: left-right discrimination of a dot on a TV screen, up-down position of the same dot, pitch of a tone burst presented simultaneously with the dot, and intensity of the same tone burst. The stimulus duration in various experiments has been varied from as low as 33 msec to as high as 4 sec. For all four aspects of the stimulus (we consider the dot and the tone to constitute a single stimulus) we have consistently found that d’^2 rises linearly with presentation time up to 130 msec or so, and that for longer intervals the rise, while possibly linear, is very much slower. These results are consistent with the proposition that d’^2 is an additive measure of discrimination performance, while suggesting that the processor can work effectively on only the first 130 msec of the signal.”

I think that the interpretation you put on this kind of data will depend very critically on the model you’re assuming in the background. To “make a discrimination” suggests some sort of threshold effect, where you get a signal when the input is above the threshold and no signal when it’s not.

Obviously I knew nothing of PCT at the time. Now I would work with the
same model we have been discussing, with the exception that there would
be four “match” control loops for the main presentation, and another
control system (possibly multi-level – I haven’t thought about it) for
which the output selects which of the four “match” systems is allowed
to send its output as a reference value to the button-push.

Just what sort of “processor” were you
envisioning? When you say “the subject had acquired half the
information needed in an ordinary “as soon as possible”
reaction time”, I wonder what sort of process could use half of the
information needed and make any sense of it at all:

Well, you see what such a process can do in the Schouten data, when you
look at the graph for about 270 msec delay. By that time, the
“processor” has half the information it has by 320 msec. It is able to
get (from memory) about 85% correct button presses.

More generally, now knowing something of PCT, I would equate
“processor” with the complex of control loops that interact to enable
the subject to perform in the experiment as the experimenter asks.

“Four and years
our” doesn’t help me at all toward “Four score and seven years
ago our forefathers…”

However, it might help you perceive that neither “To be or not to be”
nor “We shall fight on the beaches” was the quote from which some words
had been obscured by noise bursts. Possibly if someone mentioned
“forefathers”, that might even be enough to cue you into recognizing
which words you missed. Shannon did determine that the redundancy of
English is about 50%, meaning that in a long passage with a random 50%
of the words substituted by gaps or backouts (so you know where they
were), you will be able to correctly replace most of the missing words.
But that really has nothing to do with the case at hand, which is the
rate of gain of information over time. Your example might have been a
slightly better analogy if you had said “Four score and seven” rather
than “Four and years our”.

While the concept of channel capacity is relevant to analog systems as well as digital ones, its meaning is rather different when amplitude measures matter as much as durations and times of occurrence. In the Schouten case above, the amplitude of the perceptual response to the light matters because the part of the curve that is near the noise level does not distinguish between members of a family of curves having the same measured initial slope, and the channel capacity that actually exists can’t be calculated without knowing which member of that family is present.

I would prefer to continue this kind of discussion if I was assured
that you understood the analysis that allows us to equate d’^2 with
information. It’s on page 13-14 of my Bayesian_Seminar_2.pdf. Shannon’s
expression C = B log2(1 + S/N), which you mention from the wiki page, is central to the
argument. Given that, the equation of d’^2 and information follows from
the fact that d’^2 = 2E/No under the same “ideal observer” conditions
that apply to Shannon’s formula, where E is the signal energy and No
the noise power per unit bandwidth.
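Putting one number to the cited relation, using only what is stated above: d'^2 = 2E/No means E/No = d'^2/2, so at d' = 2.5 the implied signal energy is about 3.1 times the noise power per unit bandwidth.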

It is certainly not the case that an analogue circuit
could
“work effectively only on the first 130 milliseconds of the
signal”, especially when that part of the signal gives no
information about what the signal does after reaching that level.

No, but it seems to be the case that at least some of the human sensory
systems have that limitation, at least with static signal bursts such
as our tones and light flashes. Garner found the same thing in the
1950s, when measuring channel capacities was all the rage. He was
looking at information integration in hearing. He found the same as we
did a decade later, that there is a break at around 130 msec between
complete information integration and a following slower rise.

If there were a similar break at other perceptual levels, the break
time would presumably be longer the higher the level. That might be yet
another way of teasing out the levels of perception.

One question of real interest is whether the 130 msec interval has any
relevance when the signal is changing. A very crude notion might be
that it might suggest a rough limit on the rapidity with which we could
see (and control?) independent values of a variable at the level of
sensation, at about 7 per second. I don’t remember whether that was one
of Rick’s levels, and if it was, whether his rate was in that ballpark.
It’s a very much slower rate than visual flicker fusion rate (around 60
Hz) or the ability to detect auditory modulation, but it could be a
rate at which we can perceive the individual changes of level. Just a
speculation…

Martin

[From Bill Powers (2009.03.06.1035 MST)]

[Martin Taylor 2009.03.06.00.04]

[WTP] I usually wait until I’m
pretty sure, and devote my efforts before then to making the decision
unnecessary just yet. The main thing is that I know what the outcome is
quite likely to be if I decide prematurely: wrong. If you have only a 60%
chance of choosing right, you might as well not choose, because your
estimate of the chances is probably off by a large amount, too.

[MMT] That’s your privilege, of
course, but if it happened to be a tiger rather than your friend who made
that leaf rustle, you’d be dead by then.

Made-up positive examples don’t prove your case. For every one of those,
there are many negative examples. The conditions, remember, are that it
is almost equally likely that your response will be wrong: if there’s
only one possible place the tiger could be when you hear a leaf rustle,
then by all means run for your life in a direction away from the tiger,
assuming you’re pretty sure what direction that is. But if running could
equally well take you toward the tiger, it doesn’t matter whether you act
or not: you have a fifty-fifty (or 60-40 or 40-60) chance of being dead
if you act even sooner than if you did nothing. You may take some comfort
from knowing that you at least did something, but we’re talking
about cases in which you will do the right thing only about half of the
time, and in which doing the wrong thing is very bad.

Think about your everyday
life. Do you never have to make choices by deadlines, whether you have
got enough data to be quite sure of the best choice?

I think of a way to put off the deadline, because I know that just acting
in order to be acting is futile, and uses up resources I might need. I
have much more faith in my ability to handle whatever comes up than I do
in my ability to correctly forecast the future and have the appropriate
action all ready. For one thing, having the appropriate action all ready
can be a terrible handicap if what happens is different from what you
expected. That can be much worse than doing nothing but remaining
alert.

Do you never vote (I assume you
never have much really solid information about most of the candidates for
office).

Yes, of course, and it’s been a long time since I had a hard time
deciding. When I know nothing about candidates on a ballot, I just skip
that item. Why bother, when half of the time I will make a choice that
turns out exactly wrong?

One of the few things I
remember from my short days as a student of Industrial Engineering was
the professor pointing out that “Inaction can be extreme
action”.

Sure, it can be. But is it usually? You can always make up a scenario
under which any proposal would be correct. But that doesn’t tell us what
sort of policy to follow in general. In general, I would say that
inaction has many advantages over action when you’re not reasonably sure
of what the right action would be.

and anyway the probability
of a correct guess is so close to 1 by then that it can’t really be
measured in any reasonable number of trials.

That was precisely what made me interested in the Schouten study in the
first place. It is indeed extremely difficult to determine the
detectability or discriminability of obvious things like bright lights or
the locations of well separated things. I saw the Schouten technique as a
possible way to get at these very detectable or very discriminable
signals, which orthodox techniques cannot touch for the reason you
mention.

I need to ask again what the actual observed variable was in these
experiments. Wasn’t it a count of correct and incorrect responses at each
delay time? Isn’t that why you need so many trials when the probability
of a wrong response gets down to 0.003? That’s the probability at 3
sigma. As I’ve been understanding this, the measure of d’ is derived from
the measured probabilities of getting a correct answer, which is Nc/N,
where Nc is the number of correct choices and N is the total number of
choices at a given delay. Since you have to measure those probabilities,
you’re not gaining any advantage by the Schouten approach.

It seems that d’ is derived from those probability measurements, not the
other way around. Given the probability of a correct choice, we can use
the tables to find out how many standard deviations over the noise level
correspond to that probability, assuming a Gaussian distribution of the
noise. From this we can deduce what the underlying noise level is. This
seems to fit the data, since the probabilities you mention that go with
different values of d’ approach certainty very rapidly with each added
increment in d’. The straight lines in Schouten show that the
distribution is Gaussian.
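For reference, a minimal sketch of the table lookup just described: converting an observed proportion correct into d', assuming a Gaussian noise distribution and the standard unbiased two-alternative convention d' = sqrt(2) x z(Pc). Whether Schouten's analysis used exactly this convention is an assumption here; the counts are illustrative only.

from math import sqrt
from scipy.stats import norm

def d_prime_2afc(n_correct, n_total):
    # d' from proportion correct in an unbiased 2AFC task: d' = sqrt(2) * z(Pc)
    pc = n_correct / n_total
    return sqrt(2) * norm.ppf(pc)

print(d_prime_2afc(85, 100))   # Pc = 0.85 -> d' of about 1.47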

Well, you see what such a
process can do in the Schouten data, when you look at the graph for about
270 msec delay. By that time, the “processor” has half the
information it has by 320 msec. It is able to get (from memory) about 85%
correct button presses.

This estimate of how much information exists at each delay depends very
strongly on how you think this detection works. What you describe can
also be imitated by a series resistor terminated by a capacitor connected
to ground, followed by a threshold detector. To characterize the voltage
on the capacitor at a given time as “information” is simply a
matter of what computation you choose to use. What you call
“processing” is simply a flow of current into the capacitor.
This is why I ask what model you’re assuming. If we’re just talking about
an R-C circuit, the concept of information is overcomplicating something
very simple. You can think of a computer accumulating digital information
over time and reasoning out what it means on the basis of the information
so far received, or you can just think of a capacitor charging up. The
latter could be much closer to what is actually happening; all that
“processing” would then be a figment of the imagination that
adds complexity without increasing our understanding.
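A minimal sketch of that R-C analogy, with every value assumed purely for illustration: a "perceptual signal" rises like a capacitor charging toward an asymptote, Gaussian noise is added, and a fixed threshold decides each trial. The fraction of trials above threshold climbs with time in the same qualitative way as the proportion of correct choices.

import random
from math import exp

V_INF = 5.0        # asymptotic signal level, in units of the noise standard deviation (assumed)
TAU = 0.10         # RC time constant in seconds (assumed)
THRESHOLD = 1.0    # detector fires when signal + noise exceeds this (assumed)
TRIALS = 10000

rng = random.Random(0)
for t_ms in range(0, 301, 50):
    t = t_ms / 1000.0
    v = V_INF * (1.0 - exp(-t / TAU))      # capacitor voltage at time t
    above = sum(v + rng.gauss(0.0, 1.0) > THRESHOLD for _ in range(TRIALS))
    print(f"t = {t_ms:3d} msec: signal = {v:4.2f} SD, above threshold on {above / TRIALS:.0%} of trials")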

More generally, now knowing
something of PCT, I would equate “processor” with the complex
of control loops that interact to enable the subject to perform in the
experiment as the experimenter asks.

I would equate it with something much simpler, since this experiment is
focused on just one perceptual input function. The other complexities are
there in the background, but the experiment is not measuring
them.

“Four and years our”
doesn’t help me at all toward “Four score and seven years ago our
forefathers…”

However, it might help you perceive that neither “To be or not to
be” nor “We shall fight on the beaches” was the quote from
which some words had been obscured by noise bursts.

But there is nothing in that stream of words that tells me how much, if
anything, is missing from it. The original sentence could be “four
and twenty blackbirds, a verse from the years of our childhood.”
Information is not interchangeable, so it is not as though we can get
half of the way to understanding given just ANY half of the bits. When we
actually have half of
the information, we don’t know that, since we have no idea how much we
have missed or how much is yet to come. “On the nose” does not
give us half of the information in “Mary kissed John on the
nose” or “Mary socked John on the nose” or “That is
right on the nose.”

Possibly if someone
mentioned “forefathers”, that might even be enough to cue you
into recognizing which words you missed.

Or that could completely mislead you if the original sentence was another
one I mentioned. Information theory has nothing to do with meaning; it’s
just a way of characterizing channel capacity, no matter what message is
being carried by the channel. “John hit Mary” has the same
number of bits of information as “Mary hit John” but the two
clearly do not mean the same thing.

Shannon did determine that
the redundancy of English is about 50%, meaning that in a long passage
with a random 50% of the words substituted by gaps or backouts (so you
know where they were), you will be able to correctly replace most of the
missing words. But that really has nothing to do with the case at hand,
which is the rate of gain of information over time. Your example might
have been a slightly better analogy if you had said “Four score and
seven” rather than “Four and years our”.

How would that help us arrive at the rest of the sentence, which is
“… is a quotation from Lincoln”? “Half of the
information,” as you use the term, implies that any half of the bits
will do, in or out of sequence, with large gaps or small.

While the concept of channel
capacity is relevant to analog systems as well as digital ones, its
meaning is rather different when amplitude measures matter as much as
durations and times of occurrence. In the Schouten case above, the
amplitude of the perceptual response to the light matters because the
part of the curve that is near the noise level does not distinguish
between members of a family of curves having the same measured initial
slope, and the channel capacity that actually exists can’t be calculated
without knowing which member of that family is present.

I would prefer to continue this kind of discussion if I was assured that
you understood the analysis that allows us to equate d’^2 with
information. It’s on pages 13-14 of my Bayesian_Seminar_2.pdf. Shannon’s
expression C = B log(1 + S/N), which you mention from the wiki page, is
central to the argument. Given
that, the equation of d’^2 and information follows from the fact
that d’^2 = 2E/No under the same “ideal observer”
conditions that apply to Shannon’s formula, where E is the signal energy
and No the noise power per unit bandwidth.
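
For reference, the two expressions cited in this passage can be written
out in their standard forms; the equating of d’^2 with information is
exactly the step contested in the reply that follows.

```latex
% C: channel capacity (bits/s when the log is base 2), B: bandwidth,
% S/N: signal-to-noise power ratio, E: signal energy,
% N_0: noise power per unit bandwidth (ideal-observer conditions).
C = B \log_2\!\left(1 + \frac{S}{N}\right), \qquad d'^2 = \frac{2E}{N_0}
```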

I don’t think you understand the point I am making. The actual channel
capacity can be calculated from the measured bandwidth. I showed that the
same model will yield the same information content as you measure it even
when the actual channel capacity varies enormously. That is because the
1/e rise time of the perceptual signal can generate initial rise rates of
the signal that are indistinguishable from one another even if the 1 -
e^-kt waveforms have greatly different values of k – which determines
the bandwidth. If you’re only looking at the first 5% or less of the
total rise, the exponential differs from a straight line by at most 1
part in 400, which would never be detected in this experiment.
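
A quick numerical check of that "first 5% of the rise" figure, with
arbitrary values of k (the deviation comes out near 1 part in 800 of the
full rise, comfortably inside the "at most 1 part in 400" bound):

```python
# As long as the signal has risen by no more than about 5% of its total
# excursion, Ao*(1 - exp(-k*t)) is indistinguishable from the straight
# line Ao*k*t whatever the value of k, so the early data cannot pin down
# k (and with it the bandwidth). The k values below are illustrative.
import math

for k in (1.0, 5.0, 20.0):            # 1/e rise times of 1000, 200, 50 ms
    t_5pct = -math.log(0.95) / k      # time at which the rise reaches 5%
    exp_value = 1.0 - math.exp(-k * t_5pct)   # = 0.05 by construction
    line_value = k * t_5pct                   # straight-line approximation
    deviation = line_value - exp_value        # as a fraction of the full rise
    print(f"k = {k:5.1f} /s: 5% point at {1000 * t_5pct:6.1f} ms, "
          f"line-vs-exponential gap = 1 part in {1.0 / deviation:.0f} of Ao")
```
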
I was hoping you would look at my argument and realize for yourself that
this is a large loophole in the Schouten experiment. The conclusions you
draw about the first 130 milliseconds apply only to that first 130
milliseconds. They say nothing about the actual channel capacity of the
perceptual input function you’re trying to characterize. The actual
channel capacity depends on how the perceptual signal behaves even after
the contribution of noise has become only a small part of the total
signal. If you were to apply a sine-wave disturbance to the light
intensity, you would find that the perceptual signal would have a much
smaller amplitude and a much larger phase lag than you would expect from
the channel capacity you measure over 130 milliseconds after a step-rise
in intensity. That is because most of the perceptual signal variation
would occur when the signal is far above the noise level and represents
the light intensity very accurately (save for nonlinearities). Your
approach does not consider that the signal contains any information after
the second or third sigma of excess over the noise level. Relative to the
assigned task, which is merely to report “detected”, that may
be true. But it has nothing to do with the channel capacity of that input
function, which is a physical property that doesn’t depend on what the
signal is or what use is made of it.

I would be much more interested in an argument that forces you to equate
d’^2 with information, and that pins down a definition of
information that is not simply the equation by which you calculate it. I
just don’t think that “Your girlfriend is 18 years old”
contains as much information as “Your girlfriend is 17 years
old,” if by “information” you mean “important
implications.” If you’re just counting bits, that’s fine: sometimes
we might want to know how many bits per second a given channel can
transmit, even not knowing which bits they will be. But that kind of
knowledge is a far cry from what we normally mean by information, which
is what the string of bits conveys by way of meaning. Shannon hijacked
the term, but I don’t admit that he got away with it. If he had just
called the units “Shannons” we would be much better off. The
connection with the term “information” is entirely
gratuitous.

It is certainly not the case
that an analogue circuit could “work effectively only on the first
130 milliseconds of the signal”, especially when that part of the
signal gives no information about what the signal does after reaching
that level.

No, but it seems to be the case that at least some of the human sensory
systems have that limitation, at least with static signal bursts such as
our tones and light flashes.

But you’re finding out only that the signal-to-noise ratio is lowest in
the first 130 milliseconds, not what the control system using that signal
does on the basis of its changing (and accurately-perceived) amplitude
after the uncertainty has dropped essentially to zero. You’re thinking
that the only thing such a signal could be used for is to trigger a
simple act, a “response.” But if, instead of a button, we had
provided the subject with two potentiometers to turn, and told the
subject to keep both lights below some particular brightness (perhaps
compared with another light), the first 130 milliseconds wouldn’t matter
much because most of the control action would be taking place after that.
Yet the perceptual input function could be the same physical one measured
in the Schouten experiment.
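
A toy version of that alternative task, simplified to one light held at a
reference brightness, with assumed values for the gain, the perceptual
time constant, and the disturbance, makes the point concrete: the
corrective action is spread over the whole run, not confined to the onset.

```python
# A proportional controller holds a perceived brightness at a reference
# value through a first-order perceptual lag of the kind discussed above.
# Gains, time constants, and the disturbance are illustrative values only.
import math

DT = 0.001          # simulation step, seconds
TAU_P = 0.2         # perceptual time constant (assumed), seconds
GAIN = 8.0          # output gain of the controller (assumed)
REFERENCE = 0.5     # desired perceived brightness

perception = 0.0
output = 0.0        # "potentiometer" setting

for step in range(2000):                                  # two seconds
    t = step * DT
    disturbance = 0.3 * math.sin(2.0 * math.pi * 0.5 * t)  # slow drift
    brightness = output + disturbance                      # controlled variable
    # the perceptual signal follows brightness with time constant TAU_P
    perception += (brightness - perception) * DT / TAU_P
    error = REFERENCE - perception
    output += GAIN * error * DT                            # integrating output
    if step % 400 == 0:
        print(f"t = {t:4.2f} s  brightness = {brightness:6.3f}  "
              f"error = {error:6.3f}")
```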

Garner found the same
thing in the 1950s, when measuring channel capacities was all the rage.
He was looking at information integration in hearing. He found the same
as we did a decade later, that there is a break at around 130 msec
between complete information integration and a following slower
rise.

My objection is simply that if you use the data to calculate
“shannons,” that is all you will get: an arbitrary measure of a
physical process, or part of it. It’s like deciding to measure something
in units of quarts per log(parsec). Sure, you can probably come up with a
formula for converting a physical measurement into those units, but
what’s the justification? That’s what I see as missing from these
applications of information theory that go far beyond channel capacity.
Is the fact that you CAN compute information capacity any indication that
you should? Or that what you’re computing has anything to do with any
independent definition of “information?”

Best,

Bill P.

[From Rick Marken (2009.03.06.1350)]

Bill Powers (2009.03.06.1035 MST)

Made-up positive examples don’t prove your case…

My objection is simply that if you use the data to calculate
“shannons,” that is all you will get: an arbitrary measure of a
physical process, or part of it. It’s like deciding to measure something
in units of quarts per log(parsec).

This post was brilliant; just beautiful. Sure better than my little screeds. On that last note, I actually had a dream about you last night; you were nodding approvingly about my having adopted a more “up a level” approach to argument and I was saying that I was afraid I was falling back into my “down a level” bad groove. But you just ignored me; very MOL;-)

Thanks for the very informative (non-Shannon version) read and for being so nice to me in my dreams. I didn’t deserve it.

Best

Rick

Richard S. Marken PhD
rsmarken@gmail.com

[From Bill Powers (2009.03.07.1144 MST)]

Martin Taylor 2009.03.07.00.19 --

I find myself completely flabbergasted at your comments on information and channel capacity, to the extent that I have no idea where to begin to continue the discussion. "Flabbergasted" at the truly fundamental misunderstandings expressed is perhaps rather too weak for the astonishment I experienced on reading your message! We clearly don't speak the same
language, and for me to comment further would be obviously pointless.

Perhaps instead of thinking about everything that needs to be said, you could think about the first misunderstanding on my part that needs to be corrected before anything else can be fixed. If just one fundamental point could be settled, perhaps the next one would yield more easily. Obviously, what I said about information theory and channel capacity makes sense to me, but not to you. If there is a "truly fundamental misunderstanding," wouldn't it be important to clear it up? If it's so obvious to you that the error lies completely on my side, you should be able to see where I went off the tracks and set me straight. I, obviously, can't see where the mistake is. Don't I usually change my tune if I see I've said something wrong?

Best,

Bill P.

[From Bill Powers (2009.03.09.0215 MST)]

[Martin Taylor 2009.03.07.00.19]

We clearly don't speak the same language, and for me to comment further would be obviously pointless.

So I won't. I'll just continue to use information theory and channel capacity correctly, and hope that something of its value will come across on occasion.

Perhaps if I boil my proposition down to its minimum form something useful may yet emerge from it.

Let's say that in this channel there is a signal that begins to rise about 200 milliseconds after a light turns on, with an amplitude A described by

A = Ao(1 - e^-k(t - To)) for t >= To and To = 200 msec.

This signal represents the difference in perceived intensity between two lights.

Let the net difference signal s be the sum of A and a zero-average random variable R with a Gaussian distribution:

s = A + R.

Now we can ask, what is the probability that s will be >= zero, indicating (arbitrarily) that the right-hand light is lit when it is actually lit? It is simply the probability that A will be greater than -R. I am skipping over scaling factors and the necessary summing of effects in two directions.

The Schouten experiment arranges for the perceived brightness difference between the two lights to be sampled at various times t after To, which means sampling it at various values of A[t]. The fraction of correct guesses to total guesses for a given time delay t will depend on the probability that A[t] > -R, which can be calculated from formulae or tables. If A[t] is plotted in units of standard deviations of R, and the distribution of R is Gaussian, the result will be a straight line during the linear rise portion of the perceptual signal.

Obviously a number of details need to be cleaned up, but that is the structure of the model I proposed for the subject in the Schouten experiment. It should be possible to adjust the parameters of the cleaned-up model to match its performance to the results for real subjects.
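
A bare-bones rendering of that structure, with placeholder values for Ao,
k, and the noise (nothing here is fitted to Schouten's results), looks
like this:

```python
# The perceptual difference signal A = Ao*(1 - exp(-k*(t - To))) for
# t >= To, plus zero-mean Gaussian noise R, sampled at the delay set by
# the experimenter; a guess counts as correct when the noisy sum comes
# out on the right side of zero. Ao, k, and sigma are placeholders.
import math
import random

random.seed(2)

AO = 1.0        # asymptotic amplitude of the difference signal (assumed)
K = 5.0         # rise-rate constant, 1/s (assumed)
T0 = 0.200      # dead time before the signal starts to rise, seconds
SIGMA = 0.2     # standard deviation of the Gaussian noise R (assumed)
TRIALS = 20000

def signal(t):
    """Perceived intensity difference A at time t after the light turns on."""
    if t < T0:
        return 0.0
    return AO * (1.0 - math.exp(-K * (t - T0)))

for delay_ms in (200, 220, 240, 270, 320):
    t = delay_ms / 1000.0
    correct = sum(signal(t) + random.gauss(0.0, SIGMA) > 0.0
                  for _ in range(TRIALS))
    print(f"sampled at {delay_ms} ms: proportion correct = {correct / TRIALS:.3f}")
```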

This may show that I have no conception of how to use the concepts of channel capacity and information theory correctly, but I think it also shows that the results of the Schouten experiment can be understood without bringing in those concepts, however applicable they may be.

Best,

Bill P.

[From Bill Powers (2009.03.09.0759 MST)]

Martin Taylor 2009.03.08.17.53 --

For just one example, take “I showed that the same model will yield the
same information content as you measure it even when the actual channel
capacity varies enormously.” This, to me, is like saying “I showed that
you get the same amount of water in the bathtub per minute, whether the
tap is wide open or nearly shut”.

What I thought I had shown was that the measurement of perceptual
information in the first fraction of a second after the light goes on
does not treat the perceptual signal as a continuing analog measure. I
mentioned that if this same light perception were used during continuous
control of light intensity, its bandwidth, and hence its formal maximum
rate of information transfer, would depend on the long-term rise time
which could be many times slower than the initial increase in
signal-to-noise ratio. Ninety-five percent of the uncertainty is removed
in the first 50 milliseconds after the initial rise, yet the input
signal’s time constant could be five or ten times that long. The capacity
of the channel to represent changes in the controlled variable would
customarily be measured by using that time-constant, not the
signal-to-noise ratio during the first few milliseconds of signal after a
step-change of the input. What you’re saying, it seems to me, is that all
you have to do is observe the water level in the tub for the first 10
seconds, and from that you can infer how full it is going to
get.
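
To make that concrete: for a first-order lag with time constant tau, a
sine-wave disturbance at frequency f is passed with gain
1/sqrt(1 + (2*pi*f*tau)^2) and phase lag arctan(2*pi*f*tau). The sketch
below compares two arbitrarily chosen time constants; the slower one
could look just like the faster one in the first few tens of milliseconds
after a step, yet it passes far less of the disturbance.

```python
# Frequency response of a first-order perceptual lag, for two assumed
# time constants. The bandwidth that would enter a capacity formula is
# set by tau, not by the signal-to-noise ratio of the early rise.
import math

def response(f_hz, tau_s):
    w_tau = 2.0 * math.pi * f_hz * tau_s
    gain = 1.0 / math.sqrt(1.0 + w_tau ** 2)
    phase_deg = math.degrees(math.atan(w_tau))
    return gain, phase_deg

for tau in (0.050, 0.250):
    print(f"time constant {1000 * tau:.0f} ms:")
    for f in (0.5, 1.0, 2.0, 4.0):
        gain, lag = response(f, tau)
        print(f"  {f:3.1f} Hz disturbance: gain {gain:.2f}, lag {lag:5.1f} deg")
```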

This is actually taking us back to the discussions of “subjective
probability.” If probability is something that has existence outside
a person, which is what we tacitly assume when we calculate it from
observations of external events, then it can’t also be subjective and
different for each person. The same holds for information – if the
maximum information capacity of a channel can be calculated by a formula
involving bandwidth and signal-to-noise ratio, it can’t also vary from
person to person depending on what surprises that person. We must be
talking about two different things. If the same
maximum-information-capacity message can convey knowledge to one person
and nothing to another person, then clearly we have to look inside the
person to see what is really happening – the message is only part of the
story. The idea that the message is carrying something from one person to
another, other than photons or sound waves, is simply a misunderstanding,
like the “looking rays” that came out of people’s eyes in
medieval drawings.

I think that’s really the most basic of my problems with the idea of
information transfer. I don’t think there is any information
“in” a message. All the information that seems to be
transmitted is really coming from inside the person receiving the
message. The person has learned to use certain perceptions as pointers to
other perceptions of his own, and assumes that when those same pointers
are provided to others, the other person will be reminded of the same
perceptions. Of course as the present interchange is illustrating quite
well, that simply isn’t how it works. Each of us supplies his own
meanings when presented with the lower-level inputs we call messages.
Nothing is carried from outside the person to inside except the message
itself, and even that is perceived at the destination in ways the
originator couldn’t have anticipated. It is rather miraculous that we can
get any feeling of communicating with each other in a consistent way.
When this process breaks down, as it has done here, we can see how
precarious communication really is, particularly at higher
levels.

One thing you say does coincide with what most writers on information
claim: “Information theory has nothing to do with meaning.” I’ve always
disagreed with
that claim, even though it is generally accepted. Because the information
gained from a particular physical message depends entirely on what you
knew about the subject beforehand, it seems to me that information theory
has everything to do with meaning. But channel capacity does not. It
represents the maximum rate at which information could be transmitted
through that medium, independent of the circumstances and the prior
knowledge of the receiver.

But the same channel capacity would be computed even though what is
transmitted means something to one person and nothing to another. How can
you say that channel capacity has something to do with information? To
me, that is a self-contradiction.

To say the information “gained from a message” depends entirely
on what you knew about the subject beforehand comes within hailing
distance of my view. The main difference is that I don’t think the
message does anything but be a set of perceptions. We don’t actually
“gain” something “from” the message. We use the
message as a pointer to experiences of our own, as when I say
“red” and you are reminded of experiencing that color as a
color rather than a word. The color you’re reminded of is probably not
the exact shade of red as the color I’m reminded of, but at this low
level of perception the differences in meaning don’t usually cause
problems (except to interior decorators). However, as we proceed to
higher levels, the experiences we are reminded of by hearing or reading
words like “information” can diverge sharply. Then the
contradictions and inconsistencies multiply and become obvious.

The central concept of the
Layered Protocol Theory of dialogue is the hierarchy of one’s perceptions
of the dialogue partner at many levels of abstraction, and their
relations to the corresponding reference perceptions of the partner.
Misunderstandings occur when there is uncertainty about the perception of
the partner, or more particularly when the perception of the partner is a
bad representation of the facts.

While I agree in general with LPT, I have reservations about some aspects
of it. It seems to me that you have too much faith in the ability of one
person to grasp, simply through informal interactions, what another
person is perceiving. I don’t think it’s that easy or that we do it that
well. It’s not that we’re “uncertain” about the perception of
the partner – we’re quite certain, but about the wrong thing. We don’t,
or I should say I don’t since I may be a freak, look over the field of
all the things the partner might mean and try to fill in the blanks. We
do a certain amount of that, but mostly we just assume that the meaning
that springs instantly to mind when the partner speaks is the meaning the
partner intended. The problem is not too much uncertainty; it’s too
little.

If your perception of the
partner doesn’t conform to the facts, attempts to correct the error are
likely to work in an ineffective direction, and won’t reduce it. If you
speak Chinese and I speak Albanian, our communication won’t improve until
either I learn Chinese or you learn Albanian.

But even then, the word “family” will probably not be given the
same meanings by the two parties. My point is that they will give
meanings to the word they perceive, but the meanings will come from their
own memories, not from the word or the other person’s memories. I repeat,
this is not a matter of decreasing uncertainty; it is being certain and
at the same time mistaken.

Likewise, if we are to have a
meaningful discussion about information, it can’t happen if it is true
both that I have no concept of what you mean by the word, and that you
have no concept of what I mean by it. Judging by your previous message,
both of these are in fact true.

Yes, and by my way of seeing it, it can’t be otherwise. I can’t
experience your meanings. You are surprised, even flabbergasted, when I
use words in ways that you can’t fit meanings to in an
internally-consistent way. I’m not surprised when this happens the other
way around; I expect it. I know that I don’t know what you mean by
“information.” I can only try to pick up hints from the way you
use the word in sentences. When I can’t put those meanings together into
a consistent framework in my own mind, all I can do is sputter “But
… but… but…” and say what I am getting from the words. Of
course I have to convey that to you with more words, and who knows
whether we are converging to a stable picture, or working our way farther
and farther away from one?

Best,

Bill P.