Neurophysiology of perception

[From Rick Marken (961096.1045)]

Me:

can you give me any reason to believe that there is something in the neural
literature that would help me understand perceptual control?

Benzon:

Give you RM a reason? If I haven't done that by now, then, no, I can't.
One of us is going to have to work a lot harder to get over

I can see that my question might have seemed impudent. But I was actually
asking in earnest. Earlier, you [Bill Benzon (961020)] had said to Bill
Powers (961020.0430 MDT)

Well Bill, I really can't resist. If the input function is so critical,
how can you possibly justify your black box attitude about how those
functions operate?

I thought this was a good question, actually. In PCT, the perceptual input
functions _are_ critical; they determine what we control.
PCT suggests that an understanding of behavior at the neurophysiological
level will be based on an understanding of how neural networks compute
perceptions, not outputs.

I think it is not really correct to say that PCT has a "black box" attitude
towards how perceptual functions (or any of the other neural functions in the
hierarchical control model) operate. I think Bill Powers has done an
excellent job of keeping PCT model speculations consistent with what we know
about neural function. I don't think there is anything in the HPCT model that
cannot plausibly be carried out by a real nervous system. I don't think the
same can be said of "holographic" models of memory, Fourier models of
perception or inverse kinematic models of output generation. There is no
evidence that a real nervous system can carry out computations assumed by
these models. On the other hand, there is considerable evidence that the
nervous system can compute the perceptual variables assumed by PCT. We don't
know how the nervous system actually computes these perceptions. But the
work of people like Hubel and Wiesel shows that neural signals can represent
variations in rather complex aspects of the environment. So the PCT model of
perception certainly seems compatible with neurophysiological fact.

Nevertheless, it's true that we don't know the details of how the nervous
system produces perceptions more complex than, say, colors or tones
(sensations). We don't know, for example, how the nervous system produces
what for most people are importantly controllable perceptions, like the
degree of honesty in a relationship. We just assume that the nervous system
_can_ compute such perceptions because we have (and control) such perceptions
- - I can perceive and control the degree of honesty in a relationship --
and because such perceptions seem to depend on the existence of a brain
("Kill the brain and you kill the perception" -- Night of the Living Control
Theorist;-))

So this is why I asked if you could "give me any reason to believe that there
is something in the neural literature that would help me understand
perceptual control?" I asked this because I really wanted to know if you
thought that there WAS something in the neural literature -- relevant to the
neurophysiology of the perceptual input functions -- that might be of use in
the study of control. I think that it is perfectly plausible that there IS
something in this literature that would either lend greater credence to our
functional model of perceptual computation or suggest changes that would
bring the functional model into line with the most recent neurophysiological
evidence.

But you said nothing in your replies to Bill or me about the neurophysiology
of perceptual input functions. You criticize PCT for a "black box attitude"
toward a critical component of the model -- the input function. But then you
never say what's missing or how putting it into the model would help. I'm
willing to assume that neurophysiology has something useful to offer PCT. But
it's difficult for me to see what that is if all I hear is that it's a
mistake _not_ to consider it.

I realize that you have better things to do than to try to educate me. But I
hope you can see that my questions about neurophysiology are not meant to be
impudent. I really would like to know what you think we should know about the
neurophysiology of the perceptual input functions.

By the way, what do you give seminars on?

Best

Rick

Rick Marken (961096.1045) sez:

cannot plausibly be carried out by a real nervous system. I don't think the
same can be said of "holographic" models of memory, Fourier models of
perception or inverse kinematic models of output generation. There is no
evidence that a real nervous system can carry out computations assumed by
these models.

This is not true, and hasn't been for 20 years. In the first place, no one
really takes the holographic metaphor literally. There is considerable
observational evidence that the visual system has units of some type which
are tuned to various spatial frequencies -- which the holographic metaphor
suggests. This has now, in fact, become standard wisdom in the field (at
least, that's what I infer from a summary article on the visual system
which appeared in Science a year or three ago; it talked of spatial
frequency sensitivity as though it were the most natural thing in the
world).

On the other hand, there is considerable evidence that the
nervous system can compute the perceptual variables assumed by PCT. We don't
know how the nervous system actually computes these perceptions. But the
work of people like Hubel and Wiesel shows that neural signals can represent
variations in rather complex aspects of the environment.

I believe that, in a previous post, I suggested that the Hubel & Wiesel
observations can be reinterpreted in spatial frequency terms. The fact
that a neuroscientist sees the stimulus pattern as a line or an edge
doesn't mean that that is what the nervous system is sensitive to. "Line"
and "edge" are just intuitively obvious descriptions of certain visual
patterns. Those words don't necessarily designate what the pattern
essentially is.

toward a critical component of the model -- the input function. But then you
never say what's missing or how putting it into the model would help.

This is nonsense. I've written a paper (where have I heard that before?).
The literature it references is 15 years old and the paper's direct
relevance to PCT may be hard to see. But the paper wouldn't have been
written without PCT. But I really can't do any better than that paper. Not
without writing another, and another, and another. And I don't have that
kind of time. Though the time I've spent here certainly has influenced
(favorably, to my mind) how I will write those papers if and when...

* * * * *

Talking through paradigm differences sure is frustrating, isn't it?


********************************************************
William L. Benzon 518.272.4733
161 2nd Street bbenzon@global2000.net
Troy, NY 12180 http://www.newsavanna.com/wlb/
USA
********************************************************
What color would you be if you didn't know what you was?
That's what color I am.
********************************************************

[From Rick Marken (961021.1530)]

Me:

There is no evidence that a real nervous system can carry out computations
assumed by these [holographic, Fourier, inverse kinematic] models.

Bill Benzon --

There is considerable observational evidence that the visual system has
units of some type which are tuned to various spatial frequencies

I know. I meant that, as far as I know, there is no neurophysiological
evidence that the neurons do the mathematical operations (e.g., convolution of
the input signal with a set of sine and cosine signals) the way they
are done by the spatial frequency analysis model. It might be convenient to
say that neural networks act "as if" they do a spatial frequency analysis, but
it is quite another thing to show that they _do_ do a spatial frequency
analysis.
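
For concreteness, the operation the spatial frequency model assumes looks
something like this -- a minimal numpy sketch with a made-up luminance
profile, meant only to show the computation, not to claim that neurons
perform it this way:

import numpy as np

# A 1-D luminance profile across 256 pixels: a bright bar on a dark background.
x = np.arange(256)
luminance = np.where((x > 100) & (x < 156), 1.0, 0.0)

# "Doing" spatial frequency analysis in the literal sense: correlate the
# profile with sine and cosine gratings at a chosen frequency (cycles/image).
def component_amplitude(profile, cycles):
    n = len(profile)
    phase = 2 * np.pi * cycles * np.arange(n) / n
    c = np.dot(profile, np.cos(phase))   # cosine-phase correlation
    s = np.dot(profile, np.sin(phase))   # sine-phase correlation
    return 2 * np.hypot(c, s) / n        # amplitude of that frequency component

for cycles in (1, 2, 4, 8):
    print(cycles, "cycles/image:", round(component_amplitude(luminance, cycles), 4))

Whether any neuron literally performs these multiplications and sums is
exactly what is at issue.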

I suggested that the Hubel & Wiesel observations can be reinterpreted in
spatial frequency terms.

Right. This just shows that the neurophysiologists have no idea _how_ the
neural networks compute perceptual outputs as a function of sensory inputs.
They just know that neural networks _do_ compute (somehow) output perceptual
signals (the single unit activity) as a function of a set of input sensory
signals (the optical pattern presented to the retina). So the neuro-
physiologists are in the same position as the PCT modeller; they don't
know how perceptual signals are computed as a function of sensory (or lower
level perceptual) inputs and neither do we.

Talking through paradigm differences sure is frustrating, isn't it?

Rather like walking through the looking glass, I'd say;-)

Best

Rick

[From Bill Powers (961021.1705)]

Bill Benzon --

The fact
that a neuroscientist sees the stimulus pattern as a line or an edge
doesn't mean that that is what the nervous system is sensitive to.

Then we've got a problem, Houston. How do you account for the fact that the
neuroscientist sees a line or an edge, if his nervous system isn't sensitive
SOMEWHERE within it to lines or edges? ESP?

Incidentally, responding to spatial frequency test patterns does not imply
sensitivity to spatial frequencies as such, any more than an audio amplifier
is sensitive to audio frequencies as such. Spatial frequency test patterns
are used to determine characteristics of perceptual systems that make
spatial discriminations, but this doesn't mean that the signal emitted by
the perceptual system represents spatial frequency.

Best,

Bill P.

Rick Marken (961021.1530) sez:

I know. I meant that, as far as I know, there is no neurophysiological
evidence that the neurons do the mathematical operations (e.g., convolution of
the input signal with a set of sine and cosine signals) the way they
are done by the spatial frequency analysis model. It might be convenient to
say that neural networks act "as if" they do a spatial frequency analysis, but
it is quite another thing to show that they _do_ do a spatial frequency
analysis.

Well, I can live with that "as if." The "as if" they did a spatial
frequency analysis is quite a bit different from the "as if" in which we
start with edge detectors, and concatenate edges into lines, lines into
simple figures, simple figures into faces and monkeys and trees and
mountains, etc. These are two very different ways of going about image
recognition. One ought to be able to do tests which differentiate between
these two general strategies without getting hung up on the exact details.
We'll get to those later.


********************************************************
William L. Benzon 518.272.4733
161 2nd Street bbenzon@global2000.net
Troy, NY 12180 http://www.newsavanna.com/wlb/
USA
********************************************************
What color would you be if you didn't know what you was?
That's what color I am.
********************************************************

Peter Cariani (961022.1105 )

Some comments on neurophysiology and PCT, re the ongoing
discussion of the last two weeks:

To some degree, questions of how percepts are represented
in the brain (what is the nature of the perceptual signals?)
really are distinct from those concerning how such signals
are used behaviorally.

(I think) the places where issues of sensory coding and neural
information processing intersect with the primary concerns
of perceptual control theory concern questions
of whether each perceptual signal is separated out and
processed as a scalar quantity or whether there can be
multidimensional (vectoral) signalling and/or control
processes. And of course this has been an ongoing discussion
thread involving many of us on the CSGnet. On one level
it really doesn't affect the core tenets of CSG, whether
one has a scalar or a multidimensional control mechanism,
but on the other hand, it might radically simplify many
complex control problems. (I understand that any multi-
dimensional optimization problem can be decomposed into
many single-dimensional problems, but there may be good
reasons to deal with multiple signals at the same time --
we don't need to recapitulate this discussion).

As you all know I've been working on the neural representation
for the pitches of complex tones (our papers finally came out
in J. Neurophysiology in the Sept issue). What I see at the
level of the auditory nerve and cochlear nucleus is that the
interval distributions of populations of auditory neurons have
forms that are very similar to the autocorrelation function of
the stimulus (for frequencies below, say 3 kHz). It also turns
out that many, many complicated patterns of human pitch judgments
can be explained in terms of an autocorrelation-based analysis
of the stimulus waveform. It would not be a surprise then if
it turns out that we hear the pitches that we do because the
auditory system in effect is doing an autocorrelation analysis.
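
As a toy illustration of why an autocorrelation analysis predicts the pitch
one hears, consider a "missing fundamental" complex; the harmonics and
duration below are arbitrary choices of mine, not anything from our data:

import numpy as np

fs = 48000                          # sample rate (Hz)
t = np.arange(int(0.1 * fs)) / fs   # 100 ms of signal
f0 = 200.0                          # fundamental that is NOT physically present

# Complex tone containing only harmonics 3, 4, and 5 (600, 800, 1000 Hz).
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in (3, 4, 5))

# Autocorrelation of the stimulus waveform (positive lags only).
ac = np.correlate(x, x, mode="full")[len(x) - 1:]

# Find the largest peak beyond 2 ms, ignoring the trivial zero-lag region.
start = int(0.002 * fs)
lag = start + int(np.argmax(ac[start:]))

print("largest autocorrelation peak at %.2f ms" % (1000.0 * lag / fs))
print("period of the missing fundamental: %.2f ms" % (1000.0 / f0))

The largest peak beyond the zero-lag region falls at 5 ms, the period of the
absent 200 Hz fundamental -- the pitch listeners hear for such stimuli.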

I don't want to quibble about whether the all-order population
interval distributions are an "autocorrelation" representation --
suffice it to say that their form is such that they could be
used in that capacity provided that there are the appropriate
central neural mechanisms available to analyze them. I think
that there very well might be such mechanisms and I'm trying
to think through how these would work, and how one would test
to see if such mechanisms can be found in the auditory CNS.

The problem of "periodicity pitch" in the auditory system
is closely related to the problem of spatial frequency
sensitivity in the visual system -- the notion of Fourier
analysis in vision, I'm told, came from the auditory system.
Re: spatial tuning and Fourier analysis of visual form,
you've got to realize that these spatial-frequency tuning curves
are gotten by drifting gratings of different spatial frequencies
across receptive fields <at a given velocity>, so what is
being measured is really a <temporal> modulation transfer
function, where visual cortical neurons discharge more
frequently at particular ranges of (luminance) modulation
frequencies (e.g. 2-4 Hz). Similar kinds of bandpass
modulation transfer functions (BPMTF) are seen in the auditory
cortex, the best modulation frequencies generally being in
the 1-16 Hz range. (It's possible that these particular
"tunings" are just reflections of the recovery properties of
cortical pyramidal neurons and their general connectivity
patterns rather than being specifically tuned as "spatial
frequency" detectors.) In the auditory system there have been
suggestions that the auditory BPMTF units might represent
pitch, but very few of them cover the periodicity ranges
where most strong pitches of complex tones lie (e.g. 100-500 Hz).
Moreover, both the auditory and visual BPMTF units have
pretty broad periodicity tuning (an octave is typical), whereas
our acuities for spatial frequencies and pitch are far
better than this. When I look at what is actually going on
in (the published data on) these units, it appears to me
that the timing of spikes provides a much higher quality
representation of the time-structure of the stimulus.
(Both auditory and visual cortical neurons produce discharges
that are "locked" to their respective stimuli, and in both
cases it appears that the timing information is over an
order of magnitude better than the rate-based tuning of the
unit. If someone were to hold a gun to your head and tell you
to estimate what the periodicity of the stimulus is from the
responses of one of these units, by all means use the timing
information.)

If one thinks about what will happen in
these phase-locking visual cortical units ("simple cells")
when an image is swept across their corresponding patch
of retina, one realizes that there is a very basic space-to-
time transformation going on here and that the time patterns
of edges crossing the retinal receptive fields will mirror
the spatial patterns in the image (in the direction of motion).
If retinal units fire when edges cross receptive fields, the
spatial intervals between edges are "encoded" in the time
intervals between spikes. Essentially what one then has in the
spike trains in any given retinotopic patch of simple cells is
a running spatial autocorrelation of the image in a particular
direction. (Reichardt used these kinds of timing models to
explain motion detection in the fly, but the mainstream vision
community is very resistant to thinking about temporal coding
of visual form in any way. Models that use spatial tunings
generally assume that there is some time->place transformation
early on and that all subsequent operations are on patterns
of output rates of spatial frequency feature detectors.)
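
A bare-bones numerical version of that space-to-time transformation,
assuming (purely for illustration) one spike per edge crossing and invented
numbers:

import numpy as np

v = 10.0                                  # sweep velocity (degrees/second)
edges = np.array([0.0, 1.0, 1.5, 3.5])    # edge positions in the image (degrees)

# If a retinotopic unit fires once as each edge crosses its receptive field,
# the spike times are just the edge positions divided by the sweep velocity.
spike_times = edges / v

# All-order interspike intervals: every pairwise difference, not just adjacent spikes.
intervals = np.array([t2 - t1
                      for i, t1 in enumerate(spike_times)
                      for t2 in spike_times[i + 1:]])

# Scaling back by the velocity recovers the spatial intervals between edges --
# the positive lags at which the edge pattern's spatial autocorrelation has mass.
print("interspike intervals (ms):      ", np.round(1000 * intervals, 1))
print("implied spatial intervals (deg):", np.round(intervals * v, 2))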

        The upshot of all of this is that one could have an
autocorrelation-like representation of the visual stimulus that
parallels in many ways those that we (believe we) see in
the auditory system. Such a system essentially carries out
many operations of Fourier analysis, but uses time
domain representations and operations. There are theories of
form and texture perception that are based on spatial
autocorrelation analyses (William Uttal's models), and they
appear (to me) to fit well with the visual psychophysics.
Analogous temporal autocorrelation mechanisms
for pitch and spatial frequency (time and
spatial intervals) might explain why
we perceive "missing fundamentals" in both audition and
in vision (DeValois' book, Spatial Vision discusses the
missing spatial fundamental in vision). (One can do
spatial frequency matching of spatial harmonics f3-f5 to the
spatial fundamental f1, just as one can match the low pitch
of tone harmonics 3-5 to a tone at the fundamental. It does
appear that spatial frequency can be used as a percept for
control.)

As far as I know there is no consensus about how the vision
community thinks about visual representations, except maybe
that there are two main camps, the local feature detector
camp and the spatial frequency camp. (The temporal autocorrelation
account that I gave above would be off the map completely.)
To the degree that the organization of the system is nonlocal,
the system is "holographic", but even connectionist models
that use local feature detectors often assume various kinds
of "distributed" higher-level representations, so these
are "holographic" in a somewhat different way (as in a
distributed memory). Because almost nobody has actually found
units whose response properties have anything like the
specificity of our percepts and actions (does anyone have
counterexamples to this blanket statement?), in a sense
almost all current theories that I know of assume some
sort of distributed representation. In its broadest sense
then, the "holographic paradigm" has been integrated into
mainstream neuroscience through connectionist models.

Whatever you read out there, be critical --
if it were all so tidy and well-understood, we'd have viable
models for these phenomena already -- and we don't. (Don't
trust the "standard wisdom" of any of these fields -- they are
almost invariably laced with unsupported dogmas
that are tacitly held -- never trust what the textbooks
tell you in terms of functional explanations without first
scrutinizing the empirical basis for these claims and
without also thinking through the possible alternatives.
Critical textbooks that communicate the difficulty
of the problems and the "cognitive dissonances" of a
given field are few and far between. I know I don't need
to say these things to the PCT community
-- you all know how arbitrary the received scientific
opinion can be........)

That notwithstanding, I very much agree with Bill Benzon that
hypotheses about neural representations can and should be
tested, and that they are potentially very useful in understanding
the nuts and bolts about how neural control systems might
actually work. And it may be that while we are reaching some
understanding of what the brain is doing, we may very well
come up with new insights that could inform perceptual control
theory. Don't hold your breath because progress is painfully
slow, but I wouldn't rule it out entirely. To paraphrase,
there is much more in nature than we have dreamt of in our theories.

Peter Cariani

[From Bill Powers (961023.0645 MDT)]

Peter Cariani (961022.1105 )--

This post was full of nice integrative thoughts, Peter. Of course I always
have a few quibbles ...

(I think) the places where issues of sensory coding and neural
information processing intersect with the primary concerns
of perceptual control theory concern questions
of whether each perceptual signal is separated out and
processed as a scalar quantity or whether there can be
multidimensional (vectoral) signalling and/or control
processes.

I have never argued against the idea that there are multiple perceptions and
control processes at any given level of organization. But consider a simple
example. Suppose we have three control systems controlling position in x, y,
and z. I would say that these are three independent dimensions of control,
but anyone could combine x, y, and z into a single vector v and say that
this combination of systems is exerting control over a perceptual position
vector. The reference vector would be made of X, Y, and Z, where the capital
letters designate reference signals. In terms of formal representations we
would then have a vector perception v being compared with a vector reference
signal V to generate a vector error, and presumably a vector output. But how
would this work in terms of the actual control systems?

Suppose you wanted to control the direction of this vector perception
without changing its magnitude. Very simple, mathematically: you just vary
the direction of the reference vector without changing its magnitude. But
how would you do that?

You would have to change the three reference signals such that X^2 + Y^2 +
Z^2 remained constant, while X, Y, and Z were computed as proportional to
the direction cosines of the desired angle(s). In other words, you would
need a higher level of system to adjust the three reference signals in just
the right relation to each other. And these three reference signals would
still enter three independent control systems, each of which controls a
single scalar variable in a single dimension.

The vector notation IN NO WAY changes the physical organization of the three
control systems. All it does is hide the fact that the three independent
systems exist.

The question of vector versus scalar control is a non-issue. When we are
using vector (or matrix) notation, we are simply using a shorthand way of
describing a complex set of equations in which _every variable is a scalar_.
When you try to _apply_ a model cast in vector or matrix terms, you find
that the model has to perform every implied operation in complete detail --
particularly if you build it as a physical model. The real world operates
only in terms of scalars, quantities that have a single value at a single
moment.

This problem is closely related to the idea of levels of perception. The
three position signals x, y, and z are just that: three signals that exist
independently of each other. When we think in terms of a position _vector_,
we're implicitly introducing a higher level of perception: a perception of
magnitude, and two more perceptions of angle. There are many ways to
perceive in vector space, but if we perceive in terms of a magnitude and two
angles, we will have three potentially controllable variables. Magnitude
could be perceived by adding the squares of the three position signals from
the lower level. This sum is itself a scalar value, and could be compared
with a single scalar reference magnitude to yield an error signal. The error
signal could be connected to the X, Y, and Z reference inputs to control
magnitude. The connection might be rather complex, since the signs of the
effects have to change according to the combined signs of x, y, and z. This
would give us a higher level of control which does in fact control in terms
of magnitude (leaving direction uncontrolled).

In a similar way we could derive two perceptions of angle, each represented
by a scalar signal, and generate two more control systems which add their
outputs to the lower reference inputs so as to bring the two angles under
independent control.
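
To make this concrete, here is a minimal simulation of the arrangement just
described. The gains, the time step, and the simple integrating "plant" are
invented for illustration, so treat it as a sketch of the idea rather than a
tuned model:

import numpy as np

dt = 0.01
x = np.array([0.5, 0.2, 0.1])     # the three controlled position signals x, y, z
r = x.copy()                      # lower-level reference signals X, Y, Z
k_low, k_high = 5.0, 2.0          # integration gains (invented values)
R_mag = 4.0                       # higher-level reference for x^2 + y^2 + z^2

for _ in range(3000):
    # Higher level: one scalar perception (the sum of squares), one scalar error.
    p_mag = float(np.sum(x ** 2))
    e_mag = R_mag - p_mag

    # Its output adjusts the three lower references, with signs taken from the
    # current position so that the magnitude grows or shrinks radially.
    direction = x / (np.linalg.norm(x) + 1e-9)
    r = r + k_high * e_mag * direction * dt

    # Lower level: three independent scalar loops, one per position signal.
    e = r - x
    x = x + k_low * e * dt

print("final position:", np.round(x, 3))
print("final x^2+y^2+z^2:", round(float(np.sum(x ** 2)), 3), "(reference:", R_mag, ")")

Note that the higher system never touches the environment directly; all it
does is adjust the three lower reference signals.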

If we did this, you would find that the result is exactly the same as the
result expressed in vector or matrix notation -- except that every operation hidden
within the mathematical symbols is spelled out in full detail. The physical
system must ACTUALLY DO what the mathematical notation only IMPLIES.

So as I say, it's a non-issue. Whether you write out the equations and
represent every individual relationship explicitly or use a compact
notation, you're talking about exactly the same physical system. There's no
difference.

The main thing to notice here is the need for the higher level systems. If
you just have the three independent signals x, y, and z, you don't have
either a magnitude or a direction, except in your own perceptions. In the
system you're looking at, you simply have three variables, any of which can
vary independently of the others. These variables coexist at the same level,
but without the higher level they are not coordinated, they do not have any
relation to each other. THIS IS TRUE EVEN IF THE PERCEPTUAL AXES ARE NOT
ORTHOGONAL. As long as you have three non-collinear axes, any point in the
space is reachable. Constraints don't appear until you have higher-level
systems that perceive in terms of specific functions of the three variables.

Best,

Bill P.

Bill Powers (961021.1705 ) sez:

Bill Benzon --

The fact
that a neuroscientist sees the stimulus pattern as a line or an edge
doesn't mean that that is what the nervous system is sensitive to.

Then we've got a problem, Houston. How do you account for the fact that the
neuroscientist sees a line or an edge, if his nervous system isn't sensitive
SOMEWHERE within it to lines or edges? ESP?

"Line" and "edge" are simply designators the neuroscientist applies to the
stimuli s/he's presenting to the monkey (that also has electrodes implanted
in its brain). Having seen pictures of those stimuli, I think those
designators are quite reasonable, and a lot more direct than doing a Fourier
transform of those stimuli and using that transform as the designator.
Same with "salt" and "sodium chloride." If you dip your finger in a white
crystalline substance, taste it, and conclude that it's salt, I'll believe
you. But if I'm feeling pedantic I may want to inform you that it's really
sodium chloride (plus trace impurities). It seems to me you made a similar
distinction last week in discussing "fear of strangers" among infants, where
you suggested that what the experimenter designated as a stranger was
really just a person for whom the infant had no stored reference level.

The important point is that when "line" and "edge" are turned into
technical terms by following them with "detector" there is a strong
tendency to think about object recognition in a certain way -- syntactic
concatenation of simple and complex features using spatial (left, right,
up, down, etc.) and logical (and, or, not) operators (where a complex
feature is simply an object which already is such a concatenation). In
contrast, those of us who think of those small lines and edges as
high-frequency components of an image have a very different way of thinking
about object recognition (in the case of the experimental stimuli, if you
think of the stimulus image in relation to the visual field, then it is a
stimulus consisting exclusively of high-frequency components). Then Peter
Cariani informs us that there is an intermediate possibility (I'm reminded
of Tycho Brahe's solar system model in relation to the Copernican and
Ptolemaic models) where features are "gathered" into object schemas through
neural nets rather than syntactic construction.

So, imagine the following thought experiment: We have all the Thinking
Machines we want, and a staff of programmers who like nothing better than
putting in 100-hour weeks programming neural simulations. Let's build
three. Hubel recognizes images with a pure feature detection/syntactic
combination model. Wiesel recognizes images with a pure spatial frequency
scheme. And Husel employs a hybrid scheme. We present each of these
simulations with the stimuli H&W originally used with their monkeys and ask
them what they see (yep, we've also got natural language in these
simulations). Each one of them replies, "lines and edges." I'm not
surprised. The point is that there is no relation between how these
simulations verbally characterize the stimulus images and the process which
they employ in simulating vision & recognition.

I'll take this little fantasy a step further. David Hays and I have
developed a theory of abstract conceptualization which sees abstract
concepts as existing on various levels (which we call ranks), with a
different cognitive mechanism employed at each of these levels. In order
to get a simulation to verbally describe those stimulus objects as I have
been doing it will be necessary to add a level of abstract
conceptualization beyond that which is sufficient for the line/edge model.

Rick Marken (961021.2200) sez:

Bill Benzon said:

The "as if" they did a spatial frequency analysis is quite a bit
different from the "as if" in which we start with edge detectors,
and concatenate edges into lines, lines into simple figures, simple
figures into faces and monkeys and trees and mountains, etc.

Which are both quite a bit different from the "as if" in which there
are several hierarchically related _classes_ of perception, all levels
existing at the same time, with higher level classes existing only as a
function of lower level classes of perception: sensation perceptions
existing only if there are intensity perceptions; configuration perceptions
existing only if there are intensity and sensation perceptions, etc.

Let's invoke the distinction BP has made between what HPCT is modelling,
which I'll call functional performance, and what interested me, which is
the nature of the neural mechanisms which achieve that performance. So
HPCT is a model of functional performance while feature detection and
spatial frequecy analysis are two different accounts of how the functional
performance is achieved. "Pure" HPCT is indifferent to which of these two
classes of model is doing the job.

So I don't see it as a three-way choice, as your comment seems to imply.
To see it as a three-way choice is to make what philosophers call a
category mistake. Functional performance models are a different category
of model from neural computational model.

However, at the moment I sense default HPCT bias toward the feature
detection class of implementation schemes. At the very least that bias has
to be made explicit and differentiated from viable alternatives.

One ought to be able to do tests which differentiate between these two
general strategies without getting hung up on the exact details.
We'll get to those later.

These "general strategies" (spatial frequency analysis, feature detection,
hierarchical perceptual construction, etc) are models of possible mechanisms
of perception. The "exact details" of how these models work are the neural
processes that actually produce these perceptions.

It seems to me that you've been doggedly insisting that a major shortcoming
of PCT is that its perceptual model is a "black box"; PCT does not include
a detailed model of the neural processes that produce perceptions. Now you
are apparently insisting that we can understand perception without getting
hung up on the exact details of neural processing -- which is exactly the
approach to perception taken by PCT.

As James Dean might have said as he entered looking glass world:
"I'm so confused!".

These two classes of models are so very different that they lead to quite
different simulations and artificial systems and to different expectations
about what to look for in the brain. The feature detection model leads you
to look for neurons whose output signal "means" grandmother, little yellow
VW, the old oak tree, etc. The spatial frequency model leads you to think
that recognition signals will be carried by a population of neurons with
little likelihood that single neurons will signal definite objects. Then
we have the hybrid model Peter Cariani brought up. Discriminating between
it and the pure spatial frequency model may be difficult (I haven't given
it much thought).

Obviously, human behavior is a very complicated affair. No doubt we need
models within models, where choices among one class of models are logically
independent of choices among another class of models. In fact, we could
take the functional performance vs. implementation mechanism distinction
and apply it to feature detection schemes vs. spatial frequency schemes,
vs. hybrid schemes vs. the scheme which Peter said was way off everyone
else's map (which gives it a special place in my heart even though I don't
understand it). Let's just treat each of those 4 alternatives as a
specification of the performance of some mechanism. We can now inquire as
to the possible implementation alternatives for each of them. And so on,
perhaps down to quarks and leptons...and beyond. (Note that "neural nets"
is now clearly a general class of adaptive model which, however it may have
been inspired by real nervous systems, has an independent life. Thus we
can consider a neural net as a performance spec in a particular context and
an implementation model in another.)

So, at the "top" level of our epistemological choice tree we've got the
phenomena of human behavior. We have at least two functional models for
it, behaviorism & HPCT. Corresponding to each we have various possible
implementation models. It turns out that, at least on the HPCT side, the
implementation possibilities are so rich and so "far" from neural
microstructure and process that we have to make the function/implementation
distinction at least one more time to bring our thinking within range of
neural reality.

It just doesn't get any simpler as we go along.

later,

Bill B


********************************************************
William L. Benzon 518.272.4733
161 2nd Street bbenzon@global2000.net
Troy, NY 12180 http://www.newsavanna.com/wlb/
USA
********************************************************
What color would you be if you didn't know what you was?
That's what color I am.
********************************************************

[From Bill Powers (961023.1015 MDT)]

Bill Benzon --

"Line" and "edge" are simply designators the neuroscientist applies to the
stimuli s/he's presenting to the monkey (that also has electrodes implanted
in its brain). Having seen pictures of those stimuli, I think those
designators are quite reasonable ...

I think you misunderstand me. I'm not asking about different kinds of
models. I'm just asking about the way we experience the world. You can
evidently see a "line" or an "edge," because you can see pictures of "those
stimuli." But when you look around in a room, don't you see any objects, or
any edges where those objects end and something else begins, or any lines
like this __________________________________, or curves, and so on? These
aren't specially-prepared and formally-named "stimuli." They're just the way
the world looks -- so far, to everyone I've asked. Is it different for you?

Best,

Bill P.

[From Bill Powers (961023.1015 MDT)]

Bill Benzon --

"Line" and "edge" are simply designators the neuroscientist applies to the
stimuli s/he's presenting to the monkey (that also has electrodes implanted
in its brain). Having seen pictures of those stimuli, I think those
designators are quite reasonable ...

I think you misunderstand me. I'm not asking about different kinds of
models. I'm just asking about the way we experience the world. You can
evidently see a "line" or an "edge," because you can see pictures of "those
stimuli." But when you look around in a room, don't you see any objects, or
any edges where those objects end and something else begins, or any lines
like this __________________________________, or curves, and so on? These
aren't specially-prepared and formally-named "stimuli." They're just the way
the world looks -- so far, to everyone I've asked. Is it different for you?

No, it is not. But what of it? As you know, lots of attention has been
given to just how one can see lines and edges and objects and such, how a
perceptual system can construct such things out of the flow of physical
energies which impinge on the sensors. I'm interested in what kind of
account we need to explain why/how we experience the world we do given the
energies available to our sensors.

But in what sense are you interested in "the way we experience the world"?
Do you want to do some Husserlian phenomenology based on introspection? Do
you want to write poetry? Are you interested in getting some definitions
clear as a prelude to a technical discussion?

I don't understand the thrust of your initial question. Hubel & Wiesel see
lines and edges, you do, I do. What follows from that? Certainly we don't
see Fourier transforms of visual objects [unless you're looking at the
appropriate technical articles -- a number of years ago Byte did an article
about image processing art and showed pictures of the "Mona Lisa" in
various stages of processing (I think) along with images of the transforms
-- I found it fascinating to speculate about what aspect of the image was
represented by this or that point in the transform]. But the fact that we
don't see transforms doesn't mean the nervous system isn't computing them
and basing, e.g. recognition on them.

* * * * * *

A long time ago I found my way to the work of JJ Gibson, who was presented
to me as some kind of avant-garde perceptual psychologist what with his
ecological approach and all. I found his work fascinating & was especially
taken by his book on visual perception. Now, Gibson inveighed mightily
against the cognitivists with all their information processing, insisting
that it was all there in the perceptual flow, that the visual system just
naturally picked up the affordances the environment offered and that was
perception.

As I read this I realized that, in some sense, he was coming out of the old
behaviorist bag, that it's all there in the environment & nothing is going
on in the nervous system which we need to concern ourselves about. Now, I
do have some sympathy with some of his inveighing, but still I have to
think that the sensors and the nervous system do something and we ought to
think about it. He simply had no account of those sensors.

You certainly don't seem to be a closet Gibsonian; for example, your talk
of scalar quantities is more than a true-blue Gibsonian would tolerate --
too much like "information processing." . . . Have no idea where this is
going, it just occurred to me...


********************************************************
William L. Benzon 518.272.4733
161 2nd Street bbenzon@global2000.net
Troy, NY 12180 http://www.newsavanna.com/wlb/
USA
********************************************************
What color would you be if you didn't know what you was?
That's what color I am.
********************************************************

[Martin Taylor 961024 15:50]

Bill Benzon sometime or other

Certainly we don't
see Fourier transforms of visual objects [unless you're looking at the
appropriate technical articles -- a number of years ago Byte did an article
about image processing art and showed pictures of the "Mona Lisa" in
various stages of processing (I think) along with images of the transforms
-- I found it fascinating to speculate about what aspect of the image was
represented by this or that point in the transform].

Remember that a Fourier transform involves both phase and amplitude. What
you ordinarily see represented as a Fourier transform is usually just the
amplitude component. I suppose this is so because for audition the phase
is so much less important than the amplitude. But in vision, it is the
phase that is important, and that's usually not shown in the journal
pictures of Fourier transforms of scenes.

Many years ago, I was treated to a demonstration of phase and amplitude effects
in Fourier reconstruction of normal (i.e. grey-scale) scenes, by Jan-Olof
Eklundh (Swedish Royal Institute of Technology; more recently joe@bion.kth.se).

What Eklundh had done, among other things, was to do a Fourier analysis of
the scene, and then "lose" various parts of the transform by substituting
random numbers or flat (equal) values for the components in question. When
he did that with the phase components, the reconstructed image was barely,
if at all, recognizable as related to the original. I remember it as being
largely noise. On the other hand, if the amplitude components were all
equalized or replaced by random numbers (the latter is better), the
reconstructed image looked like a cartoon of the original. The grey values
were lost, but the shapes of objects were very clear, more or less in
outline.
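
The effect is easy to reproduce with a discrete Fourier transform. Here is a
minimal numpy sketch on a synthetic image -- not Eklundh's procedure or
pictures, just the same manipulation in miniature:

import numpy as np

rng = np.random.default_rng(0)

# A simple synthetic "scene": a bright rectangle and a dimmer disc on a dark field.
img = np.zeros((128, 128))
img[30:70, 20:60] = 1.0
yy, xx = np.mgrid[:128, :128]
img[(yy - 85) ** 2 + (xx - 90) ** 2 < 400] = 0.7

F = np.fft.fft2(img)
amplitude, phase = np.abs(F), np.angle(F)

# Keep the phase, replace the amplitudes with random values: the reconstruction
# still shows the outlines of the objects (the "cartoon").
phase_kept = np.real(np.fft.ifft2(rng.random(F.shape) * np.exp(1j * phase)))

# Keep the amplitudes, replace the phases with random values: mostly noise.
amplitude_kept = np.real(np.fft.ifft2(amplitude * np.exp(2j * np.pi * rng.random(F.shape))))

# Crude numerical check: correlation of each reconstruction with the original.
def corr(a, b):
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

print("phase kept, amplitude randomized: r =", round(corr(img, phase_kept), 3))
print("amplitude kept, phase randomized: r =", round(corr(img, amplitude_kept), 3))

Displaying the two reconstructions should show what Eklundh reported:
outlines survive when the phase is kept, and essentially nothing
recognizable survives when it is randomized.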

Now, if you think how a nervous system would compute a Fourier transform,
at least of a visual scene, you can see that the phase affects _which_
neurons have strong outputs and the amplitude affects _by how much_ the
strong ones exceed the weak. This is quite different from what happens in
audition, where the physical waveform progresses over time. If I remember
Eklundh's estimate, in vision about 80% of the information is carried in
the transform phase, and only 20% in the transform amplitude. This seems
to make sense if the transform actually is used as such by the nervous
system, because phase is computed directly, whereas amplitude can be
determined only in terms of the difference in output across phases of
neurons sensitive to the same spatial frequency. Differences are always
noisier than the values compared.

Anyway, I'm mystified as to why this discussion has so much emphasis on
_Fourier_ transforms, as if they were the only ones to be considered. At
the very least, the transform has to be localized, so you are talking
about windowed transforms. And "lines" and "edges" are very close to being
localized cosine and sine components of narrowly windowed transforms.

The nervous system is very good at doing these kinds of transform--it's
almost as if that's what it was designed for. By transform, I mean the
rotation of an input space according to some orthogonal set of basis vectors.
Lines and edges may form an (incomplete) basis set, just as windowed
Fourier transforms may do. And there's no reason to dismiss the possibility
that lots of vectors are available, the set used preferentially at
any moment being selected according to the incoming data (I showed how
lateral inhibition might have this function, some 20+ years ago, and it
is not at all unreasonable to guess that these "mysterious" corticofugal
signals might have some such function). Maybe the visual system does
wider-windowed Fourier transforms on repetitive visual patterns, and
lines and edges on hard-edged objects, at the same time.
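
A toy version of what I mean: a small Walsh/Hadamard-style orthonormal basis
over a 4-pixel window, in which one basis vector is edge-like and another
line-like. The construction is my own and purely illustrative:

import numpy as np

# An orthonormal basis over a 4-pixel window (rows are the basis vectors):
# overall level, an edge-like step, a line-like bar, and a finer alternation.
B = np.array([[ 1,  1,  1,  1],
              [ 1,  1, -1, -1],   # edge-like
              [ 1, -1, -1,  1],   # line-like
              [ 1, -1,  1, -1]], dtype=float) / 2.0

print(B @ B.T)                    # identity: the basis is orthonormal

patch = np.array([0.2, 0.2, 0.9, 0.9])   # a 4-pixel patch containing an edge

coeffs = B @ patch                # coordinates of the patch on the rotated axes
print("coefficients:", coeffs)    # beyond the mean, only the edge-like term is nonzero
print("reconstruction:", B.T @ coeffs)   # exact recovery: the rotation loses nothing

The point is only that projecting onto such a basis is information-preserving,
whichever family of basis vectors the system happens to use.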

Anyway, my bottom line is that I can't see what the fuss is about, in
arguments as to whether the visual system at its lower levels deals in
lines and edges, or in Fourier transforms. The difference is largely in
the phase, and with the windowing that must be there, that makes the
distinction one of the analyst's interpretation more than of what signals
come from what neurons when exposed to what visual patterns.

"What matter do you read, My Lord?" "Words, words, words."

Martin

Martin Taylor 961024 15:50 sez:

Anyway, my bottom line is that I can't see what the fuss is about, in
arguments as to whether the visual system at its lower levels deals in
lines and edges, or in Fourier transforms. The difference is largely in
the phase, and with the windowing that must be there, that makes the
distinction one of the analyst's interpretation more than of what signals
come from what neurons when exposed to what visual patterns.

Fine with me. But as I've said before, the line of thinking which begins
with edges and lines tends to think, for example, that basic recognition of
a face involves some syntactic combination over a nose, two eyes, a mouth,
a jaw, hair, ears, and so forth. Whereas the line of thinking which begins
with spatial frequency components gets you your face recognition more or
less in a single (massively parallel) computational step without some
syntactic combination of components (though that certainly has to be added
to the model, but not for basic recognition; that's an elaboration).

I'm not at all wedded to Fourier transforms. I'm wedded to a general way
of thinking about how a (real) neural net recognizes things.

* * * * *

Now, imagine a scene where we have a bunch of edges, all of them more or
less optically identical. However, there is one set of edges which
separates an object from the background. It would be useful to have a system
which enhances that edge more than the others; that would be a useful thing for
corticofugal connections to the retina to be doing (where the output is
being driven by a stored schema for the foreground object). Any reaction?


********************************************************
William L. Benzon 518.272.4733
161 2nd Street bbenzon@global2000.net
Troy, NY 12180 http://www.newsavanna.com/wlb/
USA
********************************************************
What color would you be if you didn't know what you was?
That's what color I am.
********************************************************

[From Bruce Abbott (961024.1735 EST)]

Bill Benzon (24 Oct 1996 18:12) --

Now, imagine a scene where we have a bunch of edges, all of them more or
less optically identical. However, there is one set of edges which
separates an object from the background. It would be useful to have a system
which enhances that edge more than the others; that would be a useful thing for
corticofugal connections to the retina to be doing (where the output is
being driven by a stored schema for the foreground object). Any reaction?

Sounds reasonable -- any evidence for it?

I'm reminded of the old hi-contrast black-and-white photo showing what
appears at first to be rather shapeless blobs, but which turns out to be a
picture of a cow. Once you see that cow, you can't NOT see the cow.
Something has changed, but what?

Regards,

Bruce

Bruce Abbott (961024.1735 EST) sez:

Bill Benzon (24 Oct 1996 18:12) --

Now, imagine a scene where we have a bunch of edges, all of them more or
less optically identical. However, there is one set of edges which
separates an object from the background. It would be useful to have a system
which enhances that edge more than the others; that would be a useful thing for
corticofugal connections to the retina to be doing (where the output is
being driven by a stored schema for the foreground object). Any reaction?

Sounds reasonable -- any evidence for it?

Could be, but I'm not aware of any. I was just making a suggestion in line
with what I would do if I were designing a nervous system from the ground
up.

I'm reminded of the old hi-contrast black-and-white photo showing what
appears at first to be rather shapeless blobs, but which turns out to be a
picture of a cow. Once you see that cow, you can't NOT see the cow.
Something has changed, but what?

Well, remember my story about the first time I saw a porno flick, but had
no verbal clues about the nature of the film? It appeared to be moving
grey blobs for some number of seconds. Then, all of a sudden...there they
were.

In the case of the "hidden" cow we need go no higher than configuration,
while the porno flick involves sequences (and probably something higher to
extract the "causal" (as in the type of perceptual causality identified by
Michotte) relationship between the two moving bodies).

In both cases, the input was a blur until stored patterns were "triggered"
and thus able to "make sense of" the input. Those stored patterns then
become the reference signals which propagate down the stack, ultimately, it
would seem, to the retina itself.


********************************************************
William L. Benzon 518.272.4733
161 2nd Street bbenzon@global2000.net
Troy, NY 12180 http://www.newsavanna.com/wlb/
USA
********************************************************
What color would you be if you didn't know what you was?
That's what color I am.
********************************************************

Peter Cariani (961022.1105)

Comments re Powers (multi-D control), Taylor (phase information)

[From Bill Powers (961023.0645 MDT) on multidimensional control]

I think I said originally that I thought that multidimensional
optimization problems can usually be decomposed into a set of
1-D problems (maybe there are some bizarre functions in which
one cannot do such a decomposition, but I don't think that
this is likely to be the case here). I agree with you, that it's a
non-issue on a basic level of control theory; where there is
some question, I think, would concern how individual perceptual
signals are processed in the brain -- whether they are separated
out, processed separately, and recombined (in localized processing
modules that operate on scalars) or whether the signals are
embedded in different aspects of spike trains and population
discharge patterns, such that the signals are always mixed together
such that multiple aspects of the higher-D signal would be processed
in the same structure. I'll try to think of a good simple, concrete
example.......

A major motivation for looking at higher-dimensional processing is
that organizing an information processing system around scalar signals
requires some other means of keeping track of each signal-type (e.g.
is the signal a reference signal for X-position? is it an intensity
signal for brightness? is it a pitch signal?). In many of our devices
we simply use different wires for different signal-types, and we make
sure that the device is wired up properly so that the "semantics" of
the various signals are kept straight. (For other devices, like those
ubiquitous radio-controlled toy cars, there are frequency-based signals
that multiplex different signals (turn R/L, move forward/back) together
and then these signals are demultiplexed and then dealt with in the
usual way, with wires.) This is fine for relatively simple systems where
one knows in advance what the signals will be and there aren't too many
signal-types to contend with. But I think the problem of "specific wiring"
gets much more complicated in the brain, adaptive adjustment of connections
notwithstanding, because of the high variety of signal-types involved. If,
on the other hand, one can neurally represent multidimensional signals,
then one aspect of the signal can convey the signal-type (its "semantics",
what kind of signal it is) and another can convey its magnitude (scalar).
If one can do this, then one can send signals which contain their own
signal-type, so that signals of compatible types (e.g. a sensory input
signal and its corresponding reference signal) can be processed together
and signals of incompatible types (say, coding for pitch & color) can be
kept separate (unlike in a scalar connectionist system, the signals do
not have to interact destructively). If signals contain their own signal-type,
then broadcast strategies for getting signals to various places can be
used, such that particular "labelled lines" are not needed. It then doesn't
matter which exact lines the signals are coming in on.

Keeping the variables straight in a perceptual control model isn't difficult
because we have symbols and symbol-manipulating systems that do most of the
work for us, but handling the myriad of signal-types in the brain is a
nightmare if one has to do it with specific point-to-point connections, not
to mention the problems of getting the signals all to the right places at
the right times (so they can interact). If one can have multidimensional
signalling (e.g. temporal codes), then this terrible global "wiring problem" is
greatly ameliorated. One can then use adaptive wiring mechanisms within
neural assemblies and populations to sort out the different signal types
(by choosing particular sets of delays and/or weights on connections),
rather than to solve the combinatorially-worse problem of routing all the
signals to where they need to go. So this is a global broadcast system
with local connectionist (time-delay network) processing.
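
To caricature the contrast (a toy illustration of the bookkeeping problem
only, not a proposal about mechanism):

# Toy contrast between "labelled lines" and self-labelled broadcast signals.

# Labelled lines: meaning is fixed by the wiring; the receiver must be connected
# to exactly the right source for each quantity it needs.
pitch_wire = 220.0          # only the wiring says this number means "pitch (Hz)"
x_position_wire = 0.35      # and that this one means "x position"

# Broadcast with self-labelled signals: every message carries its own type,
# so a receiver can pick out what it needs regardless of which line it came in on.
bus = [("pitch", 220.0), ("x_position", 0.35), ("x_reference", 0.50)]

def receiver(wanted_types, messages):
    """Select only the signal types this unit knows how to combine."""
    return {t: v for t, v in messages if t in wanted_types}

x_controller_inputs = receiver({"x_position", "x_reference"}, bus)
error = x_controller_inputs["x_reference"] - x_controller_inputs["x_position"]
print("x error:", error)

The point is only that once a signal carries its own type, which line it
arrives on stops mattering.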

I know that PCT and many neuroscientists assume that a hierarchy of
signal processors can solve this problem by reducing the number
of signal-types that are used at any given processing step.
Maybe the worst case for these kinds of theories is in semantic
processing in language, where words and phrases can activate potentially
many different kinds of semantic linkages, and the brain has to decide
one way or another which ones are relevant. How can a hierarchical
processing scheme possibly deal with this?

I'll try to be more concrete about a model for how this could all be put
together; I'll let you all know when I have a good demonstration ready.
(It will take a long time........)


--------------------------------------------------------------------------
[Martin Taylor 961024 15:50] said:

Remember that a Fourier transform involves both phase and amplitude. What
you ordinarily see represented as a Fourier transform is usually just the
amplitude component. I suppose this is so because for audition the phase
is so much less important than the amplitude. But in vision, it is the
phase that is important, and that's usually not shown in the journal
pictures of Fourier transforms of scenes.

In the running autocorrelation scanning representation that I outlined
earlier in the week, phase information is preserved in the relative
times of arrival of spikes across retinotopic channels.
Exactly when a given spike is produced in a channel (latency)
tells you when a particular feature, such as an edge,
crossed the receptive field. The time pattern of spikes in each channel
(all-order interspike interval distribution) is the spatial autocorrelation
in the direction of motion. So in a running autocorrelation, phase is
preserved on the time axis, frequency on the delay axis.

While it is true that phase does not matter much in audition for stationary
sounds (pitch, timbre), I think that this does not hold for nonstationary sounds.
Arguably, when phase relations are kept constant, the auditory system groups
the whole pattern together, whereas when these relations are constantly
changing, the auditory system separates their constituents. An example is
a pair of vowel formant harmonics (e.g. F1=700 Hz, F2 = 1500 Hz)
in which the harmonics have either the same or different fundamental
frequencies (F0 = 100 vs. 110 Hz). When they have the same F0 they fuse into
one natural-sounding 2-formant vowel sound; when they have different F0's,
it sounds like there are two very artificial objects. In the first case
phase relations between the two sets of harmonics are preserved and the
global time pattern is the same from F0-cycle to cycle; in the second
phase relations between the two sets of harmonics are changing (every 10 ms)
and there are two constant global time patterns that correspond to each
set of harmonics.
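
A quick numerical version of that example (a sketch; the sampling rate,
formant bandwidths, and the periodicity check are arbitrary choices of mine):

import numpy as np

fs = 32000
t = np.arange(int(0.2 * fs)) / fs

def harmonic_group(f0, center, width=200.0):
    """Sum of the harmonics of f0 that fall within +/- width of a formant center."""
    ks = [k for k in range(1, 40) if abs(k * f0 - center) <= width]
    return sum(np.sin(2 * np.pi * k * f0 * t) for k in ks)

def cycle_to_cycle_similarity(x, period_s):
    """Correlation of the waveform with itself one period later."""
    n = int(round(period_s * fs))
    return float(np.corrcoef(x[:-n], x[n:])[0, 1])

# Same F0 for both "formants": the global time pattern repeats every 10 ms.
fused = harmonic_group(100.0, 700.0) + harmonic_group(100.0, 1500.0)

# Different F0's (100 vs. 110 Hz): the combined pattern keeps changing from
# one 10-ms stretch to the next, though each group alone is still periodic.
split = harmonic_group(100.0, 700.0) + harmonic_group(110.0, 1500.0)

print("same F0, 10 ms shift:      ", round(cycle_to_cycle_similarity(fused, 0.010), 3))
print("different F0's, 10 ms shift:", round(cycle_to_cycle_similarity(split, 0.010), 3))

With the same F0 the 10-ms pattern repeats exactly; with different F0's the
combined pattern changes from cycle to cycle, which is the cue I am suggesting
the system uses for fusion versus segregation.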

Phase relations for lower frequencies are also important, as in speech sounds.
J.C.R. Licklider in the 1940's studied the intelligibility of speech that was
1) peak clipped (preserving zero-crossings while degrading amplitude information)
and 2) center-clipped (destroying zero-crossing information). Peak clipping
hardly affects intelligibility, while center-clipping very rapidly degrades it.
Jan-Olof Eklundh's experiments that you outlined seem very reminiscent of
Licklider's.

I absolutely agree that sinusoids need not be the basis of all of these
representations, that there are other kinds of short-time representations
that will do just as well. (Auto- and cross-correlation mechanisms do not
automatically presuppose any particular set of basis functions.)

I do think that there are basic commonalities between auditory and visual
computation, but I think the analogy runs the other way than is commonly
supposed -- that most of the information is in the time domain, and most of
the "higher level" operations concerning scene analysis (fusion/binding,
scene segmentation) fall out of "low level" properties of time domain
representations. I think that the basic "bottom-up" design of
biological sensory systems is much more simple, elegant, flexible,
and powerful than the top-down design strategies that we have
generally incorporated into our machines thus far.

Peter Cariani
peter@epl.meei.harvard.edu