REFERENCE:
VISUAL INPUT FUNCTIONS:
[From Erling Jorgensen (2000.02.13.0130 CST)]
Dan Palmer (2000.2.12.1658)
My understanding of PCT is that "the visual image" does get sent inward in the
form of first-order intensity signals which are combined into various
higher-level "representations" or signals and compared with similarly packaged
reference signals.
This reply is not about ecological psychology, and the specifics of these
comments may or may not interest you, Dan. But I wanted to point out a
reference which suggests both a working model and some of the transfer
functions that might be involved physiologically in the visual system.
Last summer I was reading a book by Vadim D. Glezer, a Russian
researcher, called _Vision and Mind: Modeling Mental Functions_.
(Mahwah, NJ: Lawrence Erlbaum Associates, 1995). It was heavy going,
and many times I could only follow the main contours of his research,
not all the details. But he seemed to be deriving approximations of
visual input functions, using them in working models, and comparing
the results to experimental data.
In Hierarchical PCT we have Bill's proposals about different classes
of perceptions. But except for the lowest levels, and occasional other
suggestions (such as Martin's flip-flop mechanism of categories, or Rick's
logical functions), we have very few specific proposals about how to
model perceptual input functions (pif's) corresponding to recognizable
perceptions. I know I have wondered, ever since running across a
schematic proposal by Warren McCulloch years ago, about how
"configurations" could be simulated. More recently Bill has posted
(2000.01.24.1036 MST) about acceleration, velocity, and position,
and an image sharpening procedure that uses "optically analogous
quantities" to those integrated perceptions, (although I have a hard
time visualizing what those correspond to).
As for Glezer, I find his math beyond me, but it is intriguing that he
suggests actual weighting functions for receptive fields in the retina
(p. 4), lateral geniculate nucleus (p. 13), and striate cortex (p. 28f.), and
an algorithm for texture segregation in the peristriate and prestriate
cortices (p. 127f.). He also ties these areas to different types of
visual processing. I believe he even presents a comparison process
for discrimination that sounds somewhat like a control loop, with
equations providing a best fit to experimental data (p. 108). He also
sums up his findings in an extended series of equations, defining "the
algorithm of invariant image coding" (pp. 219-221), which from the
verbal descriptions sounds remarkably like what we might call a
configuration.
I certainly do not think Glezer's level of detail is necessary for doing PCT
modeling of visual processes. Weighted sums, for instance, can define a
generic "sensation," which is sufficient for examining some elementary
forms of control. In the same vein, there are a _variety_ of algorithms for
simulating color on a computer; the one chosen certainly need not be the
precise way the human nervous system implements color. However, if
someone wanted to make a close copy of what the visual system is doing,
and fit it into an elaborate control model of a person, Glezer's work could
be a valuable resource.
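
To make the weighted-sum notion of a generic "sensation" a little more
concrete, here is a toy sketch in Python (entirely my own illustration,
with made-up weights and intensities, not anyone's published model): a
perceptual input function that returns a second-order signal as a
weighted sum of first-order intensity signals.

import numpy as np

# A generic PCT-style perceptual input function: the "sensation" is just
# a weighted sum of first-order intensity signals.  (Toy values only.)
def sensation(intensities, weights):
    return float(np.dot(weights, intensities))

intensities = np.array([0.2, 0.9, 0.4])   # hypothetical first-order signals
weights = np.array([0.5, 1.0, -0.3])      # hypothetical input weights
print(sensation(intensities, weights))    # one scalar perceptual signal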
Partially for the sake of the CSGNet Archives, and partially to guide and
whet the appetite of anyone who would like to pursue it, I have reproduced
below (with occasional insertions) quite a number of excerpts from Glezer's
book. The book is not exactly for the faint of heart, but I think it does
provide some mathematical and physiological bulk to the perceptions we
frequently talk about.
------------------------------------------------------------------
Excerpts [with comments and clarifications in brackets] from --
Glezer, V.D. (1995). _Vision and Mind: Modeling Mental Functions_.
Mahwah, NJ: Lawrence Erlbaum Associates.
[One of Glezer's chief goals is to present a model of how modules in the
visual cortex work, built up from the organization and structure at lower
levels.]
"The neurons of the retina and the LGN [lateral geniculate nucleus] measure
the integrated light energy in the central summation zone of their receptive
fields. The existence of an inhibitory periphery causes the neurons at this
level to carry out a series of preliminary processing operations: separating
the signal from surrounding noise, emphasizing contours and high spatial
frequencies, and producing spatial and temporal decorrelations of the image.
... Because of the appearance of separate on- and off-systems, a foundation
is created for the transition to local spectral analysis at the next level,
in the striate cortex." (pp. 1-2)
"Because of the algebraic summation of the spatially organized excitatory
and inhibitory process, the resulting profile has the form of _difference
of Gaussian_ (DOG). ... What operations are performed by such a retinal
construction? One of the operations the retina performs is retinal
adaptation. The purpose of adaptation is to keep the visual response the
same when the level of illumination changes; in other words, adaptation
allows invariant description, despite changes in illumination." (p. 4)
[This sounds like sensation being created through weighted sums, which
can be kept constant despite changes in intensities (i.e., illumination).
Glezer goes on to describe figure-ground distinctions (creation of
configurations?), which are derived from transformations of sensations
such as texture, depth, and color.]
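
[To make the DOG idea concrete, here is a small illustrative sketch in
Python -- strictly my own toy construction, not Glezer's -- of a
one-dimensional difference-of-Gaussians weighting function. The sigma
values are arbitrary; the point is only that such a profile responds
near zero to uniform illumination but clearly to local contrast, which
is the adaptation and contour-emphasis behavior described above.]

import numpy as np

def dog_profile(x, sigma_center=1.0, sigma_surround=3.0):
    # Narrow excitatory center minus a broader inhibitory surround.
    center = np.exp(-x**2 / (2 * sigma_center**2)) / sigma_center
    surround = np.exp(-x**2 / (2 * sigma_surround**2)) / sigma_surround
    return (center - surround) / np.sqrt(2 * np.pi)

x = np.linspace(-10, 10, 201)
weights = dog_profile(x)

uniform = np.ones_like(x)                 # flat illumination
bar = (np.abs(x) < 1.0).astype(float)     # narrow bright bar over the center
print(np.dot(weights, uniform))   # near 0: no response to uniform light
print(np.dot(weights, bar))       # clearly positive: responds to local contrast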
"The RFs [receptive fields] at this level [the striate cortex] of the visual
system form a two-dimensional lattice of spatial-frequency filters. ...this
operation may be defined as a segregation of the figure from the background.
This operation is performed by nonlinear cells, which measure piecewise
power spectra and extract the areas filled in by homogenous texture; by
directional cells, which extract the areas composed of elements moving in
the same direction; by binocular cells, which extract the areas lying at
different depths; and by color cells." (pp. 15-16)
[For those who purchase the book, helpful schematic diagrams of
receptive fields at various levels occur on pages: 2, 12, 27, 48, 52,
76, 94, and 167.]
[Glezer next describes the columnar modules of the visual cortex.]
"Columnar organization is important at higher levels of the visual system.
It allows for the combining of two antagonistic types of description:
retinotopical (point to point) and spectral (distributed). Columnar
organization creates the foundation for local spectral analysis in the
cortex." (p. 61)
"the module is a device that performs a Fourier analysis of the image...
which I call the model of modules." (p. 64)
"The weighting functions of the cells of the module form a harmonic
[logarithmic] series: 1, 1.41, 2, 2.83, 4. The weighting functions of
each harmonic are out of phase 0, 90, 180, and 270 [degrees]. The
modules of different size form a lattice that subserves the piecewise
Fourier analysis of images of different size and position in the field of
vision." (p. 88)
"The modules represented in Fig. 3.14d as columns actually provide
identical spectral descriptions of an object, irrespective of its position
and size (i.e., of its localization in three-dimensional space)." (p. 86)
[This sounds like configurations which do not change despite changes
in their constituent perceptions. At a later point, Glezer describes the
process this way -- ]
"A special meaning is attributed to one part, the figure, which we refer
to as being extracted from the background. ... We describe these as
areas of identical brightness, color, or texture; I refer to these as
_subimages_. In this term, I also include areas composed of elements
moving in the same direction at identical speed, or, in the case of
binocular vision, areas composed of elements with identical binocular
disparity. The process of extracting subimages is referred to as _primary
segmentation_." (p. 118)
[Glezer suggests that the harmonics in the Fourier analysis are behind
such abilities as detection and recognition.]
"It is important to stress that narrowly tuned filters appear in the case
of recognition. ... The 1st harmonic with a broad bandwidth is enough
for detection. The narrowly tuned harmonics are needed for recognition.
... Ginsburg (1976) studied the effect of filtering high frequencies on
the recognition of individual faces, and showed that 3 to 4 harmonics
per image are enough for image recognition." (p. 102)
"I want to stress that recognition is performed by harmonics; that is,
by relative frequencies, not by absolute frequencies. We can observe
an image from a distance and still recognize the object if 4 to 5
harmonics are kept (assuming, of course, that these frequencies can
be seen at this distance)." (p. 104)
[A later summary characterizes it this way -- ]
"The foundation for the appearance of some of the simplest types of
invariance are created by the organization of the modules of the
striate and prestriate cortices. The modules are chosen according
to the size and position of the image on the retina, and produce
identical patterns of excitation in the cells of the module, regardless
of the size and distance of the object, because the harmonic composition
of the modules is identical. The spectrum of the image is invariant with
transformations of size, distance, and position, but the number of the
module gives information about these changes." (p. 146)
[Glezer also tries to treat how such information is used by higher visual
processes, although not with the same detail and quantification of the
earlier processes. It is interesting that describing relationships and
classifying (which also appear in the HPCT taxonomy) are two of
the processes he deals with. A sample of this treatment follows.]
"perceiving a complex figure as a whole and describing the details
and spatial relations between them are performed by different
mechanisms. ... According to clinical data, these mechanisms are
localized in different areas of the visual cortex. The classification
mechanism is located in ITC [inferotemporal cortex], and the
mechanism for the description of spatial relationships is located in
the PPC [posterior parietal cortex]. In man, a lesion in the ITC
causes visual object agnosia (i.e., nonrecognition of objects...),
and a lesion in the PPC causes spatial agnosia." (p. 145)
"Patients suffering from object agnosia caused by lesions of the
ITC cannot recognize an object presented to them. They do not
recognize a pen or a comb, but describe them as long narrow
objects...; that is, they perceive their spatial characteristics."
(pp. 152-153)
[Finally, for those with the requisite mathematical background,
a precise and elaborate "algorithm of invariant image coding,"
with equations for each step from the retina to his model of a
module in the striate cortex, is presented by Glezer on pages
219-221. I believe his "image code, invariant to position and
size (position in depth)" (p.221) is closest to what we in CSG
might mean by the term visual configuration.]
All the best to anyone who might want to pursue this further!
Erling Jorgensen