error as memory address

[From Bruce Nevin (2003.05.27 11:48 EDT)]

Bill Powers (2003.05.24.0843 MDT)--

The error goes to a
steady value just large enough that, when amplified and turned into action,
it is just sufficient to maintain the error at that size

I know this is the standard view.

What of error signals as associative memory addresses, and memory as the proximal source of reference signals? (B:CP pp. 217f) If that concept is still part of the model, I don't understand how the magnitude of the error signal is relevant. The kind of signal (location, context) determines what memory signals are addressed.

Maybe you replied to my earlier query about this and I missed it. There's certainly been a lot of traffic on the net of late.

         /Bruce Nevin

[From Bill Powers (2003.05.27.1032 MDT)]

Bruce Nevin (2003.05.27 11:48 EDT)--

>What of error signals as associative memory addresses, and memory as the
>proximal source of reference signals? (B:CP pp. 217f) If that concept is
>still part of the model, I don't understand how the magnitude of the error
>signal is relevant. The kind of signal (location, context) determines what
>memory signals are addressed.

You bring up a critical difficulty with the idea of memory addressing being
done by analog error signals. An analog (or continuous-variable) control
system _must_ have a continuously variable error signal in order to work at
all. And such systems should also receive continuously variable reference
signals, though they would work with piecewise constant reference signals
(a set of discrete reference signal values).
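The steady-state behavior described here can be sketched in a few lines (a toy proportional loop; the gain, disturbance, and environment model are illustrative, not from the original post):

```python
# Toy proportional control loop (illustrative constants, not from the
# original post). The reference is piecewise constant, yet the error
# remains a continuous variable; with a steady disturbance the error
# settles at a small nonzero value, as described above.

def control_step(perception, reference, gain):
    """One step of an analog loop: amplified error becomes the action."""
    error = reference - perception
    output = gain * error
    return output, error

perception = 0.0
errors = []
for step in range(600):
    reference = 1.0 if step < 300 else 3.0   # piecewise-constant reference
    output, error = control_step(perception, reference, gain=50.0)
    disturbance = 0.5                        # steady disturbance
    # crude first-order environment: perception moves toward output + disturbance
    perception += 0.01 * (output + disturbance - perception)
    errors.append(error)

# a small, steady error remains -- just enough to oppose the disturbance
```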

I think we have to conclude that communication between levels of analog
control systems does not use memory as an intermediary. Your point taken,
and it calls for a revision of the memory aspects of the model (even though
they are still only proposed parts of the model until we can think of
experiments to test them).

To the extent that higher systems use discrete variables, the
memory-addressing scheme can work because all that must be specified is
which lower-level reference signal is to be employed, not how much of it.
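A toy illustration of this addressing scheme (all names and values hypothetical): the discrete symbol specifies which stored reference to use, and the analog magnitude comes from the stored value, not from the symbol.

```python
# Hypothetical sketch: a discrete symbol is the memory address; the
# analog magnitude comes from the stored value. All names and numbers
# here are invented for illustration.

reference_memory = {
    "simmer": 2.0,   # stored analog reference settings
    "boil":   4.0,
    "sear":   9.0,
}

def retrieve_reference(symbol):
    """Discrete address -> stored analog reference signal."""
    return reference_memory[symbol]

setting = retrieve_reference("boil")   # the lower system receives 4.0
```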

On the way up the hierarchy, there must be a level where sets of analog
variables are transformed by perceptual input functions into discrete
variables, for example by forming classes or categories and naming them.
Perhaps we can now see that a corresponding conversion in the other
direction has to take place at the same level, whatever it is. I can think
of one simple example: setting the flame on a gas burner of a stove. After
some experience, I have found that once the water for the eggs boils, I
have to set the control to indicate 4 (out of 10) to maintain the boil. The
control turns continuously with no detents, so I have to turn the symbolic
memory "about 4" into a specific appearance of the control knob in relation
to the index mark, to use as a reference against which to compare the
actual appearance of the knob. The continuous analog system that controls
the knob rotation then brings the visual look of the knob into congruence
with the reference condition.
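The knob example might be sketched as follows (illustrative constants; assuming a 0-10 dial spanning 270 degrees of rotation):

```python
# Sketch of the knob example (illustrative: a 0-10 dial assumed to span
# 270 degrees). The symbol "about 4" becomes a target angle, and an
# analog loop turns the knob until the seen angle matches it.

def symbol_to_angle(setting, max_setting=10, max_angle=270.0):
    """Discrete dial symbol -> analog reference angle in degrees."""
    return setting / max_setting * max_angle

def turn_knob(seen_angle, reference_angle, gain=0.3):
    """Analog control: rotate in proportion to the visual error."""
    while abs(reference_angle - seen_angle) > 0.5:
        seen_angle += gain * (reference_angle - seen_angle)
    return seen_angle

target = symbol_to_angle(4)                       # "about 4" -> 108 degrees
final = turn_knob(seen_angle=0.0, reference_angle=target)
```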

Going the other way, when I experienced the analog rotation of the knob
that maintained the water at the right degree of boiling, I thought "about
4", which was my discrete-variable symbolic representation of the
continuously-variable visual picture. I remembered not that visual picture
but the phrase "about 4". When I get to the point in the sequence of
perceptions for boiling eggs where the flame should be lowered, the phrase
"about 4" comes up and, presumably by addressing specific analog
recordings, produces the specific analog reference setting needed -- even
if my attention is elsewhere.

But from there on down to generating the required signals from my Golgi
tendon receptors, the levels communicate by passing analog signals each way.

So how does the story strike you?

Best,

Bill P.

[From Bruce Nevin (2003.05.27 16:28 EDT)]

Bill Powers (2003.05.27.1032 MDT)--

An analog (or continuous-variable) control
system _must_ have a continuously variable error signal in order to work at
all. And such systems should also receive continuously variable reference
signals, though they would work with piecewise constant reference signals
(a set of discrete reference signal values).

I think we have to conclude that communication between levels of analog
control systems does not use memory as an intermediary.

I had thought perhaps a varying signal was stored. But that's nonsense: After all, the unpredictable disturbances that 'cause' error aren't stored in memory, they stay in the environment. And that applies even to piecewise sequences of constant stored values resulting in a square-wave error output that gets smoothed. All the problems of 'planning' systems all over again!

To the extent that higher systems use discrete variables, the
memory-addressing scheme can work because all that must be specified is
which lower-level reference signal is to be employed, not how much of it.

It follows from your first statement above that higher systems are not analog control systems. Perhaps that was an overstatement; but it opens an interesting possibility.

On the way up the hierarchy, there must be a level where sets of analog
variables are transformed by perceptual input functions into discrete
variables, for example by forming classes or categories and naming them.

There is a mechanism already proposed -- one that exists at every level, and indeed is local to every elementary control system in the hierarchy -- which perhaps can effect the translation from analog to digital: associative memory. This gives us for free the apparent promiscuity of the proposed Category level, where seemingly any perception at any level is amenable to categorization.

So how does the story strike you?

With associative memory addressing as the interface between analog and discrete, and Hofstadter's models of cognitive processes using associative memory operating at the higher, 'digital' levels, I think it makes a great deal of sense. The Fluid Analogies Research Group (FARG) have an interesting approach to questions of attention and consciousness, which have always seemed to me (subjectively and inchoately) to be closely affiliated with memory. It must be possible to get access to some of the code they have developed. (My daughter is home, and I don't know where she's put the copy I bought of their book.)

         /Bruce Nevin

···

At 12:56 PM 5/27/2003, Bill Powers wrote:

[From Bill Powers (2003.05.28.1345 MDT)]

Bruce Nevin (2003.05.27 16:28 EDT)--
>It follows from your first statement above that higher systems are not
>analog control systems. Perhaps that was an overstatement; but it opens an
>interesting possibility.

I think there are analog aspects. For example, take the categories named
"cat" and "dog." It's possible to morph a picture of a cat into a picture
of a dog. At some point, the categories suddenly switch, switching back at
a different point on the way back toward catness. But at the same time,
there's a sense during the initial transformation that the catness is
fading and something else is growing. In drawing letters, there are "A" and
"B", but there are also very good, not so good, and terrible examples of
the letters. Aside from the binary property of the categories, there is an
analog property as well.

I'd be interested in hearing a description of what the "fluid analogies"
research group is proposing.

Best,

Bill P.

[From Bruce Nevin (2003.05.30 14:43 EDT)]

Bill Powers (2003.05.27.1032 MDT)--

An analog (or continuous-variable) control system _must_ have a continuously variable error signal in order to work at all. And such systems should also receive continuously variable reference signals, though they would work with piecewise constant reference signals (a set of discrete reference signal values).

Bill Powers (2003.05.28.1345 MDT)--

Bruce Nevin (2003.05.27 16:28 EDT)--

It follows from your first statement above [repeated above for reference] that higher systems are not analog control systems. Perhaps that was an overstatement; but it opens an interesting possibility.

I think there are analog aspects. For example, take the categories named "cat" and "dog." It's possible to morph a picture of a cat into a picture of a dog. At some point, the categories suddenly switch, switching back at a different point on the way back toward catness. But at the same time, there's a sense during the initial transformation that the catness is fading and something else is growing.

Consider how this works with memory addressing. The analog perceptions
that address memories are gradually shifted until a different memory is
addressed. The subjective phenomena seem to be quite natural attributes
of the model.

In drawing letters, there are "A" and "B", but there are also very good, not so good, and terrible examples of the letters. Aside from the binary property of the categories, there is an analog property as well.

This follows naturally from a memory-addressing model in the same way. Hofstadter's "Letter Spirit" proposals are in this domain. See below.

I'd be interested in hearing a description of what the "fluid analogies" research group is proposing.

The obvious reference is Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought, by Douglas Hofstadter & [members of] the Fluid Analogies Research Group (FARG). I mentioned previously (2003.04.27 13:07 EDT) Dennett's review of the Hofstadter book at http://pp.kpnet.fi/seirioa/cdenn/hofstadt.htm -- FWIW. Dennett assumes the perspective of his fellow AI researchers, which is as usual of limited value for us.
I haven't got far with the book, partly because I've been swamped with work lately, and partly because the book disappeared when my daughter got home. I'm trying to figure out what they mean by memory and analogy. Analogy is taken to be at the root of higher levels of cognitive processes, and also the basis for expectations and those violations of expectation that underlie humor.

One idea that I've got from this so far (but whether extracted directly from what I have read or constructed from it I'm not sure) is the sense that analogy is not a relationship perception, nor a matter of smooth variation by an analog control process, but rather a storage relationship between two perceptions in memory. It is not the analogy between them that is perceived; rather, the two perceptions are each perceived, and because they are constituted in part by the same lower-level perceptions, there is overlap in their memory addressing and in evoked memories and imaginings. The experience is a kind of resonance in memory and imagination. It is possible to identify perceptions that the two have in common (in Picasso's drawing, the shape of the muzzle of the monkey is a 'pun' on the shape of the woman's breast because this curve is the same, that one similar, etc.), but this step of analysis is at one remove from the experience -- rather as explaining a joke spoils it.
Letter Spirit (“Letter Spirit: Esthetic perception and creative play in the rich microcosm of the Roman alphabet”, by Hofstadter & Gary McGraw published as Chapter 10 of the Hofstadter & FARG book) “is an attempt to model central aspects of human creativity on a computer. […] The aim is to model how the 26 lowercase letters of the roman alphabet can be rendered in many different but internally coherent styles. Starting with one or more seed letters representing the beginnings of a style, the program will attempt to create the rest of the alphabet in such a way that all 26 letters share that same style, or spirit.” The proposal and a sketch of the architecture dates from as early as 1980, but as of that writing (published 1995), the implementation had progressed only as far as recognition of letters. So far I haven’t seen any code in the book, but I did find some LISP code on John Rehling’s page at <http://www.cogsci.indiana.edu/farg/rehling/lspirit/code>. On pp. 459-463 of the book and in Rehling’s proposal at <http://www.cogsci.indiana.edu/farg/rehling/proposal/proposal.html> is a critique of a connectionist approach to the Letter Spirit problem, so we know they’re not doing it with neural nets.
The FARG web site at http://www.cogsci.indiana.edu/farg/mcgrawg/lspirit.html lists some publications, and it links to McGraw’s dissertation, which (from the TOC) appears to have some implementation detail and reports some psychological experiments. Rehling’s proposal (URL above) lays out the Letter Spirit architecture. I’m finding this stuff because the directories are readable.
(They suffer the karma of all academic projects, including in particular transience of staff as grad students get their degrees and see employment. McGraw currently works for a software company in Virginia <http://www.cigital.com/~gem/>. Rehling works at NASA <http://www.cogsci.indiana.edu/farg/rehling/resume.html> and seems to have an interest in astronomy <http://www.cogsci.indiana.edu/farg/rehling/astro/>.)
Hofstadter’s home page <http://www.psych.indiana.edu/people/homepages/hofstadter.html> includes this summary statement: “Several [FARG] programs that perceive structures and discover subtle as well as simple analogies by means of a tight interplay between concepts in long-term memory [the ‘slipnet’] and perceptual agents in short-term memory [the ‘workspace’] have been realized over the years; these include Copycat and Tabletop. The Letter Spirit project, modeling the perception and creation of diverse artistic styles, has been under way for several years, and a first implementation has recently been completed. The Metacat project, which deepens Copycat by bringing in episodic memory and some degree of self-awareness, has also been implemented in a preliminary fashion.” The link to copycat is broken (no host psych.indiana.edu/ftp) and the link to Tabletop goes to McGraw’s page.
They use a "parallel terraced scan" in all their projects ("CopyCat, MetaCat, TableTop, and the various stabs at Letter Spirit" per Michael Roberts, quoted below). In the earliest FARG system, Jumbo (for solving 'word jumble' puzzles), there's a static data structure that associates letters into clusters. (Hofstadter calls this the 'chunkabet'. He put this data structure together intuitively; a statistical analysis seems the obvious way to do it, but he wanted his subjective sense of 'affinities' represented. This ignorance of the strengths of and evidence for statistical learning is typical of the annoyances one encounters here.) A jumble is input: the letters of a word scrambled into a random sequence. A pair of letters 'sparks' if they are associated in the chunkabet. A spark is "a short-lived simple data-structure telling who is sparking with whom, and in what order." A codelet is generated with each spark and placed on the coderack. When a codelet for a spark is selected to run, it "will look at the spark, evaluate its viability [?], and then suggest whether it is worthwhile going on with further exploration in this tentative 'romance' between the given pair of letters. If this flirtation fails, then both letters will go on their merry ways, each of them free to spark with other partners instead. If the flirtation seems promising, though, then the codelet will create a 'flash' - the next stage of a romance" (p. 106). Codelets are generated in various ways in each of the FARG systems. They are stored on and removed from a 'coderack', on the analogy of a coatrack, like a kind of random-access stack (p. 106). This looks to be a hack to simulate parallel processing. The choice of which codelet to run next is random, but each codelet is weighted with an 'urgency'. When a codelet has run, it is removed from the coderack; the effects of running it may include other codelets being generated and placed on the coderack.
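The coderack mechanism as described can be sketched like this (a toy reconstruction, not FARG's actual code; the codelet bodies are placeholders):

```python
import random

# Toy reconstruction of the coderack described above (not FARG's code):
# codelets wait on the rack, one is chosen at random with probability
# proportional to its urgency, runs, is removed, and may post successors.

class Coderack:
    def __init__(self):
        self.codelets = []                  # list of (urgency, function)

    def post(self, urgency, fn):
        self.codelets.append((urgency, fn))

    def step(self):
        """Run one urgency-weighted, randomly chosen codelet."""
        if not self.codelets:
            return
        weights = [u for u, _ in self.codelets]
        i = random.choices(range(len(self.codelets)), weights=weights)[0]
        _, fn = self.codelets.pop(i)
        fn(self)                            # a codelet may post successors

rack = Coderack()
rack.post(5, lambda r: r.post(1, lambda r2: None))   # a spark posts a 'flash'
rack.post(1, lambda r: None)                         # a spark that fizzles
while rack.codelets:
    rack.step()
```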
A metaphor of cellular metabolism is used, long chains of chemical reactions carried out in small disjoint steps within the cytoplasm, each step setting up conditions for its possible continuation.
The graduation from spark to flash to dalliance is the 'terraced' aspect of the terraced scan, and is a way of managing the combinatorial explosion of possibilities, where checking them all by brute force is computationally intractable. A terraced scan is "a parallel investigation of many possibilities to different levels of depth, quickly throwing out bad ones and homing in rapidly and accurately on good ones" (p. 107). The resemblances to stochastic reorganization are obvious. The progressively more (computationally) expensive tests are performed on a progressively reduced population of candidates, and each test contributes to a cumulative score for the given candidate, which (largely?) determines how viable it is when a codelet looks at it. Parallel processing is presumed, and must be simulated in Jumbo.
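A minimal sketch of a terraced scan, following the description above (the candidate words, tests, and keep-fractions are invented for illustration):

```python
# Minimal terraced-scan sketch (candidates, tests, and keep-fractions
# invented for illustration; not FARG's implementation). Cheap tests run
# on everything; costlier tests run only on survivors; scores accumulate.

def terraced_scan(candidates, stages):
    """stages: list of (test_fn, keep_fraction), cheapest test first."""
    scored = {c: 0.0 for c in candidates}
    for test, keep_fraction in stages:
        for c in scored:
            scored[c] += test(c)                      # cumulative score
        keep = max(1, int(len(scored) * keep_fraction))
        best = sorted(scored, key=scored.get, reverse=True)[:keep]
        scored = {c: scored[c] for c in best}
    return scored

# Toy use: which jumbles look most like "pangloss"?
words = ["pnagloss", "zzzzzzzz", "panglsso", "qqqqpppp"]
stages = [
    (lambda w: sum(a == b for a, b in zip(w, "pangloss")), 0.5),  # cheap
    (lambda w: len(set(w) & set("pangloss")), 0.5),               # costlier
]
survivors = terraced_scan(words, stages)   # the obvious junk is gone early
```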
There are further steps of aggregation of candidate sequences whose members are insulated from disturbance (that is, from sparking with other individual letters outside the aggregation): bonds, chains, gloms, and membranes. Somewhere in here, letter clusters are identified as candidate syllables, so that e.g. pang-loss is not recognized as a word (I guess by lookup in a lexicon-list). Unhappy gloms and squeaky wheels emerge. Entropy-preserving transformations are applied (codelets are placed on the coderack), resulting in fluid data structures. One of the transforms of pang-loss is pan-gloss, which is recognized as a word. Something very like failure to reduce net error (“the overall happiness of the cytoplasm … has not reached a satisfactory level” – registered as the ‘temperature’ of the ‘cytoplasm’) causes Jumbo to “pour” codelets for entropy-increasing transformations onto the coderack, which break up word candidates that are not working. The overall performance of the system is governed by urgencies and happiness/temperature.
Michael Roberts is working on a project called Magnificat <http://www.cogsci.indiana.edu/farg/michael/proleg.html>. One of the aims is to generalize the ‘codelets’, which have been too domain-specific. He says that, in Rehling’s estimation, only about 20% of the code is transferable from one domain to another. I recall from my BBN days that this is a very common problem with AI research.
Maybe this gives some idea of what the FARG does. I mentioned one annoyance, which may be generalized to H’s insistence on going his own brilliant way and ignoring relevant work of others. We can’t fault him too much for that in present company. Another is the efflorescence of playful metaphors, and the worry that perhaps they are being taken too literally. Hofstadter critiques this sort of fallacy and its pervasiveness in AI and CogSci in Chapter 4, and in its preface “The ineradicable Eliza effect and its dangers”, so I think he’s not merely the extremely clever and witty dilettante that his earlier books might lead you to believe, and indeed so says Dennett in his review. It may be his way of being successful at attracting students while working very much as a maverick, and who could fault him for that either.

    /Bruce Nevin
···


[From Bill Powers (2003.05.31.0833 MDT)]

Bruce Nevin (2003.05.30 14:43 EDT)--

Consider how this works with memory addressing. The analog perceptions
that address memories are gradually shifted until a different memory is
addressed. The subjective phenomena seem to be quite natural attributes of
the model.

But if the same memory is addressed until the analog error or output
signals (not perceptions!) that do the addressing have changed enough, the
imagined or reference memory signal will not change smoothly, but stay the
same until the new address comes into effect. I think your model has
slipped a bit -- perceptions don't address memories, they get recorded as
memories. In associative addressing, part of a group of recordings can be
selected as an address, bringing forth the rest.
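One toy way to picture this kind of associative addressing (the features and recordings are invented for illustration): a partial cue selects the stored group with the greatest overlap, which brings forth its remaining contents.

```python
# Toy picture of associative addressing (features and recordings are
# invented): part of a recorded group serves as the address, and the
# best-matching stored group brings forth the rest of its contents.

memory = [
    {"purrs", "whiskers", "fur", "meow"},    # one recorded group
    {"barks", "fur", "tail", "fetch"},       # another
]

def recall(cue):
    """Partial cue -> whole stored group with greatest feature overlap."""
    return max(memory, key=lambda record: len(record & cue))

evoked = recall({"purrs", "fur"})   # brings forth "whiskers" and "meow" too
```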

However, I'm pretty vague about just how this associative addressing would
actually be implemented, and prefer to leave that level of detail to those
who know more -- now, or more likely in the future.

The "fluid analogies" stuff is interesting but not my cup of tea.

Best,

Bill P.

P.S. could you send your vita to Alice? APMcE@aol.com.