Learning and memory

[From Shannon Williams (960206.00:30 Calgary time)]

Bill Powers (960203.0500 MST)--

   Before I try to model the process of learning, I want to
   be sure we understand what it is that is learned -- the
   end-point of the process.

You assume that there is a dynamic process ("the process of learning")
and a static result ("the result of learning"). But I think that
this assumption is based in your imagination. I do not believe that
this separation is real.

You could convince me that "the results of learning" exist
independently of the process that created them if you:

1) Show me a person who can apply the results of his previous
   "learning" in an environment where no object or concept is
   familiar to him.
or
2) Show me a person whose skill in some task does not vary. Any
   time he does a task, whether daily or once a year, his skill
   gets neither better nor worse. It stays the same for the rest
   of his life.

What could I show you which would convince you that "the results of
learning" are not independent of the process that created them?

I think it's premature to try to model learning now.

I think that any model of learned behavior must model learning. In
fact, any legitimate model of learned behavior must:

1) allow you to visualize the learning process in an individual,
2) allow you to visualize how the learning process evolved in
   animals.


--------------------------------------------------------------

Bill Powers (960128.1800 MST)--

    If (A) = Hinsdale then (B) = '1'
    If (A) = '1' then (B) = '9'
    If (A) = '9' then (B) = '2'
    If (A) = '2' then (B) = '7'

But where does the "Hinsdale" come from? If I already have one input
that goes "Hinsdale 1927," then why do I need the second delayed input?

What second delayed input? (What first delayed input? - The delay is
used to create the association. It is not needed to replay the
association.)
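
To make that concrete, here is a minimal sketch of the idea (mine, for
illustration only; the dictionary representation is not a claim about
neural mechanism). The delay matters only inside learn(), where it
pairs each item with its still-present predecessor; replay() never
uses it:

    def learn(sequence):
        # build the association table from consecutive pairs; the
        # delayed copy of 'cue' is still present when 'successor' arrives
        table = {}
        for cue, successor in zip(sequence, sequence[1:]):
            table[cue] = successor
        return table

    def replay(table, cue):
        # replay the chain from a starting cue; no delay is involved
        out = [cue]
        while cue in table:
            cue = table[cue]
            out.append(cue)
        return out

    table = learn(["Hinsdale", "1", "9", "2", "7"])
    print(replay(table, "Hinsdale"))  # ['Hinsdale', '1', '9', '2', '7']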

I already have the number I want to remember.

If you already have the number that you want, then you have the number
that you want. What more do you want?

I think you may be talking strictly about sequence memory here.

What you perceive as a sequence, someone else may not. The adjective
"sequence" is irrelevant here.

    In other words, everything is associated. Memories do not exist by
    themselves, they are triggered by some association. For example,
    if you wanted to remember your multiplication tables, you could
    remember each aspect of the table by associating it with something
    that you find familiar, i.e. you could pretend you were walking
    through a garden and associate different parts of the garden
    with whatever you wanted to remember.

Have you read my discussion of memory in B:CP, starting on page 205? See
specifically pp. 215-216, where the Method of Loci is mentioned.

Where do you think I got the example?

That is an old model of memory, but I don't think the numbers are right
to suppose it exists in the nervous system in neural form.

Martin answered you on this count. I did not receive your reply.

Molecular storage and retrieval would have far higher capacity,
although a plausible mechanism for accomplishing this escapes me.

Do you see the mechanism in associative memory?

And of course there are also cases where detailed memories prove to be
wholly manufactured.

Do you see how easy this is with associative memory?

... we have to remember analog relationships, rates of change,
configurations, rules, principles, and so forth.

What makes you think that this memory is not associative? Many people
learn to associate rules, pictures, graphs, formulas, etc. with key
words. I never learned that way, but I do visualize things. I change
my visualizations as I apply different rules or scenarios to my
current visualization. Perhaps you would not call this "associative
memory", but the same memory module that you recognize as generating
"sequence memory", could be used to generate this other kind of
memory.

Look at it this way: associative memory allows you to associate
different aspects of your current perceptions with similar aspects
of other perceptions. In other words, it creates/causes metaphors
and analogies.
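
One toy illustration of what I mean (the memories and "aspects" are
invented for the example, not a model of the network): store each
memory as a set of perceptual aspects, and let the current perception
retrieve whichever stored memory shares the most aspects with it.

    memories = {
        "garden walk": {"path", "green", "sequence", "landmarks"},
        "times table": {"numbers", "grid", "sequence", "landmarks"},
    }

    def recall(current):
        # return the stored memory sharing the most aspects with the
        # current perception -- association by overlap
        return max(memories, key=lambda m: len(memories[m] & current))

    print(recall({"numbers", "sequence"}))  # -> "times table"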

The kind of memory that isn't explained by your diagram is experiential
memory -- simply calling a scene to mind.

Give me a break! "Experiential memory" is EXACTLY what is explained
by the neural network diagram. The whole operation of the network
depends upon generating new perceptions from current perceptions. In
other words, the whole operation of the network depends upon generating
new experiences from current experiences.

You could explain what might evoke this scene, but not how we can
effectively view it again, even noticing details we did not
consciously register during the original experience.

There are many ways to do this. You can associate your behavior
with causes for your behavior, you can hear the story re-told, you
can manufacture new details, etc.

It seems to me that you're talking about the conditions
under which memories can be linked to other memories, but not about what
is, in fact, remembered.

I do not believe that you can separate "what is remembered" from "the
process of remembering".

-Shannon Williams

[Martin Taylor 921005 11:30]
(Bill Powers 921005.0730)

This is in response to the discussion between Bill and Greg.

To start with, both have produced a set of principles, but neither has given
a functional description of an ECS and its place in the hierarchy. To have
a clear discussion of learning, we have to have agreement on what the
hierarchy looks like, at minimum. So here is what I have understood to
be more or less agreed. If any of this is wrong, it affects the discussion
and should be put right.

As I understand the hierarchy, the unit from which everything is built is the
ECS (Elementary Control System). The ECS has two kinds of input function:
(i1) a perceptual input function, which combines all the many sensory
inputs according to some algorithm and produces a scalar value called the
perceptual signal. The word "sensory" covers both direct input from sensor
systems and the perceptual signals produced by the perceptual input functions
of other (lower) ECSs.
(i2) a reference input function, which combines action output signals from
other (higher) ECSs. Typically, this function is considered to be a simple
summation, but it could be any algorithm. The output range of the reference
input function must not exceed that of the perceptual input function. The
output of the reference input function is a scalar value called the reference
signal of the ECS.

The perceptual signal is compared with the reference signal to produce a
scalar value called the error signal. The error signal is transformed by
an output function, typically but not necessarily an integrating amplifier,
to produce an output signal.

The ECS has two kinds of output that are distributed to other ECSs (except that
in the case of the lowest-level ECSs the output goes directly to muscles or
other effector systems). In each case there is the possibility that the output
is weighted differently in transmission to each individual destination:
(o1) A set of perceptual outputs that consist of the perceptual signal possibly
multiplied by some weight. These are the sensory inputs of other (higher) ECSs.
(o2) A set of action outputs that consist of the output signal possibly
multiplied by some weight. These are the reference inputs of other (lower)
ECSs.

The ECS has the possibility of storing one or more values of its perceptual
signal and of using a stored value in place of its reference signal (episodic
memory).

The ECS has the possibility of using an action output signal as one of its
sensory input signals (imagination).

The hierarchy consists of ECSs linked only by the connection of the (possibly
weighted) perceptual signals of lower ECSs to sensory inputs of higher ECSs,
and of the (possibly weighted) action output signals of higher ECSs to the
reference inputs of lower ECSs. Each kind of link is one-to-many.
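
For concreteness, here is the above rendered as a code sketch (my
rendering, with the "typical" choices filled in: a weighted-sum
perceptual input function, summation for the reference input, and an
integrating output function; any of these could be another algorithm):

    class ECS:
        def __init__(self, perc_weights, gain=1.0, dt=0.01):
            self.w = perc_weights  # weights of the perceptual input function
            self.gain = gain       # gain of the integrating output function
            self.dt = dt
            self.p = 0.0           # perceptual signal
            self.o = 0.0           # output signal (integrator state)
            self.stored = []       # stored perceptual values (episodic memory)

        def step(self, sensory, higher_outputs):
            # (i1) perceptual input function: many inputs -> one scalar
            self.p = sum(w * s for w, s in zip(self.w, sensory))
            # (i2) reference input function: simple summation of the
            #      action outputs arriving from higher ECSs
            r = sum(higher_outputs)
            # comparator, then the (typical) integrating output function
            self.o += self.gain * (r - self.p) * self.dt
            # (o1) perceptual and (o2) action outputs; the per-destination
            # weights live on the links, not in the ECS itself
            return self.p, self.o

        def store(self):
            # episodic memory: record the current perceptual signal
            self.stored.append(self.p)

    # A toy closed loop: the environment simply feeds the output back.
    ecs = ECS(perc_weights=[1.0], gain=10.0)
    sensed = 0.0
    for _ in range(2000):
        p, out = ecs.step([sensed], higher_outputs=[5.0])
        sensed = out
    print(round(p, 2))  # settles near the reference value, 5.0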


===================

If the foregoing is a correct description of an ECS and its place in the
hierarchy, what opportunities are there for learning? The following all
seem plausible:

Within a set of ECSs already in existence:

(1) Alteration of the perceptual function.
(2) Alteration of the perceptual-sensory link structure.
(3) Alteration of the output function.
(4) Alteration of the action-reference link structure.
(5) Alteration of the reference input function.
(6) Alteration of the content of the internal memory of the ECS.

And
(7) Incorporation of a new ECS into the hierarchy.

There are subclasses:
(1a), (3a), and (5a): Modification of the parameter values of the function.
(1b), (3b), and (5b): Alteration of the form of the function.

(2a) and (4a): Alteration of the connection weights.
(2b) and (4b): Alteration of which ECSs are linked.

All in all, this makes 12 logical possibilities for types of ways the hierarchy
can learn. Not all are effective, and the discussion has generally focussed
on only a couple.

Hebbian learning is a term usually denoting topologically smooth changes
of parameter values in a combining function. There are many forms, but
generally speaking they all involve changes based on some measure of goodness
of the present set of parameter values in relation to the data input to the
function. One way is to make more extreme a pattern that creates a large
output, and to make less extreme a pattern that produces a small output.
Another way is to move the input parameters in such a way as to increase
the output when some "teacher" asserts that the input pattern is one to
which the function "should" give a large response.
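
In their simplest textbook forms (generic illustrations, not a claim
about how the hierarchy does it), the two ways just mentioned look
like this:

    def hebb_unsupervised(w, x, y, eta=0.01):
        # strengthen each weight in proportion to input-output co-activity
        return [wi + eta * y * xi for wi, xi in zip(w, x)]

    def hebb_supervised(w, x, teacher, eta=0.01):
        # move weights toward inputs the "teacher" marks for a large response
        return [wi + eta * teacher * xi for wi, xi in zip(w, x)]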

Within the ECS hierarchy, Hebbian learning has usually been discussed as it
applies to the perceptual input functions (1a), and then usually when the input
functions are seen as nonlinear compressions of weighted sums of their
inputs. If the hierarchy consisted only of ECSs having this form of perceptual
input function, the perceptual side would be a classical multilayer perceptron,
and Hebbian learning would suffice to allow it to perceive (and hence possibly
to control) any describable partitioning of the sensory input data space.
Other forms of perceptual input function, possibly involving delayed sensory
inputs, are possible and may be necessary for the control of dynamical
percepts such as sequences.
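
For instance (a toy form only, not a proposal for the actual function),
a perceptual input function given one delayed copy of its input can
signal a two-element sequence:

    def sequence_detector(x_now, x_prev, pair=("A", "B")):
        # fires only when the delayed input and the current input
        # form the target ordered pair
        return 1.0 if (x_prev, x_now) == pair else 0.0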

Hebbian learning could apply in classes 1a, 3a, or 5a. Because the success
of control does not depend much on loop gain if the loop gain is high enough,
one would not expect it to be very important in 3a, except in adjusting
parameters such as integration time constants. It might be important in 5a,
which affects the relative strength of different higher-level ECSs on the
reference signal of the ECS in question. No discussion (that I remember) has
considered this possibility, or how it might work if it happened at all.

Hebbian learning could apply in 2a and 4a as well, but numerically the
results would be indistinguishable from 1a and 5a applied in a different ECS.
The reason these possibilities are listed is that there is a potential
question about the location of responsibility for the alterations of weights.
"What does the ECS know, and when does it know it?"

Hebbian learning cannot be relevant to structural alterations in the hierarchy
(1b-5b), because the learning is not topologically smooth. This is the
province of "reorganization." It also cannot apply to classes 6 or 7, which
involve discrete events that change either the content or the structure of
the hierarchy.

(Powers)

Hebbian learning makes parameters depend on long-term effects of the
signals passing through the network that is being reorganized; that's
the Hebbian version of reorganization. For this to work in general, it
must be true that there is some "best" way of handling signals. This
assumption shows up in the postulate that the output of a neuron
somehow "strengthens" the effects of input signals that exist at the
same time as the output signal. The implication is that it is best for
the organism that all signals contributing to a given neural output
have the maximum possible effect. This sort of rule, I believe, is an
attempt (of which I approve) to get away from a "teacher" that already
knows how a neural function should be organized.

This is indeed one form of Hebbian learning, but the motivation is not to
get away from a teacher, so much as to provide the maximum discrimination among
input signals that is consistent with the variation in stimulus patterns.
Usually, weights on low-valued inputs decrease when weights on high-valued
ones increase, so as to sharpen the discrimination.
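
One common way to get that sharpening (again only a generic sketch):
renormalize the weight vector after each Hebbian step, so that raising
some weights necessarily lowers the others:

    def hebb_normalized(w, x, y, eta=0.01):
        # Hebbian step, then renormalization: total weight is conserved,
        # so discrimination among input patterns sharpens
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
        norm = sum(wi * wi for wi in w) ** 0.5
        return [wi / norm for wi in w] if norm else w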

Any system in which there is a preferred kind of
input or output function loses the ability to adapt to environments in
which some other kind of input or output function is required for
successful control.

True. And I think that your "levels" of ECS accommodate that. But Hebbian
learning can operate within different kinds of input function, provided that
the function is such that small changes in parameter values cause small
changes in the function's behaviour.

I had intended to carry on this posting with a discussion of reorganization
possibilities, but I think it is long enough, and I have spent too long on
it already. Maybe later. But if you accept my categorization of learning
possibilities, there seem to be 5 kinds of reorganization, of which perhaps
1 or 2 may be useful in learning. And then there are episodic memory learning
and the introduction of new ECSs. So there are many possibilities for ways
in which the hierarchy might change to develop new skills.

When there are so many possibilities, it makes good sense to see what happens
when only the most prominent are used. In this case, I think that the most
likely candidates for effective learning are (in no particular order),
1a, 2+4b (together), 6, and 7. These can be verbalized as "What can I
perceive?", "What can I do to control what I perceive?", "What have I
perceived?", and "I can do nothing right--let's try something completely
different."

Martin