New Perceptual Functions proposal (was Re: Reinforcement Learning)

[Martin Taylor 2017.10.11.10.21]

[From Rupert Young (2017.10.11 10.00)]

[Rick Marken (2017.10.09.0915)]

RY: Yes. Which was what I said.

The following is a tentative suggestion about simulating the
development of novel perceptions of an environment, in an attempt to
answer Rupert's question.

Remember that the Arm2 demo operates in an external environment
devoid of content. The Little Man operates in an external
environment that contains a single moving object. Neither
environment has anything for the “organism” to learn about, so they
can't be used as examples for the development of novel perceptions.
But their principles can.

Let's imagine something along the lines of a complement of the Arm2
demo, or rather, a possible series of them, which I call Baby0,
Baby1, …, BabyN. Each builds on its predecessor. Baby0 starts with
an almost trivially simple environment and a bunch of pseudo-neurons
that could arrange themselves into the form of control loops with
perceptual functions. The question is whether Baby0 can develop two
perceptual functions that will allow it to stabilize two variables
in its environment.

To help describe BabyN and its environment, I will use some naming
conventions. Variables in the environment will be Vn, as in V1,
V2, … Environmental variables are individually influenced by
disturbance variables Dn, in the sense that Vj is influenced by Dj.
The Dn are driven by the experimenter. Sensory variables, reported
to the interior of BabyN, will be S1, S2, … It may be true, but
more often will not be true, that an S variable corresponds directly
to a V variable. Perceptual variables will be Pn, reference
variables Rn, error variables En, output variables On, and actuator
variables (analogous to muscle tensions or joint angles) An. As with
the sensor variables, an actuator variable may, but usually will
not, be associated directly with an environment variable.

One more abbreviation I will use is "PNn" for "Pseudo-neuron number n". A
PN has a number of inputs, performs some function on them (often a
simple addition), and produces an output. The output of PNn is connected to
one of the inputs of PNm by link Lnm, which has a real-number weight
and possibly a defined delay (but we will worry about that much
later). In principle, any PN output may be linked as an input to any
or all other PNs. In the early BabyN series, however, PNs come in
families that have specialized linkage possibilities. The
experimental (or demo) question for any of the BabyNs is whether,
under the initial conditions specified, it will reorganize to
produce perceptual functions that enable it to live effectively in
its particular environment. I describe “live effectively” when I
come to “intrinsic variables”.
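To make the PN idea concrete, here is a minimal sketch of how a PN
and its links Lnm might be represented (Python; all class and
attribute names are my own illustrative choices, not part of the
proposal):

    import random

    class Link:
        """Directed link Lnm from PN n's output to one of PN m's inputs."""
        def __init__(self, source, weight=None):
            self.source = source    # the upstream PN
            # real-number weight, random by default; delays ignored for now
            self.weight = weight if weight is not None else random.uniform(-1.0, 1.0)

    class PN:
        """Pseudo-neuron: weighted inputs, a function (default: sum), one output."""
        def __init__(self, kind, func=sum):
            self.kind = kind        # 'pPN', 'cPN', 'oPN-mult', or 'oPN-leaky'
            self.in_links = []      # incoming Link objects
            self.func = func
            self.output = 0.0

        def step(self):
            # apply the PN's function to its weighted inputs for this sample
            self.output = self.func(l.weight * l.source.output for l in self.in_links)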

A "newborn" BabyN may have some control units already constructed,

but it also has (or may create) uncommitted PNs that might form new
control units. Baby0 has nothing pre-constructed, but uncommitted
PNs come in several different types or flavours, perceptual,
comparator, and output (two sub-types). A perceptual PN (pPN) can
accept input from a Sensor or from the output of a pPN. An output PN
(oPN) can accept input from the output of a comparator PN (cPN), and
produces output that can be linked to any number of oPNs or As. A
cPN can accept input from oPNs and pPNs and sends output to oPNs.
oPNs are special in another way. They come in two subtypes, one that
acts like a simple multiplier with a gain subject to reorganization,
while oPNs of the other type act as leaky integrators, and their
gain rate and leak rate are subject to reorganization.
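The two oPN subtypes might then look like this (the discrete-time
update rule is one plausible reading of "leaky integrator", my own
assumption rather than a specification from the proposal):

    class MultiplierOPN(PN):
        """oPN subtype 1: simple multiplier; gain is subject to reorganization."""
        def __init__(self, gain=1.0):
            super().__init__('oPN-mult')
            self.gain = gain

        def step(self):
            x = sum(l.weight * l.source.output for l in self.in_links)
            self.output = self.gain * x

    class LeakyIntegratorOPN(PN):
        """oPN subtype 2: leaky integrator; gain rate and leak rate reorganizable."""
        def __init__(self, gain_rate=0.1, leak_rate=0.05):
            super().__init__('oPN-leaky')
            self.gain_rate = gain_rate    # integration rate per sample
            self.leak_rate = leak_rate    # leak rate per sample

        def step(self):
            x = sum(l.weight * l.source.output for l in self.in_links)
            self.output += self.gain_rate * x - self.leak_rate * self.output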

The classification of PNs into different types reduces computational
complexity greatly if there are large numbers of S and A variables
and the environment is complicated. It would be perfectly possible
to describe the whole experiment as using PNs with no type
distinction, and I did so in an early draft of this message, but a
few back-of-the-envelope calculations suggested that the
reorganization time would very quickly get out of hand, even using
the e-coli method used by Powers to solve that same problem.

The influence of actuators on environmental variables is fixed for
the duration of any experimental or demo run, as is the influence of
the environmental variables on the sensors. The disturbances vary at
the experimenter’s whim.

Time must be considered. As this is a computer-based simulation,
time must be sampled, and one time unit could be the time between
samples. There must also be a delay associated with every link, but
these delays might be much shorter than a sample time, and therefore
could be taken as zero. The total loop delay, however, cannot be
taken to be zero. To ensure this, I will arbitrarily say that it
takes at least one sample time for a change in the output of an
actuator to be felt at a V environmental variable. Integration rates
and leak rates for the leaky-integrator kind of output function can
be measured in rates per sample without loss of generality.
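One way to realize the one-sample actuator-to-environment delay is
to update the environment from the previous sample's actuator
outputs. A sketch (assuming an env object with step() and sense()
methods, like the one sketched later for Baby0 and Baby1, and a baby
object whose step() updates every PN once and returns the actuator
values):

    def run(baby, env, disturbance_fn, n_samples):
        """Advance the simulation sample by sample; actuator effects reach
        the V variables one sample late, so total loop delay is nonzero."""
        prev_actuators = [0.0] * env.n_actuators
        for t in range(n_samples):
            env.step(prev_actuators, disturbance_fn(t))  # V feels last sample's A
            senses = env.sense()                         # S variables from V variables
            prev_actuators = baby.step(senses)           # one update of every PN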

One further thing is needed: intrinsic variables to set the rate of
reorganization. Reinforcement Learning has a significant problem in
the “assignment of credit”, which different proposals finesse in
different ways. PCT has less of a problem here, because each
control loop is responsible for best controlling its own perceptual
variable. The better the control, the more stable are all the
variables in the loop. I propose to use this fact as a local
intrinsic variable, to go along with the global intrinsic variable
of minimizing overall error. The reorganization rate is the probability
of an e-coli move during any time sample.

Even though this is a simulation in which the intrinsic variable is
quite arbitrary, we can take a cue from real life. The big problem
with the human brain is how to dissipate the heat of neural
operations, primarily caused by neural firing. Our simulated neurons
(our PNs) don’t fire. They produce larger and smaller “neural
currents”, which can be positive or negative. Since electrical power
is I²R or V²/R, one index of equivalent energy
usage of a PN might be the variance of its output. But then “R”,
resistance, also enters the formula. R must be related to link
weights. Indeed, what reorganization will do is change link weights,
so they are critical to the intrinsic variable we seek.

In a real brain, the firing rate of a neuron is not (so far as I
know) influenced by how many synapses are reached by each impulse.
If it were an actual current, this would not be true. But a firing
is actually a voltage spike, not a blast of current. The Powers
“neural current” is perhaps better seen as an average voltage rather
than a summed current. Taking this view, in our BabyN, resistance
would be the inverse of link weight, so the momentary power of a PN
would be represented by (output²)/(sum of output link
weights to other units), and the energy generated as heat over a time t
would be t*(output variance)/(sum of link weights).
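In code, that heat model might look like this (illustrative; I take
absolute weights in the denominator, my own assumption, to keep the
"resistance" positive when link weights are negative):

    def momentary_power(pn, out_links):
        """Power ~ output² / (sum of outgoing link weights), per the voltage view."""
        total_weight = sum(abs(l.weight) for l in out_links)
        return pn.output ** 2 / max(total_weight, 1e-9)  # avoid division by zero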

Energy cannot be dissipated instantaneously. It leaks away at a rate
proportional to the temperature difference between the hot and cold
regions. In BabyN, we can simplify this and use it in reverse, to
indicate how hot the PN would be after some period of activity. The
temperature is proportional to the integrated energy produced by
firings, less that dissipated into the environment. It is therefore
the output of a leaky integrator that has input rate as above and
leak rate proportional to the current temperature difference between
the PN and its surroundings.

What are the surroundings of a PN? There are two parts: an outer
“cold” universe that we can take to be at zero temperature, and
nearby PNs. Since we have no self-organized way to localize a PN, we
must be arbitrary. I propose that all PNs of the same type (pPN,
cPN, and oPN, not distinguishing the oPN subtypes) should be the
neighbours of a PN. The resulting “cold sink” temperature would then
be somewhere between the average temperature of the neighbours and
zero. Perhaps half the average temperature of the neighbours might
be a simple starting point.
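Putting the last two paragraphs together (an illustrative sketch: it
assumes each PN carries a temperature attribute initialized to zero,
and uses the "half the neighbour average" starting point suggested
above):

    def update_temperature(pn, heat_in, neighbour_temps, k_dissipation=0.02):
        """Leaky-integrator temperature: input is the heat generated this
        sample; leak is proportional to the PN-to-surroundings difference."""
        if neighbour_temps:
            cold_sink = 0.5 * sum(neighbour_temps) / len(neighbour_temps)
        else:
            cold_sink = 0.0             # only the zero-temperature universe
        pn.temperature += heat_in - k_dissipation * (pn.temperature - cold_sink)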

This temperature will be our intrinsic variable. If a PN gets too
cold or too hot, it will lose efficiency and reduce its output for
any given input. On the other hand, in a global sense, the cooler
the entire ensemble is, the less difficult it is to configure the
brain (in an evolutionary sense) to fit in a reasonable size of
skull. This suggests that there should also be a global intrinsic
variable for average temperature, with some reference value. For
simplicity, we might start by suggesting that the global temperature
optimum would be the same as for the individual PNs (a value
controlled by the experimenter in the early members of the BabyN
series, but possibly self-optimized in more complex versions).

The tension between the global and the many local intrinsic
variables should lead to a steady operating temperature with a
minimum number of PNs contributing to the work, because the more
operating PNs there are, the more heat there is to dissipate and the
hotter the ensemble becomes.

Reorganization: Each BabyN has variable input through its sensors.
This input induces variation in all the PNs, which are initially randomly
interconnected, generating “pseudo-heat” that makes them all get
hot, some more than others. Each PN uses local e-coli
reorganization, treating the in- and out-link weights as coordinates
in a local space. (There must be an e-coli administrator hidden in
the genetic structure of the organism, but I won’t attempt to
suggest how it got there. I just assume it exists, as did Powers.)
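The e-coli method here is the biased random walk Powers used: keep
stepping the weights in the same random direction while the
intrinsic variable improves, and "tumble" to a new random direction
when it does not. A sketch of one such move (my own paraphrase of
the method, not code from any Powers program):

    import math, random

    def ecoli_step(weights, direction, error, prev_error, step_size=0.01):
        """One e-coli move over a PN's link weights, treated as coordinates.
        Returns the (possibly new) direction; mutates weights in place."""
        if error >= prev_error:
            # no improvement: tumble to a new random unit direction
            direction = [random.gauss(0.0, 1.0) for _ in weights]
            norm = math.sqrt(sum(d * d for d in direction)) or 1.0
            direction = [d / norm for d in direction]
        # run: take a step along the current direction
        for i in range(len(weights)):
            weights[i] += step_size * direction[i]
        return direction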

The global intrinsic variable also induces its own e-coli
reorganization, using all the link weights. The interactions between
the local and global e-coli effects will skew the apparent
directions in both when seen by an external analyst, but the local
and global effects do not interfere with each other, nor do they
operate at the same rate. A particular PN might be near its optimum
“temperature” while the ensemble is much hotter than the global
optimum. Imagine that we had just two links to work with. Locally,
the last step was, say, {1, 2} and performance improved, so the next
step will again be in the same direction {1, 2}. But global
reorganization says that the next step will be in the direction
{-0.5, 1}. The resulting actual next step will be {0.5, 3} as seen by
an external analyst, but will be {1, 2} and {-0.5, 1} as seen by the
local and global e-coli administrators (who are hidden in the
structure of the BabyN series).
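In other words, the externally visible weight change is just the
vector sum of the two independently chosen steps (sketch):

    def combined_step(local_step, global_step):
        """What the external analyst sees: the sum of the local and global
        e-coli moves, e.g. {1, 2} + {-0.5, 1} -> {0.5, 3}."""
        return [a + b for a, b in zip(local_step, global_step)]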

As written above, I feel there is too much arbitrariness in the
detail, and perhaps a simpler global-local algorithm might be used
to determine changes in the global and local intrinsic variables. In
more complex organisms than are anticipated in the BabyN series, the
organism would have to find food in order to sustain its energy
dissipation, a topic that is totally ignored here. But note that
warm-blooded and cold-blooded animals alike must maintain
their body temperature within fairly strict limits or cease
controlling effectively, so treating temperature as an important
intrinsic variable both locally and globally seems a very natural
thing to do.


--------------

I will discuss what I suggest might be the first three members of a
BabyN series. Baby0 is the simplest of the series, and lives in the
simplest environment. Baby1 lives in almost the same environment.
Baby2 lives in a slightly more complicated environment based on the
earlier ones. The structure of the environment does not change
during any one experimental run, but may differ from run to run.

Here are the initial conditions for Baby0 and Baby1. Baby2 is based
on an already reorganized Baby1, and might be thought of as a stage
in the maturation of Baby1.

Organism: Baby0 and Baby1 have two sensors, S1 and S2, two actuators,
A1 and A2, and several PNs initially randomly interconnected
according to the rules described above. “Randomly interconnected”
means connected with a link weight chosen from a uniform random
distribution between −1 and +1. To make it concrete, some numbers
pulled out of the air are 4 pPNs, 3 cPNs, and 6 oPNs, three of each
type (multiplier and leaky integrator).
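An initialization along these lines, reusing the illustrative
classes sketched earlier (the wiring follows the type constraints;
sensor attachments are left out for brevity):

    def make_baby0():
        """4 pPNs, 3 cPNs, 6 oPNs (3 multiplier + 3 leaky integrator),
        randomly interlinked with weights drawn uniformly from [-1, +1]."""
        pPNs = [PN('pPN') for _ in range(4)]
        cPNs = [PN('cPN') for _ in range(3)]
        oPNs = [MultiplierOPN() for _ in range(3)] + \
               [LeakyIntegratorOPN() for _ in range(3)]
        for p in pPNs:                      # pPN inputs: sensors or other pPNs
            for src in pPNs:
                if src is not p:
                    p.in_links.append(Link(src))   # Link defaults to U(-1, 1)
        for c in cPNs:                      # cPN inputs: pPNs and oPNs
            for src in pPNs + oPNs:
                c.in_links.append(Link(src))
        for o in oPNs:                      # oPN inputs: cPN outputs
            for src in cPNs:
                o.in_links.append(Link(src))
        return pPNs, cPNs, oPNs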

Environment: There are 2 environmental variables, V1 and V2. They
are independent of each other, and are influenced by independent
disturbances D1 and D2, as well as by the outputs of Baby0’s two
actuators A1 and A2.

Baby0: Actuator A1’s output influences only variable V1, and A2’s
influences only V2. Likewise the V variables influence only their
corresponding sensors, so there are two independent environmental
paths from actuator through a V variable to the corresponding
sensor.

Baby1: Each actuator influences both environmental variables through
a link with a randomly chosen weight. Each environmental variable
influences both sensors through a link with a randomly
chosen weight. The existence of the two independent and
independently disturbed environmental variables is thereby hidden
from actuators and sensors (but may prove not to be hidden from
Baby1’s perceptual functions).
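The two environments then differ only in their coupling matrices. A
sketch, compatible with the run() loop above (the matrix layout and
class names are my own illustration):

    import random

    class Environment:
        """V[j] is driven by its disturbance plus weighted actuator effects;
        sensors are weighted combinations of the V variables."""
        def __init__(self, a, s):
            self.a = a                  # a[i][j]: effect of actuator i on V j
            self.s = s                  # s[j][k]: effect of V j on sensor k
            self.V = [0.0, 0.0]
            self.n_actuators = 2

        def step(self, actuators, disturbances):
            # disturbances is the pair [D1, D2] for this sample
            for j in range(2):
                self.V[j] = disturbances[j] + \
                    sum(self.a[i][j] * actuators[i] for i in range(2))

        def sense(self):
            return [sum(self.s[j][k] * self.V[j] for j in range(2))
                    for k in range(2)]

    # Baby0: diagonal coupling -- A1 affects only V1, V1 affects only S1, etc.
    baby0_env = Environment(a=[[1, 0], [0, 1]], s=[[1, 0], [0, 1]])

    # Baby1: full random coupling hides V1 and V2 from sensors and actuators.
    r = lambda: random.uniform(-1, 1)
    baby1_env = Environment(a=[[r(), r()], [r(), r()]],
                            s=[[r(), r()], [r(), r()]])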

--------------

Some experimental questions.

Baby0: Will Baby0 learn to control any perceptions relating to the
independently disturbed, sensed, and acted-upon V1 and V2? If it
does, will the relevant control structure look like two independent
control loops, one controlling a perception of V1, the other of V2,
or will the controlled perceptions be joint functions of V1 and V2,
such as V1+V2 and V1−V2? Will unused PNs be pruned away by bringing
their link weights near zero?

Baby1: Will Baby1 learn to perceive and influence V1 and V2 as
independent variables?

Comment: The difference here between Baby0 and Baby1 is that in
principle, Baby0 could function optimally by simply connecting S1 to
A1 and S2 to A2 by links with negative weights. This is disallowed
by the connection rules, which do not provide for that kind of
connection. Baby0 must use at least a pPN, a cPN, and an oPN in each
loop, but each of them could work in a functioning control loop with
just one in-link and one output link. Is that the structure that
will be produced?

Baby1, on the other hand, could control perceptions of V1 and V2
separately only if both S variables are linked to two independent
pPNs, and each of two oPNs is linked to both actuators. Would Baby1
wind up controlling perceptions of V1 and V2, or complexes of both?

For both of these BabyN stages, the environment may be too simple,
in that there is unlikely to be much difference in energy
dissipation among structures that “perceive” different orthogonal
functions of V1 and V2, so different structures may emerge from
different experimental runs. However, in discussing Baby2, I will
assume Baby1 at least learned to perceive two orthogonal functions
of the two environmental variables, whether those functions are the
variables themselves, their sum and difference, or something else. I
will call the values of the perceptions output by those functions
P11 and P12 (level-1 functions 1 and 2).

------------Baby2-------

Organism: Baby2 is initialized as an already reorganized Baby1,
reorganized as above or manually organized to have some specified
structure of two perceptual control units controlling perceptions
P11 and P12. Baby2 is also given a few uncommitted PNs of each type.

Environment: In addition to V1 and V2, which Baby1 can keep stable by
controlling the two perceptions P11 and P12, the environment has a
new variable V21. V21 has its own separate disturbance D21, but the
disturbances D1 and D2 are no longer active. They are superseded by
the fact that variations in V21 are linked to V1 and V2 by links
with randomly initialized weights, meaning that the disturbances to
V1 and V2 are highly correlated because they have a common cause.

Question: Since controlling a perception of V21, and through it V1 and
V2, will be energetically cheaper than controlling V1 and V2
separately, will Baby2 develop a perceptual function that outputs a
perception of V21 that we could legitimately call P21?
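In code terms, Baby2's environment replaces the independent
disturbances with a common driver (a sketch extending the
illustrative Environment class above; note the step() signature now
takes the single disturbance D21):

    class Baby2Environment(Environment):
        """V21 has its own disturbance D21 and drives V1 and V2 through
        randomly weighted links, so their disturbances share a common cause."""
        def __init__(self, a, s):
            super().__init__(a, s)
            self.w21 = [random.uniform(-1, 1), random.uniform(-1, 1)]
            self.V21 = 0.0

        def step(self, actuators, d21):
            self.V21 = d21                      # D1 and D2 are inactive
            common = [self.w21[0] * self.V21, self.w21[1] * self.V21]
            super().step(actuators, common)     # correlated "disturbances"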

Comment: On the face of it, Baby2 seems more likely to develop a
perceptual function that produces P21 than Baby1 is to develop
perceptual functions that correspond individually to V1 and V2.
Baby2 is intended to be an attempt to answer Rupert’s question about
whether a formal description of reorganization can be created that
would result in the creation of new perceptual functions. I don’t
know whether the BabyN series would actually work as I hope it will.
It would be nice if someone took up the programming challenge and
tried it at least as far as Baby2, because if Baby2 does develop a
perceptual function for V21 and controls the resulting perception,
it would answer a few questions that have bedevilled CSGnet, and
would lead directly to more competent members of the BabyN series,
and perhaps to robots that learn autonomously how best to navigate
their environments and perform useful tasks.

Further comment: Up to Baby2, and probably for a few more of the
BabyN series, the only side-effect that influences an intrinsic
variable is the creation of heat by the operation of the individual
PNs. To deal properly with reorganization of a larger hierarchy,
side-effects that work through the external environment but
influence some internal intrinsic variable should be included in
more complex versions of BabyN – always assuming that Baby2 works
as hoped. In a lizard, control of the local environment temperature
by moving into sun or shadow as required would be an example of such
an environmentally mediated side-effect, as it would change the
energy dissipation rate from individual neurons and from the
ensemble.

Martin



[From Bruce Nevin (2017.10.17.08:39)]

At one meeting, maybe St. Louis, Bill brought a demonstration of reorganization. According to my surely imperfect memory, it involved 256 control loops, each closed through a common environment, so that the output of each affected the inputs of all. Regularities developed out of their random activities. Does this demo still exist, and is it relevant to the present question?

···
RY: Sure. But, as far as I can see, it is mostly conceptual. The only
formal definition I’m aware of is Bill’s arm reorg example in LCS3,
of gradual weight adjustments of output links, which improves control
performance. I don’t see anything on how perceptual functions are
learned. Or how memory (which is learning) fits into reorganisation.

RM: The arm reorg example in LCS3 seems to be more than conceptual;
it actually works.

[From Bruce Nevin (2017.10.17.08:46 PT)]

Bingo!

Rick Marken (2017.10.17.0840) –

···

On Tue, Oct 17, 2017 at 8:42 AM, Richard Marken rsmarken@gmail.com wrote:

[From Rick Marken (2017.10.17.0840)]

Martin Taylor (2017.10.11.10.21)–

[From Rupert Young (2017.10.11 10.00)]

RY: Yes. Which was what I said.
MT: The following is a tentative suggestion about simulating the
development of novel perceptions of an environment, in an attempt to
answer Rupert’s question.

RM: This reminded me of a demo Bill Powers did involving N control systems simultaneously controlling perceptions, each of which is a function of N environmental variables. I managed to find the program (which unfortunately runs only on PCs). But here it is:

https://www.dropbox.com/s/2u00ac87bix2sjv/MultiControlPrj.exe?dl=0

RM: And I highly recommend running this program (if you can) after reading the write-up associated with it:

https://www.dropbox.com/s/rwoqfa8v96g62ob/multiple_control.pdf?dl=0

RM: I won’t go into an explanation of the demonstration here, but I’ll just say that it’s a great test bed for models of reorganization of perception. Indeed, Bill says as much in the write-up:

BP: Since the initial perceptual weightings are selected at random, there is no guarantee that the perceptions are independent of each other. Thus there can be considerable conflict between some pairs of control systems. Also, the weightings can add up to a small or a large effect on the perception. For both reasons, some control systems will control more accurately than others. This, indeed, is one basis on which reorganization could occur: the weightings for a given control system’s input function could be randomly shuffled until the control error is minimized. This is a promising topic for further investigation. [italics mine]

RM: I think I recall there being a reorganizing version of this demo but I can’t seem to find it. If anyone has access to it, please let us know. But I think it would be pretty easy to write some perceptual learning models for this test bed program. The goal would be to reorganize the weights of the linear combination of environmental variables that produces each of the N controlled perceptual variables so that these perceptions are as orthogonal as they can be. When this is true, you will see all N perceptions coming very close to their respective reference signals. I’m pretty sure that it would be found that RL algorithms don’t work, unless they are actually versions of E. coli reorganization by a different name.

RM: This demo also illustrates a few points I made earlier about the relationship between perception and environmental variables and about the nature of perceptual learning from a PCT perspective. The demo shows that perceptual variables are functions of environmental variables; they don’t correspond to variables in the environment. There are no CEVs in this demo, just N environmental variables. The perceptual functions of the N control systems define the aspects of these N environmental variables that are controlled. The demo also shows that perceptual learning involves variation and selective retention of the parameters of “built in” types of perceptual functions. In this case, the type of perceptual function that is “built in” to all N control systems is a linear weighting function of the form p = w.1*q.1 + w.2*q.2 + … + w.N*q.N, where the w.i are the weights of the environmental variables q.i.
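A sketch of the kind of perceptual-learning model Rick describes:
e-coli shuffling of each system's input weights, with total control
error as the criterion (illustrative code, not Powers' program):

    import random

    def perceptions(W, q):
        """p_i = w_i1*q_1 + ... + w_iN*q_N: the demo's linear perceptual functions."""
        return [sum(w * qk for w, qk in zip(row, q)) for row in W]

    def reorganize_input_weights(W, total_error, state, step=0.05):
        """One e-coli move on the N x N matrix of input weights W.
        state carries prev_error and the current random direction between
        calls; initialize it as {'prev_error': float('inf'), 'dir': None}."""
        if state['dir'] is None or total_error >= state['prev_error']:
            # control got worse (or first call): tumble to a new direction
            state['dir'] = [[random.gauss(0, 1) for _ in row] for row in W]
        for i in range(len(W)):
            for k in range(len(W[i])):
                W[i][k] += step * state['dir'][i][k]
        state['prev_error'] = total_error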

RM: As usual, you can learn a lot about PCT by reading Powers and, more importantly, following Powers’ demonstrations of how the model works.

Best

Rick

Remember that the Arm2 demo operates in an external environment

devoid of content. The Little Man operates in an external
environment that contains a single moving object. Neither
environment has anything for the “organism” to learn about, so they
can’t be used as examples for the development of novel perceptions.
But their principles can.

Let's imagine something along the lines of a complement of the Arm 2

demo, or rather, a possible series of them, which I call Baby0,
Baby1, …, BabyN. Each builds on its predecessor. Baby0 starts with
an almost trivially simple environment and a bunch of pseudo-neurons
that could arrange themselves into the form of control loops with
perceptual functions. The question is whether Baby0 can develop two
perceptual functions that will allow it to stabilize two variables
in its environment.

To help describe BabyN and its environment, I will use some naming

conventions. Variables in the environment will be Vn, as in V1,
V2,… Environmental variables are individually influenced by
disturbance variables Dn, in the sense that Vj is influenced by Dj.
The Dn are driven by the experimenter. Sensory variables, reported
to the interior of BabyN will be S1, S2, … It may be true, but
more often will not be true, that an S variable corresponds directly
to a V variable. Perceptual variables will be Pn, reference
variables Rn, error variables En, output variables On, and Actuator
variables (analogous to muscle tensions or joint angles) An. As with
the sensor variables, an actuator variable may, but usually will
not, be associated directly with an environment variable.

One more acronym I will use is "PNn" for "Pseudo-neuron number n". A

PN has a number of inputs, performs some function on them, (often a
simple addition) and produces an output. PNn output is connected to
one of the PNm inputs by link Lnm, which has a real number weight
and possibly a defined delay (but we will worry about that much
later). In principle, any PN output may be linked as an input to any
or all other PNs. In the early BabyN series, however, PNs come in
families that have specialized linkage possibilities. The
experimental (or demo) question for any of the BabyNs is whether,
under the initial conditions specified, it will reorganize to
produce perceptual functions that enable it to live effectively in
its particular environment. I describe “live effectively” when I
come to “intrinsic variables”.

A "newborn" BabyN may have some control units already constructed,

but it also has (or may create) uncommitted PNs that might form new
control units. Baby0 has nothing pre-constructed, but uncommitted
PNs come in several different types or flavours, perceptual,
comparator, and output (two sub-types). A perceptual PN (pPN) can
accept input from a Sensor or from the output of a pPN. An output PN
(oPN) can accept input from the output of a comparator PN (cPN), and
produces output that can be linked to any number of oPNs or As. A
cPN can accept input from oPNs and pPNs and sends output to oPNs.
oPNs are special in another way. They come in two subtypes, one that
acts like a simple multiplier with a gain subject to reorganization,
while oPNs of the other type act as leaky integrators, and their
gain rate and leak rate are subject to reorganization.

The classification of PNs into different types reduces computational

complexity greatly if there are large numbers of S and A variables
and the environment is complicated. It would be perfectly possible
to describe the whole experiment as using PNs with no type
distinction, and I did so in an early draft of this message, but a
few back-of-the-envelope calculations suggested that the
reorganization time would very quickly get out of hand, even using
the e-coli method used by Powers to solve that same problem.

The influence of actuators on environmental variables is fixed for

the duration of any experimental or demo run, as is the influence of
the environmental variables on the sensors. The disturbances vary at
the experimenter’s whim.

Time must be considered. As this is a computer-based simulation,

time must be sampled, and one time unit could be the time between
samples. There must also be a delay associated with every link, but
these delays might be much shorter than a sample time, and therefore
could be taken as zero. The total loop delay, however, cannot be
taken to be zero. To ensure this, I will arbitrarily say that it
takes at least one sample time for a change in the output of an
actuator to be felt at a V environmental variable. Integration rates
and leak rates for the leaky integrator kind of output function can
be measured in rates per sample without loss of generality.
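
A minimal sketch of the two output-function subtypes, advanced one sample per call, with rates expressed per sample as above (function names are illustrative):

```python
def multiplier_opn(error, gain):
    """oPN subtype 1: a simple multiplier; the gain is what reorganization adjusts."""
    return gain * error

def leaky_integrator_opn(out_prev, error, rate, leak):
    """oPN subtype 2: a leaky integrator stepped one sample at a time;
    both the integration rate and the leak rate are per-sample quantities,
    and both are subject to reorganization."""
    return out_prev + rate * error - leak * out_prev
```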

One further thing is needed, intrinsic variables to set the rate of

reorganization. Reinforcement Learning has a significant problem in
the “assignment of credit”, which different proposals finesse in
different ways. PCT has less of a problem in this, because each
control loop is responsible for best controlling its own perceptual
variable. The better the control, the more stable are all the
variables in the loop. I propose to use this fact as a local
intrinsic variable, to go along with the global intrinsic variable
of minimizing overall error. Reinforcement rate is the probability
of an e-coli move during any time sample.
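
In code, that last sentence might be as simple as the following, assuming a proportional mapping from intrinsic error to move probability (the mapping and the constant k are my assumptions):

```python
import random

def ecoli_move_this_sample(intrinsic_error, k=0.1):
    """Reorganization rate as the probability of an e-coli move in one sample:
    the further the intrinsic variable is from its optimum, the likelier a move."""
    return random.random() < min(1.0, k * abs(intrinsic_error))
```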

Even though this is a simulation in which the intrinsic variable is

quite arbitrary, we can take a cue from real life. The big problem
with the human brain is how to dissipate the heat of neural
operations, primarily caused by neural firing. Our simulated neurons
(our PNs) don’t fire. They produce larger and smaller “neural
currents”, which can be positive or negative. Since electrical power
is I²R or V²/R, one index of equivalent energy
usage of a PN might be the variance of its output. But then “R”,
resistance, also enters the formula. R must be related to link
weights. Indeed, what reorganization will do is change link weights,
so they are critical to the intrinsic variable we seek.

In a real brain, the firing rate of a neuron is not (so far as I
know) influenced by how many synapses are reached by each impulse.
If it were an actual current, this would not be true. But a firing
is actually a voltage spike, not a blast of current. The Powers
“neural current” is perhaps better seen as an average voltage rather
than a summed current. Taking this view, in our BabyN, resistance
would be the inverse of link weight, so the momentary power of a PN
would be represented by (output²)*(sum of output link
weights to other units). The energy generated as heat over a time t
would be t*(output variance)*(sum of link weights).
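
As a sketch, taking R as 1/(sum of outgoing link weights) so that V²/R becomes the squared output times the summed weights (function names are illustrative):

```python
import statistics

def pn_power(output, out_weights):
    """Momentary power of a PN: output**2 / R, with R = 1/sum(out_weights)."""
    return output ** 2 * sum(out_weights)

def pn_heat(outputs, out_weights):
    """Heat generated over a run of len(outputs) samples:
    duration * variance of the output * summed outgoing link weights."""
    return len(outputs) * statistics.pvariance(outputs) * sum(out_weights)
```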

Energy cannot be dissipated instantaneously. It leaks away at a rate

proportionate to the temperature difference between the hot and cold
regions. In BabyN, we can simplify this and use it in reverse, to
indicate how hot the PN would be after some period of activity. The
temperature is proportional to the integrated energy produced by
firings, less that dissipated into the environment. It is therefore
the output of a leaky integrator that has input rate as above and
leak rate proportional to the current temperature difference between
the PN and the surroundings.

What are the surroundings of a PN? There are two parts, an outer

“cold” universe that we can take to be at zero temperature, and
nearby PNs. Since we have no self-organized way to localize a PN, we
must be arbitrary. I propose that all PNs of the same type (pPN,
cPN, and oPN, not distinguishing the oPN subtypes) should be the
neighbours of a PN. The resulting “cold sink” temperature would then
be somewhere between the average temperature of the neighbours and
zero. Perhaps half the average temperature of the neighbours might
be a simple starting point.
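
Putting the last two paragraphs together as a per-sample update (the leak constant here is an arbitrary assumption):

```python
def update_temperature(T, heat_in, avg_neighbour_T, k_leak=0.05):
    """One sample of the temperature leaky integrator: add the heat generated
    this sample, and leak toward a cold sink taken as half the average
    temperature of the PN's same-type neighbours, the simple starting
    point suggested above."""
    sink = 0.5 * avg_neighbour_T
    return T + heat_in - k_leak * (T - sink)
```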

This temperature will be our intrinsic variable. If a PN gets too

cold or too hot, it will lose efficiency and reduce its output for
any given input. On the other hand, in a global sense, the cooler
the entire ensemble is, the less difficult it is to configure the
brain (in an evolutionary sense) to fit in a reasonable size of
skull. This suggests that there should also be a global intrinsic
variable for average temperature, with some reference value. For
simplicity, we might start by suggesting that the global temperature
optimum would be the same as for the individual PNs (a value
controlled by the experimenter in the early members of the BabyN
series, but possibly self-optimized in more complex versions).

The tension between the global and the many local intrinsic

variables should lead to a steady operating temperature with a
minimum number of PNs contributing to the work, because the more
operating PNs there are, the more heat there is to dissipate and the
hotter the ensemble becomes.

Reorganization: Each BabyN has variable input through its sensors.

These induce variation in all the PNs that are initially randomly
interconnected, generating “pseudo-heat”, which makes them all get
hot, some more than others. Each PN uses local e-coli
reorganization, treating the in- and out-link weights as coordinates
in a local space. (There must be an e-coli administrator hidden in
the genetic structure of the organism, but I won’t attempt to
suggest how it got there. I just assume it exists, as did Powers).

The global intrinsic variable also induces its own e-coli

reorganization, using all the link weights. The interactions between
the local and global e-coli effects will skew the apparent
directions in both when seen by an external analyst, but the local
and global effects do not interfere with each other, nor do they
operate at the same rate. A particular PN might be near its optimum
“temperature” while the ensemble is much hotter than the global
optimum. Imagine that we had just two links to work with. Locally,
the last step was, say, {1,2} and performance improved, so the next
step will again be in the same direction {1,2}. But global
reorganization says that the next step will be in the direction
{-.5,1}. The resulting actual next step will be {0.5,3} as seen by
an external analyst, but will be {1, 2} and {-.5, 1} as seen by the
local and global e-coli administrators (who are hidden in the
structure of the BabyN series).
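
The arithmetic of that two-link example, for concreteness:

```python
import numpy as np

local_step = np.array([1.0, 2.0])     # local e-coli keeps its improving direction
global_step = np.array([-0.5, 1.0])   # direction chosen by global reorganization
print(local_step + global_step)       # [0.5 3. ] -- what the external analyst sees
```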

As written above, I feel there is too much arbitrariness in the

detail, and perhaps a simpler global-local algorithm might be used
to determine changes in the global and local intrinsic variables. In
more complex organisms than are anticipated in the BabyN series, the
organism would have to find food in order to sustain its energy
dissipation, a topic that is totally ignored here. But note that
warm-blooded animals and cold-blooded animals alike must maintain
their body temperature within fairly strict limits or cease
effectively controlling, so treating temperature as an important
intrinsic variable both locally and globally seems a very natural
thing to do.

--------------

I will discuss what I suggest might be the first three members of a

BabyN series. Baby0 is the simplest of the series, and lives in the
simplest environment. Baby1 lives in almost the same environment.
Baby2 lives in a slightly more complicated environment based on the
earlier ones. The structure of the environment does not change
during any one experimental run, but may differ from run to run.

Here are the initial conditions for Baby0 and Baby1. Baby2 is based

on an already reorganized Baby1, and might be thought of as a stage
in the maturation of Baby1.

Organism: Baby0 and Baby1 have two sensors, S1 and S2, two actuators

A1 and A2, and several PNs initially randomly interconnected
according to the rules described above. “Randomly interconnected”
means connected with a link weight chosen from a uniform random
distribution between -1 and +1. To make it concrete, some numbers
pulled out of the air are 4 pPNs, 3 cPNs and 6 oPNs, three of each
type (multiplier and leaky integrator).
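
A sketch of that initialization, following the linkage rules given earlier (sensors feed pPNs; pPNs feed pPNs and cPNs; oPNs feed cPNs, oPNs, and actuators; cPNs feed oPNs). The array names are made up:

```python
import numpy as np

rng = np.random.default_rng()
n_pPN, n_cPN, n_oPN = 4, 3, 6     # the numbers pulled out of the air above
n_sensors, n_actuators = 2, 2

def random_links(n_from, n_to):
    """Link weights drawn uniformly between -1 and +1."""
    return rng.uniform(-1.0, 1.0, (n_from, n_to))

W_sensor_to_pPN = random_links(n_sensors, n_pPN)
W_pPN_to_pPN = random_links(n_pPN, n_pPN)
W_pPN_to_cPN = random_links(n_pPN, n_cPN)
W_oPN_to_cPN = random_links(n_oPN, n_cPN)   # reference inputs to comparators
W_cPN_to_oPN = random_links(n_cPN, n_oPN)
W_oPN_to_oPN = random_links(n_oPN, n_oPN)
W_oPN_to_actuator = random_links(n_oPN, n_actuators)
```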

Environment: There are 2 environmental variables, V1 and V2. They

are independent of each other, and are influenced by independent
disturbances D1 and D2, as well as by the outputs of Baby0’s two
actuators A1 and A2.

Baby0: Actuator A1 output influences only variable V1 and A2

influences only V2. Likewise the V variables influence only their
corresponding sensors, so there are two independent environmental
paths from actuator through a V variable to the corresponding
sensor.

Baby1: Each actuator influences both environmental variables through

a link with a randomly chosen weight. Each environmental variable
influences both sensors through a link with a randomly
chosen weight. The existence of the two independent and
independently disturbed environmental variables is thereby hidden
from actuators and sensors (but may prove not to be hidden from
Baby1’s perceptual functions).

--------------

Some experimental questions.



Baby0: Will Baby0 learn to control any perceptions relating to the

independently disturbed, sensed, and acted-upon V1 and V2? If it
does, will the relevant control structure look like two independent
control loops, one controlling a perception of V1, the other of V2,
or will the controlled perceptions be joint functions of V1 and V2,
such as V1+V2 and V1-V2? Will unused PNs be pruned away by bringing
their link weights near zero?

Baby1: Will Baby1 learn to perceive and influence V1 and V2 as

independent variables?

Comment: The difference here between Baby0 and Baby1 is that in

principle, Baby0 could function optimally by simply connecting S1 to
A1 and S2 to A2 by links with negative weights. This is disallowed
by the connection rules, which do not provide for that kind of
connection. Baby0 must use at least a pPN, a cPN, and an oPN in each
loop, but each of them could work in a functioning control loop with
just one in-link and one output link. Is that the structure that
will be produced?

Baby1, on the other hand, could control perceptions of V1 and V2

separately only if both S variables are linked to two independent
pPNs, and each of two oPNs is linked to both actuators. Would Baby1
wind up controlling perceptions of V1 and V2, or complexes of both?

For both of these BabyN stages, the environment may be too simple,

in that there is unlikely to be much difference in energy
dissipation among structures that “perceive” different orthogonal
functions of V1 and V2, so different structures may emerge from
different experimental runs. However, in discussing Baby2, I will
assume Baby1 at least learned to perceive two orthogonal functions
of the two environmental variables, whether those functions are the
variables themselves, their sum and difference, or something else. I
will call the values of the perceptions output by those functions
P11 and P12 (level 1, functions 1 and 2).

------------Baby2-------

Organism: Baby2 is initialized as an already reorganized Baby1,

reorganized as above or manually organized to have some specified
structure of two perceptual control units controlling perceptions
P11 and P12. Baby2 is also given a few uncommitted PNs of each type.

Environment: In addition to V1 and V2 that Baby1 can keep stable by

controlling the two perceptions P11 and P12, the environment has a
new variable V21. V21 has its own separate disturbance D21, but the
disturbances D1 and D2 are no longer active. They are superseded by
the fact that variations in V21 are linked to V1 and V2 by links
with randomly initialized weights, meaning that the disturbances to
V1 and V2 are highly correlated because they have a common cause.

Question: Since to control a perception of V21 and through it V1 and

V2 will be energetically better than to control V1 and V2
separately, will Baby2 develop a perceptual function that outputs a
perception of V21 that we could legitimately call P21?
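
A sketch of how Baby2's common-cause disturbance might be generated (treating V21 as simply following its own disturbance D21 is an assumption made for brevity):

```python
import numpy as np

rng = np.random.default_rng()
w_21_to_V = rng.uniform(-1, 1, 2)   # randomly weighted links from V21 to V1, V2

def baby2_disturbance(d21):
    """D1 and D2 are inactive; the effective disturbances to V1 and V2 are
    both driven by V21, so they are correlated through their common cause."""
    v21 = d21
    return v21, w_21_to_V * v21     # V21 and the disturbances it applies
```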

Comment: On the face of it, Baby2 seems more likely to develop a

perceptual function that produces P21 than Baby1 is to develop
perceptual functions that correspond individually to V1 and V2.
Baby2 is intended to be an attempt to answer Rupert’s question about
whether a formal description of reorganization can be created that
would result in the creation of new perceptual functions. I don’t
know whether the BabyN series would actually work as I hope it will.
It would be nice if someone takes up the programming challenge and
tries it at least as far as Baby2, because if Baby2 does develop a
perceptual function for V21 and controls the resulting perception,
it would answer a few questions that have bedevilled CSGnet and
would lead directly to more competent BabyNs and perhaps to robots
that learn autonomously how best to navigate their environments and
perform useful tasks.

Further comment: Up to Baby2 and probably for a few more of the

BabyN series, the only side-effect that influences an intrinsic
variable is the creation of heat by the operation of the individual
PNs. To deal properly with reorganization of a larger hierarchy,
side-effects that work through the external environment but
influence some internal intrinsic variable should be included in
more complex versions of BabyN – always assuming that Baby2 works
as hoped. In a lizard, control of the local environment temperature
by moving into sun or shadow as required would be an example of such
an environmentally mediated side-effect, as it would change the
energy dissipation rate from individual neurons and from the
ensemble.

Martin

Richard S. Marken

"Perfection is achieved not when you have nothing more to add, but when you
have nothing left to take away."
                --Antoine de Saint-Exupery

            RM: The arm reorg example in LCS3 seems to be more

than conceptual; it actually works.

[Martin Taylor 2017.10.18.08.08]

Yes. Thanks for providing this, which I had not come across. It's a

nice paper, and indeed it (without looking at the code) seems a good
testbed for reorganization. One would, of course, have to
incorporate one or more intrinsic variables to adjust the rate of
reorganization. A very interesting question would be whether the
eventual organization would approximate a one-to-one relationship
between environmental variable n, perceptual variable n and actuator
output n, or whether a matrix is equally optimal with respect to the
intrinsic variable. For an intrinsic variable to be separate from
the perceptual hierarchy, the environmental variables or the
side-effects of controlling them must influence it somehow.
In my proposal for a series of “BabyN” systems, the intrinsic
variable I proposed was the internal temperature of the system,
which is influenced by energy generation and dissipation rates.
Using the output RMS amplitude (V for voltage in this message) as a
proxy for neural firing rate, the energy output would be V²/R,
where R is the effective resistance of all the parallel outputs, or
1/(sum of all output weights from a pseudo-neuron -- a “PN”). V²/R
is then a product of the stability of the output of the PN and the
influence it has on its neighbours.
In Bill’s structure there would be a tendency for the input and
output matrices to minimize their off-diagonal elements, and move
toward a structure in which perceptions correspond directly to
environmental variables, and for the interconnected mesh (which controls
perfectly well) to resolve into a set of almost independent control
loops. Only if the disturbances were correlated would one expect the
result of long reorganization to contain off-diagonal elements in
the perceptual matrix, and correspondingly in the output matrix.
If disturbances were correlated, the correlation would analytically
imply that there exists some other variable behind the overt ones
described in Bill’s presentation. That variable might show up as a
second-level perception if there is a temperature optimizing
intrinsic variable driving reorganization.
Each of which is a CEV, by definition, even if it doesn’t correspond
directly to a single perceptual variable in a structure that is
artificially constructed and that has no intrinsic variables other
than control quality to drive reorganization. That’s not the issue,
and never has been. We expect the relationships between perceptions
and regularities in the environment to change as the organism learns
to handle ever-changing situations by reorganization. The question
has always been about which perceptions survive and stabilize to be
useful as inputs to ever higher levels in a hierarchy of
perceptions.
What Bill shows is that a one-level set of perceptions can exist
that stabilize a set of CEVs against disturbance without there being
a one-to-one correspondence between the perceptions and the CEVs.
It’s the same thing, demonstrated elegantly, as Kent and I have been
doing with our discussions of “Giant Virtual Controllers” (GVCs).
The question in both Bill’s within-the-organism structure and the
social structure of GVCs is which perception-to-environment
connections survive the “slings and arrows of outrageous”
disturbances in the longer term. Are there social “intrinsic
variables”? It’s a good question. Maybe there’s an analogue to
temperature in social interactions. Who knows?
Anyway, it’s definitely a paper that advances PCT by loosening
constraints and thereby inspires further ideas.
I recommend also thinking about what the demos mean and what they
imply, and how they relate to real life, which is what I would call
an aspect of “following”, as in “following up”. Leaving what Bill
says as the final word on a topic, set in stone, is the antithesis
of what Bill’s enquiring mind was about.
Martin

···

On 2017/10/17 11:42 AM, Richard Marken wrote:

[From Rick Marken (2017.10.17.0840)]

Martin Taylor (2017.10.11.10.21)–

[From Rupert Young (2017.10.11 10.00)]

                        RM: The arm reorg example in LCS3 seems

to be more than conceptual; it actually
works.

RY: Yes. Which was what I said.
MT: The following is a tentative suggestion about
simulating the development of novel perceptions of an
environment, in an attempt to answer Rupert’s question.

          RM: This reminded me of a demo Bill Powers did

involving N control systems simultaneously controlling
perceptions, each of which is a function of N
environmental variables. I managed to find the program
(which unfortunately runs only on PCs). But here it is:

https://www.dropbox.com/s/2u00ac87bix2sjv/MultiControlPrj.exe?dl=0

          RM: And I highly recommend running this program (if you

can) after reading the write up associated with it:

https://www.dropbox.com/s/rwoqfa8v96g62ob/multiple_control.pdf?dl=0

          RM: I won't go into an explanation of the demonstration

here but I’ll just say that it’s a great test bed for
testing models of reorganization of perception. Indeed,
Bill says as much in the write up:

            BP: Since the initial

perceptual weightings are selected at random, there is
no guarantee that the perceptions are independent of
each other. Thus there can be considerable conflict
between some pairs of control systems. Also, the
weightings can add up to a small or a large effect on
the perception. For both reasons, some control systems
will control more accurately than others. *This,
indeed, is one basis on which reorganization could
occur: the weightings for a given control system’s
input function could be randomly shuffled until the
control error is minimized.* This is a promising
topic for further investigation. [italics mine]


RM: This demo also illustrates a few points I made

earlier about the relationship between perception and
environmental variables and about the nature of perceptual
learning from a PCT perspective. The demo shows that
perceptual variables are functions of environmental
variables; they don’t correspond to variables in the
environment. There are no CEV’s in this demo, just N
environmental variables.

          RM: As usual, you can learn a lot about PCT by reading

Powers and, more importantly, following Powers’
demonstrations of how the model works.

[Martin Taylor 2017.10.18.09.21]

PS....

[Martin Taylor 2017.10.18.08.08]

...
The question in both Bill's within-the-organism structure and the social structure of GVCs is which perception-to-environment connections survive the "slings and arrows of outrageous" disturbances in the longer term.

In light of the other thread on reinforcement learning, it's interesting to note that J.G.Taylor (no relation) addressed this very question in a reinforcement learning context in his book "The Behavioral Basis of Perception" (Yale U.P., 1962).

Martin

[From Rupert Young (2017.10.19 15.00)]

(Martin Taylor 2017.10.11.10.21]

Very interesting, and raises lots of questions. I hope to get on to modelling some learning soon, if I can, so may come back to this.

One further thing is needed, intrinsic variables to set the rate of reorganization. Reinforcement Learning has a significant problem in the "assignment of credit", which different proposals finesse in different ways. PCT has less of a problem in this, because each control loop is responsible for best controlling its own perceptual variable. The better the control, the more stable are all the variables in the loop. I propose to use this fact as a local intrinsic variable, to go along with the global intrinsic variable of minimizing overall error. Reinforcement rate is the probability of an e-coli move during any time sample.

Could this local intrinsic error be the system's persistent error? That is, the error measured over a prolonged period.

The global intrinsic variable also induces its own e-coli reorganization, using all the link weights. The interactions between the local and global e-coli effects will skew the apparent directions in both when seen by an external analyst, but the local and global effects do not interfere with each other, nor do they operate at the same rate. A particular PN might be near its optimum "temperature" while the ensemble is much hotter than the global optimum. Imagine that we had just two links to work with. Locally, the last step was, say, {1,2} and performance improved, so the next step will again be in the same direction {1,2}. But global reorganization says that the next step will be in the direction {-.5,1}. The resulting actual next step will be {0.5,3} as seen by an external analyst, but will be {1, 2} and {-.5, 1} as seen by the local and global e-coli administrators (who are hidden in the structure of the BabyN series).

On a related point the changes made to the set of weights in each iteration of Bill's arm reorg were randomly different for each weight; some could be +ve and some -ve. My understanding of standard neural networks is that the sign of the changes for each weight is the same, in each iteration. Is that the case? If so, wouldn't that mean that for some weights they go in the wrong direction?

Organism: Baby0 and Baby1 have two sensors, S1 and S2, two actuators A1 and A2, and several PNs initially randomly interconnected according to the rules described above. "Randomly interconnected" means connected with a link weight chosen from a uniform random distribution between =1 and +1. To make it concrete, some numbers pulled out of the air are 4 pPNs, 3 cPNs and 6 oPNs, three of each type (multiplier and leaky integrator).

I presume you meant "-1 and +1"?

Rupert

[Martin Taylor 2017.10.19.11.06]

[From Rupert Young (2017.10.19 15.00)]

(Martin Taylor 2017.10.11.10.21]

Very interesting, and raises lots of questions. I hope to get on to modelling some learning soon, if I can, so may come back to this.

I'd be delighted if you did so.

One further thing is needed, intrinsic variables to set the rate of reorganization. Reinforcement Learning has a significant problem in the "assignment of credit", which different proposals finesse in different ways. PCT has less of a problem in this, because each control loop is responsible for best controlling its own perceptual variable. The better the control, the more stable are all the variables in the loop. I propose to use this fact as a local intrinsic variable, to go along with the global intrinsic variable of minimizing overall error. Reinforcement rate is the probability of an e-coli move during any time sample.

Could this local intrinsic error be the system's persistent error? That is, the error measured over a prolonged period.

I don't think so, because to use that has a flavour of circular reasoning (circular loops are OK, but usually not in reasoning). However, I hope that my suggestion of temperature has a similar effect, as temperature varies with stability. The problem with using error alone as an intrinsic variable is that you can reduce error to zero by making all the link weights zero. That's a kind of extreme Zen approach. Want nothing (i.e. control no variables) and just go with the flow. The problem is that even the most practiced Zen monk must eat (and maintain his body temperature within tight limits).

I will now commence (or maybe continue) waffling...

For a long time, decades before I ever heard of PCT, I have wondered why it seemed that there was some kind of variance equalization process going on across levels of perception. If you artificially stabilize the input to a level, it seems to become more sensitive to small variations, keeping its output variance more or less constant. You can get some of the same effect if what is reported is the logarithm of proportional change (a log of a change in a log), but I don't think that's what is happening. Rather, it is as though a background (I hesitate to say "intrinsic") system controls not for minimum variance but for some level of optimum variance. This thought leads in two different but not orthogonal directions. The one that I want to follow first is the temperature idea. Too little variance globally means the brain gets colder, which is presumably as bad as getting too hot.

Heating the waffle-iron a bit more, a maximum entropy situation (not putting all your eggs in one basket) is generally easier to maintain than is a low-entropy situation. In this case, the entropy component of interest is the variance of variance across control units. So on the basis of very flimsy analogy, one might expect that if one variance sticks its head up over the crowd, it will get chopped down to size.

That leads to the other direction of thought, an analogy with tensegrity structures that I have been following for quite other reasons in my thinking about PCT. Tensegrity structures transfer throughout the structure loads applied to any one point. I think one could call "tensegrity" an emergent property in the same way as is control. Control just doesn't happen unless there exists a connected loop of components with the particular structure of a control loop. In the case of tensegrity, it takes (I believe) a minimum of three compression members and nine tension members before the tensegrity property can emerge in a free-standing structure. Be that as it may, the analogy is between load on some point in a tensegrity structure and disturbance to a controlled perception. The disturbance causes at least momentary deviation of the perception from its reference value, a deviation that is passed throughout the control hierarchy and dissipated if the hierarchy has been effectively reorganized, though, like a badly constructed tensegrity structure, the hierarchy can collapse catastrophically under an unfortunately placed load if reorganization has been less effective.

In this case, all we need to consider is that if a high-variance output is distributed to several higher (on the perceptual side) or lower (on the output side) systems, their outputs will become more variable (those systems will not control as well). That doesn't reduce the variability of the one we started with that had a high variance, but if that one is a perceptual function, high variance means it isn't controlling well, which means it will be getting hot, and if temperature is an intrinsic variable, local reorganization will take place in one or both sides of the hierarchy. When it does control well, its variance will have subsided.

What if the maverick component has a very low variance output? It will tend to get cold and reduce its output further. If it is contributing to other controlled perceptions, the loss of its contribution will reduce the available degrees of freedom as well as require the other inputs to compensate for its loss. The feedback from that should increase its variance once again. If it's not contributing, its cooling should continue, in effect freezing it out of the network. That could be a way of pruning the mess of random synaptic connections, leaving only perceptual functions and output functions that correspond to real environmental properties. At least that's my intuition, which could easily be very wrong.

The global intrinsic variable also induces its own e-coli reorganization, using all the link weights. The interactions between the local and global e-coli effects will skew the apparent directions in both when seen by an external analyst, but the local and global effects do not interfere with each other, nor do they operate at the same rate. A particular PN might be near its optimum "temperature" while the ensemble is much hotter than the global optimum. Imagine that we had just two links to work with. Locally, the last step was, say, {1,2} and performance improved, so the next step will again be in the same direction {1,2}. But global reorganization says that the next step will be in the direction {-.5,1}. The resulting actual next step will be {0.5,3} as seen by an external analyst, but will be {1, 2} and {-.5, 1} as seen by the local and global e-coli administrators (who are hidden in the structure of the BabyN series).

On a related point the changes made to the set of weights in each iteration of Bill's arm reorg were randomly different for each weight; some could be +ve and some -ve. My understanding of standard neural networks is that the sign of the changes for each weight is the same, in each iteration. Is that the case? If so, wouldn't that mean that for some weights they go in the wrong direction?

In e-coli reorganization change of direction happens only when the performance criterion ceases to improve and starts getting worse. Since there's no way the organism can tell which way is "uphill", e-coli tries directions at random until it finds one that is at least a little uphill. If by "iteration" you mean "moment of direction change", then yes, it is random. If by "iteration" you mean "moment of change of value" then the direction is usually invariant until going that way doesn't help any more. And when the direction change happens, one would expect that on average half the weights would change in the wrong direction. Without some external guiding hand that knows where the target is, how could it be otherwise?
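
As a sketch, the move-or-tumble rule just described, taking the performance criterion to be an error to minimize (step size and ranges are arbitrary assumptions):

```python
import random

def ecoli_step(weights, direction, prev_error, error, size=0.05):
    """Keep stepping the weights in the same direction while error falls;
    tumble to a fresh random direction when it stops falling. After a
    tumble, about half the weights will move the 'wrong' way -- there is
    no gradient to consult."""
    if error >= prev_error:
        direction = [random.uniform(-1, 1) for _ in weights]
    return [w + size * d for w, d in zip(weights, direction)], direction
```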

Organism: Baby0 and Baby1 have two sensors, S1 and S2, two actuators A1 and A2, and several PNs initially randomly interconnected according to the rules described above. "Randomly interconnected" means connected with a link weight chosen from a uniform random distribution between =1 and +1. To make it concrete, some numbers pulled out of the air are 4 pPNs, 3 cPNs and 6 oPNs, three of each type (multiplier and leaky integrator).

I presume you meant "-1 and +1"?

Yes. Even after reading your message pointing out the typo, my eyes were not good enough to see what you were driving at right away. I read what I had written as a minus, even though it had two bars, until I looked carefully and closely.

Thanks for the comments. If any of this seems to make sense, I'll be happy, because it's really rather a long-distance speculation on not much evidence. What isn't so speculative is that reorganization should, as Bill believed, create perceptions that correspond to real environmental properties, whether those perceptions are represented locally or are distributed across the perceiving network. I think of that difference as being analogous to the difference between representing a particular waveform as a set of sample values or as a Fourier transform or a short-time Fourier transform. It's all the same waveform, equally exactly described, whether the descriptive elements are localized in time, spread out over shortish time intervals, or all spread over the whole time of the waveform.

It would actually be rather interesting if a BabyN (where N is quite high) reorganized to distribute its visual perceptual representations over fairly wide regions of environmental space, especially if the distribution looked anything like a Wavelet transform or a Fourier transform.

Martin

Martin

[From Rupert Young (2017.10.20 15.05)]

(Martin Taylor 2017.10.19.11.06]
    Could this local intrinsic error be the

system’s persistent error? That is, the error measured over a
prolonged period.

  I don't think so, because to use that has a flavour of circular

reasoning (circular loops are OK, but usually not in reasoning).
However, I hope that my suggestion of temperature has a similar
effect, as temperature varies with stability. The problem with
using error alone as an intrinsic variable is that you can reduce
error to zero by making all the link weights zero. That’s a kind
of extreme Zen approach. Want nothing (i.e. control no variables)
and just go with the flow. The problem is that even the most
practiced Zen monk must eat (and maintain his body temperature
within tight limits).

If we're talking weights on a perceptual function then making
them zero would make the output of the perceptual function zero, but
not the error, which would now be equal to r, the reference signal.
Were you talking about something else?

      The global intrinsic variable also

induces its own e-coli reorganization, using all the link
weights. The interactions between the local and global e-coli
effects will skew the apparent directions in both when seen by
an external analyst, but the local and global effects do not
interfere with each other, nor do they operate at the same
rate. A particular PN might be near its optimum “temperature”
while the ensemble is much hotter than the global optimum.
Imagine that we had just two links to work with. Locally, the
last step was, say, {1,2} and performance improved, so the
next step will again be in the same direction {1,2}. But
global reorganization says that the next step will be in the
direction {-.5,1}. The resulting actual next step will be
{0.5,3} as seen by an external analyst, but will be {1, 2} and
{-.5, 1} as seen by the local and global e-coli administrators
(who are hidden in the structure of the BabyN series).

    On a related point the changes made to the set of weights in

each iteration of Bill’s arm reorg were randomly different for
each weight; some could be +ve and some -ve. My understanding of
standard neural networks is that the sign of the changes for
each weight is the same, in each iteration. Is that the case? If
so, wouldn’t that mean that for some weights they go in the
wrong direction?

  In e-coli reorganization change of direction happens only when the

performance criterion ceases to improve and starts getting worse.
Since there’s no way the organism can tell which way is “uphill”,
e-coli tries directions at random until it finds one that is at
least a little uphill. If by “iteration” you mean “moment of
direction change”, then yes, it is random. If by “iteration” you
mean “moment of change of value” then the direction is usually
invariant until going that way doesn’t help any more. And when the
direction change happens, one would expect that on average half
the weights would change in the wrong direction. Without some
external guiding hand that knows where the target is, how could it
be otherwise?

By iteration I mean each time the weights change ("when the

performance criterion ceases to improve"). A different random number
is applied to each weight. So, in this function the change to each
of w(i,j) may have a different sign.

[screenshot of the weight-change code from Bill's arm reorganization demo]

However, in the neural networks delta rule
(http://www.cs.stir.ac.uk/courses/CSC9YF/lectures/ANN/3-DeltaRule.pdf),

delta_w(i,j) = eta * (y(tar,j) - y(j)) * x(i),

the delta is the same for each weight as it is based on the error
(the difference y(tar,j) - y(j)). So, this raises a couple of
questions for me:

1. In NNs, as all the weights change in the same direction, how
can this converge? I would have thought it likely to be the case
that the change for some should be +ve and some -ve, to descend
the gradient.

2. In PCT the changes are random and this results in a meandering
of the weight space. Wouldn't this be a problem with an increase of
dimensions (more inputs), as there is more chance that the weights
will go in the wrong direction (error doesn't decrease)?

Regards,
Rupert

[Martin Taylor 2017.10.20.10.11]

[From Rupert Young (2017.10.20 15.05)]

(Martin Taylor 2017.10.19.11.06]

      Could this local intrinsic error be the

system’s persistent error? That is, the error measured over a
prolonged period.

    I don't think so, because to use that has a flavour of circular

reasoning (circular loops are OK, but usually not in reasoning).
However, I hope that my suggestion of temperature has a similar
effect, as temperature varies with stability. The problem with
using error alone as an intrinsic variable is that you can
reduce error to zero by making all the link weights zero. That’s
a kind of extreme Zen approach. Want nothing (i.e. control no
variables) and just go with the flow. The problem is that even
the most practiced Zen monk must eat (and maintain his body
temperature within tight limits).

  If we're talking weights on a perceptual function then the making

them zero would make the output of the perceptual function zero,
but not the error, which would now be = r ; the reference
signal. Were you talking about something else?

No, I'm not talking about something else. Imagine where the

reference signals come from, recognizing that the top level has all
the reference values effectively zero, because there is no higher
level to supply any other value. The only place variation would come
from is the environment, and if all the weights to the internal
pseudo-neurons are zero, environmental changes don’t affect the
inside of the BabyN. All the signals would be zero. It’s all too
easy for the whole thing to collapse into itself or explode without
limit in the absence of some countervailing influence. That
influence (in PCT) is the reorganization that increases its rate as
intrinsic variables depart from their optima. So we need to have an
intrinsic variable that is distinct from the simple ability to
control.

Remember also that Baby0 and Baby1 start with no control units at

all, and Baby2 starts with only those that Baby1 developed.
Otherwise, all the units are randomly connected within the rules I
imposed to reduce the computational complexity of the experiment
(some are committed, in effect, to become perceptual functions, some
to become comparators, and some to become output functions).

        The global intrinsic variable also

induces its own e-coli reorganization, using all the link
weights. The interactions between the local and global
e-coli effects will skew the apparent directions in both
when seen by an external analyst, but the local and global
effects do not interfere with each other, nor do they
operate at the same rate. A particular PN might be near its
optimum “temperature” while the ensemble is much hotter than
the global optimum. Imagine that we had just two links to
work with. Locally, the last step was, say, {1,2} and
performance improved, so the next step will again be in the
same direction {1,2}. But global reorganization says that
the next step will be in the direction {-.5,1}. The
resulting actual next step will be {0.5,3} as seen by an
external analyst, but will be {1, 2} and {-.5, 1} as seen by
the local and global e-coli administrators (who are hidden
in the structure of the BabyN series).

      On a related point the changes made to the set of weights in

each iteration of Bill’s arm reorg were randomly different for
each weight; some could be +ve and some -ve. My understanding
of standard neural networks is that the sign of the changes
for each weight is the same, in each iteration. Is that the
case? If so, wouldn’t that mean that for some weights they go
in the wrong direction?

    In e-coli reorganization change of direction happens only when

the performance criterion ceases to improve and starts getting
worse. Since there’s no way the organism can tell which way is
“uphill”, e-coli tries directions at random until it finds one
that is at least a little uphill. If by “iteration” you mean
“moment of direction change”, then yes, it is random. If by
“iteration” you mean “moment of change of value” then the
direction is usually invariant until going that way doesn’t help
any more. And when the direction change happens, one would
expect that on average half the weights would change in the
wrong direction. Without some external guiding hand that knows
where the target is, how could it be otherwise?

  By iteration I mean each time the weights change ("when the

performance criterion ceases to improve"). A different random
number is applied to each weight. So, in this function the change
to each of w(i,j) may have a different sign.

Not in e-coli reorganization. Occasionally, that happens, but
usually the weights change by the same amount as they did on the
last iteration (or, in a variant of that, by a multiplier of that
amount that is the same for all of them), which keeps the weights
changing in the same direction. Rick has a nice demo of e-coli
approach to a target in 2D
(http://www.mindreadings.com/ControlDemo/Select.html). Check
it out.
The problem Powers faced was that the reorganizing system has no
access to the required differentiable activation function. It
doesn’t know which direction is likely to lead to an improvement. If
it did, the hill-climbing problem would have been soluble using
statistical techniques that could be implemented fairly well by
conceivable neural arrangements. But in a living organism the system
is blind, in that it has no way of knowing whether, compared to what
it was doing, going more leftward or rightward, up or down (in 3D)
would be better. “Doing that” made it better the last time, so
“let’s do more of that” is the e-coli method. I suppose you could
call that a differentiable activation function, but it is
differentiable only along the direction of past changes.
In what you quote, I assume “tarj” means a target for j, which I
suppose would be a reference value for perception j (y(j))
if node j was a perceptual function (as it often is in neural
network research). But there’s no such error value for any of the
other types of node in the PCT hierarchy, and in any case, the
parameters involved are unlikely to be linearly additive weights.
For e-coli, all that is needed is that there be parameter values
that can be changed, whether continuously or discretely (discrete
changes mean finding the best of N possibilities rather than an
optimum from a range, but the operation is effectively the same, so
long as the N possibilities can be ordered; if they can’t be
ordered, e-coli reverts to random choice.)
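
For reference, here is a sketch of the delta rule Rupert linked, for a single linear unit: the factor shared across weights is the error term, but each weight's change is also scaled by its own input, so the individual changes can differ in sign:

```python
import numpy as np

def delta_rule_update(w, x, y_target, eta=0.1):
    """One delta-rule update for a linear unit y = w . x. The error factor
    (y_target - y) is shared by all weights; each individual change is that
    factor times the weight's own input x[i]."""
    y = w @ x
    return w + eta * (y_target - y) * x

# Example: inputs of mixed sign give weight changes of mixed sign.
w = np.zeros(3)
x = np.array([1.0, -2.0, 0.5])
print(delta_rule_update(w, x, y_target=1.0))  # [ 0.1  -0.2   0.05]
```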
Correct. There’s no constraint on the direction of change, as there
is on the actual firing rate, which can’t go negative (neural
current in PCT can’t go negative). In our usual discussions we
ignore the non-negative nature of neural currents, and do analyses
as though all values can be anywhere on the real number line. But
there was a thread not long ago that considered the implications of
this simplification, so it hasn’t been totally ignored here on
CSGnet.
Very much so, to the extent that if the choices were only ternary
(go left or go right or stay as you were), and there were only 10
weights to consider, the chance of going in exactly the best
direction would be only 1/59049. It’s a problem that bedevilled
Powers until he thought of the e-coli solution. I’ve forgotten where
I read it, but in one of his writings early on, he said something
like for even a moderately sized network, the time it would take for
a reasonably good optimization by random choice would exceed the age
of the Universe, so he just had to assume that there must exist a
method of reorganization that would work usefully well in times that
were appropriate to the demands on the organism. Modularization is
one approach to a solution (Kauffman found that rather small
modules that interacted tightly within the module but interacted
between modules only at their boundaries were optimum in his toy
task – a kind of analogue of reorganization). E-coli is another, as
the Arm2 demo shows. For a long time Powers and some of us both
on-line and off-line were trying to find a naturalistic approach to
a modular e-coli reorganization structure. That is still an open
question, so far as I know.
In my BabyN proposal, I chose what I think is a fairly
natural optimization, by making each PN its own module within which
e-coli reorganization happens. In what I wrote, I added a global
change to the local change, but now I think of it, maybe the global
parameters that are reorganized might better be the scale of the
local changes, in the sense that each PN is given a scalar weight
that it applies to its incoming and outgoing weights, meaning that
weight w(i,j) is adjusted relative to the other weights w(.,j)
by local e-coli within node j, and then multiplied by the scale
factors for nodes i and j. If a node is useless in controlling
through the environment, its scale weight would approach zero, just
as would happen to a link if it was unhelpful in making a node useful.
That might accomplish effective pruning.
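
A sketch of that scale-factor scheme (names are illustrative):

```python
def effective_weight(w_local, scale, i, j):
    """The link weight reorganized locally within node j, multiplied by the
    per-node scale factors for nodes i and j; a useless node is pruned as
    its scale factor drifts toward zero."""
    return scale[i] * w_local * scale[j]
```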
Just a thought.
Martin


[From Rupert Young (2017.10.21 12.12)]

(Martin Taylor 2017.10.20.10.11]

    If we're talking weights on a perceptual function then
making them zero would make the output of the perceptual
function zero, but not the error, which would now be equal to r,
the reference signal. Were you talking about something else?

  No, I'm not talking about something else. Imagine where the

reference signals come from, recognizing that the top level has
all the reference values effectively zero, because there is no
higher level to supply any other value. The only place variation
would come from is the environment, and if all the weights to the
internal pseudo-neurons are zero, environmental changes don’t
affect the inside of the BabyN. All the signals would be zero.
It’s all too easy for the whole thing to collapse into itself or
explode without limit in the absence of some countervailing
influence. That influence (n PCT) is the reorganization that
increases its rate as intrinsic variables depart from their
optima. So we need to have an intrinsic variable that is distinct
from the simple ability to control.

But surely due to the environmental feedback the higher system would

not perceive zero, resulting in error, resulting in non-zero
references so the signals don’t get to zero. In my balancing
reorganisation demo https://www.youtube.com/watch?v=QF7K6Lhx5C8
(which uses local error) each system in the hierarchy has a
reference for zero, but if a gain value (weight) went to zero then,
due to the environmental feedback, the higher system would
experience error, which would mean that it changes the reference to
non-zero for the lower system. Any movement (due to reorganisation)
of the gain away from the optimum value would introduce error which
would then (due to reorganisation) bring the value back to its
optimum.

Also what is meant by zero? In computer models is it not just

another number on a scale, for perceptions? And it could be avoided,
I think, in computer models by rescaling the values to non-zero
regions.

In real neural systems does zero have the same meaning considering

that signals don’t go -ve. Is there not a difference between no
activity and activity which we might think of representing zero on
some scale? So, does the zero issue actually occur in real systems?

        On a related point the changes made

to the set of weights in each iteration of Bill’s arm reorg
were randomly different for each weight; some could be +ve
and some -ve. My understanding of standard neural networks
is that the sign of the changes for each weight is the same,
in each iteration. Is that the case? If so, wouldn’t that
mean that for some weights they go in the wrong direction?

      In e-coli reorganization change of direction happens only when

the performance criterion ceases to improve and starts getting
worse. Since there’s no way the organism can tell which way is
“uphill”, e-coli tries directions at random until it finds one
that is at least a little uphill. If by “iteration” you mean
“moment of direction change”, then yes, it is random. If by
“iteration” you mean “moment of change of value” then the
direction is usually invariant until going that way doesn’t
help any more. And when the direction change happens, one
would expect that on average half the weights would change in
the wrong direction. Without some external guiding hand that
knows where the target is, how could it be otherwise?

    By iteration I mean each time the weights change ("when the

performance criterion ceases to improve"). A different random
number is applied to each weight. So, in this function the
change to each of w(i,j) may have a different sign.

  Not in e-coli reorganization. Occasionally, that happens, but
usually the weights change by the same amount as they did on the
last iteration (or, in a variant of that, by a multiplier of that
amount that is the same for all of them), which keeps the weights
changing in the same direction. Rick has a nice demo of e-coli
approach to a target in 2D
(http://www.mindreadings.com/ControlDemo/Select.html).
Check it out.

What I meant was when there was a tumble; i.e. when the weight

change changes. Each time there is a tumble the weight change for
each weight is derived from a different random number for each
weight, between -1 and +1. So some of the new weight changes may be
-ve and some +ve.

Regards,

Rupert

[Martin Taylor 2017.10.21.13.27]

[From Rupert Young (2017.10.21 12.12)]

Good questions all. I'll try to suggest answers.

(Martin Taylor 2017.10.20.10.11]

      If we're talking weights on a perceptual function then
making them zero would make the output of the perceptual
function zero, but not the error, which would now be equal to r,
the reference signal. Were you talking about something else?

    No, I'm not talking about something else. Imagine where the

reference signals come from, recognizing that the top level has
all the reference values effectively zero, because there is no
higher level to supply any other value. The only place variation
would come from is the environment, and if all the weights to
the internal pseudo-neurons are zero, environmental changes
don’t affect the inside of the BabyN. All the signals would be
zero. It’s all too easy for the whole thing to collapse into
itself or explode without limit in the absence of some
countervailing influence. That influence (n PCT) is the
reorganization that increases its rate as intrinsic variables
depart from their optima. So we need to have an intrinsic
variable that is distinct from the simple ability to control.

  But surely due to the environmental feedback the higher system

would not perceive zero,

OK. I guess I was a bit confusing, mixing up two things. The first

is an already organized hierarchy, while the other is an unorganized
BabyN system. BabyN may have some control units inherited from an
earlier BabyN, but its experimental environment is one in which the
question is whether higher-level control systems would emerge in an
environment in which it would be advantageous to the BabyN’s
intrinsic variable(s) for it to do so. So we talk about an already
organized part of a BabyN hierarchy, because if there isn’t one, may
or may not be any feedback loops, and there would certainly be no
higher-level systems, or indeed any levels at all since the BabyN
systems start with random links.

So the question was whether the evolution/reorganization of the

system would collapse by on average reducing the strengths of all
the links, explode into a fiery death by increasing the weights
without limit and thereby magnifying all variation, which is equated
with energy that has to be dissipated, or would come to some optimum
and stay around there as reorganization proceeds.

I said: "*      The only place variation would come from is the

environment, and if all the weights to the internal pseudo-neurons
are zero, environmental changes don’t affect the inside of the
BabyN. All the signals would be zero* . The possibility I was
alluding to was the collapse. Even if there is a “top-level” control
system, there would be nothing feeding it a reference signal, so
that if the perceptual inputs all had zero weight, its error would
be the inverse of the perceptual value, zero. That would propagate
down to lower levels, making their reference values all zero, while
their perceptual weights declined toward zero. In the end, it would
all be zero-ed out.

The objective of choosing a suitable intrinsic variable is to make
sure that doesn’t happen. If control quality is measured by error,
bringing all the weights everywhere to zero eliminates error. Seen
from outside, the environmental variables wouldn’t look well
controlled, but the BabyN has no way to see what the “real”
environmental variables are. It’s only got its perceptions, and they
are all staying perfectly at their reference values. That’s where
the intrinsic variables come in, in the sense that they must either
be influenced by variation in the external variables otherwise than
through the sensors, or be influenced by some effects from the
controlling systems. I hazarded a guess that effective temperature
might do the job if the optimum was not zero.
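
To make the collapse concrete, here is a minimal one-loop sketch
(constants and names are mine, invented for illustration; this is not
BabyN code). With the perceptual weight at zero the internal error is
identically zero, yet the environmental variable random-walks freely;
an error-based criterion would score the collapsed system as perfect:

```python
import random

# One loop: environmental variable V, perceptual weight w, top-level
# reference fixed at zero. If reorganization scores only internal
# error, w = 0 looks perfect: perception and error are both zero
# while V wanders uncontrolled in the environment.

def external_variance(w, steps=10000):
    V, out = 0.0, 0.0
    total = 0.0
    for _ in range(steps):
        V += random.gauss(0, 1) + out   # disturbance plus the loop's output
        p = w * V                       # perception through weight w
        e = 0.0 - p                     # error against a zero reference
        out = 0.5 * e                   # proportional output stage
        total += V * V
    return total / steps

print(external_variance(1.0))  # modest: V is actively stabilized
print(external_variance(0.0))  # large and growing: zero error, V drifts freely
```

An intrinsic variable has to be something that tells these two cases
apart without looking through the zero-weight perceptual path.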

  resulting in error, resulting in non-zero references so the
signals don’t get to zero. In my balancing reorganisation demo
https://www.youtube.com/watch?v=QF7K6Lhx5C8
(which uses local error) each system in the hierarchy has a
reference for zero, but if a gain value (weight) went to zero
then, due to the environmental feedback, the higher system would
experience error, which would mean that it changes the reference
to non-zero for the lower system. Any movement (due to
reorganisation) of the gain away from the optimum value would
introduce error which would then (due to reorganisation) bring the
value back to its optimum.
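
A one-parameter sketch of that idea (assumed details, not the actual
demo code): a single gain k is reorganized by the e-coli rule on a
smoothed local error, so any drift of k out of the controlling range
raises the error and tends to get corrected:

```python
import random

# One gain k controlling V toward a zero reference; reorganization
# applies the e-coli rule to a smoothed local error. All constants
# here are illustrative.
k, dk = 0.2, 0.02
avg_err, prev_avg = 1.0, float("inf")
V = 0.0
for step in range(1, 50001):
    V = (1 - k) * V + random.gauss(0, 1)      # output opposes perception
    V = max(-1e6, min(1e6, V))                # guard against blow-up
    avg_err = 0.99 * avg_err + 0.01 * abs(V)  # smoothed local error
    if step % 200 == 0:                       # reorganize occasionally
        if avg_err >= prev_avg:               # no improvement: tumble dk
            dk = random.uniform(-0.02, 0.02)
        k += dk
        prev_avg = avg_err
print(round(k, 2))  # typically ends up inside the stabilizing range 0 < k < 2
```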

  Also, what is meant by zero? In computer models is it not just
another number on a scale, for perceptions? And it could be
avoided, I think, in computer models by rescaling the values to
non-zero regions.

It's more than that if you are talking about variances, which are
squared values of changes around an average. Squared values in a
real-valued system are always positive, and zero has a meaning of
perfect invariance. Scaling perceptions to shift the zero point
wouldn’t change the variance. Scaling by changing the magnification
would.
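
A quick numerical check (a throwaway example, not from the thread):

```python
from statistics import pvariance

xs = [1.0, 2.0, 4.0, 7.0]
print(pvariance(xs))                      # 5.25
print(pvariance([x + 100 for x in xs]))   # 5.25: shifting the zero point changes nothing
print(pvariance([2 * x for x in xs]))     # 21.0: doubling the magnification scales it by 4
```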

  In real neural systems, does zero have the same meaning,
considering that signals don’t go -ve? Is there not a difference
between no activity and activity which we might think of as
representing zero on some scale? So, does the zero issue actually
occur in real systems?

Probably not in the actual functioning, because one thing that
happens in a human newborn (at least) is an awful lot of pruning of
synaptic connections (bringing their weights to zero), leaving only
non-zero weights for signals (firing rates) that vary over time. I
don’t think of it as an issue for living organisms, but it is an
issue for simulation of reorganization when you don’t allow a
pre-assigned source of perturbation. Setting an optimum simulation
temperature isn’t a specific source of perturbation. It just says
that if there isn’t enough, do something to make more.

          On a related point, the changes made
to the set of weights in each iteration of Bill’s arm
reorg were randomly different for each weight; some could
be +ve and some -ve. My understanding of standard neural
networks is that the sign of the changes for each weight
is the same in each iteration. Is that the case? If so,
wouldn’t that mean that for some weights they go in the
wrong direction?

        In e-coli reorganization a change of direction happens only
when the performance criterion ceases to improve and starts
getting worse. Since there’s no way the organism can tell
which way is “uphill”, e-coli tries directions at random
until it finds one that is at least a little uphill. If by
“iteration” you mean “moment of direction change”, then yes,
it is random. If by “iteration” you mean “moment of change
of value”, then the direction is usually invariant until
going that way doesn’t help any more. And when the direction
change happens, one would expect that on average half the
weights would change in the wrong direction. Without some
external guiding hand that knows where the target is, how
could it be otherwise?

      By iteration I mean each time the weights change ("when the
performance criterion ceases to improve"). A different random
number is applied to each weight. So, in this function the
change to each of w(i,j) may have a different sign.

    Not in e-coli reorganization. Occasionally that happens, but
usually the weights change by the same amount as they did on the
last iteration (or, in a variant of that, by a multiplier of that
amount that is the same for all of them, which keeps the weights
changing in the same direction). Rick has a nice demo of e-coli
approach to a target in 2D
(http://www.mindreadings.com/ControlDemo/Select.html).
Check it out.

  What I meant was when there was a tumble; i.e. when the weight
change changes. Each time there is a tumble the weight change for
each weight is derived from a different random number for each
weight, between -1 and +1. So some of the new weight changes may
be -ve and some +ve.

Not quite. What changes in a tumble isn't the weights, but the
amount they will change in the next simulation cycle. The weights
won’t change at all at the moment of a tumble. What changes is the
direction they will move thereafter. If that makes things worse,
they will tumble again very soon. If it makes things better, they
will keep going in that direction for a while.
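
In code, that description of a tumble might look like the following
sketch (the quadratic toy objective and all constants are mine, not
anything from Bill's or Rick's programs). The tumble re-randomizes
the per-cycle step dw; the weights themselves just keep adding the
current dw every cycle:

```python
import random

STEP = 0.01
target = [0.3, -0.7, 0.5]            # hidden optimum the system cannot see
w = [0.0, 0.0, 0.0]                  # the weights
dw = [0.0, 0.0, 0.0]                 # current per-cycle weight change
prev_err = float("inf")

for _ in range(20000):
    err = sum((wi - ti) ** 2 for wi, ti in zip(w, target))
    if err >= prev_err:              # stopped improving: tumble
        # the tumble changes dw (the direction of travel), not w itself
        dw = [STEP * random.uniform(-1, 1) for _ in w]
    w = [wi + di for wi, di in zip(w, dw)]   # same dw every cycle until then
    prev_err = err

print([round(wi, 2) for wi in w])    # ends up near the hidden target
```

Note that nothing in the loop ever sees a gradient; it only compares
the current criterion value with the previous one.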

Martin

[From Rupert Young (2017.10.23 17.10)]

[Martin Taylor 2017.10.20.10.11]

RY:

    [image: weight-update equations for a standard neural network]
    the delta is the same for each weight as it is based on the
error (difference between y(tarj)-y(j)).

  The problem Powers faced was that the reorganizing system has no
access to the required differentiable activation function. It
doesn’t know which direction is likely to lead to an improvement.
If it did, the hill-climbing problem would have been soluble using
statistical techniques that could be implemented fairly well by
conceivable neural arrangements. But in a living organism the
system is blind, in that it has no way of knowing whether,
compared to what it was doing, going more leftward or rightward,
up or down (in 3D) would be better. “Doing that” made it better
the last time, so “let’s do more of that” is the e-coli method. I
suppose you could call that a differentiable activation function,
but it is differentiable only along the direction of past changes.

Is it the case that you don't have this direction problem if there
is only one dimension (one weight); you just go in the opposite
direction if the error increased? But for more than one
dimension you don’t know for which of the weights the sign (of the
correction) should change? So, for 2 dimensions there are four
quadrants to choose from (quadrants increase exponentially with
dimensions). With ecoli the quadrant is chosen at random. How is the
quadrant chosen in the case of the differentiable functions of
neural networks?

  In what you quote, I assume “tarj” means a target for j, which I
suppose would be a reference value for perception j (yj)
if node j was a perceptual function (as it often is in neural
network research). But there’s no such error value for any of the
other types of node in the PCT hierarchy, and in any case, the
parameters involved are unlikely to be linearly additive weights.
For e-coli, all that is needed is that there be parameter values
that can be changed, whether continuously or discretely (discrete
changes mean finding the best of N possibilities rather than an
optimum from a range, but the operation is effectively the same,
so long as the N possibilities can be ordered; if they can’t be
ordered, e-coli reverts to random choice).

    So, this raises a couple of questions for me:

  1. In NNs, as all the weights change in the same direction,
how can this converge? I would have thought it likely
that the change for some should be +ve and
some -ve, to descend the gradient.

  Correct. There's no constraint on the direction of change, ....

Actually, I was saying I thought there was a constraint on the
weight change in neural networks, as the correction for all weights
for a node is determined by the difference between the target and
current activation for that node. I was querying how convergence
takes place if all the weights are going in the same direction.
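
For comparison, in ordinary gradient descent the shared delta for a
node is multiplied by each weight's own input, so the per-weight
corrections routinely differ in sign within a single iteration; that
is how it can converge. A single linear node makes the arithmetic
visible (all values made up):

```python
x = [1.0, -2.0, 0.5]                 # inputs to one linear node
w = [0.2, 0.3, -0.1]                 # current weights
target = 1.0                         # desired activation (the "tarj" above)

y = sum(wi * xi for wi, xi in zip(w, x))     # node output: -0.45
delta = y - target                            # shared error term: -1.45
grad = [delta * xi for xi in x]               # dE/dw_i = delta * x_i for E = delta**2 / 2
w = [wi - 0.1 * gi for wi, gi in zip(w, grad)]
print(grad)   # [-1.45, 2.9, -0.725]: mixed signs, one per partial derivative
```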

  2. In PCT the changes are random and this results in a
meandering of the weight space. Wouldn’t this be a problem
with an increase of dimensions (more inputs), as there is more
chance that the weights will go in the wrong direction
(error doesn’t decrease)?

  Very much so, to the extent that if the choices were only ternary
(go left or go right or stay as you were), and there were only 10
weights to consider, the chance of going in exactly the best
direction would be only 1 in 3^10 = 59049. It’s a problem that
bedevilled Powers until he thought of the e-coli solution.

Do you mean that the ecoli solution resolves the dimension explosion
problem? Wouldn’t you just end up with a lot more time spent
tumbling as there are many more possibilities of going in the wrong
direction?

Rupert

[From Erling Jorgensen (2017.10.23 1355 EDT)]

Rupert Young (2017.10.23 17.10)

RY (to Martin): Do you mean that the ecoli solution resolves the dimension explosion problem? Wouldn’t you just end up with a lot more time spent tumbling as there are many more possibilities of going in the wrong direction?

EJ: With e-coli reorganization, a high dimension space would certainly lead to “more tumbles,” although not necessarily all that much “more time spent” as a result of those tumbles. By that I mean, not much time spent actually going in the wrong direction. I believe e-coli solves the dimension explosion problem by quickly tumbling again, if the previous tumble does not lead to improvement. So the number of tumbles goes up with the dimension, but the time spent goes up more slowly.

EJ: I don’t know if we have a way to measure the actual time a given tumble takes, or how long to detect non-improvement. But by minimizing time spent “making it worse”, the e-coli method can come close to either “standing one’s ground or making it better”. If the cost of a tumble (metabolic or otherwise) is not too great, that seems to be a worthwhile trade-off.
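
Erling's suggestion is easy to probe with a toy run (my own
assumptions throughout; this models no existing demo): count tumbles
and cycles-to-criterion as the number of weights grows.

```python
import random

def run(n, step=0.01, tol=0.01, max_cycles=500000):
    # e-coli descent toward a random hidden target in n dimensions
    target = [random.uniform(-1, 1) for _ in range(n)]
    w, dw = [0.0] * n, [0.0] * n
    prev_err, tumbles = float("inf"), 0
    for cycle in range(max_cycles):
        err = sum((wi - ti) ** 2 for wi, ti in zip(w, target))
        if err < tol:
            return cycle, tumbles
        if err >= prev_err:          # worsening: tumble immediately
            tumbles += 1
            dw = [step * random.uniform(-1, 1) for _ in range(n)]
        w = [wi + di for wi, di in zip(w, dw)]
        prev_err = err
    return max_cycles, tumbles

for n in (2, 10, 50):
    print(n, run(n))   # in this toy setup, tumbles climb with n
```

In runs of this kind the tumble count climbs with the dimension while
each tumble wastes only about one cycle, which seems consistent with
the trade-off described above.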

All the best,

Erling


[From Rupert Young (2017.10.27 18.15)]

[Erling Jorgensen (2017.10.23 1355 EDT)]

Rupert Young (2017.10.23 17.10)

      >RY (to Martin): Do you mean that the ecoli solution
resolves the dimension explosion problem? Wouldn’t you just
end up with a lot more time spent tumbling as there are many
more possibilities of going in the wrong direction?

      EJ: With e-coli reorganization, a high dimension space
would certainly lead to “more tumbles,” although not
necessarily all that much “more time spent” as a result of
those tumbles. By that I mean, not much time spent actually
going in the wrong direction. I believe e-coli solves the
dimension explosion problem by quickly tumbling again, if the
previous tumble does not lead to improvement. So the number
of tumbles goes up with the dimension, but the time spent goes
up more slowly.

Thanks Erling, that's very useful.

Rupert

[From Rupert Young (2017.10.27 18.25)]

[Martin Taylor 2017.10.21.13.27]

Good points, I'll try and bear them in mind.

  What I meant was when there was a tumble; i.e. when the weight change changes. Each time there is a tumble the weight change for each weight is derived from a different random number for each weight, between -1 and +1. So some of the new weight changes may be -ve and some +ve.

    Not quite. What changes in a tumble isn't the weights, but the amount they will change in the next simulation cycle. The weights won't change at all at the moment of a tumble. What changes is the direction they will move thereafter.

I believe I was saying the same thing; I wonder if you may have misread what I wrote.

Regards,
Rupert

[Martin Taylor 2017.10.27.13.30]

  [From Rupert Young (2017.10.27 18.25)]

  [Martin Taylor 2017.10.21.13.27]

  Good points, I'll try and bear them in mind.
      What I meant was when there was a tumble; i.e. when the weight
change changes. Each time there is a tumble the weight change for
each weight is derived from a different random number for each
weight, between -1 and +1. So some of the new weight changes may
be -ve and some +ve.

    Not quite. What changes in a tumble isn't the weights, but the
amount they will change in the next simulation cycle. The
weights won’t change at all at the moment of a tumble. What
changes is the direction they will move thereafter.

  I believe I was saying the same thing; I wonder if you may have
misread what I wrote.

Obviously I did. On rereading it I can see the ambiguity. Had you
omitted the word “weight” a couple of times, I think I might have
read it as you intended the first time: “Each time there is a
tumble the [weight] change for each weight” and “So some of
the [new weight] changes may be -ve and some +ve.” Do you see
where I was misled?

Anyway, I'm glad we seem to be on the same page in understanding
what happens, as opposed to understanding the language used to talk
about it.

Martin