Weight-transpose control model

[From Bill Powers (2008.10.18.1438 MDT)]

Tracy B. Harms (2008-10-18 12:01 Pacific) --

Numbering these five systems from zero through four, I notice that
systems 0, 2, and 3 maintain relatively low error but systems 1 and 4
have ever-increasing error.

Does this suggest that I've mishandled some aspect of the translation?

Possibly, but there may be a mistake in the code. You write:

oe=: oe + (ko*dt) * rs - ps NB. output effect (alters factors)

This seems to be intended to say that the output oe is the old value
of output plus (ko*dt) times the error signal, which is reference -
perception or rs - ps. If the conventions for precedence of operators
are the same in J as in other languages, however, that is not what
that line of code says, because there are no parentheses around "rs -
ps." The factor (ko*dt) multiplies only the reference signal, and the perceptual
signal is then subtracted from the product. I think the line should
look like this:

oe=: oe + (ko*dt) * (rs - ps) NB. output effect (alters factors)

Do you agree?

Bill P.

No virus found in this outgoing message.
Checked by AVG - http://www.avg.com
Version: 8.0.173 / Virus Database: 270.8.1/1731 - Release Date: 10/17/2008 7:01 PM

[from Tracy B. Harms (2008-10-20 05:29 Pacific)]

Hi Bill,

The code does not need the parentheses you suggested because J has no
precedence among functions. Instead, there is a right-to-left
evaluation rule. For example, 3*5-2 evaluates to 9, not the 13 that
most languages would compute.
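For readers used to conventional precedence, the two readings can be
sketched in Python (not J; the parentheses make J's right-to-left
grouping explicit):

```python
# Conventional precedence: * binds tighter than -, so 3*5-2 is (3*5)-2.
conventional = 3 * 5 - 2       # 13
# J's right-to-left rule instead groups the expression as 3*(5-2).
j_style = 3 * (5 - 2)          # 9
print(conventional, j_style)   # 13 9
```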

I went through the more complete code you sent and matched
initialization to the same ranges. More importantly, I think, I added
code to normalize the weighting matrices as you had. The improvement
was dramatic, showing reduced error over time for all systems instead
of only some.

Here are two test run reports. For each line a new set of random
values was used. The first number is the magnitude of the largest
error at the end of processing, and the second number is the count of
systems for which the error is less than 100. This particular test was
done with 100 systems (and 100x100 sized weighting matrices). The
first twenty runs iterated 10,000 times each; the next set did
100,000.

   20 run10e 4
147.587 76
187.747 80
192.291 80
178.039 77
  149.9 69
159.651 76
173.423 74
179.094 72
186.312 70
186.555 73
162.739 72
158.958 76
167.586 76
179.879 77
153.721 73
170.908 76
168.426 81
145.444 76
185.759 69
161.064 70
   20 run10e 5 NB. 100
2.84923e_5 100
4.61379e_5 100
6.47067e_5 100
7.01763e_5 100
4.20655e_5 100
0.000150492 100
5.74201e_5 100
6.93783e_5 100
0.000105297 100
6.63768e_5 100
9.13737e_5 100
7.77277e_5 100
6.48943e_5 100
0.000111045 100
3.48462e_5 100
2.15484e_5 100
3.02702e_5 100
6.57733e_5 100
8.68881e_5 100
4.75084e_5 100

The revised code follows.

Tracy

···

_____________________________________________________________

NB. William T. Powers' matrix-transpose model
NB. translated to J by Tracy Harms (DRAFT, 2008-10-19)
NB. wtpmtm.ijs

initmtm=: verb define
   NB. quantity of control systems in this network:
size=: 100
   NB. weighting coefficients, input and output:
wi=. 1.98* ? 0 $~ 2# size
wco=: |: wci=: 100* wi % +/,wi
   NB. normalize to 100, and define output weights
   NB. as transpose of input weights

ko=: 1
dt=: 0.0002
ps=: size # 0
oe=: size # 0
vs=: 2000*0.5-~ ? size # 0
rs=: 2000*0.5-~ ? size # 0
)
initmtm''

act=: verb define
ps=: vs (+/"_1 . *) wci NB. perceived situation
oe=: oe + 0.0002 * rs - ps NB. output effect (alters factors)
vs=: oe (+/"_1 . *) wco NB. variable-factor situation
rs-ps NB. output for testing
)

run10e=: verb define
initmtm''
(>./,[:+/100&>)| act^:(10^y)''
:
run10e"_1 x # ,: y
)
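For non-J readers, the reporting train `(>./,[:+/100&>)` can be
rendered as a rough Python equivalent (the helper name is mine, not
part of the original code):

```python
def report(errors):
    """Largest absolute error, and the count of systems with error below 100."""
    mags = [abs(e) for e in errors]
    return max(mags), sum(m < 100 for m in mags)

print(report([150.0, -30.0, 99.5, 210.0, 12.0]))   # (210.0, 3)
```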

[From Bill Powers (2008.10.20.0944 MDT)]

Tracy B. Harms (2008-10-20 05:29 Pacific)--

The code does not need the parentheses you suggested because J has no
precedence among functions. Instead, there is a right-to-left
evaluation rule. For example, 3*5-2 evaluates to 9, not the 13 that
most languages would compute.

So how do you write (a + b)*(x + y)? Are parentheses honored if you
do use them?

I went through the more complete code you sent and matched
initialization to the same ranges. More importantly, I think, I added
code to normalize the weighting matrices as you had. The improvement
was dramatic, showing reduced error over time for all systems instead
of only some.

OK, great. The reason for the normalization is that when you use
random numbers in the input matrix, the magnitude of the perceptual
signal can vary wildly, which effectively causes the loop gain to
vary unpredictably by a large amount. Unless you use a very small
value of dt, chances are good that some of the systems will go into
oscillation because of having too much loop gain. To counteract that
you can make dt very small -- but then you have to wait forever for
the error to get small in the systems that don't have high loop gains.
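The effect can be seen in a single-loop sketch (Python, not the J
model; the gain values are illustrative). With output update
oe += ko*dt*(r - p) and perception p = gain*oe, the error shrinks by
a factor (1 - gain*ko*dt) each step, so the loop is stable only when
0 < gain*ko*dt < 2:

```python
def final_error(gain, ko=1.0, dt=0.0002, r=100.0, steps=2000):
    """Single integrating control loop; per-step error factor is (1 - gain*ko*dt)."""
    oe = 0.0
    for _ in range(steps):
        p = gain * oe              # perception through one input weight
        oe += ko * dt * (r - p)    # integrating output function
    return abs(r - gain * oe)

print(final_error(1.0))        # gain*ko*dt = 0.0002: slow, steady convergence
print(final_error(12000.0))    # gain*ko*dt = 2.4: runaway oscillation
```

Un-normalized random weights amount to scattering `gain` unpredictably
across the systems, which is why some loops converge and others blow up.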

The normalization isn't necessary when you reorganize the output
matrix instead of setting it to the transpose of the input matrix.
The entries in the output matrix all start at zero, and when a system
starts to become unstable, reorganization is triggered because of the
increase in error, and the system returns to stability.

Nice way of presenting the data!

Gotta run now to get my space at the mobile home park paid for.

Best,

Bill P.


[from Tracy B. Harms (2008-10-20 13:49 Pacific)]

Re. Bill Powers (2008.10.20.0944 MDT)

...
So how do you write (a + b)*(x + y)? Are parentheses honored if you do use
them?

Yes, parentheses work normally, so the way you wrote it would work as
you'd expect it to. There wouldn't be anything wrong with adding the
parentheses you suggested, but it wouldn't have fixed anything,
either.

(The only non-normal use of parentheses in J is the odd use of a
solitary right-paren to end a multi-line script.)

... The reason for the normalization is that ...

Thank you very much for explaining this. At present I'm not able to
grasp it so fully as I'd like, but it helps move me toward a stronger
understanding.

My immediate curiosity is about the difference between discrete models
and continuous systems. You said that "Unless you use a very small
value of dt, chances are good that some of the systems will go into
oscillation because of having too much loop gain." In a non-digital
system there is no approximation occurring over a discrete dt, it
seems to me; there are only the actually continuous changes of the
components. Yet, I'd wager that the same sort of oscillation due to
too much loop gain would be possible. I can't quite put my finger on
the right way to ask the question that floats in front of me as I
compare these two things...

Nice way of presenting the data!

Glad you liked it. Here's a new approach: the columns give the error
levels that 100%, 75%, 50%, and 25% of the systems fall at or below.
Here are some results from runs on systems
sized at 100 and 500:

   size
100
   5 r10e 2
973.779 705.765 442.972 231.084
978.843 768.533 475.926 228.497
974.636 685.998 441.102 263.212
979.095 766.264 529.043 259.795
975.388 701.124 479.172 296.944
   5 r10e 3
822.582 574.326 416.975 212.369
809.158 630.792 441.169 202.888
806.361 521.778 329.246 139.269
826.465 593.599 369.542 178.443
822.512 613.308 388.945 205.893
   5 r10e 4
186.724 90.6913 60.6271 40.4473
181.976 98.681 64.1077 24.942
173.293 95.7286 55.7975 24.8745
173.038 101.317 60.9026 25.1176
162.278 92.4096 58.0736 24.4473

   size
500
   5 r10e 4
919.018 708.364 479.42 244.208
923.241 708.977 458.959 227.474
924.917 725.841 453.252 212.574
921.198 683.316 456.497 211.271
926.589 709.71 475.969 227.014

Hope your move is going well,

Tracy

···

_____________________________________________________________
run10e=: verb define
initmtm''
NB. q1, q2, q3: 25th/50th/75th-percentile index constants (assumed defined elsewhere)
({:, q3&{, q2&{, q1&{)/:~| act^:(10^y)''
:
run10e"0 x # y
)
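For comparison, here is one plausible Python rendering of that
quartile report (the exact index convention behind q1-q3 is not shown
in the J excerpt, so the quantile rule here is an assumption):

```python
def quartile_report(errors):
    """Error levels that 100%, 75%, 50%, and 25% of systems fall at or below."""
    s = sorted(abs(e) for e in errors)
    n = len(s)
    at = lambda frac: s[min(n - 1, int(frac * n))]   # simple index-based quantile
    return [s[-1], at(0.75), at(0.5), at(0.25)]

print(quartile_report([3, -9, 1, 7, 5, -2, 8, 4]))   # [9, 8, 5, 3]
```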

[From Bill Powers (2008.10.20.1650 MDT)]

Tracy B. Harms (2008-10-20 13:49 Pacific)--

My immediate curiosity is about the difference between discrete models
and continuous systems. You said that "Unless you use a very small
value of dt, chances are good that some of the systems will go into
oscillation because of having too much loop gain." In a non-digital
system there is no approximation occurring over a discrete dt, it
seems to me, there are only the actually continuous changes of the
components. Yet, I'd wager that the same sort of oscillation due to
too much loop gain would be possible. I can't quite put my finger on
the right way to ask the question that floats in front of me as I
compare these two things...

Your instincts are excellent. There are indeed two causes of
oscillation in digital simulations. One is real, and the other is
what I call a "computational oscillation."

The real oscillation can happen, for example, when the underlying
continuous system has more than a single integration in the feedback
loop. With only one integration, all continuous control systems are
unconditionally stable with any loop gain no matter how large (until
you reach limits set by the speed of conduction of signals in a
wire). With two integrations and a high enough loop gain, the lag
added by the second integration (or by any other means) makes it
possible for the loop gain, nominally negative, to become positive
and greater than 1 at some high-enough frequency. When that happens,
the loop will go into spontaneous oscillations which increase in
amplitude until some physical limit is reached. This happens in real
control systems and can shake them to pieces. Needless to say,
oscillating systems do not control.
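A real oscillation of this kind can be sketched by adding a second
integration to the single loop (Python, forward-Euler; all parameters
are illustrative). The error then swings through zero with a period
spanning thousands of iterations, quite unlike the 2*dt artifact
described next:

```python
# Control loop with TWO integrations, stepped by forward Euler.
dt, k, r = 0.001, 1.0, 1.0
rate, pos = 0.0, 0.0           # first and second integrators
errs = []
for _ in range(20000):
    e = r - pos
    errs.append(e)
    rate += dt * k * e         # first integration: error -> rate of change
    pos += dt * rate           # second integration: rate -> controlled variable
# errs keeps crossing zero with a period of thousands of steps, and the
# swings slowly grow (Euler stepping adds a little energy each cycle):
# a real oscillation, not a computational one.
```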

The computational oscillation happens only in a digital
implementation of a continuous feedback loop. Basically, there is a
transport lag of one iteration-time (dt) in any closed loop that is
digitally simulated. If the components of the system correct errors
too rapidly, so signals can change appreciably in the time
represented by one dt, the result of an error signal can, with
sufficient loop gain, be too large a correction so that on the next
iteration, the error is larger than and of opposite sign to that at
the start of the previous iteration. Once that happens, obviously the
overcorrections will simply get larger and larger, each full cycle of
the oscillation having a duration of exactly twice the iteration
interval, dt.

That's the giveaway that the oscillations aren't real. The
oscillation period of 2*dt says that these are artifacts of the
method of computation and do not represent anything in the real
continuous system being modeled. To get rid of them, we simply make
sure that one element in the control loop responds slowly enough so
that the delayed corrections are smaller than the original error. We
usually put that slowing into the output function, though it can go
anywhere. If the environment or any component in the loop contains a
natural slowing factor, such as an integration like the one that
turns a force into a velocity, that is sufficient by itself. We would
then have to be sure that dt is short enough so that with the desired
loop gain, computational oscillations do not occur. Since no more
than one integration is allowed in a control loop if it's to be
unconditionally stable, we can't add another slowing factor; we're
stuck with the one that exists, because it's part of the model and
changing it would make the model inaccurate. Shortening dt
or lowering the loop gain is then the only option, and if we're
modeling an existing system, we can't alter the loop gain either. So
that leaves only shortening dt as a way to stop the computational oscillations.
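The 2*dt signature is easy to reproduce in a one-loop sketch (Python;
the gain is illustrative). With gain*ko*dt above 2, each correction
overshoots by more than the original error, so the error's sign flips
on every single iteration while its magnitude grows:

```python
ko, dt, gain, r = 1.0, 0.0002, 12000.0, 100.0   # gain*ko*dt = 2.4
oe, errs = 0.0, []
for _ in range(6):
    e = r - gain * oe       # error after the previous correction
    errs.append(e)
    oe += ko * dt * e       # over-large correction: overshoots the target
print(errs)   # ~100, -140, 196, -274, ...: sign flips every step (period 2*dt)
```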

Why don't we want to shorten dt if we don't have to? Purely practical
reasons. If dt has to be one millionth of a second, that means we
have to do one million iterations of the simulation to show the
behavior over only one second of time. That could slow down the
computations to the point where the whole project becomes
impractical. Forty years ago, when I was programming on a Digital
Equipment Corporation PDP-8/S, simulating the behavior of 500 control
systems as we now are doing was simply beyond reach. Making dt small
enough would have meant using half a semester to do a one-minute run.
One addition of 12-bit numbers took 36 microseconds (instead of
today's couple of nanoseconds for 32-bit numbers).

Here's a new approach: the columns give the error
levels that 100%, 75%, 50%, and 25% of the systems fall at or below.
Here are some results from runs on systems
sized at 100 and 500:

It looks as if you've really got the whole model running correctly.
When you have a large number of systems, convergence to a final
steady state can be very, very slow since we're not doing any
optimizations at all. If you made dt adjustable, or the loop gain,
you might be able to set it to obtain the fastest feasible
convergence for a given input matrix. But there will always be some
random input matrices that are close to the critical state
(determinant of zero) and contain a lot of conflict between control
systems, and then you'll just have to wait. Maybe you could increase
dt or the loop gain to speed them up, but that would probably cause
trouble with the control systems that happen to have nice parameters.
I don't think you want to get into adjusting gain for each control system.

Remember that my point in writing this demo was to test the idea that
assemblages of control systems could independently control different
aspects of a common pool of environmental variables simply by making
the output matrix the transpose of the input matrix. I didn't know
then that Richard Kennaway would come along and prove that this was
correct with about three steps of mathematical reasoning.
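The idea being tested can be sketched end-to-end in pure Python (a
tiny network rather than the J model's 100-500 systems; the seed,
size, step count, and helper names are all mine). Each system
perceives a weighted mix of shared environmental variables, and the
output weights are simply the transpose of the normalized input
weights:

```python
import random

def matvec(M, v):
    """Matrix-vector product over plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def total_error(rs, ps):
    """Euclidean norm of the error vector across all systems."""
    return sum((r - p) ** 2 for r, p in zip(rs, ps)) ** 0.5

random.seed(1)
n = 6                                                      # tiny network for the sketch
wi = [[1.98 * random.random() for _ in range(n)] for _ in range(n)]
total = sum(map(sum, wi))
wci = [[100.0 * w / total for w in row] for row in wi]     # normalized input weights
wco = [list(col) for col in zip(*wci)]                     # output weights = transpose

ko, dt = 1.0, 0.0002
oe = [0.0] * n                                             # output effects
vs = [2000.0 * (random.random() - 0.5) for _ in range(n)]  # shared environment
rs = [2000.0 * (random.random() - 0.5) for _ in range(n)]  # reference signals

start = total_error(rs, matvec(wci, vs))
for _ in range(50000):
    ps = matvec(wci, vs)                                   # perceived situation
    oe = [o + ko * dt * (r - p) for o, r, p in zip(oe, rs, ps)]
    vs = matvec(wco, oe)                                   # environment set by all outputs
end = total_error(rs, matvec(wci, vs))
print(start, end)   # the combined error shrinks as all systems control at once
```

Because the loop matrix W·Wᵀ is positive semi-definite, every error
mode decays (none grows) as long as the effective gain stays small,
which is the structure Kennaway's proof exploits.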

Best,

Bill P.
