Well, I've got a primitive version of astro going, tho I haven't done
enough work with it to be entirely sure of what it does. Here's how it
works (somewhat modified from my original posting on the subject).
Astro is a little spaceship who tries to track his mother ship using fore
& aft thrusters (only 1-dimensional movement for now). Mother wanders
back and forth across the screen, changing direction suddenly & at
random (& when she hits the edges). Astro has a perceptual system that
sets a reference level for velocity toward mother at:
   v_ref = c*x^(1/2)     (`^' = exponentiation)
where x is the distance to mother and c is a constant settable by the
reorganization system, whose
optimal value is determined by the maximum acceleration that astro's
thrusters can produce (when c is too big, there's overshoot; when it's
too small, the tracking isn't aggressive enough).
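In rough Python, the tracking loop comes out something like this (the time step, thruster limit, and the gain-1 velocity control are illustrative guesses on my part, not the actual values):

```python
import math

# Rough sketch of astro's one-dimensional tracking loop; the constants
# and the exact control law are illustrative guesses, not exact values.
DT = 0.1          # simulation time step
MAX_ACCEL = 1.0   # maximum acceleration the thrusters can produce

def astro_step(x, v, c):
    """One iteration: x = signed distance to mother, v = velocity
    toward mother, c = the gain set by the reorganization system."""
    # reference velocity toward mother: v_ref = c * x^(1/2)
    v_ref = c * math.copysign(math.sqrt(abs(x)), x)
    accel = v_ref - v                                # act to reduce velocity error
    accel = max(-MAX_ACCEL, min(MAX_ACCEL, accel))   # thruster limit
    v += accel * DT
    x -= v * DT                                      # closing velocity shrinks the gap
    return x, v
```

With a moderate c the gap closes and stays near zero; with c too large the clipped thrusters can't keep up with v_ref and astro overshoots.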
Astro also has an intrinsic reference level for zero distance to
mother, which drives `reorganization' (really `tuning' in a case like
this), which consists of adding an increment (currently 0.05 or -0.05) to c.
A running average of the intrinsic error is maintained, and every 100th
iteration (100 chosen by trial and error, tho this value may well be too
small), the current average intrinsic error is compared with that of the
previous reorganization episode. If the new average is greater than the
old, the sign of the increment is reversed; otherwise the increment is
retained unchanged. Either way, the increment is then added to c.
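In code, that step looks roughly like this (the class and names are mine, and the running-average details are a guess; only the compare-and-reverse logic follows the description above):

```python
# Sketch of the reorganization ("tuning") step: every `period` iterations,
# compare the episode's average intrinsic error with the previous episode's;
# if it got worse, reverse the increment's sign, then add the increment to c.
class Reorganizer:
    def __init__(self, c=0.0, increment=0.05, period=100):
        self.c = c
        self.increment = increment   # currently 0.05 or -0.05
        self.period = period         # iterations per reorganization episode
        self.err_sum = 0.0
        self.count = 0
        self.prev_avg = None

    def record(self, intrinsic_error):
        """Call once per iteration with the current intrinsic error."""
        self.err_sum += intrinsic_error
        self.count += 1
        if self.count == self.period:
            avg = self.err_sum / self.count
            if self.prev_avg is not None and avg > self.prev_avg:
                self.increment = -self.increment   # got worse: reverse direction
            self.c += self.increment
            self.prev_avg = avg
            self.err_sum, self.count = 0.0, 0
        return self.c
```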
When c starts too high, causing overshoot, the system seems to settle
down, but when it starts near 0 it doesn't do so well: once c gets to
-1, the intrinsic error climbs inexorably upward, the increment just
keeps changing sign, and the problem never gets fixed.
This problem might be approached by having the increment vary with the
magnitude of the average intrinsic error, but this would only reinforce
the main intuition that I've gotten from this so far, which is that
reorganization systems have to be rather carefully tuned to their
domains in order to produce sensible results in a reasonable amount of
time. (e.g. working critters need a lot of built-in knowledge about
their environment, albeit perhaps rather high-level knowledge, such as
about appropriate timescales for reorganization).
Another aspect of it that bothers me is that the reorganization episodes
are sudden & discrete, while it would seem better to have the tuning
happen gradually, driven by the long-term trend in intrinsic error.
But I haven't figured out a plausible way to do this.
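One possibility that might be worth trying (just a sketch, entirely untested, with all names and constants speculative): track a fast and a slow exponential moving average of the intrinsic error, take their difference as the long-term trend, and drift c by a much smaller per-iteration step whose direction reverses whenever the trend turns upward:

```python
# Speculative sketch of a gradual reorganizer: fast and slow exponential
# moving averages of the intrinsic error approximate its long-term trend;
# c drifts by a small per-iteration step whose direction reverses when
# the trend turns upward. All names and constants here are guesses.
class GradualReorganizer:
    def __init__(self, c=0.0, rate=0.001, fast=0.05, slow=0.005):
        self.c = c
        self.rate = rate             # per-iteration step, much smaller than 0.05
        self.direction = 1.0
        self.fast_w, self.slow_w = fast, slow
        self.fast = self.slow = None
        self.was_rising = False

    def record(self, intrinsic_error):
        """Call once per iteration; returns the continuously adjusted c."""
        if self.fast is None:
            self.fast = self.slow = intrinsic_error
        self.fast += self.fast_w * (intrinsic_error - self.fast)
        self.slow += self.slow_w * (intrinsic_error - self.slow)
        trend = self.fast - self.slow          # > 0 when error has been rising
        if trend > 0 and not self.was_rising:
            self.direction = -self.direction   # reverse only as a rise begins
        self.was_rising = trend > 0
        self.c += self.rate * self.direction
        return self.c
```

Whether this actually escapes the c = -1 trap is another question; it may need the same careful tuning of timescales as the discrete version.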
What next? Well, I suspect that astro is probably not worth developing
much further, since the environment is too un-biological in nature
(my original motivation was that since handling yourself with thrusters
in a low-friction environment is hard, a machine that could learn to do
it would be impressive; but maybe it's too hard a problem, and too artificial).
So maybe I'll add coefficients of friction and turn him into a fish.
Avery Andrews@anu.edu.au