Feedback control systems are simple, though underrated, self-correcting, adaptive mechanisms. This talk discusses how such systems, specifically those that control their perceptions, form the basis of behaviour and intelligence in living systems. When arranged in hierarchies they provide powerful, dynamic solutions to complex behavioural scenarios without the need for internal predictive models. I will show demonstrations of the approach applied to simulated and real robotic systems, and present new, preliminary work on how these architectures can self-organise through evolutionary processes.
Using genetic algorithms to develop control hierarchies was particularly illuminating. Genetic algorithms are ‘directed’. Warren, Eetu, and I have had some discussion that suggests (to me at least) how control associated with the language branch of the hierarchy could have a ‘directive’ influence on reorganization. There are at least three branches of the hierarchy at its lower levels: the somatic branch and the language branch, in addition to the familiar behavioral branch. That discussion is in Fundamentals/How are new input functions created?.
I finally got a chance to see your talk and it was quite impressive. My only complaint is that you gave only a cursory description of your learning algorithm. You did show a quick series of different hierarchies that were generated by the algorithm, but you didn’t say much about how the “evolution” process worked or what the criterion for success was. So when you compared your algorithm to the RL algorithm, showing that the former did much better than the latter, I didn’t know what made one better than the other. I also had questions about how the relative levels were selected and whether the output parameters were varied independently of the input parameters. Is there a place where you give some details on the two learning algorithms you compared in this study?
One little nit: In the Cartpole Controller chart at 32:00 minutes you refer to the variables IPA, IPV, ICP and ICV – which I presume refer to pole angle, pole velocity, cart position and cart velocity – as environmental variables. These are actually perceptions, since they have to be derived from more elementary sensory variables that are more direct analogs of environmental variables. It would be nice to show what those environmental variables are. For example, pole angle is probably derived from the x and y position of the tip of the pole relative to the bed of the cart.
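To illustrate the kind of derivation I mean, here is a minimal sketch of how a pole-angle perception could be computed from tip and cart coordinates. The variable names and geometry are my own assumptions for illustration, not what the environment actually provides:

```python
import math

def pole_angle(tip_x, tip_y, cart_x):
    """Illustrative derivation of a pole-angle perception (radians
    from vertical) from the tip's position relative to the cart pivot.
    tip_y is the tip height above the cart bed; names are assumed."""
    return math.atan2(tip_x - cart_x, tip_y)

# An upright pole (tip directly above the cart) gives an angle of 0;
# a tip displaced to the right gives a positive angle.
```

The point is just that the “environmental variables” IPA etc. are already the outputs of input functions like this one, not raw features of the environment.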
Anyway, it was a very interesting talk. I especially liked the way your algorithm came up with two different successful pole-balancing hierarchies, one much more elegant than the other.
Yes, it was an informal talk to present some results. Details of the methodology haven’t been published yet. Basically, the evaluation was based on the hierarchy error over a run of a generated architecture.
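Since the details are unpublished, the following is only a hypothetical sketch of what “evaluation based on the hierarchy error over a run” could look like: a candidate architecture is run against a disturbed environment and its fitness is the accumulated magnitude of its error signals (lower is better). The toy unit, gains, and environment dynamics below are all made up for illustration:

```python
import random

class ToyUnit:
    """A single proportional control unit, standing in for a
    generated hierarchy (illustrative only)."""
    def __init__(self, gain):
        self.gain = gain

    def step(self, perception, reference=0.0):
        error = reference - perception
        return self.gain * error, error

def evaluate(unit, disturbance, dt=0.1):
    """Hypothetical fitness: sum of absolute error over one run.
    The real evaluation in the study may differ."""
    value, total_error = 0.0, 0.0
    for d in disturbance:
        output, error = unit.step(value)
        total_error += abs(error)
        value += dt * (output + d)  # simple first-order environment
    return total_error

random.seed(0)
dist = [random.uniform(-1, 1) for _ in range(100)]
# A better controller accumulates less error over the run,
# so selection favours it.
weak = evaluate(ToyUnit(gain=1.0), dist)
strong = evaluate(ToyUnit(gain=10.0), dist)
```

A genetic algorithm would then rank generated architectures by this score and select the lower-error ones for reproduction.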
Yes, of course, it’s a PCT system. Though perhaps I am misunderstanding your question.
It shouldn’t (and doesn’t) matter to the algorithm what we call them, they are all just signals. In the case of the single unit CartPole architecture, none of those variables are controlled individually. The controlled variable is a combination of all.
I use pre-existing packages for the environments, in this case OpenAI Gym. If you are interested in the details of the environmental variables, they can be seen in the source code of the relevant environment, http://gym.openai.com/envs/CartPole-v1/.