[Ben Hawker 2017.10.02 15:00]
···
MT: PCT is also an adaptive system in that same sense. Organisms growing
up in different environments develop different perceptual functions
and connect them to different outputs to produce controllable
perceptions that help maintain the intrinsic variables near their
optima. A city dweller doesn’t easily perceive deer tracks in the
bush, or know how to get lunch even when he can perceive the tracks;
a hunter from the bush doesn’t easily perceive when it is safe to
cross the street at a city intersection, or where to get lunch when
he can cross the street safely.
BH: Yes, in theory. But are there working algorithms for deriving new perceptions and placing them in hierarchies of control? If so, let me know, as that’s my PhD! As far as I’m aware, that hasn’t been done yet. With the ability to identify new relevant perceptions and place them in the hierarchy, PCT could be considered an adaptive system. Without that, surely it is just reactive?
MT: PCT builds perceptual functions and connects up control units by
reorganization rather than reinforcement. The difference between the
two concepts is analogous to the difference between “Come here” and
“Don’t go there”. Reinforcement says “What you did was good”,
whereas reorganization says “What you are doing isn’t bad.” In other
analogous mottos, reinforcement is “Getting better every day”,
whereas PCT is “If it ain’t broke, don’t fix it.” There’s no
technical reason why both could not work together in building more
effective systems, just as in the brain there seem to be mutually
supportive systems of perception “It’s like this” and “it’s
different from that”, or at higher levels “intuition” and
“analysis”.
BH: That’s one of the nicest descriptions of the conceptual differences between reinforcement and reorganization that I’ve seen; I may well pinch that. Given the requirement for us as living agents to control effort, reorganization seems the more sensible approach. However, aren’t we still yet to find something that robustly reorganises the hierarchy? (Insert my PhD work here, but that’s a long story, so I’m sure I can cover that privately if wished!)
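In update-rule terms, the two mottos might be caricatured like this (a purely illustrative sketch; the function names and thresholds are invented, not taken from any PCT or RL implementation):

```python
import random

# Caricature of the two mottos as update rules (illustrative only).

def reinforcement_update(strength, action_was_good, step=0.1):
    # "Come here": strengthen whatever just worked, weaken what didn't.
    return strength + step if action_was_good else strength - step

def reorganization_update(param, intrinsic_error, threshold=0.5, step=0.1):
    # "Don't go there": change at random only while things are bad;
    # if it ain't broke, don't fix it.
    if intrinsic_error > threshold:
        return param + random.gauss(0.0, step)
    return param
```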
MT: Why is that a downside? It's built into the very heart of PCT.
There’s no prohibition in PCT against one control system seeing a
rock coming at your head and sending a reference signal to a lower
level set of systems to move your head to a position at which the
rock is seen to be missing you and going past.
MT: I have no idea how complex, but I do think it needs prior experience
in order to reorganize those perceptions and the corresponding
controls. Before you have ever been hit by something you could have
seen coming while you were balancing, you would not counter it in
advance under either adaptive development system.
BH: Without producing new perceptions (which is something PCT doesn’t currently cover, hence Rupert asking about RL, I assume), an agent wouldn’t be able to solve anticipatory problems. For example, if you were balancing and were warned that you would receive an impact one second later, a cognitive agent could prepare. In short, Perceptual Control Theory allows an agent to minimise the difference between perception and reference… but currently there is functionally no way for an agent to generate new perceptions from the information it is given. Current reorganization algorithms do handle changing the weights between PCT nodes and the internal gains, but they do not actually find new perceptions. In my work I argue that this belongs to the “generative” side of cognition, which is not something I strictly cover. But without the ability to generate new perceptions, PCT cannot minimise more than what it is programmed to (hence reactive).
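For concreteness, the kind of reorganization that does exist might be sketched as Powers-style “e-coli” tuning of existing weights; this is my own illustrative reading, not code from any published implementation:

```python
import random

# "E-coli" reorganization over a fixed weight vector: weights drift in a
# random direction while intrinsic error is falling, and "tumble" to a new
# random direction when it rises. Note it only retunes connections it
# already has; it never adds a new perceptual input -- the gap noted above.

def ecoli_reorganize(weights, intrinsic_error, state, rate=0.01):
    if intrinsic_error >= state["last_error"]:
        # Things got worse (or no better): pick a fresh direction ("tumble").
        state["direction"] = [random.gauss(0.0, 1.0) for _ in weights]
    state["last_error"] = intrinsic_error
    # Otherwise keep drifting the same way.
    return [w + rate * d for w, d in zip(weights, state["direction"])]

# e.g., once per control cycle:
# state = {"last_error": float("-inf"), "direction": [0.0, 0.0, 0.0]}
# weights = ecoli_reorganize(weights, current_intrinsic_error, state)
```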
BH: So, for example, an agent could learn a perceptual signal for a warning, and then learn that the warning relates to a temporal property (one second): a relative perception that raises expectation after that second has elapsed. You could provide the first and the second, and our current reorganization methods might connect them. But the system cannot make the perceptions it needs.
MT: Why do you say that? In the book I am currently trying to write
(working title “Powers of Perceptual Control” – pun intended), I
have two whole chapters on how perceptions of different aspects of
language would be expected to develop. Language is a tricky example,
which is why I used it. I have also been drafting a response in some
detail to your earlier comment “It’s where the perceptions come
from that I think is the thing PCT doesn’t answer”, a comment
that greatly surprised me.
BH: Let’s take the previous example. Basically, my point boils down to this: whatever perceptions the system is given make or break its ability to function in a task. Moreover, they also determine how hard it would be to learn to control the problem, if learning is needed at all.
BH: Let’s take an agent that is trying to avoid pain in the classical eye-blink conditioning task. The agent is strapped to a machine which will administer an uncomfortable puff of air into the eye. However, the agent is able to shut its eyes. The agent is presented with a warning stimulus (a sound, perhaps?) which indicates a puff is incoming. Research shows, obviously, that we can learn to solve this task. What would a PCT agent need to solve it? A perception of the eyes being open or closed, connected to an actuator that changes the position of the eyelids, would allow higher levels to open or close the eyes. A perception of sound is needed to invoke some preparatory response. Also needed is a perceptual representation of how long it will be before the puff arrives, which would basically be our perception of “warning”. Then higher-level references for minimising pain and keeping the eyes open (for survival reasons) would be all that’s needed. So the problem is certainly solvable with PCT. But where do those perceptions come from in the first place? What method tracks them down, identifies which perceptions are relevant, and adds them to the hierarchy in the right place? The problem is simple with the right perceptions, but the latter part is easier said than done. How does a cognitive agent find them autonomously?
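To show just how hand-built the solution is once the perceptions are supplied, here is a minimal illustrative sketch of a two-level PCT agent for this task; the gains, names, and the synthetic warning signal are all assumptions:

```python
# A minimal, hand-built two-level PCT sketch of the eye-puff task. Every
# perception below (lid position, the warning signal) is supplied by the
# engineer; nothing here discovers a new perception.

def control_output(reference, perception, gain):
    """One PCT comparator: output proportional to error."""
    return gain * (reference - perception)

lid = 1.0   # environment state: 1.0 = eye fully open, 0.0 = closed
DT = 0.1

for t in range(100):
    warning = 1.0 if 40 <= t < 60 else 0.0   # hand-given "warning" perception

    # Higher level: wants zero perceived threat. Its error sets the lower
    # level's reference: lids open (1.0) normally, closed when warned.
    lid_reference = 1.0 + control_output(0.0, warning, gain=1.0)

    # Lower level: drive the lid muscles until the perceived lid position
    # matches the reference handed down from above.
    lid += DT * control_output(lid_reference, lid, gain=5.0)
    lid = min(1.0, max(0.0, lid))             # eyelids have physical limits
```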
MT: Exactly! That's a feature, not a bug. It enables ranges of behaviour
that control ranges of perceptions, some of which serve to maintain
the intrinsic variables in good condition, others of which don’t.
The latter tend to get pruned, the former tend to stick around. But
the word “choose” is unfortunate, as it suggests some directed
control process, whereas in PCT it is the effects of interactions
with the environment that “choose”, as part of the reorganization
process. Or are you talking about designed robots only?
BH: See above. Part of the problem is that there is still no algorithm for generating these perceptions (from raw input to the system, or even as perceptual transformations of other perceptions) and then inserting them into the hierarchy. Am I missing something? Everyone in the PCT community seems to assume there’s a solid way of finding new perceptions and knowing where to add them to the hierarchy (or how to rearrange the hierarchy given them), but I have yet to see it…
MT: What engineer? The Earth Mother who directs evolution? It's true
that humans will not find behaviours that require massive electrical
impulses that kill dangerous predators, and that eels will not find
much use for levers their muscles cannot move. Is that a hit against
PCT? How would reinforcement handle those problems?
BH: The engineer who builds the PCT solution… as above, I have yet to find an algorithm that appropriately generates and applies perceptions that haven’t already been explicitly given to the system!
MT: True. The sensor systems must vary what they report to create any
and all perceptions that might ever be controlled, and we can be
killed by things our sensors do not report, such as X-ray or gamma
radiation. Is there a difference between PCT and reinforcement
learning in this respect?
BH: RL, in combination with some model base to work from (neural networks), uses big data to find more useful transformations of the input, conceptually speaking. This allows it to recognise patterns or shapes from raw input data alone, for example. The neural networks on the front effectively act as the pattern generator that produces more complex perceptions. Then, usually, some reactive controller is stuck on the end of that to minimise the error. I assume this is why RY might be quite interested in this… perhaps using RL and DNNs to produce relevant perceptions for a PCT system could be quite effective.
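Such a hybrid might look roughly like this; `extract_features` is a hypothetical stand-in for a trained network, and nothing here comes from an existing RL or PCT library:

```python
# Hypothetical hybrid: a trained front-end supplies the perceptions,
# and a PCT loop controls them. The linear map here stands in for a DNN
# trained elsewhere (by RL or otherwise).

def extract_features(raw_input, learned_weights):
    """Stand-in for a trained network: raw data -> candidate perceptions."""
    return [sum(w * x for w, x in zip(row, raw_input)) for row in learned_weights]

def pct_layer(references, perceptions, gains):
    """One layer of PCT control: an output per perception, scaled by error."""
    return [g * (r - p) for g, r, p in zip(gains, references, perceptions)]

# each control cycle might then be:
# perceptions = extract_features(sensor_data, learned_weights)
# outputs = pct_layer(references, perceptions, gains)
```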
BH: True, we cannot perceive X-ray or gamma radiation… but we can learn about it, which effectively allows us to build a new internal perceptual hierarchy for avoiding it even though we can’t directly perceive it. It’s clear that we, as cognitive agents, can learn new perceptions and identify the ones we need. Reorganization is lacking this (which is also part of my PhD).
MT: Reorganization doesn't actually stop, either. It just slows greatly
when control is good and the intrinsic variables stay near their
optimum values. In everyday terms (as I said above) “If it ain’t
broke, don’t fix it.” Put another way, if you are feeling good about
yourself and the way things are, you won’t be a revolutionary. But
even then, through the small changes allowed by what (using a
quantum-physics analogy) we might call “zero-point” reorganization,
good things might get even better. If the small zero-point changes
make things worse, the tendency is to return them to the way they
were, in effect creating a stopping point.
BH: Which I think is a much better approach… agents clearly adopt an “if it ain’t broke, don’t fix it” model. This is not how reinforcement learning works. Furthermore, we’re constantly learning, and reinforcement learning is not temporally stable. That, to me, is one of the biggest conceptual criticisms of it as a realistic learning algorithm for agents. Reorganization is conceived in a temporally continuous manner, but needs additions to handle producing new perceptions. I will say, however, that that’s a seriously complicated problem, but one worth thinking about!
MT: Well, that's exactly what a living control system is supposed to do.
It starts life with the systems evolution has provided (humans can’t
walk at birth, newborn deer can). Check out the work of Franz
Plooij, for example, for stages of human development interpreted as
stages in building control hierarchies.
BH: Will take a look, very excited!
Ben
BH: PCT is indeed a reactive controller (the most useful
term I’ve found for it) and RL is more of an adaptive
function: it learns how to adapt to the environment,
rather than being built on reactive behaviour that
simply reduces error, for example. Obviously, there are
downsides to both. RL requires a lot of training time to
learn adaptive behaviour, and PCT isn’t able to do
anticipatory control without extra perceptions and
probably a variety of levels in the hierarchy.
BH: For example, anticipating an impact while balancing.
Before it hits, you can lean forward to improve your
balance post-impact. This is an example of something that
is not innate to reactive controllers and is thus quite a
difficult problem. While solvable under PCT, it definitely
isn’t easy to derive and requires complex perceptions.
BH: So to speak, yes. The behaviour in RL is encoded in
the state-action pairs and their weightings, which take
ages to compute. In PCT, the behaviour is in minimising
the error between the perceptual inputs and the
references. The complexity comes in choosing what the
perceptions are, which the system itself doesn’t handle.
BH: So, what PCT actually does is simpler and more
elegant. However, the perceptions you choose for a
system massively affect its behaviour!
BH: RL just crunches every possible interaction of action
and state with big data, but for PCT, a lot of this has
been pruned away simply by which perceptions are
selected. PCT will not find behaviour that the engineer
doesn’t effectively allow it to have, since its
complexity can only be as great as the perceptions it’s
given. A system cannot consider or control a perception
which it doesn’t have as an input.
BH: While RL can, it takes a long time and a lot of good
data to crunch this… and there is the further problem of
stopping the learning at the right time, as RL often
doesn’t know when to stop.
BH: So, to avoid the problem of either the engineer
having to hand-select perceptions and order them to
produce hierarchies, or leaving big data to crunch it
all, it would be great if there were some way to have a
control system learn how to hierarchically arrange
control nodes such that it can learn how to control
perceptions. (Hey, that’s my PhD!)