How perceptions form, perhaps

Hi,

As a long-time lurker on this list, I'd like to offer our contribution to the discussion of perceptual control systems. We simplify things by defining the controlled variables with respect to reference values, so perceptions are controlled towards zero. However, at the same time we are interested in the emergent patterns -- statistical-level control, really -- and by using linear algebra the control systems can scale to thousands of dimensions, which is closer to how perception works (also socially). Perhaps this could shed some light on the difficult part -- where do the perceptions come from.

The presentation is here in HD with English captions for your convenience:
http://www.youtube.com/watch?v=frOzDw1vtdw

I'm happy to answer any inquiries!

Kind regards,

   Petri Lievonen
   Helsinki Institute of Information Technology HIIT

[From Rick Marken (2013.05.11.2130)]

Hi,

As a long-time lurker on this list, I'd like to offer our contribution to the discussion of perceptual control systems.

Hi Petri

Thanks for this. But I’ve got to admit that I didn’t understand much of it. To some extent that was because the math was beyond me. But I also didn’t really understand the basic assumptions. I didn’t get what problem was being solved. I also didn’t see how control fit into the picture. So, yes, if you could help me understand it, that would be great. One thing that I think would help is if you could explain what was going on in that example where handwritten digits emerged from what appeared to be arrays of dots of differing levels of brightness. It looks a bit like those demonstrations by Bela Julesz of recognizable shapes (like a portrait of Abraham Lincoln) emerging from highly “pixelated” images after high-pass filtering. Is that what’s going on?

We simplify things by defining the controlled variables with respect to reference values, so perceptions are controlled towards zero. However, at the same time we are interested in the emergent patterns -- statistical-level control, really -- and by using linear algebra the control systems can scale to thousands of dimensions, which is closer to how perception works (also socially). Perhaps this could shed some light on the difficult part -- where do the perceptions come from.

Actually, Bill Powers has addressed exactly that question in Chapter 6 of his latest book, “Living Control Systems III: The Fact of Control”, Benchmark Books, 2008. And it looks like he does it in a way that may not be all that different from yours, at least in the sense that the perceptions that emerge are different linear functions of an input vector. For some reason I understand his approach better than I understand yours because his is implemented as a computer simulation; I can understand computer algorithms but math is just beyond me. But I think it would be very interesting if you could get a copy of LCSIII and post a comparison of your approach to answering the question “where do the perceptions come from” with his.

The presentation is here in HD with English captions for your convenience:

http://www.youtube.com/watch?v=frOzDw1vtdw

I’m happy to answer any inquiries!

Great. I look forward to hearing your answers.

Best regards

Rick

Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

Hi Rick,

Thanks for the book suggestion. I'm quite sure there's common ground, as this is adaptive control, after all.

The example you referred to (http://www.youtube.com/watch?v=frOzDw1vtdw#?t=28m23s) is a particularly nice one because you can intuitively see how the control system structure emerges -- perhaps it is illuminating to point out that the input data could be any kind of data, just scaled to "natural units" so it's not too far from the unit sphere. A kind of factor analysis, which is the basis for so many theories used in the social sciences, emerges naturally from quite simple principles.

The n=25 vectors phi_i you see there are m=32*32=1024-dimensional, and they are simply the rows of the 25x1024-dimensional semantic filter matrix here: http://www.youtube.com/watch?v=frOzDw1vtdw#?t=19m04s
You can think of it as the matrix B in state-space models, projecting the input to the state space. When any input u is shown to the system, such as a picture of a digit, the system ends up in a corresponding steady state xbar that is a kind of compressed, distributed 25-dimensional representation of that 1024-dimensional input (these steady states are not shown here). Here we had 8940 examples of handwritten digits, each expressed as a 1024-dimensional binary vector (they are treated as continuous-valued), so we could feed them one after another to the system and observe the steady states. Actually, I just collected all the inputs as columns in a matrix and projected them all in one step to their corresponding steady states.

So you can think that there is a 32x32-dimensional input that affects a 25-dimensional system. You can take a picture of a digit, lay it over one of those filters phi_i, multiply dimensionwise (pixelwise), and the sum of products (inner product) gives the steady state x_i for that input and for that system component. Then you would do the same for the other 24 components, to get the distributed representation xbar. This would be only for one 1024-dimensional input state, and you would repeat this with other inputs (or varying input), here with the 8939 other examples, to get the corresponding steady states xbar. The system is assumed linear, so the steady states are proportional to the input.
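
If it helps, here is a minimal numpy sketch of just that multiply-and-sum step (the names Phi, u and U are only illustrative, random data stands in for the digit images, and the feedback scaling discussed below is left out):

import numpy as np

n, m = 25, 32 * 32              # 25 state dimensions, 1024 input pixels
Phi = np.random.randn(n, m)     # rows phi_i of the semantic filter matrix (just noise here)
u = np.random.rand(m)           # one 32x32 digit image, flattened to a 1024-vector

x = Phi @ u                     # x[i] = sum_j Phi[i, j] * u[j], the inner product with phi_i
                                # -> a 25-dimensional distributed representation of the input

U = np.random.rand(m, 8940)     # all inputs collected as columns of one matrix
X = Phi @ U                     # every input projected "in one step"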

At first, the system structure is random (those phi_i shown are just noise), so the states will be quite random too. But the point is that on a slower time scale the whole structure of the system adapts, according to those products E{x_i*u_j}. So, even with random structure, some state dimensions x_i will correlate weakly with some kind of input, and due to adaptation there will be an internal positive feedback and the state dimensions that will correlate with input will start correlating even more. Without negative feedback, this would explode -- and as there is no communication between the state dimensions, they could easily all start correlating with the same strongest input features and the diversity would be lost to some kind of a monoculture.

The trick is that in the simplest case we apply implicit negative feedback instead of an explicit one. This implicit negative feedback is simply input consumption -- the more the input triggers a state dimension, the more it is used up for that activity, so it is diminished, or controlled towards zero (and this control could be more explicit too; any reasonable control would diminish the input). So we approximate this effect by multiplying the x_i with the corresponding phi_i, and subtracting this from the input. So for each input, one can imagine laying each phi_i on top of it, and subtracting it pixelwise, weighted by the corresponding x_i. This results in the surprising symmetry of those steady states http://www.youtube.com/watch?v=frOzDw1vtdw#?t=25m10s (I can help out if somebody wants to arrive at the same formulas). For simulation, one just needs the first and the third formula in that group, so one does not even need matrix inverses, and everything is quite local. (In addition, I adapt the coupling parameters q_i as shown in the presentation.)
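
As a rough numpy sketch of that simulation loop (my shorthand, not the exact implementation behind the presentation; the scaling of Phi, the coupling vector q and the iteration count are only assumptions chosen to keep the fixed-point iteration stable):

import numpy as np

n, m = 25, 1024
Phi = 0.01 * np.random.randn(n, m)      # filters phi_i as rows, noise at first
q = np.ones(n)                          # coupling parameters q_i (adapted on the slower time scale)

def steady_state(u, n_iter=50):
    # iterate activation and input consumption until (nearly) converged
    x = np.zeros(n)
    for _ in range(n_iter):
        consumed = Phi.T @ x            # each phi_i laid on the input, weighted by x_i
        x = q * (Phi @ (u - consumed))  # each x_i driven by what is left of the input
    return x, u - Phi.T @ x             # steady state xbar and residual ubar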

The result is that instead of controlling each input to zero, the system adapts its structure by itself to maximally control the total variance in all input dimensions towards zero -- resulting in statistical-level control. It won't go to zero, as that would collapse the adaptive control system, which learns from the residuals ubar (errors, actually). For tighter control the learning should be stopped, but we don't want that; learning is never-ending in changing circumstances. The implicit negative feedback takes care of this, because if the control were too good, the resulting steady states would vanish too, and the controlled variables would grow, and so on.
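
One can check this statistical-level control numerically: after the structure has adapted, the total variance of the residuals ubar over a batch of inputs stays clearly below the total variance of the raw inputs u, without ever reaching zero. Continuing the names of the previous sketch:

U = np.random.rand(m, 1000)             # a batch of inputs as columns (stand-in data)
Ubar = np.column_stack([steady_state(U[:, k])[1] for k in range(U.shape[1])])

total_var_in = U.var(axis=1).sum()      # total variance over all input dimensions
total_var_res = Ubar.var(axis=1).sum()  # what is left after the implicit control
print(total_var_in, total_var_res)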

So in a one-dimensional case this would look like quite sloppy control, but the intelligence is visible in these higher-dimensional cases. In multidimensional control, it is more important to do the right things than to do them exactly right (the directions of those phi_i vectors are more important than their lengths, for example). And using this like dead-beat control would always be late, so it is better to think of this at a statistical level, include low-pass filtered signals among the inputs, etc.

One way to understand that example is to think of those 25 vectors phi_i as 25 separate control systems, each trying to control the input towards zero on average (or, equivalently, maximizing its own activity). The task distribution emerges because each sees only the residual of the effects of all the control systems, so each will naturally concentrate on its strengths, where it is best. So in the simplest case, the systems communicate only via their controlled variables, not directly.

Hopefully this was helpful? Experimenting with algorithms is a great way to build understanding, but in this presentation we have taken the mathematics route so that generalizations could highlight analogies among different systems. Of course the example with 25 dimensions is too simple compared to the diversity in most natural systems, but it is nice that this approach could scale from simple clustering to thousands of dimensions on multiple layers, each layer controlling the others in a more life-like way.

Kind regards,

   Petri L.


[From Rick Marken (2013.05.13.1630)]

Thanks for the book suggestion. I’m quite sure there’s common ground, as this is adaptive control, after all.

Hi Petri

Thanks for the detailed reply. Unfortunately I’m still quite a ways from understanding what you folks are doing and, in particular, how your work relates to control theory, let alone perceptual control theory. So I hope you’ll bear with me and answer my questions, which will probably sound terribly stupid given my rather passing acquaintance with mathematics.

The example you referred to (http://www.youtube.com/watch?v=frOzDw1vtdw#?t=28m23s) is a particularly nice one because you can intuitively see how the control system structure emerges –

Could you explain this? How does this show me how a control system structure emerges? I think of the structure of a control system as: a perceptual function sending output to a comparator that continuously puts out an error signal representing the difference between the perceptual and reference signals; this error signal drives the output of the system via an output function. I didn’t see any of that described in the talk.

perhaps it is illuminating to point out that the input data could be any kind of data, just scaled to “natural units” so it’s not too far from the unit sphere. A kind of factor analysis, which is the basis for so many theories used in the social sciences, emerges naturally from quite simple principles.

Now this is starting to sound to me like a pattern learning/recognition/detection system. Perhaps it is considered a control system because there is some criterion or reference value to which the functions of the input are being brought. Is that it? If you are comparing the process to factor analysis, then perhaps these functions are what I would call feature detectors: sets of coefficients that weight the vector inputs so that the overall sum matches some criterion (zero, perhaps?).

The n=25 vectors phi_i you see there are m=32*32=1024-dimensional, and they are simply the rows of the 25x1024-dimensional semantic filter matrix here: http://www.youtube.com/watch?v=frOzDw1vtdw#?t=19m04s

You can think of it as the matrix B in state-space models, projecting the input to the state space. When any input u is shown to the system, such as a picture of a digit, the system ends up in a corresponding steady state xbar that is a kind of compressed, distributed 25-dimensional representation of that 1024-dimensional input (these steady states are not shown here). Here we had 8940 examples of handwritten digits, each expressed as a 1024-dimensional binary vector (they are treated as continuous-valued), so we could feed them one after another to the system and observe the steady states. Actually, I just collected all the inputs as columns in a matrix and projected them all in one step to their corresponding steady states.

I only vaguely understand this but apparently you are describing the training process that solves for the vectors that will be used to recognize/detect the digits in noise. Is that it?

So you can think that there is a 32x32-dimensional input that affects a 25-dimensional system. You can take a picture of a digit, lay it over one of those filters phi_i, multiply dimensionwise (pixelwise), and the sum of products (inner product) gives the steady state x_i for that input and for that system component. Then you would do the same for the other 24 components, to get the distributed representation xbar. This would be only for one 1024-dimensional input state, and you would repeat this with other inputs (or varying input), here with the 8939 other examples, to get the corresponding steady states xbar. The system is assumed linear, so the steady states are proportional to the input.

This is too tough for me. If Richard Kennaway (our resident PCT mathematician) is out there perhaps he could help out with this.

At first, the system structure is random (those phi_i shown are just noise), so the states will be quite random too. But the point is that on a slower time scale the whole structure of the system adapts, according to those products E{x_i*u_j}. So, even with random structure, some state dimensions x_i will correlate weakly with some kind of input, and due to adaptation there will be an internal positive feedback and the state dimensions that will correlate with input will start correlating even more. Without negative feedback, this would explode – and as there is no communication between the state dimensions, they could easily all start correlating with the same strongest input features and the diversity would be lost to some kind of a monoculture.

This seems to say that the algorithm works. I don’t really understand how it works though.

The trick is that in the simplest case we apply implicit negative feedback instead of an explicit one. This implicit negative feedback is simply input consumption -- the more the input triggers a state dimension, the more it is used up for that activity, so it is diminished, or controlled towards zero (and this control could be more explicit too; any reasonable control would diminish the input). So we approximate this effect by multiplying the x_i with the corresponding phi_i, and subtracting this from the input. So for each input, one can imagine laying each phi_i on top of it, and subtracting it pixelwise, weighted by the corresponding x_i. This results in the surprising symmetry of those steady states http://www.youtube.com/watch?v=frOzDw1vtdw#?t=25m10s (I can help out if somebody wants to arrive at the same formulas). For simulation, one just needs the first and the third formula in that group, so one does not even need matrix inverses, and everything is quite local. (In addition, I adapt the coupling parameters q_i as shown in the presentation.)

Since control is involved, could you tell me what the controlled variable(s) is (are) in this situation? What are these implicit negative feedback systems controlling? It would really help me if you could present a diagram of the negative feedback loops that are involved here. It’s hard for me to see the control organization just by looking at the formulas.

The result is that instead of controlling each input to zero, the system adapts its structure by itself to maximally control the total variance in all input dimensions towards zero -- resulting in statistical-level control. It won't go to zero, as that would collapse the adaptive control system, which learns from the residuals ubar (errors, actually). For tighter control the learning should be stopped, but we don't want that; learning is never-ending in changing circumstances. The implicit negative feedback takes care of this, because if the control were too good, the resulting steady states would vanish too, and the controlled variables would grow, and so on.

Again, a diagram would really help me understand the controlling going on here, I think.

So in a one-dimensional case this would look like quite sloppy control, but the intelligence is visible in these higher-dimensional cases. In multidimensional control, it is more important to do the right things than to do them exactly right (the directions of those phi_i vectors are more important than their lengths, for example). And using this like dead-beat control would always be late, so it is better to think of this at a statistical level, include low-pass filtered signals among the inputs, etc.

One way to understand that example is to think of those 25 vectors phi_i as 25 separate control systems, each trying to control the input towards zero on average (or, equivalently, maximizing its own activity). The task distribution emerges because each sees only the residual of the effects of all the control systems, so each will naturally concentrate on its strengths, where it is best. So in the simplest case, the systems communicate only via their controlled variables, not directly.

Hopefully this was helpful? Experimenting with algorithms is a great way to build understanding, but in this presentation we have taken the mathematics route so that generalizations could highlight analogies among different systems.

It helped a little (given my mathematical shortcomings). But regarding algorithms vs. mathematics, it looks like you must have written a computer program that transformed those dot matrices into digits. Or are those the result of mathematical calculations? If it was done by computer then it would be nice if you could present a flow diagram of the code that transformed the random dot matrices into (fuzzy) digits.

Thanks so much.

Best

Rick

Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[From Kent McClelland (2013.05.14.1145 CDT)]

Petri Lievonen
Rick Marken (2013.05.13.1630)

Hello Petri,

Thank you for alerting CSGnet to the YouTube lecture by Heikki Hyötyniemi, and thank you for all the work you must have done to make the video attractive and easily accessible to English speakers. I found his lecture (what I understood of it) very interesting, and I can see how it might well have some relevance to my own interests in PCT.

Like Rick, I didn't really comprehend most of the math, although I know enough about factor analysis to have some general ideas about your approach. I did appreciate some of the conclusions that your team has come to as a result of your analysis. Your conclusion that the models that make the most sense feature "multilayer perceptrons" "that manipulate only feedback errors" resonated with me, since it sounds a lot like Bill Powers' Hierarchical PCT model of brain organization.

I also found it interesting that you describe living systems as "on the boundary line between order and chaos," and you talk about systems "eating chaos" and turning it into structure. While I'm not sure quite what you mean by it, this talk of control systems as operating on the boundary and structuring their environments again resonates with my own views.

The image of control systems working with "whirls in the information flow" and as "systemic squirrel wheels" (is that right?) sounds intriguing, but I'm having a hard time connecting this vivid metaphor with anything concrete. Do you have any examples (other than the digits example) of applications of your models to empirical data?

Kent


Hi Rick, I'm glad -- we are very near to understanding each other.

I will put here some pointers that will probably help you through.

The control diagram is here:
http://www.youtube.com/watch?v=frOzDw1vtdw&t=21m28s

The controlled variable is u. It is vector-valued, so multidimensional -- the concept will become clearer later.

The reference signal is zero. So you can think of u as being defined so that it is the deviation from the reference.

The error signal is u_tilde (or u_bar after convergence in a steady state). Note that the error won't go to zero in this simplest case; it only gets smaller (1/sqrt(q_i), on average).

The "output" of the system is delta_u (or estimate u_hat after convergence in a steady state). So in this simplest, perhaps most natural case, consuming the residual (error from zero) means control (think about exploitation of some energy resources, for example).

The function from error to output is phi phi^T (read from left to right; linear algebra is usually simpler that way). So basically you multiply the error with a constant number to reduce it to the internal, simpler representation xbar_i of that error, and then you multiply that representational state with the same constant to expand it back to the output space. In a one-dimensional case, this constant would represent one pixel in those digit feature simulations. This mapping makes more sense in higher dimensions, as the multiplier is a vector phi_i acting as a filter, or factor, or feature, or component, or concept, or cluster prototype, or whatever one wants to call it. When you collect a few vectors together as columns, it is a matrix phi, and it is very convenient for processing massive datasets (as in nature, too).

Now, to get rid of the constants in the model, phi is not kept constant on the slower time scale, but is adapted according to the only local signals available for that multiplier: the error (or residual) ubar and the state xbar. The surprising thing is that the optimal phi is proportional to the mean value (average) of the product of ubar and xbar, as the controlled variable u changes and so ubar and xbar change -- so basically proportional to their covariance or correlation. In addition, the proportionality constant q_i seems optimal when it is the inverse of the enformation in xbar_i, e.g. 1/mean(xbar_i^2); it turns out that this also normalizes the vectors phi_i to unit length automatically, etc., as shown in the presentation. So there is a minimal number of free parameters in the system; almost everything is driven by the controlled variables and their statistical properties. The controls emerge quite autonomously.
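
Putting the pieces together as a batch-style numpy sketch (again my shorthand rather than the exact implementation: the forgetting factor, the explicit normalization and the direct solve are shortcuts -- as I wrote earlier, one can also iterate locally and avoid the matrix inverse altogether):

import numpy as np

n, m, N = 25, 1024, 8940
U = np.random.rand(m, N)                    # input samples u as columns (stand-in data)
Phi = 0.01 * np.random.randn(n, m)          # random initial structure
q = np.ones(n)                              # coupling parameters q_i

for epoch in range(50):
    # fast time scale: steady states from xbar = q * (Phi @ ubar), ubar = u - Phi.T @ xbar
    A = np.eye(n) + (q[:, None] * Phi) @ Phi.T
    Xbar = np.linalg.solve(A, (q[:, None] * Phi) @ U)
    Ubar = U - Phi.T @ Xbar                 # residuals after input consumption

    # slow time scale: phi_i follows E{xbar_i * ubar}, q_i follows 1 / E{xbar_i^2}
    Phi = 0.9 * Phi + 0.1 * (Xbar @ Ubar.T) / N
    Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)  # unit length (emerges by itself; enforced here)
    q = 0.9 * q + 0.1 / np.mean(Xbar**2, axis=1)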

Now, to understand this, and those images and vectors and multiple dimensions, my favourite examples are from Andrew Ng (who has taught machine learning courses online to 100 000 students at a time). First, one can marvel at the power of these kinds of sparse linear generative models here (where higher layers just try to explain the lower layers, so controlling the lower-layer residual enformation towards zero, or equivalently maximizing the higher-layer enformation):
http://www.youtube.com/watch?v=hBueMr9eaJs ("Single learning algorithm theory", 8 minutes)

Then, one could watch a longer presentation of his, where the "images as vectors" idea is explained, and the generality of that kind of processing is also indicated (vision, sound, touch, you name it):
http://www.youtube.com/watch?v=ZmNOAtZIgIk (Unsupervised Feature Learning and Deep Learning, 48 minutes)

Then, one could expand one's understanding of multidimensional spaces by watching this presentation by Tom Mitchell, where he demonstrates that a mere 10-dimensional linear CCA (canonical correlation analysis) model captures surprisingly well the relation between Google's trillion-dimensional (10^12) text corpus and 20 000-dimensional brain imaging data, so there is something really relevant in these kinds of correlation models:
http://www.youtube.com/watch?v=QbTf2nE3Lbw (Brains, Meaning and Corpus Statistics, 59 mins, definitely worth the effort)

Now, Ng and Mitchell both use traditional approaches -- sparse coding by cost-function minimization, and CCA by algebra involving matrix decompositions and inversions -- but we all know/assume that in nature those structures/correlations are built by control systems in a distributed way. (And the idea of perceptual control would also close the wider feedback loops surrounding the action through the environment -- we all know this; see for example Raffaello D'Andrea's http://www.youtube.com/watch?v=C4IJXAVXgIo "Feedback Control and the Coming Machine Revolution", 24 mins.) So, basically, our idea in that presentation is to suggest one simple principle which could explain this in a natural way: maximization of enformation (energetic information). It is of course still quite sketchy, but from that principle, meaning is continually emerging in the world -- a nice world view, actually!

Kind regards,

   Petri L.


Hi Kent,

Thank you, I have about 40 hours of Hyötyniemi's lectures recorded, but they are in Finnish, so I guess I am waiting for speech recognition and machine translation to get sufficiently advanced to make them properly available some day. While a lot of the material (a few hundred slides and manuscripts) is in English and online, the concepts have gained some clarity in recent years, so I am happy that there is now this latest summary in English -- it becomes most vivid when thought through out loud.

There are some industrial projects and student works, mostly on the traditional control engineering side, but of course the research direction and intuitions are mostly inspired by everyday life -- we all encounter living systems all the time, yet our current models don't seem to properly do justice to that quality of being alive.

We try to approach things so that there is always some mathematical formulation behind the idea (which allows simulations and also perhaps enables operationalization of concepts for empirical studies), but also so that the ideas can be expressed even as metaphorical stories, to keep that living quality, hopefully.

Those whirls in the flow of enformation -- there are many interpretations for them. Most simply they are the action of the control loop. For example, one could study the eigenvalue envelope here http://www.youtube.com/watch?v=frOzDw1vtdw&t=20m47s -- if there is only one control loop (one x_i, phi_i or "monad" in the spirit of Leibniz), only one eigenvalue can be controlled. The corresponding eigenvector (of the data covariance matrix) represents a kind of axis of freedom, or degree of freedom in the data. The higher the coupling, the tighter the control loop becomes (so q is proportional to gain). This is one sort of whirl, especially when delays are involved -- wherever there is a function from errors (or residuals) to control action, there is a sort of whirl through the environment, especially as the errors rarely stay at zero all the time.

Now, it can be claimed that these kinds of control loops are constraints -- there is only minimal variation left in that direction -- but the variation is inherited by the control system, so there's work to do. Those "systemic squirrel wheels" here http://www.youtube.com/watch?v=frOzDw1vtdw&t=34m30s refer to these kinds of loops, where you have to make constant effort even though (or so that), from a controlled-variable perspective, everything stays much as before.

In the extreme case, when the variation goes nearly to zero, it can be thought of as a usual algebraic constraint, one equation among a group of many. It is quite plausible to think that in complex systems there are massive numbers of constraints coupling variables together, and each of those constraints reduces the degrees of freedom in the data. In a way, the eigenvalue envelope is also understandable this way -- it is usually quite peaked, with a much larger number of small eigenvalues than large ones, due to the many constraints in the data.
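
For the record, the eigenvalue envelope I keep referring to is just the spectrum of the data covariance matrix; a quick numpy sketch (with stand-in data -- real data from a constrained system shows the peaked shape much more clearly):

import numpy as np

U = np.random.rand(1024, 8940)             # data samples as columns (stand-in data)
Uc = U - U.mean(axis=1, keepdims=True)     # remove the mean
C = Uc @ Uc.T / U.shape[1]                 # 1024 x 1024 data covariance matrix
envelope = np.linalg.eigvalsh(C)[::-1]     # eigenvalues sorted largest first
# constraints in the data show up as a long tail of small eigenvalues;
# the remaining degrees of freedom are the few large ones at the front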

So when looking at data from complex systems, it is more clever to concentrate on the remaining freedoms, where there is still variation (or enformation, really) left, where control systems have not yet stiffened the system so much. And this analysis of freedoms is what enformation-theoretic systems do (to each other, also). One degree of freedom can be thought of as a one-dimensional normal vector to a hyperplane (the rest of the orthogonal directions in space). There is a duality here -- one can think of a vector either as a direction in space, or as a definition of the hyperplane orthogonal to it. So one can play with these kinds of orthogonal complements to a subspace -- when there are multiple hyperplanes (constraints), it is more economical to represent their union subspace by some spanning vectors, by the remaining degrees of freedom, actually.

There is also a bonus interpretation of the picture here: http://www.youtube.com/watch?v=frOzDw1vtdw&t=34m54s From the standpoint of those whirls, if everything is very controlled, then the "rotation speed" may still be a sort of degree of freedom (as in chemical reaction chains or cycles in nature), and in physics that kind of thing is usually represented by an axis orthogonal to the rotation. That could be one route to the frequency domain, I don't know. Perhaps geometric algebra, where the imaginary unit i represents the unit volume, could generalize this, but here there is the advantage that one is not confined to three-dimensional space; one thinks in a massive number of dimensions, so the constraints are different (I'm referring to those cross products and right-hand rules, etc., in physics). Also, in the enformation-theoretic mappings there are terms like (1+x^2)^-1 and (1-x^2)^-1, in matrix form, which is like an inversion around some hypersphere, and there could be a connection to Lie algebras and other rotation machinery; maybe some mathematician will some day see patterns in these things.

Now that I think about it, there is also a multidimensional whirl that can happen when all (or a subset of) the q_i become the same and there are no nonlinearities in the system -- then the different profiles phi_i can rotate in the principal subspace, together still spanning the same data but gradually switching roles. This is the interesting bubbling that can happen in those digit examples too, if there are no nonlinearities that lock onto higher-order statistics in the data.
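
A quick numeric way to see that rotation freedom (my own check, with stand-in values): rotating the profiles by any orthogonal R, together with the correspondingly rotated state, reconstructs the data identically, so a purely linear model has nothing that pins the individual profiles down.

  import numpy as np

  rng = np.random.default_rng(3)
  Phi = rng.standard_normal((25, 1024))                # stand-in for some converged profiles
  x = rng.standard_normal(25)                          # stand-in for the state of one input
  R, _ = np.linalg.qr(rng.standard_normal((25, 25)))   # a random orthogonal rotation
  print(np.allclose(Phi.T @ x, (R @ Phi).T @ (R @ x))) # True: same reconstruction either way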

OK -- I have done some frequency-domain analysis on these models, as they can be straightforwardly extended to complex numbers: instead of just transposing matrices, also negate their phase (the complex conjugate, i.e. the Hermitian transpose). Then the models are not only sensitive to amplitude but also model phase differences, and a sort of holographic memory emerges. But I haven't yet grasped what would be an illustrative example of it -- once I experimented with analysing MEG (magnetoencephalography) data this way, and spectrum components spanning the head appeared. They did represent something about the brain, as we came fourth in a "thought reading" competition, trying to infer which of five types of video the subject was watching. But there I ignored the phase, as I didn't know how to synchronize different samples, or what to think about the different frequencies, really. I just took the logarithm of the power spectrum, hoping that it would linearize the spectrum components (they being filtered, convolved, multiplied versions of some "deeper spectra").

There are also interesting new possibilities for nonlinearities in those complex-valued models -- one can take the absolute value, which preserves the enformation, or take only the real or imaginary component, or a projection onto some other phase. I remember that taking only the real component of the activities xbar was equivalent to analysing the complex data with the ordinary real number system, with the real and imaginary components simply included as extra data dimensions, and that when taking only the imaginary component, the models could end up rotating. Well, there could also be connections to modern physics, as there too the eigenvectors are important, the enformation (square of the amplitude) is a probability, and the physics stays the same if you multiply everything by a constant phase... But now this is getting too far.
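
A minimal sketch of that complex-valued handling (the sizes and variable names here are only my illustration, and the log-power line is just the preprocessing step mentioned above):

  import numpy as np

  rng = np.random.default_rng(4)
  Phi = (rng.standard_normal((25, 256)) + 1j * rng.standard_normal((25, 256))) / 16
  u = rng.standard_normal(256) + 1j * rng.standard_normal(256)   # e.g. one complex spectrum
  x = Phi @ u                          # complex state: carries both amplitude and phase
  ubar = u - Phi.conj().T @ x          # Hermitian transpose (conjugate transpose), not plain .T
  x_abs = np.abs(x)                    # nonlinearity that keeps the enformation (squared magnitude)
  x_re = x.real                        # like treating real and imaginary parts as extra real dimensions
  # log of the power spectrum as preprocessing, as in the MEG experiment mentioned above:
  features = np.log(np.abs(np.fft.rfft(rng.standard_normal(512)))**2 + 1e-12)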

I would like to model resonances, but that would perhaps need a wise electrical engineer to think about these models in terms of transmission lines, and to really see the essence of things behind the expert terms without getting lost in the nonlinearities. That optimum q looks like some kind of impedance matching.

I would also like to find a principled way to sparsity -- in some approaches it is done by adding an absolute-value (L1) regularization term to the cost function, so that the cost function stays convex and has only one minimum, but so far I have just used simple nonlinearities on xbar, such as taking the absolute value or zeroing the negative values.
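
Just to make the options concrete, here are the two simple nonlinearities, plus the most elementary closed form of the L1-flavoured alternative (soft-thresholding); in a real regularized formulation the penalty would of course sit inside the cost function rather than being applied afterwards like this.

  import numpy as np

  x = np.array([-0.7, 0.1, 2.3, -0.02, 0.9])   # some state vector xbar
  x_abs = np.abs(x)                # absolute value: all activity made positive
  x_relu = np.maximum(x, 0.0)      # zeroing the negative values
  lam = 0.2                        # L1 weight (illustrative value)
  x_l1 = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)   # soft-thresholding: minimizes 0.5*(z-x)^2 + lam*|z|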

I would also like to understand projective and affine spaces, as otherwise these models are just linear, passing through the origin. Usually in multilayer perceptrons one circumvents this by including a constant dimension among the data, so that the corresponding multiplier can push the model away from the origin, but here, as all dimensions are treated equally, that is perhaps not enough. Multiple layers may help, as then constant terms can appear, but I haven't simulated those yet. For now, one can deal with clusters far from the origin by processing the data in multiple systems -- the first system draws the clusters closer to the origin in ubar, so that the next system can process xbar and ubar together as (n+m)-dimensional data (where the variances have been automatically normalized, too).

This is also related to simulation problems, where massive enformation impulses can happen that would be quite impossible in nature -- digital technology can manipulate very nonphysical quantities. Perhaps one just needs to low-pass-filter everything very carefully and add all kinds of safety checks, so that the models do not collapse so easily and unnaturally.
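
The constant-dimension trick and the two-system cascade, in sketch form (my own arrangement; the arrays here are placeholders, not real outputs of the model):

  import numpy as np

  rng = np.random.default_rng(5)
  U = rng.standard_normal((1000, 50)) + 5.0          # a data cluster far from the origin
  U_aug = np.hstack([U, np.ones((U.shape[0], 1))])   # constant dimension lets a model leave the origin
  # cascading: suppose the first system produced states X1 and residuals Ubar1 for these samples;
  # the next system simply treats their concatenation as (n+m)-dimensional input data
  X1 = rng.standard_normal((1000, 25))     # placeholder for the first system's steady states xbar
  Ubar1 = rng.standard_normal((1000, 50))  # placeholder for its residuals ubar, now near the origin
  U2 = np.hstack([X1, Ubar1])              # 75-dimensional input for the second system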

Sorry, it seems that I got lost in these "whirls" -- perhaps somebody found some empirical ideas here or in the videos by Ng and Mitchell that I sent? But I also understand that this can be quite overwhelming; there are so many ideas in here.

   Petri.

···

________________________________________
From: Control Systems Group Network (CSGnet) [CSGNET@LISTSERV.ILLINOIS.EDU] on behalf of McClelland, Kent [MCCLEL@GRINNELL.EDU]
Sent: 15 May 2013 00:03
To: CSGNET@LISTSERV.ILLINOIS.EDU
Subject: Re: How perceptions form, perhaps

[From Kent McClelland (2013.05.14.1145 CDT)]

Petri Lievonen
Rick Marken (2013.05.13.1630)

Hello Petri,

Thank you for alerting CSGnet to the YouTube lecture by Heikki Hyötyniemi, and thank you for all the work you must have done to make the video attractive and easily accessible to English speakers. I found his lecture (what I understood of it) very interesting, and I can see how it might well have some relevance to my own interests in PCT.

Like Rick, I didn't really comprehend most of the math, although I know enough about factor analysis to have some general ideas about your approach. I did appreciate some of the conclusions that your team has come to as a result of your analysis. Your conclusion that the models that make the most sense feature "multilayer perceptrons" that "manipulate only feedback errors" resonated with me, since it sounds a lot like Bill Powers' Hierarchical PCT model of brain organization.

I also found it interesting that you describe living systems as "on the boundary line between order and chaos," and you talk about systems "eating chaos" and turning it into structure. While I'm not sure quite what you mean by it, this talk of control systems as operating on the boundary and structuring their environments again resonates with my own views.

The image of control systems working with "whirls in the information flow" and as "systemic squirrel wheels" (is that right?) sounds intriguing, but I'm having a hard time connecting this vivid metaphor with anything concrete. Do you have any examples (other than the digits example) of applications of your models to empirical data?

Kent

On May 12, 2013, at 5:05 PM, Petri Lievonen wrote:

Hi Rick,

Thanks for the book suggestion. I'm quite sure there's common ground, as this is adaptive control, after all.

The example you referred to (http://www.youtube.com/watch?v=frOzDw1vtdw#?t=28m23s) is a particularly nice one because you can intuitively see how the control system structure emerges -- perhaps it is illuminating to point out that the input data could be any kind of data, just scaled to "natural units" so that it is not too far from the unit sphere. A kind of factor analysis, which is the basis for so many theories used in the social sciences, emerges naturally from quite simple principles.

The n=25 vectors phi_i you see there are m=32*32=1024-dimensional, and they are simply the rows of the 25x1024 semantic filter matrix here: http://www.youtube.com/watch?v=frOzDw1vtdw#?t=19m04s
You can think of it as the matrix B in state-space models, projecting the input into the state space. When any input u is shown to the system, such as a picture of a digit, the system ends up in a corresponding steady state xbar that is a kind of compressed, distributed 25-dimensional representation of that 1024-dimensional input (these steady states are not shown here). Here we had 8940 examples of handwritten digits, each expressed as a 1024-dimensional binary vector (treated as continuous-valued), so we could feed them one after another to the system and observe the steady states. Actually, I just collected all the inputs as columns of a matrix and projected them all in one step to their corresponding steady states.

So you can think that there is a 32x32-pixel (1024-dimensional) input that affects a 25-dimensional system. You can take a picture of a digit, lay it over one of those filters phi_i, multiply dimensionwise (pixelwise), and the sum of the products (the inner product) gives the steady state x_i for that input and that system component. Then you do the same for the other 24 components to get the distributed representation xbar. This is for one 1024-dimensional input; you repeat it with other inputs (or a varying input), here with the 8939 other examples, to get the corresponding steady states xbar. The system is assumed linear, so the steady states are proportional to the input.
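
In matrix form this is a single multiplication; the sizes below follow the numbers above, but the data array itself is just a random placeholder, and the coupling factors q_i and the implicit feedback described below are left out here.

  import numpy as np

  rng = np.random.default_rng(6)
  Phi = rng.standard_normal((25, 1024))        # the semantic filter matrix (rows are the phi_i)
  U = (rng.random((1024, 8940)) > 0.9) * 1.0   # placeholder for the 8940 binary digit images, one per column
  Xbar = Phi @ U                               # all steady states at once, one 25-dimensional column per input
  u = U[:, 0]                                  # one input at a time, it is 25 pixelwise multiply-and-sum steps:
  x = np.array([Phi[i] @ u for i in range(25)])
  print(np.allclose(x, Xbar[:, 0]))            # True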

At first, the system structure is random (those phi_i shown are just noise), so the states will be quite random too. But the point is that on a slower time scale the whole structure of the system adapts, according to those products E{x_i*u_j}. So even with a random structure, some state dimensions x_i will correlate weakly with some kind of input, and because of the adaptation there is an internal positive feedback: the state dimensions that correlate with the input start correlating even more. Without negative feedback this would explode -- and since there is no communication between the state dimensions, they could easily all start correlating with the same strongest input features, and the diversity would be lost to a kind of monoculture.

The trick is that in the simplest case we apply implicit negative feedback instead of an explicit one. This implicit negative feedback is simply input consumption -- the more the input triggers a state dimension, the more of it is used up by that activity, so it is diminished, or controlled towards zero (this control could be more explicit too; any reasonable control would diminish the input). We approximate this effect by multiplying each x_i with the corresponding phi_i and subtracting the result from the input. So for each input, one can imagine laying each phi_i on top of it and subtracting it pixelwise, weighted by the corresponding x_i. This results in the surprising symmetry of those steady states http://www.youtube.com/watch?v=frOzDw1vtdw#?t=25m10s (I can help out if somebody wants to arrive at the same formulas). For simulation one needs only the first and the third formula in that group, so no matrix inverses are required, and everything stays quite local. (In addition I adapt the coupling parameters q_i as shown in the presentation.)
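
As a rough simulation sketch of what the last two paragraphs describe -- iterate the state against the consumed (residual) input, then slowly adapt the filters toward the correlations E{x_i*u_j} -- something like the following; the fixed q, the learning rate, the damping, and the leaky-average update are my own choices here, not the formulas from the presentation.

  import numpy as np

  def steady_state(u, Phi, q, iters=50, damp=0.5):
      # damped fixed-point iteration: activity driven by the residual ("consumed") input
      x = np.zeros(Phi.shape[0])
      for _ in range(iters):
          ubar = u - Phi.T @ x                         # subtract each phi_i weighted by its x_i
          x = (1 - damp) * x + damp * q * (Phi @ ubar) # activity proportional to the remaining input
      return x, u - Phi.T @ x

  def adapt(U, Phi, q=0.5, eta=0.01):
      # slow time scale: the filters drift toward the observed correlations E{x_i * u_j}
      for u in U:                                      # U holds the input samples as rows
          x, _ = steady_state(u, Phi, q)
          Phi = (1 - eta) * Phi + eta * np.outer(x, u)
      return Phi

  rng = np.random.default_rng(7)
  Phi = 0.01 * rng.standard_normal((25, 1024))         # the structure starts out as pure noise
  # Phi = adapt(digit_images, Phi)                     # digit_images: one 1024-dim input per row (not included here)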

The result is that instead of controlling each input exactly to zero, the system adapts its structure by itself so as to maximally control the total variance in all the input dimensions towards zero -- resulting in statistical-level control. It won't go all the way to zero, as that would collapse the adaptive control system, which learns from the residuals ubar -- from the errors, actually. For tighter control the learning should be stopped, but we don't want that; learning is never-ending in changing circumstances. The implicit negative feedback takes care of this, because if the control were too good, the resulting steady states would vanish too, the controlled variables would then grow again, and so on.

So in a one-dimensional case this would look like quite sloppy control, but the intelligence becomes visible in the higher-dimensional cases. In multidimensional control it is more important to do the right things than to do them exactly right (the directions of those phi_i vectors matter more than their lengths, for example). And using this like dead-beat control would always be too late, so it is better to think of it at the statistical level, to include low-pass-filtered signals among the inputs, and so on.

One way to understand that example is to think of those 25 vectors phi_i as 25 separate control systems, each trying to control the input towards zero on average (or, equivalently, to maximize its own activity). The division of labour emerges because each system sees only the residual left over after the effects of all the control systems, so each naturally concentrates on its strengths, on where it is best. So in the simplest case the systems communicate only via their controlled variables, not directly.

Hopefully this was helpful? Experimenting with algorithms is a great way to build understanding, but in this presentation we have taken the mathematical route so that the generalizations can highlight analogies among different systems. Of course the example with 25 dimensions is too simple compared to the diversity of most natural systems, but it is nice that this approach could scale from simple clustering to thousands of dimensions on multiple layers, each controlling the others in a more life-like way.

Kind regards,

Petri L.
