Hi Rick, I'm happy that we are very near to understanding each other.
I will put here some pointers that will probably help you along.
The control diagram is here:
http://www.youtube.com/watch?v=frOzDw1vtdw&t=21m28s
The controlled variable is u. It is vector-valued, so multidimensional -- the concept will become clearer later.
The reference signal is zero. So you can think of u as being defined as the deviation from the reference.
The error signal is u_tilde (or u_bar after convergence in a steady state). Note that the error won't go to zero in this simplest case; it only gets smaller (1/sqrt(q_i), on average).
The "output" of the system is delta_u (or the estimate u_hat after convergence in a steady state). So in this simplest, perhaps most natural case, consuming the residual (the error from zero) constitutes control (think about the exploitation of some energy resource, for example).
The function from error to output is phi phi^T (read from left to right; linear algebra is usually simpler that way). So basically you multiply the error by a constant number to reduce it to the internal, simpler representation xbar_i of that error, and then you multiply that representational state by the same constant to expand it back to the output space. In a one-dimensional case, this constant would represent one pixel in those digit feature simulations. The mapping makes more sense in higher dimensions, where the multiplier is a vector phi_i acting as a filter, or factor, or feature, or component, or concept, or cluster prototype, or however one wants to call it. When you collect a few such vectors together as columns, you get a matrix phi, and then it is very convenient to process massive datasets (as in nature, too).
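To make the phi phi^T mapping concrete, here is a minimal NumPy sketch; the sizes match the digit example but the data is a random stand-in, not anything from the talk:

```python
import numpy as np

m, n = 1024, 25                      # input (pixel) dimension, number of features
rng = np.random.default_rng(0)
phi = rng.standard_normal((m, n))    # columns phi_i: the filters/features

u_err = rng.standard_normal(m)       # an error signal in the input space

x = phi.T @ u_err    # compress: n-dimensional internal representation xbar
du = phi @ x         # expand back to the output space

# the two steps together are exactly the phi phi^T mapping
assert np.allclose(du, phi @ phi.T @ u_err)
```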
Now, to get rid of the constants in the model, the multiplier phi is not kept constant on the slower time scale, but adapted according to the only local signals available to it: the error (or residual) ubar and the state xbar. The surprising thing is that the optimal phi is proportional to the mean value (average) of the product of ubar and xbar as the controlled variable u changes and so ubar and xbar change -- so basically proportional to their covariance or correlation. In addition, the proportionality constant q_i seems optimal when it is the inverse of the enformation in xbar_i, i.e. 1/mean(xbar_i^2); it turns out that this also normalizes the vectors phi_i to unit length automatically, etc., as shown in the presentation. So there is a minimal number of free parameters in the system; almost everything is driven by the controlled variables and their statistical properties. The controls emerge quite autonomously.
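A rough batch-statistics sketch of that adaptation rule (the variable names and toy data are my assumptions; the talk states the rule, not this code):

```python
import numpy as np

rng = np.random.default_rng(1)
T, m, n = 1000, 64, 4                  # samples, input dim, state dim
ubar = rng.standard_normal((T, m))     # residual signals over time (toy data)
phi0 = rng.standard_normal((m, n))     # current structure
xbar = ubar @ phi0                     # states, assuming linear steady states

q = 1.0 / np.mean(xbar**2, axis=0)     # q_i = inverse "enformation" of xbar_i
phi = (ubar.T @ xbar / T) * q          # phi_i proportional to mean(ubar * xbar_i)
```

With enough samples the estimated phi_i columns line up with the directions that actually produced the states, which is the sense in which the covariance rule recovers the structure.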
Now, to understand this, and those images and vectors and multiple dimensions, my favourite examples are from Andrew Ng (who has taught machine learning courses online to 100 000 students at a time). First, one could marvel at the power of this kind of sparse linear generative model here (where higher layers just try to explain the lower layers, so controlling the lower-layer residual enformation towards zero, or equivalently maximizing the higher-layer enformation):
http://www.youtube.com/watch?v=hBueMr9eaJs ("Single learning algorithm theory", 8 minutes)
Then, one could watch a longer presentation of his, where the "images as vectors" idea is explained, and the generality of that kind of processing is also indicated (vision, sound, touch, you name it):
http://www.youtube.com/watch?v=ZmNOAtZIgIk (Unsupervised Feature Learning and Deep Learning, 48 minutes)
Then, one could expand one's understanding of multidimensional spaces by watching this presentation by Tom Mitchell, where he demonstrates that a mere 10-dimensional linear CCA (canonical correlation analysis) model captures surprisingly well the relation between Google's trillion-dimensional (10^12) text corpus and 20 000-dimensional brain imaging data, so there is something really relevant in these kinds of correlation models:
http://www.youtube.com/watch?v=QbTf2nE3Lbw (Brains, Meaning and Corpus Statistics, 59 mins, definitely worth the effort)
Now, Ng and Mitchell both use traditional approaches -- sparse coding by cost-function minimization, and CCA by some algebra involving matrix decompositions and inversions -- but we all know/assume that in nature those structures/correlations are built by control systems in a distributed way. (And the idea of perceptual control would also close the wider feedback loops running through the environment via action -- we all know this; see for example Raffaello D'Andrea's http://www.youtube.com/watch?v=C4IJXAVXgIo "Feedback Control and the Coming Machine Revolution", 24 mins.) So basically, our idea in that presentation is to suggest one simple principle which could explain this in a natural way: maximization of enformation (energetic information). It is of course still quite sketchy, but from that principle, meaning is continually emerging in the world -- a nice world view, actually!
Kind regards,
Petri L.
________________________________________
From: Control Systems Group Network (CSGnet) [CSGNET@LISTSERV.ILLINOIS.EDU] on behalf of Richard Marken [rsmarken@GMAIL.COM]
Sent: 14 May 2013 02:28
To: CSGNET@LISTSERV.ILLINOIS.EDU
Subject: Re: How perceptions form, perhaps
[From Rick Marken (2013.05.13.1630)]
On Sun, May 12, 2013 at 3:05 PM, Petri Lievonen <petri.lievonen@tkk.fi> wrote:
Thanks for the book suggestion. I'm quite sure there's common ground, as this is adaptive control, after all.
Hi Petri
Thanks for the detailed reply. Unfortunately I'm still quite a ways from understanding what you folks are doing and, in particular, how your work relates to control theory, let alone perceptual control theory. So I hope you'll bear with me and answer my questions, which will probably sound terribly stupid given my rather passing acquaintance with mathematics.
The example you referred to (http://www.youtube.com/watch?v=frOzDw1vtdw#?t=28m23s) is a particularly nice one because you can intuitively see how the control system structure emerges --
Could you explain this. How does this show me how a control system structure emerges? I think of the structure of a control system as: perceptual function sending output to a comparator that continuously puts out a error signal that represents the difference between perceptual and reference signal; this error signal drives the output of the system via an output function. I didn't see any of that described in the talk.
perhaps it is illuminating to point out that the input data could be any kind of data, just scaled to "natural units" so it's not too far from the unit sphere. A kind of factor analysis, which is the basis for so many theories used in the social sciences, emerges naturally from quite simple principles.
Now this is starting to sound to me like a pattern learning/recognition/detection system. Perhaps it is considered a control system because there is some criterion or reference value toward which the functions of the input are being brought. Is that it? If you are comparing the process to factor analysis, then perhaps these functions are what I would call feature detectors: sets of coefficients that weight the vector inputs so that the overall sum matches some criterion (zero perhaps?).
The n=25 vectors phi_i you see there are m=32*32=1024-dimensional, and they are simply the rows of the 25x1024-dimensional semantic filter matrix here: http://www.youtube.com/watch?v=frOzDw1vtdw#?t=19m04s
You can think of it as the matrix B in state-space models, projecting the input into the state space. When any input u is shown to the system, such as a picture of a digit, the system ends up in a corresponding steady state xbar that is a kind of compressed, distributed 25-dimensional representation of that 1024-dimensional input (these steady states are not shown here). Here we had 8940 examples of handwritten digits, each expressed as a 1024-dimensional binary vector (treated as continuous-valued), so we could feed them one after another to the system and observe the steady states. Actually I just collected all the inputs as columns in a matrix and projected them all in one step to their corresponding steady states.
I only vaguely understand this but apparently you are describing the training process that solves for the vectors that will be used to recognize/detect the digits in noise. Is that it?
So you can think that there is a 32x32-dimensional input that affects a 25-dimensional system. You can take a picture of a digit, lay it over one of those filters phi_i, multiply dimensionwise (pixelwise), and the sum of the products (the inner product) would give the steady state x_i for that input and that system component. Then you would do the same for the other 24 components to get the distributed representation xbar. This would be for only one 1024-dimensional input state, and you would repeat it with other inputs (or varying input), here with the 8939 other examples, to get the corresponding steady states xbar. So the system is assumed linear; thus the steady states are proportional to the input.
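That pixelwise recipe might be sketched as follows (the image and filters here are random stand-ins, not the actual digit data):

```python
import numpy as np

rng = np.random.default_rng(2)
digit = (rng.random((32, 32)) > 0.5).astype(float)  # stand-in for a binary digit
filters = rng.standard_normal((25, 32, 32))         # the 25 filters phi_i as images

# lay each filter over the digit, multiply pixelwise, and sum the products:
xbar = np.array([(digit * f).sum() for f in filters])  # 25-dim representation

# equivalent to one matrix-vector product with the 25x1024 filter matrix:
assert np.allclose(xbar, filters.reshape(25, -1) @ digit.ravel())
```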
This is too tough for me. If Richard Kennaway (our resident PCT mathematician) is out there perhaps he could help out with this.
At first, the system structure is random (those phi_i shown are just noise), so the states will be quite random too. But the point is that on a slower time scale the whole structure of the system adapts, according to those products E{x_i*u_j}. So, even with a random structure, some state dimensions x_i will correlate weakly with some kind of input, and due to adaptation there will be an internal positive feedback: the state dimensions that correlate with the input will start correlating even more. Without negative feedback this would explode -- and as there is no communication between the state dimensions, they could easily all start correlating with the same strongest input features, and the diversity would be lost to some kind of monoculture.
This seems to say that the algorithm works. I don't really understand how it works though.
The trick is that in the simplest case we apply implicit negative feedback instead of an explicit one. This implicit negative feedback is simply input consumption -- the more the input triggers a state dimension, the more it is used up by that activity, so diminished, or controlled towards zero (and this control could be more explicit, too; any reasonable control would diminish the input). So we approximate this effect by multiplying the x_i with the corresponding phi_i and subtracting this from the input. So for each input, one can imagine laying each phi_i on top of it and subtracting it pixelwise, weighted by the corresponding x_i. This results in the surprising symmetry of those steady states http://www.youtube.com/watch?v=frOzDw1vtdw#?t=25m10s (I can help out if somebody wants to arrive at the same formulas). For the simulation, one just needs the first and the third formula in that group, so one does not even need matrix inverses, and everything is quite local. (In addition I adapt the coupling parameters q_i as shown in the presentation.)
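Read as a simulation recipe, this might be sketched roughly as below. This is my reading of the scheme, not the talk's exact code: the steady state of xbar_i = q_i * phi_i^T ubar with ubar = u - phi xbar is obtained here with one small linear solve for robustness (locally it would emerge from the fast dynamics, without matrix inverses), and all data and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, T = 64, 4, 300                      # pixels, features, training samples
U = rng.random((T, m))                    # stand-in for the digit images
phi = 0.1 * rng.standard_normal((m, n))   # random initial structure
ex2 = np.ones(n)                          # running mean of xbar_i^2
alpha = 0.02                              # slow adaptation rate

for u in U:
    q = 1.0 / np.maximum(ex2, 0.05)       # coupling parameters q_i (floored)
    # steady state of the implicit-feedback loop, solved directly:
    A = np.diag(1.0 / q) + phi.T @ phi
    xbar = np.linalg.solve(A, phi.T @ u)
    ubar = u - phi @ xbar                 # input consumption: subtract x_i * phi_i
    # slow timescale: phi_i tracks q_i * mean(ubar * xbar_i)
    phi = (1 - alpha) * phi + alpha * np.outer(ubar, xbar) * q
    ex2 = (1 - alpha) * ex2 + alpha * xbar**2

# the steady-state residual never exceeds the input (implicit negative feedback)
assert np.linalg.norm(ubar) <= np.linalg.norm(U[-1]) + 1e-9
```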
Since control is involved could you tell me what the controlled variable(s) is (are) in this situation. What are these implicit negative feedback systems controlling? It would really help me if you could present a diagram of the negative feedback loops that are involved here. It's hard for me to see the control organization just by looking at the formulas.
The result is that instead of controlling each input to zero, the system adapts its structure by itself to maximally control the total variance in all input dimensions towards zero -- resulting in control on a statistical level. It won't go to zero, as that would collapse the adaptive control system, which actually learns from the residuals ubar, or errors. For tighter control the learning should be stopped, but we don't want that; learning is never-ending in changing circumstances. That implicit negative feedback will take care of this, because if the control were too good, the resulting steady states would vanish too, and the controlled variables would grow, and so on.
Again, a diagram would really help me understand the controlling going on here, I think.
So in a one-dimensional case this would look like quite sloppy control, but the intelligence becomes visible in these higher-dimensional cases. In multidimensional control, it is more important to do the right things than to do them exactly right (the directions of those phi_i vectors are more important than their lengths, for example). And using this like dead-beat control would always be late, so it is better to think on a statistical level, include low-pass filtered signals among the inputs, etc.
One way to understand that example is to think of those 25 vectors phi_i as 25 separate control systems, each trying to control the input towards zero on average (or, equivalently, maximizing its own activity). The task distribution emerges because each sees only the residual of the effects of all the control systems, so each will naturally concentrate on its strengths, where it is best. So in the simplest case, the systems communicate only via their controlled variables, not directly.
Hopefully this was helpful? Experimenting with algorithms is great for building understanding, but in this presentation we have taken the mathematics route so that generalizations can highlight analogies among different systems.
It helped a little (given my mathematical shortcomings). But regarding algorithms vs. mathematics, it looks like you must have written a computer program that transformed those dot matrices into digits. Or are those the result of mathematical calculations? If it was done by computer then it would be nice if you could present a flow diagram of the code that transformed the random dot matrices into (fuzzy) digits.
Thanks so much.
Best
Rick
Of course the example with 25 dimensions is too simple compared to the diversity in most natural systems, but it is nice that this approach could scale from simple clustering to thousands of dimensions on multiple layers, each controlling each other in a more life-like way.
Kind regards,
Petri L.
12.05.2013 07:32, Richard Marken wrote:
[From Rick Marken (2013.05.11.2130)]
On Sat, May 11, 2013 at 3:06 PM, Petri Lievonen <petri.lievonen@tkk.fi> wrote:
Hi,
As a long-time lurker on this list, here is our contribution to the
discussion of perceptual control systems.
Hi Petri
Thanks for this. But I've got to admit that I didn't understand much of
it. To some extent that was because the math was beyond me. But I also
didn't really understand the basic assumptions. I didn't get what
problem was being solved. I also didn't see how control fit into the
picture. So, yes, if you could help me understand it that would be
great. One thing that I think would help is if you could explain what
was going on in that example where hand written digits emerged from what
appeared to be arrays of dots of differing levels of brightness. It
looks a bit like those demonstrations by Bela Julesz of recognizable
shapes (like a portrait of Abraham Lincoln) emerging from highly
"pixelated" images after high-pass filtering. Is that what's going on?
We simplify things by defining the controlled variables with respect
to reference values, so perceptions are controlled towards zero.
However, at the same time we are interested in the emergent
patterns, statistical level control really, and utilizing linear
algebra the control systems can scale to thousands of dimensions,
which is more like how the perception works (also socially). Perhaps
this could shed some light on the difficult part -- where do the
perceptions come from.
Actually, Bill Powers has addressed exactly that question in Chapter 6
of his latest book, "Living Control Systems III: The Fact of Control",
Benchmark Books, 2008. And it looks like he does it in a way that may
not be all that different from yours, at least in the sense that the
perceptions that emerge are different linear functions of an input
vector. For some reason I understand his approach better than I
understand yours because his is implemented as a computer simulation; I
can understand computer algorithms but math is just beyond me. But I
think it would be very interesting if you could get a copy of LCSIII and
post a comparison of your approach to answering the question "where do
the perceptions come from" with his.
The presentation is here in HD with English captions for your
convenience:
http://www.youtube.com/watch?v=frOzDw1vtdw
I'm happy to answer any inquiries!
Great. I look forward to hearing your answers.
Best regards
Rick
Kind regards,
Petri Lievonen
Helsinki Institute of Information Technology HIIT
--
Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com
--
Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com