PCT vs Free Energy (Specification versus Prediction)

Hi Warren, yes this is possible. Do you mean the ball could be caught on the basis of the light intensity plus the minimum distance between the ball and the catcher?

The biggest advantage, from my point of view, is that prediction algorithms are built on zillions and trillions of previous training data, which the PCT model does not require (it needs only 2-3 trips around the loop and the game is done). Whereas with probabilistic prediction, what if the real data fall outside the training distribution? What is the probability of catching the ball then? I think it is zero, and PCT will still catch it.

Whereas catching a ball can be analysed from a probabilistic perspective by considering factors such as the initial conditions of the ball, the trajectory it follows, environmental conditions, and the capabilities of the person trying to catch it. Even with precise trajectory predictions and an experienced catcher, there will always be some level of uncertainty. Random effects, like slight variations in the throw or unexpected wind gusts, contribute to this uncertainty.

First, I want to say that in my last message I had a problem with the semantics of the word “specification”; now I understand it better.

Second, I guess that Warren means you can do better by making a demo or working model which shows that PCT explains behavior better than FEP does, rather than by writing a review.

Third, as for that review plan, the paper I sent is quite old. Attached is a newer one. In addition, I just checked that with a Google search I could find quite many critical reviews of FEP – of course not from the PCT point of view.

(Attachment Friston & free energy principle made simpler 2023.pdf is missing)

Oh, the attachment was too big for Discourse. Can you take it from this link:

Friston & free energy principle made simpler 2023.pdf

Thanks Eetu. Got it. I think Figure 6 in this paper could be describing a behavioral situation that could be the basis for comparing an “inferential/prediction” (FEP) to a specification (PCT) model of writing longhand. Here’s a copy of that figure:


I have a vague sense of what he’s describing as the model of this behavior but I’m not sure how it would be turned into a working (computer) model. I’d really appreciate it if you (or anyone out there who understands FEP) would describe what you think is the model of behavior being presented in Figure 6. And if you could add a pseudo-code implementation of a computer model of the process that would really help since I can’t follow his “simple” mathematics. Oh, and if you could tell me where the “free energy” is in this process that would be helpful as well.

Best, Rick

This is what I understand of the Friston business.

Probabilistic prediction works for learning, not for control. To model Friston’s stuff we need to model conscious learning. Conscious learning differs from the model of the Reorganization System in important ways. Conscious learning is a trial-and-error process, but it is not random, and characteristically is carried out in imagination rather than by control closed through the environment. (That difference is skill in learning, ‘learning to learn’.)

Uncertainty is a measure of the same sort as variance. Neither is a variable within a control loop which is closed through the environment. Each is a measure of a possible effect of control. Each is a measure of a possible effect of learning processes in which the quality of control improves (or degrades).

The variance of a variable can be perceived, and that perception can be controlled. That is, loop gain is subject to control, yes? Uncertainty is a measure of the same sort.

There is no uncertainty about the reference value in a control loop. When control is good, there is little uncertainty about the (relevantly) perceived state of the environment, and control has the effect of minimizing that residual uncertainty. This is typically unconscious control. Of course we can direct attention to anything and the outputs of even the most automated control process can be observed consciously. (What often happens then is that the observing loop doesn’t limit itself to observing and its input to the reference input of the observed system interferes.)

A novel situation is by definition a situation for which we lack well-adapted perceptual input functions and efficient output functions for controlling such inputs. This is the usual condition for conscious control. There is uncertainty about the perceptual input from lower levels and uncertainty as to what is desirable at higher levels. Learning ensues. Conscious learning involves problem-solving control systems; is it possible that those systems control a perception that we might call uncertainty? Random trial and error reorganization is unconscious and has the effect of reducing uncertainty.

Friston comes at this with a model of probabilistic learning; PCT models control once learning has settled. We have a model of random unconscious learning. Bill modeled how plausibly it could create a control hierarchy. Rupert has some beginnings at modeling this in action within an existing, functioning control hierarchy. We do not have a PCT model of non-random conscious learning; that is, we do not have a PCT model of control loops which create perceptual input functions and their I/O connections.

Thank you so much for this Bruce! It echoes what I’ve been saying and goes beyond it, and is articulated very neatly. What I would say is that we do have the beginnings of the semi-random imaginal process you describe:

https://www.sciencedirect.com/science/article/pii/B978032391165800007X

I would also claim that uncertainty is completely orthogonal to error within a (non-reorganising) perceptual control unit. In other words we can be absolutely certain we are wrong, and we can be uncertain whether we are correct.

Does that add anything?

I think what you call “conscious learning” may already be a part of PCT as a property of the control hierarchy, specifically the ability to control a “Strategy” or “Program”. This is not really learning but it can look like it. Tim Carey and I discuss this in our paper on psychological problem solving. Following Hugh Petrie, I would say that real learning must involve random trial and error since learners who know what to do to achieve a particular result (to control) don’t need to guess (randomly select) what to do.

Perhaps you’re thinking of the use of heuristics as an example of “conscious learning” since heuristics are non-randomly selected outputs aimed at solving a problem. But I would say that the application of heuristics is a control process carried out by the control hierarchy – control of program perceptions – while the initial learning of the heuristics themselves was likely a result of reorganization type learning.

But it would be nice to try to sort this out by doing some actual research. A good first step would be to think of examples of what you call “conscious” learning and of “random trial and error (reorganization)” learning and see if people here on Discourse can come up with ways of testing whether these different processes are involved.

The PCT question would be what perception is being controlled when “variance” is being controlled. In the visual modality, for example, it might be the sum of squared deviations from the centroid of an array of elements. But it could be the average distance between all the elements; or the density of the elements (ratio of perceived number of elements to perceived area of the array). Could be a nice demonstration of the test for the controlled variable.
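
To make those candidates concrete, here is a small sketch in Python (the function names and example points are mine, purely for illustration) of three perceptual functions that could be pitted against each other with the Test:

```python
import math

def sum_sq_dev_from_centroid(points):
    """Candidate 1: sum of squared distances of the elements from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)

def mean_pairwise_distance(points):
    """Candidate 2: average distance between all pairs of elements."""
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return sum(dists) / len(dists)

def density(points):
    """Candidate 3: number of elements per unit of (bounding-box) area of the array."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return len(points) / ((max(xs) - min(xs)) * (max(ys) - min(ys)))

points = [(0, 0), (1, 2), (3, 1), (2, 4)]
print(sum_sq_dev_from_centroid(points), mean_pairwise_distance(points), density(points))
```

The Test would then consist of applying disturbances that affect these candidate variables differently and seeing which one the person protects.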

“Uncertainty” is not a concept that is relevant to the description of the behavior of a control system. It is a good description of the emotion one feels when one is trying to decide whether to do one thing or another – that is, the emotion felt when one is in a conflict between control systems trying to get the same variable in different reference states.

Could you give me a concrete example? It sounds more like the situation encountered by newborns and is precisely the kind of situation that calls for trial and error (E. coli) reorganization.

I think the best way to deal with Friston’s model is the way Bill suggested in his reply to Henry Yin:

BP: There are so many things wrong with Friston’s ideas that we just have to deal with them one at a time if they come up in conversation. There’s no way I can handle this entire tub of ********.

I’ll just start by mentioning a couple of these problems:

  1. Friston never explains what phenomenon – what DATA – his model is designed to explain.
  2. Lemma to (1): Friston never explains how his model maps to the (undefined) phenomenon that it purports to explain.
  3. Friston himself says that his model is “unfalsifiable” (Friston et al, 2023). So, the model doesn’t occupy the same scientific space as PCT.

Hey everybody!

I’m really happy to see this post. I’m really interested in both FEP and PCT and have often thought about how they correspond. I see some things that I want to help clear up a bit as I’m decently familiar with FEP literature/language. I will point out their correspondences in this post and would be happy to get into more detail about it as it will help me learn more of both PCT and FEP!

What does the FEP explain?

It’s helpful to think about this a bit through the lens of “Constructor Theory” here, which is essentially a “principle”-based approach to physics. A principle is a statement that constrains the set of possible “subsidiary theories” by immediately ruling some of them out. A “subsidiary theory” can also be called a “mechanical theory”; it’s the sort of theory we’re familiar with, like string theory, quantum mechanics, etc. It specifies what the “things” you’re dealing with are, as well as the dynamics that those things have. A great example of a principle constraining such theories is the principle of conservation of energy. Any theory which violated the principle of conservation of energy would be HIGHLY suspicious, given that all of our current best theories obey this principle. If such a principle-breaking theory were shown to be correct empirically, then we would know that the principle of conservation of energy is wrong, and we would then hunt for deeper principles that both explain our new theory and allow us to make new ones a bit more easily.

Now that we understand this, we can better understand the Free Energy Principle (FEP). The FEP doesn’t explain anything. Rather, the FEP constrains the set of possible subsidiary, or mechanical, theories GIVEN that the principle is correct. The principle is a reading of a mathematical statement suggesting that “any random dynamical system can be interpreted as performing Bayesian inference on its environment”. The mechanical theory that then follows from this is called “Bayesian Mechanics”, which is just to say (roughly) all of the algorithms/methods associated with holding probabilistic beliefs and updating those beliefs in light of observations, or evidence. Since exact Bayesian inference is often intractable we must approximate it; one popular method is “Variational Bayesian Inference”, but there are many ways that one can approximate Bayesian inference!

So, the free energy principle doesn’t really refer to the “free energy” that we know as a physical quantity; rather, it is talking about the variational (perception information) free energy and expected (action information) free energy quantities described in variational Bayesian inference.

Modeling Perception with Bayesian Mechanics

In Bayesian Mechanics any “thing” that exists is treated as trying to infer the hidden causes of its sensory experiences. Sensory experiences are “observations (o)”, and the hidden causes of those experiences are the “states (s)” of the world. Thus, in the simplest case a “thing” embodies a “generative model”, which is a joint probability distribution p(o,s), which can be decomposed as: p(o,s) = p(o|s)p(s) OR p(s|o)p(o). The “generative model” is said to be embodied by the “thing’s” physical phenotype, but this is a purely informational construct. This generative model is the “prior” in Bayesian terms, and constitutes the beliefs the “thing” has about the chances that a particular hidden cause actually caused their particular observation. Using this prior, we can make an inference about the hidden causes of an observation using Bayes’ rule. Here’s Bayes’ rule and how we infer the hidden causes of an observation using our generative model under this rule:

  • Bayes’ rule: p(s|o) = p(o|s)p(s) / p(o), where p(o) = Σₛ p(o|s)p(s) is the marginal likelihood (the model evidence).
  • The posterior p(s|o) represents what your beliefs should be after a new observation. This becomes your prior p(s) in the next round of belief updating.
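
To make the belief-updating step concrete, here is a toy discrete example in Python (the states, observations, and numbers are all invented for illustration):

```python
# Hidden states s and observations o are discrete in this toy example.
prior = {"rain": 0.3, "dry": 0.7}                  # p(s): the generative model's prior over states
likelihood = {                                     # p(o|s): how likely each observation is under each state
    "rain": {"wet_grass": 0.9, "dry_grass": 0.1},
    "dry":  {"wet_grass": 0.2, "dry_grass": 0.8},
}

def bayes_update(prior, likelihood, o):
    """Return the posterior p(s|o) = p(o|s) p(s) / p(o)."""
    unnormalized = {s: likelihood[s][o] * prior[s] for s in prior}
    evidence = sum(unnormalized.values())          # p(o), the marginal likelihood
    return {s: v / evidence for s, v in unnormalized.items()}

posterior = bayes_update(prior, likelihood, "wet_grass")
print(posterior)   # this posterior becomes the prior for the next observation
```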

Recall that direct Bayesian inference done this way is typically intractable; this is because of the computation involving the marginal probability p(o), which can require integrating over continuous ranges, or simply very large spaces. Because of this, we need to approximate it with variational Bayesian inference. Variational Bayesian inference is a bit more complicated a story, and I don’t think we need the details here, but variational inference does have some interesting properties.

Variational Bayesian inference involves the minimization of a function known as the “variational free energy”. Variational free energy is the difference between the complexity of the model and its accuracy, where complexity is the degree to which the model needs to change when given new sensory input, and accuracy is the ability of the model to predict sensory input. It is explicitly about the informational free energy of the present experience, and implicitly about the free energy of the past.
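
As I understand it, for discrete distributions this decomposition is usually written F = complexity − accuracy = KL[q(s) ‖ p(s)] − E_q[ln p(o|s)], where q(s) is the approximate posterior one is free to vary. Continuing the toy example above (numbers invented):

```python
import math

def variational_free_energy(q, prior, likelihood, o):
    """F = KL[q(s)||p(s)] (complexity) - E_q[ln p(o|s)] (accuracy)."""
    complexity = sum(q[s] * math.log(q[s] / prior[s]) for s in q if q[s] > 0)
    accuracy = sum(q[s] * math.log(likelihood[s][o]) for s in q if q[s] > 0)
    return complexity - accuracy

prior = {"rain": 0.3, "dry": 0.7}
likelihood = {"rain": {"wet_grass": 0.9}, "dry": {"wet_grass": 0.2}}
q = {"rain": 0.66, "dry": 0.34}        # an approximate posterior, free to vary
print(variational_free_energy(q, prior, likelihood, "wet_grass"))
```

Minimizing F by adjusting q pushes q toward the exact Bayesian posterior, which is how variational inference approximates the intractable computation.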

Modeling Action with Bayesian Mechanics

Of course, we know that “things” take actions, and are not typically passive observers. This is where “Active Inference” comes in. Active Inference can refer to two things: Motor Control in Predictive Coding, and Decision Making in Agents. These are both grounded in Bayesian inference, but the first is about controlling the body to enact a decision, whereas the other is about what decisions to actually make. I think that this second form of Active Inference is most like what PCT does.

To make decisions, an agent needs a way of choosing one action over another. To achieve this, active inference speaks of a “preference” probability distribution p(o|C), which is the probability of a set of observations given one’s preferences. Preferences best capture the notion of “Reference Signal, or Specifications” in PCT. Preferences allow one to calculate what are called “policies (pi)” which is a vector encoding a probability distribution reflecting the value of each policy (i.e. how strongly one believes this policy will lead to a preferred observation). Each policy is a potential series of allowable actions from a vector U, where actions correspond to different state transitions. We can imagine a vector U = [Kick, Punch, Run], and a set of policies { [Kick], [Kick, then run], [Run, then kick, then punch] }. “Pi” is then a distribution over these policies. Since variational free energy can only be calculated with observations, and the value of policies are determined by observations that haven’t happened yet, the value of a policy is calculated using the Expected Free Energy.

Expected free energy is the expected cost minus the expected information gain of an action, where a lower cost value indicates a higher reward. Thus, minimizing this function aims both to maximize reward and to resolve uncertainty. It is about the free energy expected to be experienced, variationally, in the future. Actions, or control mechanisms, in PCT are best captured by the notion of the U vector, and “policies (pi)” are beliefs about how those actions ought to be employed. So, when an agent acts, you could say that it predicts that this or that policy will lead to this or that observation, which is why it does this or that action.
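
Staying with the rough gloss above (expected cost minus expected information gain), a toy policy-selection sketch might look like this; the policy names echo the example above, but all the numbers, and the softmax precision, are invented for illustration:

```python
import math

def softmax(values, precision=4.0):
    """Turn (negative) expected free energies into a probability distribution over policies."""
    exps = [math.exp(-precision * v) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

# Invented example: three policies, each with an expected cost (how far predicted
# observations are from the preferred ones, p(o|C)) and an expected information gain.
policies = ["kick", "kick_then_run", "run_kick_punch"]
expected_cost = [1.2, 0.4, 0.9]
expected_info_gain = [0.1, 0.3, 0.6]

G = [c - i for c, i in zip(expected_cost, expected_info_gain)]   # expected free energy per policy
pi = softmax(G)                # the belief distribution over policies
print(dict(zip(policies, pi))) # the agent acts by sampling from (or maximizing) this distribution
```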

Using probability distributions to encode preferences and policy values is a sort of mathematical “trick” to bring all elements of action selection within the domain of Bayesian belief updating. We can now consider our generative model to be the joint probability distribution p(o,s,pi), which links perception and action, and couples the agent back to the world by changing it through action. The way we actually model agents/“things” is typically with a partially observable Markov decision process (POMDP).
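
For concreteness, in the discrete active-inference literature such a POMDP generative model is usually written down as a handful of arrays. Here is a minimal sketch following the common A/B/C/D labelling (conventions vary between papers and toolkits, and the numbers are invented):

```python
import numpy as np

# A = likelihood p(o|s), B = state transitions p(s'|s, u), one per action u,
# C = preferences over observations, D = prior over the initial hidden state.
A = np.array([[0.9, 0.2],      # rows = observations, columns = hidden states
              [0.1, 0.8]])
B = np.array([[[0.7, 0.3],     # one transition matrix per action u
               [0.3, 0.7]],
              [[0.4, 0.6],
               [0.6, 0.4]]])
C = np.array([0.8, 0.2])       # preferred observations, p(o|C)
D = np.array([0.5, 0.5])       # prior over the initial hidden state

# Perception step: posterior over hidden states after seeing observation index o = 0.
o = 0
posterior = A[o] * D
posterior /= posterior.sum()
print(posterior)
```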

FEP and PCT as “Unifying Modeling Frameworks”

Modeling with POMDPs and Variational Bayesian Inference is super flexible. Like I think PCT is, Active Inference represents perhaps a “Unifying Modeling Framework”, rather than a prescription about what details to add when modeling a particular thing. For that, you have to do what you normally do… use a combination of intuition and empirical data.

I think that both FEP and PCT seem to allow you to make really good behavioral models once you’ve identified the key features of the “thing” you hope to model on the terms of the modeling framework that you have chosen. Furthermore, I believe both of them can model “things” at any level of scale from “people to particles”, although I haven’t seen an application of PCT to particles as yet. This is an important point, because FEP can describe control systems in terms of beliefs, but it doesn’t necessarily mean to attribute true psychological states like we would say people have. It is only that the same formalism can indeed be used to build a model which simulates the behavior under investigation. So, talk of “uncertainty”, “beliefs”, “surprise”, “information” may not necessarily map to the object of modeling.

I would love to see what the active inference solutions of the “Outfielder Problem” look like compared to PCT solutions.

My “problem” with all “Unifying Model Frameworks” of this sort is that they tell you how you could model any “thing” but don’t tell you how to improve your model, or come up with a more mechanistically fine-grained one. We say things like “reference signals” and “beliefs”, but in what forms can these concepts be physically embodied? How can we understand how these things are embodied at multiple levels of scale, as they must be in nested hierarchical structures?

Automatically Generating Multiscale Models of Natural Control Systems

It would be great to have models that both capture the behavior under investigation and the mechanisms responsible for that behavior at all relevant scales. Furthermore, I think we’d like to know something about how these natural control systems arose in the first place, as well as how they end up complexifying into larger systems.

I am working on a system capable of producing such models in a principled way and I definitely take strong inspiration from both PCT and FEP. At the last annual IAPCT conference I presented on this system here: https://www.youtube.com/watch?v=dLPpb2Ll3GA&ab_channel=TyfoodsForThought

But didn’t get the chance to go into the more technical details and how exactly it relates to PCT. This system that I have created is an instance of an Eleatic Game, that I call “Hello Game Of Existence (Hello GoE | HGoE)”. Here are the key properties of HGoE:

  1. The game’s mechanics are Turing complete, in that they can be used to set up and run any possible computation.
  2. The game is both physically and informationally interpretable, in that the amounts of energy required to build a circuit, maintain a circuit, run a circuit, and even move a circuit are all quantifiable. This means that the detailed mechanics of the behavior of “things”/models generated by this game can be investigated at every level of scale.
  3. Circuits can come to encode information about their own existential concerns and take actions on the basis of that information.
  4. The game allows for indefinite, or “open-ended” complexity growth and this growth can be driven in part through the active minimization of a particular function, which can be said to constitute the “goal” of the eleatic game.
  5. Information theory of individuality can be used to identify particular “things” generated by the system, and thus to drive diversity metrics, while assembly theory can be used to calculate the complexity of those “things”, to drive complexity metrics.

The existential concerns experienced by the elements of a circuit are quantifiable in terms of “stress”, which is a function of the local environment that the element sits in, and of the element’s state. This stress value sits between an upper and a lower bound, which quantify the maximal stress the element can experience, given its state, before it ceases to exist. The midpoint between the upper and lower bounds is the “setpoint”, or least-stressed state, of the circuit element. Being least stressed means being further away from nonexistence, and thus, for a “player” of the Eleatic game, existence is the controlled variable. Given the capacity of the system, i.e. that it can actually give rise to very complex natural control systems, one can approach generation in what I see as two promising ways:

  1. Guided by a human, reinforcement learning agent, or some other kind of “player”. In this case, you have a sort of control system whose goal is to make increasingly complex control systems.
  2. Guided by local rules that automate the behaviors a player would normally have to do but in a distributed manner. For example, each grid square could be responsible for creating circuit elements, and these circuit elements could then attempt to maintain their own existence by shifting energy around by some dumb, or smart strategy.

Well, that’s my spiel, I’d like to see folks using this principle based approach to create generative systems that produce “physically scrutable” multiscale models of complex intelligent artifacts. I find it to be strongly inspired by the sorts of stuff both the FEP and PCT talk about, except it is aimed at addressing where these things come from in the first place and what they end up evolving into.

I hope that all of this was helpful for the conversation about parallels between FEP and PCT. I really would love to see an Active Inference solution to the outfielder problem and think I will try to make it a pet project so that we can compare it to PCT solutions rigorously. These are no doubt powerful ways to model things, make predictions and understand behavior of natural control systems. I believe that the approach I’ve proposed here is a powerful complement to such approaches and I am excited to see how it might inform them.

Hi Ty, thanks for joining in! Your virtual world sounds ideal for testing the ideas we discuss here and in examining universal principles. I can see you have the programming skills and the conceptual expertise to move far in this space, neither of which I have!

As I understand it, PCT starts from a number of simple premises that are completely different from FEP. It is grounded in the lived experience of control from the inside, the observation of control from the outside, and a grasp of the maths and engineering of how control systems work. This is all solid ground for me.

The statement “any random dynamical system can be interpreted as performing bayesian inference on its environment” makes no sense to me at all. I understand that energy is required to create and maintain local structure (reduced entropy) within any system. And allowing local structure to degrade, increasing entropy, provides the potential for reconstructing it in a more adaptive form. This, to me, is the simple unifying principle - it underlies selection theory in all its guises, and reorganisation in PCT. That’s as far as I can manage, and all I think is needed to facilitate learning, morphogenesis, and evolution.

The way the free energy principle is explained and articulated seems to lack any awareness that what we call ‘behaviours’ are the manifestation of closed loop control in the absence of any learning. Closed loop control is necessary for learning but it is necessary even when no learning occurs, just to ‘act on target’. FEP seems to come in, like all ‘cognitive’ and ‘behavioural’ theories, at the stage of learning - which is somewhat analogous to reorganisation in PCT - without any of the scaffolding of a hierarchical negative feedback control system, which, as far as I can tell, requires no ‘inference of its environment’. Rather it simply needs to control an internal signal that covaries with a function of the environment; in many cases this function is simple, wholly dynamic and egocentric, occurring continuously whilst action is occurring - such as the perception of a changing chemical gradient - rather than any ‘inference of a hidden cause’. Surely a simple cell that sends out a signal in linear proportion to the change in chemical at its surface at that moment would suffice? Why the need for probabilities?
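
To make concrete the kind of mechanism I mean, here is a minimal sketch (with made-up numbers) of a cell controlling a perceived chemical signal with nothing but a proportional closed loop – no generative model, no inference over hidden causes, no probabilities anywhere:

```python
# A cell controlling a perceived chemical signal with a plain proportional loop.
reference = 1.0     # desired level of the perceptual signal
gain = 2.0          # output gain
position = 0.0
dt = 0.01

def concentration(x):
    """The environment: chemical concentration rises with position (unknown to the cell)."""
    return 0.2 * x

for _ in range(1000):
    perception = concentration(position)   # a signal that simply covaries with the environment
    error = reference - perception         # compared to a reference, not to a prediction
    position += gain * error * dt          # act on the environment
print(position, concentration(position))   # settles near the reference level
```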

I have still never had a satisfactory answer to this question from either side of the fence, so please do help!

Warren

The key is in the phrase ‘can be interpreted’. Who is interpreting what? It’s from an external point of view of an observer or analyst, so that’s ‘who’. They are observing and analyzing the environment of the dynamical system. More accurately put, they are observing and analyzing perceptions of the environment. The analyst’s perceptions, or the perceptions of the observed system?

Is the distinction between the system’s point of view and the observer’s or system analyst’s point of view crucially essential to FEP, as that distinction is crucially essential to PCT? If so what passages in ‘active inference’ literature say and demonstrate this? If not, why not, that is, why does it not matter for FEP?

The system-analysis point of view is important. What are the limitations of hierarchical control systems? What capabilities might they have that we haven’t considered? Is it possible to overlook manifestations of unrecognized capabilities because we hadn’t thought of looking for such phenomena or had misinterpreted such phenomena? This has happened in other sciences. Are further refinements or developments of hierarchical control systems possible which haven’t yet developed in nature (or which we happen not to have observed yet)? And this is only a beginning. But of paramount importance is to keep the analyst’s point of view out of the model of the individual control system’s behavior, which requires only the point of view of the control system which is modeled.

When each system in a population controls perceptions of the others and (imagined) perceptions of what the others perceive, what they are controlling, at what reference levels, including other’s perceptions of oneself, then it gets more complicated. The system-analyst point of view is useful for keeping track so that a model can include all the necessary CVs and interactions, but the model or simulation depends only on the points of view of the several participating systems. The terms and concepts that are useful for the analyst and as a guide for the modeler are not themselves part of the model.

I think PCT has good means for improving the model of any behavior (there are some indications in the CSGnet archive of the developmental histories of several models). Are computer simulations a physical embodiment? Robots are. And embodiments in living nervous systems have been identified, tested, and studied. These all involve nested hierarchical structures.

@wmansell @bnhpct

The Non-Trivial Link Between Random Dynamical Systems and Bayesian Mechanics

The statement “any random dynamical system can be interpreted as performing Bayesian inference on its environment” is an English translation of a mathematical statement. It doesn’t actually depend on an “interpreter”; rather, it might be better to think of it as “there is a non-trivial mapping between the mechanics of Bayesian inference and the statistical mechanics of a dynamical system in a ‘non-equilibrium steady state’ (NESS)”. This non-trivial connection to physics makes it especially appealing to many people. Again, it’s important to note that the “free energy” being discussed in Bayesian inference is not physical free energy; it’s informational free energy, as in “variational free energy”. Similarly, entropy here isn’t the Boltzmann entropy of physics, but rather Shannon entropy as in information theory. However, there IS a non-trivial connection between the two, in that one is still describing physical energy exchange between systems despite speaking in terms of Bayesian mechanics.

The Map Is Not The Territory - Mel Andrews does a great job explaining the distinction between FEP and Active Inference, as well as the connection between the physics and the bayesian mechanics. Another good paper about all of this is: On Bayesian mechanics: a physics of and by beliefs

In particular they talk about the duality between the free energy principle (informational) and the constrained maximum entropy principle (physical).

The “anthropomorphic/cognitive” language of the FEP

The language of FEP is off-putting to many because of what you’re saying here. However, because of this duality between physics and information, one CAN reasonably use this language to describe even the most simple physical systems, like fundamental particles, as doing inference. But it’s important to note that this particle would be engaging in a VERY simple form of inference, and being that it’s a “fundamental particle” it would actually be the very simplest form of inference. In other words, the actual model describing the inferences that this particle needs to do to exist would be very small. This puts the notion of inference on a continuum, where you can have very simple inferences, very complex inferences, and everything in between. So, from what I understand, I believe that in the same way you can just draw up an equation describing a negative feedback loop over some variable a, you can also draw up a POMDP model which describes an “agent taking actions which result in the value of some variable being maintained”. The language of FEP is certainly anthropomorphizing all models, but “the map is not the territory” - we don’t need to really believe that the simple feedback loop, or particle, is an agent; we only need to believe that these mechanics correspond non-trivially to some physically real dynamical system. To say it yet another way, the physical mechanics of a dynamical system embody a statistical generative model.

There are hierarchical POMDP models, and active inference models in general, that can be created in which the higher levels are acting on and making inferences about lower levels. I believe that this is just doing what hierarchies in PCT are doing.

So, why use probabilities? Because doing so allows for a non-trivial mapping to known physics.
Why use this language? Just because that’s the language used by Bayesian mechanics.

To get more targeted at your particular questions @bnhpct

System Modeling

I might say that the distinction between the system’s POV and the observer’s POV is not put forward as critically in FEP as it is in PCT. It might be that they talk about it in a different way given the different language. Instead of trying to figure out what variables a system is controlling, FEP would be trying to figure out what beliefs a system has (which are said to be encoded in its physical form!). Once again, these beliefs encode the reference signal in terms of a “preference distribution”. Actually, I have heard Friston himself say something to the effect that the FEP is “vacuous” when it comes to actually coming up with these models. Something that PCT wins at is having the Test for the Controlled Variable (TCV) as a way to find the empirically measurable variable that one can build a control-system model around. As far as I know, the FEP doesn’t put forward any such method for doing this kind of probing, at least not explicitly! Of course, you may realize that an organism “prefers” a certain temperature, and can encode that as a preference distribution, which captures the same thing.

System-Analysis

I do believe that FEP community is asking the same sorts of questions about the models they’re capable of building within their paradigm/language. It would be great to properly transfer knowledge between these models as I suggested before. We’ll have to build an active inference model and PCT model of some problem to get a sense for how to interconvert models and the language used to describe them.

On physically-scrutable models and a principled method for deriving physical mechanisms

Could you show some examples of how PCT points to making mechanistically accurate models of behavior? To be clear, the goal is to have a principled method of coming up with the actual physical mechanisms responsible for a model’s behavior. I.e., okay, I know that keeping the speed of a flying object constant seems to be the best way to control a ball, but what sorts of physical systems are capable of this behavior, and how do they actually do it?

Thanks Warren!!!

Thank you for the kind words :slight_smile:

Here’s one example of many. The robotic system is one physical implementation, “the organization of the vertebrate nervous system” which is referenced as inspiring its organization is another. Consider following Henry Yin’s work. He is a neuroscientist at Duke with excellent understanding of PCT.

Achieving natural behavior in a robot using neurally inspired hierarchical perceptual control

Joseph W. Barter & Henry H. Yin (2021). iScience 24(9).

Abstract

Terrestrial locomotion presents tremendous computational challenges on account of the enormous degrees of freedom in legged animals, and complex, unpredictable properties of natural environments, including the body and its effectors, yet the nervous system can achieve locomotion with ease. Here we introduce a quadrupedal robot that is capable of posture control and goal-directed locomotion across uneven terrain. The control architecture is a hierarchical network of simple negative feedback control systems inspired by the organization of the vertebrate nervous system. This robot is capable of robust posture control and locomotion in novel environments with unpredictable disturbances. Unlike current robots, our robot does not use internal inverse and forward models, nor does it require any training in order to perform successfully in novel environments.

Apologies, I think I’m still not clear enough. We have models of the behavior of biological organisms, but these models might only capture a general feature of that behavior. For example, it might describe the position trajectory of a mouse under certain conditions. However, the model doesn’t describe, or even point to, how to make a more detailed model that captures the actual physiological mechanisms of the mouse while still recovering the originally modeled behavior.

I am aware of this work by Henry Yin. We can build robots that exhibit natural behavior, but it requires that we build them, and I believe that there are MANY cases in which we don’t know how to build a robot that exhibits some particular behavior. We could just explore the space of possible robots until we find our behavior(s) of interest, but doing this physically is very expensive and time-consuming.

Let’s say that somehow we did figure out a nice way to explore the space of possible robots. We actually created an automatic robot building machine that “evolves robots” and keeps only the most intelligent/interesting ones. Well, we wouldn’t understand how those robots work when they first emerge, but given that their behavior is intelligent/interesting, I imagine that we’d see some parallels between our robotic constructions and the evolved organisms that we are/live with.

If we have enough of these robots, let’s say… thousands of “mice-like” robots, then I strongly believe that inspecting the internal workings of these “mice-like” robots would allow you to more easily infer more detailed mechanistic models about how a biological mouse might be producing their behavior.

So, I imagine that if one had a coarse-grained model of a mouse’s trajectory through some space under certain conditions, and wanted to know the actual physiological mechanisms behind this, then it would be great if they could go consult their library of “mice-like” robots (simulated, or built), and use this information to help them come up with more detailed mechanistic models. They’d be able to do so at every “level” of description too, since these robots/simulations are fully multi-scale.

The criteria for what is “mice-like” would have to be empirically determined. As in the following, let’s imagine a library of abstract mathematical models applied to different known organisms/entities in reality. We could pull from our library of robots and see which models match the behavior of which robots and at what scale. This would allow you empirically to call those robots “mice-like” etc…

Yes, like the association of physical entropy with information and the association of Shannon information with physics, it is an analogy. Analogy is a form of explanation. An analogy between a domain that is well understood and a novel or less well understood domain is perceived to be explanatory. “Well understood” is relative to the expertise of the person enjoying the analogy.

The Andrews paper proposes that

… the FEP … designate[s] a model structure, to which philosophers and scientists add various construals, leading to a plethora of models based on the formal structure of the FEP. An entailment of this position is that demands placed on the FEP that it be falsifiable or that it conform to some degree of biological realism rest on a category error.

The FEP is a descriptive apparatus about models of behavior. PCT is a model of behavior. The FEP is not falsifiable, and any conformity of the FEP apparatus to biological systems is ‘purely coincidental’ and in the imagination of the reader. (Works of fiction typically have this ‘purely coincidental’ disclaimer. If you infer from that an analogy, it is apt. Literary fiction, in a very different way, also provides valid insights with no claim to be the real thing.) PCT is falsifiable and has been tested countless times in ways capable of falsifying it, and a fundamental premise is that the control organization presented in a PCT model of behavior (the ‘white box’) predicts structures that are found or will be found in organisms that engage in the modeled behavior (the ‘black box’).

Comparing PCT and FEP directly to each other, in Andrews’s words, “rests on a category error.”

Martin Taylor was quite expert in the analytical point of view, like the point of view taken by the FEP. I recommend his magnum opus digging into the systemic consequences, corollaries, and ramifications of PCT, Powers of Perceptual Control. A raw draft is here. Martin’s ongoing revision process was interrupted by his death last March. A team of volunteers is editing it for publication. Martin thought that the FEP “seems to mesh very well with PCT, filling in with conscious control where PCT cannot suffice with non-conscious control, and providing a mathematical background that applies equally to both.”

You can also look for Martin’s (and Warren’s) posts about Friston, free energy, etc. in Discourse, including in the CSGnet archive (e.g. here results of a search on “mmt friston”). You will also find there sometimes quite sharp rejections of everything that Martin has said on the subject. Looking only at this prominent contention focusing on the analytical, systemic point of view (its merits or pointlessness) one might remain unaware of Martin’s understanding of PCT from the point of view of the control system, his training and deep experience in experimental psychology grounded in this understanding, and his invention and development of a way of modeling communication between autonomous control systems (Layered Protocol Theory, LPT) which he and Bill Powers recognized to be a subset of PCT.

These rejections rest on the same category error. From that point of view within PCT, it is beside the point, useless, and (it is claimed) even quite wrong, to talk about systemic properties of control systems with mathematical concepts and tools which apply with apparently great generality to all physical systems, such as information theory and (much more recently) the mathematics of crumpling and rattling ( Chvykov, P, Berrueta, T.A., Vardham, A., et al. 2021. “Low Rattling: A Predictive Principle for Self-organization in Active Collectives.” Science 371: 90-95. DOI:10.1126/science.abc6182.). You may encounter similar categorical rejection of what you have to say about the FEP for the same reason.

We don’t explore the space of possible robots and select those that match an observational criterion or set of criteria for behavior as perceived by the builder or some other judge.

The PCT approach, modeling what we understand to be how living things do it, is to

  1. Build a control system that is capable of some behavior (analog: genetic endowment from evolution).
  2. Provide the system means to change the properties of subsystems when they are not controlling well, and keep changing until control improves (analog: learning).

The random aspect of (2) is amenable to description by Bayesian inference, but the mechanisms that are implemented in such a control system do not perform Bayesian inference.

Behavior that is (1) well integrated into the control hierarchy is generally without awareness, though a higher-level system may direct attention to anything.

Attention ‘goes to’ perceptual input that is controlled with persistent error. Random reorganization at and near the locus of poor control (chronic error output) is thus accompanied by higher-level systems directing organs of perception to sources of that perceptual input, to possible sources of disturbance, and to perceptions (e.g. relationships, sequences) control of which has improved control in analogous situations. The hand-waving in that higher-level kind of ‘attention’ by problem-solving systems depends upon models of associative memory which SFAIK have not been implemented and tested, though there’s a lot of experimental data available. This is my understanding of what Warren means by conscious control. Whether or not higher-level systems are involved, the system makes changes (e.g. in wetware branches, synaptic connections, weights at synapses) and keeps making adjustments until control improves adequately, and then it stops making changes. It is in the E. coli model of trial-and-error reorganization that the random element enters, which is amenable to the extra-systemic perspective of the FEP.
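
For readers who haven’t seen it, the E. coli scheme is roughly: keep changing a parameter in the current direction while control error is falling, and pick a new random direction when it rises. A bare-bones sketch, my own simplification rather than Powers’ actual demo code:

```python
import random

def control_error(gain):
    """Stand-in for how poorly the system controls with this parameter value."""
    return (gain - 3.0) ** 2 + random.gauss(0, 0.01)   # unknown 'good' value at gain = 3.0

gain = 0.0
step = random.choice([-0.1, 0.1])
previous_error = control_error(gain)

for _ in range(500):
    gain += step                          # keep changing in the current direction
    error = control_error(gain)
    if error > previous_error:            # control got worse: 'tumble' to a new random direction
        step = random.uniform(-0.2, 0.2)
    previous_error = error

print(round(gain, 2))                     # the parameter drifts toward the region of low error
```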

I assume you’ve looked at Rupert’s recent work along these lines.

The Real Connection Between Information and Energy

From my understanding, this is not just an analogy when it comes to the FEP. There is an actual deep connection between information and energy, best illustrated by Rolf Landauer and known as “Landauer’s principle”. Erasing a bit of information requires a minimal amount of heat energy to be released, in just the way that encoding a bit of information requires a minimal amount of energy. For this reason, irreversible computations which erase information from input to output release heat for reasons other than “simple inefficiencies” like friction. – I think it might be safe to say that physics has everything in terms of “energy”, while Bayesian mechanics has everything in terms of “information”. It would be great to have a framework in which we can better understand the relationship between both of these things at once. David Wolpert is a big name in this area, called the “thermodynamics of information”. – Worth noting that while the system I’ve created, “Hello GoE”, is in its infancy (perhaps just a toy as far as real physics is concerned), it does well in illustrating the deep connection between information and energy.
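
For a sense of scale, Landauer’s bound on the heat released by erasing one bit is k_B·T·ln 2, which at room temperature works out to roughly 3 × 10⁻²¹ joules:

```python
import math

k_B = 1.380649e-23          # Boltzmann constant, J/K
T = 300.0                   # roughly room temperature, K

landauer_bound = k_B * T * math.log(2)    # minimum heat released per erased bit
print(f"{landauer_bound:.3e} J per bit")  # ~2.87e-21 J
```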

Landauer’s Principle: “Why Pure Information Gives Off Heat” - Up and Atom YouTube video

Active Inference and PCT can be compared, FEP and PCT cannot.

Yes, PCT and FEP can’t be compared; however, PCT and Active Inference can be compared. The FEP is the motivating force behind modeling “things” in terms of active inference. I do believe that this is analogous to what I’d call the PCT principle: “Behavior is the control of perception”. These principles are comparable, but I’m not sure where that gets you. That’s why we’d have to make active inference models (models that obey the FEP) and then compare them with PCT models (models that obey the PCT principle). What we’re wondering about is the Venn diagram between the two, yes.

Principles and Falsifiability

The FEP not being directly falsifiable is not a bad thing, and the same is true of principles in general. The FEP is essentially saying “Any ‘thing’ that exists must be doing something to keep itself in existence, and that something is well cast as minimizing variational free energy”. Minimizing variational free energy over a model doesn’t mean your model is right; it needs to match empirical data. Now, if there were a “thing” which violated the FEP and we had a good theory about how that thing works (thus ruling out the excuse of insufficient information), we COULD reject the FEP. I’m just re-casting David Deutsch on the falsifiability of principles. Here’s a quote:

Quite generally: if there is a phenomenon for which there is a good explanation that violates a principle P, and the phenomenon falsifies all known object-level explanations conforming to P, then the methodology of science mandates rejecting P. Thus P is just as exposed to falsification by experiment as any object-level theory. - Constructor Theory

All that being said, what exactly makes PCT a specific model of behavior? I was quite sure it was a principle based modeling approach, or “Unifying Modeling Framework”. I don’t see how PCT could be falsifiable as just a modeling approach.

Unifying Modeling Frameworks and Falsifiability

To falsify a modeling approach, we’d have to demonstrate that this modeling approach cannot possibly fit the behavioral data of the organism to be modeled, right? - The thing about “unifying modeling frameworks” (UMFs) is that they’re like the “Turing computers” of the modeling world. I am pretty sure that, in the same way a Turing computer can be programmed to run any possible program, UMFs can be parameterized to fit any possible data.

What I think is awesome about UMFs is that they use their principles in order to cut down on the number of models that they should be looking at. For example, the principle of conservation of energy is not “falsifiable” directly, yet we know that a theory which violates this principle should be looked at with great suspicion. Is it a coincidence that all our theories happen to conserve energy? If that’s the case, then “things” that obey the FEP, or PCT, are also coincidences.

I feel strongly about “the map is not the territory” here. It’s incredibly important not to reify these concepts we use when modeling physical phenomena. We don’t actually have to believe that what we’re modeling with these tools ACTUALLY has beliefs, desires and intentions. We just acknowledge that this allows us to reproduce and make predictions about that physical phenomenon’s behavior. If that single tool can apply not just to your level of interest, but to all levels, I think that’s pretty cool, especially if we go ahead and focus on the suggestion that notions like “beliefs, desires and intentions” lie on a continuum. There are simple and complex versions of these. Folks argue for similar things with consciousness, the idea that “small entities” have a “little bit of consciousness” which binds together into a “big ole consciousness” like ourselves.

Extra Points

I pretty much end up using FEP and Active Inference interchangeably, because FEP is a nice short version and the active inference models “fall out” of it, but they’re not the same thing.

How exactly do PCT models of behavior predict structures that are found, or will be found in organisms? Any examples? I can’t imagine that this much more different than any other modeling approach, but I’d be happy to be shown otherwise.

I’ll be sure to check out @MartinT 's posts about these things, that sounds fun, thank you!

Actually, that’s exactly what we as modelers do in general, isn’t it? When we seek to build a control system that is capable of some behavior, we are by definition exploring unless we already know how to build it. - I propose a principled way of exploring models by grounding them in physical constraints and building them in an automated way. The idea being that instead of being forced to build a control-system model with “some behavior” yourself, you could just search the library of such models. Of course, such a library may or may not have what you’re looking for.

A funny thing here is that Friston et al. used to always say “as if” it is performing Bayesian inference. A powerful point is that it doesn’t matter whether the control system is performing Bayesian inference in an explicit manner. Apparently, one could still describe the same behavior with approximate Bayesian inference, given that the “thing” exists.

I’m quite sure that FEP has models of the same thing, but in different language. Low levels always tune their own beliefs, but higher levels in the hierarchy provide a sort of slower-time-scale, top-down control. Again, this feels like an argument aimed at an assumed reification of the language that the FEP uses, but such reification isn’t mandated by the FEP, and folks seem to be quite careful about it.

It also feels very similar to the avoidance of teleological speech in discussions in evolutionary biology. When does “as if it has a goal”, or “as if it has these beliefs”, become “truly so” if the description is apt? Perhaps the notion of “truly so” is silly in general; many people think we will never get to the truth, because it recedes into the background with each approach. I am again pointed toward some sort of “continuum” notion of such concepts.

I have not! I don’t know who that is yet. Can you clue me in?

Thank you!!

Yes, FEP starts with the theoretical assumption that “any random dynamical system can be interpreted as performing Bayesian inference on its environment”. PCT starts with the empirical observation that the behavior of living (dynamical) systems IS control, in fact, not just in theory. The fact of control is seen in the observation of controlled variables, which are consequences of actions that are maintained in reference states, protected from disturbance.

Actually, the fundamental reason why FEP is incompatible with PCT is the same as the reason Bill Powers gave for why he rejected Martin Taylor’s proposed theoretical additions to PCT. Like Martin, Friston takes a hypothetico-deductive approach to developing FEP while Bill takes an engineering approach to developing PCT (I described it as reverse engineering in my chapter in LCSIV) . This may explain why Martin thought FEP had much to contribute to PCT.

The idea that behavior is a manifestation of closed loop control is the theory (PCT) that accounts for the existence of controlled variables. What proponents of FEP are unaware of is the fact of control as it is seen in the behavior of living systems: the existence of controlled variables.

Because FEP assumes (as an axiom from which FEP is deduced) that perception is a process of inferring the cause of sensory (perceptual) input. My guess is that that axiom is based on the idea that control requires knowledge of the environmental cause of the perception (which is demonstrably not true). And this knowledge is assumed to be derived from probabilistic evidence of the senses. So the inference process gives you the probability that the true cause of what you perceive was environmental state X. Of course, such an inference is not only impossible (since all we have is perception) but, as you correctly point out, unnecessary.

Best, Rick

Thanks Rick for all these tight clarifications. I would just like to add, and I hope this still rings true with you, that although ‘all we have is perceptions’, there is still a physical world out there on which our life, and various experiences, depends. But because that physical reality is part of the same closed loop as the nervous system, it doesn’t need to be modelled or the causes inferred. Nonetheless the nervous system does require the functions to specify the inputs from this physical environment that it needs to control. I imagine you’ll have a better way of explaining this…

Yes, of course. PCT assumes that there is a physical world on which our perceptual experience depends. The physical world is called the “environment” in diagrams of the PCT model.

This is not quite right. We always include physical reality in our models, but only at the level of detail required. In PCT, the physical world – the environment – is the world described with great success by the models of physics and chemistry. These models describe the world at different levels of detail. For example, in my tracking task demo the controlled variable is assumed to be a function of two physical variables: the two vertical lines on the computer screen. These physical variables could be described at several different levels of detail. We could start at the sub-atomic level with the quantum mechanics that turn orbiting electrons into photons. Or we could start with the spatial distribution of the emitted photons, etc.

But we don’t need to get to this level of detail in order to get a correct model of the controlling done in this task. All we need to know about the physical situation is that there are two vertical lines of pixels on the screen, one above the other. So we can call the position of the top line “t” for target and the position of the bottom line “c” for cursor. We also know that the mouse is a physical entity that can be used to vary the cursor via the physical computer connection from mouse to cursor. Then we can guess that the perception, p, being controlled is some function of one or both of these physical variables.

Here’s a model of the controlling done in the tracking task under the assumption that the controlled variable – the perceptual variable being controlled – is t-c and the reference for this variable is 0 (cursor aligned with target).
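
Since the diagram itself doesn’t travel well here, a bare-bones version of that model in code would look something like the sketch below; the gain, slowing, and disturbance values are illustrative, not values fitted to tracking data:

```python
# A sketch of the tracking-task control model: p = t - c, reference r = 0.
import math, random

k, slowing, dt = 100.0, 0.1, 1 / 60   # output gain, output slowing, frame duration
c, o = 0.0, 0.0                        # cursor position, output quantity

for frame in range(1200):
    t = 20 * math.sin(0.2 * frame * dt)       # target position (a stand-in pattern)
    d = random.gauss(0, 0.05)                  # disturbance acting on the cursor
    p = t - c                                  # controlled (perceptual) variable
    e = 0.0 - p                                # error: reference r = 0 minus perception
    o += slowing * (k * e - o)                 # leaky-integrator output function
    c += -o * dt + d                           # mouse-to-cursor link; the sign is chosen so
                                               # that feedback around the whole loop is negative
print(round(t - c, 2))                         # p stays small relative to the target's excursion
```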

Physical reality is always part of a PCT model because the controlled variable is always expressed as a function of physical reality. That function defines the aspect of physical reality that is being controlled. So the controlled variable, which is in the environment of the control system, corresponds exactly (plus or minus some neural noise and sensor deterioration) to the perceptual variable that is being controlled.

I would just change a few words: The nervous system requires input functions that define the aspects of the physical environment (internal and external) that it needs to control. In PCT, perceptual functions don’t really specify inputs – they don’t say what these inputs should be. Rather, perceptual functions define the variable aspects of the environmental input that the system can control.

Specification of input is done by the reference signal (r in the diagram). The reference signal specifies the state (or value) at which the perceptual variable (and, equivalently, the corresponding aspect of the environment) is to be maintained. In the diagram the reference specification has a constant value, 0. But more often than not the reference specification sent to a control system is itself a variable, set by higher-level systems as the means of controlling their own perceptions.