PCT vs Free Energy (Specification versus Prediction)

I’ve been trying to find a clear description of the relationship between PCT and Friston’s “Free Energy” (FE) model of behavior. Apparently there was some discussion about it on CSGNet back in early 2019, but it never seemed to get anywhere. That discussion took place after Bill passed away. I wasn’t sure I knew what Bill thought of FE, but I was pretty sure it wasn’t highly.

My impression of the FE model is that there is no there there. But I just did a search in the archives to see if I could find out what Bill thought of FE and, lo and behold, I found this little gem that he sent to CSGNet in 2011, in response to a personal note from Henry Yin:

Date: Wed, 2 Nov 2011 19:08:36 -0600
From: Bill Powers powers_w@FRONTIER.NET
Subject: Re: Wholegroup; Optimal control

Hello, Henry –

At 03:03 PM 11/2/2011 -0400, Henry Yin wrote:

If you want to see some conceptual confusion, just check out this article, published today, on optimal control from Karl Friston, Richard’s favorite:)

Good God. I have written about six paragraphs here, and this is all that hasn’t been deleted. I think I just have to decline to comment. There are so many things wrong with Friston’s ideas that we just have to deal with them one at a time if they come up in conversation. There’s no way I can handle this entire tub of ********. That’s an eight-letter word.


Apparently, my guess that Bill didn’t think highly of FE was an overestimate. But I am still interested in seeing whether there is any way to compare the Free Energy (FE) model to PCT. To the extent that I can understand it, the FE model seems to say that behavior is based on prediction of the results of action while PCT says that behavior is based on specification of the results of action. In terms of the production of behavior, FE says that the brain is a prediction machine while PCT says that it is a specification machine.

I would like to develop ways to test these two different views of how behavior is produced, but in order to do that I need to know how FE (or any other model of behavior based on prediction) actually works. In particular, I need to know how FE (or any other prediction-based model) explains behavior such as that seen in a simple tracking task, behavior that is explained so nicely by PCT.
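For concreteness, here is a minimal sketch of the PCT side of such a comparison: a single control loop doing compensatory tracking. The integrating output function, the gain, the time step and the sinusoidal disturbance are all illustrative assumptions, not values from any published demo. The reference is a specification (cursor at 0), and nothing in the loop predicts the disturbance.

```python
import math

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def simulate_tracking(steps=2000, dt=0.01, gain=50.0, reference=0.0):
    """One-level control loop: keep the cursor at the reference.

    cursor (perception) = handle position (output) + disturbance.
    All parameter values are hypothetical choices for illustration.
    """
    output = 0.0
    errors, disturbances = [], []
    for t in range(steps):
        # Assumed disturbance: a slow 0.2 Hz sine wave of amplitude 1.
        disturbance = math.sin(2 * math.pi * 0.2 * t * dt)
        perception = output + disturbance   # what the system perceives
        error = reference - perception      # comparator
        output += gain * error * dt         # integrating output function
        errors.append(error)
        disturbances.append(disturbance)
    return errors, disturbances

if __name__ == "__main__":
    errors, disturbances = simulate_tracking()
    print("RMS disturbance:", rms(disturbances))
    print("RMS error with control:", rms(errors))
```

A prediction-based model of the same task would have to produce the same handle movements; comparing how each model's fit changes as the disturbance becomes less predictable is one possible test.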

If anyone can help me out on this I’d really appreciate it.

Hi Rick, I think it is good news that you are working on this! I also think that specification is the most appropriate word that we can use in PCT for what the free energy principle is trying to explain. The biggest issue I think for making any comparison is that these researchers seem to be using the word prediction to mean about four different types of processes. These include specification, control, extrapolation, and mental simulation into the future. Maybe this confusion is partly what frustrated Bill so much.
The way I have tried to get my head around all these issues is to stick with PCT and work out whether there is any explanatory gap within PCT that can be filled through reading and listening to the work that these researchers are coming up with. The answer is very little, but I am drawn to the fact that at present, the workings of the input functions in PCT are quite vague. A lot of the research on so-called prediction is really about whether a perceptual variable continues to be specified as the same perceptual variable, over time or despite occlusion or degradation of inputs. They seem to use Bayesian statistics to set a smoothing threshold for this. How do we think PCT does it? Given that PCT is a constructivist framework, we are not so concerned with absolute accuracy, but it would be functionally important for the same perceptual variable to be consistently specified over time. So far, I have been guessing that inputs from lower levels are integrated layer by layer through the input functions and that during early development, the neurons in these input functions reorganise to form the most reliable specification of each perceptual variable over time. In adulthood we still do this for high-level principles that bring together perceptions from diverse modalities, creating our stream of autobiographical memories.

Anyway, this all misses the point of comparing PCT with the free energy principle. I wonder whether the way to do that is to produce a PCT model that learns to specify a perceptual variable through reorganisation, but that looks to the observer as if it is making predictions based on Bayesian probabilities. This is another form of the behavioural illusion.

Just another point. Karl Friston himself has an appreciation and a soft spot for PCT, and therefore, probably Bill himself! So the feeling is not mutual. In fact, he is very approachable and willing to chat these kinds of issues through.

Looking forward to talking to you more about this!

Hi Rick and Warren,

The prediction of a Bayesian or any other machine learning (ML) model is based on the patterns and relationships that the model has learned from prior data. Predictions involve applying the trained model to new, unseen data to estimate or classify results. Most important in this process are the input features and the corresponding output values (in a regression problem) or labels (in a classification problem). Features are the input variables, and their choice and quality significantly affect the model’s ability to generalize to new data (the Lasso method can be used to identify which features to retain). The architecture or structure of the ML model defines how it processes input data and produces output predictions; examples include linear regression, decision trees, and neural networks. Parameters (weights) are the internal variables that the model adjusts (or reorganises, according to Warren) during the training process to minimize the difference between predicted and actual outcomes. The optimal values of these parameters are learned from the training data. (Hyperparameters, by contrast, are settings such as the learning rate that are chosen before training rather than learned.)

From my point of view we can set up the comparison:

ML Model - PCT Model
Parameters (weights) - Reorganising Parameters (weights)
Features - Control Variables?
Prediction (in ML they also talk about Specificity and Sensitivity) - Specification

Specificity is a measure of how well a model correctly identifies true negative instances among all actual negative instances.
Specificity= (True Negatives)/(False Positives + True Negatives)
A high specificity indicates that the model is effective at correctly identifying true negative instances, minimising false positives.

Sensitivity is the corresponding measure for true positives: (True Positives)/(True Positives + False Negatives). I think PCT struggles to define a True Positive?
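As a quick illustration of the two measures, here is a small sketch that computes them from confusion-matrix counts. The counts in the usage example are made up purely for illustration.

```python
def specificity(true_negatives, false_positives):
    """Share of actual negatives that were correctly identified."""
    return true_negatives / (true_negatives + false_positives)

def sensitivity(true_positives, false_negatives):
    """Share of actual positives that were correctly identified."""
    return true_positives / (true_positives + false_negatives)

if __name__ == "__main__":
    # Hypothetical counts: 100 actual negatives, 50 actual positives.
    print("specificity:", specificity(true_negatives=90, false_positives=10))
    print("sensitivity:", sensitivity(true_positives=40, false_negatives=10))
```

Both measures presuppose an external labeller who knows which instances are "really" positive or negative, which may be why it is hard to find a direct counterpart inside a control loop.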

To be honest, the recursiveness of PCT is very similar to a Recursive Neural Network (RNN), not to be confused with a recurrent neural network, which shares the abbreviation.

A recursive neural network is designed to operate on tree-structured data. Instead of processing sequences, it works on hierarchical structures, such as parse trees. It recursively reorganises weights to combine information from lower child nodes into higher parent nodes until a final representation is obtained.

Let us do it, I think this will be a novel framework. I think ChatGPT or Natural Language is built on PCT hierarchies…LOL!

Hi Tauseef, thank you for explaining the similarities. I think the main problem is still with the word prediction, because it sets a context for this learning that is very restrictive in terms of the task at hand: recognising things for an external human user. Instead, I would assume that a living organism is continuously acting to control very primitive perceptual variables before and during this more complicated learning of configurations, and that it has to counteract disturbances whilst doing so. My hunch, however, is that a living organism learns much faster because it is already controlling inputs based on the perceptual variables it can already specify. So the challenge is that we need to set up a context for this competition in which the agents need to maintain their survival by acting against disturbances through controlling some simple perceptual variables, and during that process have the capacity to learn to specify more complex perceptual variables.
How does that sound?


It is not at all easy to say what the “free energy principle” (FEP) is about. It is really an example of a highly (or unnecessarily) esoteric piece of theory. (See more at https://en.wikipedia.org/wiki/Free_energy_principle and in the attachment.) To me it seems that it is somehow about a phenomenon at a different level than PCT. PCT (for me at least) is a theory about the action (or behavior, as many of you want to say) of living beings. Part (and only part) of that explained phenomenon is learning, or reorganization. FEP is the other way round: it is interested mainly or even only in learning, and so action (for us, control of perceptions) is only an unimportant side issue for it. In this it is similar to so many other theories, especially those of machine learning (ML).

“Prediction” (a somewhat polysemous term) is contained in PCT: when an organism is controlling a perception, it implicitly makes or has made a prediction that its output will draw and/or keep the perception near its reference. If that does not happen, a surprise follows (again, at least implicitly), which will cause reorganization. In FEP this is otherwise the same, but the organism seems to be making predictions for no particular reason and it only predicts what will happen, so it is always in an observation mode. In FEP and ML, learning is based on the imitation of an external model. In PCT (and real life) this is seldom possible, and the organism must (at least often or basically) learn just from trial and error.

Friston The free-energy principle 2010.pdf (700 KB)

Thank you, Eetu. This framing of the relationship seems right, as far as it goes.

I’ll paste here an excellent post by Martin to ECACSnet, and your ensuing exchange with him, Warren, which I’m sure you remember. I think this is important to have in the Discourse record.

[Martin Taylor]

Warren, I don’t know what Friston or his followers think. Sometimes I don’t even know what I think, but PCT explains this last, since perceptions in the fully reorganized hierarchy are controlled non-consciously. We don’t consciously think about what we do to bring those perceptions nearer their reference values. But we often encounter situations that we perceive consciously and we perceive that they are not the way we want them to be. And that’s where mutual entropy comes in.

The answer to your last question, "Where is their architecture they can be lit into this mode that PCT so clearly describes?", is Seth, A. K., & Friston, K. J. (2016), Active interoceptive inference and the emotional brain. Phil. Trans. R. Soc. B 371: 20160007. http://dx.doi.org/10.1098/rstb.2016.0007. That’s the architecture I pointed out in CSGnet and in PPC Section 7.3 as functionally equivalent to the Powers architecture, while being possibly more powerful in explaining phenomena.

Now to the meat of your message. Ignore the maths, which are the heart of Friston’s publications, at least those I have seen. “Entropy” is really quite simple in principle. You have a measure of uncertainty (a.k.a. entropy) for any variable. It’s the same kind of measure as variance, just a way of describing how a set of values vary among themselves. Often those values are of the same variable at different times, but also often they are different values the variable might have at a given moment such as “now”. So if you have two variables, you have two uncertainties. Let’s say that one variable is in the environment, a complicated function of variables that are controllable in the reorganized hierarchy, but not a function that you know. The variable in the environment has some uncertainty both over time and possibly over variation in this novel function. There is also a variable that is the environmental correlate of the current reference value, so it, too, has some uncertainty.

Then between two variables there is mutual uncertainty (a.k.a. mutual entropy), which is actually another variable, the uncertainty about one variable that is left when you know the value of the other. Uncertainty has a flip side, a complementary variable we call “information”. Information is always a reduction in uncertainty as a consequence of something such as an event or an observation. In control, you are acting to minimize the uncertainty of the perception (or of the Corresponding Environmental Variable) given the value of the reference variable. In terms of information, you are trying to maximize the mutual information between reference and perception (or the CEV).
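To make these quantities concrete, here is a small sketch that computes them for two discrete variables from a joint probability table. In standard information-theory terms, the "uncertainty about one variable that is left when you know the value of the other" is the conditional entropy H(X|Y), and the reduction it represents is the mutual information I(X;Y) = H(X) - H(X|Y). Treating X as the perception (or CEV) and Y as the reference, and the two example tables, are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_measures(joint):
    """joint[i][j] = P(X = i, Y = j); returns the entropies in bits."""
    p_x = [sum(row) for row in joint]        # marginal of X
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y
    h_x = entropy(p_x)
    h_y = entropy(p_y)
    h_xy = entropy([p for row in joint for p in row])
    h_x_given_y = h_xy - h_y                 # uncertainty left in X once Y is known
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "H(X|Y)": h_x_given_y,
        "I(X;Y)": h_x - h_x_given_y,         # mutual information
    }

if __name__ == "__main__":
    # Hypothetical joint tables for a binary reference and a binary perception.
    no_control = [[0.25, 0.25], [0.25, 0.25]]     # perception independent of reference
    good_control = [[0.45, 0.05], [0.05, 0.45]]   # perception usually matches reference
    print("no control:  ", information_measures(no_control))
    print("good control:", information_measures(good_control))
```

On this reading, better control shows up as a lower H(X|Y) and a higher I(X;Y), which is the sense in which control is said to maximize the mutual information between reference and perception.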

Where does “free energy” come into this? That again is easy in principle, though perhaps not in detail. It takes energy to reduce entropy (uncertainty) or to maintain a reduced entropy of a variable. Perceptual control minimizes the mutual uncertainty between reference and CEV. It does this by using an energy flow from a source to a sink into which the control system dumps the entropy it removes from the CEV. The disturbance would supply energy to the CEV, but by using this through energy flow, the output extracts this energy by opposing it, dumping the disturbance’s energy (and the entropy it would have added to the CEV) into the waste flow out to the environment. In all this, PCT and FEP are in complete agreement. Indeed, in a CSGnet message some years ago, and repeated in PPC Fig 7.2, I used a hierarchical connection taken from a paper by Seth and Friston as a possible mathematically equivalent alternative to the Powers hierarchical connection.

The equivalence between the Free Energy Principle and PCT breaks down in two places: (1) In the PCT non-conscious reorganized hierarchy the action output at every level has already been pre-wired. You don’t have to think, you just do. Later, you may think about what you did, but it has already been done. The Friston approach doesn’t allow for that AFAIK. You think about how to act in order to achieve minimal mutual uncertainty (entropy) between prediction (a.k.a. reference) and observation post-action; (2) PCT (and Powers in informal communications) has no position on the function of conscious thought, other than Powers’s assertion that every conscious perception is constructed in the reorganized hierarchy, but Friston, not having a reorganized hierarchy, makes no distinction among levels.

Putting these two differences together, we have that the FEP hierarchical architecture is equivalent to the Powers hierarchical architecture, at least on the perceptual side of the hierarchy, and can apply to conscious thought, which is not incorporated in the Powers architecture other than through the within-level imagination loop inside the reorganized perceptual control hierarchy.

Where FEP and PCT disagree most strongly is in that FEP requires analysis of the probable effects of different actions on the predicted value of the (in PCT) controlled perception, and a choice as to which actions, given the current observed state of the world, will most closely bring the observation to match the prediction. I earlier equated the predicted value with the reference value, but I don’t remember reading an FEP paper that talks about desired values. I imagine that Eva or someone could fill that hole in the argument (Eva because I seem to remember that she came to PCT from FEP).

When the perception that you consciously want to change is not of a situation you have encountered, you probably don’t have the actions to bring it to a reference value already reorganized into the hierarchy. Think of encountering chess for the first time. Every layout of the 32 pieces on the board makes as much sense as any other, except that the starting position looks a lot more regular than any other. To learn to play chess well, you have to learn to see some other layouts of the pieces as being significant, as being perceptions with values that need (or do not need) to be changed if you want to win the game. Initially, you have to work out “If I do this, then the resulting layout will be like that. Have I ever seen the result of that move? Is the resulting layout one I have seen to be promising or dangerous?” and so forth for doing “this” and for doing “that”.

The more perceptions of layouts you have seen, the less you have to think whether the possible move brings the reference value “I win” nearer to the perceptual value of the whole complex board layout. More and more small patterns become perceptions reorganized into the non-conscious hierarchy, and the more of those small patterns become parts of larger patterns involving more pieces. But even Grand Masters have to think when opposing equally skilled opponents (though possibly not when playing 32 amateurs while blindfolded).

In all this, chess is a world, a Universe of possibility in which the Free Energy Principle of considering the effects of actions in the current world situation is needed only in more and more complex “layouts” for which one has not yet learned the actions that would alter the controlled variable toward its reference value (winning in chess; becoming and remaining happy and healthy, perhaps, in life). As one learns more, more and more complex perceptions become included in the reorganized hierarchy, where one does not need to think about what to do.

This is very long, I know. But does it make sense to you? Does it answer your questions? Does it explain why I think FEP applies to conscious control, and is mathematically compatible with PCT in the reorganized hierarchy, though thinking about what to do under “these” circumstances is not compatible with the non-conscious reorganized hierarchy?


[Martin Taylor]

On 2021/04/30 9:35 AM, Warren Mansell wrote:
I found this really helpful:


And it led me to this:

"So control information is information that which can be reliably re-experienced and identified by the perceiver regardless of disturbances. To do so requires an input function that identifies this source of information, and a means to act against disturbances (e.g. movements, distortions, loss of data) that might render it unidentified. Otherwise the purpose or function of the information to the perceiver is redundant – it is just a meaningless pattern impinging on the senses.

“So control information is information that which can be reliably re-experienced and identified by the perceiver regardless of disturbances.”

So far so good. PCT would agree. But …

"To do so requires an input function that identifies this source of information, and a means to act against disturbances (e.g. movements, distortions, loss of data) that might render it unidentified. Otherwise the purpose or function of the information to the perceiver is redundant – it is just a meaningless pattern impinging on the senses. "

PCT would strongly disagree, for those perceptions whose control is in the hierarchy. All that is required is that some action pattern has been constructed that will move the perception in a consistent direction. The last sentence is unintelligible. What has the source to do with the ability to control a perception? What could the information available from a perception be redundant with?

The writer seems to believe that meaning to a perceiver is imposed from outside. PCT says that meaning depends on what one wants the world to be like as compared with how it is perceived to be. It is inside the observer/actor. Shannon pointed this out in his seminal Mathematical Theory of Communication, without involving what the receiver wants. What Shannon said (and this is true of Kolmogorov uncertainty as well) is that the information gained by a receiver from a message (observation) depends on what the receiver knows already that is augmented or changed by reception of the message (observation). It has nothing to do with the sender of the message, or the state of the world that is observed. What the observer/receiver gets out of the message (observation) depends on the fit between the decoding apparatus (perceptual functions) and the coding done by the sender (the actual construction of Real Reality, of which Perceptual Reality is a decoded version).

Anyway, I would say that abstracted paragraph has a valid message hidden in a mass of gobbledygook. The last sentence makes nonsense of what precedes it.


Warren Apr 30, 2021, 11:22 AM:

Hi Martin, you said “PCT would strongly disagree, for those perceptions whose control is in the hierarchy”
Sorry, I should explain, my point is about new variables for which there is not yet a learned input function… So my point is that to use reorganisation to develop a reliable, accurate input function to identify a variable, the system needs to be acting against disturbances so that a new trial-and-error candidate for the input function settles through reduction in error…

[Martin Taylor]

OK. That makes a bit more sense. That is where I suggested that the Free Energy Principle approach comes into play. Consciously, a bunch of different perceptions are, for some reason, taken together as meaning something with respect to what you already know or want. Together, those perceptions are of some function in Real Reality, which could be but is not yet joined to become a function in a White Box (perceived Object). At this point, the consciously perceived value of this concatenation of perceptual values differs from what you want it to be.

By manipulating one or more of the perceptions already controllable in the hierarchy, just as one would if the new function were already reorganized into the hierarchy, the result might be closer to what you want. If you can perceive states in the outer world, you could manipulate their avatars in imagination: plan a course of action, and if that seems as though it wouldn’t work, check out Plan B, Plan C, etc. non-destructively in imagination, using control of perceptions you already can control, or imagining simpler structures that might be part of the one you started with. “Peace in the World” might be a desired state too difficult to achieve with an imaginable plan that involves only perceptions you already can control.

The imagined Plan A, Plan B, … are predictions in the sense of FEP. If the perceptions to be controlled according to a plan do correspond to “the way the world is” (including your ability to control perceptions that the plan requires you to control), then if the plan produces the desired result in imagination, it may well do so when put into practise by controlling those controllable perceptions that form part of the as yet uncontrollable perceptual complex. That’s FEP “prediction”. If it doesn’t work in Real Reality, the reason might be perceptible. At least, something might have changed, including the uncertainty about the state of the world (CEV) that you are trying to control, given the state you want it to take on.

Though you don’t mention imagination, this may be what you are thinking of in your clarification when you say “my point is that to use reorganisation to develop a reliable, accurate input function to identify a variable, the system needs to be acting against disturbances so that a new trial-and-error candidate for the input function settles through reduction in error.”


Hi Bruce, what I can’t get my head around is Martin’s suggestion that controlling a perceptual variable in PCT is equivalent to reducing its uncertainty. Within a control unit, the value of the error can be specified to a high degree of certainty. And the control loop doesn’t reduce its uncertainty; it reduces its value. It is my impression that reducing uncertainty has more to do with fine-tuning the functions that calculate the values of the signals going round a control unit so that they are more reliable and create less variance in the signals. I do agree with Martin’s conclusion regarding prediction in FEP and how that is different from PCT.

Actually, that’s what I would like to know: what does the free energy principle (FEP) explain? The only thing that can appropriately be called a “specification” in PCT is the reference signal, a theoretical entity. Is FEP a theory that explains a theory?

I think that would be confusing for anyone, including the FEP theorists themselves. Prediction and specification are quite different concepts. A prediction is a guess (usually based on data) about what the future state of a variable might be; a specification is a statement of what the present state of a variable should be. A prediction can function as a specification, as in predictive models of control (e.g., Parker et al, 2021). But a specification never functions as a prediction; it is always a “must be”, never a “maybe”, always a demand, never a suggestion.

I have no idea because I don’t understand the question. I don’t know what you mean when you say that prediction research is “about whether a perceptual variable continues to be specified as the same perceptual variable, over time or despite occlusion or degradation of inputs”.

I am pretty concerned about absolute accuracy and it’s not only “functionally important” for the same perceptual variable to be consistently specified over time, the same perceptual variable IS consistently specified over time in PCT.

No need for the reorganization. I’ve already created this illusion in my baseball catch demo.

One of the great tragedies of PCT is that people develop a soft spot for it for all the wrong reasons. It’s easy to tell that this is happening when people fall in love with PCT but don’t abandon their existing theoretical crushes, such as FEP, information theory, reinforcement theory, etc.


If the features are controlled then they are controlled variables. But if the ML model is the Free Energy (FE) model and it is really comparable to PCT why not just go with PCT? I certainly will. The FE model just looks like a bunch of high falutin’ mathematical hand waving to me. I have no idea how to use it as an example of a predictive control model, let alone a model of any actual behavior of a real living system.

I’m with you on this one, Eetu! And thanks for the attached paper on the Free Energy model.

I think that’s not a good way to think about it. The reference signals in PCT can be thought of as predictions in the sense you mention. But, as I said earlier, a prediction is a statement of what might happen, hence all the Bayesian statistical math. The references in PCT are best thought of as specifications for what should happen – and will be made to happen by the control loop, hence the lack of statistical hand waving in PCT.

The best part of this is that you note that the predictions in FEP (of what it’s not clear) are being made “for no particular reason”. Why predict when control works just fine (and often better) without it?


RM: The reference signals in PCT can be thought of as predictions in the sense you mention. But, as I said earlier, a prediction is a statement of what might happen, hence all the Bayesian statistical math. The references in PCT are best thought of as specifications for what should happen – and will be made to happen by the control loop, hence the lack of statistical hand waving in PCT.

EP: I did not mean that a reference is a prediction. I agree with your thought, but I would put it even more sharply: a prediction is what (we assume) will happen, and a reference is what the agent wants to happen. To me it seems that both can be said to be specifications: you can specify predictions as well as references. But normally you can be wrong and surprised only about your predictions, not about your references. (Well, sometimes we can be surprised by our references, but that means rather that we were wrong in our predictions about our references.) This difference can also be expressed by saying that a prediction is like a descriptive statement and a reference like a prescriptive one.

Our references and predictions can stand in different relationships. Sometimes we want what we predict: on a sleepless night you predict, and want, that the sun will soon rise. (Bruce N used to call this control, but I can’t quite agree.) Or on a nice night of celebration you can predict, and not want, that the sun will soon rise. But the normal case, at least in PCT, is that you want to drink coffee but predict that you cannot do so unless you first make the coffee, and so you control your perception of coffee drinking by making it first. Here the (at least implicit) prediction is that you can make the coffee. And here you can encounter a surprise and find out that your prediction was wrong. Perhaps the coffee jar is empty, or you cannot use the complicated coffee machine. These kinds of surprises cause learning and reorganization.

RM: The best part of this is that you note that the predictions in FEP (of what it’s not clear) are being made “for no particular reason”. Why predict when control works just fine (and often better) without it?

Well, control may not always work fine. You cannot control when the sun rises and sets, or when the bus comes. But still, it can be very important to be able to predict those events reliably. I think it is this phenomenon of “the brain’s ability to make predictions” that FEP is mostly interested in explaining. Action and control are a smaller side issue for it. They think that predictions make it possible for us to act successfully in our environments, which is of course partly true, but we (or at least I) think of it more the other way round: action (= control of perceptions), both successful and unsuccessful, makes it possible (via reorganization) for us to make better predictions.

Good. I think it’s very important to understand that a reference signal in PCT is a specification for what the state of a perceptual variable should be.

As I said, a prediction can be used as a specification. But the prediction itself is not a specification.

In PCT, surprise is one type of emotion and all emotions are presumed to have the same basis – perception of the physiological consequences of error that produces no effective output on a controlled variable. What you call that perception – surprise, fright, love, etc – presumably depends on the situation in which the error is created.

In the free energy principle, “surprise” refers to a discrepancy between what is predicted and what happens. It’s a part of the theory; I think it’s the “free energy” we hear tell of.

On that note, I want to thank you again for the Friston paper “The free-energy principle: a unified brain theory?” I have read it now and when I get a chance I will try to do what Bill Powers couldn’t: write a review of the free energy principle. I’m afraid I can’t guarantee success but I’ll try. Or at least I’ll try to try.

Rick, you can do a lot better than this. We need a model that learns, through reorganisation of its parameters and functions, to do the flyball catching, and then we show that the data can be ‘explained’ as probabilistic prediction, whereas we know the data was generated by a reorganising PCT system.

I’m flattered that you think so. But what is the “this” that you think I can do better than?

What is the difference between a “probabilistic prediction” model and a “PCT reorganization” model? If the behavior produced by PCT reorganization can be explained as probabilistic prediction, then aren’t the models functionally the same?

It would be a just-so-story explanation as opposed to an explanation through a working functional model. Just like Skinner ‘explains’ learning through the reinforcement of action, whereas it works through reorganisation of control systems.

Could you tell me the probabilistic prediction “just so” story and how it differs from the PCT reorganization model.

And could you please answer my first question about what it is that I did that I could do much better than. I really don’t know what it was I said that led you to say this.

Hi Rick, I think we’d need Tauseef’s help, partly from his experience in computer vision and partly because he’s already got a simplified version of the flyball demo in one dimension of control - that wins at Breakout and Pong. Rather than hardcoding the perceived lateral position of the ball, the PCT controller could have a lateral array of receptors that register the ‘light’ from the ball. These receptors would converge on an input function that needs to learn to output ‘position’ as a signal…

I say we then run this demo, which would start inaccurate and get more accurate over time, show it to researchers who use predictive processing approaches, and ask them to explain it…
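A very rough sketch of the receptor-array idea, under heavy assumptions: the receptors have Gaussian tuning curves, the input function is an activation-weighted average of learnable weights, and "reorganisation" is reduced to random trial-and-error weight changes that are kept only when error drops. The error used here is the mismatch between the position signal and the true ball position, which is a stand-in for whatever intrinsic or control error a full PCT model would use. All names and values are hypothetical.

```python
import math
import random

def receptor_activations(ball_x, n_receptors, width=0.1):
    """Activation of each receptor in a lateral array spanning 0..1."""
    centres = [i / (n_receptors - 1) for i in range(n_receptors)]
    return [math.exp(-((ball_x - c) ** 2) / (2 * width ** 2)) for c in centres]

def perceived_position(weights, activations):
    """Input function: activation-weighted average of the weights."""
    weighted = sum(w * a for w, a in zip(weights, activations))
    return weighted / sum(activations)

def mean_squared_error(weights, positions):
    """Average squared mismatch between position signal and true position."""
    total = 0.0
    for x in positions:
        p = perceived_position(weights, receptor_activations(x, len(weights)))
        total += (p - x) ** 2
    return total / len(positions)

def reorganise(n_receptors=11, trials=1000, step=0.05, seed=0):
    """Random trial-and-error weight changes, kept only if error drops."""
    rng = random.Random(seed)
    positions = [i / 20 for i in range(21)]               # sample ball positions
    weights = [rng.random() for _ in range(n_receptors)]  # random starting weights
    initial_error = mean_squared_error(weights, positions)
    error = initial_error
    for _ in range(trials):
        candidate = [w + rng.gauss(0.0, step) for w in weights]
        candidate_error = mean_squared_error(candidate, positions)
        if candidate_error < error:                       # keep only improvements
            weights, error = candidate, candidate_error
    return initial_error, error, weights

if __name__ == "__main__":
    before, after, _ = reorganise()
    print("error before:", before, "error after:", after)
```

The position signal starts out inaccurate and improves over trials, with no probabilistic machinery anywhere in the model, which is the property the proposed demo would rely on.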

I guess I’ll never find out what it is that I did that you think I could do much better than. So I’ll just keep on doing what I’m doing, mediocre as it may seem to you.

Probabilistic prediction refers to making predictions about future events or outcomes in a way that incorporates uncertainty. Instead of providing a deterministic answer (e.g., “this will happen”), probabilistic predictions offer a probability distribution over possible outcomes. This means that, rather than a single prediction, you get a range of possible outcomes along with their associated probabilities.
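To illustrate the distinction, here is a minimal sketch that contrasts a single-number prediction with a probabilistic one. It assumes the outcomes are roughly normally distributed and uses made-up bus-delay observations; neither assumption comes from any FEP model.

```python
from statistics import NormalDist

def point_prediction(samples):
    """Deterministic answer: a single expected value."""
    return sum(samples) / len(samples)

def probabilistic_prediction(samples):
    """Probabilistic answer: a normal distribution fitted to the samples."""
    return NormalDist.from_samples(samples)

def probability_between(dist, low, high):
    """Probability that the outcome falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)

if __name__ == "__main__":
    delays = [2, 4, 3, 5, 6]  # hypothetical bus delays in minutes
    dist = probabilistic_prediction(delays)
    print("point prediction:", point_prediction(delays))
    print("predictive mean and spread:", dist.mean, dist.stdev)
    print("P(delay between 2 and 6 min):", probability_between(dist, 2, 6))
```

Note that neither form of prediction says what the delay should be; that is the job of a reference, and nothing in this sketch acts to bring the outcome toward one.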