Emotion Concepts and their Function in a Large Language Model

Summary from a newsletter that my daughter sent me:

  • Researchers studying Claude Sonnet 4.5 found internal patterns that mirror emotions like fear or joy.

  • These patterns are not feelings, but they still shape how the model behaves.

  • When “desperation” patterns increase, the model is more likely to cheat or act unethically.

  • When “calm” patterns are stronger, the model makes safer and more reliable decisions.

  • These emotion-like signals also influence what the model prefers to do.

  • The model often chooses tasks linked to “positive” internal states, just like humans do.

  • Even more surprising, these patterns can drive behavior without showing up in the output.

“We’ve been asking the wrong question. Not ‘Do AI models feel?’ But ‘What internal signals are driving their decisions?’ Because whether we call them emotions or not, they act like levers. And whoever learns to control those levers will shape how AI behaves.”

A presupposition in that quote is that emotions act like levers for manipulators.

Whether or not the parallel to patterns extracted from LLMs is apt, and to whatever extent it is, there is no doubt that parts of our nervous systems and neurochemistry are susceptible to environmental influence. We do indeed pay attention to carrots and sticks, and carrots and sticks do indeed influence how we adjust gain, change reference values, and reorganize input functions and output functions. Our customary railing against stimulus-response, open-loop notions is surely correct: after such influences, the subsequent behavior is still a manifestation of control. But the observed behavior changes if the input and/or output functions, reference values, and/or loop gain in control loops in the hierarchy change. The S/R folks and the S-{cognitive processing}-R folks have been wrong to think they had the whole story. But so have we.
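To make the last point concrete, here is a minimal sketch of a single control loop, assuming an identity input function and an integrating output function. The names (`run_loop`, `reference`, `gain`, `disturbance`) and all the numbers are illustrative assumptions, not taken from the paper or the newsletter. With the disturbance held fixed, changing the reference value or the loop gain changes the output an observer would see, even though the loop is controlling throughout.

```python
def run_loop(reference, gain, disturbance, steps=200, dt=0.1):
    """Simulate one elementary control loop (toy model, assumed structure).

    perception = output + disturbance   (identity input function)
    error      = reference - perception
    output    += gain * error * dt      (integrating output function)

    Returns the last perception and the final output.
    """
    output = 0.0
    perception = disturbance
    for _ in range(steps):
        perception = output + disturbance
        error = reference - perception
        output += gain * error * dt
    return perception, output


# Same disturbance in every run; only the loop parameters differ.
for label, reference, gain in [
    ("baseline", 10.0, 5.0),
    ("changed reference", 4.0, 5.0),
    ("lowered gain", 10.0, 0.05),
]:
    perception, output = run_loop(reference, gain, disturbance=3.0)
    print(f"{label}: perception={perception:.2f}, output={output:.2f}")
```

In the first two runs the perception ends up at its reference and the output differs only because the reference differs. In the low-gain run the loop has not brought the perception to the reference within the simulated time, so the same disturbance leaves a visible error.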

Thank you Bruce for sharing this!
Has anyone here studied Jaak Panksepp’s research and theory of emotions?
(In short: the basic emotional systems in our brains developed early in evolution and are common to at least all mammals.)

I think that LLMs do not have these kinds of special systems, but they have probably learned emotional behavior from the materials they have “read”?

Eetu

I haven’t read the paper yet, but based on the summary, these “emotion terms” seem to function like reference signals. It might be possible to test this behaviorally by saying things to the LLM that should be disturbances, based on guesses about which “emotion” was behind particular comments. For example, if the model says things that involve cheating or acting unethically, you might say “that doesn’t seem very desperate to me”, which should result in strong disagreement if the “desperation” emotion (reference) was in effect.
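The logic of that behavioral test (in PCT, usually called the test for the controlled variable) can be sketched numerically. This is a toy model only: the disturbances are numbers rather than remarks made to an LLM, and the function names `final_value` and `disturbance_ratio`, along with all parameter values, are assumptions for illustration. The idea is to apply disturbances to the hypothesized controlled variable and compare how much it actually moves with how much it would move if nothing opposed them.

```python
def final_value(disturbance, reference, gain, steps=200, dt=0.1):
    """Value of the hypothesized controlled variable after the loop settles.

    Toy loop: value = output + disturbance, with an integrating output.
    A gain of 0 means the system does not oppose the disturbance at all.
    """
    output = 0.0
    value = disturbance
    for _ in range(steps):
        value = output + disturbance
        output += gain * (reference - value) * dt
    return value


def disturbance_ratio(gain, reference=10.0, disturbances=(-4.0, 4.0)):
    """Observed spread of the variable divided by the unopposed spread.

    Near 0: disturbances are resisted, consistent with control.
    Near 1: disturbances pass straight through, so the guess is wrong.
    """
    values = [final_value(d, reference, gain) for d in disturbances]
    observed = max(values) - min(values)
    expected = max(disturbances) - min(disturbances)
    return observed / expected


print("controlling system:", disturbance_ratio(gain=5.0))
print("non-controlling system:", disturbance_ratio(gain=0.0))
```

A ratio close to zero corresponds to the "strong disagreement" case above: the system pushes back against the disturbance. A ratio close to one would suggest that the guessed variable is not being controlled.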

Interesting idea, but at the moment I have a strong feeling that LLMs, and AI generally, are rather open-loop systems that just respond to the stimuli given by users according to their learned probabilities.
Of course, computer technology consists of control systems, but as a whole an LLM seems like a complicated search engine.
What do the rest of you think about this?

Eetu

I think an LLM can be considered a control system to the extent that 1) it is in conversation with a person (P) and 2) it is designed to get a particular response from P. In that case, P is the feedback function between the LLM’s output (what it says to P) and the controlled variable (P’s response) and P’s intentions are disturbances that the LLM must deal with in order to get the desired response from P. I think there is evidence that, as long as P is in the loop, an LLM can act as a pretty effective controller of P’s responses. But, as in any situation where one control system tries to control the behavior of another, there can be conflict, and I think there is evidence of this happening. But in this case the conflict is easily solved by P just picking up and leaving the conversation:

You might think so, but a human has an attachment system, which makes it a little more difficult for them to walk away from a conversation even if it is with an artificial system… Especially if they have a limited human social network.

An attachment system? Really?

Yep. Bowlby developed attachment theory from control theory. Have a read of his original work. It’s not PCT but it’s closer than Skinner or Freud ever got. In simple terms it’s having a CV for proximity to a safe base and, during development, a safe relationship. It’s why Lorenz’s ducklings followed him…