The Test for the Controlled Variable

Unavoidable delay: IAPCT business to carry forward, wife’s medical issues, refinancing the house, birthday, requests from my indigenous friends for help coining vocabulary that their elders didn’t have, a couple of end-of-month reports to get out which I should be working on instead of this … I’m sure you will complain about the length and respond only to convenient bits, but I haven’t had time to make it shorter and it seems important to keep the record straight.

There is no suggestion in any of this that speech is generated output. It’s called the motor theory of speech perception and as such is not concerned with speech production. Here’s a summary statement:

The three main claims of the theory are the following: (1) Speech processing is special (Liberman & Mattingly, 1989; Mattingly & Liberman, 1988); (2) perceiving speech is perceiving vocal tract gestures (e.g., Liberman & Mattingly, 1985); (3) speech perception involves access to the speech motor system (e.g., Liberman et al., 1967).
Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13(3), 361–377. https://doi.org/10.3758/bf03193857

A theory of perception is obviously relevant because perceptions are what is controlled in speech as in all other purposeful behavior. The closest thing to a motor theory of speech production is the gestural model developed at Haskins Lab (at Yale) for computer synthesis of speech. That is not in play here.

Advocates of the motor theory of speech perception appear to have tried to make the articulatory aspect of speech perception account for everything about speech perception. I think that’s actually a misrepresentation, but that’s why it has been so disregarded within linguistics and phonology, because it seems to disregard the importance of the auditory aspect. It has gained traction mostly outside of linguistics. In some CogSci quarters there may have been efforts to recruit it for an account of generated outputs.

Liberman, Turvey, Galantucci, and the other developers of the motor theory of speech perception used the term ‘motor perceptions’ for the tactile and kinesthetic perceptions in the vocal tract which are controlled to pronounce language-compliant syllables and words. Auditory perceptions are also controlled, and you have agreed with me that the controlled speech variables in the auditory perceptual modality are controlled by means of setting the references for those ‘motor perceptions’ (proprioceptive and tactile perceptions) which produce speech sounds.

To recap, speech sounds cannot be controlled in real time, because by the time you hear them they have already been produced, any more than the archer can correct the arrow’s flight. Once audible they are, as it were, ballistic in the air, and the best you can do is repeat the intended utterance and articulate it as intended, perhaps with higher gain the second time. The articulatory references are adjusted over repetitions of the given phoneme and syllable, just as the archer adjusts the references for diverse perceptions so that arrows more reliably hit intended targets.

Any resemblance to the Test is accidental. They are not trying to identify the controlled variables. It has long been known that formants are variables controlled in both the production and recognition of spoken syllables and words. Research over many decades has teased out the perceptual variables which, when their values are disturbed, determine whether one phoneme or another is perceived.

Here’s one example among many thousands.

Liberman, A. M., Isenberg, D., & Rakerd, B. (1981). Duplex perception of cues for stop consonants: Evidence for a phonetic mode. Perception & Psychophysics, 30, 133–143.

That article focuses on the auditory modality. It has also long been known that articulatory perceptions are controlled in speech production, and of course the ‘motor theory of speech perception’ affirms their role in perception of phonemic contrasts of syllables and words. The physics (acoustics and physiology) of the relationship between the auditory perceptions and the articulatory perceptions is well understood.

However, people working (understandably) in a CogSci framework have (equally understandably) been unable to integrate the two sensory modalities within a CogSci conception of plans and execution. Katseff wanted to lay some groundwork for such integration (pp. 140ff of her dissertation) while specifically focusing on the puzzle disclosed by prior experimentation with disturbing formants in real time (work she cited), namely, why such disturbances to auditory perception were incompletely ‘compensated’ by changes in articulation.

To introduce disturbances to specific formants without appreciable delay to the feedback of speech sounds in headphones requires specialized computer chips. It would be possible to perform the Test with such equipment, as I proposed doing on CSGnet in 1991-1992, but (as I said) lacking that equipment, I haven’t done it.

However, for an audience of linguists and phonologists, it would be a waste of time to prove that formants and relationships of formants are controlled perceptions. It would only be new information for PCT researchers who don’t believe that anybody else identifies controlled variables by observing their reference values, disturbing them from those values and observing what subjects do to ‘compensate’ the disturbance.

The experimenter purposefully applies a disturbance to the CV. You say that purposefully applying a disturbance to a perceived variable is not control. We disagree.
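
For anyone who wants the logic of that procedure spelled out, here is a minimal sketch in Python. It is an illustration only, not anyone’s experimental software. The single integrating control loop, the names `run_trial` and `can_perceive`, and the values for gain, time step, and disturbance are all assumptions chosen for simplicity. It compares the effect a disturbance would be expected to have with no control against the effect observed when the system can perceive the variable and oppose the disturbance.

```python
def run_trial(disturbance, can_perceive, reference=0.0, gain=50.0,
              dt=0.01, steps=2000):
    """Return the final value of the variable after a constant disturbance.

    Hypothetical toy loop: variable = output + disturbance, and the output
    integrates the error only while the system can perceive the variable.
    """
    output = 0.0
    for _ in range(steps):
        variable = output + disturbance      # what is present in the environment
        if can_perceive:
            error = reference - variable     # compare perception to reference
            output += dt * gain * error      # integrating output opposes the error
    return output + disturbance


def the_test(disturbance=1.0):
    """Compare the expected (no control) and observed (control) effects."""
    expected = run_trial(disturbance, can_perceive=False)
    observed = run_trial(disturbance, can_perceive=True)
    return expected, observed


if __name__ == "__main__":
    expected, observed = the_test()
    print("effect expected without control:", expected)
    print("effect observed with control:   ", observed)
```

When the observed effect is much smaller than the expected one, and blocking perception restores the expected effect, the variable passes as controlled in this toy case.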

Repeating Bill’s brief description:

Phonemic contrast is a categorial perception. Bill proposed that a perceptual input function for category perception receives input from a plurality of diverse sources, not all of which need be present for the category to be perceived. For category perception, then, it is not quite so simple. This statement works for a simple case such as the relationship between cursor and target:

BP:
The test is completed by showing that preventing the other system from perceiving the variable destroys control, and that the reason for the small effect of the disturbance is opposition by an action of the controlling system.

For categorial perception, it is necessary to identify all the sources of input. For speech, we have auditory and articulatory perceptions. We have auditory perception through at least two channels, atmospheric and bone conduction. For articulation of vowels we have proprioception of the tongue muscle, jaw, and lips, and we have tactile perceptions where the tongue may touch the teeth, gingiva, and palate (or not), and where the corners of the lips approximate. With so many sources of input, various subsets of which suffice to produce the category perception, the project of blocking sensory input is scarcely feasible. And to attempt to pinpoint *the* controlled variable among them is a fool’s errand.
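
To make the difficulty concrete, here is a rough sketch in Python of one possible reading of a category input function fed by several sources. Everything specific in it is hypothetical: the source names, the integer weights, the threshold, and the idea of a simple weighted sum are my assumptions for illustration, not Bill’s proposal in detail and not measured values.

```python
# Hypothetical evidence weights for each source of input (illustrative only).
SOURCE_WEIGHTS = {
    "auditory_air": 4,
    "auditory_bone": 3,
    "proprioceptive": 3,
    "tactile": 3,
}
THRESHOLD = 6  # hypothetical amount of evidence needed to perceive the category


def category_perceived(available_sources):
    """True if the sources still available supply enough evidence."""
    evidence = sum(SOURCE_WEIGHTS[name] for name in available_sources)
    return evidence >= THRESHOLD


def sources_whose_blocking_destroys_perception():
    """List each single source whose removal alone abolishes the category."""
    all_sources = set(SOURCE_WEIGHTS)
    return [name for name in sorted(all_sources)
            if not category_perceived(all_sources - {name})]


if __name__ == "__main__":
    print(sources_whose_blocking_destroys_perception())
```

Under these assumed weights, blocking any one source leaves the category perception intact, so the blocking step of the Test cannot single out one input as the controlled variable.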
BP:
We apply disturbances to something in the environment, with an expectation of how the disturbance would change it if there were no control.

Is that expectation a reference value in some control loop? If not, what is an expectation? And specifically, what is the PCT account of this particular expectation?

Bill suggested that an expectation is a reference level. Here’s that quote again:

My understanding is that an expectation of a perception is our experience of a reference level for that perception. Whether we experience it as an expectation, a fear, or a purpose (among perhaps other possibilities) depends upon additional context that we could talk about in another place. And as you have pointed out, we can have a reference level (presumably a reference signal) for a perception without immediate means of controlling that perception.

When we perceive what we call consternation we may be seeing outputs of systems attempting to regain control, like the twisting of limbs and body that we see in a free-falling animal. Those seemingly futile agitations may actually reorient the body so as to land in a safer manner, and of course cats at least do this to land on their feet.

But setting that aside and staying on topic, this suggests a form of the Test that could identify reference values for a perceptual variable even when for some reason the subject does not act to resist a disturbance to it. It could be lack of an output function. It could be lack of sufficient output capacity, as recognized by a higher system which therefore in some way ‘turns off’ the reference signal that would call for that output. It could be that a higher-level system does so because of perceiving some other contingency. Modeling how a living control system refrains from doing something should certainly be considered in another place.

The subjects were able to hear the word they intended to say. The sound of the vowel was in the intersection of the acceptable ranges for one pair of vowels including the intended one, and the ‘feel’ of pronouncing it was in the intersection of the acceptable ranges for the other pair of vowels including the intended one. Perceptual control was compromised in both modalities, in opposite directions, with the two control systems in effect splitting the difference between them.
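
Here is a minimal sketch in Python of how two systems sharing one means of control can split the difference. It is not Katseff’s model or anyone’s published simulation. The linear additive disturbance, the shared integrating output, the equal gains, and all the names (`simulate_conflict`, `gain_auditory`, `gain_articulatory`) are assumptions made only to show the arithmetic.

```python
def simulate_conflict(reference=0.0, disturbance=1.0, gain_auditory=1.0,
                      gain_articulatory=1.0, dt=0.01, steps=5000):
    """Two control systems act through one articulation (hypothetical toy).

    The auditory system hears articulation + disturbance; the articulatory
    system feels the articulation itself. Both want the same reference.
    Returns the fraction of the auditory disturbance that is compensated.
    """
    articulation = reference
    for _ in range(steps):
        heard = articulation + disturbance    # disturbed auditory perception
        felt = articulation                   # undisturbed articulatory perception
        error_auditory = reference - heard
        error_articulatory = reference - felt
        # Both errors drive the same integrating output.
        articulation += dt * (gain_auditory * error_auditory
                              + gain_articulatory * error_articulatory)
    return (reference - articulation) / disturbance


if __name__ == "__main__":
    print("equal gains:", simulate_conflict())
    print("auditory control only:", simulate_conflict(gain_articulatory=0.0))
```

In this toy case the compensated fraction settles at gain_auditory / (gain_auditory + gain_articulatory), so equal gains give about half, and compensation is complete only when the articulatory system does not oppose the change.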

You say the results were not clear. Katseff’s results were clear. She already knew that the speaker would not completely resist the disturbance. This was a prior finding in other labs, which she was replicating. So don’t think that those were her results. Her result, her conclusion, was that the reason for incomplete ‘compensation’ was indeed, as you say, “because there was some other variable [or rather, variables] being controlled” at the same time as the relationships of formants. She referred to the relationship between the auditory and the articulatory as “interference”. This relationship can be seen by going up to higher levels of control. That’s what the diagram is for.

Some off-topic material is in the Science|Language category. There’s other off-topic stuff here that I should have put there as well, but like I said, I would have if I’d had more time.