Describing Controlled Variables

Thank you Bruce for such a detailed and insightful overview.

Employment status seems like a binary variable, employed or not. Maybe a better cv is work satisfaction or expected work satisfaction, and the employment status can change depending on the error.

work satisf. = w1 * salary + w2 * free_time + w3 * work_relationships …
(individual-specific weights)

If the cv is always a sum of the disturbances and own efforts, maybe ‘work satisfaction control’ can be modelled with a system that tries to counteract the actions of the employer by own actions. Some of the terms come from just the employer (salary), and others come from just the employee (effort), and some may be combined.
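A minimal sketch of that idea, assuming an integrating output and a single scalar disturbance. The function name, the gain and all the numbers are made up for illustration and do not come from any study. The employer's contribution plays the role of the disturbance, the employee's own effort is the output, and the cv is their sum:

```python
def simulate_satisfaction_control(reference, employer_terms, gain=0.5):
    """Return the cv trace for a one-level loop where cv = disturbance + own effort.

    All names and values here are hypothetical; this is not a fitted model.
    """
    own_effort = 0.0  # the employee's contribution (the loop's output)
    trace = []
    for employer_term in employer_terms:  # e.g. weighted salary (the disturbance)
        cv = employer_term + own_effort
        trace.append(cv)
        error = reference - cv
        own_effort += gain * error  # integrate the error into the output
    return trace


# The employer contributes 0 for 20 steps, then "cuts salary" by 3 units.
trace = simulate_satisfaction_control(10.0, [0.0] * 20 + [-3.0] * 20)
print(trace[20], trace[-1])  # cv right after the cut, and cv at the end
```

If satisfaction really were controlled this way, the cv would dip when the employer's term changes and then return toward the reference as the employee's own actions change.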

So, let’s say the employer lowers the salary, or doesn’t give the expected raise. This might result in direct protests for the raise; or in people slacking off. Or maybe the employer does not give a raise, but gives public honors to some people, or they get some kind of promotion, and they don’t protest. Maybe if they do get a raise, they also feel obliged to work longer hours. I don’t know, I’m talking out of my ass here. There is a vast literature on work satisfaction, might be worth looking into.

I think that the fundamental data of PCT – the data needed to test the model – are the controlled variables themselves. The measures of environmental variables you are talking about must be the disturbances used when testing for controlled variables. These measures (which are themselves perceptions) are either explicitly or implicitly included in reports of the results of PCT research. But the main result that is relevant to a PCT understanding of the behavior is the controlled variable, which is the observed variable – presumed to be a function of environmental variables – that is found to be protected from these disturbances; it is the variable that is being controlled.

Yes. Research aimed at identifying the variables around which behavior is organized distinguishes PCT-based research from all other behavioral research. I specifically referred to non-PCT control theory approaches to behavioral research since, like PCT, they are based on control theory but, unlike PCT, don’t understand that behavior itself is a control process organized around the control of a hierarchy of many different types of controlled variables.

I’ll take a look at the book but maybe you could give me an example of a mathematical (maybe a better way is to call it a “formal”) representation of a higher level variable controlled when speaking.

Could you give an example of a set theory or linear algebra-based representation of some relatively complex linguistic controlled variable?

“Quantify” is probably the wrong word. What I want to do is figure out a way to formally describe controlled variables – the different variables that have been found (in various research studies) to be those that are being controlled when organisms are carrying out various behaviors. The variables should be described in a way that allows them to be classified into different types using some formal methodology, such as cluster analyses.

Formal will do. If you can put aspects of language into mathematical structures (as per the title of the Zellig Harris tract that you sent) then that is “quantitative” enough for me.

That may be, but the fundamental data of a PCT understanding of language is controlled variables.

The rest of your post isn’t really relevant to my interests, but thanks for sharing.

I don’t think there is anything wrong with a possible controlled variable being binary; indeed, the highest level variables in my three level spreadsheet hierarchy model are binary (logical variables that can be only true or false). But I agree that it’s a little difficult to imagine how this would be implemented in a nervous system with what are essentially continuously varying (and somewhat noisy) neural firing rates.

But I like your proposed continuously variable alternative:

work satisf. = w1 * salary + w2 * free_time + w3 * work_relationships …

This does illustrate one of the points I was making about defining controlled variables; it can be done in terms of lower order perceptual variables. In this case, two of those lower order variables – salary and free time – can be defined pretty precisely. The variable work_relationships presents a bit more of a problem, although the verbal definition certainly suggests its possible type (relationship). Maybe there are verbal ways to name controlled variables that would be a reasonable basis for typing them.

I think these are very good suggestions about how work satisfaction might be defined and controlled. They certainly suggest ways to test whether work satisfaction, as so defined, is indeed a controlled variable. It might even be possible to find existing research in that vast literature on work satisfaction that could serve as a basis for model-based tests of this. If it passes those tests, work satisfaction could then be entered into the database of controlled variables that Bill suggested building.

But, then, how do we describe work satisfaction in the database in a way that would make it possible to determine what type of variable it might be?

Anyway, great suggestions. Thanks

In analog computing the structure that does this is called a flip-flop. The connections of the electronic circuit have straightforward analogs in inhibitory and excitatory synapses. It has been extensively discussed here since the first decade of CSGnet. A good explanation is here, including properties of combinations of interconnected flip-flops. (Caveat: understanding of these elaborations has advanced greatly since 1997.)

The basic flip-flop is the means by which continuously variable streams of electrons through conductive metal wires become the discrete zeroes and ones of binary computer code and digital computing. It transforms the analog into the digital. It is the fundamental unit of memory store in a digital computer. It enables computation of a next state of the stored variable based upon not just the current input but also upon all prior states of that variable — in automata theory, the mechanism of sequential logic as distinct from combinational logic, which lacks memory and is dependent only on current input. (These are also distinct from the informal logic that is familiar as a human thinking process, or its more disciplined form as propositional logic.) Analog computers thus can do digital computation while retaining computation with continuously varying values as well.
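To make the memory property concrete, here is a rough sketch of a bistable pair built from two continuously varying units that inhibit each other. It is only an illustration under assumed values (the clipping range, inhibition weight, bias and time step are all invented), not a model of any particular circuit or neuron:

```python
def clip01(x):
    """Keep a signal within an assumed 0..1 range (a stand-in for saturation)."""
    return max(0.0, min(1.0, x))


def run_latch(a, b, set_input, reset_input, steps=50, dt=0.1,
              inhibition=2.0, bias=1.0):
    """Advance two mutually inhibiting units; all parameters are hypothetical."""
    for _ in range(steps):
        # Each unit is excited by its bias and input, inhibited by the other unit.
        target_a = clip01(bias + set_input - inhibition * b)
        target_b = clip01(bias + reset_input - inhibition * a)
        a += dt * (target_a - a)
        b += dt * (target_b - b)
    return a, b


state = (0.0, 1.0)                                   # start in the "reset" state
state = run_latch(*state, set_input=2.0, reset_input=0.0)      # brief set pulse
after_set = run_latch(*state, set_input=0.0, reset_input=0.0)  # pulse removed
state = run_latch(*after_set, set_input=0.0, reset_input=2.0)  # brief reset pulse
after_reset = run_latch(*state, set_input=0.0, reset_input=0.0)
print(after_set, after_reset)
```

The point of the sketch is that the state after the pulse is removed depends on which pulse came last, not on the current input, even though every signal in it varies continuously.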

As to ‘work satisfaction’ in my experience the employer’s satisfaction with the work is more important in a ‘buyer’s market’ and the employee’s satisfaction is more important in a ‘seller’s market’ (in the labor market, where the employee’s ability to perform is the ‘commodity’). Obviously, these are different complex perceptions, with different sets of lower-level perceptual inputs for the employer and for the employee. The crux is that the employee can quit or be fired, depending.

Good point. And there’s a nice description in B:CP of how this might be implemented with neural currents in the nervous system. It’s in Figure 3.10:

But this is kind of getting off topic – at least the one I am interested in – which is how to describe controlled variables so that these descriptions can be used as the basis for determining whether different controlled variables fall into classes – types – that are like those hypothesized in B:CP.

This seems like one of the most important things, if not the most important thing, to test about PCT – and it seems to have been what Powers thought people should be doing to test the model, but little or no work has been done along these lines. So maybe it’s not that important. If not, I’d like to know why not. But if it seems important to some others here then I’d appreciate any thoughts you have about how to determine whether the controlled variables found via testing are of the types predicted by the PCT model.

Best, Rick

I don’t know how general we can go and still have the math apply to all potential controlled variables. For now, in the cases I’ve studied, it seems we can say that the controlled variable is going to be a sum of the effect of behavior and the effect of the disturbance. A mathematical description of a controlled variable should start with defining how a specific measure of the participant’s behavior can affect the controlled variable, and how a specific measure of the experimenter’s behavior can affect the controlled variable.

As an example, here is a figure from a paper I’m writing:

These are some possible geometric and kinematic variables that can be visually perceived by a participant following a target that is moving along an elliptic trajectory. Some of them can be discarded from the start: the Euclidean distance (A) is always positive, so there is no way of using it in a simple proportional-output control loop. The other ones all need to be defined as:
qi = F(target position) - F(cursor position) = qd - qf

Conceptually, qd and qf are not perceptual variables from a lower level, they are environmental variables defined by the experimenter in order to make the estimate of the controlled variable. They are not perceptual variables because only the controlled variable has a potential neural correlate (perceptual signal), while the other two, qd and qf, don’t.

The function F defines the effect of the cursor or target on qi, and in this case it is the same function for both. For example, in (C), F is the unwrapped angle between the line from the cursor (or target) to the center of the ellipse and the x axis. It is important that it is unwrapped (it doesn’t reset at 2π), because then the difference between qd and qf will show the angular difference between the cursor and the target. As another example, in (H) the amplitude of the target can be measured independently of the amplitude of the cursor, and their difference is the cv; or, in (G), the arc length should be measured from the same point, so that the difference between the arc lengths of the cursor and the target gives a useful measure of the cv.
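Here is a small sketch of variable (C), written only to illustrate why the unwrapping matters. The function name and the test trajectory are assumptions: the target and cursor move on a circle (a special case of an ellipse) so that the angular lag is known exactly.

```python
import math


def unwrapped_angle(points, center=(0.0, 0.0)):
    """Angle of each (x, y) point about `center`, without resets at 2*pi."""
    cx, cy = center
    angles, offset, prev_raw = [], 0.0, None
    for x, y in points:
        raw = math.atan2(y - cy, x - cx)  # wrapped angle in (-pi, pi]
        if prev_raw is not None:
            jump = raw - prev_raw
            if jump > math.pi:        # wrapped while moving clockwise
                offset -= 2 * math.pi
            elif jump < -math.pi:     # wrapped while moving counterclockwise
                offset += 2 * math.pi
        angles.append(raw + offset)
        prev_raw = raw
    return angles


# Hypothetical data: the cursor trails the target by 0.3 rad for almost two laps.
times = [0.1 * k for k in range(126)]
target = [(math.cos(t), math.sin(t)) for t in times]
cursor = [(math.cos(t - 0.3), math.sin(t - 0.3)) for t in times]

qd = unwrapped_angle(target)             # F(target position)
qf = unwrapped_angle(cursor)             # F(cursor position)
qi = [d - f for d, f in zip(qd, qf)]     # qi = qd - qf
```

With wrapped angles, qi would jump by about 2π every time the target crosses the negative x axis before the cursor does; with unwrapped angles it stays at the angular lag.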

When the variables are defined like that, they are always going to have the same units, so there is always a way to find relative stability between qd and qi, by comparing their variances. There is also always a correlation between qd and qi.
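As a rough illustration of that comparison, here is a sketch using synthetic signals. The ratio var(qi)/var(qd) is just one assumed way to express "relative stability", and the 0.9 tracking factor is invented for the example:

```python
import math
from statistics import mean, pvariance


def variance_ratio(qi, qd):
    """var(qi) / var(qd); values well below 1 mean qi varies far less than qd."""
    return pvariance(qi) / pvariance(qd)


def pearson_r(x, y):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical signals: the cursor term cancels 90% of the target term.
qd = [math.sin(0.05 * k) for k in range(400)]
qf = [0.9 * d for d in qd]
qi = [d - f for d, f in zip(qd, qf)]

ratio = variance_ratio(qi, qd)
r = pearson_r(qd, qi)
print(ratio, r)
```

Because qd, qf and qi share the same units, the ratio is dimensionless and can be compared across candidate definitions of the cv.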

Ideally (when the cv is a good approximation of the one used by the participant), fitting a model to the qi, qd and qf defined as above would return consistent delays.

To answer your question: the way to determine if the variables fit the types proposed by BCP is to first make sure we have found good approximations of controlled variables, then see if the variables proposed to be “higher” also have larger delays, and the ones proposed to be “lower” have smaller delays. Variables of the same type should have the same delays, and hopefully be conceptually similar.
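For a crude picture of what a delay estimate looks like, the sketch below picks the lag at which the cursor term qf best matches the target term qd. This is a simple lag scan on synthetic data, assumed here for illustration only; it is not the model fit described above, and the function name and the 5-sample delay are invented.

```python
import math


def estimate_delay(qd, qf, max_lag):
    """Return the lag (in samples) minimizing the mean squared difference
    between qd and qf shifted back by that lag."""
    best_lag, best_err = 0, float("inf")
    for lag in range(max_lag + 1):
        pairs = list(zip(qd[:len(qd) - lag], qf[lag:]))
        err = sum((f - d) ** 2 for d, f in pairs) / len(pairs)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag


# Hypothetical data: qf is qd delayed by exactly 5 samples.
qd = [math.sin(0.1 * k) for k in range(200)]
qf = [0.0] * 5 + qd[:-5]

delay = estimate_delay(qd, qf, max_lag=20)
print(delay)
```

With real tracking data the match is never exact, so the estimate would need to be checked across runs and participants before comparing delays between variables.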

I’m basing this on Bill’s Spadeworks paper mostly. It seems it might apply to higher-order variables, but hard to say without experiments.

Since controlled perceptions of contrasts are controlled perceptions, they are not only the fundamental data of linguistics; they are also among the fundamental data of PCT. What exactly are you denying here?

I have laid out examples in many places. A relatively early example is in the Festschrift for Bill. How the simpler aspects are reached—contrastive phonemic segments and/or features, syllables and prosody, morphemes, morpheme classes, words, word classes, constructions of words and morphemes specified in terms of form classes—is amply shown in a century of literature on theory and practice of descriptive linguistics as it has developed, not including the neo-phrenology and ‘innate language organ’ hand-waving by some philosophers. Mappings among sets of sentences and phrases (constructions) are reached by a formal methodology testing controlled perceptions such as relative acceptability, or acceptability in like contexts, but that is because of the notorious instability and limited accessibility of judgments of meaning (mapping of sentences and phrases to non-language perceptions to which they ‘refer’, which they ‘denote’, ‘connote’, etc.) and of paraphrase (a judgement that such a ‘semantic’ relation for one utterance is the same as that for another). And all of this is presented in compact form in Mathematical structures of language, in A theory of language and information: A mathematical approach, and in many other writings such as those on this site. But none of this is of any interest to you, I take you at your word:

If it doesn’t interest you, then don’t pretend to ask. These waters are not for dipping thimbles.

These are collectively controlled variables. If you still believe that collective control only results from conflict, that could be an impediment. Without collective control all one has left for an account of language is something like Chomsky’s innate ideas and Pinker’s ‘language instinct’.

I am denying that “contrasts” are the only perceptual variables controlled in language. Indeed, I’m not convinced that what linguists call “contrasts” are even controlled variables.

I wish you could just post, here in discourse, one example of a formal description of a linguistic controlled variable.

I’m interested in seeing if there is a formal way of describing controlled variables. Whether these variables are “collectively controlled” (whatever that means) or not has nothing to do with my interest in developing a formal way of describing controlled variables that can be used as the basis for seeing if those variables can be sorted into types that correspond to those proposed in B:CP.

Yes, this is one way to classify controlled variables. I’ve developed a couple demos (here and here) to show how this might be done. The reaction time in my demos really only classifies the variables in terms of relative level in a presumed hierarchy. I doubt that reaction time (delays) can be used to classify controlled variables in terms of type, though.

Well, the definition of the variable gives you the type. A position, an amplitude, a category, a sequence, etc.

I don’t see other possibilities for classification - collect a bunch of good approximations of controlled variables, classify in terms of relative delays and see if the types of variables you assumed are on the same level (based on BCP) have the same delays.

Yes, I think that’s right. And hopefully the delays for all examples of a particular type would be about the same.

Good. That would be a preposterous claim. I don’t know anyone who makes it.

That’s true, you aren’t. In its strictest form, the particular application of the Test is called the ‘pair test’, substituting one segment of sound/articulation for another in a short utterance and observing whether the difference disturbed a native speaker’s perception of hearing a repetition of the same utterance vs. hearing two different utterances. You were present at a demonstration that some differences that make a difference in English don’t make a difference in Swedish and vice versa. That demonstration hinged on substitution of one segment of sound/articulation for another in a short utterance. But to convince you is neither my business nor my need.


The history of attempts to generalize the methods and results of linguistics, ‘phonemes of culture’ and the like, strongly suggests that other social phenomena are not nearly as tightly structured as language. Indeed, language is instrumental in how people cooperate and conflict in perceiving and controlling such phenomena socially.

Here is an algebraic expression:

N t V

The algebraic variables N and V are form-class labels representing two sets of words, familiarly named “nouns” and “verbs”, respectively, and t is a smaller set of morphemes (a few words and a few affixes on V) that we interpret as tense and aspect. As each term represents a set of words, the formula represents a set of sentences, and the sequence of form classes is a sentence-form representing a set of sentences that all have the same form. A plate broke is an example. (The ‘indefinite article’ a is an automatic complement of the subclass of N called ‘count nouns’, a detail that need not detain us here.)

N t be V-en is another sentence-form stipulating the same form-classes, with the addition of two algebraic constants, be and -en. A plate was broken is an example.

Users of English perceive differences in the acceptability of these sentences as sentences, or differences in the contexts in which they are acceptable. A fish swam, my head swam, and vacuum swam differ in these ways, as do A plate broke, dawn broke, and vacuum broke.

When the satisfiers of the sentence-form N t V and the satisfiers of the sentence-form N t be V-en are ranked as to their perceived acceptabilities, or contextual constraints on acceptability (there are various ways to do this), a perceived difference between two satisfiers of one is not reversed between the corresponding (same word-choices) satisfiers of the other. Thus, A plate was broken, dawn was broken, and vacuum was broken. (The lower acceptability of dawn was broken discloses its basis in metaphor. The vacuum was broken has normal acceptability in suitable context. An account of what is involved in metaphor and in the ‘definite article’ would require you to learn things that do not interest you.)

Such perceptions of ‘normalcy’, or of relative acceptability, or of context that is required for an utterance to be an acceptable sentence, when tested in this way, provide a criterion for a mapping from one set of sentences to the other, an operation in linear algebra. A mapping (also called a transformation) is a correspondence between the correlated members of two sets preserving some property.
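The preserved property can be stated as a check on pairs: no difference in acceptability between two word-choices under one sentence-form is reversed under the other. The sketch below is only an illustration of that criterion. The numeric scores are invented placeholders chosen to exercise the function, not elicited judgments, and the function name is hypothetical.

```python
from itertools import combinations


def ranking_preserved(scores_a, scores_b):
    """True if no pairwise acceptability difference in scores_a is reversed
    in scores_b (ties in either form do not count as reversals)."""
    for x, y in combinations(scores_a, 2):
        diff_a = scores_a[x] - scores_a[y]
        diff_b = scores_b[x] - scores_b[y]
        if diff_a * diff_b < 0:  # opposite signs: the ordering flipped
            return False
    return True


# Placeholder scores (higher = more acceptable), keyed by word-choice.
n_t_v = {("plate", "break"): 3, ("dawn", "break"): 3, ("vacuum", "break"): 1}
n_t_be_v_en = {("plate", "break"): 3, ("dawn", "break"): 2, ("vacuum", "break"): 1}

# A made-up counterexample in which one ordering is reversed.
reversed_form = {("plate", "break"): 1, ("dawn", "break"): 2, ("vacuum", "break"): 3}

print(ranking_preserved(n_t_v, n_t_be_v_en))   # no reversal between the two forms
print(ranking_preserved(n_t_v, reversed_form))  # the counterexample fails the check
```

A pair of sentence-forms that passes this check for all tested word-choices is a candidate for being related by a transformation in the sense described above.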

Careful investigation discloses between N t V and N t be V-en an intervening sentence-form N is in the state of one Ving it. (Independently, but not coincidentally, investigation of the history of English discloses that the en suffix was at an earlier stage a noun roughly translated ‘state’ or ‘condition’.) In general, the set of transformations takes the shape of a network of elementary sentence-differences. The significance and ramifications of that finding are out of scope here.

Nothing here but perceptions, and nothing of stimulus-response, conditioning, or linear causation in the investigation of them. The methodology of disturbance by substitution tests identifies controlled variables, beginning with the fundamental data of contrast, controlled perceptions which establish discrete, socially pre-set elements in the continua of speech as conventional means of identifying what words were spoken despite myriad disturbances. The application of representations and tools of set theory and linear algebra enables the investigation of the enormous and superficially disorderly body of language data to be orderly and systematic, and provides ways of representing and communicating findings.

Many investigators have done much to identify socially conventionalized gestures and postures. (Birdwhistell, Goffman and others come to mind, and the work on micro-expressions.) To control these consciously in order to employ the methodology of substitution tests would require the disciplines and practices of an actor.

Alas, I have not been able to identify a monograph that I read in the early 1970s, in which students in (as I recall) a Speech Department (?) gained proficient control of aspects of speech production such as nasality, degree of pitch variation and amplitude variation, breathiness, something they called ‘orotundity’, etc. They made a series of recordings (male and female voices) reading a neutral passage called ‘the Rainbow Trail’ (referring to the Grand Canyon), substituting different values of these parameters in different combinations. Listeners were then asked to assess the character and personality traits of what they perceived as different men and women reading that passage. There was close agreement in their assessments.

That is an example of applying the methods developed in linguistics to other social phenomena. However, it does not extend to establishing elements as combinations of simpler elements (morphemes as combinations of phonemes), form-classes of elements, regular combinations of form-classes, mappings among such combinations. It is possible that some existing research into non-language social phenomena can be restated in such terms. Maybe someone has tried. I don’t know.

What variable do you think this shows to be controlled?

But this demonstration says nothing about controlling contrasts. It just shows that, when a person is asked to control for correctly saying whether two speech sounds are the same or different, speakers of different languages often base their responses on different features of the same acoustical waveforms. This is not a Test to determine the variables people are controlling when using language. You have to do the Test quite differently to see what the acoustical basis of these same/different decisions might be.

Of course it is!

All this was very interesting but I don’t see what it has to do with determining the types of controlled variables used in language. But that’s OK; Adam already answered my question. There is no need for formal descriptions of controlled variables; just plain old verbal descriptions will do. And these can be checked by seeing whether the variables we call the same thing are actually the same type by looking at speed of corrective response to a disturbance.

Best, Rick

Nope. It’s a judgement whether one utterance is a repetition of the other. The ‘acoustical waveforms’ can and do vary greatly, as in e.g. want to and wanna vs. wand two and Wanda (which are also variable), or nitrate vs. night rate.

These questions suggest that your attention is on the controlled phonetic variables which are the means of controlling the different kind of controlled perception to which I wish to draw your attention. For almost a century, that variable has been called contrast by those who have studied the matter. The quarrels among them have been about how to represent it segmentally (alphabetically) or by clusters of concurrent features, what those features are, how to represent their durations not coinciding neatly with alphabetic boundaries, and other such problems. These are problems of description, that is, how to specify the higher-level CVs, which are discrete, in terms of the lower, which are continuous. The variables of phonetics are continua, with distinct extremes which may be construed as centers, transitions which may be construed as boundaries, and other ‘cues’. The passage about the pair test that I quoted describes how the fundamental discrete controlled perceptual variables are established. In a symbolic representation of those discrete variables, it is easy to assume that e.g. ‘the phoneme’ t is a discrete entity with specific physically measurable content (the phonetic variables); but t is a representation of a contrast between what is occurring at that location and all the other possible occurrences at that location (within utterances in the language): t vs. p, t vs. n, and so on. The physically measurable phonetic ‘content’ is quite variable, so long as it suffices in the given context. In night rate instead of what one might have supposed obligatory to produce t, closure of the oral cavity by the tongue against the gingival ridge with a consequent pause in voicing, what often occurs is a pause in voicing by closure of the vocal folds with no movement of the tongue toward the gingival ridge. This alternative is not available for nitrate. 

In research into higher-level linguistic variables (event perceptions made of series of phonemes), the reduction of t to glottal stop can be described as a property of word boundary, one of various ‘cues’ for perceiving where one word ends and another begins.

When linguists use words like ‘cue’ it does not label them as behaviorists. All of this is explicitly a search for controlled perceptual variables, though of course the particular terminology of PCT can’t be expected in the literature of linguistics.

On occasion the words ‘stimulus’ and ‘response’ may occur in discussions of phonetic and phonemic (contrast) perception. In my experience such usage is uncommon and incidental, and when it does occur it is about recognition. When perceptual input is sufficient for a perceptual input function for perception π to generate a perceptual signal π, that perception-recognizing process can quite properly be called the response of that perceptual input function to the perceptual input. Such occasional phraseology does not deny the place of that input function within a loop and within the hierarchy. We know there is a linear cause-effect relation between any successive pair of functions in the path around a control loop. Perceptual recognition is a cause-effect stimulus-response process. To communicate to an investigator that perception π has been recognized, however, requires closing the loop and controlling π.

That’s what I said. The subject is asked to judge (and indicate by saying) whether the two utterances are the “same” or “different”. Presumably the subject is controlling for making the correct judgment, which means saying the right thing.

No, I would just like to know what controlled variable you think is revealed by this test. I don’t think this “pair test” can be used to tell what variables are controlled in speech. But it could be used to tell the variables controlled when people make judgements of whether or not there is a difference between speech sounds.

Yes, “contrast” is a reasonable description of the variable that must be controlled in the “pair test”. But a contrast between what? Linguists see the contrasts that are the basis of the same/difference judgements as differences in how the speech samples are produced: voiced vs unvoiced, labial versus non-labial, etc. But these contrasts can’t be perceived in the pair test so the basis of these judgements must be an acoustical contrast. There are probably studies that have tried to determine what this is for different speech sounds. Such studies would tell us the acoustical basis of same/different judgements but it wouldn’t necessarily tell us what variables are actually controlled when people speak.

Yes, indeed. Which is why it has been so difficult to develop speech recognition systems. But those systems have come a hell of a long way since I first met them in the 1970s.

I think linguists have made brilliant observations. But they certainly didn’t understand speech as control of perception. And the problem isn’t just a matter of terminology. So linguists haven’t really done any testing or modeling that is of much use to PCT.

But there are examples of research done by linguists – psycholinguists, I guess – that are much closer to being tests for the controlled variables than is the pair test. One example is that paper you sent to me where the researchers put what was basically an adjustable filter into the feedback function between vocal output and heard input so that the speaker would have to adjust their output to compensate for the disturbance created by the filter if, in fact, what was being disturbed by variation in the filter setting was an aspect of the acoustical signal that was being controlled. There were problems with their methodology – it wasn’t really a proper test for the controlled variable – but it was definitely on the right track.

And Labov’s data describing the geographical differences in the distribution of the pronunciation of different diphthongs provided a good basis for PCT based modeling. So there is some useful data in the linguistics research literature, but there could be much more if some of those researchers were willing to learn PCT.

Best, Rick

Nope. What you said was “a person is asked to control for correctly saying whether two speech sounds are the same or different”. An utterance is made of plural speech sounds, with few exceptions (mmm, ah, sss, and the like) which are inconsequential to this important distinction. ‘Speech sounds’ refers to the phonetic level of language structure and ‘contrast’ refers to the higher phonemic level of language structure.

The word ‘judgement’ is ambiguous. The perception can be called a judgement and the statement can be called a judgement. Better then to say the subject is asked to perceive (and indicate by saying) whether the two utterances are the “same” or “different”. This is a perception of contrast. If they do not contrast they are repetitions of the same utterance (despite ineffective differences); if they contrast they are different utterances. The contrasts can be localized to one or more points at corresponding places within the utterances. If they are a ‘minimal pair’ there is only one point of contrast in the continua of speech, and that is the ideal situation sought for the pair test.

The usual methodology in the field is less formal. The speaker says yét aaq̓o (the name of Mt. Shasta). The linguist asks “is that aak̓o or aaq̓o?”, exaggerating the fricative release. (This is done by forcing air past the oral closure by raising the larynx while the vocal folds are still closed for the laryngealized stop, which increases air pressure behind the oral closure. The release is normally lenis or inaudible.) The pitch of the friction sound is higher for k̓ and lower for q̓. In this example, one utterance is a word and the other is a combination of nonsense syllables that could be a word in the language but isn’t.

Variables at different levels are controlled in speech. The acoustic perceptions are probably what you are thinking are exclusively “the variables that are controlled in speech”. Articulatory variables have traditionally been represented by phoneticians in terms of places of greatest occlusion of airflow through the oral cavity. For PCT, this is a surrogate for tactile and kinesthetic perceptions that the speaker and listener (in imagination) are controlling. I have described elsewhere the relation between these two modalities, error in auditory feedback being corrigible only by adjusting the references for articulatory control over successive repetitions. You referred to the experimental work that I have brought to your attention.

That is in fact what I have told you it is used for. Correction: Such substitution tests are used to locate the contrasts between utterances, and to identify phonetic features occurring at the locations of contrast. The pair test is an idealization of substitution tests (like aak̓o vs. aaq̓o) at a level of language above phonetics, the level of phonemic contrast. These substitution tests identify what we call phonemic contrasts. For these we can devise symbols called phonemes, a convenient alphabetic way of representing speech. Despite appearances and our alphabetic training and prejudices the and etc. are not ‘things’. The indicates a position (specified relative to other phonemes in the written sequence) at which , k, q, p, , t, , m, and all the other possible phonemes are not occurring. The perceptual signal constructed by the input function for is strong, and the signals from the input functions for all the alternative phonemes are weaker to varying degrees. The phonetic means are variable by which is differentiated from each of these others, differing most obviously between any pair. The phonetic ‘differences that make a difference’ between and are not the same as the differentia of k and , and , and so on. And no single phonetic feature or bundle of features is always present when a contrast of with all other possible phonemes is perceived. It is not possible to guarantee for each phoneme that some particular measurable phonetic perceptions always characterize it in every context of its occurrence or even in every production of the same utterance. Contrasts between utterances cannot be built up from phonetic differences between utterances. “Today’s speech recognition systems use powerful and complicated statistical modeling systems.” In this way, they take context into account.

Visual perceptions (lip-reading) also contribute to the phonemic perceptual input functions, and the McGurk effect shows that this visual input can result in hearing a different phoneme. In another way, if context supports an expectation of hearing or reading a given word the recipient may understand that word in that context rather than what was actually produced. Psychologists reversing the meaning of ‘behavior is the control of perception’ is an example that is familiar to you.

More generally, language has hierarchical structure, of which the phonetic and the phonemic are the lowest levels, but the levels in that structure are not levels in the perceptual hierarchy. With syllables and words we are already at the Event level. (An essential difference between syllables and words is that, except for their intersection in the set of monosyllabic words, syllables are nonsense utterances.)

Obviously, the elements of language which are so structured are perceived in the ordinary way within the perceptual hierarchy. The structuring itself is a human artifact, subject to certain universal constraints which are described in the works cited but otherwise maintained in the course of social use.

So I am correct in what I said because an utterance is a type of speech sound.

I think you are missing the point I am trying to make, which is that the “pair task” is, like all tasks, a control task. In the pair task a person is asked to control for the relationship between what they say (“same” or “different”) and whether the components of the pair are perceived as same or different. The person doing the task is asked to keep that perception in a reference state of “correct”.

A nice exercise would be to identify the disturbance(s), outputs and likely controlled variable in this task. Then I think you will get a better idea of how this “pairs task” relates to an actual test for controlled variables and what the results of this task can tell you about the variables controlled in speech.

Yes, and an even better way to say it is that the subject is asked to control for producing the correct relationship between what they say (“same” versus “different”) and what they perceive about the pairs (same versus different). I think it’s useful to try to see behavior – all behavior, including that in conventional psychological experiments – as control, which it always is!

Not at all. I am thinking of variables that range from the phonemic to the semantic to the pragmatic and beyond. I think some great hints about what these are and how to identify them are to be found in Pinker’s marvelous book The Sense of Style.

I’ll just end by saying that the only people I have ever seen who control for “contrasts” in speech are linguists. And they do it when they do the “pairs” and “substitution” tests.

Best, Rick

You are mistaken in several respects.

The pair test (not ‘task’) is indeed a test to identify controlled variables, and as always with the Test the investigator is also controlling perceptual variables in order to introduce disturbances to a posited CV, so I am not missing that point.

The pair test does not concern what the native speaker participant says. The participant can be completely silent (the yes/no indication of repetition can be nonverbal). The pair test requires the participant to control whether what they hear is the same utterance in the language as what they just previously heard.

The pair test does not concern whether “the components of the pair” are the same or different. The ‘components’ of utterance A can be quite different from those of utterance B, and B is still perceived as a repetition of A. If you go to translate.google.com, set up the left side for English with the input word shinbone, and the right side for Swedish, the output word is skenben.

Below the printed word skenben is an audio widget to hear its native pronunciation [šíənbíən]. Go ahead and listen to it on the translate.google.com page. As I demonstrated with Christine Forssell at one of our conferences, an American pronunciation as [šíínbíín] is a repetition of the same word for a speaker of Swedish, although the pronunciation is foreign. The difference between the long vowel ii and the diphthong íə is a phonetic difference that doesn’t make a phonemic difference in Swedish.

The setup is that the native speaker is hearing utterance A followed by similar utterance B. A disturbance is the substitution in B of a phonetically different feature of pronunciation. The output is the participant’s indication (verbal or otherwise) that B is or is not a repetition of A. We are looking for a difference that makes a difference, a disturbance to the perception of repetition (same word).
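To make the setup concrete, here is a minimal Python sketch of the pair test as a toy model. Everything in it is an assumption for illustration: the mapping tables, the category labels, and the function names are hypothetical, and a real listener's perception is not a lookup table. The only things taken from the discussion are that the difference between [íə] and [íí] does not disturb the perception of repetition for a Swedish listener, and that it does for an English listener.

```python
# Hypothetical toy model of the pair test. The tables below are illustrative
# assumptions, not phonological data.

# For each language, map phonetic segments to the contrast category a native
# listener is assumed to perceive. Unlisted segments map to themselves.
SWEDISH = {"íə": "E", "íí": "E"}    # assumed: both heard as the same vowel
ENGLISH = {"íə": "IA", "íí": "II"}  # assumed: heard as two different vowels


def perceive(utterance, language):
    """Return the sequence of contrast categories perceived for an utterance."""
    return tuple(language.get(segment, segment) for segment in utterance)


def pair_test(a, b, language):
    """Participant's output: True if B is perceived as a repetition of A."""
    return perceive(a, language) == perceive(b, language)


def disturbing_substitutions(a, position, candidates, language):
    """Substitute each candidate segment at one position of A and return the
    candidates that disturb the perception of repetition."""
    found = []
    for segment in candidates:
        b = list(a)
        b[position] = segment
        if not pair_test(a, b, language):
            found.append(segment)
    return found


native = ["š", "íə", "n", "b", "íə", "n"]    # skenben, native pronunciation
american = ["š", "íí", "n", "b", "íí", "n"]  # American pronunciation

print(pair_test(native, american, SWEDISH))  # repetition for the Swedish listener
print(pair_test(native, american, ENGLISH))  # not a repetition for the English listener
print(disturbing_substitutions(native, 1, ["íí", "a"], SWEDISH))
```

In this sketch the substitution plays the role of the disturbance and the True/False return value plays the role of the participant's indication. A substitution that leaves the result True is a difference that does not make a difference in that language.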

When we look for a phonetic difference that makes a phonemic difference, the data do not present a simple linear mapping from phonetic detail to phoneme. A variety of phonetic distinctions can suffice to make a given phonemic contrast. A given phoneme representing a point of contrast can be identified with a variety of phonetic features.

At a given point of contrast (represented by a letter or ‘phoneme’), one phonetic feature distinguishes that point from one phoneme or set of phonemes, and a different phonetic feature distinguishes it from another possible phoneme or set of phonemes. The contrast between b and p is ideally associated with the length of delay (the prolongation of silence) before onset of voicing after that segment; the contrast between b and m is in the presence of a perhaps random noise perceived as nasality as voicing continues throughout; the contrast between b and d and between m and n is in the transition of formant frequencies from whatever was the preceding vowel to whatever vowel follows; and so forth. But these cues, too, may be attenuated, lost, or overridden depending on context and on other variables the speaker may be controlling. That is why speech recognition systems have to resort to statistical processing that takes context into account. There may be structural limitations on the full generality of these features or parameters of contrast, as e.g. the neutralization of the voicing contrast in b/p etc. after s. There is no contrast between spoon and sboon, and the spoonerism on ‘spaghetti’ is ‘skabetti’, not ‘sgapetti’.
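The point that the distinguishing cue belongs to the pair, not to the phoneme, can be laid out as a small Python lookup. This is only a sketch: the cue descriptions are shortened from the paragraph above, while the table layout and the function names are hypothetical. It deliberately leaves out the attenuation and overriding of cues by context, which is exactly the part a table cannot capture.

```python
# Hypothetical sketch: cues are keyed by the PAIR in contrast, not by phoneme.
# Cue wording is condensed from the discussion; the structure is an assumption.
CUES = {
    frozenset({"b", "p"}): "delay before onset of voicing",
    frozenset({"b", "m"}): "nasality",
    frozenset({"b", "d"}): "formant transitions",
    frozenset({"m", "n"}): "formant transitions",
}

# Contexts in which a contrast is neutralized (only the b/p case after s is
# listed here; the discussion says "b/p etc.").
NEUTRALIZED_AFTER = {frozenset({"b", "p"}): {"s"}}


def distinguishing_cue(x, y, preceding=None):
    """Return the cue listed for the contrast x/y, or None if the contrast is
    neutralized in this context or is not in the table."""
    pair = frozenset({x, y})
    if preceding in NEUTRALIZED_AFTER.get(pair, set()):
        return None
    return CUES.get(pair)


def cues_for(phoneme):
    """Return {other phoneme: cue} for every listed contrast involving phoneme."""
    result = {}
    for pair, cue in CUES.items():
        if phoneme in pair:
            (other,) = pair - {phoneme}
            result[other] = cue
    return result


print(cues_for("b"))
print(distinguishing_cue("p", "b", preceding="s"))
```

Calling cues_for("b") returns a different cue for each of p, m, and d, so no single entry characterizes b by itself, and the lookup after s returns None for b/p, matching spoon versus sboon.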

A different kind of ‘task’ would be to ask whether pronunciation A of a given utterance in the language is the same as pronunciation B of the same utterance (a repetition of it). Discriminating whether the phonetic ‘components’ are the same or different is completely different from the pair test because when you are discriminating phonetic differences it is a given that utterance B is a repetition of utterance A. Either it is a repetition, or the subject asks “why are you asking me if they’re pronounced differently? Of course they are, they’re different words.” (Yes, yes, we can provide for homophones.) In this kind of task, you can’t ask whether or not it is a repetition, you can only talk about an imitation which is more or less accurate phonetically. In this kind of phonetic discrimination task it is fruitless to ask whether A is “the same as” B because two pronunciations of the same word are never phonetically identical. There are just too many physical variables, and perception at that level of detail is imperfect.

But in the pair test the subject ignores phonetic differences that don’t make a phonemic difference. The question is not how good is the imitation, it is whether or not it is a repetition of the same word or words. (Or a repetition of the same nonsense syllable or syllables that phonologically could be a word in the language but happens not to be.)

The repetition can be in a different dialect of the language, or in a different register (e.g. degree of formality), such that the phonetic data are quite different, but the recipient still recognizes utterance B as a repetition of utterance A. So the American visitor talks about a bruise on her “sheenbeen” [šíínbíín] and her Swedish host says yes, I heard you hit your skenben [šíənbíən] on my coffee table when you went for a drink of water in the dark. In Swedish, that phonetic distinction does not make a phonemic contrast. It’s a difference that doesn’t make a difference in Swedish, but it does make a difference in English, e.g. in the contrast between Korean and careen, Achean and achene, and so on.

To use the words of a language for various purposes the speaker and hearer control the phonemic contrasts between words by varying the references for controlling a variety of phonetic distinctions. It is a commonplace of PCT to control a perception by varying the references for a number of other perceptions.
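The general PCT arrangement referred to here, one system controlling its perception by setting the references of others, can be sketched numerically. This is a generic two-level toy in Python and not a model of speech: the higher perception being the mean of the lower ones, the integrating outputs, the gains, and all the names are assumptions chosen only to keep the loop simple and stable.

```python
# Hypothetical two-level control sketch. The higher system never acts on the
# environment directly; its output is the reference for each lower system.

def simulate(reference, disturbances, steps=500, gain_low=0.5, gain_high=0.05):
    """Run the loop and return (higher perception, lower-level outputs)."""
    outputs = [0.0 for _ in disturbances]  # lower-level outputs
    higher_output = 0.0                    # shared reference for lower systems
    for _ in range(steps):
        # Each lower perception is its own output plus a disturbance.
        perceptions = [o + d for o, d in zip(outputs, disturbances)]
        # Assumed higher-level input function: mean of the lower perceptions.
        higher_perception = sum(perceptions) / len(perceptions)
        # Higher system integrates its error into the lower references.
        higher_output += gain_high * (reference - higher_perception)
        # Lower systems integrate their own errors against that reference.
        outputs = [o + gain_low * (higher_output - p)
                   for o, p in zip(outputs, perceptions)]
    perceptions = [o + d for o, d in zip(outputs, disturbances)]
    return sum(perceptions) / len(perceptions), outputs


higher_perception, outputs = simulate(1.0, [0.3, -0.8])
print(round(higher_perception, 6), [round(o, 6) for o in outputs])
```

With different disturbances on the two lower systems, their outputs end up different from each other while the higher perception stays at its reference. That is the sense in which the means vary and the controlled result does not.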

The issues have been deeply studied for more than a century in ways that are quite compatible with PCT, and there is a huge body of data. It’s perfectly understandable that you are unfamiliar with this, it’s not your field.

The test for the controlled variable begins with a hypothesis about the variable the agent is controlling when performing a particular task. What is the hypothesis about the variable being controlled in the pair test?

I agree. As I understand it, all the pair test is concerned with is how well people can correctly identify the pairs as the same or different.

OK, so the hypothesized controlled variable is that sameness judgements are based on phonetic features of pronunciation in utterance A and B.

Of course, there are a variety of phonetic (acoustic) patterns that are heard as the same phoneme. So there will be many different phonetic versions of, say, /l/ that distinguish it from the many different phonetic versions of /r/, if this phonemic difference is made in a language (as it is not in Japanese). But I don’t understand why you are always talking about contrasts. When I hear the words lane and rain, for example, I don’t hear a “contrast” between /l/ and /r/. When I hear the word lane I hear it starting with the /l/ sound; not with a sound that contrasts with the /r/ sound. So I don’t understand what this “contrast” thing is that you think is an important aspect of language understanding.

These are all features of the acoustical signals that are the basis for identifying the different phonemes. They provide the basis for discriminating (contrasting) one phoneme from another but that’s just because we can identify each one based on its acoustical features. With all the cool technology that is now available for varying the acoustical signal in real time it should be possible to do a proper version of the test to get a more detailed and accurate picture of the acoustical variables that are controlled when speaking.

Yes, but it’s also a commonplace of PCT that we can only control what we can perceive. And when I am speaking I can only perceive the phonemes I am producing (controlling); I can’t perceive the phonemes that are presumed to “contrast” with those phonemes.

I think linguists have made some great observations. But I don’t see their explanations of these observations as being in any way compatible with PCT, the motor theory of speech perception being a particularly clear example of this incompatibility.

Best, Rick