[Bruce Nevin 2018-06-10_18:41:07 ET]
I see now that Katseff refers to this proposal about predictive control:
Houde, John F. & Srikantan S. Nagarajan. 2011. Speech production as state feedback control,
https://www.frontiersin.org/articles/10.3389/fnhum.2011.00082/full
I have only glanced at it enough to see the argument that delay in processing the acoustic signal makes it unsuitable for ‘direct’ negative feedback control.
That is actually not a problem for the model that I have proposed, in which acoustic feedback only sets the references for the perceptual variables that are rapidly controlled in speaking.
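A minimal simulation sketch of what I mean (my own illustration; the gains, the 100 ms delay, and the variable names are assumptions, not values from any published model):

DT = 0.001                 # simulation step: 1 ms
ACOUSTIC_DELAY = 100       # acoustic processing delay in steps (~100 ms)

def simulate(steps=10000, target=1.0):
    x = 0.0                          # rapidly controlled perceptual variable
    reference = 0.0                  # inner-loop reference signal
    heard = [0.0] * ACOUSTIC_DELAY   # buffer modeling the acoustic delay
    for _ in range(steps):
        # Fast inner loop: high-gain negative feedback, no appreciable delay.
        x += 10.0 * (reference - x) * DT
        # Slow outer loop: delayed acoustic feedback only nudges the reference.
        heard.append(x)
        reference += 0.5 * (target - heard.pop(0)) * DT
    return x

print(f"after 10 s: x = {simulate():.3f}")   # settles near the target

Because the delayed acoustic signal only moves the reference, not the motor output directly, the fast loop stays stable and the system converges on the target despite the 100 ms delay.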
···
Taken together, such phenomena [including those studied e.g. by Katseff & Houde] reveal a complex role for feedback in the control of speaking, a role not easily modeled as simple feedback control. Beyond this, however, there are also more basic difficulties with modeling the control of speech as being based on sensory feedback. In biological systems, sensory feedback is noisy, due to environmental noise and the stochastic firing properties of neurons (Kandel et al., 2000). Furthermore, when considering the role of the CNS in particular, an even more significant problem is that sensory feedback is delayed. There are several obvious reasons why sensory feedback to the CNS is delayed [e.g., by axon transmission times and synaptic delays (Kandel et al., 2000)], but a less obvious reason involves the time needed to process raw sensory feedback into features useful in controlling speech. For example, in the auditory domain, there are several key features of the acoustic speech waveform that are important for discriminating between speech utterances. For some of these features, like pitch, spectral envelope, and formant frequencies, signal processing theory dictates that the accuracy with which the features are estimated from the speech waveform depends on the duration of the time window used to calculate them (Parsons, 1987). In practice, this means such features are estimated from the acoustic waveform using sliding time windows with lengths on the order of 30–100 ms in duration. Such integration-window-based feature estimation methods are slow to respond to changes in the speech waveform, and thus effectively will introduce additional delays in the detection of such changes. Consistent with this theoretical account, studies show that response latencies of auditory areas to changes in higher-level auditory features can range from 30 ms to over 100 ms (Heil, 2003; Cheung et al., 2005; Godey et al., 2005). A particularly relevant example is the long (~100 ms) response latency of neurons in a recently discovered area of pitch-sensitive neurons in auditory cortex (Bendor and Wang, 2005). As a result, while auditory responses can be seen within 10–15 ms of a sound at the ear (Heil and Irvine, 1996; Lakatos et al., 2005), there are important reasons to suppose that the features needed for controlling speech are not available to the CNS until a significant time (~30–100 ms) after they are peripherally present. This is a problem for feedback control models, because direct feedback control based on delayed feedback is inherently unstable, particularly for fast movements (Franklin et al., 1991).
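The instability claim at the end is easy to demonstrate numerically. A toy sketch (mine, not the paper's; the gain and the 50 ms delay are illustrative assumptions, chosen within the ranges quoted above):

DT = 0.001      # 1 ms step
GAIN = 40.0     # loop gain high enough for fast corrective movements
DELAY = 50      # feedback delay in steps (~50 ms, inside the 30-100 ms range)

def run(delay_steps, steps=3000, target=1.0):
    x = 0.0
    buf = [0.0] * delay_steps       # delayed copies of x
    peak = 0.0
    for _ in range(steps):
        if delay_steps:
            buf.append(x)
            fed_back = buf.pop(0)   # controller sees x from 50 ms ago
        else:
            fed_back = x            # controller sees current x
        x += GAIN * (target - fed_back) * DT   # direct negative feedback
        peak = max(peak, abs(x))
    return x, peak

for label, d in (("no delay   ", 0), ("50 ms delay", DELAY)):
    x, peak = run(d)
    print(f"{label}: final x = {x:.3g}, peak |x| = {peak:.3g}")

With no delay the loop settles smoothly at the target. Adding the 50 ms delay pushes gain × delay to 2.0, past the classical stability limit of π/2 for a loop of this form, so the same controller breaks into growing oscillations, which is exactly Franklin et al.'s point about fast movements.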