responding to claims about delay

[Bruce Nevin 2018-06-10_18:41:07 ET]

I see now that Katseff refers to this proposal about predictive control

Houde, John F. & Srikantan S. Nagarajan. 2011. Speech production as state feedback control,

https://www.frontiersin.org/articles/10.3389/fnhum.2011.00082/full

I have only glanced at it enough to see the argument that delay in processing the acoustic signal makes it unsuitable for ‘direct’ negative feedback control.

That is actually not a problem for the model that I have proposed, in which acoustic feedback only sets the references for the perceptual variables that are rapidly controlled in speaking.
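A minimal sketch of that kind of arrangement, purely illustrative and not Bruce's actual model: a fast inner loop controls a kinesthetic/articulatory variable with negligible delay, while a slow outer loop uses acoustic feedback, delayed here by an assumed ~100 ms, only to adjust the inner loop's reference. All variable names, gains, and delays in this Python sketch are assumptions chosen for the demonstration.

import numpy as np

dt = 0.001                        # 1 ms simulation step
steps = int(2.0 / dt)             # simulate 2 s
acoustic_delay = int(0.100 / dt)  # assumed ~100 ms acoustic processing delay

k_inner, k_outer = 50.0, 5.0      # fast, high-gain inner loop; slow outer loop
acoustic_ref = 1.0                # higher-level acoustic target
inner_ref = 0.0                   # reference for the fast kinesthetic loop
kin = 0.0                         # kinesthetic (articulatory) variable
acoustic_trace = np.zeros(steps)  # history, so delayed acoustic feedback can be read

for t in range(steps):
    # Fast inner loop: drives the kinesthetic variable toward its reference with
    # no appreciable delay (leaky-integrator dynamics).
    kin += k_inner * (inner_ref - kin) * dt
    # Acoustic consequence of the articulation (identity mapping, for simplicity).
    acoustic_trace[t] = kin
    # Slow outer loop: delayed acoustic perception only trims the inner reference.
    acoustic_fb = acoustic_trace[t - acoustic_delay] if t >= acoustic_delay else 0.0
    inner_ref += k_outer * (acoustic_ref - acoustic_fb) * dt

print(f"final kinesthetic value: {kin:.3f}  (acoustic target {acoustic_ref})")

Because the outer loop is much slower than the acoustic delay, the delayed feedback adjusts the reference smoothly without destabilizing the fast loop.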

···

Taken together, such phenomena [including those studied e.g. by Katseff & Houde] reveal a complex role for feedback in the control of speaking – a role not easily modeled as simple feedback control. Beyond this, however, there are also more basic difficulties with modeling the control of speech as being based on sensory feedback. In biological systems, sensory feedback is noisy, due to environment noise and the stochastic firing properties of neurons (Kandel et al., 2000). Furthermore, when considering the role of the CNS in particular, an even more significant problem is that sensory feedback is delayed. There are several obvious reasons why sensory feedback to the CNS is delayed [e.g., by axon transmission times and synaptic delays (Kandel et al., 2000)], but a less obvious reason involves the time needed to process raw sensory feedback into features useful in controlling speech. For example, in the auditory domain, there are several key features of the acoustic speech waveform that are important for discriminating between speech utterances. For some of these features, like pitch, spectral envelope, and formant frequencies, signal processing theory dictates that the accuracy in which the features are estimated from the speech waveform depends on the duration of the time window used to calculate them (Parsons, 1987). In practice, this means such features are estimated from the acoustic waveform using sliding time windows with lengths on the order of 30–100 ms in duration. Such integration-window-based feature estimation methods are slow to respond to changes in the speech waveform, and thus effectively will introduce additional delays in the detection of such changes. Consistent with this theoretical account, studies show that response latencies of auditory areas to changes in higher-level auditory features can range from 30 ms to over 100 ms (Heil, 2003; Cheung et al., 2005; Godey et al., 2005). A particularly relevant example is the long (~100 ms) response latency of neurons in a recently discovered area of pitch-sensitive neurons in auditory cortex (Bendor and Wang, 2005). As a result, while auditory responses can be seen within 10–15 ms of a sound at the ear (Heil and Irvine, 1996; Lakatos et al., 2005), there are important reasons to suppose that the features needed for controlling speech are not available to the CNS until a significant time (~30–100 ms) after they are peripherally present. This is a problem for feedback control models, because direct feedback control based on delayed feedback is inherently unstable, particularly for fast movements (Franklin et al., 1991).
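To make the integration-window point concrete, here is an illustrative Python sketch (not taken from the paper): a pitch estimate computed over a 40 ms sliding autocorrelation window cannot reflect a step change in f0 until the window has largely passed the change point, so the estimate lags the waveform by a few tens of milliseconds. The window length, f0 values, and estimation method are assumptions chosen for the demonstration.

import numpy as np

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
f0 = np.where(t < 0.5, 120.0, 150.0)        # fundamental steps from 120 to 150 Hz at t = 0.5 s
x = np.sin(2 * np.pi * np.cumsum(f0) / fs)  # synthetic "voiced" waveform

win = int(0.040 * fs)                       # 40 ms analysis window
hop = int(0.005 * fs)                       # new estimate every 5 ms

def acf_pitch(frame, fs, fmin=80.0, fmax=300.0):
    """Crude autocorrelation pitch estimate over a plausible f0 range."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

for start in range(0, len(x) - win, hop):
    centre = (start + win / 2) / fs         # time the estimate is nominally "about"
    if 0.47 < centre < 0.57:                # print estimates around the change point
        f_est = acf_pitch(x[start:start + win], fs)
        print(f"t = {centre * 1000:6.1f} ms   estimated f0 = {f_est:6.1f} Hz")

The printout shows the estimate settling on the new f0 only after the analysis window has moved past the change, which is the extra detection delay the quoted passage describes.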

[Martin Taylor 2018.06.10.19.36]

[Bruce Nevin 2018-06-10_18:41:07 ET]

I see now that Katseff refers to this proposal about predictive control

Houde, John F. & Srikantan S. Nagarajan. 2011. Speech production as state feedback control,

https://www.frontiersin.org/articles/10.3389/fnhum.2011.00082/full

I have only glanced at it enough to see the argument that delay in processing the acoustic signal makes it unsuitable for ‘direct’ negative feedback control.

I haven't looked at the article, but I would look at a lot of work in speech perception and auditory perception, for example at work by such as Louis Pols in the 1970s (?) that demonstrated that both vowel and consonant perceptions were based not on formant values, but on formant traces over time (similar to sequence control, but continuous rather than in discrete stages). But more importantly, as everyone on CSGnet knows, the last sentence of the abstract is wrong. Instead of “inherently unstable”, it should read “has a tendency toward instability in the absence of processes to correct for the phase shifts caused by delays, especially at higher frequencies” or something similar.
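A small simulation can illustrate this correction; it is a hedged sketch of the general control-theoretic point, not anyone's speech model. An integrating control loop with a 100 ms transport delay in its feedback path is stable when its loop gain (and hence bandwidth) is modest, and only goes unstable when the gain is raised enough that the delay's phase shift uses up the phase margin. The gains and delay below are illustrative assumptions.

import numpy as np

def run_loop(gain, delay_s=0.100, dt=0.001, T=3.0, ref=1.0):
    """Integrating controller acting on a delayed copy of its own perception."""
    n_delay = int(delay_s / dt)
    steps = int(T / dt)
    p = np.zeros(steps)                 # controlled perceptual variable
    out = 0.0                           # integrating output function
    for t in range(1, steps):
        fb = p[t - 1 - n_delay] if t - 1 >= n_delay else 0.0
        out += gain * (ref - fb) * dt   # integrate the delayed error
        p[t] = out
    return p

for gain in (2.0, 30.0):
    p = run_loop(gain)
    print(f"loop gain {gain:5.1f}: final value {p[-1]:12.3f}, max excursion {np.abs(p).max():12.3f}")

In this particular sketch the low-gain loop settles on the reference despite the 100 ms delay, while only the high-gain loop oscillates and diverges; delayed feedback makes control slower and bounds the usable gain, it does not make negative feedback control inherently unstable.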


Martin


···

From: Martin Taylor (mmt-csg@mmtaylor.net via csgnet Mailing List) csgnet@lists.illinois.edu
Sent: Monday, June 11, 2018 1:45 AM
To: csgnet@lists.illinois.edu
Subject: Re: responding to claims about delay

[Martin Taylor 2018.06.10.19.36]

[Bruce Nevin 2018-06-10_18:41:07 ET]

I see now that Katseff refers to this proposal about predictive control

Houde, John F. & Srikantan S. Nagarajan. 2011. Speech production as state feedback control,

https://www.frontiersin.org/articles/10.3389/fnhum.2011.00082/full

I have only glanced at it enough to see the argument that delay in processing the acoustic signal makes it unsuitable for ‘direct’ negative feedback control.

MT : I haven’t looked at the article, but I would look at a lot of work in speech perception and auditory perception, for example at work by such as Louis Pols in the 1970s (?) that demonstrated that both vowel and consonant perceptions were based not on formant values, but on formant traces over time (similar to sequence control, but continuous rather than in discrete stages). But more importantly, as everyone on CSGnet knows, the last sentence of the abstract is wrong. Instead of “inherently unstable”, it should read “has a tendency toward instability in the absence of processes to correct for the phase shifts caused by delays, especially at higher frequencies” or something similar.

HB : It is not only the last sentence of the abstract that is wrong from the point of view of PCT. I think that most of the abstract is wrong. A totally wrong approach. Behavioristic.

Houde, John F. & Srikantan S. Nagarajan. 2011. Speech production as state feedback control,

Sections of the abstract :

  1. “Spoken language exists because of a remarkable neural process. Inside a speaker’s brain, an intended message gives rise to neural signals activating the muscles of the vocal tract”.

Bill P (B:CP) :

All evidence points to the fact that within the nervous system the general course of events following stimulation is from the input devices, through the nervous system, to the output devices. This is borne out not only by tracing neural impulses inside the nervous system, but by observing the time sequence of events. Stimulus always precedes response, even if only by a small fraction of a second.

  2. Here, we review some of the key characteristics of speech motor control and what they say about the role of the CNS in the process.

Bill P (B:CP) :

A great problem is presented by the thinking that sensory events guide behavior, and by the question of how closely they control it. The nervous system is treated as a transmitter of excitations from receptors to effectors (muscles, glands).

  3. SFC postulates that the CNS controls motor output by (1) estimating the current dynamic state of the thing (e.g., arm) being controlled, and (2) generating controls based on this estimated state.

HB : Arm being controlled ???

Bill P (B:CP) :

Both within and without behaviorism there is one fundamental concept that underlies studies of behavior: a model of the nervous system and of its relationship to the external world… With or without direct contact, the environment can apparently act on the nervous system in such a way as to cause the muscles to tense, moving the organism.

HB : I can conclude from the abstract of the article that it is a pure behavioristic analysis.

Bill P (B:CP) :

The basic behavioristic model of how sensory events influence the brain to produce or change behavior remains.

Boris


[Bruce Nevin 2018-06-14_08:15:52 ET]

Martin Taylor 2018.06.10.19.36 –

I would look at a lot of work in speech perception and auditory perception, for example at work by such as Louis Pols in the 1970s (?) that demonstrated that both vowel and consonant perceptions were based not on formant values, but on formant traces over time (similar to sequence control, but continuous rather than in discrete stages).

To elaborate your point, in normal speech in context speakers seldom fully reach the reference values in acoustic space and articulatory space which are evident with careful pronunciation of isolated words and syllables. There are several identified reasons for this, prominently e.g. coarticulation (controlling the kinesthetic and tactile perceptions that produce sound x conflicts with controlling, immediately afterward, the very different kinesthetic and tactile perceptions that produce the adjacent sound y).

This doesn’t apply to the simple task in Katseff’s experiments, but it does show that these low-level speech perceptions are rarely controlled with high gain (except in ve-ry care-ful speech). Ordinarily, we just have to get close enough to avert misperception of the intended word as a similar word, and when that does happen we pronounce the intended word again with higher gain.

This is important for understanding the incomplete resistance to disturbance shown in Katseff’s experiments.
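A back-of-the-envelope sketch of why modest gain yields incomplete resistance, not a model of Katseff's task: in a simple negative feedback loop a steady disturbance is opposed only by the fraction G/(1+G) of its size, so a low-gain loop shows exactly the partial compensation described above. The gains and disturbance size in this Python sketch are illustrative assumptions.

def steady_state(loop_gain, disturbance=1.0, ref=0.0):
    """Simple proportional loop at equilibrium: p = output + disturbance,
    output = loop_gain * (ref - p). Solve for the controlled perception p."""
    p = (loop_gain * ref + disturbance) / (1.0 + loop_gain)
    opposed = 1.0 - p / disturbance      # fraction of the disturbance opposed
    return p, opposed

for g in (0.5, 2.0, 10.0, 100.0):
    p, opposed = steady_state(g)
    print(f"loop gain {g:6.1f}: residual effect {p:6.3f}, {100 * opposed:5.1f}% of the disturbance opposed")

On this reading, partial compensation of a formant shift is what a moderate-gain control loop is expected to show, rather than evidence against feedback control.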

···

/Bruce
