Formants

[Martin Taylor 931124 15:45]
(Bruce Nevin Wed 931124 13:25:04)

To avoid confusion spreading too widely, I'll correct Bruce's conceptual
typo for him, though I'm sure he will notice it himself soon enough.

Formants enhance the harmonics within their band, rather than "damp" them.

"Damp" is an unfortunate word in this context, because damping is related
to the width of a formant. Wide formants damp fast, narrow ones damp
slowly. But they all enhance rather than diminish the signal within
their pass-band. Damping refers to how fast an oscillation dies down,
not to enhancement or suppression.
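
A small numerical sketch of the width/damping relation, for anyone who wants
numbers. It assumes each formant behaves like a simple second-order resonance,
for which the ringing envelope falls off as exp(-pi * B * t), with B the
3 dB bandwidth in Hz. The two bandwidths are hypothetical values chosen only
to contrast "wide" and "narrow"; they are not measurements.

```python
import math

def decay_time_constant(bandwidth_hz):
    """Seconds for the ringing envelope exp(-pi * B * t) to fall to 1/e."""
    return 1.0 / (math.pi * bandwidth_hz)

def envelope(bandwidth_hz, t):
    """Relative amplitude of the ringing t seconds after excitation stops."""
    return math.exp(-math.pi * bandwidth_hz * t)

# Hypothetical bandwidths: one wide resonance, one narrow.
for bandwidth_hz in (200.0, 50.0):
    tau_ms = 1000.0 * decay_time_constant(bandwidth_hz)
    left = envelope(bandwidth_hz, 0.005)
    print(f"B = {bandwidth_hz:5.0f} Hz: 1/e time {tau_ms:.2f} ms, "
          f"amplitude left after 5 ms {left:.3f}")
```

The wide resonance rings down several times faster than the narrow one, yet
both have their peak response at the centre of the band.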

A formant is the effect of a resonator on a signal. To see the formant
in a spectrum, the signal that it affects must be of sufficient bandwidth.
Bruce's illustration is correct, except that his dotted lines should show
frequencies of greater rather than missing energy. The same arguments
apply. The soprano signal does not have the components to define the
formants, as he showed them, so long as she keeps a constant pitch.

Singers are trained not to keep a constant pitch. Their fundamental
fluctuates above and below the notional target pitch. One of the effects
of that fluctuation is to drift the harmonics, which are multiples of the
fundamental, up and down the frequency scale, so that they are likely to
move across the band of any formant resonator. The shift in relative
amplitudes of the harmonics allows the ear to hear the formants.
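
The sweep can be put in rough numbers. This is only a sketch: the formant is
modelled as a single Lorentzian-shaped peak (an assumption made for
simplicity), and the pitch, vibrato depth, formant centre and bandwidth are
all hypothetical values, not data from any singer.

```python
import math

def resonance_gain(f, centre_hz, bandwidth_hz):
    """Relative gain of one resonance (1.0 at the centre, 0.707 at the
    edges of the 3 dB band), using a Lorentzian-shaped peak."""
    x = (f - centre_hz) / (bandwidth_hz / 2.0)
    return 1.0 / math.sqrt(1.0 + x * x)

def harmonic_gains(f0, centre_hz, bandwidth_hz, n_harmonics=4):
    """Gain applied to each of the first n harmonics of fundamental f0."""
    return [resonance_gain(k * f0, centre_hz, bandwidth_hz)
            for k in range(1, n_harmonics + 1)]

F0 = 700.0        # hypothetical target pitch, Hz
DEPTH = 0.03      # hypothetical vibrato depth: +/- 3 percent
FORMANT = 1450.0  # hypothetical formant centre, Hz
BW = 100.0        # hypothetical formant bandwidth, Hz

for label, f0 in (("low", F0 * (1 - DEPTH)),
                  ("target", F0),
                  ("high", F0 * (1 + DEPTH))):
    gains = harmonic_gains(f0, FORMANT, BW)
    print(f"{label:6s} f0 = {f0:6.1f} Hz  gains:",
          [round(g, 2) for g in gains])
```

At a fixed pitch the second harmonic sits at one point on the side of the
peak. With the fluctuation it rides up and down that slope, and its changing
amplitude relative to the other harmonics traces out where the peak is.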

Incidentally, this effect is even more important for the tone of a violin.
The body resonances of a violin are quite sharp and there are many of them.
As the player's vibrato moves the harmonics across the peaks and valleys
of the violin's body spectral filter, the instantaneous "quality" of the
tone changes rapidly. This change is heard as the richness of the violin
tone, and is hard to simulate electronically. Too much fluctuation, or
body formants poorly located relative to one another, make for a poor
violin. Same, I suppose, for a singer.

As for the original query about changing the voice quality by changing
the pitch, this is one of the standard kinds of transformation that
speech researchers do. They may also change the excitation waveform
completely, replacing the periodic glottal pulse with something else.
One of the weirder demonstrations of this kind was something Melvyn Hunt
did some years ago: He made the Berlin Philharmonic sing "We were away
a year ago" as the first bars of Beethoven's Fifth Symphony. I don't
mean that the orchestra members sang. The instruments did. Melvyn
took the formant structure of himself saying the phrase, and fed the
recording of the music through it. A fascinating concept and a
fascinating experience.

Martin

[From: Bruce Nevin (Mon 931129 09:37:48 EST)]

(Martin Taylor 931124 15:45) --

Conceptual typo is a good word for it. Thanks, Martin. I seem to have
got disoriented in the mirror world! (Is there still egg on my face, or
is that a hallucination?)

I especially appreciated your account of the utility of vibrato, which
had previously seemed to me merely decorative. Questions arise
about the preference for clear, vibrato-free singing in "authentic" early
music performance, boy soprani, etc. Also the thought that lower voices
need not and perhaps ought not to have so much vibrato as high voices. I
still find it difficult to understand the words of much singing unless I
know them already. And this all points of course to a communicative
value to intonation contours that hadn't occurred to me. As the voice
pitch sweeps up and down in the course of producing an utterance,
harmonics sweep across the "window" of each formant. This in turn
suggests a functional basis for women to use more expressive intonation
contours in their speaking, and for men to use a flatter intonation
style.

I was aware that sound energy outside the formants was suppressed or
damped--this is the way it is described by e.g. Lieberman & Blumstein,
and I remember this from working with Lisker and (later) with Ohala.
I was not aware that compression waves at the formant frequencies are
actually amplified or enhanced. What accomplishes this? Surely not
resonance of bony structures surrounding the mouth, unless some vowels
are favored by such resonance and others not as the tongue moves through
the continuous range of vowel-articulating positions. That would have
some fascinating implications for language evolution and language
learning.

`Late to bed, early to rise, makes a man stupid.' (Poor Richard's
bankrupt brother.)

    Bruce

[Martin Taylor 931129 11:45]
(Bruce Nevin Mon 931129 09:37:48)

> I was aware that sound energy outside the formants was suppressed or
> damped--this is the way it is described by e.g. Lieberman & Blumstein,
> and I remember this from working with Lisker and (later) with Ohala.
> I was not aware that compression waves at the formant frequencies are
> actually amplified or enhanced. What accomplishes this? Surely not
> resonance of bony structures surrounding the mouth...

There's a subtlety about this word "amplified," and it has a general
implication for control.

Resonators are passive filters. They do not supply energy. They organize
it, if you want to think in those terms. What we are looking at is a
system that supplies energy that might be used at any frequency (lung
air pressure greater than ambient produces an air flow), and another
system of much lower power that determines the frequencies at which the
available energy supply will be used. It is that lower power signal
that is amplified, not the energy from the air flow.

Generally, when we draw control loop diagrams, we show something like
this:

               | reference
               V
        ---comparator----
        |               |
   perceptual         output
    function         function
        ^               |
        |               V
  ======|=====etc=======|====
        |               |
        |--<-effects-----
            on CEV

We should show something more like this:

               | reference
               V
        ->-comparator-->-     -<- energy source
        |               |     |
   perceptual           output
    function           function
        ^               |     |
        |               V     V
  ======|=====etc=======|=====|==
        ^               |     |
        |--power--<------     ----> energy sink
           to CEV

Some part (in an efficient system, the greater part) of the power from the
energy source is delivered to the CEV rather than to the sink. But no more
can be delivered to the CEV than is available from the source plus the
error input to the output function. If more goes to the CEV than is
available from the error input alone, we say that the error is "amplified"
by the output function. It has "gain" (doesn't matter whether the sign
of the gain is positive or negative--energy is a square-law phenomenon).
But one cannot amplify the energy from the "energy source" (except by a
very small amount if the energy available in the error signal is of the
same order of magnitude as that in the power source).

Energy from the energy source may be concentrated in time by the output
function; we call that "modulation" of the energy source by the error
signal. The way this is done depends on the physics of the situation.
With cavity resonators such as exist in the vocal tract, you can think
of the power as being concentrated by the reinforcement of echoes within
the resonator at some time intervals as opposed to the cancellation of
echoes at other time intervals. The time delay of an echo that leads
to reinforcement corresponds to a frequency that is reinforced, or, we
might say, amplified, if that frequency is something affected by the
error signal. But it is really only a repackaging of the energy available
from the energy source, and does not augment that energy in total.
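
To attach numbers to the echo picture, here is a minimal sketch under a
strong simplifying assumption: the vocal tract is treated as a uniform tube,
closed at the glottis and open at the lips. For such a tube the echoes
reinforce at odd multiples of c / (4 * L). The tube length and the speed of
sound used below are illustrative values only; a real tract is not uniform,
which is exactly why changing its shape moves the formants.

```python
def tube_resonances(length_m, speed_of_sound=350.0, count=3):
    """Resonant frequencies (Hz) of a uniform tube closed at one end and
    open at the other: odd multiples of c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]

# Hypothetical tract length of 17.5 cm; speed of sound taken as 350 m/s.
for n, f in enumerate(tube_resonances(0.175), start=1):
    print(f"resonance {n}: about {f:.0f} Hz")
```

Shortening the tube raises every resonance in proportion, since a shorter
echo delay corresponds to a higher reinforced frequency.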

> I was not aware that compression waves at the formant frequencies are
> actually amplified or enhanced. What accomplishes this? Surely not
> resonance of bony structures surrounding the mouth, unless some vowels
> are favored by such resonance and others not as the tongue moves through
> the continuous range of vowel-articulating positions.

So you can see that the compression waves at the formant frequencies ARE
enhanced, as compared with the amplitude they would have if the resonators
were not there. But they carry no more energy than the total over all
frequencies would otherwise have been, without the resonators. The
energy source is not amplified.

The soft structures of the mouth damp the echoes, making the reinforcement
and cancellation at any frequency imperfect. This is why the formant
passbands are broad rather than sharply peaked. A more technical way of
saying the same thing is that they are "low-Q" resonances. The damping
really does remove energy that could have been in the acoustic signal.
In the above diagram, that energy goes out by the path shown as "energy sink,"
rather than to the CEV. But this damping has nothing at all to do with
the formants. It is a function of the physics of the walls of the
acoustic tract; the formants are functions of the shape of the tract.
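
For a feel of what "low-Q" means in numbers, a short sketch. Q is the centre
frequency divided by the 3 dB bandwidth, and for a simple second-order
resonance (an assumption here) the ringing falls to 1/e of its amplitude in
about Q / pi cycles. The centre frequency and the two bandwidths are
hypothetical, picked to contrast a heavily damped resonance with a lightly
damped one.

```python
import math

def q_factor(centre_hz, bandwidth_hz):
    """Quality factor: centre frequency over 3 dB bandwidth."""
    return centre_hz / bandwidth_hz

def cycles_to_1_over_e(centre_hz, bandwidth_hz):
    """Cycles of ringing before the amplitude falls to 1/e: Q / pi."""
    return q_factor(centre_hz, bandwidth_hz) / math.pi

# Hypothetical resonances at the same centre frequency.
for name, centre_hz, bandwidth_hz in (("heavily damped", 500.0, 100.0),
                                      ("lightly damped", 500.0, 5.0)):
    q = q_factor(centre_hz, bandwidth_hz)
    n = cycles_to_1_over_e(centre_hz, bandwidth_hz)
    print(f"{name}: Q = {q:.0f}, rings for about {n:.1f} cycles")
```

Both resonances sit at the same frequency; only the wall losses, and hence
the bandwidth, differ.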

It is useful to note that some of the energy shown as going to the energy
sink is actually devoted to what are sometimes called "side effects" of
control. If you include the heat generated by the control action as a
side effect, then all of the energy to the sink goes to side effects. I
prefer to think of side effects as being more organized than simple heating.

Martin