(Bruce Nevin Wed 930125 12:54:08 EST)
Bill Powers (950114.0750 MST)
I don't think that metaphors like "paths through familiar degrees of
freedom" are very helpful.
Thanks for calling me on that. It's easy to forget that metaphors tend
to take on a life of their own -- so to speak. ;->
It appears to me that some approaches to speech synthesis involve
engineering solutions that are difficult to adapt for modelling what
human beings do. When I say "engineering solutions," I mean methods
analogous to what I am seeing described here as the norm for much of
robotics. An exception appears to be the synthesizer described in:
Klatt, Dennis H. and Laura C. Klatt. 1990. Analysis, synthesis, and
perception of voice quality variations among female and male
talkers. _The Journal of the Acoustical Society of America_
(A note says that Dennis Klatt died at the end of December 1988, and that
Laura Klatt was a summer research assistant in 1987. I don't know the
status of this equipment.)
The emphasis of the research reported in this paper is on voice quality.
Perhaps it is for that reason that a "glottal" sound source and various
resonators and other modifiers of its output, with their controllable
parameters and constants, are arranged as an analog of the vocal tract.
This "KLSYN88 Cascade/Parallel formant synthesizer" controls separate
resonators for each formant above the first. It does so separately for
laryngeal sound sources, where F1 through F5 are in series (cascade), and
for frication noise (F2 through F6 in parallel). I do not yet understand
why there is also a parallel arrangement of formant resonators for F1
through F4 for laryngeal sound sources, noted as "normally not used." Nor
do I know whether the separation is merely an artifact of the block
diagram, with the same device in fact used for, e.g., the F2 resonator in
all cases.
When I read the article I may understand all this better.
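To make the cascade/parallel distinction concrete, here is a minimal
Python sketch of a two-pole digital resonator of the kind used in
Klatt-style formant synthesizers: y[n] = A*x[n] + B*y[n-1] + C*y[n-2],
with A, B, C computed from a formant frequency and bandwidth. The function
names, the gains, and the formant values in the usage lines are
hypothetical illustrations, not KLSYN88's actual code or parameters.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of a two-pole resonator (assumed Klatt-style form)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Filter a sequence with y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, fs_hz):
    """Series connection: each resonator filters the previous one's output."""
    for freq, bw in formants:
        signal = resonate(signal, freq, bw, fs_hz)
    return signal


def parallel(signal, formants, gains, fs_hz):
    """Parallel connection: every resonator filters the same input, and the
    outputs are summed, each with its own (hypothetical) amplitude."""
    outputs = [resonate(signal, f, bw, fs_hz) for f, bw in formants]
    return [sum(g * out[n] for g, out in zip(gains, outputs))
            for n in range(len(signal))]


if __name__ == "__main__":
    fs = 10000.0
    # Illustrative 100 Hz impulse train as a crude stand-in for glottal pulses.
    source = [1.0 if n % 100 == 0 else 0.0 for n in range(2000)]
    # Illustrative (frequency, bandwidth) pairs in Hz; not taken from the paper.
    vowel = cascade(source, [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)], fs)
    print(len(vowel))
```

In the cascade, the relative formant amplitudes fall out of the
frequencies and bandwidths alone, which is one way the coupling among
formants shows up; the parallel branch instead needs a separate amplitude
control for each formant.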
As I understand it, in most cases the formants cannot be controlled
independently of one another. The coupling between them is a function of
the physical (acoustical) properties of the vocal tract, which I take to
be a part of the environment. It is for this reason that F0 (pitch) and
the first two formants, F1 and F2, suffice as cues for vowel perception,
though not for synthesis of natural-sounding speech.
It would be nice to have some sort of vocal-tract function that
realistically provides this coupling. Then perhaps only one input would
need to be controlled to shift the frequencies of all the formants at once.
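As a minimal illustration of a vocal-tract function in which one input
moves every formant at once, the textbook idealization of the tract as a
uniform tube closed at the glottis and open at the lips gives resonances
at F_n = (2n - 1) * c / (4 * L). Changing the single parameter L (tract
length) scales all formants by the same factor, while F0 is still set
separately by the source. This is far from realistic, since actual vowels
come from non-uniform tube shapes; the function name and the values
c = 35,000 cm/s and L = 17.5 cm are assumptions for the sketch.

```python
def uniform_tube_formants(length_cm, n_formants=5, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of an idealized uniform tube closed at one end and
    open at the other. Hypothetical helper: a quarter-wavelength resonator,
    F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


# One input (tract length) moves every formant together.
long_tract = uniform_tube_formants(17.5, 3)   # [500.0, 1500.0, 2500.0]
short_tract = uniform_tube_formants(14.0, 3)  # [625.0, 1875.0, 3125.0]
ratios = [s / l for s, l in zip(short_tract, long_tract)]  # every ratio is 1.25
```

The point of the sketch is only the shape of the control: a single
physical parameter upstream, with the coupled formant pattern following
from the acoustics rather than being set formant by formant.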
Klatt and Klatt make the interesting observation that voice quality
apparently needs to vary over the course of speaking for the speech to
Got to go home and nurse this flu.