What is controlled in speaking and in speech recognition?

  To: Bruce Nevin
       Technical Writer
       Cisco Systems, Inc.

  Thank you for your talk at the 2003 CSG Conference
  on 26 July 2003. In genealogy we have the problem
  of placing letters in classes which do not vary when
  names are examined over wide differences in time,
  location, ethnicity, or literacy, and when a mistake is
  made in speaking or hearing. The Soundex system
  was devised in the early 20th century for this purpose. In
  this system, vowels, h, w, etc. are worth 0, while the
  other consonants range from 1 (p, b, v, f) to 6 (r). As
  an example, my family name is R355, and the value
  is unchanged if one writes Rotman or Ruttman (the
  way the name actually appears on some records, a
  sign of illiteracy in the person or the midwife, etc.).

  The main problem with Soundex is that the w which
  is in some Eastern European names often became
  v in America, and an r is occasionally missed at the
  end of a name (Rabbiner sounds like Rabbina).

     Dave, 310-676-4032
     David Rothman
     14125 Doty Avenue, #23
     Hawthorne, CA 90250-8042

···

The general problem of variable spelling and pronunciation is threefold:

1. In English, the notion of a 'standard spelling' is basically an invention of the 19th century, so even for English names 16th-century spelling is all over the map.

2. Any given language changes through time and varies through geographical and social space. This could be the source of the w-v and r-null problem you mention.

3. The sounds of one language are variously rearticulated as sounds of another, e.g. at Ellis Island. This is a more likely source of that problem. Recorders were trying to spell unfamiliar names as resembling English names or words.

I poked into a web page at VBnet™ Visual Basic Developers Resource Centre to find out more about this. Soundex is a pretty simple-minded approach, placing letters in seven classes on the principle that all the members of a class represent sounds that have some phonetic features in common. The zero class appears to be for letters (the vowels, the semivowels w and y, and h) whose sound correspondences are so variable that they should be disregarded. The representation is the first letter (vowel or consonant) followed by 3 digits for the next three letters that fall in classes 1 to 6, with adjacent letters of the same class counted once and zeros added as padding if the name runs out.

The specific problem that you mention is that the classification can easily break down when applied to names in non-English sound systems. Also, it is applied letter by letter rather than sound by sound. Note that "Cook" is C200: the two letters of "oo" are dropped, the k gives the 2, and the zeros are only padding. "Koch" comes out as K200, so the two names fail to match solely because the initial letter is kept as written, even though C and K both fall in class 2. The classes are pretty wild: 1 = B, P, F, V; 2 = C, S, K, G, J, Q, X, Z (!); 3 = D, T; 4 = L; 5 = M, N; 6 = R. There is no provision for non-English alphabets or their transliterations, much less the non-English sounds they represent. So the system has a lot of problems.
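
To make the coding concrete, here is a minimal sketch in Python of Soundex as it is commonly described, using the six letter classes listed above. It is not the code from the VBnet page: the function name soundex and the rule that h and w do not separate two letters of the same class are assumptions on my part, and other implementations differ in such details. Nowak/Novak is an illustrative pair of my own choosing for the w-v problem; the other names come from this thread.

```python
# Letter classes as listed above. Any letter not in the table
# (vowels, h, w, y) falls in the zero class and is never emitted.
CLASSES = {
    "1": "BPFV",
    "2": "CSKGJQXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
CODE = {letter: digit for digit, letters in CLASSES.items() for letter in letters}


def soundex(name: str) -> str:
    """Return a four-character code: first letter plus three digits.

    Hypothetical helper written for illustration only.
    """
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    digits = []
    previous = CODE.get(letters[0], "0")  # class of the retained first letter
    for ch in letters[1:]:
        digit = CODE.get(ch, "0")
        # Emit a digit only for classes 1-6, and only once per run.
        if digit != "0" and digit != previous:
            digits.append(digit)
        # Assumption: H and W do not break a run of same-class letters,
        # while vowels (and Y) do.
        if ch not in "HW":
            previous = digit
    # Pad with zeros and keep the letter plus three digits.
    return (letters[0] + "".join(digits) + "000")[:4]


for name in ("Rothman", "Rotman", "Ruttman", "Cook", "Koch",
             "Rabbiner", "Rabbina", "Nowak", "Novak"):
    print(name, soundex(name))
```

Under these assumptions Rothman, Rotman, and Ruttman all give R355, while Rabbiner and Rabbina give R156 and R150, and Nowak and Novak give N200 and N120. That is the r-null and w-v breakdown described above: a dropped r or a w respelled as v changes the code.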

Was there a question?

         /Bruce

···

At 10:55 AM 8/1/2003, David Rothman wrote:

        To: Bruce Nevin
             Technical Writer
             Cisco Systems, Inc.

        Thank you for your talk at the 2003 CSG Conference
        on 26 July 2003. In genealogy we have the problem
        of placing letters in classes which do not vary when
        names are examined over wide differences in time,
        location, ethnicity, or literacy, and when a mistake is
        made in speaking or hearing. The Soundex system
        was devised in the early 20th century for this purpose. In
        this system, vowels, h, w, etc. are worth 0, while the
        other consonants range from 1 (p, b, v, f) to 6 (r). As
        an example, my family name is R355, and the value
        is unchanged if one writes Rotman or Ruttman (the
        way the name actually appears on some records, a
        sign of illiteracy in the person or the midwife, etc.).

        The main problem with Soundex is that the w which
        is in some Eastern European names often became
        v in America, and an r is occasionally missed at the
        end of a name (Rabbiner sounds like Rabbina).

     Dave, 310-676-4032
     David Rothman
     14125 Doty Avenue, #23
     Hawthorne, CA 90250-8042
