word processing

[From Bruce Nevin (2003.09.19 11:28 EDT)]
You may well have seen this little exercise, which has proliferated
explosively in email and in blogs, beginning about September 12.

PRETTY WEIRD

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it
deosn’t mttaer in waht oredr the ltteers in a wrod are,
the olny iprmoetnt tihng is taht the frist and lsat ltteer
be at the rghit pclae. The rset can be a total mses and
you can sitll raed it wouthit porbelm. Tihs is bcuseae the
huamn mnid deos not raed ervey lteter by istlef, but the
wrod as a wlohe.

As the cousin who passed it along to me said, “Dyxlesics of the
wrlod untie!”

Well, it’s more like: All the letters are present, but only the first and
last letters must be in the correct order. There can be wrong letters
(researcher == rsereearch instead of rscheearch above) or possibly extra
letters (omit the indefinite article “a”, then research ==
rseearch, racseerh, etc.). Even though the text probably overstates the
case, this seems to be evidence against the simple model of a
word recognizer as checking off one after another of the phonemes (or
letters) in a fixed sequence. It seems unlikely that the speed and ease
of recognition could be entirely due to semantic expectations. After all,
the entire text is written this way. Comprehension is not delayed, but
begins immediately. (My oldest daughter, Ruby, wrote back that she didn’t
even realize anything was odd until the text told her so.)

It lends credence to a kind of linguistics known as optimality theory
(OT).
http://roa.rutgers.edu/

OT is a selectionist theory, as in Gary Cziko’s book Without Miracles.
On the speech-production side of things, an ‘underlying
form’ (reference perception) for a word is retrieved from memory. A
diverse population of candidate pronunciations is tested against a set of
‘constraints’ (reference perceptions in systems controlling aspects of
speech). The constraints are ranked, so that some have higher priority
(gain) than others. The optimal candidate (least error) is the one that
is produced. This simple framework produces some very sophisticated
results. The literature now is quite large.

A stock intro example: In an OT description of English phonology, the
Onset constraint (a syllable must begin with a consonant) ranks higher
than Faithfulness constraints (produce the underlying form faithfully).
Consequently, before a word such as apple whose underlying form
begins with a vowel, we insert a ‘default’ consonant when it follows a
word that ends in a vowel (and usually when it is utterance-initial), but
not when it follows a word that ends in a consonant. In English, the
default consonant is a glottal stop:

  1. Apples are good.
    Note the initial glottal stop. (A glottal stop is the initial and medial
    sound in the English negation that is usually written
    “uh-uh”.)
    The pronunciation of a preceding word may in some cases be changed to end
    with a consonant when a vowel-initial word like apple
    follows:

  2. An apple a day
    By contrast, the final n of “an” (which has the same historical
    origin as “one”) is not retained when a consonant follows:

  3. A bird in hand …
    Moving along to the definite article, before a vowel “the”
    rhymes with “thee”:

  4. The apple
    The effect is that a y sound intervenes before “apple”, a
    consonantal sound even though no consonant letter represents it.
    By contrast, the vowel of “the” rhymes with the vowel of “cup”
    when a consonant-initial word follows:

  5. The cup
    In those dialects – often perceived as somewhat uncouth – in which
    one says “Thuh apple”, a glottal stop intervenes.
    In many of the languages that rank Onset over Faithfulness, the ‘default
    consonant’ is not a glottal stop. For example, it is t in Tagalog
    (an important language in the Philippines). Yet other languages coalesce
    the vowels into one, or elide one of the two vowels. Sanskrit and Greek
    are examples, and to some degree this was also formerly heard in English,
    e.g. Shakespeare, “Oh, let me see’t!”

Languages which, on the other hand, rank Faithfulness constraints above
the Onset constraint do not insert a default consonant between a
word-final vowel and a word-initial vowel, but instead allow a syllable
to begin without a consonant in its onset. An example would only be
convincing if you knew the language.
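
The ranking idea in the Onset/Faithfulness example can be sketched in a few lines of Python. This is only a toy under stated assumptions: the syllabification ("a", "pl"), the use of "?" for a glottal stop, the two constraint functions, and the strict rank-order comparison are all illustrative choices, not taken from the OT literature or from the text above.

```python
VOWELS = set("aeiou")

def onset(candidate):
    """Onset: one violation per syllable that begins with a vowel."""
    return sum(1 for syllable in candidate if syllable[0] in VOWELS)

def make_faithfulness(underlying):
    """Faithfulness (simplified, hypothetical): one violation per segment
    inserted into or deleted from the underlying form."""
    base = len("".join(underlying))
    def faith(candidate):
        return abs(len("".join(candidate)) - base)
    return faith

def optimal(candidates, ranked_constraints):
    """Return the candidate with the smallest violation profile, comparing
    constraints in rank order (highest-ranked first)."""
    return min(candidates,
               key=lambda c: tuple(con(c) for con in ranked_constraints))

underlying = ("a", "pl")                    # toy syllabification of "apple"
candidates = [("a", "pl"), ("?a", "pl")]    # "?" stands for a glottal stop
faith = make_faithfulness(underlying)

print(optimal(candidates, [onset, faith]))  # Onset ranked over Faithfulness
print(optimal(candidates, [faith, onset]))  # Faithfulness ranked over Onset
```

With Onset ranked first the candidate with the inserted glottal stop wins; reversing the ranking lets the vowel-initial syllable through unchanged, which is the contrast between the two kinds of language described above.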

There has also been some work in this theory on the question of how the
underlying forms (reference perceptions) of words are learned (or
‘acquired’). It explains how it can be that young children can have what
appear to us to be oddly scrambled pronunciations, and apparently not
notice the discrepancy between what they produce and what they hear from
others. Examples from one 2.5-year-old:

    Child           Adult
    -----           -----
    gichys          chicken
    nowsman         snowman
    dans            stand

This sort of phenomenon, and the variability of it from one child to
another, is hard to explain otherwise.

This seems consistent with many control systems controlling in parallel.
It obviously is not scrambling of the sort proposed above for written
English (that may in part be an artifact of the arbitrariness of English
spelling, which puts some distance between the written word shape and the
corresponding spoken word shape), but it may be related. A few salient
points from web discussions (refs below):

  • The faster you read the less trouble you have with it.

  • Breaking up a letter cluster that represents a single phoneme (ch, sh,
    ow, etc.) is more problematic. (Indeed, the initial example above is
    pretty respectful of syllable and morpheme boundaries.)

  • It doesn’t work with inflected languages (e.g. German, Russian) if the
    inflectional ending is scrambled into the middle of the word.

“Uncle Jazzbeau” may have tracked down the source
(http://www.bisso.com/ujg_archives/000224.html).
Here’s a quote:

ø¤º°°º¤ø,¸¸,ø¤º°°º¤ø,¸ begin quote
¸,ø¤º°°º¤ø,¸¸,ø¤º°°º¤ø

¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
The original reference is [in] a letter to New Scientist magazine by
Graham Rawlinson of Aldershot, Hampshire (vol 162 issue 2188 - 29 May
1999, page 55) titled “Reibadailty”.

Rawlinson writes: ‘You report that reversing 50-millisecond segments of
recorded sound does not greatly affect listeners’ ability to understand
speech (In Brief, 1 May, p 27).

'This reminds me of my PhD at Nottingham University (1976), which showed
that randomising letters in the middle of words had little or no effect
on the ability of skilled readers to understand the text. Indeed one
rapid reader noticed only four or five errors in an A4 page of muddled
text.

'This is easy to denmtrasote. In a puiltacibon of New Scnieitst you could
ramdinose all the letetrs, keipeng the first two and last two the same,
and reibadailty would hadrly be aftcfeed. My ansaylis did not come to
much beucase the thoery at the time was for shape and senqeuce
retigcionon. Saberi’s work sugsegts we may have some pofrweul palrlael
prsooscers at work.

'The resaon for this is suerly that idnetiyfing coentnt by paarllel
prseocsing speeds up regnicoiton. We only need the first and last two
letetrs to spot chganes in meniang.

‘This was not easy to type!’

The letter is in New Scientist’s searchable online archive
(archive.newscientist.com)


ø¤º°°º¤ø,¸¸,ø¤º°°º¤ø,¸ end quote
¸,ø¤º°°º¤ø,¸¸,ø¤º°°º¤ø

¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
At
http://www.bisso.com/ujg/
he says where the dissertation can be found.

LanguageHat has an extended discussion with some good stuff in it:
http://www.languagehat.com/archives/000840.php

Also Slashdot
http://science.slashdot.org/article.pl?sid=03/09/15/2227256&mode=thread&tid=133&tid=134&tid=186

These three sites are now mentioned on the Snopes folklore site:
http://www.snopes.com/language/apocryph/cambridge.asp

[From Rick Marken (2003.09.19.1140)]

Bruce Nevin (2003.09.19 11:28 EDT)
> You may well have seen this little exercise, which has proliferated
> explosively in email and in blogs, beginning about September 12.

I just saw it this morning. Nifty!

> this seems to be evidence against the simple model
> of a word recognizer as checking off one after another of the phonemes
> (or letters) in a fixed sequence. It seems unlikely that the speed and
> ease of recognition could be entirely due to semantic expectations. After
> all, the entire text is written this way. Comprehension is not delayed,
> but begins immediately.

I agree. When I saw this it looked like a perfect demonstration of Bill’s
pandemonium word recognition scheme, where all word recognizers fire off
to a degree proportional to the extent that their letter sequence is present.
So even though, for example, the “want” and “what” and “that” recognizers
all fire when “waht” is shown (as in the example), “what” fires a bit more
than the others, and the “what” output also produces the best result for the
higher-level sentence recognizer (which fires maximally for the “doesn’t
matter what order” set of inputs).

> It lends credence to a kind of linguistics known
> as optimality theory (OT).
>
> http://roa.rutgers.edu/
> OT is a selectionist theory, as in Gary Cziko’s book Without Miracles.
> On the speech-production side of things, an ‘underlying form’ (reference
> perception) for a word is retrieved from memory. A diverse population of
> candidate pronunciations is tested against a set of ‘constraints’ (reference
> perceptions in systems controlling aspects of speech). The constraints
> are ranked, so that some have higher priority (gain) than others. The optimal
> candidate (least error) is the one that is produced. This simple framework
> produces some very sophisticated results. The literature now is quite large.

I like the pandemonium version because it makes all “hypotheses” about
the word present available simultaneously to the next higher perceptual
level. The OT theory acts like a filter that decides what should go on
to the sentence recognizer. I think this could be a very inefficient
approach.
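
The pandemonium idea (every recognizer fires in parallel, in proportion to how much of its word is present, and all signals are passed upward) can be illustrated with a small Python sketch. The scoring rule here is an assumption made up for the illustration: half the signal comes from shared letters and a quarter each from a matching first and last letter. It is not the B:CP model, just one arrangement that reproduces the "waht" example.

```python
from collections import Counter

def signal(recognizer_word, seen):
    """Toy recognizer output in [0, 1]. The weights are an assumption:
    0.5 for the share of letters in common, 0.25 each for first/last letter."""
    shared = sum((Counter(recognizer_word) & Counter(seen)).values())
    overlap = shared / max(len(recognizer_word), len(seen))
    first = 1.0 if recognizer_word[0] == seen[0] else 0.0
    last = 1.0 if recognizer_word[-1] == seen[-1] else 0.0
    return 0.5 * overlap + 0.25 * first + 0.25 * last

def pandemonium(seen, lexicon):
    """All recognizers fire at once; every signal goes on to the next level."""
    return {word: signal(word, seen) for word in lexicon}

signals = pandemonium("waht", ["want", "what", "that"])
for word, strength in sorted(signals.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {strength:.3f}")
```

Nothing is filtered out here: "want" and "that" still produce sizeable signals, and a higher-level sentence recognizer would be free to use them.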

> 'This is easy to denmtrasote. In a puiltacibon of
> New Scnieitst you could ramdinose all the letetrs, keipeng the first two
> and last two the same, and reibadailty would hadrly be aftcfeed. My ansaylis
> did not come to much beucase the thoery at the time was for shape and senqeuce
> retigcionon. Saberi’s work sugsegts we may have some pofrweul palrlael
> prsooscers at work.

I agree. The PCT pandemonium model is a palrlael pcroisesng mdesl
of wrod rceogtnioin.

Best

Rick

Richard S. Marken, Ph.D.
Senior Behavioral Scientist
The RAND Corporation
PO Box 2138
1700 Main Street
Santa Monica, CA 90407-2138
Tel: 310-393-0411 x7971
Fax: 310-451-7018
E-mail: rmarken@rand.org

This is Phil Runkel on 19 Sept 03, replying to Bruce Nevin of
2003.09.19. 11:28:

Hey, your essay on PRETTY WEIRD was lots of fun. Thanks. --Phil R.

[From Bill Powers (2003.09.20.0745 MDT)]

Bruce Nevin (2003.09.19 11:28 EDT)--

>You may well have seen this little exercise, which has proliferated
>explosively in email and in blogs, beginning about September 12.

>PRETTY WEIRD
>
>Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it
>deosn't mttaer in waht oredr the ltteers in a wrod are,
>the olny iprmoetnt tihng is taht the frist and lsat ltteer
>be at the rghit pclae. The rset can be a total mses and
>you can sitll raed it wouthit porbelm. Tihs is bcuseae the
>huamn mnid deos not raed ervey lteter by istlef, but the
>wrod as a wlohe.

Astonishing! I noticed that of all these words, there are very few which
could have been any other word no matter how you arrange the intermediate
letters leaving the first and last the same. A word like "untie" which
appeared later is a rarity, since it was meant as "unite". Out of context,
there would be no reason to suspect "untie" of really being some other
word. Anagrams are not all that common, especially when their first and
last letters must be the same!
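
The point about anagrams is easy to check against any word list. A minimal Python sketch, assuming a tiny hand-picked sample in place of a real dictionary: two words can be confused under inner-letter scrambling only if they share first letter, last letter, and the same multiset of inner letters, so grouping by that key finds the ambiguous cases.

```python
from collections import defaultdict

def scramble_key(word):
    """Words with the same key can be turned into one another by
    rearranging inner letters only."""
    if len(word) < 3:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]

def ambiguous_groups(words):
    """Return the groups of distinct words that share a scramble key."""
    groups = defaultdict(set)
    for word in words:
        groups[scramble_key(word.lower())].add(word.lower())
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Illustrative sample only; a real test would use a full word list.
sample = ["unite", "untie", "order", "letter", "first", "last",
          "read", "mind", "whole", "total", "place", "right"]
print(ambiguous_groups(sample))
```

On this sample only the unite/untie pair collides; how rare such collisions are in a full lexicon is exactly what running it on a large word list would show.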

As Rick said, this does say something about the pandemonium model. As long
as other recognizers are not producing significant signals, even a small
signal from one recognizer is enough. One variant on the sequence
recognizer in B:CP that occurred to me was to take signals from the output
of each stage and sum them. That gives at least some signal from partial
words. But it doesn't take care of intermediate elements that are wrong.
I'm sure there's some similar simple arrangement that will do the trick.
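
The summing variant can be sketched as follows. This is a guess at the arrangement, not the B:CP design itself: it assumes a chain in which stage k fires only if stage k-1 fired and the k-th letter matches, and it simply adds up the stage outputs.

```python
def stage_outputs(word, seen):
    """Output of each stage in the chain (1.0 = firing, 0.0 = silent).
    Assumed rule: a stage needs its predecessor's signal plus its own letter."""
    outputs = []
    previous = 1.0  # the first stage needs no predecessor
    for k, letter in enumerate(word):
        match = 1.0 if k < len(seen) and seen[k] == letter else 0.0
        previous = previous * match
        outputs.append(previous)
    return outputs

def summed_signal(word, seen):
    """Sum of all stage outputs, scaled so a complete match gives 1.0."""
    return sum(stage_outputs(word, seen)) / len(word)

print(summed_signal("what", "what"))  # whole word present
print(summed_signal("what", "wha"))   # partial word still gives some signal
print(summed_signal("what", "waht"))  # a wrong inner letter blocks later stages
```

The partial word "wha" gets most of the full signal, while "waht" drops to the output of the first stage alone, which is the weakness noted above for wrong intermediate elements.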

Another thing we have to keep in mind is that these are visual patterns
we're looking at, so it is possible to perceive their elements
simultaneously, up to some number of letters (configuration level). It's
not nearly as easy to do this if you scramble the beginning and ending
sounds of a spoken word, where the elements occur only in temporal
sequence. If you scrambled the sounds artificially, I doubt that the words
would be anything like this easy to recognize. It's fairly easy to learn to
recognize printed words in a mirror, but backward sounds are impossible
(for me) to understand. To some extent, sounds are retained in a "specious
present" for a short time, so they might form configurations, but not
nearly as easily as printed letters do. One trick Mary taught me for
solving anagrams was to scatter the letters randomly in two dimensions.
This breaks up misleading patterns and makes new ones easier to find. This
can't be done with sounds.

I noticed also that the easiest words to recognize were those that provided
correct signals from more than one sequence-recognizer, as in "Cmabrigde".
The "bri" is a strong hint, and many of the letters are close to their
normal positions in the word. A thoroughly scrambled word that avoids all
familiar subsequences, like "Cbdrmgaie", is much harder, while "Crabdigme"
is hard for the opposite reason -- too many familiar subsequences, too many
shouts from other input functions.

I'll leave it to you to say whether this all fits in with "optimality theory."
I'm just getting the files and programs on this desktop computer back into
their original arrangement after Windows ME simply corrupted itself, with
no hardware problems, to the point where the repair tech had to reformat
the disk. Fortunately, I kept the old 20G disk from my previous machine and
had made a backup copy of everything only a few days before the crash. I am
now back running on Windows 98, since going on to W2000 would have cost
$250, and XP lets Bill Gates, or someone with less delicate sensibilities,
read over my shoulder.

Best,

Bill P.

[From David Goldstein (2003.09.21.0801)]

[From Bruce Nevin (2003.09.19 11:28 EDT)]

Dear Bruce and listmates:

While I was able to figure out what the message was saying, I felt that my eye movements were not normal. It would be interesting to do the following experiment.

Have a person read a normal paragraph while monitoring eye movements using the Visigraph. This provides quantitative information about eye movements, such as number of fixations, regressions, and duration of fixations, and provides normative comparisons.

Then have a person read a paragraph with the kind of disturbances that you talk about.

While I know that I can obtain the message with the distorted paragraph, I feel that I am not doing this in the way that I normally read.

Best regards,

David

[From Bruce Nevin (2003.09.22 16:16 EDT)]

David Goldstein (2003.09.21.0801)–

At 08:08 AM 9/21/2003 -0400, David M. Goldstein wrote:

> Have a person read a normal paragraph while monitoring eye movements using the
> Visigraph.

Do you have access to such equipment?

    /Bruce Nevin

[From David Goldstein (2003.09.23.709 EDT)]

[From Bruce Nevin (2003.09.22 16:16 EDT)]

Hi Bruce and listmates,

Used to. Easy to use. Coordinated with online reading program called ReadingPlus.

I probably can gain access through an optometrist who I turned on to using the Visigraph.

Why?


[From Bruce Nevin (2003.09.23 11:19 EDT)]

David Goldstein (2003.09.23.709 EDT)–

> Used to [have Visigraph]. Easy to use. Coordinated with online reading
> program called ReadingPlus.
>
> I probably can gain access through an optometrist who I turned on to
> using the Visigraph.
>
> Why?

Because I don’t have the Visigraph, and that was an interesting
hypothesis. Do you want to test it?

My daughter Aria forwarded a reply from the father of one of her friends
at Smith. He said that he was unable to read it aloud, though reading it
to himself posed no problem. I had no problem reading it or other
scrambled texts aloud. This probably relates to differences in reading
style (cognitive style?) of the sort noted by one of the bloggers on
the Slashdot science site. Several others wrote programs to
randomize the inner letters of words in an input text. Several noted that
there are degrees of scrambledness. Some supposed it was distance from
original location. I think it’s whether a letter is moved out of its
original morpheme. Seems to me overwhelmingly probable that there are
morpheme recognizers as well as whole-word recognizers, and that
disturbing control of the former would reduce useful input to the latter.
All testable.
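
A program of the kind mentioned, with one of the proposed "degrees of scrambledness" attached, might look like the Python sketch below. It is a minimal illustration, not any of the Slashdot programs: the function names are made up here, and the measure reported is the distance-from-original-position idea (mean number of positions each inner letter moved), not the morpheme-based one.

```python
import random
import re

def scramble_word(word, rng):
    """Shuffle the inner letters, keeping the first and last in place.
    Returns the new word and the mean distance the inner letters moved."""
    if len(word) <= 3:
        return word, 0.0
    inner = list(range(1, len(word) - 1))
    shuffled = inner[:]
    rng.shuffle(shuffled)
    new_word = word[0] + "".join(word[i] for i in shuffled) + word[-1]
    moved = sum(abs(i - j) for i, j in zip(inner, shuffled)) / len(inner)
    return new_word, moved

def scramble_text(text, seed=0):
    """Scramble every alphabetic run, leaving spaces and punctuation alone."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+",
                  lambda m: scramble_word(m.group(0), rng)[0], text)

print(scramble_text("According to a researcher at Cambridge University"))
print(scramble_word("university", random.Random(1)))
```

Testing the morpheme hypothesis would need a morpheme segmentation for each word, so that letters could be shuffled either within or across morpheme boundaries; that is left out here.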

    /Bruce