Words and perceptions

[Martin Taylor 960422 1111]

Bill Powers (960420.0300 MDT)

The night-owl calls again, I see...

> Words in general underspecify perceptual experience, not so? I'm still
> considering a demo in which one person verbally guides another to create
> some sort of result on a computer screen; the guide can't see the
> screen. I think this demo could show very dramatically what it is about
> experience that words always leave out.

It's not exactly what you describe, but I did something rather like it as
part of the same sleep-deprivation study for which we have so much tracking
data. One person was required to describe to another a route on a map.
The map consisted of a sheet of A3 (or 11 x 17) paper on which were
depicted several named landmarks. Each participant had a map, and neither
could see the other participant or the other's map. The two maps differed
in controlled ways: most landmarks were identical and identically placed,
but some landmarks might be named differently on the two maps, or a
landmark might be missing or duplicated on one or other of the maps. The
"giver's" map had a route marked on it that wound its way from a "Start"
that was marked identically on both maps to an "end" marked only on the
giver's map. The "follower's" job was to mark as closely as possible the
same route on his/her map.

Over the course of the sleep deprivation study, 216 separate dialogues
were recorded. All of them have been transcribed, and are available both
as digitized audio and as transcriptions, together with displays of the
giver's and resulting follower's maps, on a set of 12 CD-ROMs available from
the Linguistic Data Consortium at a nominal price (I think $200 US, but
that's from faulty memory).

I had intended (still do) to try analyzing these dialogues using Layered
Protocol Theory, but since both the actual voice and resulting pictures
are available, anyone can use them for any kind of analysis. I know some
people want to look at changes in voice quality with sleep deprivation and
with the various drugs. Other people want to look at "riskiness" (I'd say
control gain) in tolerating possible error, by either giver or follower
or both. And they could be used to look at how the actual visual impression
of the follower's map agrees with the giver's map when both think they are
satisfied (which should be something like Bill's suggestion).

The official announcement of the availability of the corpus follows.

Martin

============================

                 DCIEM SLEEP DEPRIVATION STUDY MAP TASK CORPUS

             Defence and Civil Institute of Environmental Medicine
                          North York, Ontario, Canada

                      Human Communication Research Centre
              University of Edinburgh & University of Glasgow, UK

                              under the aegis of
    NATO DRG Panel 3 Research Study Group 10 (Automatic Speech Processing)

            Corpus Copyright 1995 DCIEM; Distributed by HCRC & LDC
         Pre-mastering by Speech Data Services Ltd, Great Malvern, UK.

   The DCIEM Sleep Deprivation Map Task Corpus is the product of a
   collaboration between HCRC and DCIEM. Like its predecessor, the HCRC Map
   Task Corpus, the DCIEM Corpus is a large scale balanced elicitation
   experiment for spontaneous speech. It consists of 216 unscripted
   task-oriented dialogues produced by 35 normal Canadian adults
   participating in a sleep-deprivation study. Of the 216 dialogues,
   approximately 60 were recorded as control material before
   sleep-deprivation began, 138 during sleep deprivation, and 18 after
   recovery sleep. During the 60-hour work-filled sleepless period which
   began on the second day of recording, subjects were assigned to drug
   treatment groups (amphetamines, Modafinil, placebo) on a double-blind
   regime.

   The map task dialogues were direct analogues of those contained in part
   of the HCRC Map Task Corpus and the Chiba Map Task Corpus. Pairs of
   speakers worked with slightly different schematic maps of imaginary
   locations, collaborating to reproduce on one of the maps the route
   preprinted on the other. Neither could see the other's map.
   Participants knew that their maps differed but not where or how. No
   restrictions were placed on what subjects could say. Different pairs of
   maps used over the course of the study presented both a range of
   phonological material in the form of labels on landmarks and a balanced
   set of differences between maps.

   All dialogues took place in studio-like conditions and were recorded on
   DAT with a separate channel and close-talking microphone for each
   participant. Subjects worked either with the same partner throughout the
   week-long study or with 2 different partners, each of whom also had 2
   partners. All subjects served both as givers and as receivers of
   instructions.

   The resulting 216 dialogues include over 175,000 word tokens representing
   approximately 1,900 different words. For each dialogue, the Corpus
   includes a digitized speech file, an orthographic transcription including
   the HCRC Map Task Corpus SGML-type annotations and time-stamps at the
   onset of each contribution, scanned model and copied maps, a NIST header
   file, and a TEI-entry point. All 12 CD-ROMs of the Corpus also contain
   explanatory information and a detailed account of the experimental design
   and the speaker characteristics. Part 1 of the Corpus includes all
   materials for 54 dialogues which comprise a balanced study of drug
   treatment and sleep deprivation. Part 2 contains the rest of the
   materials.

   The materials have been designed to be easily accessible to users with
   different equipment and a variety of needs. All the text files should be
   readable and printable via most systems which can be connected to a
   CD-ROM reader. The maps are intended for printing via PostScript(TM)
   printers, and the speech files are provided with human-readable standard
   headers, enabling them to be played by a wide range of environments for
   processing sampled speech. Local and public domain software for
   accessing the speech on a variety of platforms is also included.

   The DCIEM Sleep Deprivation Study Map Task Corpus carries no warranty of
   any kind. For more information, consult

   The old www.cogsci.ed.ac.uk server

   or maptask@cogsci.ed.ac.uk