- Multimedia integration
- Synchronizing text with audio using SMIL
- Letting text drive the audio
- Additional readings
|
It is not necessary to synchronize
transcripts with audio or video recordings. Many
users of oral history will be happy to have the full
transcript available in one piece. However, if
the budget allows, synchronizing audio and text may
benefit some listeners by helping them maintain their
place in the transcript and by making it easier to
skip around in the interview.
Synchronization is most commonly accomplished using
SMIL or SAMI. SMIL (Synchronized Multimedia
Integration Language) is a W3C standard markup language
for multimedia presentations. SAMI (Synchronized
Accessible Media Interchange) is a similar language
developed by Microsoft. SMIL is supported by QuickTime
and RealPlayer, while SAMI is used by Windows Media
Player.
Both SAMI and SMIL are HTML-like languages that
can be written by hand or generated by a
program. Although few programs are designed
specifically for oral history, many products
can be adapted to the task. Captioning
tools designed to increase Web accessibility for
the hearing impaired work well for oral history
too. These include MAGpie,
a free tool developed by the CPB/WGBH National Center
for Accessible Media (NCAM), and Hi
Caption Studio, which
costs about $500. Both of these products can
output either SMIL or SAMI.
Web accessibility sites are good sources of "how
to" information about captioning for common
media players. See:
|
SMIL 2.0, published in 2001, is
a complicated language not yet fully supported by any
media player. However, a simple SMIL (pronounced "smile")
file to associate sections of transcript with corresponding
audio can easily be made by hand. Like HTML,
a SMIL file is just tagged text with a <head> section
and a <body> section. The basic format of a
SMIL file is:
<smil>
<head>
</head>
<body>
</body>
</smil>
|
The <head> section defines the layout of the
presentation, for example:
<head>
  <layout>
    <root-layout width="800" height="800" background-color="white"/>
    <region id="banner" top="0" left="0" height="42"/>
    <region id="a" top="43" left="0" height="500"/>
  </layout>
</head>
|
Here we define a screen of 800 x 800 pixels with
two regions. The region called "banner" starts
at the top left-hand corner and is 42 pixels high. The
region called "a" starts right below the
banner region and is 500 pixels high.
<body>
  <par>
    <img src="floridaVoices.jpg" region="banner" dur="165s"/>
    <audio src="Gross.MP3" />
    <text id="pt1" dur="25s" src="Gross-1.txt" region="a"/>
    <text id="pt2" begin="+25s" dur="24s" src="Gross-2.txt" region="a" />
    <text id="pt3" begin="+49s" dur="25s" src="Gross-3.txt" region="a" />
    <text id="pt4" begin="+74s" dur="80s" src="Gross-4.txt" region="a" />
    <text id="pt5" begin="+154s" dur="11s" src="Gross-5.txt" region="a" />
  </par>
</body>
|
The <body> section defines the presentation. The
section above identifies seven media files: one image, one
audio file, and five text files. The image will
be displayed in the previously defined region called "banner",
and the text files will display in the region called "a". The
transcript is synchronized with the audio by the begin and dur attributes. For
example, the section of transcript called "pt2" will
begin displaying after 25 seconds and remain on
the screen for 24 seconds. Presumably, that
corresponds to the start and end time of the appropriate
audio segment. The <par> (parallel) tag makes everything
within it play or display at the same time, subject
to each element's own timing attributes. Without it,
the elements would play one after another, and
the text would not appear until the audio had finished.
To execute the full SMIL file, click here (you
may have to install RealPlayer).
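The begin values in the example are simply running totals of the preceding dur values, so they can be computed rather than typed by hand. The following Python sketch shows one possible way to do that. The function name text_elements and its parameters are illustrative assumptions, not part of SMIL or of any tool mentioned here; only the file naming pattern and the durations are taken from the example above.

```python
from xml.sax.saxutils import quoteattr


def text_elements(durations, stem="Gross", region="a"):
    """Return one SMIL <text> line per transcript segment.

    durations: length of each segment in whole seconds, in playback order.
    Each segment begins where the previous one ended.
    """
    lines = []
    begin = 0
    for number, dur in enumerate(durations, start=1):
        attrs = [("id", f"pt{number}")]
        if begin > 0:
            # The first segment starts together with the <par>, so it needs no begin.
            attrs.append(("begin", f"+{begin}s"))
        attrs.append(("dur", f"{dur}s"))
        attrs.append(("src", f"{stem}-{number}.txt"))
        attrs.append(("region", region))
        rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
        lines.append(f"<text {rendered}/>")
        begin += dur
    return lines


# Durations copied from the five <text> elements in the example.
segments = text_elements([25, 24, 25, 80, 11])
print("\n".join(segments))
```

The printed lines can be pasted inside the <par> element. The durations add up to 165 seconds, which matches the dur of the banner image in the example.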
This is a deliberately simple example designed to show
one technique: breaking up a transcript into multiple
files, each containing a short segment of text, and
timing the display of each file to correspond to the
spoken audio. This can be encoded in SMIL in
several different ways, and the layout designed in
the <head> section can be far more sophisticated
than that shown here. For more information see
the official SMIL web
page on the World Wide Web Consortium website. This
includes links to different versions of the specification,
a list of authoring tools, and several SMIL manuals
and tutorials.
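If the transcript already carries time markers, the splitting step could be scripted. The sketch below is only an illustration under stated assumptions: it supposes that a line holding nothing but a marker of the form [MM:SS] starts each new segment, which is a hypothetical convention rather than anything the tools above produce, and the sample transcript text is made up for the demonstration.

```python
import re

# Hypothetical marker format: a line holding only [MM:SS] starts a new segment.
MARKER = re.compile(r"^\[(\d+):(\d{2})\]\s*$")


def split_transcript(transcript, total_seconds):
    """Return a list of (begin, dur, text) tuples, one per marked segment."""
    starts, chunks = [], []
    for line in transcript.splitlines():
        match = MARKER.match(line.strip())
        if match:
            starts.append(int(match.group(1)) * 60 + int(match.group(2)))
            chunks.append([])
        elif chunks:
            chunks[-1].append(line)
    # A segment ends where the next one starts; the last ends with the audio.
    ends = starts[1:] + [total_seconds]
    return [
        (begin, end - begin, "\n".join(body).strip())
        for begin, end, body in zip(starts, ends, chunks)
    ]


# Invented sample text, used only to exercise the function.
sample = """[00:00]
Q: Where were you born?
[00:25]
A: In a small town in north Florida.
[00:49]
Q: What did your parents do?
"""
parts = split_transcript(sample, total_seconds=74)
for number, (begin, dur, text) in enumerate(parts, start=1):
    print(f"segment {number}: begin={begin}s dur={dur}s text={text!r}")
```

Each text value could then be written to its own file, and the begin and dur values used for the matching <text> element.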
|
The technique described above is useful for audiences
who want to listen to an entire interview while reading
along. Often, however, researchers skim
the text of the transcript looking for particular
topics and want to play the audio corresponding
to a selected section of text. In other words,
they want the text to drive the audio, rather
than the other way around.
This can be done by breaking up the audio of the
interview into smaller files and using MS Word or
another editor to insert links to the audio segments
in the appropriate places in the transcript. There
are many inexpensive programs that will split audio
files, including MP3
Splitter and Joiner (about $20 from EZ Softmagic,
very easy to use) and Audacity (a free,
general-purpose audio editor).
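For readers who prefer scripting to a desktop tool, the following Python sketch shows the general idea of cutting a recording into fixed-length pieces. It is an illustration only: it assumes uncompressed WAV input, because Python's standard library can read and write WAV but not MP3, and the function names split_wav and make_silence are invented for this example.

```python
import io
import wave


def split_wav(source, segment_seconds):
    """Split a WAV stream into consecutive chunks of at most segment_seconds.

    source: a filename or a binary file-like object.
    Returns a list of bytes objects, each one a complete WAV file.
    """
    pieces = []
    with wave.open(source, "rb") as reader:
        params = reader.getparams()
        frames_per_piece = int(reader.getframerate() * segment_seconds)
        while True:
            frames = reader.readframes(frames_per_piece)
            if not frames:
                break
            buffer = io.BytesIO()
            with wave.open(buffer, "wb") as writer:
                writer.setparams(params)
                # The frame count in the header is corrected when the writer closes.
                writer.writeframes(frames)
            pieces.append(buffer.getvalue())
    return pieces


def make_silence(seconds, rate=8000):
    """Build a mono 16-bit WAV of silence in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(b"\x00\x00" * (rate * seconds))
    buffer.seek(0)
    return buffer


pieces = split_wav(make_silence(5), segment_seconds=2)
lengths = []
for piece in pieces:
    with wave.open(io.BytesIO(piece), "rb") as reader:
        lengths.append(reader.getnframes() / reader.getframerate())
print(lengths)
```

A fixed-length cut like this will usually fall in the middle of a sentence, so in practice the cut points would be chosen by ear at question boundaries, as the tools named above allow.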
Here is
an example of a page where each question and answer
is a separate audio segment. In practice,
however, you would probably create longer segments;
some projects have recommended three minutes per segment. For
a real-world application of this technique, see Oral
Histories of the American South. This
grant-funded project developed a tool for synchronizing
audio and text that might in the future be made available
to other oral history projects.
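As a rough illustration of the linking step, the Python sketch below packs consecutive question-and-answer exchanges into segments of roughly three minutes and writes an HTML link to an audio file before each segment's text. Everything named here is an assumption made for the example: the functions group_exchanges and to_html, the interview-N.mp3 file naming, and the placeholder exchange lengths and texts. It is not the tool developed by the project mentioned above.

```python
from html import escape


def group_exchanges(exchanges, target_seconds=180):
    """Pack consecutive (seconds, text) exchanges into segments.

    A segment is closed as soon as its running length reaches target_seconds,
    so segments may run over the target but never split an exchange.
    """
    segments, current, length = [], [], 0
    for seconds, text in exchanges:
        current.append(text)
        length += seconds
        if length >= target_seconds:
            segments.append((length, current))
            current, length = [], 0
    if current:
        segments.append((length, current))
    return segments


def to_html(segments, stem="interview"):
    """Render each segment as an audio link followed by its transcript text."""
    blocks = []
    for number, (length, texts) in enumerate(segments, start=1):
        link = f'<p><a href="{stem}-{number}.mp3">Listen ({length}s)</a></p>'
        body = "\n".join(f"<p>{escape(text)}</p>" for text in texts)
        blocks.append(link + "\n" + body)
    return "\n".join(blocks)


# Placeholder exchanges: (length in seconds, transcript text).
exchanges = [
    (70, "Q1 and answer"),
    (95, "Q2 and answer"),
    (40, "Q3 and answer"),
    (120, "Q4 & answer"),
]
segments = group_exchanges(exchanges)
page = to_html(segments)
print(page)
```

The audio files the links point to would need to be cut at the same boundaries, so the segment lengths computed here would also determine where the audio is split.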