Florida Voices
Florida Voices is an initiative of the Florida Electronic Library to support all types of libraries and cultural heritage organizations in Florida.

II. Preservation

  1. Recommended digital recording settings and file formats
  2. Preservation agreements
  3. Guidelines for naming files
  4. Copying files and labeling media for preservation and access
  5. Records management
  6. Preservation guides
1. Recommended digital recording settings and file formats

The most important factor determining the quality of a recording, whether analog or digital, is the quality of the recorder's pre-amp, the circuit that amplifies the weak electrical signal from the microphone before it is recorded. There is ongoing debate over the merits of various digital file formats, such as .mp3, .wav and .bwf, as well as various recording parameters that determine file size and sound quality. In practical terms, in order to achieve the highest possible sound quality and the maximum range of potential uses (including the ability to take advantage of as-yet uninvented technologies), one must invest more time and money in purchasing and learning to use higher-end equipment as well as acquiring more disk or server storage capacity. Given the real-world limitations of most oral history programs, the most widely agreed-upon compromise is CD-quality audio recorded in a .wav format at a bit-depth of 16 bits and a sampling rate of 44.1 kHz. This is the standard that the music industry uses, and it also drives the market for most mass-produced consumer audio devices.

Digital archivists, professional sound engineers, and other digital media experts make a strong case for adopting a higher standard if the recording is worth preserving for the future. The larger the bit-depth and the higher the sampling rate, the more detail can be captured in the recording and the closer it comes to being an exact replica of the original sound. The trade-off is larger file sizes, which increase the amount and cost of storage space required. Another reason for a higher bit-depth and sampling rate is forgivability: if the recording levels were too low or there was unwanted background noise and the recording requires sound editing, a smaller, lower-resolution file is harder to improve than a larger, higher-resolution one. See Sound Devices, "Real World Advantages of 24-bit Recording."
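To give a rough sense of how file size grows with bit-depth and sampling rate, the following sketch computes the size of uncompressed PCM audio data. The function name `pcm_bytes` is hypothetical, and the 24-bit, 96 kHz setting is only an assumed example of a "higher standard"; the text above does not prescribe a specific higher setting.

```python
def pcm_bytes(seconds, sample_rate_hz, bit_depth, channels=1):
    """Size in bytes of uncompressed PCM audio data (file header excluded)."""
    bytes_per_sample = bit_depth // 8
    return seconds * sample_rate_hz * bytes_per_sample * channels


ONE_HOUR = 3600  # seconds

# CD-quality settings from the text: 16 bits at 44.1 kHz, recorded in mono.
cd_quality = pcm_bytes(ONE_HOUR, 44_100, 16)

# Assumed higher-resolution setting, for comparison only.
high_res = pcm_bytes(ONE_HOUR, 96_000, 24)

if __name__ == "__main__":
    print(f"CD quality, mono, 1 hour:    {cd_quality / 1_000_000:.0f} MB")
    print(f"24-bit/96 kHz, mono, 1 hour: {high_res / 1_000_000:.0f} MB")
    print(f"Ratio: {high_res / cd_quality:.1f}x")
```

Under these assumptions, one hour at the higher setting takes a little more than three times the storage of one hour of CD-quality audio. Stereo recording doubles both figures.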

Audio files should ideally be created in an uncompressed PCM (Pulse Code Modulation) format, either .wav or .bwf. On the other hand, as the recording capability and storage capacity of devices such as iPods continue to improve, including the ability to record at higher bit-depth and sampling rates, inexpensive recording in .mp3 format may become more compatible with good sound quality than is currently the case. Recording with iPods is a particularly good option for classroom oral history projects. Refer to the USF Oral History Program's guide, "Recording with iPods."

PCM formats: WAV, BWF and AIFF
In the United States, the .wav format is the most widely accepted file format for digital audio master files. FCLA’s WAVE Action Plan notes that “the BWF (Broadcast Wave Format), developed by the European Broadcasting Union, is based on the WAVE format. While a WAVE file can use any one of a number of compression formats, BWF files can only use either PCM format or MPEG compression. A BWF file has at least one extra chunk that a WAVE file does not - the Broadcast Audio Extension chunk. This chunk contains material that broadcasters would exchange with each other, like a textual description of the sound file. The EBU came up with a core metadata set for radio archives that is based on the Dublin Core metadata. This metadata can be stored in the <axml> chunk that was added to the BWF specification in 2003. A BWF is compliant with the WAVE format and uses the WAVE file extension (wav) so WAVE players can play BWF files (but can not parse any added metadata).” Some audio manufacturers, such as Sound Devices, are merging the .wav and .bwf formats because “the .bwf file extension ended up causing more confusion than it eliminated” (Sound Devices Sound Notes). Some oral history programs also use the AIFF format. The most important factor from a preservation standpoint is that digital audio be recorded in an uncompressed Pulse Code Modulation (PCM) format.
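Because a BWF file shares the .wav extension, the only way to tell whether a given file carries the Broadcast Audio Extension chunk, or whether it is uncompressed PCM, is to look inside it. The sketch below walks the chunks of a RIFF/WAVE file using only the Python standard library. The function names `inspect_wave` and `has_broadcast_extension` are hypothetical; the chunk ID `bext` and the format tag value 1 for PCM come from the WAVE and BWF specifications. It is a minimal illustration, not a validator.

```python
import struct

PCM_FORMAT_TAG = 1  # value of the format tag in the "fmt " chunk for plain PCM


def inspect_wave(path):
    """Return (chunk_ids, format_tag) for a RIFF/WAVE file.

    chunk_ids lists the top-level chunk IDs in file order; format_tag is
    the first field of the "fmt " chunk, or None if no such chunk exists.
    """
    chunk_ids = []
    format_tag = None
    with open(path, "rb") as f:
        preamble = f.read(12)
        if len(preamble) < 12:
            raise ValueError("file too short to be RIFF/WAVE")
        riff, _size, wave_id = struct.unpack("<4sI4s", preamble)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            chunk_id, size = struct.unpack("<4sI", header)
            chunk_ids.append(chunk_id.decode("ascii", errors="replace"))
            body_start = f.tell()
            if chunk_id == b"fmt ":
                (format_tag,) = struct.unpack("<H", f.read(2))
            # Chunk bodies are padded to an even number of bytes.
            f.seek(body_start + size + (size & 1))
    return chunk_ids, format_tag


def has_broadcast_extension(path):
    """True if the file contains a Broadcast Audio Extension ("bext") chunk."""
    chunk_ids, _ = inspect_wave(path)
    return "bext" in chunk_ids
```

A format tag other than 1 does not always mean the audio is compressed (the extensible WAVE format uses a different tag and records the actual encoding elsewhere), so a result other than 1 is best treated as a prompt to inspect the file with a dedicated tool.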

The national standard for digital file formats can be found at Sustainability of Digital Formats: Planning for Library of Congress Collections. Similarly, the Florida Center for Library Automation (FCLA) has developed preservation guidelines for the Florida Digital Archive (FDA), including estimates of the long-term viability of various digital formats. Formats rated high or medium confidence level receive full preservation under the agreements between FCLA and FDA participating institutions among Florida's eleven state universities. Formats rated low confidence level receive bit-level preservation. In general, files that are encrypted, compressed, lossy, or in proprietary formats are the most difficult to preserve. For optimum preservation, files in these low confidence level formats should be converted to formats that are unencrypted, uncompressed, lossless, and open-source.

For preparing the media types typically used in oral history (text, audio, and video) for professional digital preservation, FCLA recommends:

Text: convert documents created with word processing programs such as Microsoft Word or WordPerfect to PDF/A-1 (ISO 19005-1) (*.pdf)
Audio: record in or convert to WAV (PCM) (*.wav, *.bwf)
Video: record in or convert to Motion JPEG (*.avi, *.mov), Motion JPEG 2000 (ISO/IEC 15444-4) (*.mj2), AVI (uncompressed) (*.avi), QuickTime Movie (uncompressed) (*.mov)
FCLA preservation ratings for oral history media formats
Text (transcripts and other textual documents)

High confidence level:

  • Plain text(encoding: USASCII, UTF-8, UTF-16 with BOM)
  • XML(includes XSD/XSL/XHTML, etc.; with included or accessible schema and character encoding explicitly specified)
  • PDF/A-1(ISO 19005-1)(*.pdf)

Medium confidence level:

  • Cascading Style Sheets(*.css)
  • DTD(*.dtd)
  • Plain text(ISO8859-1 encoding)
  • PDF(*.pdf)(embedded fonts)
  • Rich Text Format 1.x(*.rtf)
  • HTML 4.x(include a DOCTYPE declaration)
  • SGML(*.sgml)
  • Open Office(*.sxw, *.odt)
  • Office Open XML(*.docx)

Low confidence level:

  • PDF(*.pdf)(encrypted)
  • Microsoft Word(*.doc)
  • WordPerfect(*.wpd)
  • DVI(*.dvi)
  • All other text formats not listed here
Audio

High confidence level:

  • AIFF(PCM)(*.aif, *.aiff)
  • WAV(PCM)(*.wav, *.bwf)

Medium confidence level:

  • SUN Audio(uncompressed)(*.au)
  • Standard MIDI(*.mid, *.midi)
  • Ogg Vorbis(*.ogg)
  • Free Lossless Audio Codec(*.flac)
  • Advanced Audio Coding(*.mp4, *.m4a, *.aac)
  • MP3(MPEG-1/2, Layer 3)(*.mp3)

Low confidence level:

  • AIFC(compressed)(*.aifc)
  • NeXT SND(*.snd)
  • RealNetworks 'Real Audio'(*.ra, *.rm, *.ram)
  • Windows Media Audio(*.wma)
  • WAV(compressed)(*.wav)
  • All other audio formats not listed here
Video

High confidence level:

  • Motion JPEG 2000(ISO/IEC 15444-4)(*.mj2)
  • AVI(uncompressed)(*.avi)
  • QuickTime Movie(uncompressed)(*.mov)
  • Motion JPEG(*.avi, *.mov)

Medium confidence level:

  • Ogg Theora(*.ogg)
  • MPEG-1,MPEG-2(*.mpg, *.mpeg)
  • MPEG-4(*.mp4)

Low confidence level:

  • AVI(compressed)(*.avi)
  • QuickTime Movie(compressed)(*.mov)
  • RealNetworks 'Real Video'(*.rv)
  • Windows Media Video(*.wmv)
  • All other video formats not listed here
2. Preservation agreements

Ideally, a digital oral history program should be supported by an agreement with a library or archives to ensure professional standards of cataloging, preservation and public access for all interview materials. The preservation agreement should include secure storage, verification of file fixity, media migration as necessary, and format migration to newer formats as file formats threaten to become obsolete.
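Verification of file fixity usually means recording a checksum when a file is deposited and recomputing it later to confirm the file has not changed. The following is a minimal sketch using Python's standard `hashlib` module. The choice of SHA-256 and the function names `sha256_of` and `fixity_ok` are assumptions made for illustration; a repository may use a different algorithm or tool.

```python
import hashlib


def sha256_of(path, block_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in blocks
    so that large audio masters do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def fixity_ok(path, recorded_checksum):
    """True if the file's current checksum matches the one recorded earlier."""
    return sha256_of(path) == recorded_checksum.strip().lower()
```

In practice the checksum would be computed once, when the master file is first saved, and stored alongside the interview's documentation so that later copies can be checked against it.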

Click on the links below for additional resources on preservation agreements between oral history programs and archival repositories, including sample agreements:

3. Guidelines for naming files

Once digital audio files have been uploaded from the recorder to a computer hard drive, it is very important to name all the files generated by each interview using a consistent system that will create unique ID numbers and ensure that basic information about the provenance of the interview remains attached to the file. According to the Digital Library Federation, "File naming should follow ISO 9660 conventions: 8-character filenames, 3-character extensions, using A-Z, a-z, 0-9, underscores and hyphens. The rationale behind this suggestion is that when moving texts across different platforms (DOS for instance), some systems will truncate beyond the eighth character." (Digital Library Federation, "TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices" Version 2.1)

That being said, with only 8 characters in a file name, it is difficult to include basic information such as the interviewee name and date. Many oral history programs do not adhere to 8 characters, but use the accession number of the interview as the basis for each unique file number, so that transcripts, master audio, and derived files (such as sound-edited or streaming web audio) can be identified as being from the same interview, and can be matched with paper documentation on file. An example of a file name based on the accession number, last name of interviewee, date of interview (YYYYMMDD), and record type would be: 00137_smith_20071203_trans. For audio that requires editing to improve sound quality, both the original and edited files should be saved in .wav format and archived for digital preservation. Since most master files should not require sound editing unless there are problems during recording, including edited versions of audio master files should not significantly add to storage space requirements. Be sure to specify the inclusion of both edited and original master files in the preservation agreement with an archival repository, if applicable.
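The accession-number naming scheme described above can be generated consistently with a small helper. This is a sketch only: the function name `interview_filename` is hypothetical, the five-digit zero padding is inferred from the single example 00137_smith_20071203_trans, and record type labels other than "trans" are assumed.

```python
import re
from datetime import date


def interview_filename(accession, last_name, interview_date, record_type, extension=""):
    """Build a name of the form 00137_smith_20071203_trans.

    Characters in the last name other than a-z, 0-9 and hyphens are
    dropped, so the result stays within the character set quoted above.
    """
    name = re.sub(r"[^a-z0-9-]", "", last_name.lower())
    stem = f"{accession:05d}_{name}_{interview_date:%Y%m%d}_{record_type}"
    return f"{stem}.{extension}" if extension else stem


if __name__ == "__main__":
    print(interview_filename(137, "Smith", date(2007, 12, 3), "trans"))
    print(interview_filename(137, "Smith", date(2007, 12, 3), "master", "wav"))
```

Generating names from the accession record, rather than typing them by hand, keeps the transcript, master audio, and derived files for one interview sorted together.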

4. Copying files and labeling media for preservation and access

When using CD burning software to save sound files to CD, there are two disk format options that can be chosen: CD-DA (audio format) and CD-R (data format). Note that the same compact disks (media) are used for both CD-DA and CD-R; only the format of the recorded file is different.

CD-DA (Compact Disc - Digital Audio) is the official designation for the audio-only format on CD. Audio CDs can be read by CD players as well as computers. An audio CD can store up to 74 minutes and 30 seconds of sound, so longer interviews will have to be divided into two or more parts. If the file is saved in a data format (CD-R), it can only be read by a computer, not a CD player. A data CD holds 650MB, which is roughly twice the recording time of an audio-formatted CD, because at the recording settings for CD-quality audio in mono, one hour of recording time creates a file of approximately 300MB.

Archival quality DVDs can hold up to 4.7GB of data, but some digital archivists have expressed concern that DVDs are a less stable format than CDs, since a different chemical process is used to bind the layers of the DVD.

External hard drives, available in a range of storage capacities, are also a good solution, especially for individuals or programs that do not have preservation support from an institutional repository. An external drive connects to a PC via USB or FireWire and usually comes packaged with back-up software that facilitates the daily creation of backup copies manually or automatically. An advantage to having all the files for a project on one external drive is that it makes transporting and downloading large amounts of data easier than having to burn multiple CDs. It is still advisable, however, that any oral history program that goes digital should pursue a repository relationship with an archive that can provide professional preservation and support, since the ideal method of backing up digital files is on a dedicated server with digital preservation protocols in place.
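The capacity figures above can be checked with a little arithmetic. The sketch below estimates how many audio CDs an interview of a given length needs and how many minutes of mono CD-quality audio fit on a data CD. The names are hypothetical, and 650MB is interpreted here as 650 × 1024 × 1024 bytes, which is an assumption.

```python
import math

BYTES_PER_SECOND_MONO = 44_100 * 2      # 44.1 kHz, 16-bit (2 bytes), one channel
AUDIO_CD_SECONDS = 74 * 60 + 30         # audio CD limit quoted in the text
DATA_CD_BYTES = 650 * 1024 * 1024       # 650MB read as binary megabytes (assumption)


def audio_cds_needed(interview_seconds):
    """Number of audio (CD-DA) discs needed for an interview of this length."""
    return math.ceil(interview_seconds / AUDIO_CD_SECONDS)


def data_cd_minutes_mono():
    """Minutes of mono CD-quality PCM audio that fit on one data CD."""
    return DATA_CD_BYTES / BYTES_PER_SECOND_MONO / 60


if __name__ == "__main__":
    print(audio_cds_needed(90 * 60), "audio CDs for a 90-minute interview")
    print(f"{data_cd_minutes_mono():.0f} minutes of mono audio per data CD")
```

Under these assumptions a data CD holds a little over two hours of mono audio, which is consistent with the approximate figure of 300MB per hour given above.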

Following these steps will ensure you have adequate copies for use and preservation:

  • Upload master audio files (WAVE or Broadcast WAVE format) to a computer hard drive or server. If recorded on a compact flash card, insert the card into the appropriate computer port or use a flash card reader if your computer is older and doesn't have the right port. You can also connect the recorder to the computer USB port with an I/O cable. Name the files as described above.
  • Burn two preservation copies (1 audio CD-DA and 1 data CD-R) of the master file in .wav format to MAM-A (Mitsui) gold archival discs. For non-preservation purposes such as listening, transcription and distribution to interviewees, use CD burning or sound editing software to derive .mp3s and write to cheaper, non-archival audio CDs. Keep one preservation audio CD and one use audio CD onsite. If you have an agreement with an archival repository, send a preservation data CD for cataloging and preservation. If not, make several back-up copies.
  • After the master audio file has been transferred and safely backed up, erase and reuse the flash card or other storage media. Label write-once (non-reusable) recording media on the inside clear ring of the disk only, with felt tip pens approved for CD/DVD labeling. Do not use adhesive labels. For non-preservation copies, which are used by or sent out to the public, CDs or DVDs can be silkscreened with program information (logo, contact info, copyright info, etc.) and remaining identifying information filled in with CD/DVD labeling pens. The National Institute of Standards and Technology (NIST) has an online publication called "Care and Handling of CDs and DVDs: A Guide for Librarians and Archivists" (see pages 21 through 26 for guidelines on labeling). Store the media in a stable environment away from excessive heat, light, and humidity. For audio tapes, be sure to limit access to the original and only use copies for transcribing and listening.
5. Records management

Each interview should be tracked using paper documents and a database. The interviewer should submit the following paper documents to be kept on file for each interview:

  1. Interview cover sheet and checklist
  2. Interviewee life history form
  3. Proper name form
  4. Field notes
  5. Release form
  6. Transcript or recording index (if interview will not be transcribed)
  7. Any additional documentation provided by the interviewee, such as resume or CV, photographs, or memorabilia.
Sample forms are available at:

An index card or database record can then be created for each interview, using the information on the interview cover sheet.
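For programs that choose a database over index cards, a very small tracking table may be enough to start. The sketch below uses Python's built-in `sqlite3` module. The table layout, column names, and function names are all hypothetical, since the fields on the interview cover sheet are not listed here; they should be adapted to the program's own forms.

```python
import sqlite3

# Hypothetical schema: adjust the columns to match the interview cover sheet.
SCHEMA = """
CREATE TABLE IF NOT EXISTS interviews (
    accession_number     TEXT PRIMARY KEY,
    interviewee_name     TEXT NOT NULL,
    interview_date       TEXT NOT NULL,  -- YYYYMMDD, as in the file names
    interviewer          TEXT,
    release_form_on_file INTEGER NOT NULL DEFAULT 0
)
"""


def open_tracking_db(path=":memory:"):
    """Open (or create) the tracking database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_interview(conn, accession_number, interviewee_name, interview_date,
                  interviewer=None, release_form_on_file=False):
    """Insert one record per interview; the accession number must be unique."""
    with conn:
        conn.execute(
            "INSERT INTO interviews VALUES (?, ?, ?, ?, ?)",
            (accession_number, interviewee_name, interview_date,
             interviewer, int(release_form_on_file)),
        )


def missing_release_forms(conn):
    """Accession numbers of interviews with no release form on file."""
    rows = conn.execute(
        "SELECT accession_number FROM interviews "
        "WHERE release_form_on_file = 0 ORDER BY accession_number"
    )
    return [row[0] for row in rows]
```

Keying the table on the accession number ties each database record to the paper file and to the digital file names derived from the same number.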

6. Preservation guides
Of course, creating preservation and use copies on optical media is only the beginning of effective long-term digital preservation. Cornell University's Lab of Ornithology offers a comprehensive example of state-of-the-art audio digitization and preservation methods that can be applied to oral history as well as field recordings of birdcalls. The Field Audio Collection Evaluation Tool (FACET) is a point-based, open-source software tool that assesses and ranks audio field collections based on preservation condition, including the level of deterioration they exhibit and the degree of risk they carry. Additional resources on digital preservation best practices are listed below.