Media Asset Management
Media asset management systems are database systems for storing and retrieving audio,
video, and still images. According to a Frost & Sullivan report, the U.S. media asset
management market will grow from $152 million in 1998 to $2.58 billion in 2004.
The Inadequacy of Text Descriptions
Currently, users of media asset management systems rely on text descriptions to locate
audio and video clips. Human catalogers are burdened with the task of describing in words
each clip in the database. Without a text description, a recording may be unrecoverable,
lost in a vast sea of material, possibly never to be played again.
Sounds are extraordinarily difficult to describe in words.
Text descriptions are inherently subjective and almost always inadequate. People use
different words to describe the same sound (e.g., "honk," "beep," "toot"). Some words,
such as "ring" and "crash," apply to such a wide variety of sounds as to be almost
useless. Descriptions such as "Halloween party" and "accident scene" are often
incomplete: these recordings may contain many useful sounds, but unless someone
searches for one of the words "Halloween," "party," "accident," or "scene," these
sounds will never be retrieved.
Onomatopoeia is the formation of words to imitate sounds; examples include
"boing," "buzz," "crunch," "hiss," "pop," "screech," "thud," and "twang." Catalogers have
raised onomatopoeia to an art form in desperate attempts to describe sounds. Here are
some actual descriptions given to sounds in one commercial sound effects library:
"kablam," "gedunk," "kabong," "quick zing," "heavy zonk," "laser whooshes," "pingy wobbles,"
"whirling whippy swishes," and our favorites, "bowang," "wiggle bowang," and
"rising wiggle bowang." Words simply cannot describe the range of sounds we hear.
Describing the source of a sound is far easier than describing the sound itself,
and many catalogers resort to this approach. Most of us know the sounds of the following:
"Honda Accord idling," "several coins dropped on a tile floor," and
"roller coaster passing by." Source descriptions are much less useful if
we are unfamiliar with the sounds produced by the source, for example,
"llama vocalizing," "slab of steel emerging from a furnace," and
"water lock gates opening." Moreover, the source of a sound is often unknown.
The Needs of Sound Designers
No one has a greater need to locate sounds than sound designers, creative
professionals who combine sound effects with dialog and music to produce sound tracks
for feature films and television programs. Sounds are the primary concern of sound
designers, and they care little about the source of sounds. In fact, sound designers have
made a science out of fooling the listener's ear.
Robert L. Mott discusses this subject in his book, Sound Effects: Radio, TV, and
Film (Focal Press, 1990). He is a sound designer with 40 years of experience at CBS
and NBC. He recommends using coconut half-shells to create the sound of a horse's
hoofbeats; a piece of cellophane to create a crackling fire; a cork dipped in kerosene and
rubbed on glass for a chattering monkey or squealing rat; and buckshot rolled slowly on a
bass drum for the sound of the surf. He tells how a single recording of an African
waterfall, when played at different speeds, has been used convincingly to create the
sounds of printing presses and atomic bomb explosions.
Fictional characters provide a wonderful opportunity for creativity. In the
Star Wars movies, the voice of Chewbacca came from a walrus. In the movie
Jurassic Park, a power saw was used for the sound of the T-Rex. Reportedly,
the sound of the tornado in the movie Twister includes a lion's roar. For the
1998 version of Godzilla, sound designers spent a year developing the sound of the
monster's roar, which combined musical instrument and animal sounds with the original
roar from the 1950s Japanese films.
One goal of sound design is to meet the expectations of listeners. As Mr. Mott explains,
"A sound's origin is not as important as the listener's expectation of how something
should sound. . . In films and television, many natural sounds do not meet
everyone's expectations. When this happens, they are either replaced with more suitable
sounds or the natural sound is layered (other sounds are added) to make it more
desirable." He relates how critics panned a show that used the actual sound of a
.38 pistol for the gunshots fired from Humphrey Bogart's .38 pistol:
"The American audience was accustomed to hearing big, booming gunshots such as those
characteristic of John Wayne westerns (and done in post production)! Of course, Humphrey
Bogart's gunshots, despite their natural sound, came off sounding like an 'anemic' cap
pistol."
Sound designers choose sounds based on their sound qualities, not
their source. Text descriptions should have little or no influence on the
selection process. Mr. Mott encourages sound designers to "disassociate the names
of the sounds with the sounds themselves."
The Comparisonics Solution
The Comparisonics® sound-matching technology makes it
possible to search audio and video by sound. A prototype sound is compared with the sounds
in a database to find sounds that are similar to the prototype. The prototype may be
created by mimicking a desired sound into a microphone or by using a sound synthesizer.
More often, the prototype is a sound that has been retrieved from the database, located
through either sound matching or text searching.
After the user specifies a prototype, the system returns a list of similar sounds, ranked
by similarity to the prototype. The user may audition any of these sounds, and use any
of them as a prototype in a subsequent query. By issuing a sequence of sound-matching
queries, the user can explore the database. One sound designer called this process
"shopping for sounds." An audio database is a sound designer's palette; with
the Comparisonics sound-matching technology, it is now accessible. Furthermore, audio
retrieved from a database can be represented by its colored waveform display.
The Comparisonics technology matches sounds without regard to their source.
For example, the sound of a truck idling may match a tiger's growl. Such a match is
valuable to sound designers, and would never be discovered from text descriptions.
Audio collections without text descriptions can now be searched. If text descriptions are
available, they can be helpful for finding prototypes.
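
The query loop described above (compare a prototype with every sound in the database, rank the results by similarity, then reuse a result as the next prototype) can be illustrated with a minimal sketch. This is not the Comparisonics algorithm, which is not described here. The three-number feature vectors, the sound names, and the use of cosine similarity are all assumptions made purely for illustration.

```python
import math

# Hypothetical database: each sound is reduced to a made-up three-number
# feature vector. How real sounds are represented is not stated in the text.
DATABASE = {
    "truck idling": [0.9, 0.2, 0.1],
    "tiger growl": [0.8, 0.3, 0.2],
    "coins on tile": [0.1, 0.9, 0.7],
    "glass breaking": [0.2, 0.8, 0.9],
}


def cosine_similarity(a, b):
    """Similarity of two feature vectors (assumed measure, 1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(prototype, database, exclude=None):
    """Return (name, score) pairs, most similar first, ignoring text labels."""
    scored = [
        (name, cosine_similarity(prototype, features))
        for name, features in database.items()
        if name != exclude
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# First query: use a sound already in the database as the prototype.
results = rank_by_similarity(
    DATABASE["truck idling"], DATABASE, exclude="truck idling"
)
best_match = results[0][0]

# Follow-up query: the best match becomes the next prototype.
follow_up = rank_by_similarity(DATABASE[best_match], DATABASE, exclude=best_match)

for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these invented vectors, the truck prototype ranks the tiger growl first even though the two labels share no words, which is the kind of source-independent match a text search would miss.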
The value of an audio or video archive is greatly enhanced when the archive can be
searched effectively. Audio collections include sound effects, sample sounds, music,
radio programs and commercials, speeches, voice mail, and audio resumés of voiceover
talent. Video archives include films, television programs and commercials, music videos,
instructional tapes, and stock video footage. The larger the collection, the greater
the need for search tools. Archives used in Hollywood are measured in terabytes.