Media Asset Management

Media asset management systems are database systems for storing and retrieving audio, video, and still images. According to a Frost & Sullivan report, the U.S. media asset management market will grow from $152 million in 1998 to $2.58 billion in 2004.


The Inadequacy of Text Descriptions

Currently, users of media asset management systems rely on text descriptions to locate audio and video clips. Human catalogers are burdened with the task of describing in words each clip in the database. Without a text description, a recording may be unrecoverable, lost in a vast sea of material, possibly never to be played again.

Sounds are extraordinarily difficult to describe in words. Text descriptions are inherently subjective and almost always inadequate. People use different words to describe the same sound (e.g., honk, beep, toot). Some words, such as ring and crash, apply to such a wide variety of sounds as to be almost useless. Descriptions are often incomplete, such as Halloween party and accident scene; these recordings may contain many useful sounds, but unless someone searches for one of the words Halloween, party, accident, or scene, these sounds will never be retrieved.
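The retrieval problem described above can be illustrated with a small sketch. The catalog entries, clip identifiers, and the keyword_search function below are hypothetical and invented for illustration; they do not come from any real media asset management system.

```python
# Hypothetical catalog: clip identifiers mapped to cataloger-written descriptions.
catalog = {
    "clip_001": "car horn honk",
    "clip_002": "short beep from a sedan",
    "clip_003": "toot of a horn",
    "clip_004": "Halloween party",
}

def keyword_search(query, catalog):
    """Return the IDs of clips whose description contains the query word."""
    word = query.lower()
    return sorted(
        clip_id
        for clip_id, text in catalog.items()
        if word in text.lower().split()
    )

# Three clips hold essentially the same kind of sound, but only one is found.
honk_hits = keyword_search("honk", catalog)

# Any screams recorded at the Halloween party are invisible to this query.
scream_hits = keyword_search("scream", catalog)

print(honk_hits)
print(scream_hits)
```

A search for honk misses the clips described as beep and toot, and a search for scream returns nothing even if the party recording contains one.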

Onomatopoeia is the formation of words to imitate sounds; examples include boing, buzz, crunch, hiss, pop, screech, thud, and twang. Catalogers have raised onomatopoeia to an art form in desperate attempts to describe sounds. Here are some actual descriptions given to sounds in one commercial sound effects library: kablam, gedunk, kabong, quick zing, heavy zonk, laser whooshes, pingy wobbles, whirling whippy swishes, and our favorites, bowang, wiggle bowang, and rising wiggle bowang. Words simply cannot describe the range of sounds we hear.

Describing the source of a sound is far easier than describing the sound itself, and many catalogers resort to this approach. Most of us know the sounds of the following: Honda Accord idling, several coins dropped on a tile floor, and roller coaster passing by. Source descriptions are much less useful if we are unfamiliar with the sounds produced by the source, for example, llama vocalizing, slab of steel emerging from a furnace, and water lock gates opening. Moreover, the source of a sound is often unknown.


The Needs of Sound Designers

No one has a greater need to locate sounds than sound designers, creative professionals who incorporate sound effects with dialog and music to produce sound tracks for feature films and television programs. Sounds are the primary concern of sound designers, and they care little about the source of sounds. In fact, sound designers have made a science out of fooling the listener's ear.

Robert L. Mott discusses this subject in his book, Sound Effects: Radio, TV, and Film (Focal Press, 1990). He is a sound designer with 40 years of experience at CBS and NBC. He recommends using coconut half-shells to create the sound of a horse's hoofbeats; a piece of cellophane to create a crackling fire; a cork dipped in kerosene and rubbed on glass for a chattering monkey or squealing rat; and buckshot rolled slowly on a bass drum for the sound of the surf. He tells how a single recording of an African waterfall, when played at different speeds, has been used convincingly to create the sounds of printing presses and atomic bomb explosions.

Fictional characters provide a wonderful opportunity for creativity. In the Star Wars movies, the voice of Chewbacca came from a walrus. In the movie Jurassic Park, a power saw was used for the sound of the T-Rex. Reportedly, the sound of the tornado in the movie Twister includes a lion's roar. For the 1998 version of Godzilla, sound designers spent a year developing the sound of the monster's roar, which combined musical instrument and animal sounds with the original roar from the 1950s Japanese films.

One goal of sound design is to meet the expectations of listeners. As Mr. Mott explains, "A sound's origin is not as important as the listener's expectation of how something should sound. . . In films and television, many natural sounds do not meet everyone's expectations. When this happens, they are either replaced with more suitable sounds or the natural sound is layered (other sounds are added) to make it more desirable." He relates how critics panned a show that used the actual sound of a .38 pistol for the gunshots fired from Humphrey Bogart's .38 pistol: "The American audience was accustomed to hearing big, booming gunshots such as those characteristic of John Wayne westerns (and done in post production)! Of course, Humphrey Bogart's gunshots, despite their natural sound, came off sounding like an 'anemic' cap pistol."

Sound designers choose sounds based on their sound qualities, not their source. Text descriptions should have little or no influence on the selection process. Mr. Mott encourages sound designers to "disassociate the names of the sounds with the sounds themselves."


The Comparisonics Solution

The Comparisonics® sound-matching technology makes it possible to search audio and video by sound. A prototype sound is compared with the sounds in a database to find sounds that are similar to the prototype. The prototype may be created by mimicking a desired sound into a microphone or by using a sound synthesizer. More often, the prototype is a sound that has been retrieved from the database, located through either sound matching or text searching.

After the user specifies a prototype, the system returns a list of similar sounds, ranked by similarity to the prototype. The user may audition any of these sounds and use any of them as a prototype in a subsequent query. By issuing a sequence of sound-matching queries, the user can explore the database. One sound designer called this process "shopping for sounds." An audio database is a sound designer's palette; with the Comparisonics sound-matching technology, it is now accessible. Furthermore, audio retrieved from a database can be represented by its colored waveform display.
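The query-by-example workflow described above can be sketched in a few lines. This is only an illustration of ranking by similarity in general: it does not show how the Comparisonics technology computes similarity. It assumes each sound has already been reduced to a short numeric feature vector, and it uses cosine similarity as a stand-in measure. The feature values, sound names, and function names are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(prototype, database, top_n=3):
    """Return the top_n (name, score) pairs, most similar first."""
    scored = [
        (cosine_similarity(prototype, features), name)
        for name, features in database.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:top_n]]

# Hypothetical feature vectors; a real system would derive them from the audio.
database = {
    "truck idling": [0.9, 0.3, 0.1],
    "tiger growl": [0.8, 0.4, 0.1],
    "coins on tile": [0.1, 0.2, 0.9],
    "llama vocalizing": [0.3, 0.9, 0.2],
}

# Use a sound already in the database as the prototype.
prototype = database["truck idling"]
results = rank_by_similarity(prototype, database, top_n=3)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Any returned sound can serve as the prototype for the next query, which is how a user would move through the database one match at a time. In this toy data the prototype itself ranks first, followed by the tiger growl, because the ranking depends only on the feature vectors and not on the names.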

The Comparisonics technology matches sounds without regard to their source. For example, the sound of a truck idling may match a tiger's growl. Such a match is valuable to sound designers, and would never be discovered from text descriptions.

Audio collections without text descriptions can now be searched. If text descriptions are available, they can be helpful for finding prototypes.

The value of an audio or video archive is greatly enhanced when the archive can be searched effectively. Audio collections include sound effects, sample sounds, music, radio programs and commercials, speeches, voice mail, and audio resumés of voiceover talent. Video archives include films, television programs and commercials, music videos, instructional tapes, and stock video footage. The larger the collection, the greater the need for search tools. Archives used in Hollywood are measured in terabytes.



© 2009 Comparisonics Corporation