Speech-synthesis technology is worth talking about

With applications beyond just video games, text-to-speech and voice recognition have bright futures

This year marks the 25th anniversary of the well-known (at least in the geek world) computer flick "WarGames," in which a 1980s computer whiz accidentally connects via modem to the WOPR (War Operation Plan Response) mainframe, a supercomputer designed to run World War III scenarios. In doing so, he sets off a chain of events that brings the world to the brink of nuclear holocaust. Perhaps you recall that even in the underground NORAD bunker, speakers allowed the computer program Joshua to talk aloud. "Shall we play a game?" is a line we'll never forget.

Text-to-speech, or speech synthesis (the artificial production of speech from text or from coded response text), has been in development for quite some time. You might recall Microsoft's recent attempts at text-to-speech, including Microsoft Sam, the default voice in Windows 2000 and XP. Sam is the voice of the Windows Narrator screen reader, which is designed to assist users who need text read aloud to them. It sounds very similar to Joshua in "WarGames." In fact, the voice is so synthetic and choppy, with so many reading glitches, that it's worth investigating replacement voices. Windows Vista introduced Microsoft Anna, a female voice designed to sound smoother.

Behind the scenes is SAPI (Speech Application Programming Interface), which Microsoft developed to support both speech recognition and speech synthesis. SAPI has shipped with Windows in various versions and revisions over the years, and the current generation is SAPI 5. Microsoft products such as Office, Microsoft Agent and Microsoft Speech Server have benefited from it. Third-party applications also use SAPI, including Dragon NaturallySpeaking and Adobe Reader.

This technology has applications beyond recorded and synthesized speech in video games. It's especially helpful in the enterprise, where it can remove barriers faced by people with certain disabilities. For anyone who has difficulty reading the screen, whether because of a visual impairment or a reading disorder such as dyslexia, screen readers can be incredibly helpful. For people who are blind, combining speech synthesis with speech recognition can help overcome the visual barrier altogether.

AgoraNet and the Nemours Speech Research Laboratory are building on Microsoft's work with SAPI 5.0 to develop ModelTalker Voice Recorder. It allows people with ALS (amyotrophic lateral sclerosis, or Lou Gehrig's disease) or other conditions to communicate with a synthetic version of their own voice, or to choose the voice that best represents them. Users record their own voice, or have someone with a similar voice record theirs. The recordings are uploaded to a voice-generation site, and the finished synthetic voice can then be used with any SAPI 5.0-compatible system.

Of course, Microsoft isn't alone in providing text-to-speech engines and tools, though it does have an edge given its lock on the desktop OS market. For example, Wizzard Software, which has ties to both AT&T and IBM, offers programmers and businesses SDKs for building speech applications. One of the more impressive text-to-speech technologies it offers is Natural Voices, which gives synthetic speech a very realistic, almost human quality. A quick search turns up a slew of other text-to-speech products, such as VoiceMX Studio, A1 SpeechTRON, TTS Builder, NaturalSoft and Linguatec's Personal Translator 2008, to name a few.

Speech technologies have exciting applications on the horizon. Imagine being able to quickly and efficiently design spoken instruction systems that read information aloud to us. Combine that with speech recognition software that lets people request instructions and information by voice, and you have a powerful set of next-generation tools on the cusp of development. Call centers, for starters, could change dramatically. Companies could let computers handle most routine inquiries, leaving humans free to deal with the more complex ones.
