When AT&T announced in late April 2012 that by June it would open up its speech recognition application programming interfaces (APIs) to developers to “accelerate innovation,” most people didn’t realize this API development work had occurred at the company’s Florham Park and Middletown labs, a legacy of Bell Labs.
“The foundations for these tools are the state-of-the-art AT&T Watson speech recognition technology services that researchers at AT&T Labs have been developing and refining for years,” said John Donovan, senior executive vice president and manager of the labs, in a blog post.
Donovan said the labs had filed more than 600 patents in the area. The company already uses the Watson services for various applications offered to its own users, such as mobile voice directory search with YPmobile (Yellow Pages mobile) and voicemail-to-text.
Open APIs are a relatively new initiative at AT&T. The APIs will be cloud-based and network-specific, interfacing only with AT&T’s network. Back in September 2011, Fast Company described the decision to open up APIs as “a bold choice for AT&T.” In addition to these APIs, the telecom company is also providing developers a software development kit (SDK), which they can use to create software that captures a user’s spoken words and sends them to the network for transcription.
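The capture-and-submit flow the SDK enables can be sketched as packaging recorded audio into an HTTP request for a network transcription service. The endpoint URL, header names and auth scheme below are hypothetical placeholders illustrating the pattern only, not AT&T’s actual API.

```python
# Minimal sketch of "capture speech, send to the network for transcription."
# All names marked hypothetical are assumptions, not AT&T's real interface.
import urllib.request

HYPOTHETICAL_ENDPOINT = "https://api.example.com/speech/v1/recognize"

def build_transcription_request(audio_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Package captured audio as an HTTP POST a cloud service could transcribe."""
    return urllib.request.Request(
        HYPOTHETICAL_ENDPOINT,
        data=audio_bytes,              # the user's recorded speech
        headers={
            "Authorization": "Bearer " + api_key,  # hypothetical auth scheme
            "Content-Type": "audio/wav",           # raw captured audio
            "Accept": "application/json",          # expect a text transcript back
        },
        method="POST",
    )
```

An app built on such an SDK would record audio locally, build a request like this, and read the transcript out of the JSON response.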
“We know the best way to accelerate innovation is by opening our platforms and network services to outside developers,” Donovan wrote in his post. The first Watson speech APIs to be released will focus on seven areas: web search, local business search, question and answer, voicemail-to-text, SMS, a U-verse electronic programming guide and a dictation API for general use in speech recognition.
On its way in June is a language translator app created at AT&T Labs by N.J. developers Srinivas Bangalore and Mazin Gilbert. Combining automatic speech recognition, text-to-speech and Watson speech recognition technology, the app takes spoken or written input, translates it and delivers it in a second language selected by the user.
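The translator app’s flow amounts to composing three stages: speech recognition, translation, then speech synthesis. The sketch below shows only that composition; each stage is passed in as a stub standing in for the actual Watson services, whose interfaces are not public in this description.

```python
# Sketch of the ASR -> translation -> TTS pipeline described above.
# The stage functions are hypothetical stand-ins; only the flow is illustrated.
from typing import Callable

def translate_speech(audio: bytes,
                     recognize: Callable[[bytes], str],
                     translate: Callable[[str, str], str],
                     synthesize: Callable[[str], bytes],
                     target_lang: str) -> bytes:
    """Recognize speech, translate the text, then synthesize audio in target_lang."""
    text = recognize(audio)                    # automatic speech recognition
    translated = translate(text, target_lang)  # machine translation
    return synthesize(translated)              # text-to-speech
```

Written input would simply skip the first stage and enter the pipeline at the translation step.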
The API should provide real competition for some N.J. startups, such as SpeechTrans (Lyndhurst) and iSpeech (Newark), which are working in this area. Nuance (Burlington, Mass.), on which SpeechTrans apps are based, has been in the speech recognition business for many years, with its familiar Dragon dictation products and its own developer program. However, when a large, more established company jumps into a space formerly occupied only by startups, it often validates the smaller companies’ market.
AT&T said more speech APIs are on their way, including some that interface with gaming and social media.