Derived from years of research in speech recognition, automatic translation and machine learning technologies, the Skype Translator is now being developed jointly by Skype and Microsoft Research teams. By combining voice and IM technologies with Microsoft Translator and neural-network-based speech recognition to deliver near real-time cross-lingual communication, Skype Translator allows people to connect in ways never before possible.

When Eric Rudder, then Microsoft’s chief research and strategy officer, and Peter Lee, head of Microsoft Research, first saw demos of a research project designed to provide fluent, cross-lingual conversations between speakers of different languages, they knew the time had come to make this technology a reality.

The team’s progress to date was recently displayed in a talk by Microsoft CEO Satya Nadella during the Code Conference. During Nadella’s conversation with Kara Swisher and Walt Mossberg of the Re/code tech website, he asked Gurdeep Pall to join him on stage. Pall, Microsoft’s corporate vice president of Skype, gave the first public demonstration of the Skype Translator app, conversing in English with German-speaking Microsoft employee Diana Heinrichs. The interaction underscored the steady advancement the team has achieved.

“We felt speech translation was a very natural evolution of the text-translation work we’ve been doing,” says Chris Wendt, programme manager of the Machine Translation team.

There have been attempts over the years, several within Microsoft Research, to demonstrate aspects of translating human speech. But delivering something that is usable in real life, that fits the voices of different users and the nuances of different languages – all built at scale for Skype users – was considered a nearly impossible task.

Making Skype Translator available first on Windows 8 later this year as a limited beta required remarkable research advances in translation, speech recognition, and language processing, combined with contributions from Microsoft engineering and research teams.

The Machine Translation team has taken a One Microsoft approach to the challenge by using contributions from researchers and engineers working on Microsoft’s speech service. Additionally, the team developed a deep partnership with Skype’s designers and engineers, particularly the prototyping team led by Jonas Lindblom.

“We have these two fairly complex technologies coming together for the first time to provide this end-to-end user experience,” Menezes says.

Microsoft Research has been focused on machine translation for more than 10 years. Initial results came with translations for Microsoft’s product-support Knowledge Base. The technology became available for public use as the engine behind Bing Translator.

Along the way, Menezes and Microsoft colleagues addressed significant system and user-interface design challenges, including reducing latency and developing visual feedback, with the translation system continuously improving itself based on user feedback.

“One big focus has been to scale up the amount and kinds of data that go into the machine-learning training of these systems,” Menezes says.

Conversational data was critical to building state-of-the-art conversational-speech models, and the team had to develop new techniques to collect it. One example, Menezes says, came from analysis of social-media posts.

“How people write on social media is not how they speak, but there’s some crossover of slang and related utterances that can help this system and make it contemporary,” he says.

There’s also the issue of disfluency, the difference between how people write and talk. When talking, people use lots of pauses and meaningless utterances that bridge gaps between their thoughts. Untangling such conversations requires lots of training. So does determining where sentence breaks occur. The sentence is the basic unit in translation, and without punctuation, sentence boundaries can be difficult to identify. The translator must therefore also learn to segment the speech input.

“That’s one of the things over the last year that my team’s been doing, resolving the mismatch between the way people talk and the way they write,” Menezes says.
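
To make the idea concrete, here is a minimal Python sketch of the two clean-up steps described above: splitting a stream of recognised words into sentence-like segments, then stripping filler words and stutters. It is an illustration only, not the method used by the Skype Translator team, whose approach the article describes as learned from training data. The filler list, the 0.5-second pause threshold, the function names and the sample utterance are all hypothetical.

```python
# Illustrative only: a rule-based stand-in for what a trained system would learn.
# FILLERS and the pause threshold are hypothetical values chosen for this sketch.
FILLERS = {"um", "uh", "er", "ah", "hmm"}


def segment_by_pause(timed_words, pause_threshold=0.5):
    """Split (word, start_sec, end_sec) tuples into segments at long pauses."""
    segments, current, prev_end = [], [], None
    for word, start, end in timed_words:
        # A silence at least as long as the threshold is treated as a sentence break.
        if current and start - prev_end >= pause_threshold:
            segments.append(current)
            current = []
        current.append(word)
        prev_end = end
    if current:
        segments.append(current)
    return segments


def remove_disfluencies(tokens):
    """Drop filler words and immediate word repetitions ("I I think")."""
    cleaned = []
    for tok in tokens:
        low = tok.lower()
        if low in FILLERS:
            continue
        # Simplification: this also removes legitimate repeats such as "that that".
        if cleaned and cleaned[-1].lower() == low:
            continue
        cleaned.append(tok)
    return cleaned


def clean_utterance(timed_words, pause_threshold=0.5):
    """Segment first (pauses carry the boundary signal), then clean each segment."""
    return [remove_disfluencies(seg)
            for seg in segment_by_pause(timed_words, pause_threshold)]


if __name__ == "__main__":
    sample = [
        ("so", 0.0, 0.2), ("um", 0.3, 0.5), ("I", 0.6, 0.7), ("I", 0.8, 0.9),
        ("think", 1.0, 1.3), ("it", 1.35, 1.45), ("works", 1.5, 1.9),
        ("uh", 2.0, 2.2), ("let's", 3.0, 3.3), ("try", 3.35, 3.6), ("it", 3.65, 3.8),
    ]
    for sentence in clean_utterance(sample):
        print(" ".join(sentence))
```

Fixed rules like these break down quickly on real speech, where a pause may fall mid-sentence and a repeated word may be intentional. That is one reason the problem calls for large amounts of conversational training data rather than hand-written heuristics.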

A critical test of this One Microsoft approach to developing Skype Translator occurred on October 25, 2012, in Tianjin, China, during a Microsoft Research computing conference. In a keynote address, Rick Rashid, then the worldwide head of Microsoft Research, publicly debuted the speech-to-speech translation project.

“I made sure to include pauses after each sentence,” Rashid recalls, “so that the audience would have time to clearly hear the Mandarin version of what I was saying. This also meant there was plenty of time for the audience to react. I remember hearing some gasps from the front rows, along with general applause and approval from the audience.

“I think the demo was a clear harbinger of the arrival both of deep-neural-network speech recognition and real-time language translation,” Rashid says. “It provided a glimpse into a future where language no longer need be a barrier.”

Now, Wendt is one of those bringing this technology to the masses. He helps spearhead the One Microsoft approach to Skype Translator’s development, including working closely with Skype colleagues Lindblom, Daniel Nitsche, and Fredrik Furesjö. With help from Skype’s Steve Pearce and Redmond researcher Shamsi Iqbal, the team fine-tuned the interaction model for the experience.

“What I found interesting is how the dynamic between us and the technology changed with the introduction of a more natural way to communicate,” says Vikram Dendi, part of the Machine Translation team before becoming Lee’s technical and strategy adviser.

“What’s been fascinating is the willingness for the two parties trying to communicate with each other to work with the technology to help each other understand in the case of speech translation. Even in beta, this makes this technology very useful and usable.”
