Researchers at MIT have developed a new approach to training speech recognition systems that does not depend on transcriptions, as current models do. Instead, their system analyses correspondences between images and spoken descriptions of those images, captured in a large collection of audio recordings. The system then learns which acoustic features of the recordings correlate with which image characteristics.
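The pairing idea can be illustrated with a toy sketch (the vectors and names below are invented for illustration, not the MIT team's actual model): recordings and images are mapped into a shared embedding space, where a matching audio-image pair should score higher than a mismatched one.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hand-picked toy embeddings: in a trained system these would come from
# an audio encoder and an image encoder respectively.
audio_caption_beach = [0.9, 0.1, 0.2]   # spoken description of a beach scene
image_beach         = [0.8, 0.2, 0.1]
image_city          = [0.1, 0.9, 0.7]

# The matching pair scores higher, so the recording "retrieves" its image.
match    = cosine(audio_caption_beach, image_beach)
mismatch = cosine(audio_caption_beach, image_city)
print(match > mismatch)  # True
```

Training then amounts to adjusting both encoders so that this inequality holds across the whole collection of paired recordings and images.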

Traditionally, speech recognition systems, such as those that convert speech to text on smartphones, are built by machine learning systems that process many thousands of utterances and their transcriptions to learn a mapping between acoustic features and words. While this method works quite well, the professional-grade transcription it requires is costly and time-consuming. For this reason, speech recognition is usually limited to a few languages.

The goal of the new approach is to get machine learning to learn a language in a way similar to how humans do. While a lot of progress has been made by Siri, Google and many others, it has been limited to the world's major languages. Of the roughly 7,000 languages spoken today, only about two per cent have decent speech recognition.

Further investigation showed that the complex mapping obtained with deep neural networks (a machine learning method loosely inspired by the connections between brain neurons) can find text terms associated with similar clusters of images. Terms such as ‘storm’ and ‘clouds’ could thus be inferred to have related meanings. The system is therefore able to infer meaning from the data it has, which could prove useful in applying the technology to speech translation.
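The clustering intuition can be sketched with toy vectors (invented for illustration, not the learned embeddings from the research): terms whose embeddings lie close together are treated as related, so ‘storm’ groups with ‘clouds’ while an unrelated term does not.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy term embeddings: related terms point in similar directions.
embeddings = {
    "storm":  [0.90, 0.80, 0.10],
    "clouds": [0.85, 0.75, 0.20],
    "banana": [0.10, 0.05, 0.95],
}

def related(term, threshold=0.9):
    """Terms whose embedding is close (cosine > threshold) to `term`'s."""
    return [t for t in embeddings
            if t != term and cosine(embeddings[term], embeddings[t]) > threshold]

print(related("storm"))  # ['clouds']
```

The same nearest-neighbour idea underlies how such systems surface related meanings without ever seeing a written definition.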

The global speech recognition market is expected to grow from its 2016 size to around $13 billion by 2022, driven mainly by growing mobile banking applications and the adoption of mobile and cloud-based technology. The increasing use of automated and smart applications in the consumer electronics and healthcare industries is the other main contributor to the market's growth.

The expectation over the next few years is seamless hands-free operation, useful for many applications such as dictation, vehicle operation and conducting chores around the house. Speech recognition removes the need for typical user interfaces such as mice and keyboards.

Interaction will be more natural and will open access to computing for millions of people with disabilities who find current traditional interfaces restrictive.

In the end, the hope is to be able to interact with machines as we do with our fellow humans, and speech interfaces will be all the rage in the coming decade.

Did you know?

• Most Artificial Intelligence (AI) is ‘female’ – when interacting with AI such as Google Now, Siri and Cortana, you will notice that the default voice is female. One reason is that studies have shown ‘female’ voices are better liked. Another is that speech technology is developed mostly by men, some of whom think of their AI creations as more attractive if ‘female’.

• AI can predict the future – Nautilus, one of the most powerful supercomputers in the world, analyses more than 100 million articles from 1945 to the present day to generate ‘predictions’. It ‘foresaw’ where Bin Laden was hiding and the beginning of the Arab Spring.

• AI can ‘love’ – while it is possible for humans to have a sexual or romantic relationship with AI through robot toys, emotionally reciprocated AI love is something you are more likely to see in science fiction.

For more trivia see: www.um.edu.mt/think

Sound bites

• Artificial Intelligence can identify skin cancer: A new Artificial Intelligence system can spot the tell-tale signs of skin cancer just as accurately as human doctors, say researchers, and the next step is to get the tech on a smartphone, so anyone can run a self-diagnosis. Once the system is refined further and becomes portable, it could give many more people the chance to get screened with minimal cost, and without having to wait for an appointment with a doctor to confirm the symptoms.

http://www.sciencealert.com/ai-is-now-as-good-as-the-experts-at-spotting-skin-cancer/

• Elon Musk sees brain-computer systems in the future: In a recent tweet, the Tesla and SpaceX CEO teased that a brain-computer system that links human brains to a computer interface – a ‘neural lace’ – may be announced early this year. Musk first mentioned the neural lace concept (the addition of a digital layer of intelligence to the human brain) at Recode’s Code Conference last year. The brain-computer system would create a “symbiosis with machines,” Musk said.

http://www.livescience.com/57632-elon-musk-neural-lace-artificial-intelligence.html

• For more soundbites, listen to Radio Mocha on Radju Malta 2 every Monday at 1pm and Friday at 6pm.

www.facebook.com/RadioMochaMalta/
