Sunday, April 14, 2019

Alexa AI scientists reduce speech recognition errors up to 22% with semi-supervised learning

      With newer virtual assistants such as Amazon's Alexa and Google Home on the uptick, and older platforms such as Siri and Cortana still in contention, the era of virtual assistants is "in" and more powerful than ever before. These assistants benefit all kinds of people, young and old, tech savvy or not. But no technological invention becomes this successful without a few road bumps along the way. The road bump for Amazon's Alexa specifically is errors in speech recognition.


      Alexa's ability (or lack thereof) to understand the human voice can be traced to a multitude of factors. Simple ones, such as different accents, lisps, and speaking volumes, have historically been known to render the assistant slow or outright unresponsive. Deeper problems lie in how Alexa handles (and how Amazon programs) the nuances of speech recognition.

    
     Speech recognition systems are largely built with machine learning and the "labeling" of data, which is used to train and "improve the intelligent assistant’s ability to understand the human voice," according to the article. By additionally using semi-supervised learning, a method that combines human and machine labeling of the data used to train artificial intelligence models, Amazon scientists were able to train a model that reduced speech recognition error rates by 10-22% compared to methods that rely solely on supervised learning (that is, on human-labeled data alone). A rough sketch of the idea appears below.
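      The article doesn't walk through the implementation, but one common way this idea is realized is "pseudo-labeling": a model trained on the small human-labeled set machine-labels the much larger unlabeled set, and a second model is trained on the combination. Here is a minimal sketch using scikit-learn on toy data; the features, confidence threshold, and classifier are all placeholder assumptions, not Amazon's actual pipeline.

```python
# Minimal sketch of semi-supervised "pseudo-labeling": a model trained on a
# small human-labeled set machine-labels a much larger unlabeled set, and a
# second model is trained on the combined data. Toy example, not Amazon's pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data: pretend these are acoustic feature vectors.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_labeled, y_labeled = X[:500], y[:500]        # small human-labeled set
X_unlabeled = X[500:]                          # large unlabeled set

# 1. Supervised baseline: train only on the human-labeled data.
teacher = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

# 2. Machine-label the unlabeled data, keeping only confident predictions.
probs = teacher.predict_proba(X_unlabeled)
confident = probs.max(axis=1) > 0.9            # threshold is an arbitrary choice
pseudo_labels = probs.argmax(axis=1)[confident]

# 3. Retrain on human labels plus confident machine labels.
X_combined = np.vstack([X_labeled, X_unlabeled[confident]])
y_combined = np.concatenate([y_labeled, pseudo_labels])
student = LogisticRegression(max_iter=1000).fit(X_combined, y_combined)

print(f"pseudo-labeled {confident.sum()} of {len(X_unlabeled)} unlabeled examples")
```

      The confidence threshold is the main knob here: set it too low and noisy machine labels pollute training, set it too high and most of the unlabeled data goes unused.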


      According to the article, the greatest gains in error reduction came on difficult inputs such as the noisy audio mentioned above. "These advances in Alexa’s ability to understand the human voice were achieved through a method using long short-term memory (LSTM) networks called teacher-student training. The “teacher” is trained to understand 30-millisecond chunks of audio and then transfers some of that understanding to a “student” network that uses the unlabeled data." A minimal sketch of that setup follows.
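      To make the quote concrete, here is a minimal PyTorch sketch of a teacher-student setup: an LSTM teacher's soft per-frame output distributions become the training targets for a student network on unlabeled audio. The feature size, class count, network dimensions, and frame length are illustrative assumptions, not Amazon's settings.

```python
# Minimal PyTorch sketch of LSTM teacher-student training on unlabeled audio.
# The teacher's soft output distributions over frames serve as training targets
# for the student; all shapes and hyperparameters here are illustrative.
import torch
import torch.nn as nn

class FrameClassifier(nn.Module):
    """LSTM that maps a sequence of acoustic frames to per-frame class scores."""
    def __init__(self, n_features=40, hidden=256, n_classes=100):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, frames):                  # frames: (batch, time, features)
        hidden_states, _ = self.lstm(frames)
        return self.out(hidden_states)          # logits: (batch, time, classes)

teacher = FrameClassifier()                     # assume already trained on labeled audio
student = FrameClassifier()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
kl = nn.KLDivLoss(reduction="batchmean")

# Fake batch of unlabeled audio: 8 utterances, 200 short frames each.
unlabeled = torch.randn(8, 200, 40)

with torch.no_grad():                           # teacher only provides soft targets
    soft_targets = torch.softmax(teacher(unlabeled), dim=-1)

# Train the student to match the teacher's distributions (one step shown).
optimizer.zero_grad()
student_log_probs = torch.log_softmax(student(unlabeled), dim=-1)
loss = kl(student_log_probs, soft_targets)
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```

      The key point is that the student never needs human transcriptions: the teacher's probability distributions stand in for labels, which is what lets the approach scale to far more audio than humans could annotate.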


      Overall, speech recognition is clearly an integral part of how we use our plethora of devices today, and testing these systems until they are as close to flawless as possible is a key step toward making them viable for long-term use. With that, here are a few questions for you all to answer regarding speech recognition:

Questions:
-Do you believe speech recognition and virtual assistants/devices will still be on the market in 5 years? 10 years? Is this a product that customers will still be buying?

-Is there a USP (unique selling point) specific to virtual assistants/speech recognition devices like Alexa and Google Home, or is planned obsolescence of these products inevitable?

-If planned obsolescence DOES end up occurring with these devices, what sort of product do you think will replace them? What does the product of the future have in store for companies like Amazon and Google?

-Do you find that Alexa/Home/other virtual assistants have trouble understanding you, or are slow to respond to you?


LINK TO ARTICLE: https://venturebeat.com/2019/04/04/alexa-ai-scientists-reduce-speech-recognition-errors-up-to-22-with-semi-supervised-learning/
