Voice Recognition Technology? In a Lift? IN SCOTLAND?
Microsoft’s voice recognition technology, used in speech systems such as Cortana and Presentation Translator, has achieved its lowest error rate yet – but has it reached 11?
Microsoft has announced that its conversational voice recognition software has reached a word error rate of 5.1%, its lowest so far and an industry milestone. This improves on the 5.9% error rate it achieved in studies last year. The latest result puts the tech on par with human transcribers, who have the luxury of listening to a recording several times. Sounds promising, but whether the technology will be able to take Scottish passengers to their destination has yet to be tested.
Transcription tests were conducted using the Switchboard Telephone Speech Corpus, a collection of around 2,400 recorded two-sided phone conversations among 543 speakers. The corpus has been used to test speech recognition systems since its inception back in 1990.
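For readers wondering how a figure like 5.1% is arrived at: a word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that calculation. The function name `word_error_rate` and the sample sentences are invented for illustration, and this is not Microsoft's evaluation code, which is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: real evaluations also normalise punctuation,
    spelling variants and filler words before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical lift request: one word out of eight is misheard.
spoken = "could you take me to floor eleven please"
heard = "could you take me to floor seven please"
print(f"{word_error_rate(spoken, heard):.1%}")  # prints 12.5%
```

On this measure, a 5.1% error rate means roughly one word in twenty is transcribed wrongly, dropped or invented.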
According to Microsoft Technical Fellow Xuedong Huang, reaching parity with humans in speech recognition has been a goal at Microsoft for the last 25 years. Software that could use this technology includes Cortana, Presentation Translator (which translates PowerPoint presentations into different languages in real time) and Microsoft Cognitive Services (various toolkits for improving the accessibility of computers).
Xuedong Huang said: “Many research groups in industry and academia are doing great work in voice recognition, and our own work has greatly benefitted from the community’s overall progress.
“While achieving a 5.1 percent word error rate on the Switchboard speech recognition task is a significant achievement, the speech research community still has many challenges to address, such as achieving human levels of recognition in noisy environments with distant microphones, in recognizing accented speech, or speaking styles and languages for which only limited training data is available.
“Moreover, we have much work to do in teaching computers not just to transcribe the words spoken, but also to understand their meaning and intent. Moving from recognizing to understanding speech is the next major frontier for speech technology.”
Sincere congratulations to Microsoft. We look forward to shorter, simpler trips in lifts from this point onwards.