Smartphone speech recognition software is not only three times faster than human typists, it’s also more accurate. The researchers hope the revelation spurs the development of innovative applications of speech recognition technology.
Over the past two to three years, speech recognition has improved markedly, drawing on big data and deep learning to train neural networks that produce faster, more accurate results. So Stanford researchers decided to formally test it against humans.
Steam machines versus humans
During the 1800s, railroads started to snake across the U.S., and bands of men would smooth out the land by driving stakes into rock with a big ole hammer (and then filling the holes with explosives). John Henry, an African American, was said to be the biggest (in spirit, in appetite, in the bulging of biceps) and best driver of all. When companies started to employ steam-powered drills to make better time, Henry decided to challenge one to a race. He won but, tragically, died of exhaustion following his miraculous feat. The story is based in fact, but the details change with the telling.
21st Century texting competition
The research team, which included computer scientists from Stanford, Baidu Inc. and the University of Washington, devised an experiment that pitted Baidu’s Deep Speech 2 cloud-based speech recognition software against 32 texters, ages 19 to 32, working the built-in keyboard on an Apple iPhone.
“They grew up texting, so we’re putting speech recognition up against people who are really good at this task,” said James Landay, a professor of computer science at Stanford and a co-author of the research.
The subjects took turns typing or speaking about 100 phrases sourced from a standard library of everyday phrases used in text-based research – phrases such as “physics and chemistry are hard,” “have a good weekend” and “go out for some pizza and beer” – while the testing app recorded their times and accuracy rates.
The results were clear no matter the language. For English, speech recognition was three times faster than typing, and the error rate was 20.4 percent lower. In Mandarin Chinese, speech was 2.8 times faster, with an error rate 63.4 percent lower than typing.
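To make figures like "three times faster" and "20.4 percent lower" concrete, here is a minimal Python sketch of how such numbers can be derived from per-phrase recordings. The article does not say exactly how the testing app scored each phrase, so this assumes two conventions common in text-entry research: one "word" counts as five characters, and errors are measured as character-level edit distance against the presented phrase. All function names are hypothetical, and the timings in the demo are invented for illustration; they are not data from the study.

```python
def words_per_minute(transcribed: str, seconds: float) -> float:
    """Entry speed, assuming one 'word' is five characters including spaces."""
    return (len(transcribed) / 5.0) / (seconds / 60.0)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def error_rate(presented: str, transcribed: str) -> float:
    """Edit distance normalised by the longer of the two strings."""
    longest = max(len(presented), len(transcribed))
    return edit_distance(presented, transcribed) / longest if longest else 0.0


def speedup(speech_wpm: float, keyboard_wpm: float) -> float:
    """How many times faster speech is than the keyboard."""
    return speech_wpm / keyboard_wpm


def relative_reduction(keyboard_err: float, speech_err: float) -> float:
    """Fraction by which the speech error rate is lower than the keyboard's."""
    return (keyboard_err - speech_err) / keyboard_err


if __name__ == "__main__":
    phrase = "have a good weekend"        # one of the phrases quoted above
    typed = "have a good weekemd"         # hypothetical typo
    typed_wpm = words_per_minute(typed, seconds=6.0)     # invented timing
    spoken_wpm = words_per_minute(phrase, seconds=2.0)   # invented timing
    print(f"speed-up: {speedup(spoken_wpm, typed_wpm):.1f}x")
    print(f"typing error rate: {error_rate(phrase, typed):.3f}")
```

Note that "20.4 percent lower" is a relative reduction: it compares the two error rates to each other, so it says nothing on its own about how large either error rate was.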
With speech recognition in the picture, the contest becomes more of a dictation competition than a texting one.
From the abstract of the research paper on arXiv:
With laptops and desktops, the dominant method of text entry is the full-size keyboard; now with the ubiquity of mobile devices like smartphones, two new widely used methods have emerged: miniature touch screen keyboards and speech-based dictation. It is currently unknown how these two modern methods compare. We therefore evaluated the text entry performance of both methods in English and in Mandarin Chinese on a mobile smartphone. In the speech input case, our speech recognition system gave an initial transcription, and then recognition errors could be corrected using either speech again or the smartphone keyboard. We found that with speech recognition, the English input rate was 3.0x faster, and the Mandarin Chinese input rate 2.8x faster, than a state-of-the-art miniature smartphone keyboard. Further, with speech, the English error rate was 20.4% lower, and Mandarin error rate 63.4% lower, than the keyboard. Our experiment was carried out using Deep Speech 2, a deep learning-based speech recognition system, and the built-in Qwerty or Pinyin (Mandarin) Apple iOS keyboards. These results show that a significant shift from typing to speech might be imminent and impactful. Further research to develop effective speech interfaces is warranted.
SOURCES: Stanford University, arXiv