Over the past two to three years, speech recognition has improved markedly, as big data and deep learning have made it possible to train neural networks that produce faster, more accurate results. So Stanford researchers decided to formally test it against humans.
Steam machines versus humans
During the 1800s, railroads started to snake across the U.S., and bands of men would smooth out the land by driving stakes into rock with a big ole hammer (and then filling the holes with explosives). John Henry, an African American, was said to be the biggest (in spirit, in appetite, in the bulging of biceps) and best driver of all. When companies started to employ steam-powered drills to make better time, Henry decided to challenge one to a race. He won but, tragically, died of exhaustion following his miraculous feat. The story is based in fact, but the details change with the telling.
21st Century texting competition
The research team, which included computer scientists from Stanford, Baidu Inc. and the University of Washington, devised an experiment that pitted Baidu’s Deep Speech 2 cloud-based speech recognition software against 32 texters, ages 19 to 32, working the built-in keyboard on an Apple iPhone.
“They grew up texting, so we’re putting speech recognition up against people who are really good at this task,” said James Landay, a professor of computer science at Stanford and a co-author of the study.
The subjects took turns typing or speaking about 100 phrases sourced from a standard library of everyday phrases used in text-based research – phrases such as “physics and chemistry are hard,” “have a good weekend” and “go out for some pizza and beer” – while the testing app recorded their times and accuracy rates.
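The article does not say how the testing app computed its times and accuracy rates. As a minimal sketch, assuming conventions that are common in text-entry research (five characters count as one "word," and the error rate is the character-level edit distance between the target phrase and the entered text, divided by the longer length), scoring a single phrase might look like this. The function names and the sample timing are hypothetical, not taken from the study's app.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def words_per_minute(entered: str, seconds: float) -> float:
    # Assumed convention: five characters (spaces included) make one word.
    return (len(entered) / 5.0) / (seconds / 60.0)


def error_rate(target: str, entered: str) -> float:
    # Assumed convention: edit distance normalized by the longer string.
    longest = max(len(target), len(entered))
    return edit_distance(target, entered) / longest if longest else 0.0


# Hypothetical trial: one typo, six seconds to enter the phrase.
target = "have a good weekend"
entered = "have a good weekand"
print(f"{words_per_minute(entered, 6.0):.1f} words per minute")
print(f"{error_rate(target, entered):.1%} error rate")
```

Normalizing by the longer string keeps the error rate between 0 and 1 even when the entered text is longer than the target; other normalizations are possible and would shift the numbers slightly.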
The results were clear no matter the language. For English, speech recognition was three times faster than typing, and the error rate was 20.4 percent lower. In Mandarin Chinese, speech was 2.8 times faster, with an error rate 63.4 percent lower than typing.
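The two kinds of figures above are different comparisons: "three times faster" is a ratio of input rates, while "20.4 percent lower" reads most naturally as a relative reduction in error rate rather than a drop of 20.4 percentage points. A small sketch of that arithmetic follows, using made-up input values chosen only for illustration (they are not the study's measurements); the function names are likewise hypothetical.

```python
def speedup(speech_rate: float, keyboard_rate: float) -> float:
    """How many times faster speech input is than keyboard input."""
    return speech_rate / keyboard_rate


def relative_reduction(keyboard_error: float, speech_error: float) -> float:
    """Percent by which the speech error rate is lower than the keyboard's."""
    return (keyboard_error - speech_error) / keyboard_error * 100.0


# Hypothetical values, NOT taken from the study.
print(f"{speedup(150.0, 50.0):.1f}x faster")                 # 3.0x faster
print(f"{relative_reduction(5.0, 4.0):.1f} percent lower")   # 20.0 percent lower
```

With these made-up inputs, an error rate falling from 5 percent to 4 percent is a 20 percent relative reduction, even though the absolute gap is only one percentage point.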
With speech recognition in the mix, the contest is really more of a dictation competition than a texting one.
Arxiv – Speech Is 3x Faster than Typing for English and Mandarin Text Entry on Mobile Devices
Abstract
With laptops and desktops, the dominant method of text entry is the full-size keyboard; now with the ubiquity of mobile devices like smartphones, two new widely used methods have emerged: miniature touch screen keyboards and speech-based dictation. It is currently unknown how these two modern methods compare. We therefore evaluated the text entry performance of both methods in English and in Mandarin Chinese on a mobile smartphone. In the speech input case, our speech recognition system gave an initial transcription, and then recognition errors could be corrected using either speech again or the smartphone keyboard. We found that with speech recognition, the English input rate was 3.0x faster, and the Mandarin Chinese input rate 2.8x faster, than a state-of-the-art miniature smartphone keyboard. Further, with speech, the English error rate was 20.4% lower, and Mandarin error rate 63.4% lower, than the keyboard. Our experiment was carried out using Deep Speech 2, a deep learning-based speech recognition system, and the built-in Qwerty or Pinyin (Mandarin) Apple iOS keyboards. These results show that a significant shift from typing to speech might be imminent and impactful. Further research to develop effective speech interfaces is warranted.
SOURCES: Stanford University, Arxiv