Siri grew out of a huge project inside the Pentagon’s Defense Advanced Research Projects Agency (Darpa), the people who previously gave you the internet and, more recently, a scheme to encourage people to develop driverless cars. Siri’s parent project, called Calo (Cognitive Assistant that Learns and Organizes), had $200m of funding and was the US’s largest-ever artificial intelligence project. In 2007 Siri was spun out into a separate business; Apple quietly acquired it in 2010 and incorporated it into its new phone.
Calo was funded under Darpa’s Personalized Assistant that Learns (PAL) programme. Its five-year contract brought together more than 300 researchers from 25 leading university and commercial research institutions, with the goal of building a new generation of cognitive assistants that could reason, learn from experience, be told what to do, explain what they were doing, reflect on their experience and respond robustly to surprise.
Since October, people have been buying and using Apple’s new iPhone 4S, which comes with a function called Siri – a “voice-driven assistant” which can take dictation, fix or cancel appointments, send emails, start phone calls, search the web and generally do all those things for which you might once have employed a secretary.
Siri isn’t just a “voice recognition” tool, though it can do that (you speak some words, it turns them into text and sends them as an email or text message). You can also ask it things such as: “How’s the weather looking tomorrow in London?” and it will come back with the forecast for London (“England”). It’ll do currency conversions or give stock prices. Or try asking it: “Why is the sky blue?” and, after a little thinking, the screen will show an explanation: “The sky’s blue colour is a result of the effect of Rayleigh scattering.” (There is more, but we all know about Lord Rayleigh’s work on the scattering of sunlight by air molecules, don’t we?)
When you ask or instruct Siri to do something, it first sends a small audio file of what you said over the air to Apple’s servers, which use a voice recognition system from a company called Nuance to turn the speech, in a number of languages and dialects, into text. A large bank of Siri servers then processes that text to work out what your words actually mean. That is the crucial natural language understanding (NLU) step, which nobody else yet does on a phone.
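To make the two server-side stages concrete, here is a deliberately tiny sketch: speech is first turned into text, and the text is then turned into a structured instruction. It is not how Siri or Nuance actually work. The names `transcribe`, `understand`, `handle_request` and `Instruction` are hypothetical, the "recogniser" simply pretends the audio is already text, and the keyword rules are a crude stand-in for real NLU.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    """Hypothetical structured result that the servers would send back to the phone."""
    action: str
    args: dict = field(default_factory=dict)


def transcribe(audio: bytes) -> str:
    # Stand-in for the speech recogniser: this toy version assumes the
    # "audio" is already UTF-8 text and simply decodes it.
    return audio.decode("utf-8")


def understand(text: str) -> Instruction:
    # Toy keyword rules standing in for natural language understanding.
    cleaned = text.strip()
    lowered = cleaned.lower()
    if lowered.startswith("call "):
        return Instruction("call", {"who": cleaned[len("call "):]})
    if lowered.startswith("play "):
        return Instruction("play_song", {"title": cleaned[len("play "):]})
    if lowered.startswith("remind me to "):
        return Instruction("set_reminder", {"task": cleaned[len("remind me to "):]})
    # Anything the rules do not recognise falls back to a search.
    return Instruction("search", {"query": cleaned})


def handle_request(audio: bytes) -> Instruction:
    # Stage 1: speech to text. Stage 2: text to meaning.
    return understand(transcribe(audio))


print(handle_request(b"Call mum"))
```

Real systems replace both stub stages with statistical models; the point of the split is that the second stage receives plain text and never needs to know how it was recognised.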
Then an instruction goes back to the phone, telling it to play a song, or do a search (answering factual questions with the computational knowledge engine Wolfram Alpha, rather than Google), or compose an email or a text message, or set a reminder (possibly linked to geography: the instruction “Remind me to call mum when I get home” will work), or – boring! – call a number.
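A location-linked reminder such as “Remind me to call mum when I get home” can be thought of as a geofence: the reminder fires when the phone’s position first moves inside a circle around “home”. The sketch below is one plausible way to do that, not Apple’s implementation. `GeoReminder`, the 100-metre radius and the home coordinates are all assumptions; distance uses the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


class GeoReminder:
    """Hypothetical reminder that fires once, on arrival inside a radius around home."""

    def __init__(self, task: str, home_lat: float, home_lon: float, radius_m: float = 100):
        self.task = task
        self.home = (home_lat, home_lon)
        self.radius_m = radius_m
        self.was_inside = None  # unknown until the first location fix
        self.fired = False

    def on_location(self, lat: float, lon: float):
        """Return the task text on the first outside-to-inside transition, else None."""
        inside = haversine_m(lat, lon, *self.home) <= self.radius_m
        # Only an actual arrival counts: a first fix that is already inside does not fire.
        arrived = inside and self.was_inside is False
        self.was_inside = inside
        if arrived and not self.fired:
            self.fired = True
            return self.task
        return None


reminder = GeoReminder("call mum", 51.5074, -0.1278)  # assumed "home" coordinates
print(reminder.on_location(51.5164, -0.1278))  # about 1 km away: nothing yet
print(reminder.on_location(51.5075, -0.1278))  # about 11 m away: reminder fires
```

Requiring an outside-to-inside transition is a design choice: it matches “when I get home” (arriving) rather than “while I am at home”.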
NLU has been one of the big unsolved computing problems (along with image recognition and “intelligent” machines) for years now, but we’re finally reaching a point where machines are powerful enough to understand what we’re telling them. The challenge with NLU is twofold. First, speech-to-text transcription can be tricky (did he just say, “This computer can wreck a nice beach,” or “This computer can recognise speech”?). Second, acting on what has been said demands an understanding of both the context and the wider meaning.
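One classic way recognisers settle the “wreck a nice beach” problem is noisy-channel decoding. For each candidate wording they combine how well it matches the sound with how plausible it is as language, and pick the candidate with the best total score. The sketch below illustrates only that scoring rule; the candidate list and every probability in it are invented for illustration and are not taken from any real system.

```python
import math

# log P(audio | words): invented acoustic scores. The two phrases sound
# almost the same, so the acoustic model barely separates them.
ACOUSTIC = {
    "this computer can wreck a nice beach": math.log(0.51),
    "this computer can recognise speech": math.log(0.49),
}

# log P(words): invented language-model scores for how plausible each
# phrase is as something a person would say.
LANGUAGE = {
    "this computer can wreck a nice beach": math.log(1e-9),
    "this computer can recognise speech": math.log(1e-6),
}


def best_transcription(candidates):
    # Noisy-channel decoding: pick W maximising log P(A|W) + log P(W).
    return max(candidates, key=lambda w: ACOUSTIC[w] + LANGUAGE[w])


print(best_transcription(ACOUSTIC))  # prints: this computer can recognise speech
```

On sound alone the “beach” reading would narrowly win. The language-model term is what tips the decision, which is why context matters even at the transcription stage.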