Bookaroo note #1: Designing a Voice User Interface


I’m a designer of digital products. For the past 20 years my work has focused primarily on GUIs – Graphical User Interfaces. Until now. With the rise of voice-controlled digital assistants from Amazon, Google and others, a new way of communicating with digital products has become available: the VUI – the Voice User Interface.

We all know how to create conversations

And now I’m confused. I still like to think that designers like me should be the ones who decide how a VUI should behave: what it should say, and when. But are we even equipped to design for voice? As it doesn’t involve any visual elements, it seems to boil down to designing pure conversations. And why would you need a designer for that? We all know how to create conversations; everybody has them all the time. So, what does it take to design a good VUI? Looking for answers, I first stumbled upon some dear memories from my own youth.

The internet’s predecessor

Many years ago, yet not so very long ago, people used telephones solely for talking to each other over a distance. No apps, no displays, just that single simple purpose. You dialed a number and got connected to the person on the other side. And so you could reach virtually any person, anytime, from anywhere, to chat, to share experiences, to place an order or to get information. The telephone was in many ways the internet’s predecessor.

My grandmother worked at 008, which was not, as you might guess from the name, in any way related to the British Secret Intelligence Service. No, 008 was once the number you dialed in the Netherlands when you needed “information”. By information the Dutch telecom provider basically meant telephone numbers; you called this service when you needed to contact someone and didn’t have the number or address. There were other services too: you dialed 002 for the time, 003 for the weather forecast and 008 to get someone’s phone number. These phone services were among the very first to introduce Voice User Interfaces, or ‘talking computers’ as people usually called them in those days. Before that, 008 employed human operators to look up the numbers. My grandmother was one of them. In her first years as an operator she had to look up the requested numbers in phone books; the frequently requested numbers she knew by heart. In 1979 computers were introduced at 008, and from then on the human operators used computers to retrieve the requested information.

The human operator had become a proxy for the computer system. My grandmother used to tell me stories about people, mostly men, calling for purposes other than getting a telephone number. I remember one story about a man who had called and asked her for the time. My grandmother kindly told the man he had probably dialed the wrong number, and that 002 was the number to call for the time. The man replied that he knew what time it was, but that he wanted to ask at what time my grandmother would get off work, and whether she would care to join him for dinner. My grandmother proudly told us that these male callers always kept inquiring about her age, because over the phone she apparently sounded like a twenty-year-old girl. Although she was in fact in her late fifties, she always kept them in the dark and played hard to get, just for the fun of sharing her adventures with her colleagues over lunch. And, years later, with her own grandchildren. The 008 ladies were finally replaced by voice-controlled computers in the late nineties. Computers that we could call and talk to, that got us information faster and cheaper than my grandmother could. As long as the caller kept to the script and didn’t ask for anything out of the ordinary, it was great technology. But I’ll tell you, these computers were lousy flirts compared to my grandmother.

Dysfunctional relationships

It’s been about 20 years since computers took over from the 008 ladies, and you would think voice recognition technology would have improved dramatically by now… Let’s look at this typical conversation between me and my car:

Does any of this sound familiar to you? Or maybe I should just get a new car. But no, I’ve got a similarly dysfunctional relationship with Siri, Apple’s smart voice assistant. But I don’t blame Siri. It is still a kid. I don’t blame her, just as I don’t blame my own kids if they misinterpret my words or behave in a way that is not quite socially acceptable. Siri, just like my kids, needs to learn by making mistakes. Human conversation appears to be so simple, but if you take a moment to think about it you’ll see that it is absurdly complex.

Siri is still a kid

I do blame my car, by the way. Because my car is basically not so very different from the voice-controlled computer of 008 from the late nineties. It only knows a very limited set of commands and asks for very specific input at specific points during the interaction. That is no conversation at all. Siri, on the other hand, listens to everything we say and tries its best to interpret our intention. That, I think, is a remarkable step forward in technology. Even if it still often fails to grasp our true intention, it is unbelievable what it is already capable of. Human communication is complex because our language is so very ambiguous. Words have different meanings in different contexts, and what we say literally is often not the same as what we mean. Humans are complex beings. We have tempers, become emotional, and are always seeking self-confirmation. Even with today’s advanced state of technology, we are still a long way from real human-like conversation with computers. Watch this fragment from the movie Her.

Now that is a real conversation! It is interesting to see two types of VUI in this fragment: the first is the voice that guides Theodore through the setup of the OS, which is not so very different from the way we are used to interacting with computers today. After initialisation the other VUI turns up: the very human-like and instantly intriguing Samantha. Until we have reached that level of conversational and emotional skill in voice user interfaces, let us refrain from using the term ‘conversational interface’. Today we are still in the infancy of the VUI; this is still the era of voice commands. Still, these VUIs need to be designed well in order for us humans to be able to interact with them confidently.

This is the era of voice commands

So I’m back to my original quest: finding out what it takes to design a good VUI. I have described what I think we can expect from a VUI today, how that is very different from 20 years ago, and in what direction we might expect this technology to evolve. But that doesn’t answer my question. So, in order to find out what skills you need to create a good VUI today, I will just have to create one myself. I have found a use case that seems perfectly suited for this: booking a meeting room at our office. I will call it Bookaroo (just add an m and you’ll see why). I will start with a rudimentary prototype that I’m planning to install in one of our meeting rooms to test and evaluate. Over the next couple of months I will keep you updated on Bookaroo’s progress and on what I learn about designing a VUI in these blog posts. I hope you will like it!