The world of voice UX design is exploding as we’re increasingly talking to our mobile devices, from personal assistants like Siri to smart speakers like Amazon Echo. Earplay’s team – including experienced writers and game designers Jonathan Myers, CEO, and Dave Grossman, CCO – is taking voice UX design to a new level. Their stories for mobile devices use interactive conversation and simple voice commands that create an eyes-free, hands-free interactive experience.
The Emotional Power of Sound Brings People Into the Story
The team’s first release was Codename Cygnus. “The goal was to take the powerful emotional intimacy of audio, the same one that caused widespread panic when War of the Worlds was first broadcast, and enhance that experience so that you can imagine yourself inside the story, connecting with the other characters,” explains Myers.
Approaching UX Design Like Creatives, Not Engineers
Voice UX designers are often engineers focused on speech recognition. As designers and writers, Earplay takes a different approach. “Simpler design and better user guidance in the prompting and solicitation of speech may work just as well, or better, without needing the excess processing of language,” says Myers.
High-quality audio experiences start with your voice selection. “A recorded human voice is going to provide a better and more intimate interaction for someone whose sole experience is listening and responding. A well-directed voice actor captures the right nuance and subtext so that the meaning, guidance, and solicitation of speech from a user becomes crystal clear.”
File quality is essential, and having the right creative and tech tools lets the Earplay team deliver a world-class audio experience. “By far, our most significant use is with Adobe Audition CC for sound file editing, cleaning and processing.”
Advanced Tips from Interactive UX Story Design
According to Grossman, the design process starts with mapping the story and then goes through highly iterative stages of story refinement, writing, recording, and extensive testing. “What should a user interface be like when it needs to be presented entirely by audio, and how should it distinguish itself from the rest of the audio that the audience is hearing? Or, how can it blend in effectively and still do its job?”
The writing and extensive testing with audiences are key. “You have to pay attention to how you word things when you set it up, because the phrases being listened for can’t sound overly similar to one another – and what the software finds ‘similar’ is sometimes not the same as what a human being would,” advises Grossman.
It’s also important to think carefully about what audiences need to say in order to prompt action. “Keep your voice intents short enough that the audience can remember them, but long enough to be easily distinguishable,” Grossman suggests. “Develop a strategy for prompting the audience for input and make sure that strategy is unambiguous. For instance, if you have characters addressing the audience directly with questions they’re expected to answer, don’t also give the characters any other dialog that could be mistaken for a question.”
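Grossman’s point about confusable phrases can be checked cheaply before recording anything. The sketch below is not Earplay’s tooling; it is a minimal illustration using Python’s standard-library `difflib` as a rough text-based proxy for similarity. Real speech recognizers compare acoustics, not spelling, so flagged pairs are only candidates for rewording that should still be confirmed by live testing. The function name `confusable_pairs` and the `threshold` value are assumptions for the example.

```python
from difflib import SequenceMatcher

def confusable_pairs(phrases, threshold=0.6):
    """Flag pairs of candidate intent phrases whose surface similarity
    exceeds a threshold.

    This is a crude text-based heuristic, not an acoustic model: treat
    flagged pairs as prompts to reword, then verify with real listeners.
    """
    flagged = []
    for i, a in enumerate(phrases):
        for b in phrases[i + 1:]:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                flagged.append((a, b, round(score, 2)))
    return flagged

# Two of these choices differ by only one word and would likely be
# flagged; the third is worded to sound distinct.
choices = ["accept the mission", "reject the mission", "ask for more time"]
print(confusable_pairs(choices))
```

Rewording one of the near-twins (for example, “take the job” instead of “accept the mission”) drops the pair below the threshold, which mirrors Grossman’s advice: make the listened-for phrases short but clearly distinguishable.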
Managing Parallel Visual and Voice UX
An important design challenge is creating a parallel visual UX that supports, yet doesn’t detract from, the voice UX. Earplay’s designers use Adobe Creative Cloud tools such as Photoshop and Illustrator for their visual user interface design.
“When you design the voice UX for a mobile app, anything that draws attention to the screen during the experience is probably an unnecessary distraction that detracts from the total experience. A good portion of that experience is taking place inside the mind of the user, so you’re directing their attention away from what’s important if you focus them on the screen,” says Myers.
Ultimately, the team believes the visual UX should stay pared down unless an element adds to the functional voice experience. Myers adds, “It’s best not to provide anything beyond functional audio player buttons or return to menu options. The one exception is that we offer the ability to tap their speech decisions after a prompt when they’d rather not speak aloud. For example, in a public place they may want to keep silent.”
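The tap fallback Myers describes amounts to letting every prompt resolve through either channel. Earplay’s actual app internals aren’t described in the article, so the following is only a minimal sketch of the idea; the `Choice` model and `resolve` helper are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Choice:
    phrase: str   # the phrase the listener can say aloud
    label: str    # the text shown on the on-screen tap target

def resolve(choices: Sequence[Choice],
            spoken: Optional[str] = None,
            tapped_index: Optional[int] = None) -> Optional[Choice]:
    """Resolve one decision from either a tap or a spoken phrase.

    A tap is unambiguous, so it takes precedence; otherwise we look for
    a listened-for phrase inside the recognized speech. Returns None if
    nothing matched, so the caller can re-prompt.
    """
    if tapped_index is not None:
        return choices[tapped_index]
    spoken = (spoken or "").strip().lower()
    for choice in choices:
        if choice.phrase.lower() in spoken:
            return choice
    return None

# The same two choices work silently in public or aloud at home.
options = [Choice("accept the mission", "Accept"),
           Choice("decline the mission", "Decline")]
print(resolve(options, spoken="I accept the mission").label)
print(resolve(options, tapped_index=1).label)
```

Routing both input channels through one resolver keeps the story logic identical whether the audience speaks or taps, which is what lets the visual layer stay as pared down as Myers recommends.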