Having planned and built a prototype skill for Alexa that taps into data and recommendations from our Pulse product, we gathered a number of learnings along the way. You can read up on the technical background in our previous post “Alexa, how can I build a prototype?” and on the business background in “Alexa, what are my KPIs?”
Support your users in learning a new technology
Our first learning when testing our skill was that supporting and guiding users through a voice service is key. Even though NLP has come a long way, systems are unable to understand a wide variety of phrases or natural dialogue the way a human would, primarily because they lack context. Chaining multiple commands – for example, “turn off the lights, tell me a joke and then call a taxi to my location” – is not possible. Communication with voice assistants requires – at the moment – a certain structure, as shown above. A wake word is always necessary, as is a launch word and a skill/action name – exactly in that order.
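The required structure can be illustrated with a toy parser. This is a sketch for illustration only – the wake words, launch words, and pattern below are simplified assumptions, and real wake-word detection happens on the device, not in skill code:

```python
import re

# Expected order: wake word, then launch word, then skill name, then request.
# The word lists are illustrative; real assistants accept more variants.
COMMAND = re.compile(
    r"^(alexa|ok google),?\s+(ask|open|launch)\s+(\w+)\s*(.*)$",
    re.IGNORECASE,
)

def parse_command(utterance: str):
    """Return the command parts, or None if the structure is not followed."""
    match = COMMAND.match(utterance)
    if match is None:
        return None
    wake, launch, skill, request = match.groups()
    return {"wake": wake, "launch": launch, "skill": skill, "request": request}
```

An utterance like “Alexa, ask Pulse for my KPIs” follows the structure and parses cleanly, while a free-form command such as “turn off the lights” does not – which is exactly why users need guidance at this stage.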
To enable users to complete their goals as quickly as possible, provide sample phrases and never punish users with error messages. Instead, query missing information by asking follow-up questions. Consumers learn over time how services need to be queried; give them time to learn and help them on their way.
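The “follow-up question instead of error message” pattern can be sketched as a plain handler function. The intent and slot names here are hypothetical, and a real Alexa skill would use the SDK’s dialog/slot-elicitation mechanism rather than hand-rolled logic like this:

```python
# Required slots per intent (hypothetical names for illustration).
REQUIRED_SLOTS = {"GetKpiIntent": ["kpi_name", "time_range"]}

def respond(intent: str, slots: dict) -> dict:
    """Answer an intent, or elicit a missing slot with a follow-up question.

    Instead of failing with an error, keep the session open and ask
    only for the one piece of information that is missing.
    """
    for slot in REQUIRED_SLOTS.get(intent, []):
        if not slots.get(slot):
            return {
                "speech": f"Which {slot.replace('_', ' ')} would you like?",
                "end_session": False,
            }
    return {
        "speech": f"Here is {slots['kpi_name']} for {slots['time_range']}.",
        "end_session": True,
    }
```

If the user only says “give me my revenue”, the skill asks “Which time range would you like?” and waits, rather than rejecting the request outright.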
And do test a service on your target group. Collect new utterances and refine existing ones. Generating a set of phrases for an intent is not a task for a single developer; it is a group effort involving different skill sets and end-user research.
Design your conversations to be heard, not read
A key takeaway from our prototype is that written chatbot responses can, and arguably should, contain detailed numbers and data. For a voice system, however, we recommend rounding numbers and keeping interactions short, increasing understandability by reducing complexity and level of detail. It is better to let users request more details afterwards than to provide a lengthy response of low overall relevance. Simply transferring chatbot responses to a voice system will result in a bad user experience. For our prototype, a more detailed list of KPIs can be sent via email, or shown as a small information card with precise numbers and changes in the Alexa app on a user’s mobile device.
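Rounding for speech can be as simple as the helper below – a minimal sketch, assuming euro-agnostic KPI values; the thresholds and wording are our own illustrative choices, while the precise figures would go to the email or companion card instead:

```python
def spoken_number(value: float) -> str:
    """Round a KPI value so it is easy to grasp when heard.

    Speech gets a coarse, memorable number; exact figures belong
    on the information card or in the emailed report.
    """
    if value >= 1_000_000:
        return f"about {value / 1_000_000:.1f} million"
    if value >= 1_000:
        return f"about {round(value / 1_000)} thousand"
    return f"about {round(value)}"
```

So a precise 1,234,567.89 on the card becomes “about 1.2 million” in the spoken response.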
You should employ rhetorical tools such as landmarking and chunking. Landmarking, for example, provides known information fragments to users first, followed by new information: “What’s in my calendar at 8?” – “At 8 you have a meeting with Liz” – instead of “There is a meeting with Liz at 8”. This makes it easier to digest a response.
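Landmarking is essentially a rule about sentence order, so even a trivial helper makes the pattern explicit (a sketch; the function name is our own):

```python
def landmark(known: str, new: str) -> str:
    """Lead with the fragment the user already knows (the landmark),
    then attach the new information."""
    return f"{known} {new}."
```

Given the calendar example, `landmark("At 8", "you have a meeting with Liz")` produces the landmarked “At 8 you have a meeting with Liz.” rather than burying the time at the end of the sentence.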
Overall, voice interface design should be considered a design discipline in its own right; CareerFoundry in Berlin, for example, offers courses in this field. Amazon also offers excellent documentation on voice design and best practices.
Consider the enterprise environment
As Pulse is a corporate tool at its core, and our prototype is aimed at corporate users, a major drawback is the lack of enterprise programs for skills/actions. Unlike with native mobile apps, for example, distribution within a closed group of users is not possible. While beta programs are available for Alexa, they are time-limited to 90 days. And any skill not in development is publicly visible and accessible to everyone.
This does not mean public access to KPIs, as skills can require a user login. In our prototype, Pulse acted as an OAuth identity provider, storing an access token with Alexa or in a service database tied to an anonymous, unique user ID. Thus, users without Pulse credentials are unable to use our voice assistants or chatbots.
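The account-linking step can be sketched as a small token store keyed by the assistant’s anonymous user ID. The class and method names are illustrative assumptions; a real implementation would persist tokens and validate them against the OAuth identity provider (Pulse, in our case) on every request:

```python
import secrets
from typing import Dict, Optional

class TokenStore:
    """Maps the assistant's anonymous user ID to a Pulse access token.

    Illustrative sketch only: secrets.token_hex stands in for a real
    OAuth access token issued after a successful Pulse login.
    """

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    def link(self, assistant_user_id: str) -> str:
        # Called once account linking succeeds for this user.
        token = secrets.token_hex(16)
        self._tokens[assistant_user_id] = token
        return token

    def token_for(self, assistant_user_id: str) -> Optional[str]:
        # No token means the user never linked a Pulse account,
        # so the skill refuses to answer KPI requests.
        return self._tokens.get(assistant_user_id)
```

Any request arriving with an unknown user ID simply gets no token back, which is how users without Pulse credentials are locked out.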
Another issue could be that corporations prohibit certain KPIs from leaving their organization or their own IT. Sending these KPIs through Google’s or Amazon’s clouds – a necessity stemming from their ecosystem architecture – would not be permitted. Such sensitive KPIs must be filtered out and blocked by the business logic.
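That filtering step could look like the sketch below. The deny-list entries are hypothetical; in practice, each corporation would configure which KPIs may never be routed through an external cloud:

```python
# Hypothetical deny-list of KPI names that must not leave the organization.
SENSITIVE_KPIS = {"gross_margin", "headcount_cost"}

def filter_for_assistant(kpis: dict) -> dict:
    """Drop sensitive KPIs before a response is handed to the
    assistant platform (and thus to Amazon's or Google's cloud)."""
    return {name: value for name, value in kpis.items()
            if name not in SENSITIVE_KPIS}
```

The important design point is that this happens in the business logic, on infrastructure the corporation controls, before any payload reaches the assistant ecosystem.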
And, last but not least, virtual assistants may pose a privacy issue. While technically no voice data is transmitted without a wake word, there is uncertainty as to whether this functionality is actually and properly implemented, or whether it could be remotely circumvented. As mentioned above, no external code runs on Google’s or Amazon’s hardware, which reduces the risk of hacking. Developers have no access to audio recordings of user utterances. And no exploit or breach has been published so far. Still, when introducing virtual assistants in corporations, we strongly recommend taking employee sentiment into consideration as part of change management.
What does the future hold?
Overall, we believe that while the voice hype is starting now, it will plateau in 2019/2020 when technology has further improved, more robust services are available, and assistants can be controlled by natural, less structured speech patterns. Home automation will likely remain a main driver for adoption.
Most certainly we will see kids and teenagers growing up with voice systems, letting them interact naturally via speech. Similar to the previous generation that now considers every screen to be touch-enabled, future generations will speak to inanimate objects without fear or apprehension. We already observe children talking to and controlling Alexa devices today as if they were talking to a grandmother on the phone.
What are your experiences with, or expectations for, voice assistants – especially in a business context? What use cases can you think of? Let us know!