The Alexa Skills Kit gives you Star Trek-esque creative powers to help computers understand and speak human languages. Alexa technology achieves this despite the almost-insurmountable challenges that human languages pose: an infinite combination of words, a vast vocabulary, ambiguity, grammatical and structural complexity, context-dependent meaning, accents, dialects, and more. Two key technologies work seamlessly to make this possible: automatic speech recognition (ASR), which converts speech to text, and natural language understanding (NLU), which extracts meaning from text. Alexa’s NLU technology maps customer utterances to the correct response, making it the most critical ingredient in the recipe for your skill’s success. In this blog, we’ll talk about improving the NLU accuracy of your Alexa skills.
The Anatomy of an Alexa Skill Interaction
To understand how NLU works for your Alexa skill, consider the following customer utterance for a hypothetical skill named “Horoscope Reader”:
“Alexa, Ask Horoscope Reader my horoscope for Virgo today.”
Here's what happens next: Alexa hears the wake word ('Alexa'), springs into action, and listens to the customer's utterance. ASR then converts the utterance into text, which is broken down into its constituent parts and identified:
The word 'Ask' is identified as the launch phrase, 'Horoscope Reader' as the skill name, and the rest of the sentence as the customer request, through statistical modeling and exact-match rules. Next, Alexa refers to the skill's interaction model to map the customer request, “my horoscope for Virgo today”, to the correct intent, 'GetHoroscope', and maps the slot values ‘Virgo’ and ‘today’ to the slots {sign} and {time} respectively. These values are then passed to your skill's backend code in a structured JSON request to elicit the correct response, which in this case is the horoscope prediction for Virgo on that day. The response is then converted from text to speech and spoken back to the customer.

As we can see, to accurately answer all customer requests for your skill, you must provide an interaction model that Alexa can use at run time to elicit the expected response. This model should contain as many permutations and combinations of utterances, slots, and slot values, mapped to intents, as possible, so that it covers a wide range of possible customer utterances. Hence, the quality of the interaction model is a crucial aspect that determines the NLU accuracy of your skill.
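For reference, here is a simplified sketch of the structured JSON request your skill's backend would receive for this utterance (most envelope fields, such as the session and context objects, are omitted for brevity):

```json
{
  "request": {
    "type": "IntentRequest",
    "locale": "en-US",
    "intent": {
      "name": "GetHoroscope",
      "slots": {
        "sign": { "name": "sign", "value": "Virgo" },
        "time": { "name": "time", "value": "today" }
      }
    }
  }
}
```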
How to Improve the NLU Accuracy of Your Skills
An NLU accuracy error occurs when a customer invokes a skill with a specific request and the skill provides an inappropriate response, or the skill doesn’t get invoked at all. Here are a few things, within your control as a developer, that can reduce your skill’s NLU error rate and ensure a smooth customer experience:
1. Avoid Common Invocation Name Issues
The choice of invocation name is important for your skill from a voice recognition and discoverability point of view. Here are a few invocation name issues that you can easily avoid:
2. Include a Comprehensive Set of Sample Utterances Resembling Real World Interactions
The success of your skill in responding to customer utterances is directly related to how closely your sample utterances resemble real-world interactions. A core best practice for building a robust interaction model is to work backwards from every intent or piece of functionality within your skill, think about all the possible ways customers could phrase the questions or requests that those intents answer, and include those phrasings as sample utterances.
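For example, the 'GetHoroscope' intent for our hypothetical 'Horoscope Reader' skill might carry sample utterances like these in the interaction model (a minimal sketch, not an exhaustive list; the slot types they reference are covered in the next section):

```json
{
  "name": "GetHoroscope",
  "samples": [
    "my horoscope for {sign} {time}",
    "what is the horoscope for {sign} {time}",
    "tell me the {sign} horoscope for {time}",
    "what do the stars say for {sign} {time}",
    "{sign} horoscope {time}"
  ]
}
```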
3. Use Custom or Built-In Slots Wherever Relevant
If your utterances contain different words belonging to the same category (for example, Virgo, Aries, etc. belonging to the ‘Zodiac’ category in our example skill), it helps to add custom slots wherever applicable in your sample utterances. Amazon also provides a wide range of built-in slot types covering numbers, dates, times, phrases, and lists of items, which address some of the most common use cases for skills. If you use any of these built-in slot types, you do not need to provide slot values or sample utterances for them, as they are pre-built. Both kinds of slots (custom and built-in) reduce the number of utterances you need to provide for your skill.
For example, in our ‘Horoscope Reader’ skill, it makes sense to add a custom slot for zodiac signs, {sign}, with slot values for all 12 signs, and a built-in AMAZON.DATE slot, {time}, in your utterances:
“Alexa, Ask Horoscope Reader my horoscope for {sign} {time}.”
Here, the {time} slot uses the built-in AMAZON.DATE type, which recognizes values like "today", "yesterday", "tomorrow", "august", or "july" and converts them into a date format.
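Putting this together, a simplified excerpt of the interaction model might look like the following sketch. The custom type name 'ZodiacSign' is illustrative, and only three of the twelve slot values are shown; no values need to be supplied for the AMAZON.DATE slot, since they come pre-built:

```json
{
  "languageModel": {
    "invocationName": "horoscope reader",
    "intents": [
      {
        "name": "GetHoroscope",
        "slots": [
          { "name": "sign", "type": "ZodiacSign" },
          { "name": "time", "type": "AMAZON.DATE" }
        ],
        "samples": ["my horoscope for {sign} {time}"]
      }
    ],
    "types": [
      {
        "name": "ZodiacSign",
        "values": [
          { "name": { "value": "Virgo" } },
          { "name": { "value": "Aries" } },
          { "name": { "value": "Libra" } }
        ]
      }
    ]
  }
}
```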
4. Use Entity Resolution to Eliminate Slot Value Redundancy
The entity resolution feature can help improve NLU accuracy for customer utterances containing slot values that share the same meaning and do not need to be handled differently. This is achieved by defining synonyms for those slot values, which removes the redundancy in your slot values and makes it easy for synonyms to be handled by the same code.
For example, if your skill has a slot named {weather} with possible slot values ‘storm’, ‘hurricane’, ‘gale’, ‘squall’, etc., and your code responds to all of them in a similar way, you can use entity resolution by defining ‘storm’ as the canonical slot value, with ‘hurricane’, ‘gale’, and ‘squall’ as synonyms, and ‘storm’ as the unique ID.
Once you’ve set this up, if a user says ‘gale’ in an utterance, your skill backend will receive ‘gale’ as the spoken value, ‘storm’ as the canonical value, and ‘storm’ as the unique ID. No matter which synonym appears in the utterance, your skill can take the unique ID and respond according to that slot value, reducing the redundant code that would otherwise be needed for slot values with the same connotation. A sketch follows below; refer to our detailed guide here for more.
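As a rough sketch (the type name 'WeatherType' and the authority string are illustrative), the synonyms are declared on the custom slot type in the interaction model:

```json
{
  "name": "WeatherType",
  "values": [
    {
      "id": "storm",
      "name": {
        "value": "storm",
        "synonyms": ["hurricane", "gale", "squall"]
      }
    }
  ]
}
```

With this in place, an utterance containing 'gale' arrives at your backend with both the spoken value and the resolved canonical value and ID, roughly like this:

```json
{
  "name": "weather",
  "value": "gale",
  "resolutions": {
    "resolutionsPerAuthority": [
      {
        "authority": "amzn1.er-authority.echo-sdk.<skill-id>.WeatherType",
        "status": { "code": "ER_SUCCESS_MATCH" },
        "values": [
          { "value": { "name": "storm", "id": "storm" } }
        ]
      }
    ]
  }
}
```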
5. Use the Utterance Profiler to Test Intent Mapping Accuracy
A simple way of improving your skill’s intent mapping accuracy is to study the utterance profiler data (accessible through the Build page in the developer console, by clicking the utterance profiler button in the upper-right corner). It lets you check whether your interaction model resolves to the right intents and slots, even before you build your skill’s backend code. For utterances that do not resolve to the right intents or slots, you can go back and iteratively update the interaction model until they resolve correctly. You can see an example of utterance profiler use for a skill in the screenshot below.
6. Use the NLU Evaluation Tool
A scalable technique for batch testing the NLU accuracy of your interaction model is provided by the NLU evaluation tool. Instead of testing each utterance manually on the developer console, the tool lets you create a complete set of utterances mapped to expected intents and slots, known as an annotation set, and automate batch testing of your skill’s interaction model against it. Test results are marked as passed or failed depending on whether the utterances resolved to the right intents and slots. This automates your testing process and makes regression testing possible. For more details, please refer to this guide. As depicted in the screenshot below, you can access the NLU evaluation tool under the 'Build' tab in the developer console.
7. Review Intent History
The intent history feature helps you improve resolution accuracy for your live or in-development skills by letting you work backwards from actual customer interaction data. It provides anonymized, aggregated customer utterance and slot data, along with the confidence level (High, Medium, Low) with which the current run-time interaction model resolves each utterance to your skill’s intents and slots. The tool displays daily data for a skill locale only if the skill has at least 10 unique users for that day, and the data is a sample rather than an exhaustive list of all utterances.
By studying this data, you can check the confidence level with which each utterance is resolved to an intent, and either change its mapping to a different intent or slot or retain the existing one. If you see frequent customer requests in the data that are currently missing from your interaction model, add them to improve accuracy. You may also identify carrier phrases and common phrase patterns that are not yet included in your interaction model and update it accordingly.
For example, let’s say you open the intent history tab for our 'Horoscope Reader' skill and it shows “talk about my horoscope” as a frequent utterance that currently resolves to the fallback intent or to no intent at all. This indicates that the phrase is not triggering the launch request, so the skill is not working in this case. To fix this, you can add the phrase as a sample utterance for the skill’s “LaunchRequest” intent. The intent history feature is accessible on the left-hand side of the developer console under the “Build” tab, as depicted in the screenshot below. Please refer to our detailed guide here for more details.
8. Use Fallback Intent to Handle Out-of-Domain Requests
Even with a robust interaction model that covers most scenarios, customers may at times say utterances that are out of domain, i.e., utterances that aren't mapped to any of your intents. In such cases, your skill should still handle the request gracefully and gently redirect the customer with a message that conveys what the skill can do and sets the right expectations. For exactly this purpose, Alexa provides the fallback intent (AMAZON.FallbackIntent), which can take care of these unmapped utterances. Here is an example:
User: 'Alexa, open Horoscope Reader'
Alexa: 'Welcome to Horoscope Reader. What is your Sun Sign?'
User: 'What is Robert De Niro’s Sun sign? '
(This utterance isn’t mapped to any of the Horoscope Reader intents. Since Alexa cannot map this utterance to any intent, AMAZON.FallbackIntent is triggered.)
Alexa: 'The Horoscope Reader skill can't help with that, but I can tell you your daily horoscope. What is your Sun sign?'
(The response gently nudges the customer to ask questions within the skill's domain.)
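Enabling this requires declaring the built-in intent in your interaction model (it needs no sample utterances, since it catches whatever nothing else matches) and adding a handler for it in your backend code that returns the redirect message shown above. The model entry is minimal:

```json
{
  "name": "AMAZON.FallbackIntent",
  "samples": []
}
```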
For more details, please refer to our detailed guide here.
9. Use Dynamic Entities to Improve Slot Recognition
Your skill may have slots that are highly dynamic in nature ('food item names', for example). With a static slot value list, voice recognition of slots with dynamic values could be poor. In such cases, you can use dynamic entities to modify or replace slot values at run time and offer a personalized experience for customers. Dynamic entities substantially improve speech recognition by biasing the skill’s interaction model towards the newly added slot values at run time. For example, you might be building a restaurant skill that lets customers order items. Dynamic entities let customers order the 'daily specials' by passing the current daily special slot values at run time, even though they are not part of the pre-built ‘static’ model. For skills that use device location, like a hyper-local food ordering skill, different slot values for restaurant names can be served at run time based on the device location provided. A sketch of the run-time directive follows below; for implementation details, please refer to our detailed guide here.
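For illustration, a skill updates slot values at run time by returning a Dialog.UpdateDynamicEntities directive in its response. A hedged sketch for a hypothetical 'FoodItem' slot type and a couple of daily specials might look like this:

```json
{
  "type": "Dialog.UpdateDynamicEntities",
  "updateBehavior": "REPLACE",
  "types": [
    {
      "name": "FoodItem",
      "values": [
        {
          "id": "special_1",
          "name": {
            "value": "truffle mac and cheese",
            "synonyms": ["mac and cheese special"]
          }
        },
        {
          "id": "special_2",
          "name": {
            "value": "miso glazed salmon",
            "synonyms": ["salmon special"]
          }
        }
      ]
    }
  ]
}
```

The dynamically supplied values apply to the current session and can later be removed by sending the directive again with an updateBehavior of CLEAR.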
Conclusion
To sum up, the surefire way to address NLU issues is to test continuously with real users, study your skill's response data through the tools described above, and iteratively update the interaction model to improve resolution accuracy. The key lies in getting the interaction model right. We hope we have 'invoked' your curiosity enough to motivate your 'intent' to explore this important topic further. For questions, you can reach out to me on Twitter at @omkarphatak.