Alexa has a visual design framework called Alexa Presentation Language (APL), which allows you to build interactive voice and visual experiences across the device landscape. This multimodal experience can make skills more delightful and engaging to the customer. APL provides visual elements including images, text, pagers and sequences, layouts, speech synchronization, and video.
You can design custom visual elements for standard Alexa-enabled devices such as the Echo Show, Fire TV, and select Fire Tablet devices. Third-party devices built using Alexa Smart Screen and the TV Device SDK also support the APL design framework.
Amazon created APL so you can design custom experiences that combine voice, audio, and visual elements in a single customer interface. This framework is adaptable so one design can scale to multiple device types while keeping the visual and voice elements synchronized. There are many ways you can use APL to enrich the customer experience. With APL, you can provide customers with complementary information at a glance from across the room or offer visual cues, such as showing lists or search items. APL also supports voice commands, so customers can ask for an item on screen instead of relying only on touch interactions. This gives your skill fluidity between interaction types, making customer interactions seamless and intuitive.
When you design with APL, you have the freedom to define where your visual elements are placed on screen, matching your visual expression and brand identity across many Alexa-enabled devices. As you design in APL using the authoring tool, you can run a test simulation in the Alexa Developer Console to see what your design looks like across Alexa-enabled device types built by Amazon.
You can reuse designs across multiple skills and share your designs with others. Having the ability to design across devices allows you to tailor the experience to targeted device types and audiences. To get started, Amazon provides sample APL layouts. These layouts are designed to work well across a broad range of Alexa-enabled device types. You can use the layouts as-is, modify them, or build your own from scratch. Although APL is a new language, it adheres to universally understood styling practices, and the syntax is familiar to anyone with front-end development experience.
An APL document is a JSON file that your skill sends to an Alexa-enabled device. It contains the specifications for your design elements. The device evaluates what it can support, and then imports images and other data as needed to render the correct experience. The rest of this page describes the visual design components for APL that you can use.
You can show images on screen with or without text, and you can make them respond to touch by using TouchWrappers. You can also apply a blur filter to images.
When placing components on top of images, use the overlay (scrim) to apply a colored opacity layer over your image to help with the legibility and accessibility of your content. When you want to de-emphasize an image, you can also change its opacity to create different effects.
Background images create an enriching visual experience without interfering with the primary content. Place images into the background of your layout to provide texture to the primary content shown on screen.
Use thumbnails to differentiate between search results or pair an image with a text component to provide additional context for an option.
Use smaller images to provide tertiary content, such as star ratings.
Add images to an empty or default state to ensure your layouts look complete. Adding images to an empty state is an opportunity to add delight and context to an otherwise bland response.
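As a rough illustration of the scrim and blur options described above, the following Python sketch builds a background Image component and wraps it in a minimal APL document. The helper names, the example URL, the scrim color, and the blur radius are assumptions chosen for illustration; `overlayColor`, `filters`, and `scale` are Image properties in APL, but check the APL reference for the version you target.

```python
import json


def build_background_image(url, scrim="rgba(0, 0, 0, 0.5)", blur_radius="10dp"):
    """Return an APL Image component with a scrim and a blur filter."""
    return {
        "type": "Image",
        "source": url,
        "width": "100vw",
        "height": "100vh",
        "scale": "best-fill",      # fill the viewport, cropping if needed
        "overlayColor": scrim,      # colored opacity layer (scrim) over the image
        "filters": [{"type": "Blur", "radius": blur_radius}],
    }


def build_document(url):
    """Wrap the image in a minimal APL document (version is an assumption)."""
    return {
        "type": "APL",
        "version": "1.4",
        "mainTemplate": {"items": [build_background_image(url)]},
    }


# Hypothetical image URL, used only for this sketch.
document = build_document("https://example.com/background.jpg")
print(json.dumps(document, indent=2))
```

Keeping the scrim and blur values as parameters makes it easier to tune legibility per layout without editing the document by hand.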
Ensure that every touch target can be selected by voice as well as by touch. If you wrap a text string in a TouchWrapper, it's best if the string represents the phrase that will trigger the intent.
Use the TouchWrapper to wrap items in your sequence so that customers can select each one using touch to view more detailed content.
In combination with the image component, the TouchWrapper can be used to create navigational items, such as graphical buttons, or to add points of interaction and selection on images such as a game board.
Because touch wrappers are intended to be touched, we recommend a minimum size of 48x48dp, which creates a physical touch target of 9mm, regardless of screen size.
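The TouchWrapper guidance above could be sketched as follows. This is a minimal example, assuming a hypothetical `build_image_button` helper and a hypothetical `"itemSelected"` argument that your skill would look for in the resulting user event. It wraps an Image in a TouchWrapper, sends an event to the skill on press, and rejects sizes below the recommended 48x48dp.

```python
import json

MIN_TOUCH_TARGET_DP = 48  # minimum recommended in the guidance above


def build_image_button(image_url, item_id, size_dp=MIN_TOUCH_TARGET_DP):
    """Return a TouchWrapper around an Image that reports presses to the skill."""
    if size_dp < MIN_TOUCH_TARGET_DP:
        raise ValueError("touch targets should be at least 48x48dp")
    return {
        "type": "TouchWrapper",
        "id": item_id,
        "width": f"{size_dp}dp",
        "height": f"{size_dp}dp",
        # SendEvent delivers a user event to the skill; the arguments are
        # hypothetical values that the skill code would interpret.
        "onPress": {"type": "SendEvent", "arguments": ["itemSelected", item_id]},
        "item": {
            "type": "Image",
            "source": image_url,
            "width": "100%",
            "height": "100%",
            "scale": "best-fit",
        },
    }


button = build_image_button("https://example.com/next.png", "nextButton")
print(json.dumps(button, indent=2))
```

The same selection should also be reachable through a voice intent, so the skill handles the touch event and the spoken request in the same way.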
When you show text, you can specify the text color, size, and weight for available fonts. You can use TouchWrappers and ScrollViews to make your text touch responsive and allow you to display it outside the bounds of the container. This enables customers to touch to scroll below the fold.
When you want to add emphasis to text or remove it, you can change the text's color and opacity to help distinguish states, or primary and secondary content.
Too much text on screen can distract from the voice experience and overwhelm the customer. When using the text component in a List, try to limit your text to 3 lines. This highlights the most pertinent information to allow for quick scanning and selection.
Note: Custom fonts are not supported at this time.
Limit the number of rich text formatting options to two styles (for example, bold and italic).
Use as few text sizes as possible to give your message a strong sense of hierarchy and meaning.
Make sure to have strong contrast between the text color and background color to make it easier for customers to read your text, especially at a distance.
Note: Unlike Image components, text components should never be layered on top of each other. Layered text is impossible to read and will not pass accessibility standards for the Alexa device ecosystem.
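One way to apply the text guidance above is sketched below: a list entry made of a primary and a secondary Text component, using two text sizes, reduced opacity for the secondary content, and `maxLines` values that add up to three lines. The helper name, colors, and sizes are assumptions chosen for illustration, not recommended values.

```python
import json


def build_list_entry_text(primary, secondary):
    """Return a Container holding primary and secondary text, 3 lines at most."""
    return {
        "type": "Container",
        "items": [
            {
                "type": "Text",
                "text": primary,
                "color": "#FFFFFF",      # light text, intended for a dark background
                "fontSize": "32dp",
                "fontWeight": "bold",
                "maxLines": 1,
            },
            {
                "type": "Text",
                "text": secondary,
                "color": "#FFFFFF",
                "opacity": 0.7,          # de-emphasize secondary content
                "fontSize": "24dp",
                "fontWeight": "normal",
                "maxLines": 2,
            },
        ],
    }


entry = build_list_entry_text("Trattoria Roma", "Italian, 0.4 miles away, open until 10 PM")
total_lines = sum(item["maxLines"] for item in entry["items"])
print(json.dumps(entry, indent=2))
```

The two Text components sit one after the other in the Container rather than on top of each other, in line with the note about layered text.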
You can use Pagers to show a time-ordered sequence of items that typically advance automatically, such as slideshows. Or you can use Sequences to show a continuous list of choices, such as local restaurants, and allow customers to navigate the list via voice or by touch or remote control.
Note: For most devices, touching the screen will pause pagination.
Pager is best used for images or text that don't match exactly with the TTS that Alexa is reading, or for content that you don't want the customer to scroll through. For example, you can use Pager to automatically paginate through a carousel of images, or a series of cards displaying sports scores.
Pager is a good opportunity to present additional (or “bonus”) information or images that may be of interest to the customer. The displayed content should be related to what Alexa is saying, but it doesn't need to match exactly. Limit the number of pager items to no more than 6 or 7 to avoid overwhelming your customer.
Critical information should be spoken. Customers may not be looking at the displayed content, or may miss items at the end of your presentation. Content presented using the Pager component works best when not combined with too many other layouts displayed on screen. Too many things happening at once can be distracting to customers.
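The Pager guidance above might look like the following sketch, which builds a Pager of images, enforces the suggested limit of seven items, and starts automatic paging with the AutoPage command when the document mounts. The helper names, component ID, URLs, and three-second page duration are assumptions; running the command from a document-level `onMount` handler is one option among several.

```python
import json

MAX_PAGER_ITEMS = 7  # upper bound suggested in the guidance above


def build_pager(image_urls, pager_id="scorePager"):
    """Return a Pager with one full-size Image per URL."""
    if len(image_urls) > MAX_PAGER_ITEMS:
        raise ValueError(f"limit pagers to {MAX_PAGER_ITEMS} items or fewer")
    return {
        "type": "Pager",
        "id": pager_id,
        "width": "100vw",
        "height": "100vh",
        "items": [
            {"type": "Image", "source": url, "width": "100%",
             "height": "100%", "scale": "best-fit"}
            for url in image_urls
        ],
    }


def build_slideshow_document(image_urls, seconds_per_page=3):
    """Return an APL document that pages automatically once it is displayed."""
    pager = build_pager(image_urls)
    return {
        "type": "APL",
        "version": "1.4",
        # AutoPage advances the pager; duration is in milliseconds.
        "onMount": [{
            "type": "AutoPage",
            "componentId": pager["id"],
            "duration": seconds_per_page * 1000,
        }],
        "mainTemplate": {"items": [pager]},
    }


urls = [f"https://example.com/slide{i}.jpg" for i in range(1, 5)]
slideshow = build_slideshow_document(urls)
print(json.dumps(slideshow, indent=2))
```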
With the Sequence component, you can place a list within your skill. Sequences are best suited for providing multiple options or results for a customer to choose from in a predetermined order. Only use one sequence per screen so that the customer understands how to control the sequence with voice commands.
Numbering each item in your sequence with an ordinal is important for enabling easy selection for the customer. Always be sure to start with 1 and increment by 1 throughout your sequence.
You can set a scroll direction of vertical or horizontal for your sequence. Sequences with text work best in a vertical scrolling orientation, while sequences with images work best in a horizontal scrolling orientation.
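The Sequence guidance above can be sketched as a numbered, scrollable list. In this example, setting `numbered` to true makes the `${ordinal}` binding available to each child, starting at 1, and each row is wrapped in a TouchWrapper so it can be selected by touch. The helper name, component ID, and the `"itemSelected"` event argument are assumptions for illustration.

```python
import json


def build_result_list(names, direction="vertical"):
    """Return a numbered Sequence with one touchable text row per name."""
    if direction not in ("vertical", "horizontal"):
        raise ValueError("direction must be 'vertical' or 'horizontal'")
    return {
        "type": "Sequence",
        "id": "resultList",
        "width": "100vw",
        "height": "100vh",
        "scrollDirection": direction,
        "numbered": True,        # exposes ${ordinal} to each child, starting at 1
        "data": names,           # each entry is bound to ${data} in the child
        "items": [{
            "type": "TouchWrapper",
            # Hypothetical arguments; the skill would map the ordinal to an item.
            "onPress": {"type": "SendEvent",
                        "arguments": ["itemSelected", "${ordinal}"]},
            "item": {"type": "Text", "text": "${ordinal}. ${data}", "maxLines": 3},
        }],
    }


restaurants = build_result_list(["Trattoria Roma", "Sushi Go", "Taco Stand"])
print(json.dumps(restaurants, indent=2))
```

Showing the ordinal in each row gives the customer an obvious phrase to say, such as "select number two", which the skill can resolve to the same item that a touch would select.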
You can use Layouts to group and describe the placement of components such as images, text, ScrollViews, pagers, and sequences. You can nest Layouts within each other. You can also specify the header, footer, and hint layouts provided by Amazon. With Conditional Expressions you can customize by device type by using the when property in your layouts. For example, you might conditionally select one nested layout when the device shape is round and another when the shape is rectangular.
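To illustrate the conditional selection described above, this sketch defines two custom layouts and lets the `when` property choose between them based on `viewport.shape`. The layout names, parameter name, and styling are hypothetical. The first item in `mainTemplate` whose `when` expression evaluates to true is the one that gets inflated, so the item without a `when` acts as the fallback.

```python
import json


def build_shape_aware_document(title):
    """Return an APL document that picks a layout by device shape."""
    return {
        "type": "APL",
        "version": "1.4",
        "layouts": {
            # Hypothetical custom layouts, each taking a "title" parameter.
            "RoundTitle": {
                "parameters": ["title"],
                "items": [{"type": "Text", "text": "${title}",
                           "textAlign": "center", "fontSize": "28dp"}],
            },
            "RectangleTitle": {
                "parameters": ["title"],
                "items": [{"type": "Text", "text": "${title}",
                           "textAlign": "left", "fontSize": "40dp"}],
            },
        },
        "mainTemplate": {
            "items": [
                # Chosen on round devices.
                {"when": "${viewport.shape == 'round'}",
                 "type": "RoundTitle", "title": title},
                # Fallback for rectangular devices.
                {"type": "RectangleTitle", "title": title},
            ],
        },
    }


shape_document = build_shape_aware_document("Today's scores")
print(json.dumps(shape_document, indent=2))
```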
Use Speech Synchronization to synchronize text highlighting with Alexa's voice response.
You can also send commands that change the audio or visual presentation of content shown on screen within your APL documents. For example, you can highlight a line or block of text currently being read by using the SpeakItem command and highlightMode. Similar to SSML for voice, you can use the Idle command to insert pauses in visual sequences if Alexa pauses while speaking.
Note: Text is usually displayed in the Ember Light font. However, when using speech synchronization, the text is displayed in Ember Regular as the default font with different opacity values for highlighting to make it more readable.
Use speech synchronization if you are displaying text that Alexa will read aloud. Otherwise, the text scrolling on the display may not match the TTS.
It can be confusing to a customer if multiple panels, such as a split screen, appear on the display with one of the panels autoscrolling with synchronized text. Minimize all other features currently being displayed so speech synchronization can stand out.
Display only as much text as Alexa reads. Limit TTS to around 15 seconds for short-format content to make it easy for customers to focus, and to interrupt if they choose. If there is more text than Alexa can read, consider including a prompt asking the customer to continue, such as “Tell me if you'd like me to read more.” If the customer chooses to continue, remember to append the additional text.
Note: For most devices, touching the screen will pause speech synchronization.
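The commands mentioned in this section could be sent to the device along these lines. This is a minimal sketch that builds an `Alexa.Presentation.APL.ExecuteCommands` directive containing an Idle pause followed by a SpeakItem command with line highlighting. The helper name, token, component ID, and pause length are assumptions, and the target Text component is assumed to already have its `speech` property bound in the rendered document.

```python
import json


def build_read_aloud_directive(token, text_component_id, pause_ms=500):
    """Return an ExecuteCommands directive that pauses, then reads a Text item."""
    return {
        "type": "Alexa.Presentation.APL.ExecuteCommands",
        # The token must match the token used when the document was rendered.
        "token": token,
        "commands": [
            # Idle does nothing for the given delay (milliseconds).
            {"type": "Idle", "delay": pause_ms},
            {
                "type": "SpeakItem",
                "componentId": text_component_id,
                "highlightMode": "line",   # highlight line by line; "block" is the alternative
                "align": "center",
            },
        ],
    }


directive = build_read_aloud_directive("storyToken", "storyText")
print(json.dumps(directive, indent=2))
```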
You can include video content within your APL layouts to continue your skill experience when the skill completes media playback. You can customize the video playback as well as build in playback controls like play, pause, and rewind buttons.
Only include video if it is the main content for your skill. For example, you can play a music video when the customer asks to play a song. Videos playing in the background that are not part of the main action of the skill can be distracting or confusing to customers.
Note: Always include closed captioning in your videos.
Provide a screenshot or related image for your video content to act as a static preview.
Provide a way to pause the video content by voice and by using an on-screen button or other control. Customers should always control the video playback experience unless there is a specific reason for the experience to control it. Whenever possible, allow the customer to choose to repeat or loop a video.
Use common, easily recognizable icons for your playback controls. Likewise, allow the customer to use familiar terms to control playback using voice. At a minimum, provide a play, pause, and full screen button.
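The playback guidance above might be sketched as a Video component with a row of touchable controls that issue ControlMedia commands. The icon URLs, helper names, component ID, and sizes are hypothetical. The sketch covers play, pause, and rewind only; a full screen control and the matching voice intents are left out and would need to be handled separately.

```python
import json

# Hypothetical icon locations; replace with your own hosted assets.
ICONS = {
    "play": "https://example.com/icons/play.png",
    "pause": "https://example.com/icons/pause.png",
    "rewind": "https://example.com/icons/rewind.png",
}


def build_control(video_id, media_command):
    """Return a 48x48dp touchable icon that sends a ControlMedia command."""
    return {
        "type": "TouchWrapper",
        "width": "48dp",
        "height": "48dp",
        "onPress": {"type": "ControlMedia", "componentId": video_id,
                    "command": media_command},
        "item": {"type": "Image", "source": ICONS[media_command],
                 "width": "100%", "height": "100%"},
    }


def build_video_player(video_url, video_id="mainVideo"):
    """Return a Container with a Video and a row of playback controls."""
    return {
        "type": "Container",
        "width": "100vw",
        "height": "100vh",
        "items": [
            {"type": "Video", "id": video_id, "source": video_url,
             "autoplay": True, "width": "100%", "height": "80%"},
            {"type": "Container", "direction": "row",
             "items": [build_control(video_id, command)
                       for command in ("play", "pause", "rewind")]},
        ],
    }


player = build_video_player("https://example.com/video.mp4")
print(json.dumps(player, indent=2))
```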