As part of a new series, "Getting started with Alexa," we will feature best practices and tips from technical experts across Amazon. In the first edition of the series, Mav Peri, senior solutions architect for Alexa, walks you through a custom skill that uses the Alexa Presentation Language (APL) to play text to speech, then a sample video, and then text to speech again. Peri also covers two different ways to integrate the APL document, the datasources document, and the sources document with your skill's Lambda function.
Peri is a neurodivergent single parent of two amazing boys, a disability advocate, and a senior solutions architect for Amazon Alexa. Before joining the Alexa team, he spent four years as a senior solutions architect at AWS and worked in application development for many years before that.
# The problem - and the solution
Developers using the APL authoring tool are sometimes unsure how to integrate APL documents, whether created in the authoring tool or found on APL Ninja, into their skill's Lambda function code. This is particularly true for documents that include datasources and sources documents.
To get started, I created a new custom skill in the Alexa Developer Console using the Hello World template. This gives us a working skill and Lambda function.
For our example, we will use the APL document template on APL Ninja called [Sequentially play speech, video and speech again], which plays text to speech (TTS), then a brief video, followed by more TTS.
# Our APL template
The APL template contains three elements: the APL document, the data, and the sources.
## APL document
This is where the main APL document lives; it defines the layout as well as the onMount commands that run when the document first loads.
```
{
"type": "APL",
"version": "1.4",
"onMount": [
{
"type": "Sequential",
"commands": [
{
"type": "SetValue",
"componentId": "myVideoPlayer",
"property": "opacity",
"value": 0
},
{
"type": "SpeakItem",
"componentId": "cardSlideshowContainer"
},
{
"type": "SpeakItem",
"componentId": "One"
},
{
"type": "SetValue",
"componentId": "One",
"property": "opacity",
"value": 0
},
{
"type": "SetValue",
"componentId": "myVideoPlayer",
"property": "opacity",
"value": 1
},
{
"type": "PlayMedia",
"componentId": "myVideoPlayer",
"source": "https://s3.amazonaws.com/freestock-transcoded-videos-prod/transcoded/freestock_v206050.mp4"
},
{
"type": "SpeakItem",
"componentId": "Two"
},
{
"type": "SetPage",
"componentId": "cardSlideshow",
"position": "relative",
"delay": 1000,
"value": 2
}
]
}
],
"mainTemplate": {
"parameters": [
"card"
],
"item": {
"type": "Container",
"id": "cardSlideshowContainer",
"width": "100%",
"height": "100%",
"speech": "${card.properties.preambleSpeech}",
"items": [
{
"type": "Pager",
"navigation": "wrap",
"id": "cardSlideshow",
"numbered": true,
"height": "100%",
"width": "100%",
"data": "${card.properties.pagerData}",
"items": {
"type": "Container",
"direction": "row",
"id": "slideContainer",
"shrink": 0,
"width": "100%",
"height": "100%",
"alignItems": "center",
"items": [
{
"type": "Video",
"id": "myVideoPlayer",
"position": "absolute",
"bottom": "0vh",
"left": "0vw",
"width": "100%",
"height": "100%",
"autoplay": false,
"scale": "best-fill",
"audioTrack": "foreground"
},
{
"type": "Text",
"id": "${data.id}",
"text": "${data.text}",
"speech": "${data.textSpeech.url}",
"color": "#ffffff",
"width": "100%",
"fontSize": 60,
"textAlign": "center",
"fontWeight": "100",
"paddingLeft": 10,
"paddingRight": 10
}
]
}
}
]
}
}
}
```
You will see that the APL document contains a Pager component with a number of items, specifically the Text item used for TTS as well as the Video player item. When the Lambda code executes, the function inflates the APL document with the TTS and video stream details dynamically.
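To make that data binding concrete, here is a small plain-Node.js illustration (not part of the skill code): each entry in card.properties.pagerData becomes one Pager page, and inside that page's items the entry is exposed as data, which is why expressions such as ${data.id} and ${data.text} resolve per page.
```
// Illustration only (uses the two pagerData entries shown later in the Data section):
// each array entry becomes one Pager page, and its fields back ${data.id} / ${data.text}.
const pagerData = [
    { id: 'One', text: 'This is a text to speech message before the video.' },
    { id: 'Two', text: 'This is a text to speech message after the video' }
];

pagerData.forEach((data) => {
    // The Text component on this page gets id = data.id and text = data.text,
    // which is also why the SpeakItem commands can target componentId "One" and "Two".
    console.log(`page "${data.id}" -> ${data.text}`);
});
```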
The part of the APL document responsible for playing the video is below.
```
{
    "type": "Video",
    "id": "myVideoPlayer",
    "position": "absolute",
    "bottom": "0vh",
    "left": "0vw",
    "width": "100%",
    "height": "100%",
    "autoplay": false,
    "scale": "best-fill",
    "audioTrack": "foreground"
}
```
The part of the APL document responsible for playing the text to speech (TTS) is below.
```
{
    "type": "Text",
    "id": "${data.id}",
    "text": "${data.text}",
    "speech": "${data.textSpeech.url}",
    "color": "#ffffff",
    "width": "100%",
    "fontSize": 60,
    "textAlign": "center",
    "fontWeight": "100",
    "paddingLeft": 10,
    "paddingRight": 10
}
```
## Data
The data object used by the template. In our Lambda function this becomes the datasources of the RenderDocument directive.
```
{
    "card": {
        "type": "object",
        "properties": {
            "pagerData": [
                {
                    "text": "This is a text to speech message before the video.",
                    "id": "One",
                    "pageText": "<speak>This is a text to speech message before the video.</speak>"
                },
                {
                    "text": "This is a text to speech message after the video",
                    "id": "Two",
                    "pageText": "<speak>This is a text to speech message after the video.</speak>"
                }
            ],
            "preambleSsml": "<speak>Welcome to the APL demo</speak>"
        },
        "transformers": [
            {
                "inputPath": "pagerData.*",
                "outputName": "textSpeech",
                "transformer": "aplAudioToSpeech",
                "template": "PagerData"
            },
            {
                "inputPath": "preambleSsml",
                "outputName": "preambleSpeech",
                "transformer": "ssmlToSpeech"
            }
        ]
    }
}
```
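It can help to picture what the transformers do to this data. Judging from the bindings in the APL document, the ssmlToSpeech transformer writes an audio URL to preambleSpeech, and the aplAudioToSpeech transformer adds a textSpeech object (with a url property) to every pagerData entry. The sketch below is an assumption of the resulting shape; the URLs are placeholders for values the Alexa service generates at runtime.
```
// Rough sketch (assumption, for illustration only) of the datasource after the
// transformers run. The audio URLs are placeholders; Alexa generates the real ones.
const transformedCard = {
    properties: {
        pagerData: [
            {
                id: 'One',
                text: 'This is a text to speech message before the video.',
                pageText: '<speak>This is a text to speech message before the video.</speak>',
                // added by aplAudioToSpeech and read by the document via ${data.textSpeech.url}
                textSpeech: { url: 'https://example.com/one.mp3' }
            },
            {
                id: 'Two',
                text: 'This is a text to speech message after the video',
                pageText: '<speak>This is a text to speech message after the video.</speak>',
                textSpeech: { url: 'https://example.com/two.mp3' }
            }
        ],
        preambleSsml: '<speak>Welcome to the APL demo</speak>',
        // added by ssmlToSpeech and read by the document via ${card.properties.preambleSpeech}
        preambleSpeech: 'https://example.com/preamble.mp3'
    }
};

console.log(transformedCard.properties.pagerData[0].textSpeech.url);
```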
## Sources
The sources object; in our example it holds the APLA document that the aplAudioToSpeech transformer uses to generate speech for each page.
```
{
    "PagerData": {
        "type": "APLA",
        "version": "0.9",
        "mainTemplate": {
            "parameters": [
                "payload"
            ],
            "item": {
                "type": "Mixer",
                "items": [
                    {
                        "type": "Speech",
                        "contentType": "SSML",
                        "content": "${payload.data.pageText}"
                    }
                ]
            }
        }
    }
}
```
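The link between the data and the sources is the template name: the aplAudioToSpeech transformer's template field ("PagerData") must match a key in the sources object, and the way the bindings are written suggests that each entry matched by inputPath "pagerData.*" is handed to that APLA template as payload.data. A small sketch of the relationship (illustration only, not skill code):
```
// Illustration only: how a pagerData entry reaches the APLA template.
// The transformer's "template" value must match the key in the sources object,
// and each matched entry is exposed to the template as payload.data.
const transformer = {
    inputPath: 'pagerData.*',
    outputName: 'textSpeech',
    transformer: 'aplAudioToSpeech',
    template: 'PagerData' // key in the sources object above
};

const entry = {
    id: 'One',
    pageText: '<speak>This is a text to speech message before the video.</speak>'
};

// Inside the APLA document, ${payload.data.pageText} resolves against this payload.
const payload = { data: entry };
console.log(transformer.template, '->', payload.data.pageText);
```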
# Integrating the APL doc
Time to integrate the APL document with our Lambda function.
We will edit the `index.js` file, as we are using the `nodejs` runtime.
Create a folder called APL; we will store our APL document there. Even though we only have one document at this point, it makes sense to create a folder structure that can scale and makes maintenance easier.
In the APL folder, create a new file called `speechvideospeech.json` and store in it the contents of the APL document found on APL Ninja. Note that this copy omits the onMount commands from the template; we will trigger the same command sequence from our Lambda code with an Alexa.Presentation.APL.ExecuteCommands directive instead.
```
{
"type": "APL",
"version": "1.4",
"mainTemplate": {
"parameters": [
"card"
],
"item": {
"type": "Container",
"id": "cardSlideshowContainer",
"width": "100%",
"height": "100%",
"speech": "${card.properties.preambleSpeech}",
"items": [
{
"type": "Pager",
"navigation": "wrap",
"id": "cardSlideshow",
"numbered": true,
"height": "100%",
"width": "100%",
"data": "${card.properties.pagerData}",
"items": {
"type": "Container",
"direction": "row",
"id": "slideContainer",
"shrink": 0,
"width": "100%",
"height": "100%",
"alignItems": "center",
"items": [
{
"type": "Video",
"id": "myVideoPlayer",
"position": "absolute",
"bottom": "0vh",
"left": "0vw",
"width": "100%",
"height": "100%",
"autoplay": false,
"scale": "best-fill",
"audioTrack": "foreground"
},
{
"type": "Text",
"id": "${data.id}",
"text": "${data.text}",
"speech": "${data.textSpeech.url}",
"color": "#ffffff",
"width": "100%",
"fontSize": 60,
"textAlign": "center",
"fontWeight": "100",
"paddingLeft": 10,
"paddingRight": 10
}
]
}
}
]
}
}
}
```
Near the top of index.js, after the imports, start by adding a variable with the token ID used for our APL document. The same token must be used later in the ExecuteCommands directive so that the commands target this document.
```const myTokenID = 'myTokenID';```
Then assign the APL document we saved earlier to a variable:
```var myAPLDoc = require('./APL/speechvideospeech.json');```
Now we need to add the datasources object. In index.js we assign it to a constant named datasource; this is the same data shown in the Data section above:
```
const datasource = {
    "card": {
        "type": "object",
        "properties": {
            "pagerData": [
                {
                    "text": "This is a text to speech message before the video.",
                    "id": "One",
                    "pageText": "<speak>This is a text to speech message before the video.</speak>"
                },
                {
                    "text": "This is a text to speech message after the video",
                    "id": "Two",
                    "pageText": "<speak>This is a text to speech message after the video.</speak>"
                }
            ],
            "preambleSsml": "<speak>Welcome to the APL demo</speak>"
        },
        "transformers": [
            {
                "inputPath": "pagerData.*",
                "outputName": "textSpeech",
                "transformer": "aplAudioToSpeech",
                "template": "PagerData"
            },
            {
                "inputPath": "preambleSsml",
                "outputName": "preambleSpeech",
                "transformer": "ssmlToSpeech"
            }
        ]
    }
};
```
Immediately after, we create the sources object, which comes from the sources section of the APL authoring tool or the APL Ninja editor:
```
const sources = {
    "PagerData": {
        "type": "APLA",
        "version": "0.9",
        "mainTemplate": {
            "parameters": [
                "payload"
            ],
            "item": {
                "type": "Mixer",
                "items": [
                    {
                        "type": "Speech",
                        "contentType": "SSML",
                        "content": "${payload.data.pageText}"
                    }
                ]
            }
        }
    }
};
```
Next, we create a small helper function that builds the RenderDocument directive object from the document, the datasources, and the token:
```
const createDirectivePayload = (aplDocument, dataSources = {}, tokenId = myTokenID) => {
    return {
        type: "Alexa.Presentation.APL.RenderDocument",
        token: tokenId,
        document: aplDocument,
        datasources: dataSources,
        sources: sources
    };
};
```
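Keeping the RenderDocument details in this one helper means that switching later to a document hosted in the APL authoring tool only requires changing the helper (see the alternative approach at the end of this post). For reference, this is how the launch handler below calls it:
```
// Example usage (this is exactly how the launch handler below uses the helper):
const aplDirective = createDirectivePayload(myAPLDoc, datasource, myTokenID);
// handlerInput.responseBuilder.addDirective(aplDirective) then sends it to the device.
```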
For our example, we want the TTS, video, TTS sequence to play as soon as the skill launches, so we modify our Lambda launch handler to look like this:
```
const LaunchRequestHandler = {
canHandle(handlerInput) {
return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
},
handle(handlerInput) {
if (Alexa.getSupportedInterfaces(handlerInput.requestEnvelope)['Alexa.Presentation.APL']) {
const aplDirective = createDirectivePayload(myAPLDoc, datasource, myTokenID);
handlerInput.responseBuilder
.addDirective(aplDirective)
.addDirective({
type : 'Alexa.Presentation.APL.ExecuteCommands',
token: myTokenID,
commands: [
{
"type": "SetValue",
"componentId": "myVideoPlayer",
"property": "opacity",
"value": 0
},
{
"type": "SetPage",
"componentId": "cardSlideshow",
"position": "relative",
"delay": 2000,
"value": 0
},
{
"type": "SetValue",
"componentId": "One",
"property": "opacity",
"value": 1
},
{
"type": "SpeakItem",
"componentId": "cardSlideshowContainer"
},
{
"type": "SpeakItem",
"componentId": "One"
},
{
"type": "SetValue",
"componentId": "One",
"property": "opacity",
"value": 0
},
{
"type": "SetValue",
"componentId": "myVideoPlayer",
"property": "opacity",
"value": 1
},
{
"type": "PlayMedia",
"componentId": "myVideoPlayer",
"source": "https://s3.amazonaws.com/freestock-transcoded-videos-prod/transcoded/freestock_v206050.mp4"
},
{
"type": "SpeakItem",
"componentId": "Two"
},
{
"type": "SetPage",
"componentId": "cardSlideshow",
"position": "relative",
"delay": 1000,
"value": 2
}
]
});
}
// send out skill response
return handlerInput.responseBuilder.getResponse();
}
};
```
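One thing to note: on devices that do not support APL, the if block above is skipped and the skill returns an empty response. If you would like those devices to hear something instead, a simple variation of the handler could fall back to plain TTS. The sketch below is one possible way to do it; the spoken text is our own example, not part of the original template.
```
// Sketch of a launch handler variant with a plain-TTS fallback for devices
// without APL support. The welcome text here is just an example.
const LaunchRequestHandlerWithFallback = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
    },
    handle(handlerInput) {
        const supportsAPL = Alexa.getSupportedInterfaces(handlerInput.requestEnvelope)['Alexa.Presentation.APL'];
        if (supportsAPL) {
            const aplDirective = createDirectivePayload(myAPLDoc, datasource, myTokenID);
            handlerInput.responseBuilder.addDirective(aplDirective);
            // ...add the Alexa.Presentation.APL.ExecuteCommands directive exactly as shown above...
        } else {
            handlerInput.responseBuilder
                .speak('Welcome to the APL demo. This device does not have a screen, so I can only play audio.')
                .reprompt('You can say hello to me.');
        }
        return handlerInput.responseBuilder.getResponse();
    }
};
```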
## We should now be ready to test our TTS-Video-TTS functionality!
The file structure for our Lambda project should be as below. Note that it is based on the Hello World custom skill template in the Alexa Developer Console.
```
|-index.js (based on the hello world template, edited for this demo)
|-local-debugger.js (created by hello world template)
|-package.json (created by hello world template)
|-utils.js (created by hello world template)
|-APL/speechvideospeech.json (created for this demo)
```
For reference, the full Lambda code can be found below. It is based on the Hello World custom skill code created by the Alexa Developer Console.
```
/* *
* This sample demonstrates handling intents from an Alexa skill using the Alexa Skills Kit SDK (v2).
* Please visit https://alexa.design/cookbook for additional examples on implementing slots, dialog management,
* session persistence, api calls, and more.
* */
const Alexa = require('ask-sdk-core');
const myTokenID = 'myTokenID';
var myAPLDoc = require('./APL/speechvideospeech.json');
const datasource = {
"card": {
type: "object",
properties: {
"pagerData": [
{
"text": "This is a text to speech message before the video.",
"id": "One",
"pageText": "<speak>This is a text to speech message before the video.</speak>"
},
{
"text": "This is a text to speech message after the video",
"id": "Two",
"pageText": "<speak>This is a text to speech message after the video.</speak>"
}
],
preambleSsml: "<speak>Welcome to the APL demo</speak>"
},
transformers: [
{
"inputPath": "preambleSsml",
"outputName": "preambleSpeech",
"transformer": "ssmlToSpeech"
},
{
"inputPath": "pagerData.*",
"outputName": "textSpeech",
"transformer": "aplAudioToSpeech",
"template": "PagerData"
}
]
}
};
const sources = {
"PagerData": {
"type": "APLA",
"version": "0.9",
"mainTemplate": {
"parameters": [
"payload"
],
"item": {
"type": "Mixer",
"items": [
{
"type": "Speech",
"contentType": "SSML",
"content": "${payload.data.pageText}"
}
]
}
}
}
};
const createDirectivePayload = (aplDocument, dataSources = {}, tokenId = myTokenID) => {
return {
type: "Alexa.Presentation.APL.RenderDocument",
token: tokenId,
document: aplDocument,
datasources: dataSources,
sources: sources
};
};
const LaunchRequestHandler = {
canHandle(handlerInput) {
return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
},
handle(handlerInput) {
if (Alexa.getSupportedInterfaces(handlerInput.requestEnvelope)['Alexa.Presentation.APL']) {
const aplDirective = createDirectivePayload(myAPLDoc, datasource, myTokenID);
handlerInput.responseBuilder
.addDirective(aplDirective)
.addDirective({
type : 'Alexa.Presentation.APL.ExecuteCommands',
token: myTokenID,
commands: [
{
"type": "SetValue",
"componentId": "myVideoPlayer",
"property": "opacity",
"value": 0
},
{
"type": "SetPage",
"componentId": "cardSlideshow",
"position": "relative",
"delay": 2000,
"value": 0
},
{
"type": "SetValue",
"componentId": "One",
"property": "opacity",
"value": 1
},
{
"type": "SpeakItem",
"componentId": "cardSlideshowContainer"
},
{
"type": "SpeakItem",
"componentId": "One"
},
{
"type": "SetValue",
"componentId": "One",
"property": "opacity",
"value": 0
},
{
"type": "SetValue",
"componentId": "myVideoPlayer",
"property": "opacity",
"value": 1
},
{
"type": "PlayMedia",
"componentId": "myVideoPlayer",
"source": "https://s3.amazonaws.com/freestock-transcoded-videos-prod/transcoded/freestock_v206050.mp4"
},
{
"type": "SpeakItem",
"componentId": "Two"
},
{
"type": "SetPage",
"componentId": "cardSlideshow",
"position": "relative",
"delay": 1000,
"value": 2
}
]
});
}
// send out skill response
return handlerInput.responseBuilder.getResponse();
}
};
const HelloWorldIntentHandler = {
canHandle(handlerInput) {
return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
&& Alexa.getIntentName(handlerInput.requestEnvelope) === 'HelloWorldIntent';
},
handle(handlerInput) {
const speakOutput = 'Hello World!';
return handlerInput.responseBuilder
.speak(speakOutput)
//.reprompt('add a reprompt if you want to keep the session open for the user to respond')
.getResponse();
}
};
const HelpIntentHandler = {
canHandle(handlerInput) {
return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
&& Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.HelpIntent';
},
handle(handlerInput) {
const speakOutput = 'You can say hello to me! How can I help?';
return handlerInput.responseBuilder
.speak(speakOutput)
.reprompt(speakOutput)
.getResponse();
}
};
const CancelAndStopIntentHandler = {
canHandle(handlerInput) {
return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
&& (Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.CancelIntent'
|| Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.StopIntent');
},
handle(handlerInput) {
const speakOutput = 'Goodbye!';
return handlerInput.responseBuilder
.speak(speakOutput)
.getResponse();
}
};
/* *
* FallbackIntent triggers when a customer says something that doesn’t map to any intents in your skill
* It must also be defined in the language model (if the locale supports it)
* This handler can be safely added but will be ignored in locales that do not support it yet
* */
const FallbackIntentHandler = {
canHandle(handlerInput) {
return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
&& Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.FallbackIntent';
},
handle(handlerInput) {
const speakOutput = 'Sorry, I don\'t know about that. Please try again.';
return handlerInput.responseBuilder
.speak(speakOutput)
.reprompt(speakOutput)
.getResponse();
}
};
/* *
* SessionEndedRequest notifies that a session was ended. This handler will be triggered when a currently open
* session is closed for one of the following reasons: 1) The user says "exit" or "quit". 2) The user does not
* respond or says something that does not match an intent defined in your voice model. 3) An error occurs
* */
const SessionEndedRequestHandler = {
canHandle(handlerInput) {
return Alexa.getRequestType(handlerInput.requestEnvelope) === 'SessionEndedRequest';
},
handle(handlerInput) {
console.log(`~~~~ Session ended: ${JSON.stringify(handlerInput.requestEnvelope)}`);
// Any cleanup logic goes here.
return handlerInput.responseBuilder.getResponse(); // notice we send an empty response
}
};
/* *
* The intent reflector is used for interaction model testing and debugging.
* It will simply repeat the intent the user said. You can create custom handlers for your intents
* by defining them above, then also adding them to the request handler chain below
* */
const IntentReflectorHandler = {
canHandle(handlerInput) {
return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest';
},
handle(handlerInput) {
const intentName = Alexa.getIntentName(handlerInput.requestEnvelope);
const speakOutput = `You just triggered ${intentName}`;
return handlerInput.responseBuilder
.speak(speakOutput)
//.reprompt('add a reprompt if you want to keep the session open for the user to respond')
.getResponse();
}
};
/**
* Generic error handling to capture any syntax or routing errors. If you receive an error
* stating the request handler chain is not found, you have not implemented a handler for
* the intent being invoked or included it in the skill builder below
* */
const ErrorHandler = {
canHandle() {
return true;
},
handle(handlerInput, error) {
const speakOutput = 'Sorry, I had trouble doing what you asked. Please try again.';
console.log(`~~~~ Error handled: ${JSON.stringify(error)}`);
return handlerInput.responseBuilder
.speak(speakOutput)
.reprompt(speakOutput)
.getResponse();
}
};
/**
* This handler acts as the entry point for your skill, routing all request and response
* payloads to the handlers above. Make sure any new handlers or interceptors you've
* defined are included below. The order matters - they're processed top to bottom
* */
exports.handler = Alexa.SkillBuilders.custom()
.addRequestHandlers(
LaunchRequestHandler,
HelloWorldIntentHandler,
HelpIntentHandler,
CancelAndStopIntentHandler,
FallbackIntentHandler,
SessionEndedRequestHandler,
IntentReflectorHandler)
.addErrorHandlers(
ErrorHandler)
.withCustomUserAgent('sample/speech-video-speech')
.lambda();
```
You can also see the lambda code repo.
## Alternative approach: integrating directly with the APL authoring tool
This section only applies if, instead of storing your APL document alongside your Lambda function in the APL folder, you are managing your APL document in the APL authoring tool and wish to integrate it directly.
In this case, you need to change the Lambda code and replace the createDirectivePayload helper, from:
```
const createDirectivePayload = (aplDocument, dataSources = {}, tokenId = myTokenID) => {
    return {
        type: "Alexa.Presentation.APL.RenderDocument",
        token: tokenId,
        document: aplDocument,
        datasources: dataSources,
        sources: sources
    };
};
```
**to**
```
const createDirectivePayload = (aplDocument, dataSources = {}, tokenId = myTokenID) => {
    // aplDocument is no longer used here; the document is linked by name instead
    return {
        type: "Alexa.Presentation.APL.RenderDocument",
        token: tokenId,
        document: {
            "src": "doc://alexa/apl/documents/**YOUR_OWN_APL_DOCUMENT_NAME_HERE**",
            "type": "Link"
        },
        datasources: dataSources,
        sources: sources
    };
};
```