The Alexa Web API for Games allows you to build rich and immersive voice-enabled games using the web technology and tooling of your choice. You build these experiences with a combination of new Alexa Skills Kit directives and a client-side JavaScript library, which loads on supported devices such as the Echo Show and the Fire TV Cube. This blog will help you get familiar with the basic concepts behind the APIs so you can get started building your own Alexa-enabled game with these powerful new tools. We will cover enabling the interface on your skill, starting your web application, hosting and loading it, and the client-side JavaScript APIs for messaging, device capabilities, speech, and voice.
The Alexa Web API for Games works very similarly to other types of custom, multimodal Alexa skills.
Alexa skill flow diagram (above).
Before we get to the details of what these requests and responses look like, you’ll first need to enable your skill to use Alexa Web API for Games. You can do this in the developer console: open your skill, click Interfaces, toggle on Alexa Web API for Games. Afterwards, select Build Model.
How to turn on Alexa Web API for Games in the developer console (above).
If you are a CLI user, you can enable this through the skill manifest. Simply add the ALEXA_PRESENTATION_HTML interface to the manifest.apis.custom.interfaces block:
{
  "type": "ALEXA_PRESENTATION_HTML"
}
This enables Alexa Web API capable devices to send the appropriate context information to your skill back end.
It is important to check for the existence of the Alexa.Presentation.HTML interface before sending a request to start your web application. This is the same supportedInterfaces block that you check for in your Alexa Presentation Language (APL) capable skills today. In step 3, where your skill confirms if the device supports Alexa Web API for Games, the context in the request envelope to your skill will contain:
...
"device": {
  "deviceId": "amzn1.ask.device.XXXX",
  "supportedInterfaces": {
    "Alexa.Presentation.HTML": {
      "runtime": {
        "maxVersion": "1.0"
      }
    }
  }
},
...
In Node.js, this check might look like:
const supportedInterfaces = Alexa.getSupportedInterfaces(handlerInput.requestEnvelope);
const htmlInterface = supportedInterfaces['Alexa.Presentation.HTML'];
if (htmlInterface !== null && htmlInterface !== undefined) {
    // Add a start directive.
}
Once you determine that the requesting device supports the interface, you can send a start directive.
The Alexa.Presentation.HTML.Start directive notifies the Alexa Service that you want to start a web application. This is the message your skill back-end will send to launch your web experience. A simplified version looks like:
{
  "type": "Alexa.Presentation.HTML.Start",
  "data": {
    "arbitraryDataKey1": "My arbitrary data"
  },
  "request": {
    "uri": "https://mywebsite.com/alexa.html",
    "method": "GET"
  },
  "configuration": {
    "timeoutInSeconds": 300
  }
}
There are a few key pieces to this start directive. First, the type will always be Alexa.Presentation.HTML.Start. Also required is the request block with an HTTPS link; this is the webpage that will load on the device, and its SSL certificate must be valid for it to open on an Alexa-enabled device. The configuration.timeoutInSeconds value controls how long your website stays on screen with no customer interaction; it can persist for up to 30 minutes without input from the player. The data field is where you send the initial startup information for your web application. The fields in this JSON payload can be anything, and should include all of the information the web application needs to show the correct state to your player at the start of the visual experience. There are more optional parts to this directive, such as the ability to add authorization headers. To learn more about this directive, see the technical documentation.
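Putting this together on the skill side, a launch request handler built with the ASK SDK for Node.js might add the directive like this (a minimal sketch; the URI, speech, and data values are placeholders for your own application):

const LaunchRequestHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
    },
    handle(handlerInput) {
        const htmlInterface = Alexa.getSupportedInterfaces(handlerInput.requestEnvelope)['Alexa.Presentation.HTML'];
        if (!htmlInterface) {
            // Fall back to a voice-only response on devices without the interface.
            return handlerInput.responseBuilder
                .speak('Welcome back! Let us play by voice.')
                .getResponse();
        }
        return handlerInput.responseBuilder
            .speak('Welcome back. Loading your game.')
            .addDirective({
                type: 'Alexa.Presentation.HTML.Start',
                data: { arbitraryDataKey1: 'My arbitrary data' },
                request: {
                    uri: 'https://mywebsite.com/alexa.html',
                    method: 'GET'
                },
                configuration: { timeoutInSeconds: 300 }
            })
            .getResponse();
    }
};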
To load the Alexa JavaScript library you will need to include it in a script tag on your HTML page, like so:
<head>
  <script src="https://cdn.html.games.alexa.a2z.com/alexa-html/latest/alexa-html.js"></script>
</head>
The device will intercept the request and inject the library into the WebView so you can reference the Alexa object in your JavaScript code. To initialize Alexa, use the following:
var alexaClient;
Alexa.create({version: '1.0'})
    .then((args) => {
        const {
            alexa,
            message
        } = args;
        alexaClient = alexa;
        document.getElementById('debugElement').innerHTML = '<b>Alexa is ready :)</b>';
    })
    .catch(error => {
        document.getElementById('debugElement').innerHTML = '<b>Alexa not ready :(</b>';
    });
Assuming the .then block executes, you will have initialized the client side Alexa library and you can now use the skill, speech, voice, and device capability APIs. This is also a great place to set up your local game state and use the startup data you sent (captured here in the message variable). If the catch block is executed, then the library failed to instantiate. This commonly happens while testing locally with your computer’s browser. Since the Alexa library is only available on the device, you will not be able to use it when you access your server locally.
As mentioned above, you will need to have a valid SSL certificate in order to serve your web application on a device. There are multiple ways to accomplish this, which I will split into development use and production use. This will not be an exhaustive list of the different ways you could develop and deploy your Alexa skill. If you have a method which fits into your development workflow better, use that!
For development, you will want the ability to: 1) start up a local server for the web application which does not cache assets, and 2) create an HTTPS tunnel to reach the local HTTP server. This will be an efficient way to rapidly test your changes on a device as you would simply need to save and reload the page (start the skill over again, in this case).
NOTE: It is important to send a no-cache header in development to ensure you are developing against the most up-to-date code and art assets. If you accidentally serve cached assets to an Alexa-enabled device, you can rename the file (or append a bogus query parameter) to break the device cache. This can also be a development strategy, as long as you disable it for production use.
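If you go the query-parameter route, a tiny helper is enough (a sketch; devAssetUrl and the asset path are hypothetical, and this should be stripped from production builds):

// Development-only cache-busting sketch: append a changing query parameter so the
// device fetches fresh copies of assets. Disable this in production builds.
const DEV_CACHE_BUST = '?v=' + Date.now();    // bogus query parameter

function devAssetUrl(path) {                   // hypothetical helper for your own asset loading
    return path + DEV_CACHE_BUST;
}

console.log(devAssetUrl('assets/background.png')); // e.g. assets/background.png?v=1699999999999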
To start up a local server, you can use a tool like the Node.js http-server module. It sets up a server and has a useful option, the -c-1 flag, which sends a no-cache header. For instance, install http-server globally:
npm install http-server -g
You can then start your server on port 8080 with a no-cache header sent in response. From the directory of your web application, run:
http-server . -p 8080 -c-1
This will start up the local server on your machine. It enables you to test with your favorite browser development tools by opening 127.0.0.1:8080/alexa-game.html, but it does not allow communication with your skill back end. For that, you will need a tool to create an endpoint reachable by your Alexa device. There are different ways to do this; one tool you can use is ngrok, which gives you a publicly accessible HTTPS endpoint that routes to localhost. Be sure to use the same port you specified in your local server startup command. For instance, after installing ngrok, run:
ngrok http 8080
As long as the port matches what you used in the local server startup, you’ll reach your webpage running locally from the publicly available ngrok URL. If you would like an example of this, see the My Cactus skill which uses this setup for web application development.
For live traffic, don't host the web application and its assets on your development machine; a proper hosting setup avoids latency in the startup experience and gives you a more stable application. Using a storage solution and a content delivery network lets your assets be stored and cached at edge locations around the world, close to where your players are, greatly reducing the latency your players experience. For this, I recommend a storage solution like Amazon S3 and a content delivery network such as Amazon CloudFront, which has edge locations around the world. If you use Amazon CloudFront, you might want a more descriptive domain name than a random *.cloudfront.net domain. The URL where your web application is hosted is shown to your customers when your website starts up on the device, so make sure it's appropriate for your brand by using a better domain name. To learn more about this, check out this AWS guide about setting up a static website using these tools, or see our reference application, which has a CloudFormation template modeling this setup, ready to work with ASK CLI v2. This is just one example, but keep in mind that general best practices for building websites apply to the web application used in your Alexa Web API for Games skill, too.
NOTE: You can use a setup like this for your in-development game, as well, but it will slow down your development cycle because you will need to upload all changed files every time you want to test. It is still a great idea for an integration (pre-production) environment if you are working on this project with multiple developers.
One incredibly useful feature is the ability to send messages between your web application running on the device and the skill back end, letting the Alexa skill side hold the core game logic. This is accomplished through a directive, Alexa.Presentation.HTML.HandleMessage, sent from the skill back end, and two JavaScript APIs, alexaClient.skill.sendMessage and alexaClient.skill.onMessage.
The handle message directive allows you to send arbitrary data to your web application from your skill side code. The directive itself is very simple:
{
  "type": "Alexa.Presentation.HTML.HandleMessage",
  "message": {
    "arbitraryDataKey1": "ArbitraryDataValue1"
  }
}
The message can be any arbitrary JSON data that you want to use in the front-end web application. This gives you the flexibility to define your own protocol for communicating with your web application. One common use case is to send state information from your skill back end to your front end so you can render the appropriate visual. For instance, you can trigger a 3D animation in response to a game intent.
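For example, a skill-side intent handler could attach this directive to its response to trigger that animation (a sketch using the ASK SDK for Node.js; the intent name, output speech, and message fields are hypothetical and follow your own protocol):

const WaterCactusIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'WaterCactusIntent';
    },
    handle(handlerInput) {
        return handlerInput.responseBuilder
            .speak('Your cactus takes a nice long drink.')
            .addDirective({
                type: 'Alexa.Presentation.HTML.HandleMessage',
                message: {
                    action: 'playAnimation',   // your own protocol
                    animation: 'drinkWater'
                }
            })
            .getResponse();
    }
};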
On the JavaScript side, you can register a listener to be invoked every time there is a HandleMessage directive sent. This will give you the data payload you sent as a JavaScript object for your web application to act on:
alexaClient.skill.onMessage((message) => {
    console.log(JSON.stringify(message)); // {"arbitraryDataKey1":"ArbitraryDataValue1"}
});
Arbitrary message sending works in both directions. To communicate information from your JavaScript code to your skill back end, use the alexaClient.skill.sendMessage function:
const messageSentCallback = function(sendResponse) {
    const {
        statusCode,
        reason,
        rateLimit,
    } = sendResponse;
    // Handle different response codes: 200, 401, 429, 500
};

alexaClient.skill.sendMessage({
    localDataKey: "LocalDataValue"
}, messageSentCallback);
This function takes two parameters: the arbitrary data payload and an optional callback function for handling the response. You are rate limited to 2 calls per second using this API and if you exceed that, you will get a 429 status code (too many requests). You may want to handle scenarios like this by retrying the message after waiting (check out the rateLimit object documentation for APIs you can use to determine how long to wait).
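One simple approach is a fixed backoff and retry (a sketch; a production version could read the rateLimit object in the response to choose a smarter wait time):

// Sketch: retry a throttled message after a short wait, up to a few attempts.
function sendMessageWithRetry(payload, retriesLeft = 3) {
    alexaClient.skill.sendMessage(payload, (sendResponse) => {
        if (sendResponse.statusCode === 429 && retriesLeft > 0) {
            // Too many requests: wait past the 2-calls-per-second limit and try again.
            setTimeout(() => sendMessageWithRetry(payload, retriesLeft - 1), 1000);
        } else if (sendResponse.statusCode !== 200) {
            console.log('sendMessage failed: ' + sendResponse.reason);
        }
    });
}

sendMessageWithRetry({ localDataKey: 'LocalDataValue' });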
This JavaScript API sends a request with the type Alexa.Presentation.HTML.Message to your skill back-end code. You can handle this just like you would handle an intent request. Here is one example of a simple cloud-side logger using the sendMessage API:
const WebAppCloudLogger = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === "Alexa.Presentation.HTML.Message";
    },
    handle(handlerInput) {
        const messageToLog = handlerInput.requestEnvelope.request.message;
        console.log(messageToLog);
        return handlerInput.responseBuilder
            .getResponse();
    }
};
This handler will simply log the message payload to the cloud side logger, allowing you to see logs on the back end. To see some other examples of message handlers in use, see the handlers in the My Cactus skill.
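On the web application side, a small helper is all that is needed to feed that logger (a sketch; the payload shape is entirely up to you):

// Sketch: forward front-end log lines to the skill back end via sendMessage.
function logToCloud(text) {
    alexaClient.skill.sendMessage({ log: text });
}

logToCloud('Web app loaded, starting intro animation');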
The local device capabilities API lets you see information about the device the web application is running on, including information that is not accessible from the standard HTML environment. Device capabilities are accessed through alexaClient.capabilities, which includes a microphone field with the following properties:
{
  supportsPushToTalk: true,
  supportsWakeWord: true
}
supportsPushToTalk tells you whether the device supports push-to-talk input, such as a Fire TV remote; the customer using this device can talk to their remote to access Alexa. The other field, supportsWakeWord, will be true when the device supports wake word activation. Some devices, such as the Fire TV Cube, support both, whereas a Fire TV Stick 4K only supports push-to-talk and an Echo Show only supports the wake word. You can use these capabilities to tailor the voice response by passing them to the back end, or to vary the instructions you show in an on-screen hint.
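For example, a sketch that picks the right hint for the device (showHint, the element id, and the hint text are hypothetical pieces of your own UI):

// Sketch: tailor the on-screen instruction to how the player can talk to Alexa.
function showHint(text) {
    document.getElementById('hintElement').textContent = text;   // hypothetical element id
}

const microphone = alexaClient.capabilities.microphone;
if (microphone.supportsWakeWord) {
    showHint('Say "Alexa, water my cactus"');
} else if (microphone.supportsPushToTalk) {
    showHint('Hold the microphone button on your remote and say "water my cactus"');
}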
The performance interface has one function, performance.getMemoryInfo(). This gets the available memory on device:
alexaClient.performance.getMemoryInfo().then((memInfo) => {
console.log(memInfo)
});
The performance.getMemoryInfo() function returns a promise that resolves with a payload containing the available memory in MB. The above code would log something like:
{
  memory: {
    availableMemoryInMB: 715
  }
}
NOTE: This API is useful in development for optimizing assets and debugging across device types. It should not be used in production because it can negatively affect device performance.
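For example, during development you might poll it to flag asset-heavy scenes (a sketch; the threshold value is arbitrary, and this check should be removed for production):

// Development-only sketch: warn when available memory drops below a chosen threshold.
const LOW_MEMORY_THRESHOLD_MB = 100;

function checkMemoryDuringDevelopment() {
    alexaClient.performance.getMemoryInfo().then((memInfo) => {
        if (memInfo.memory.availableMemoryInMB < LOW_MEMORY_THRESHOLD_MB) {
            console.warn('Low memory: only ' + memInfo.memory.availableMemoryInMB + ' MB available');
        }
    });
}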
You can use the speech interface to react to speech events. This contains two callbacks, speech.onStarted and speech.onStopped. A simple example of this:
alexaClient.speech.onStarted(() => {
    console.log('speech is playing');
});
alexaClient.speech.onStopped(() => {
    console.log('speech stopped playing');
});
You can use these callbacks to, for example, modify any currently playing web audio when the output speech from your skill starts to play, or animate an on-screen character to speak while the output speech is playing and stop when it ends.
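Here is a sketch of the first idea, ducking background audio during output speech (it assumes your game routes its audio through a Web Audio GainNode):

// Sketch: duck background web audio while Alexa output speech is playing.
const audioContext = new AudioContext();
const backgroundGain = audioContext.createGain();
backgroundGain.connect(audioContext.destination);
// ... connect your background music source to backgroundGain ...

alexaClient.speech.onStarted(() => {
    backgroundGain.gain.setValueAtTime(0.3, audioContext.currentTime); // duck to 30%
});
alexaClient.speech.onStopped(() => {
    backgroundGain.gain.setValueAtTime(1.0, audioContext.currentTime); // restore
});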
Related to this is the Alexa.utils.speech class, which has one method, fetchAndDemuxMP3. It lets you extract the audio buffer and speech marks from your skill back end's output speech, so you can synchronize visuals and web audio with the speech marks and gain finer control over the timing of visuals than the speech callbacks alone provide. An example of this:
const transformerSpeechMp3 = 'https://tinytts.amazon.com/resource.mp3';
Alexa.utils.speech.fetchAndDemuxMP3(transformerSpeechMp3).then((audioResult) => {
const {audioBuffer, speechMarks} = audioResult;
playAudioAndAnimateCharacterSpeech(audioBuffer, speechMarks);
});
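A sketch of what playAudioAndAnimateCharacterSpeech could look like, assuming the returned audioBuffer is the raw MP3 data (an ArrayBuffer that still needs decoding) and each speech mark carries a time offset in milliseconds and a value, as with Amazon Polly speech marks; animateMouth stands in for your own animation code:

const speechContext = new AudioContext();

function animateMouth(viseme) {
    console.log('show mouth shape for ' + viseme);   // stand-in for your animation code
}

function playAudioAndAnimateCharacterSpeech(audioBuffer, speechMarks) {
    speechContext.decodeAudioData(audioBuffer).then((decoded) => {
        const source = speechContext.createBufferSource();
        source.buffer = decoded;
        source.connect(speechContext.destination);
        source.start();
        // Schedule a mouth shape change at each speech mark offset.
        speechMarks.forEach((mark) => {
            setTimeout(() => animateMouth(mark.value), mark.time);
        });
    });
}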
The voice interface allows you to listen and react to voice input events on the device with the callbacks onMicrophoneOpened and onMicrophoneClosed. An example of this:
alexaClient.voice.onMicrophoneOpened(() => {
    console.log("Microphone is open. Reduce the volume of my web audio.");
});
alexaClient.voice.onMicrophoneClosed(() => {
    console.log("Microphone is closed. Back to normal.");
});
You can use this interface to dim the screen, reduce background (web audio) volume, or animate a character on screen, all in response to the microphone opening or closing. Also included in the voice interface is the ability to request that the microphone be opened on devices which support wake word activation. For example:
alexaClient.voice.requestMicrophoneOpen({
    onError: (reason) => {
        if (reason === "request-open-unsupported") {
            requestMicrophoneOpenSupported = false;
            openMicrophoneUsingSkillFallback();
        } else {
            console.log("Microphone open error: " + reason);
        }
    }
});
Note that if the player is on a device such as the Fire TV Stick 4K, there is no wake word, so you will get a request-open-unsupported reason if you attempt to open the microphone this way. You can combine this with the capabilities.microphone interface to check whether opening the microphone is supported and, if not, create a fallback experience such as an on-screen prompt to press the Fire TV remote's push-to-talk button. This API is powerful because it lets you create experiences that do not follow a strict conversational turn, such as requesting the microphone to open in response to a screen interaction or after an animation plays, without any output speech.
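Here is one possible shape for openMicrophoneUsingSkillFallback from the snippet above, reusing the hypothetical showHint helper from the capabilities example:

function openMicrophoneUsingSkillFallback() {
    if (alexaClient.capabilities.microphone.supportsPushToTalk) {
        // No wake word available: prompt the player to use the remote's microphone button instead.
        showHint('Hold the microphone button on your remote to answer.');
    }
}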
The Alexa Web API for Games is a powerful combination of new Alexa Skills Kit directives, requests, and a client-side library that enables game developers to create novel Alexa games combining touch, audio, visuals, and speech using familiar web technologies. By leveraging on-device JavaScript tools such as bi-directional communication, voice and speech events, and device capabilities, you can create rich, immersive Alexa games. To see an example of some of these APIs applied to a game, try out the code for our cactus-raising simulation game on GitHub, which makes use of the Alexa Web API for Games. And let me know what you end up building: I'm @JoeMoCode on Twitter.