Getting Started with the Alexa Web API for Games

Joe Muoio Sep 01, 2020

The Alexa Web API for Games allows you to build rich and immersive voice-enabled games using the web technology and tooling of your choice. You build with a combination of new Alexa Skills Kit directives and a client-side JavaScript library, which loads on supported devices such as the Echo Show and the FireTV Cube. This blog will familiarize you with the basic concepts behind the APIs so you can get started building your own Alexa-enabled game using these powerful new tools. We will cover:

  • Enabling the Alexa Web API for Games
  • Using the start directive
  • JavaScript initialization
  • Hosting solutions
  • Bi-directional communication
  • Device capabilities interface
  • Performance interface
  • Speech events
  • Voice events

How the Alexa Web API for Games works

The Alexa Web API for Games works very similarly to other types of custom, multimodal Alexa skills.

[Figure: Alexa skill flow diagram showing the flow between the customer, Alexa services, the skill back end, the asset server, and the multimodal device. Each step is described below.]

  1. The customer speaks to their device (one that supports the Alexa Web API for Games).
  2. Automatic Speech Recognition (ASR) and natural language understanding (NLU) are applied to the speech. Alexa services will use the interaction model for your skill as input and route the request to your skill back end code.
  3. Your skill gets a request with context showing that Alexa Web API for Games is supported. It will send a response with an HTTPS URL to where the web application is hosted.
  4. Output text-to-speech (TTS) is sent to the customer. A WebView starts up on the device.
  5. A request goes out to the HTTPS URL defined in step 3. Additional requests are sent for each asset you include on the page, just like a traditional website.
  6. The web application loads and the customer can interact by touch and voice.

Enable the interface

Before we get to the details of what these requests and responses look like, you’ll first need to enable your skill to use Alexa Web API for Games. You can do this in the developer console: open your skill, click Interfaces, toggle on Alexa Web API for Games. Afterwards, select Build Model.

[Animation: how to turn on the Alexa Web API for Games in the developer console, via the skill's Interfaces tab.]

If you are a CLI user, you can enable this through the skill manifest. Simply add the ALEXA_PRESENTATION_HTML interface to the manifest.apis.custom.interfaces block:

{
    "type": "ALEXA_PRESENTATION_HTML"
}

This enables Alexa Web API capable devices to send the appropriate context information to your skill back end.
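
For reference, the interface sits inside the larger skill.json manifest roughly like this (a sketch, with the surrounding manifest fields elided):

{
    "manifest": {
        "apis": {
            "custom": {
                "interfaces": [
                    {
                        "type": "ALEXA_PRESENTATION_HTML"
                    }
                ]
            }
        }
    }
}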

Request context

It is important to check for the existence of the Alexa.Presentation.HTML interface before sending a request to start your web application. This is the same supportedInterfaces block that you check for in your Alexa Presentation Language (APL) capable skills today. In step 3, where your skill confirms if the device supports Alexa Web API for Games, the context in the request envelope to your skill will contain:

...
    "device": {
      "deviceId": "amzn1.ask.device.XXXX",
      "supportedInterfaces": {
        "Alexa.Presentation.HTML": {
          "runtime": {
            "maxVersion": "1.0"
          }
        }
      }
    },
...

In Node.js, this check might look like:

const supportedInterfaces = Alexa.getSupportedInterfaces(handlerInput.requestEnvelope);
const htmlInterface = supportedInterfaces['Alexa.Presentation.HTML'];
if(htmlInterface !== null && htmlInterface !== undefined) {
    // Add a start directive.
}

Once you determine that the requesting device supports the interface, you can send a start directive.

Start directive

The Alexa.Presentation.HTML.Start directive notifies the Alexa Service that you want to start a web application. This is the message your skill back-end will send to launch your web experience. A simplified version looks like:

{
    type: "Alexa.Presentation.HTML.Start",
    data: {
        "arbitraryDataKey1": "My arbitrary data"
    },
    request: {
        uri: "https://mywebsite.com/alexa.html",
        method: "GET"
    },
    configuration: {
        timeoutInSeconds: 300
    }
}

There are a few key pieces to this start directive. First, the type will always be Alexa.Presentation.HTML.Start. Also required is the request block with an HTTPS URL; this is the webpage that will load on the device, and its SSL certificate must be valid for the page to open on an Alexa-enabled device. The configuration.timeoutInSeconds value controls how long your website stays on screen with no customer interaction; it can persist for up to 30 minutes without input from the player. The data field carries the initial startup information for your web application. The fields in this JSON payload can be anything, and should include everything your web application needs to show the correct state to your player at the start of the visual experience. There are more optional parts to this directive, such as the ability to add authorization headers. To learn more about this directive, see the technical documentation.
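
Putting the interface check and the directive together, a minimal launch handler using the ASK SDK for Node.js might look like the following sketch (the welcome speech and hosting URL are placeholders):

const Alexa = require('ask-sdk-core');

const LaunchRequestHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
    },
    handle(handlerInput) {
        const supportedInterfaces = Alexa.getSupportedInterfaces(handlerInput.requestEnvelope);
        if (supportedInterfaces['Alexa.Presentation.HTML']) {
            // The device supports the Web API: start the web application.
            return handlerInput.responseBuilder
                .speak('Welcome back! Loading your game.')
                .addDirective({
                    type: 'Alexa.Presentation.HTML.Start',
                    data: { arbitraryDataKey1: 'My arbitrary data' },
                    request: {
                        uri: 'https://mywebsite.com/alexa.html',
                        method: 'GET'
                    },
                    configuration: { timeoutInSeconds: 300 }
                })
                .getResponse();
        }
        // Voice-only fallback for devices without the interface.
        return handlerInput.responseBuilder
            .speak('Welcome back!')
            .getResponse();
    }
};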

JavaScript initialization

To load the Alexa JavaScript library you will need to include it in a script tag on your HTML page, like so:

<head>
    <script src="https://cdn.html.games.alexa.a2z.com/alexa-html/latest/alexa-html.js"></script>
</head>

The device will intercept the request and inject the library into the WebView so you can reference the Alexa object in your JavaScript code. To initialize Alexa, use the following:

var alexaClient;
Alexa.create({version: '1.0'})
    .then((args) => {
        const {
            alexa,
            message
        } = args;
        alexaClient = alexa;
        document.getElementById('debugElement').innerHTML = '<b>Alexa is ready :)</b>';
    })
    .catch(error => {
        document.getElementById('debugElement').innerHTML = '<b>Alexa not ready :(</b>';
    });

Assuming the .then block executes, you will have initialized the client side Alexa library and you can now use the skill, speech, voice, and device capability APIs. This is also a great place to set up your local game state and use the startup data you sent (captured here in the message variable). If the catch block is executed, then the library failed to instantiate. This commonly happens while testing locally with your computer’s browser. Since the Alexa library is only available on the device, you will not be able to use it when you access your server locally.
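
For example, if the Start directive's data payload carried saved game state, you could apply it as soon as the promise resolves. Here is a small sketch; renderGame is a hypothetical function in your web application:

Alexa.create({version: '1.0'})
    .then(({ alexa, message }) => {
        alexaClient = alexa;
        // 'message' is the data payload sent in the Start directive,
        // e.g. { arbitraryDataKey1: 'My arbitrary data' }.
        renderGame(message); // hypothetical: draw the initial game state
    })
    .catch(() => {
        renderGame({}); // hypothetical: fall back to a default state for local testing
    });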

Hosting solutions

As mentioned above, you will need to have a valid SSL certificate in order to serve your web application on a device. There are multiple ways to accomplish this, which I will split into development use and production use. This will not be an exhaustive list of the different ways you could develop and deploy your Alexa skill. If you have a method which fits into your development workflow better, use that!

Development

For development, you will want the ability to: 1) start up a local server for the web application which does not cache assets, and 2) create an HTTPS tunnel to reach the local HTTP server. This will be an efficient way to rapidly test your changes on a device as you would simply need to save and reload the page (start the skill over again, in this case).

NOTE: It is important to send a no-cache header in development to ensure you are developing against the most up-to-date code and art assets. If you accidentally serve cached assets to an Alexa-enabled device, you can rename the file (or append a bogus query parameter) to break the device cache. This can also be a development strategy, as long as you disable it for production use.

To start up a local server, you can use a tool like the Node.js http-server module. It sets up a static server and has a useful option, the -c-1 flag, to send a no-cache header with each response. First, install http-server globally:

npm install http-server -g

You can then start your server on port 8080 with a no-cache header sent in response. From the directory of your web application, run:

http-server . -p 8080 -c-1 

This will start up the local server on your machine, which lets you test with your favorite browser development tools by opening 127.0.0.1:8080/alexa-game.html, but it will not allow communication with your skill back end. To do that, you will need a tool that creates an endpoint reachable by your Alexa device. There are different ways to do this; one tool you can use is ngrok, which gives you a publicly accessible HTTPS endpoint that routes to localhost. Be sure to use the same port you specified in your local server startup command. For instance, after installing ngrok, run:

ngrok http 8080

As long as the port matches what you used in the local server startup, you’ll reach your webpage running locally from the publicly available ngrok URL. If you would like an example of this, see the My Cactus skill which uses this setup for web application development.

Production

For live traffic, don't host the web application and its assets on your development machine; a dedicated hosting setup avoids startup latency and is far more stable. Using a storage solution together with a content delivery network lets your assets be stored and cached at edge locations worldwide, close to your players, greatly reducing the latency they experience. For this, I would recommend a storage solution like Amazon S3 and a content delivery network such as Amazon CloudFront, which has edge locations around the world. If you use Amazon CloudFront, you may also want a more descriptive domain name than a random *.cloudfront.net domain: the URL where your web application is hosted is shown to customers when your website starts up on the device, so make sure it's appropriate for your brand. To learn more, check out this AWS guide about setting up a static website using these tools, or see our reference application, which has a CloudFormation template modeling this setup, ready to work with ASK CLI V2. This is just one example; keep in mind that general best practices for building websites apply to the web application used in your Alexa Web API for Games skill, too.

NOTE: You can use a setup like this for your in-development game, as well, but it will slow down your development cycle because you will need to upload all changed files every time you want to test. It is still a great idea for an integration (pre-production) environment if you are working on this project with multiple developers.

Bi-directional communication

One incredibly useful feature is the ability to send messages between your web application running on the device and the skill back end, letting the Alexa skill side hold the core game logic. This is accomplished through a directive, Alexa.Presentation.HTML.HandleMessage, sent from the skill back end, and two JavaScript APIs: alexaClient.skill.sendMessage and alexaClient.skill.onMessage.

Alexa skill to local web application

The handle message directive allows you to send arbitrary data to your web application from your skill side code. The directive itself is very simple:

{
    "type":"Alexa.Presentation.HTML.HandleMessage",
    "message": {
        "arbitraryDataKey1":"ArbitraryDataValue1"
    }
}

The message can be any arbitrary JSON data that you want to be used in the front end web application. This gives you the flexibility to define your own protocol for communicating to your web application. One common use case is to send state information from your skill back end to your front end so you can render the appropriate visual. For instance, you can trigger a 3D animation in response to a game intent.
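
For instance, here is a sketch of a skill-side intent handler that triggers such an animation (the intent name and message payload are hypothetical, defined by your own protocol):

const AnimateIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'FeedCactusIntent'; // hypothetical intent
    },
    handle(handlerInput) {
        return handlerInput.responseBuilder
            .speak('Yum!')
            .addDirective({
                type: 'Alexa.Presentation.HTML.HandleMessage',
                message: { animation: 'feed' } // hypothetical payload your web app understands
            })
            .getResponse();
    }
};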

On the JavaScript side, you can register a listener to be invoked every time there is a HandleMessage directive sent. This will give you the data payload you sent as a JavaScript object for your web application to act on:

alexaClient.skill.onMessage((message) => {
    console.log(JSON.stringify(message)); // {"arbitraryDataKey1":"ArbitraryDataValue1"}
});

JavaScript web application to Alexa skill side

Arbitrary message sending works in both directions. To communicate information from your JavaScript code to your skill back end, use the alexaClient.skill.sendMessage function:

const messageSentCallback = function(sendResponse) {
    const {
        statusCode,
        reason,
        rateLimit,
    } = sendResponse;
    // Handle the different response codes: 200, 401, 429, 500.
};

alexaClient.skill.sendMessage({
    localDataKey: "LocalDataValue"
}, messageSentCallback);

This function takes two parameters: the arbitrary data payload and an optional callback function for handling the response. You are rate limited to 2 calls per second using this API and if you exceed that, you will get a 429 status code (too many requests). You may want to handle scenarios like this by retrying the message after waiting (check out the rateLimit object documentation for APIs you can use to determine how long to wait).
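
One retry approach is sketched below; the fixed 600 ms backoff is an arbitrary choice for illustration, and the rateLimit object can inform a smarter wait:

const sendWithRetry = function(payload, retriesLeft) {
    alexaClient.skill.sendMessage(payload, (sendResponse) => {
        if (sendResponse.statusCode === 429 && retriesLeft > 0) {
            // Rate limited: wait, then try again.
            setTimeout(() => sendWithRetry(payload, retriesLeft - 1), 600);
        } else if (sendResponse.statusCode !== 200) {
            console.error('sendMessage failed: ' + sendResponse.reason);
        }
    });
};

sendWithRetry({ localDataKey: 'LocalDataValue' }, 3);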

This JavaScript API will send a request with the type, Alexa.Presentation.HTML.Message to your skill back end code. You can handle this just like you would handle an intent request. Here is one example of a simple cloud-side logger using the sendMessage API:

const WebAppCloudLogger = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === "Alexa.Presentation.HTML.Message";
    },
    handle(handlerInput) {
        const messageToLog = handlerInput.requestEnvelope.request.message;
        console.log(messageToLog);
        return handlerInput.responseBuilder
            .getResponse();
    }
}

This handler simply logs the message payload on the back end, letting you see web application logs in your cloud-side logs. To see some other examples of message handlers in use, see the handlers in the My Cactus skill.

Device capabilities

The local device capabilities API lets you inspect information about the device the web application is running on, including information that is not accessible from the standard HTML environment. The capabilities are accessed through alexaClient.capabilities, which includes a microphone field with the following shape:

{
    supportsPushToTalk: true,
    supportsWakeWord: true
}

supportsPushToTalk tells you whether the device supports push-to-talk input, such as a FireTV remote; the customer talks to their remote to access Alexa. The other field, supportsWakeWord, is true when the device supports wake word activation. Some devices, such as the FireTV Cube, support both; a FireTV 4K stick supports only push-to-talk, and an Echo Show supports only the wake word. You can use these capabilities to tailor the voice response by passing them to the back end, or to branch the instructions you show in an on-screen hint, as in the sketch below.
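
For example, here is a sketch that branches the on-screen hint based on the microphone capabilities (the hint text and element are hypothetical):

const { microphone } = alexaClient.capabilities;
let hint = '';
if (microphone.supportsWakeWord) {
    hint = 'Try, "Alexa, water the cactus."';
} else if (microphone.supportsPushToTalk) {
    hint = 'Hold the microphone button on your remote and say, "water the cactus."';
}
document.getElementById('hintElement').textContent = hint; // hypothetical hint element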

Performance

The performance interface has one function, performance.getMemoryInfo(), which reports the available memory on the device:

alexaClient.performance.getMemoryInfo().then((memInfo) => {
    console.log(memInfo)
});

The performance.getMemoryInfo() function returns a promise that resolves with a payload containing the available memory in MB. The above code would log something like:

{
    memory: { 
        availableMemoryInMB: 715
    }
}

NOTE: This API is useful in development for optimizing assets and debugging across device types. It should not be used in production because it can negatively affect device performance.
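
For example, during development you might log the available memory before and after loading a heavy asset bundle to measure its footprint (a sketch; loadAssets is a hypothetical function):

const logAvailableMemory = (label) =>
    alexaClient.performance.getMemoryInfo().then((memInfo) => {
        console.log(label + ': ' + memInfo.memory.availableMemoryInMB + ' MB available');
    });

logAvailableMemory('before assets')
    .then(() => loadAssets()) // hypothetical asset-loading step
    .then(() => logAvailableMemory('after assets'));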

Speech

You can use the speech interface to react to speech events. This contains two callbacks, speech.onStarted and speech.onStopped. A simple example of this:

alexaClient.speech.onStarted(() => {
    console.log('speech is playing');
});

alexaClient.speech.onStopped(() => {
    console.log('speech stopped playing');
});

You can do things like modify any currently playing web audio when the output speech from your skill starts to play or animate an on-screen character to speak while the output speech is playing and stop when output speech stops.
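
For example, here is a small sketch that ducks a background music track while the skill's output speech plays (the audio file is a placeholder):

const backgroundMusic = new Audio('background-music.mp3'); // hypothetical asset

alexaClient.speech.onStarted(() => {
    backgroundMusic.volume = 0.2; // duck the music under the TTS
});

alexaClient.speech.onStopped(() => {
    backgroundMusic.volume = 1.0; // restore full volume
});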

Related to this is the Alexa.utils.speech class, which has one method, fetchAndDemuxMP3. It lets you extract the audio buffer and speech marks from your skill's output speech, so you can synchronize visuals and web audio against the speech marks, giving you finer control over timing than the speech callbacks alone. An example:

const transformerSpeechMp3 = 'https://tinytts.amazon.com/resource.mp3';
Alexa.utils.speech.fetchAndDemuxMP3(transformerSpeechMp3).then((audioResult) => {
    const {audioBuffer, speechMarks} = audioResult;
    playAudioAndAnimateCharacterSpeech(audioBuffer, speechMarks);
});
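
Here is a sketch of what playAudioAndAnimateCharacterSpeech might do with those results, using the standard Web Audio API. It assumes audioBuffer is an ArrayBuffer suitable for decodeAudioData, that each speech mark carries time (in milliseconds) and value fields, and that setMouthShape is a hypothetical animation function:

const audioContext = new AudioContext();

function playAudioAndAnimateCharacterSpeech(audioBuffer, speechMarks) {
    audioContext.decodeAudioData(audioBuffer).then((decoded) => {
        const source = audioContext.createBufferSource();
        source.buffer = decoded;
        source.connect(audioContext.destination);
        source.start();

        // Schedule mouth shapes against the speech mark timings.
        speechMarks.forEach((mark) => {
            setTimeout(() => setMouthShape(mark.value), mark.time); // hypothetical animator
        });
    });
}
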
Voice

The voice interface allows you to listen and react to voice input events on the device with the callbacks onMicrophoneOpened and onMicrophoneClosed. An example:

alexaClient.voice.onMicrophoneOpened(() => {
    console.log("Microphone is open. Reduce the volume of my web audio.");
});
alexaClient.voice.onMicrophoneClosed(() => {
    console.log("Microphone is closed. Back to normal.");
});

You can use this interface to dim the screen, reduce background (web audio) volume, or animate a character on screen, all in response to the microphone opening or closing. Also included in the voice interface is the ability to request that the microphone be opened on devices that support wake word activation. For example:

alexaClient.voice.requestMicrophoneOpen({
    onError: (reason) => {
        if (reason === "request-open-unsupported") {
            requestMicrophoneOpenSupported = false;
            openMicrophoneUsingSkillFallback();
        } else {
            console.log("Microphone open error: " + reason);
        }
    }
});

Note: if the player is on a device such as the FireTV Stick 4K, there is no wake word, so you will get a request-open-unsupported reason if you attempt to open the microphone this way. You can combine this with the capabilities.microphone interface to check whether the request would be supported and, if not, create a fallback experience, such as an on-screen prompt to press the push-to-talk button on the FireTV remote. This API is powerful because it allows you to create experiences that do not follow a strict conversational turn, such as requesting the microphone to open in response to a screen interaction or after an animation plays, without any output speech.
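
Here is a sketch combining the two checks (showPushToTalkPrompt stands in for whatever fallback experience you design):

function openMicAfterAnimation() {
    if (alexaClient.capabilities.microphone.supportsWakeWord) {
        alexaClient.voice.requestMicrophoneOpen({
            onError: (reason) => console.log('Microphone open error: ' + reason)
        });
    } else {
        // Push-to-talk only device (e.g., FireTV Stick 4K): prompt the player instead.
        showPushToTalkPrompt(); // hypothetical on-screen prompt
    }
}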

 

Conclusion

The Alexa Web API for Games is a powerful combination of new Alexa Skills Kit directives, requests, and a client-side library that enables game developers to create novel Alexa games, combining touch, audio, visuals, and speech using familiar web technologies. By leveraging the on-device JavaScript tools such as bi-directional communication, voice and speech events, and device capabilities, you can create rich, immersive Alexa games. To see some of these APIs applied to a game, try out the code for our cactus-raising simulation game on GitHub, which makes use of the Alexa Web API for Games. And let me know what you end up building @JoeMoCode on Twitter.
