Saturday, January 16, 2016

Holiday Hack Project: Using Alexa to Access EVE Online Data

Most people have heard of Amazon's Echo, the in-home digital assistant.  Echo is powered by a voice assistant named "Alexa" and will do the things most digital assistants do, like answer random questions, set alarms, keep to-do lists, etc.  However, Amazon went one step further and released public APIs for adding your own capabilities to Alexa.  Amazon has two offerings here:
  • The Alexa Skills Kit (ASK) is an API for adding "skills" to Alexa.  Skills are just named features that you can ask Alexa to use.  Custom skills are hooked up to web services you write to implement the skill.
  • The Alexa Voice Service (AVS) is an API for embedding Alexa in custom hardware.  Your device needs a microphone to accept voice input which you forward to the Alexa Voice Service.  The voice service passes your voice input to Alexa, then sends back an audio response which you can play through your device.
These two offerings are in developer preview at the moment, which means they're free.  So, since I had a bit of time over the holiday to do some hacking, I decided to see if I could add an Alexa skill to handle queries about EVE Online.  Here's an example of what I was able to do:



It turns out creating an Alexa skill is pretty easy, but Alexa has a hard time understanding things that aren't in the dictionary (e.g. a good portion of the in-game objects in EVE), so there's still more work to do to make this truly awesome.  In this post, I'll walk through all the steps to create an Alexa skill for serving up EVE data.

Alexa Skills Kit Basics - Skills, Intents and Architecture

Our goal here is to train Alexa how to respond to a few simple requests related to EVE Online.  To do that, we need two things: a "skill" and a skill handler.  A skill is a definition of the types of spoken requests we'll respond to.  A skill handler is code which will receive matching requests and generate appropriate replies.

An Alexa skill has four basic components:
  1. A name.  This is the trigger word you'll use to invoke your skill through Alexa.  Pick something easy to pronounce (and therefore, easy for Alexa to recognize).
  2. A schema with one or more intents.  An intent represents an action that matches a user's spoken request.  An intent can have placeholders, called slots, which match a set of words and act like arguments to the intent.  Alexa has several built-in slot types, but you can also define your own.
  3. Zero or more custom slots.  A custom slot is used when you need an argument that can match a specific set of words not covered by a built-in slot.  You typically use a custom slot to define a domain specific argument, for example the set of all in-game items in EVE.
  4. Finally, every intent must have one or more sample utterances.  A sample utterance gives an example of a user request which matches an intent, including the position of any slots.
Of the components of a skill, custom slots and sample utterances are perhaps the most important as they are used to train Alexa how to recognize an intent provided by your skill.  The more sample utterances you provide (covering a variety of ways to match an intent), the more likely it is that Alexa will be able to match a request to an intent.  We'll show a few examples of defining skills in the next section.

Once we have our skill defined, we'll need to implement a skill handler.  A skill handler is either a web service or an Amazon Web Services (AWS) Lambda function which accepts a structured skill request and returns a structured reply.  For this writeup, we're using a web service.  We'll show some sample code below.

Skill handling in Alexa uses the following flow:
  1. A user makes a spoken request invoking the name of a skill, for example: "Alexa, ask Auren for the current time".  In this example, "Auren" is the name of our skill (see below) and "the current time" is what the user is asking for.
  2. Alexa recognizes our skill name and analyzes the request.  If all goes well, Alexa will match the request to an intent defined for our skill, filling in any defined slots with the appropriate words from the request.
  3. Alexa packages the request into a structured format (JSON) and forwards it to our skill handler (we'll sketch this format just after this list).  The skill handler processes the request and generates a speech reply, a "card" reply, or both.  These replies are returned to Alexa for processing.
  4. If our skill handler generated a speech reply, then Alexa will stream speech audio to the device where the request was made.
  5. If our skill handler generated a "card" reply, then Alexa will render the card in the companion application for the device where the request was made.  If the device is an Echo, then your phone is usually the companion app and you'll see your card rendered there.  You can also see your card on the Alexa web front end.
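To give a feel for the structured format in step 3, here's a trimmed-down sketch of what an intent request looks like when it reaches your skill handler.  The identifiers below are illustrative placeholders, and the real envelope carries a few more fields:
{
  "version": "1.0",
  "session": {
    "new": true,
    "sessionId": "SessionId.example-session",
    "application": { "applicationId": "amzn1.echo-sdk-ams.app.example" },
    "user": { "userId": "amzn1.account.example" }
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "EdwRequestId.example-request",
    "timestamp": "2016-01-16T12:00:00Z",
    "intent": {
      "name": "ServerTimeIntent",
      "slots": {}
    }
  }
}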
There are three basic error cases that are handled as follows:
  1. If Alexa can't understand your skill name or match a request to an intent, it will tell you so through your device without ever sending a request to your skill handler.
  2. If Alexa can't reach your skill handler, it will tell you so through your device, usually with a message along the lines of "that function is not available right now".
  3. Finally, Alexa may match your intent but not the way you expected.  In that case, your skill handler may report an error message which Alexa will report back through the requesting device.
So much for theory.  In the next section, we'll build our skill and our first intent.  By the way, all the code and setup instructions are in our GitHub project here.

Introducing Auren - Aura's Less Helpful Cousin

Skills need a name that Alexa can recognize to trigger your intents.  Keeping with the EVE theme for this project, I chose "Auren" (pronounced like "Oren") because it sounded kind of like Aura, the in-game assistant in EVE.  However, Auren isn't (yet) as helpful as Aura.

For this project, we'll teach Auren to do four things:
  1. Provide help, when requested, to explain what Auren can do.  All skills should do this.
  2. Report the current time in EVE.  This is a simple starter intent to demonstrate the process.
  3. Report EVE server status.  A slightly more complicated intent that adds a call to the EVE XML API servers.
  4. Report the current price in Jita for an in-game item.  This intent will call eve-central's API to look up and report price data.  What makes it more complicated is that we'll need voice recognition to figure out which in-game item to price.
We'll cover three steps in building our Alexa skill:
  1. Defining intents and sample utterances.
  2. Implementing a web service to handle intents.
  3. Testing the skill.

Defining Intents and Sample Utterances

We have our skill name, "Auren"; now let's create an intent schema.  This is just a JSON document naming the types of requests our skill will handle.  Here's what our schema looks like for handling the first three items on Auren's list:
{
  "intents": [
    {
      "intent": "AMAZON.HelpIntent"
    },
    {
      "intent": "ServerTimeIntent"
    },
    {
      "intent": "ServerStatusIntent"
    }
  ]
}
The first intent is an Amazon built-in intent that matches requests for help (e.g. "Alexa, ask Auren for help").  We don't need to provide sample utterances or anything else for a built-in intent.  We just have to be ready to process this request if it shows up at our web service.  A list of built-in intents is here.  The second and third intents name requests for EVE server time and EVE server status, respectively.

Our first three intents don't have any slots (custom or otherwise), so we can move on to sample utterances.  Since utterances for built-in intents are handled automatically, we only have to provide utterances for the "ServerTimeIntent" and the "ServerStatusIntent".  These are defined as lines in a simple text file.  Here are the utterances we created for this project:
ServerTimeIntent time
ServerTimeIntent current time
ServerTimeIntent eve time
ServerTimeIntent eve online time
ServerTimeIntent what is time
ServerTimeIntent what is current time
ServerTimeIntent what is eve time
ServerTimeIntent what is eve online time
ServerTimeIntent what is the time
ServerTimeIntent what is the current time
ServerTimeIntent what is the time in eve
ServerTimeIntent what is the time in eve online
ServerTimeIntent what is the current time in eve
ServerTimeIntent what is the current time in eve online
ServerStatusIntent server status
ServerStatusIntent eve online status
ServerStatusIntent eve server status
ServerStatusIntent what is eve's status
ServerStatusIntent what is the current server status
ServerStatusIntent current status
ServerStatusIntent current server status
ServerStatusIntent tell me current status
ServerStatusIntent tell me current server status
ServerStatusIntent tell me eve's status
A sample utterance consists of the name of an intent followed by a sentence snippet giving a sample request.  The more samples you provide, the better Alexa will be at matching your intent.  There are a few basic rules you need to follow when writing utterances, but there are no stated limits on the number of sample utterances you can provide.  We've provided a decent set above, but we're missing a few you might hear (e.g. "can you tell me the time?").  Thus, there's a small chance Alexa won't match our intent correctly.  Alexa saves its translations of incoming requests, so you can always go back later and improve the utterance list.
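If we wanted to close those gaps, we'd simply append more lines to the utterance file.  For example (these are hypothetical additions, not part of the project's current list):
ServerTimeIntent can you tell me the time
ServerTimeIntent can you tell me the current eve time
ServerStatusIntent is the eve server up
ServerStatusIntent is tranquility online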

Implementing a Web Service to Handle Intents

Now let's take a look at our web service which will handle requests.  Note that you can instead use an AWS Lambda function to implement your skill (this eliminates some complications with certificates), but I elected to use a Java servlet because I already have a server available and Amazon provides a servlet-based library as part of the Alexa Skills Kit.

If you go the web service route, there are a few requirements.  The important requirements for testing are:
  • Your service must be accessible via the Internet; and,
  • Your service must accept requests on port 443 (https) and use an Amazon-trusted certificate.
The remaining requirements only matter if you want to deploy your skill beyond development mode.  We're not doing that here, so we'll skip them.

The Alexa Skills Kit Java library provides a ready-made servlet (com.amazon.speech.speechlet.servlet.SpeechletServlet) which routes incoming requests to one or more "Speechlets".  A Speechlet is an interface (com.amazon.speech.speechlet.Speechlet) which defines four operations:
  • onSessionStarted - called when a new session starts as a result of a user invoking your skill.
  • onSessionEnded - called when a session ends, either because the user ended it or stopped interacting with your skill.
  • onLaunch - entry point when a skill is invoked without an intent (e.g. "Alexa, start Auren").
  • onIntent - entry point for handling speech related requests.
The only operation we're interested in right now is "onIntent".  The other operations are life-cycle events which would be important in a more complicated skill, but aren't relevant here.
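To make the structure concrete, here's a minimal skeleton of a Speechlet using the interface above.  This is just a sketch of the shape of the class; the project's actual OrbitalSpeechlet (linked below) fills in the response logic:
import com.amazon.speech.speechlet.IntentRequest;
import com.amazon.speech.speechlet.LaunchRequest;
import com.amazon.speech.speechlet.Session;
import com.amazon.speech.speechlet.SessionEndedRequest;
import com.amazon.speech.speechlet.SessionStartedRequest;
import com.amazon.speech.speechlet.Speechlet;
import com.amazon.speech.speechlet.SpeechletException;
import com.amazon.speech.speechlet.SpeechletResponse;

public class OrbitalSpeechlet implements Speechlet {

  @Override
  public void onSessionStarted(SessionStartedRequest request, Session session) throws SpeechletException {
    // Nothing to do - we keep no per-session state
  }

  @Override
  public void onSessionEnded(SessionEndedRequest request, Session session) throws SpeechletException {
    // Nothing to do
  }

  @Override
  public SpeechletResponse onLaunch(LaunchRequest request, Session session) throws SpeechletException {
    // Invoked with no intent (e.g. "Alexa, start Auren"); a welcome or help response would go here
    return null;
  }

  @Override
  public SpeechletResponse onIntent(IntentRequest request, Session session) throws SpeechletException {
    // All of the interesting work happens here (see below)
    return null;
  }
}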

Let's take a closer look at the handling of the ServerTimeIntent.  This intent will arrive on the onIntent call, so we'll need to implement that method first.  We can start as follows:
@Override
public SpeechletResponse onIntent(final IntentRequest request, final Session session) throws SpeechletException {
    Intent intent = request.getIntent();
    String intentName = (intent != null) ? intent.getName() : null;
    if ("ServerTimeIntent".equals(intentName)) {
      return getServerTimeResponse();
    } else {
      throw new SpeechletException("Invalid Intent");
    }
}
It's good practice to verify the intent name is not null, so we do that here.  If the intent matches ServerTimeIntent, then we return an appropriate response.  Otherwise, we throw an exception.  Here's what the response generator looks like:
private SpeechletResponse getServerTimeResponse() {
    // EVE time is always UTC so just return that directly
    Date now = new Date();
    SimpleDateFormat formatter = new SimpleDateFormat("HH:mm");
    formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
    String speechText = "The current EVE time is " + formatter.format(now) + ".";
    // Create the Simple card content.
    SimpleCard card = new SimpleCard();
    card.setTitle("EVE Online Time");
    card.setContent(speechText);
    // Create the plain text output.
    PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
    speech.setText(speechText);
    SpeechletResponse result = SpeechletResponse.newTellResponse(speech, card);
    result.setShouldEndSession(true);
    return result;
}
A standard response will provide both a card and text to be spoken by Alexa.  We use the same text in both cases, namely the current UTC time, which also happens to be EVE time.  The call to "setShouldEndSession" indicates that this response ends the current session.  You would set this to false if your response required further action from the user (e.g. if your response asked the user for more information).
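For example, if Auren ever needed to ask a follow-up question, we could return an "ask" response instead, which leaves the session open and includes a reprompt to play if the user stays silent.  This is a hypothetical sketch (Auren doesn't do this today); Reprompt comes from the same com.amazon.speech.ui package as the other response classes:
private SpeechletResponse getItemNamePrompt() {
    PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
    speech.setText("Which item would you like me to price?");
    // The reprompt is spoken if the user doesn't answer in time
    PlainTextOutputSpeech repromptSpeech = new PlainTextOutputSpeech();
    repromptSpeech.setText("You can name any item on the EVE market, for example Tritanium.");
    Reprompt reprompt = new Reprompt();
    reprompt.setOutputSpeech(repromptSpeech);
    // newAskResponse keeps the session open so the user's next answer comes back to us
    return SpeechletResponse.newAskResponse(speech, reprompt);
}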

Once we've written our Speechlet, the last step is to add it to the Amazon provided servlet so that incoming requests are handled properly.  This is done by sub-classing com.amazon.speech.speechlet.servlet.SpeechletServlet and adding our Speechlet in the constructor:
public OrbitalAurenServlet() {
    super();
    // Add our speechlet
    this.setSpeechlet(new OrbitalSpeechlet());
}
You can see the final version of our servlet and Speechlet here.

Testing Auren

We've defined intents and sample utterances and implemented a Speechlet to handle them, so now we're ready to deploy and test our skill.  We'll skip the step of deploying our servlet to a servlet container; there are instructions on how to do that on our GitHub page.

To test our new skill we need to create a configuration on Amazon's development site.  Once you've created an Amazon developer account and logged into the developer site, you can navigate to the Alexa Skills Kit page to start configuring a new skill.  Clicking on "Add a New Skill" brings up a page like the following:

Adding a new Alexa skill
Completing the configuration of an Alexa skill involves a six-page form, but only the first three pages need to be configured for testing.  The important fields on this first page are the name (which will appear in the Alexa companion app), the invocation name (the word users will use to tell Alexa to send your skill a request), and the endpoint.  As you can see from the figure, we've filled in "auren" as our invocation name and configured an HTTPS endpoint.  When you save this page for the first time, an "Application Id" will be created for you, which you'll need to provide to your servlet (see the GitHub page for details).

The next page is where we specify the interaction model:

Configuring a skill interaction model
The view in the figure represents our final interaction model.  Configuration is just a matter of cut and paste from the appropriate files containing your intent schema, values for any custom slots, and your sample utterances.  We'll talk about custom slots in the next section.

The next page you need to configure is actually the SSL Certificate page but we'll skip that one for brevity.  Suffice it to say, you can get away with a self-signed certificate for testing, but you'll need a certificate signed by a valid certificate authority in order to deploy beyond development mode.  We happen to have a valid SSL certificate for the endpoint we specified on the first page, so we're good to go.

The fourth page is where we're finally able to test our skill:

Skill test page
This page lets you enter a test string and verify that your skill is responding properly.  Here's what it looks like for getting the current time:

Ask Auren for EVE time!
On the left is the request Alexa sent to our servlet.  On the right is what our servlet sent back.  Notice we sent back both a card and a speech response.  If you click the "Listen" arrow you'll hear Alexa speak your response.

Of course, the whole point of this was to make the skill available on an Alexa device like Echo.  Fortunately, Amazon provides a really nice way to do that: if your Echo device is registered under the same login as your developer account, then all your skills in development are automatically added to your Echo!  This means we can walk up to our Echo and ask it a question.  Here's the video again of me doing exactly that to get EVE server status:


When you invoke your skill directly on Echo, you'll also generate a card on the Alexa site.  This is a great way to debug if Alexa has trouble translating your speech to text.  Here's what our card looks like for the request in the video:

The card corresponding to our voice request
The top part of the card shows the card response we sent back.  The bottom part shows how Alexa parsed the voice request.

The first three intents (help, server time, and server status) are implemented in the same way; a rough sketch of the server status handler is shown below.  Getting the current price of an item in Jita is a little more complicated (and also doesn't work as well).  We'll cover that next.
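Here's roughly what the server status handler might look like.  This is a sketch rather than the project's actual code (which may use a generated client instead): it fetches the EVE XML API's ServerStatus endpoint with the JDK's built-in URL and DOM classes and turns the result into the same tell-style response used above:
private SpeechletResponse getServerStatusResponse() {
    // Uses java.net.URL, javax.xml.parsers.DocumentBuilderFactory and org.w3c.dom.Document from the JDK
    String speechText;
    try {
        URL url = new URL("https://api.eveonline.com/server/ServerStatus.xml.aspx");
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(url.openStream());
        String open = doc.getElementsByTagName("serverOpen").item(0).getTextContent();
        String players = doc.getElementsByTagName("onlinePlayers").item(0).getTextContent();
        speechText = "True".equalsIgnoreCase(open)
            ? "Tranquility is online with " + players + " pilots connected."
            : "Tranquility appears to be offline.";
    } catch (Exception e) {
        speechText = "I'm sorry, I couldn't reach the EVE API servers.";
    }
    SimpleCard card = new SimpleCard();
    card.setTitle("EVE Online Server Status");
    card.setContent(speechText);
    PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
    speech.setText(speechText);
    return SpeechletResponse.newTellResponse(speech, card);
}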

Auren's Next Trick - Getting EVE Market Prices

The last intent we'll implement is a request to Auren to get the market price of an EVE item from Jita.  We need to do two things to get this working:
  1. We need to update our interaction model so that we can recognize requests for EVE items; and,
  2. We need to add code to our servlet to request market data prices.  
Let's look at the interaction model first.  For the market data intent, we need to capture the name of the item the user wants to price, which means we need a slot.  Moreover, many EVE items aren't in any dictionary, so we'll need a custom slot to help Alexa understand the user's request.  Here's what the intent looks like:
{
  "intents": [
    ...
    {
      "intent": "MarketPriceIntent",
      "slots": [
        {
          "name": "Item",
          "type": "EVE_MARKET_ITEM"
        }
      ]
    }
  ]
}
We name our intent "MarketPriceIntent" and add one slot called "Item".  This is a custom slot, so we give it the custom type "EVE_MARKET_ITEM".  If this weren't a custom slot, you'd use one of the built-in slot types described here.

We'll need to tell Alexa more about our custom slot, but before we do that let's look at the sample utterances so we can see where we'll use the slot:
MarketPriceIntent what is the price of {Item}
MarketPriceIntent price for {Item}
MarketPriceIntent price {Item}
MarketPriceIntent how much does {Item} cost
MarketPriceIntent what is the current price for {Item}
MarketPriceIntent current price for {Item}
MarketPriceIntent the price for {Item}
MarketPriceIntent the price of {Item}
MarketPriceIntent price of {Item}
MarketPriceIntent current price {Item}
MarketPriceIntent what is the price of a {Item}
MarketPriceIntent price for a {Item}
MarketPriceIntent how much does a {Item} cost
MarketPriceIntent what is the current price for a {Item}
MarketPriceIntent current price for a {Item}
MarketPriceIntent the price for a {Item}
MarketPriceIntent the price of a {Item}
MarketPriceIntent price of a {Item}
As you can see, all we need to do is add "{Item}" to the location where we expect the user to name the item they're interested in.  Simple, right?

Now back to our custom slot.  To create our custom slot we'll need to go back to the service configuration form on the "Interaction Model" page.  Clicking the "Add Slot Type" button brings up the following screen:

Creating a custom slot
"Type" is just the name of our custom slot, i.e. "EVE_MARKET_ITEM".  The values section is where we enter the text we want to recognize as members of our custom slot.  This is where we hit our first difficulty.  Currently, custom slots are limited to 5000 unique values across all the custom slots we define for a skill.  However, there are around 11000 unique EVE items in the market.  To work around this, I did a slightly fancy selection of 4000 EVE items from the Static Data Export.  My selection criteria was based on market group ID, where I picked the market groups players are usually interested in (e.g. ships, ammo, materials, etc).  You can see my final list here.  Not to worry though.  Even if Alexa can't match what the user says to a value in your list, it will often continue to send whatever it heard to your skill handler.  So in many cases we'll still be able to try to match what the user said to a value.  This suggests another way to get around the 5000 value limit which I'll discuss further below.  Clicking "OK" saves the slot definition.  This takes a few minutes for large custom slots as Alexa prepares the data for voice recognition.

Adding an intent, custom slot, and new sample utterances finishes the configuration of the market data request feature.  Now we need to implement the feature on our servlet.  We follow the same approach as before where we add an appropriate response generator to our "Speechlet", but this time we need to match a slot value to an EVE market item.  Let's look at the "onIntent" entry point first (full code is here):
  public SpeechletResponse onIntent(final IntentRequest request, final Session session) throws SpeechletException {
    Intent intent = request.getIntent();
    String intentName = (intent != null) ? intent.getName() : null;
    if ("MarketPriceIntent".equals(intentName)) {
      return getMarketPriceResponse(intent.getSlot("Item").getValue());
    } else {
      throw new SpeechletException("Invalid Intent");
    }
  }
The "getSlot" method pulls out the value Alexa heard when our user invoked this intent.  The "getValue" method converts the value to a string.  We then pass this value to our response generator which is slightly more complicated this time around:
  private SpeechletResponse getMarketPriceResponse(String item) {
    // Clean up the item
    item = item.toLowerCase();
    // This is what we return if we can't figure out what the user wanted
    String speechText = "I'm sorry, I couldn't find an EVE market item matching that name.  Either that item isn't on the market, isn't in my database, or I don't understand your pronunciation.";
    // typeNameMap is a Map<ItemName, ItemTypeID>
    if (typeNameMap.containsKey(item)) {
      // Found it, look for a price
      int typeID = typeNameMap.get(item);
      MarketStatApi api = new MarketStatApi();
      try {
        List<MarketInfo> info = api.requestMarketstat(Collections.<Integer> singletonList(new Integer(typeID)), null, null, null, 30000142L);
        if (!info.isEmpty()) {
          MarketInfo data = info.get(0);
          Double bid = data.getBuy() != null ? data.getBuy().getMax() : null;
          Double ask = data.getSell() != null ? data.getSell().getMin() : null;
          ... clean up these values a bit ...
          speechText = "Jita is reporting ";
          speechText += bid == null ? "no current bid" : "a bid of " + bidPrice.toPlainString() + " isk";
          speechText += " and ";
          speechText += ask == null ? "no current ask" : "an ask of " + askPrice.toPlainString() + " isk";
          speechText += " for " + item + ".";
        } else {
          // eve-central didn't have a price for this item
          speechText = "I'm sorry, there is no market information available for " + item + ", perhaps this is a rare item?";
          log.info("Empty market info list for \"" + item + "\"");
        }
      } catch (ApiException e) {
        ... handle error calling eve-central ...
      }
    ... Create card and text response as before ...
  }
To look up market data, we need to map the user's requested item to a type ID (the eve-central API looks up price data by type ID).  To keep things simple, we use a table lookup based on the string value of what the user requested.  We populate our map from a larger type ID text file we extracted from the Static Data Export (full file is here).  If we manage to map the user's request to a type ID, then we call eve-central to look up the price.  The "MarketStatApi" class is a client we generated from a Swagger description of the eve-central API.  You can find more details about that on our project page.
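The map itself can be loaded once at startup.  Here's a rough sketch assuming one tab-separated "typeID typeName" pair per line (the exact file format in the project may differ slightly):
private final Map<String, Integer> typeNameMap = new HashMap<String, Integer>();

private void loadTypeMap(String fileName) throws IOException {
    try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split("\t");
            if (fields.length < 2) continue;
            // Key on the lower-cased item name so lookups match what Alexa sends us
            typeNameMap.put(fields[1].toLowerCase(), Integer.parseInt(fields[0]));
        }
    }
}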

So does it work?  Let's try it from the Alexa development test page.  Here's an example where we get the current price of a Raven:

Getting the current price of a Raven
Hey it works!  Now let's try a real test on Echo:


So far so good, if a little on the verbose side.  Now let's see what happens when we ask the price of something that we left out of our custom slot.  We left Veldspar out, so let's try that.  First with the text interface:

Getting the current price of Veldspar
Good, the text interface doesn't care that "veldspar" wasn't included in the custom slot list.  Let's see if it works through the Echo:


And...fail.  Let's take a look at the Alexa card to see what it thought we said:

How Alexa translated "Veldspar"
So "Veldspar" became "builds bar".  Interestingly enough, Alexa may even fail before it gets to Auren:


In this case, the card shows it missing both "Auren" and "Veldspar":

Alexa misses both "Auren" and "Veldspar"
So it appears that if you can't get your word into the definition for a custom slot, then it's hit or miss as to whether you'll get something usable as an intent argument.

I said earlier that Alexa's reliability with dictionary words suggests a way to fit within the 5000-value limit of a custom slot.  The trick is to remove from your custom slot definition all the words that are already in the dictionary.  This works because Alexa will fill a custom slot with whatever it translated, even if the word wasn't in your custom list.  So that's one trick we could try to give ourselves a better chance of capturing all the in-game items.  Another approach is to incorporate Alexa's odd translations of some words; for example, we could make "builds bar" a synonym for "Veldspar" in our item map.  These are improvements that would have to be made over time as we gather more data on how Alexa translates non-dictionary words.
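As a quick sketch of the synonym idea, we could consult a small alias map before the typeNameMap lookup.  The "builds bar" entry is hypothetical, and the list would grow over time as we collect cards from failed requests:
private static final Map<String, String> ALIASES = new HashMap<String, String>();
static {
    ALIASES.put("builds bar", "veldspar");  // what Alexa heard -> the item we think was meant
}

private SpeechletResponse getMarketPriceResponse(String item) {
    item = item.toLowerCase();
    // Apply synonym fix-ups before consulting typeNameMap
    item = ALIASES.containsKey(item) ? ALIASES.get(item) : item;
    // ... continue with the typeNameMap lookup as before ...
}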

Last, and perhaps hardest to solve, is accounting for the variety of ways people pronounce EVE in-game items.  This problem is compounded by the fact that EVE has a large international community whose pronunciations vary by country of origin, as evidenced by just about every episode of How Do You Say It:



So there's a lot more work to do to make Auren truly awesome.  We'll keep experimenting and post our results in a future blog.

Conclusion

If you can write a simple web service, then creating an Alexa skill is very easy.  A great way to start is to clone the orbital-auren-frontend project on GitHub and customize it to your liking.  Amazon has done a nice job making skills easy to test, and the fact that your skill is automatically available on your Echo, even in development mode, is a great feature.

Where we ran into trouble was with large sets of non-dictionary words that we couldn't fit into a custom slot definition.  Maybe such large sets are rare, but hopefully Amazon will lift the restriction on custom slot size in the future.

We have a few ideas as to how we might make Auren better at handling EVE data, but we'd love to hear from others as well.  What cool features would you like to see in a voice interface?  We'll keep plugging away and post our results in a future blog update.
