“Hey Magento, what is my stock level?”

I have a new girlfriend… at least that is what my wife has been telling me recently. When I got back from Imagine 2016 a bright shiny new Amazon Echo had arrived. I was looking forward to endless hours of fun trying to program up this new toy. I must admit I was a little disappointed – it was too easy.

Amazon Echo

The Amazon Echo is a permanently-on device that listens for voice commands. The default activation word is “Alexa”. You can say things like “Alexa, what is today’s weather”, “Alexa, what time is it”, “Alexa, tell me a joke” (my kids’ favorite at the moment). You can also play music, control some home automation systems, and more. (Yes, you can order Amazon products on it too.)

My goal, however, was to get it hooked up to a Magento store, for fun. As a first step I wanted to check the store status, then ask simple questions that map to searches in Magento such as “how many red dresses are in stock”. In this post I run through a summary of the Alexa configuration steps. Details of the Magento coding side are not covered here – possible material for a future post.

Getting Alexa’s Attention

Alexa supports the concept of multiple “skill sets”. Obviously, as the number of available skills goes up, each skill needs a unique phrase to identify it. For example, you can say “Alexa, ask my Magento store blah blah blah”. I first tried “my Magento store”, but I changed this to “my webstore” for this test as Alexa had trouble telling “Magento” from “magenta”. (If I said, “Alexa, what is Magen-TOOOEEE” then it would get the right spelling and tell me it was an open source e-commerce platform. Otherwise it would describe the color.)

There are a few forms Alexa understands, such as “Alexa, ask {skill-set-phrase} {request}” and “Alexa, {request} from {skill-set-phrase}”. In my initial test, “my webstore” was the skill set phrase.

Identity

When you register a new skill, you can make it available for others to use. During development you can test it on your device using your own account, but no-one else can access it. To make it accessible to others, you must submit it for official certification and approval by Amazon.

During my initial tests, since the skill was only available for personal use, I did not worry about proper authentication. I hard coded my web store into the skill. Obviously this would not be acceptable if you wanted to make your skill available to others. Each user should be manipulating their own store. This is where the Alexa authentication support becomes important. Alexa knows your Amazon identity as the basis of security, but it can also integrate with other services. That makes it possible, for example, to construct a solution where different users can register their own Magento web sites to administer. Alexa has enough information to work out which store to connect to.

Given that anyone can just walk up to the Echo and start talking to it, you do want to think about what commands you will permit. Asking for store status is fine – it has no impact on the store. But you may want to add a PIN or similar for a voice command that takes your store offline for maintenance!

Utterances

So what magical natural language processing engine does Alexa use to recognize human grammar? Possibly none. Sure, there are lots of smarts in recognizing speech, but that part is done for you. The hard part is parsing the recognized words correctly to determine intent.

The solution adopted by Alexa is easy. What you do is give it a series of phrases to recognize. For example:

CheckStoreOnline is my store online
CheckStoreOnline are you online
CheckStoreOnline if it is online
CheckStoreOnline status
CheckStoreOnline current status

The first word on each line in an utterance file is the “intent”. If any of the phrases after the intent match, that intent is fired. So, I can say “Alexa, ask my webstore, current status” or “Alexa, ask my webstore if it is online”. Basically you write down all the variations you could imagine saying. No magic.
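
As a rough mental model only (this is not how Alexa is implemented internally, and the real service is more tolerant than an exact string match), the utterance file can be thought of as a lookup table from phrase to intent. The function names below are made up for illustration.

```python
# Utterance file content, copied from the example above.
UTTERANCES = """\
CheckStoreOnline is my store online
CheckStoreOnline are you online
CheckStoreOnline if it is online
CheckStoreOnline status
CheckStoreOnline current status
"""


def parse_utterances(text):
    """Build a phrase -> intent table; the first word of each line is the intent."""
    phrase_to_intent = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        intent, _, phrase = line.partition(" ")
        phrase_to_intent[phrase.lower()] = intent
    return phrase_to_intent


def match_intent(spoken, phrase_to_intent):
    """Return the intent whose phrase equals the spoken request, or None."""
    return phrase_to_intent.get(spoken.strip().lower())


table = parse_utterances(UTTERANCES)
print(match_intent("current status", table))   # CheckStoreOnline
print(match_intent("tell me a joke", table))   # None
```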

The following is an example of a more sophisticated set of utterances – a simple conversation. The text in braces is called a “slot”. Each slot has a type to help Alexa work out how to parse what the user says. For example, “4 digit number” is good for years. You can also define a list of words or phrases that are valid. Consider the following utterances.

CheckStockLevel how many {item} are in stock
CheckStockLevel how many {color} {item} are in stock
CheckStockLevel how many {item} do I have
CheckStockLevel how many {color} {item} do I have
CheckStockLevel what is the stock level of {item}
CheckStockLevel what is the stock level of {color} {item}

A list of color names can be supplied for the {color} slot. For a store, a list of item names can be supplied, like “shoes”, “jackets” and so on. (This actually could be a problem for a general “Magento Admin” skill as different stores are likely to have quite different inventory.)

The above utterances can then be used to accept requests such as “Alexa, ask my webstore how many red dresses are in stock?” “Red” binds to the {color} slot and “dresses” binds to the {item} slot.
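
To make the slot binding concrete, here is a small sketch that mimics it with regular expressions. It is only an illustration of the idea, not Alexa's actual matching algorithm, and the slot value lists and function names are hypothetical.

```python
import re

# Hypothetical slot vocabularies; a real store would supply its own lists.
SLOT_VALUES = {
    "color": ["red", "blue", "green"],
    "item": ["dresses", "shoes", "jackets"],
}

TEMPLATES = [
    "how many {item} are in stock",
    "how many {color} {item} are in stock",
]


def compile_utterance(template):
    """Turn an utterance template into a regex with one named group per slot."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)          # literal text
        else:
            choices = "|".join(re.escape(v) for v in SLOT_VALUES[part])
            pattern += f"(?P<{part}>{choices})"  # slot with its allowed values
    return re.compile(pattern, re.IGNORECASE)


PATTERNS = [compile_utterance(t) for t in TEMPLATES]


def bind_slots(spoken):
    """Return a dict of slot values for the first matching template, or None."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(spoken.strip())
        if match:
            return match.groupdict()
    return None


print(bind_slots("how many red dresses are in stock"))
print(bind_slots("how many shoes are in stock"))
```

Note that the second request matches the template without {color}, so no color is bound at all.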

Web Services

Okay, so we have registered a new “skill”, we have entered a series of utterances and slot vocabulary lists, now what? Alexa allows you to do one of two things – use AWS Lambda to process requests or post requests as JSON to a web service. I chose the latter, using a service glued directly into a Magento instance. A number of fields come across in the request, including the “intent” name above plus the values of any slots that were recognized. (For example, not all utterances above include the color name, so my service has to cope when the color is not provided.)
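
My own service was written in PHP, but the parsing step is easy to sketch in any language. The Python below is a minimal illustration; the sample request is abbreviated and hand-written based on my understanding of the Alexa request layout, so treat the exact field set as an assumption and check it against the official documentation.

```python
import json

# Abbreviated, hand-written sample request (not captured from a real device).
SAMPLE_REQUEST = """
{
  "version": "1.0",
  "session": {"new": true, "sessionId": "example-session-id"},
  "request": {
    "type": "IntentRequest",
    "intent": {
      "name": "CheckStockLevel",
      "slots": {
        "item": {"name": "item", "value": "dresses"},
        "color": {"name": "color"}
      }
    }
  }
}
"""


def parse_intent_request(body):
    """Return (intent_name, slots); slots that were not spoken map to None."""
    payload = json.loads(body)
    intent = payload["request"]["intent"]
    slots = {
        name: slot.get("value")
        for name, slot in (intent.get("slots") or {}).items()
    }
    return intent["name"], slots


name, slots = parse_intent_request(SAMPLE_REQUEST)
print(name)   # CheckStockLevel
print(slots)  # {'item': 'dresses', 'color': None}
```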

It was pretty easy to parse the JSON request from Alexa and formulate a JSON response in PHP. The response includes (1) the text for the speaker to say using a computer generated voice (which is pretty good), and (2) a “card” to display in the Alexa app. Cards allow you to return other information such as images. Users can also refer back to that history later using the cards.

Again, pretty simple.
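
For the curious, a response with both parts (spoken text and a card) can be sketched like this. It is written in Python rather than the PHP I actually used, the helper name is made up, and the field names reflect my understanding of the Alexa response format, so verify them before relying on this.

```python
import json


def build_response(speech_text, card_title, card_text, end_session=True):
    """Build a response dict with spoken text, a simple card, and a session flag."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "card": {"type": "Simple", "title": card_title, "content": card_text},
            "shouldEndSession": end_session,
        },
    }


reply = build_response(
    "You have 12 red dresses in stock.",  # made-up stock figure for the example
    "Stock level",
    "red dresses: 12",
)
print(json.dumps(reply, indent=2))
```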

Interaction Styles

Alexa supports single statements and conversational sessions. “Alexa, ask my webstore how many red dresses are in stock” is an example of a single question where the response is returned immediately. Sessions can however involve multiple interactions.

This could be useful to pick and pack orders for example.

Me: “Alexa, ask my webstore what is the next order to pack and ship?”

Alexa: “Order 4242 contains three items. The first item is SKU 18441, red felt tip pen.”

Me: “What is the next item?”

Alexa: “The second item is SKU 34142, 200 page note book.”

Me: “Repeat.”

Alexa: “The second item is SKU 34142, 200 page note book.”

Me: “Next item?”

Alexa: “The last item is SKU 34112, a yellow highlighter.”

Me: “Print shipping label.”

Alexa: “Shipping label printed. Marking order as packed ready for shipment.”

Each request includes a session id to identify the conversation (which is critical if the service is being shared). Each response also includes a flag to indicate if Alexa should wait for more input from the user. In the conversation above, all responses would have included the flag to “keep session open”, except for the last message where no more interaction was expected.

This sort of flow requires the web service to track state. A good analogy is a shopping cart. When the session starts, a new record (like a cart) needs to be created. The “cart” then tracks state until the user is finished, at which point the cart (the user session details) can be discarded.
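
Here is a minimal sketch of that “cart” idea for the pick and pack conversation, keyed by session id. The intent names and the order data are hypothetical, and the in-memory dictionary stands in for the database records a real service would use.

```python
# Hypothetical order, borrowed from the sample conversation above.
ORDER = {
    "id": 4242,
    "items": [
        "SKU 18441, red felt tip pen",
        "SKU 34142, 200 page note book",
        "SKU 34112, a yellow highlighter",
    ],
}

# session id -> state; an in-memory stand-in for database records.
SESSIONS = {}


def handle(session_id, intent_name):
    """Return (speech_text, should_end_session) for one request."""
    if intent_name == "NextOrder":
        SESSIONS[session_id] = {"position": 0}  # create the "cart"
    state = SESSIONS.get(session_id)
    if state is None:
        return "Please ask for the next order first.", True
    if intent_name == "NextItem":
        last = len(ORDER["items"]) - 1
        state["position"] = min(state["position"] + 1, last)
    elif intent_name == "PrintLabel":
        del SESSIONS[session_id]  # conversation finished, discard the "cart"
        return "Shipping label printed.", True
    # NextOrder, NextItem and Repeat all end by reading out the current item.
    return "Item: " + ORDER["items"][state["position"]], False


for intent in ["NextOrder", "NextItem", "Repeat", "NextItem", "PrintLabel"]:
    print(intent, "->", handle("demo-session", intent))
```

Every reply except the last keeps the session open, which matches the flag described above.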

One minor inconvenience: the JSON is not compatible with the Magento 2 web service support. Magento 2 by default uses identifiers like “session_id” (“snake case” with underscores), rather than “sessionId” (“camel case” with humps). So these experiments were done using very simple JSON parsing. There are a few Alexa PHP libraries available – I could imagine fleshing out one of these libraries to deal with state tracking better, as the actual content relevant to an application is pretty minimal.
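
One possible workaround, which I did not actually build, is a thin translation layer that renames keys in both directions. The Python sketch below shows the idea; the helper names are made up. Be aware that it renames every dictionary key it finds, so slot names containing capitals would be rewritten too.

```python
import re


def camel_to_snake(name):
    """sessionId -> session_id"""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def snake_to_camel(name):
    """should_end_session -> shouldEndSession"""
    head, *tail = name.split("_")
    return head + "".join(word.capitalize() for word in tail)


def convert_keys(value, convert):
    """Recursively rename dictionary keys in nested dicts and lists."""
    if isinstance(value, dict):
        return {convert(k): convert_keys(v, convert) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_keys(v, convert) for v in value]
    return value


incoming = {"session": {"sessionId": "abc"}, "request": {"type": "IntentRequest"}}
print(convert_keys(incoming, camel_to_snake))

outgoing = {"response": {"should_end_session": True}}
print(convert_keys(outgoing, snake_to_camel))
```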

Hardware Deployment

All up, this first experiment was pretty easy to implement. The longest delay for me was actually getting an HTTPS based web server (with a trusted certificate) running, as Alexa checks the certificate to make sure it has connected to the right service. I ended up using all AWS services for simplicity. In particular, I used:

  • A small EC2 node to run the Magento web site and web service.
  • A load balancer to sit in front of my EC2 node, configured to receive HTTPS and forward as HTTP to the web server.
  • If you use an Amazon load balancer, AWS can provide the HTTPS certificate for free!
  • I had to register a domain name to go with the certificate as well. (I used “alexa.alankent.me” for this experiment.)
  • You register your skill with the Alexa developer cloud service, including utterances, slot lists (legal values for each slot), and so on.

In the end, it was pretty easy. Amazon encourages developers to use AWS Lambda – which only charges for CPU resources when code is executed. That may be a sensible model as a layer between your web store and the Alexa service. This layer would do things like look up your user identity to work out which store instance to administer.
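
I did not build this layer, but as a sketch of what it might look like: a Lambda function in Python that maps the user id from the request to a store. The lookup table, the store URL, and the helper name are all hypothetical, and the location of the user id in the request is my understanding of the Alexa format rather than something I tested.

```python
# Hypothetical registry of which Amazon user administers which store.
STORE_BY_USER = {
    "amzn1.ask.account.EXAMPLE": "https://store.example.com",
}


def resolve_store(event):
    """Return the store URL registered for the requesting user, or None."""
    user_id = event.get("session", {}).get("user", {}).get("userId")
    return STORE_BY_USER.get(user_id)


def lambda_handler(event, context):
    """Lambda entry point: pick the store, then reply with plain speech."""
    store_url = resolve_store(event)
    if store_url is None:
        text = "No store is registered for this account."
    else:
        # A real layer would forward the intent to the web service at store_url.
        text = "Found the store registered for your account."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


event = {"session": {"user": {"userId": "amzn1.ask.account.EXAMPLE"}}}
print(lambda_handler(event, None)["response"]["outputSpeech"]["text"])
```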

Conclusions

Well, my new girlfriend (ahem, I mean Alexa) certainly makes it easy to build a voice control system. The API is very simple, the text recognition and generation algorithms are pretty good, and the whole process for a simple application did not take long to get going – less than a day for sure.

I started off this post saying it was almost too easy. Frankly, once I got the web server running with an HTTPS certificate, it only took 5 minutes to get a simple canned response returned.

The real challenge is to work out issues like:

  • Is this actually useful in real life? In a noisy store, would the device be able to recognize customers? Or could it be more useful as an administrator tool?
  • Is it likely there will be one “skill” for many people to share, or should each merchant have their own skill? The latter allows better customization of product description utterances per store.
  • Is authentication required?
  • How to track a session across multiple web service requests? (Using database records for maintaining state is the most likely approach.)
  • How safe is a voice control system for performing administrative operations?

After Imagine I wanted a bit of a break before getting back into “real work”. Alexa provided that. If you were wondering how hard the Alexa side of building an application is, it was simple to get going. I spent less than a day learning all the concepts and tools and getting a first web service running. The bulk of the work is building the code to implement the actions to perform when utterances are made, and implementing concepts like session state tracking to make “conversations” possible.

All up, I was impressed. A fun little device. (But don’t forget that you should talk to your wife more than Alexa, if you want to remain on good terms.)
