AI Ecommerce Platform Architecture Musings

At NVIDIA GTC 2024 (GPU Technology Conference) I watched a number of NVIDIA Omniverse and OpenUSD related sessions (my hobby interest, which I blog about elsewhere), but there were also some thought-provoking sessions on AI. They raised the question: what would a new ecommerce platform look like if it were designed today, taking into account the recent advancements in AI? NVIDIA NIM, I think, gives a feel for what this might look like.

What is NIM?

In the GTC keynote, Jensen Huang covered a range of topics, including NIM (NVIDIA Inference Microservices). NIM is not a new AI technology. It represents standardization of the deployment and management of existing AI technologies, leveraging existing infrastructure like Kubernetes. While not as sexy as a new AI model that generates video from a line of text, it is important for putting AI into production, where businesses need technologies they can rely on. It falls more into the MLOps world (DevOps for machine learning).

Why does AI need a bit more help? Why not just run out and build your own Kubernetes pods? You can, of course, but I assume NVIDIA has done work to abstract access to GPUs. Different deployments might have different GPUs available, so how do you make pods and containers portable? This, to me, is the value of standardization: reusing knowledge across deployments. (Note: I have not used NIM yet, so I don’t know how good it is; I am more interested in the trend.)
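
To make that concrete, here is a minimal sketch (my own, not from NVIDIA’s materials) of the kind of detail standardization has to hide: requesting “a GPU” for a container via the Kubernetes Python client. The pod name and image are placeholders I made up; nvidia.com/gpu is the resource name exposed by NVIDIA’s Kubernetes device plugin.

```python
# A minimal sketch, assuming the kubernetes Python client and NVIDIA's device
# plugin are available in the cluster. Names and image are placeholders.
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="inference-service"),  # placeholder name
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="model-server",
                image="example.com/my-inference-image:latest",  # placeholder image
                # The scheduler only sees "one GPU"; which GPU model is actually
                # present differs per deployment, which is the portability question.
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
            )
        ]
    ),
)
print(pod.spec.containers[0].resources.limits)
```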

NIM is NVIDIA’s standardization effort in that area. There is a catalog of AI microservices that you can deploy in multiple environments, from desktops to the cloud.

NVIDIA also describes the goals for NIM, and even offers a free online service at ai.nvidia.com where you can play with the microservices in the catalog.
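
To give a feel for what “playing with a microservice” looks like, here is a minimal sketch of calling a hosted model. It assumes the service exposes an OpenAI-compatible chat API (which is how NVIDIA describes NIM endpoints); the base URL, model name, and environment variable are my assumptions, so check the catalog documentation for the real values.

```python
# A minimal sketch, assuming an OpenAI-compatible NIM endpoint.
# Base URL, model name, and NVIDIA_API_KEY are assumptions; see the catalog docs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed hosted endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # key from the catalog site
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # example catalog model (assumption)
    messages=[{"role": "user", "content": "In one sentence, what is a NIM?"}],
)
print(response.choices[0].message.content)
```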

Are AI system architectures different?

Scattered through the keynote, Jensen mentions NIM (and NeMo for training models). Some of the mentions describe how NVIDIA has a number of NIMs deployed internally. He describes how the services often include a language model fine-tuned for that domain, allowing you to chat with the services and ask them questions (not only call an API from other programs). This starts to feel more like an AI agent approach: instead of just having “dumb” services that do exactly what the API says they will do, you have a series of agents that each solve one problem and “talk” to each other.

In a separate post, I summarized two sessions from GTC on robotics from Google and Disney. They described new robotics approaches based around generative AI advances, particularly Large Language Models or LLMs (ChatGPT and Gemini are example LLMs).

One approach for AI is to have a mega-model that you train on lots of data and that comes up with the final result: the full end-to-end solution. The problem I have with this approach is, what if something goes wrong? How do you debug it? Breaking the larger problem into smaller components reduces the risk, as you can test each component in isolation. For example, with robotics you could have one component for vision recognition (interpreting what a video camera shows), another component for planning strategies to solve problems, another component to turn action commands into servo controls on a robotic arm, and so on.

Note: There is a counter-argument that it is good to combine things into one solution. In the Google robotics presentation they were looking at multi-modal generative AI (e.g., feeding an image and text in as a single request) to merge the planning, actuation, and perception stages into a single model. Personally, it makes me nervous when a single system gets too complex.

One of the interesting aspects of the two presentations was that, by using generative AI to turn human instructions into a limited language of more specific commands, you could understand what each component was instructing the next component to do, and you could chat with each component (in English) to get a feel for why it made a particular decision.

For example, the vision component can answer what it sees in a scene, describing objects and their positions. The planning component can show how it broke down a request (“make me a cup of coffee”) into a series of individual actions (“I am going to turn the kettle on, I am going to locate a cup, I am going to select the coffee to use, …”). The output was understandable to humans. Personally, I find that useful: it opens up new opportunities for insight and debugging, both of which I think matter in production.
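
To illustrate the idea (my own toy sketch, not code from either presentation): if the planner may only emit commands from a small, fixed vocabulary, then every plan it hands to the next component is something a human can read and question. The vocabulary and the example plan below are invented.

```python
# A toy sketch of a "limited command language" planner output.
# The vocabulary and plan are invented for illustration; in practice the plan
# would come from an LLM prompted to use only these commands.
PLAN_COMMANDS = {"TURN_ON", "LOCATE", "PICK_UP", "POUR", "WAIT"}

def validate_plan(plan: list[str]) -> list[str]:
    """Reject any step whose verb is outside the allowed vocabulary."""
    for step in plan:
        verb = step.split()[0]
        if verb not in PLAN_COMMANDS:
            raise ValueError(f"Unknown command in plan: {step}")
    return plan

# Imagine the planner turned "make me a cup of coffee" into this sequence.
# Because each step uses the restricted vocabulary, a person (or the next
# component) can see exactly what the planner intends to do.
plan = validate_plan([
    "TURN_ON kettle",
    "LOCATE cup",
    "PICK_UP coffee",
    "POUR water cup",
])
print("\n".join(plan))
```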

How about for an ecommerce platform?

A traditional ecommerce site has a number of components: a product catalog, a search service, a pricing engine, a cart, checkout and payments, etc. Each of these components could be turned into an AI component (a NIM).

Image from: https://www.gomage.com/blog/composable-commerce-vs-headless/

A possible way, then, for a merchant to interact with the deployment is to describe their requirements in English, chatting with the platform: “For the next week, put all hand soaps currently priced over $4 on sale for 25% off. For other hand soaps, offer buy two, get one free.” Instead of learning the UI, describe what is wanted in English. Merging AI into the various components above may make ecommerce systems easier to configure, allowing businesses to be more nimble.
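
As a sketch of what that could look like under the hood (entirely my own assumption of how it might be wired up): the LLM’s only job is to translate the merchant’s English into a structured promotion rule, which the existing platform then validates and stores. The schema, endpoint, and model name below are hypothetical placeholders.

```python
# A sketch of merchant configuration by chat. The LLM translates English into a
# structured rule; the platform, not the LLM, later applies it. The schema,
# base URL, and model name are hypothetical placeholders.
import json
import os
from openai import OpenAI

SCHEMA_HINT = (
    "Return JSON only, for example: "
    '{"rules": [{"filter": {"category": "hand soap", "min_price": 4.0}, '
    '"action": {"type": "percent_off", "value": 25}, "valid_days": 7}]}'
)

client = OpenAI(
    base_url="https://merchant-config.example.internal/v1",  # placeholder endpoint
    api_key=os.environ.get("API_KEY", "not-needed-for-local"),
)

resp = client.chat.completions.create(
    model="merchant-config-llm",  # placeholder model name
    messages=[
        {"role": "system", "content": "Translate merchant requests into promotion rules. " + SCHEMA_HINT},
        {"role": "user", "content": "For the next week, put all hand soaps currently priced over $4 on sale for 25% off."},
    ],
)
rules = json.loads(resp.choices[0].message.content)  # validate against the schema before saving
print(rules)
```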

Is AI a good approach for shopper transactions? I am not sure yet. It’s cute, but is it efficient? Do you really want every transaction to go through an LLM? Probably not. There are of course other AI models that are more efficient for specific tasks, but even then, totaling prices in a cart is more like a calculator task; is AI really needed for that? It may be more practical to use AI to turn human requirements into rules for a rules engine to follow. The merchant can still chat with the LLM to explore and define what they want, but simpler code handles the cart logic. Even better, continue to use an existing technology stack rather than build a new one. Use AI for merchant configuration and data insights, not shopper transactions.
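
To show why the shopper path does not need an LLM at all: the rule produced during configuration is just data, and plain code can apply it to a cart. This toy sketch continues the hypothetical rule format from the sketch above.

```python
# A sketch of keeping shopper transactions LLM-free: apply stored promotion
# rules to a cart with plain code. The rule format matches the hypothetical
# schema in the previous sketch.
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    category: str
    price: float

def cart_total(items: list[Item], rules: list[dict]) -> float:
    """Total the cart, applying any matching percent-off rules."""
    total = 0.0
    for item in items:
        price = item.price
        for rule in rules:
            match, action = rule["filter"], rule["action"]
            if (item.category == match["category"]
                    and item.price > match["min_price"]
                    and action["type"] == "percent_off"):
                price = item.price * (1 - action["value"] / 100)
        total += price
    return round(total, 2)

rules = [{"filter": {"category": "hand soap", "min_price": 4.0},
          "action": {"type": "percent_off", "value": 25},
          "valid_days": 7}]
cart = [Item("Lavender soap", "hand soap", 5.00), Item("Dish sponge", "cleaning", 2.50)]
print(cart_total(cart, rules))  # 5.00 * 0.75 + 2.50 = 6.25
```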

Of course there are parts of an ecommerce deployment that can benefit from AI, such as personalization. Which product image should you display in search results? The user’s preferences could be used to choose between the available thumbnail images. Sneaking AI into different parts of the architecture is why I like the approach of multiple components: you still want to be able to replace individual parts without fear of affecting other parts of the total architecture.
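
As a toy example of that kind of componentized personalization (my own invention, deliberately simple): pick the product image whose tags best match the shopper’s known preferences. A real system might use an embedding or recommendation model instead, but the point is that the decision lives in one replaceable component.

```python
# A toy sketch of thumbnail personalization: choose the image whose tags best
# overlap the shopper's preferences. Tags and scoring are invented for illustration.
def pick_thumbnail(thumbnails: dict[str, set[str]], preferences: set[str]) -> str:
    return max(thumbnails, key=lambda name: len(thumbnails[name] & preferences))

thumbnails = {
    "on_model.jpg": {"lifestyle", "outdoor"},
    "flat_lay.jpg": {"studio", "detail"},
    "in_use.jpg":   {"lifestyle", "family"},
}
print(pick_thumbnail(thumbnails, {"outdoor", "lifestyle"}))  # on_model.jpg
```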

Are we there yet?

I found the GTC conference interesting in that it was not a research conference; it was an industry conference. It talked about technologies that are either coming soon or already exist, with a focus on adoption. NIM is an example of this: NVIDIA is building up libraries of components to make it easier for companies to leverage and deploy AI technologies.

Do I expect NIM to be perfect? No. I am sure there are lots of outstanding issues to resolve. For example, consider an AI-powered, language-driven configuration experience for merchants, fine-tuned to the merchant’s specific business. Does the container need to be running 24×7? That feels wasteful. But if you shut it down after a period of inactivity, how fast can the container start up and load a fine-tuned language model on demand? These are practical cost-efficiency questions that still need working through.

But the recent advancements in AI, and standardization efforts like NIM, do open up interesting possibilities, breathing a bit of life into ecommerce system architectures. Shopping carts today are a commodity; rapid AI progress, however, is opening up new opportunities. It will be interesting to see which platforms make these services available to their user base first.
