Memorial Day Musings: How to move the ecommerce industry forwards?

Karen Baker posted on LinkedIn about what ecommerce platforms need to succeed. Here are my Memorial Day musings for your entertainment.

Support not Kill

Let’s start with a controversial topic! One comment left on LinkedIn was that the solution should support, not kill, existing products in the market. A fine principle, but I disagree. I think this is a business reality, not a technical consideration. It can also lead to stagnation. If one partner builds a shipping management solution, does that mean no other partner can? If a partner builds a bad staging solution, does that mean the platform cannot come up with a more general solution that everyone can use? If someone builds a simple solution with limited resources, does that mean a more advanced solution cannot be created with more resources?

I think such decisions are business decisions, not architectural decisions. I think a solution should support different models, not purposely decide to stop competition.

And what about existing ecommerce platforms? Is the goal of a new platform to compete with other platforms? If you say “no, don’t hurt existing open source ecommerce platforms”, then this changes what a new solution would be.

SaaS or Open Source

I think a solution should allow open source to exist, but the reality is that if someone provides a great SaaS solution, it is less work for merchants, who should be focusing on their business. Also, many extensions are powered by SaaS offerings, even when they are connected to a self-hosted open source platform. So a solution should play well with both.

Monolith or Services

Many open source ecommerce platforms were designed pre-cloud. Platforms such as Magento are a monolith, with connectors out to external systems. Monoliths have pros and cons. It’s much easier to manage, deploy, and fiddle with rules in a monolith. Deployments are atomic. The service may be on a single machine. When you decompose all the logic into small services connected, say, by message queues, you now have many individual services that each need monitoring. Kubernetes and similar tools reduce the management effort, but not back to the level of a monolith.

So I think there are problems still to be solved in operational and monitoring infrastructure for a cloud-native open source ecommerce solution to take off and gain traction. Deployment, monitoring, backups – these all need to scale from one piece of software to many, efficiently.

Understandability

I think a solution should be able to be understood by merchants. Not the technical implementation details, but metrics should be understandable business metrics. Orders per second, inventory levels, and so on. If you have many services, they should interact using business concepts that are understandable to everyone in the room. A part of the effort may therefore be standardization of jargon, and defining architectural patterns that can be explained to non-technical folks. It helps keep everyone on the same page.

Diversity of systems

Another reason that I think monoliths cannot be relied on into the future is that there is an increasing need to be where the customer is. You can have an online ecommerce store. Great. What if you have an in-person event? What if you open a physical store? What if you do pop-up kiosks at the local farmers market? What if you want to sell directly from a YouTube video? What if you want to create a Roblox in-game store? Want to sell your products on those new swanky AR glasses? For many businesses, the number of places where touch points with customers can occur is going to increase.

For businesses who want agility, monoliths can still be used, but the reality is you are going to need to hook many different systems together. I see this as increasing over time, not decreasing. This is a driving need. It’s not that existing platforms are getting worse, it’s that the world is getting bigger and more interconnected.

Detecting problems

People often think about the happy paths. How do I come up with a clever new pricing and discounting model and make sure everyone implements it? Done? Great! Let’s go! I actually think that is the easy part. The hard part is dealing with the unhappy paths.

How do you cope with system failures? Do you go super redundant? If something breaks, do you take everything down until it’s fixed? I don’t think that is realistic in the modern world. If you are using a third-party shipping SaaS offering, you have to deal with independent systems. So a solution should embrace that model.

And even if you solved hardware failures, what about bugs or hackers? Things go wrong. Retail stores have always had to deal with this. Stock may be stolen or damaged. Their solution? Do a stocktake periodically and reconcile what is on the shelves with what your systems think.

I think strategies such as reconciliation should be built in. This can be expensive! At the end of each month, tally up all the money you took from customers according to your payment provider and all the orders you fulfilled (dealing with those nasty edge cases for systems running 24×7 – did you take a payment just before the end of the month, but process the order just after?). This may require your payment provider to provide you a reconciliation service. You are checking your system by comparing what different systems believe about the real world, and spotting the differences between them so they can be dealt with. Don’t pretend to build systems that never go wrong; build systems that can detect inconsistencies so they can be handled.

But that is a big hit on your infrastructure if you hit your main database! Process the last month of transactions in a few minutes or hours? I think strategies need to be agreed on so services can build efficient solutions to such problems from the beginning. Maybe there is a flat file with all transactions you can download for reconciliation instead of hitting the main database. But everyone in the overall solution needs to play their part.
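
To make the idea concrete, here is a minimal sketch of a reconciliation pass, assuming each system can export a flat list of records keyed by a shared order ID. The `Record` shape, the field names, and the `reconcile` function are all hypothetical, invented for illustration; amounts are integer cents to keep rounding out of the picture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    order_id: str       # hypothetical shared key between the two systems
    amount_cents: int


def reconcile(payments, orders):
    """Compare two exports and report where they disagree."""
    paid = {p.order_id: p.amount_cents for p in payments}
    fulfilled = {o.order_id: o.amount_cents for o in orders}
    return {
        "paid_not_fulfilled": sorted(paid.keys() - fulfilled.keys()),
        "fulfilled_not_paid": sorted(fulfilled.keys() - paid.keys()),
        "amount_mismatch": sorted(
            oid for oid in paid.keys() & fulfilled.keys()
            if paid[oid] != fulfilled[oid]
        ),
    }


# Made-up sample exports from a payment provider and an order system.
payments = [Record("A1", 1000), Record("A2", 2500), Record("A3", 400)]
orders = [Record("A1", 1000), Record("A2", 2400), Record("A4", 900)]
report = reconcile(payments, orders)
print(report)
```

An entry under `paid_not_fulfilled` is not necessarily a bug. It may be the end-of-month edge case, where the payment landed in this period and the order in the next. The point is only that the difference is surfaced so someone can decide.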

Recovery from problems

Recovery from failed systems (e.g. from backups), from deployment of bugs (all tax calculations were rounded incorrectly), and so on is hard. Such reconciliation is not fun, but I think it should be built in. You need to have some form of documented adjustments that are a part of your regular flow of events.

Again, many systems already have such solutions in place. I think they just need to be standardized and brought to the forefront, not hidden in back corners. There should be an expectation of failures, and well defined and agreed to practices to deal with them. Renting a warehouse? Expect rain damage at times and have APIs or flows that cope with them.

Non-atomic deployments

I touched on this already, but when you have multiple systems run by different vendors working together, you have to deal with systems updating on different schedules. A solution should support well-known patterns such as sending a small percentage of traffic to a new implementation to see how it behaves, or sending all traffic to two copies of a service to do parallel runs. Message queues are appealing here because you can fork traffic flows in different ways without the systems being aware. (You can, however, do it with webhooks – it just might mean the platforms have to support common patterns directly: duplicate traffic, split traffic by percentage, etc.)
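
As a rough sketch of those two patterns, assuming messages carry a stable key such as an order ID: the function names (`bucket`, `route_canary`, `route_parallel`) are hypothetical, and a real message broker or platform would do this routing for you rather than application code.

```python
import hashlib


def bucket(key: str) -> int:
    """Map a key to a stable bucket in 0..99, the same on every run."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route_canary(order_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of orders to the new implementation."""
    return "new" if bucket(order_id) < canary_percent else "current"


def route_parallel(message, current, candidate):
    """Parallel run: both handlers see the message, only `current` counts."""
    result = current(message)
    try:
        shadow = candidate(message)
    except Exception as exc:  # a broken candidate must not affect production
        shadow = exc
    return result, shadow


# Split by percentage: the same order always lands on the same side.
ids = [f"order-{i}" for i in range(1000)]
canary_share = sum(route_canary(i, 10) == "new" for i in ids) / len(ids)

# Duplicate traffic: compare the shadow result offline, serve the current one.
served, shadowed = route_parallel(2, lambda x: x * 2, lambda x: x + 1)
```

Hashing the key rather than picking at random means a given order is routed the same way every time, which makes differences between the two implementations much easier to chase down.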

Non-atomic deployments mean you have to deal with concepts such as schema evolution. There is no hiding from it. If you are trying to introduce a product attribute that should eventually be mandatory, you may need to first introduce it as an optional attribute, make sure all systems understand and handle it correctly, then flip the flag to make it mandatory. New features may need coordinated releases across multiple systems in multiple phases. Get that knowledge out there as a standard practice for everyone in the industry.
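
A small sketch of that optional-then-mandatory rollout, assuming schemas are plain dictionaries with a `required` flag. The schema layout, the `validate` function, and the `country_of_origin` attribute are all made up for illustration and do not come from any particular platform.

```python
def validate(product: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the product is valid."""
    problems = []
    for field, rule in schema.items():
        if field not in product:
            if rule["required"]:
                problems.append(f"missing mandatory field: {field}")
            continue
        if not isinstance(product[field], rule["type"]):
            problems.append(f"wrong type for field: {field}")
    return problems


# Phase 1: the new attribute is optional, so older producers stay valid.
schema_v2 = {
    "sku": {"type": str, "required": True},
    "country_of_origin": {"type": str, "required": False},
}
# Phase 2: once every system sends the attribute, flip the flag.
schema_v3 = {**schema_v2, "country_of_origin": {"type": str, "required": True}}

old_product = {"sku": "TSHIRT-1"}
new_product = {"sku": "TSHIRT-1", "country_of_origin": "NZ"}

print(validate(old_product, schema_v2))  # accepted during phase 1
print(validate(old_product, schema_v3))  # rejected once the flag is flipped
```

The useful property is that there is a window in which both old and new producers are valid, so systems on different release schedules never have to upgrade on the same day.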

Dealing with bad data

There are also patterns like how to deal with bad data. Do you just throw it away if an order is missing a mandatory field? If you are using message queues rather than synchronous APIs that will return a failure immediately, then message flow patterns need to include asynchronous failure modes. Have another queue for “bad” orders. Design that in. Did you take the customer’s money already? Can you fix the order manually? Or do you just cancel the order and send an email back to the customer? These complexities are real.
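
Here is a minimal sketch of the “another queue for bad orders” idea, using Python’s in-process `queue.Queue` as a stand-in for a real message broker. The list of mandatory fields and the `process_orders` function are assumptions for illustration only.

```python
import queue

# Hypothetical mandatory fields; a real system would take these from its schema.
MANDATORY_FIELDS = ("order_id", "customer_email", "items")


def process_orders(incoming, accepted, dead_letter):
    """Drain `incoming`, parking invalid orders instead of dropping them."""
    while True:
        try:
            order = incoming.get_nowait()
        except queue.Empty:
            return
        missing = [f for f in MANDATORY_FIELDS if f not in order]
        if missing:
            # Keep the original order plus the reason, so a human or a
            # repair job can fix it, refund it, or cancel it later.
            dead_letter.put({"order": order, "reason": f"missing: {', '.join(missing)}"})
        else:
            accepted.put(order)


incoming, accepted, dead_letter = queue.Queue(), queue.Queue(), queue.Queue()
incoming.put({"order_id": "A1", "customer_email": "a@example.com", "items": ["mug"]})
incoming.put({"order_id": "A2", "items": ["hat"]})
process_orders(incoming, accepted, dead_letter)
```

Nothing here answers the business questions (refund, manual fix, cancel), but keeping the bad order and the reason together at least makes those decisions possible.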

Staging

I think there are some hard problems that are worth thinking about at the platform level. Do you want to be able to stage an upcoming sales event to test it before it goes live? The underlying data model needs to support such concepts (dates when things go live and go away). Do you feed a copy of product and deal updates to different systems? One optimized for production, the other for small numbers of users but where you can adjust the current time to see what the system will look like at different points in the future?
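
One way to picture this is a data model where every deal carries its own go-live and go-away dates, and where “now” is a parameter rather than the system clock. The sketch below assumes exactly that; the `Deal` type, the `active_deals` function, and the sample dates are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Deal:
    name: str
    starts_at: datetime
    ends_at: Optional[datetime] = None  # None means open-ended


def active_deals(deals, now: datetime):
    """Return names of deals live at `now`; start inclusive, end exclusive."""
    return [
        d.name for d in deals
        if d.starts_at <= now and (d.ends_at is None or now < d.ends_at)
    ]


deals = [
    Deal("everyday-free-shipping", datetime(2024, 1, 1)),
    Deal("summer-sale", datetime(2024, 6, 1), datetime(2024, 6, 8)),
]

# Production passes the real time; staging passes whatever time it wants to preview.
today = active_deals(deals, datetime(2024, 5, 27))
during_sale = active_deals(deals, datetime(2024, 6, 3))
after_sale = active_deals(deals, datetime(2024, 6, 8))
```

Because the clock is an input, the same code and the same data can serve both production and a staging preview of a future date.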

My Solution

I woke up this morning and wrote this down, so don’t expect a deep well thought out solution here! Expectations set? Great, so where do I lean?

  • My solution would be a set of patterns for *existing* systems to conform to, to reduce integration effort.
  • I would define a set of standard base data types for orders, inventory, customer profiles etc, and define them in a similar way to schema.org. Schema.org is pretty loose, but documents agreed meaning of data fields. Specific implementations then impose more specific rules (I support these fields, not those ones, this one is mandatory, etc.) Think of it like blobs of typed JSON, where the type system is extensible. An order management system might say “I understand the following schema.org order type, but addresses inside an order are an opaque blob to me that I will pass along – so I work with any address format!”
  • I would define patterns for flows between systems. Define patterns for reconciliation, bad data, schema evolution, staging, etc.
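
To illustrate the “opaque blob” idea from the second bullet, here is a rough sketch of a system that understands a couple of order fields and passes the address along untouched. The field names are borrowed loosely from schema.org’s Order type and should be treated as illustrative; the `route_order` function and the custom address format are hypothetical.

```python
import json


def route_order(raw: str) -> dict:
    """Read the fields this system understands; pass the address through as-is."""
    order = json.loads(raw)
    if order.get("@type") != "Order":
        raise ValueError(f"unsupported type: {order.get('@type')!r}")
    return {
        "orderNumber": order["orderNumber"],            # mandatory for this system
        "billingAddress": order.get("billingAddress"),  # opaque: never inspected
    }


raw = json.dumps({
    "@type": "Order",
    "orderNumber": "1001",
    # Any address format works, because this system never looks inside it.
    "billingAddress": {"format": "custom", "lines": ["1 Main St", "Springfield"]},
})
forwarded = route_order(raw)
```

The order system imposes its own rules (it insists on `orderNumber`) while staying agnostic about the parts it does not own, which is the extensibility I have in mind.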

I am not yet convinced the problem needs another software vendor. Do we really need another shopping cart implementation? Another web storefront? 

I wonder whether the real way to move the industry forward is to get smart folks from existing vendors to meet together to hash out more agreement on data formats and interaction flows between systems. Come up with agreed patterns on topics such as schema evolution and system reconciliation. Can we all agree what an order is? A receipt? An address? How to deal with errors? How to reconcile systems efficiently?

Is it in the vendors’ business interest to do so? Hmmm. Maybe not. It might not be the vendors who would drive such an effort, but rather the merchants/agencies/vertical tech partners that support them. Ignoring dark topics like platforms benefiting from lock-in, vendors have other priorities and limited resources.

Until the benefits and the return on investment are clear, I don’t see a path forward for such musings to make any real progress. I think such an effort has to come from those feeling the pain the most.
