Docker, Distributed Systems, and Why it Matters to Magento (Part 1)

Docker is a really interesting technology for scaling Magento installations. Docker is standardizing the way to ship application functionality. It goes further than providing an installer, or setting up a machine with an operating system, the application, and all the configuration settings: it provides a way to package all of that into a single “container” that is easy to install and run. Docker also abstracts the linking of containers, allowing containers to be combined in different ways without changing the containers themselves.

Disclaimer: The following are personal opinions and not necessarily those of my employer. There is no commitment implied here about what the core Magento team is going to officially support.

Docker is not perfect. Docker (like VMs in general) can chew up disk space, since each container image bundles its own set of operating system files. That is one of the reasons for the emergence of minimal Linux distributions (distributions with just enough to run, say, a web server, and nothing more). But these days disk is not your main expense. The labor cost to set up such systems can easily dwarf the hardware costs. So getting a predefined MySQL container with settings optimized for typical Magento installations, which you can just reuse, can save you quite a bit of time and money. And if you do need to tweak some settings, you can build your own image by extending an existing image (you don’t have to start again from scratch).

If you want to learn more about the purpose and vision for Docker (rather than technical details), I recommend the following 20-minute video. There are lots of good points made, but what resonated with me the most is that Docker is making it easier for developers to build distributed systems without needing to be distributed systems experts.

Magento and Docker

For those not familiar with Docker, let me pause and make Docker a bit more concrete in the context of Magento. Using Docker, you may define the following containers: Magento web server (you could decide to have separate web servers for store front and administration for security reasons), Varnish cache, Redis cache, and MySQL. Each of these may be bundled in its own Docker container. You would build your own Magento web server container (building upon a predefined template) to incorporate your local customizations and purchased extensions.
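To make that list a little more tangible, here is a minimal Python sketch that describes the four containers and prints the kind of `docker run` command that would start each one. It is an illustration only: the container names and the `example/...` image names are hypothetical placeholders, and real images usually need further settings (passwords, ports, storage volumes) that are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    name: str          # name given to the running container (hypothetical)
    image: str         # image to start from (placeholder, not an official Magento image)
    links: tuple = ()  # names of the containers this one connects to


# One container per role described above. All names are assumptions.
STACK = [
    Container("db", "mysql"),
    Container("cache", "redis"),
    Container("web", "example/magento-web", links=("db", "cache")),
    Container("varnish", "example/magento-varnish", links=("web",)),
]


def docker_run_args(container):
    """Build the argument list that would start one container in the background."""
    args = ["docker", "run", "-d", "--name", container.name]
    for target in container.links:
        # --link makes the target container reachable under the given alias
        args += ["--link", f"{target}:{target}"]
    args.append(container.image)
    return args


# Print the commands rather than running them, so this is safe to try anywhere.
for container in STACK:
    print(" ".join(docker_run_args(container)))
```

Note that the connections live outside the images: the web server image does not need to know where the database runs, only that something called `db` will be linked in when it starts.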

Why is this useful? Let’s say you have a great idea for a new business, but you are really not sure what demand there is yet. You start with almost no traffic, so you only need a small store. In this case you may get a single VM from a hosting partner with a MySQL container (with attached storage) and a web server running on the same VM.

After a while, business starts to grow. Your site seems sluggish so you decide to invest further by moving the database from the VM to a separate VM with better disk throughput or perhaps even a dedicated server. As your business continues to grow you scale up by adding more web servers. You also add a Varnish cache in front for speed.

You hit the big time. You get a spot on a TV show and your traffic leaps. You can add more web servers, more Varnish caches, and you consider upgrading the database server to top of the line hardware. You decide your current hosting partner has served you well, but you need the next level of service. You need to pick your whole application up and drop it in a new data center.

You get noticed internationally. You start with English-speaking countries (UK, Australia, Canada, New Zealand, etc.), but as demand grows you add support for additional languages to tackle Germany, China, India, and so on. For site speed, you want to get as many web assets as possible close to the customer. CDNs do this for images; you try moving Varnish cache instances to data centers in those countries for an extra boost. You now have your Magento instance internationally distributed across multiple data centers.

The Point

The point is many of the above steps only required moving containers to different machines and reconnecting them. There will always be a place for optimization and tuning by experts, but keeping the total effort down saves costs.
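The growth story above can be sketched as data: the same set of container roles, placed on different machines at each stage. The Python sketch below is only an illustration under assumed names (the hosts `vm1`, `vm2`, `vm-front`, and `db-server` are hypothetical). It lists which connections cross machine boundaries at each stage, which are the ones that need reconnecting after a move.

```python
# Where each container runs at three stages of the story (host names are made up).
STAGES = {
    "launch": {"web1": "vm1", "db": "vm1"},
    "separate database": {"web1": "vm1", "db": "db-server"},
    "scale out": {
        "varnish": "vm-front",
        "web1": "vm1",
        "web2": "vm2",
        "db": "db-server",
    },
}


def links(placement):
    """Derive connections: every web server talks to the database,
    and Varnish (if present) sits in front of every web server."""
    webs = sorted(name for name in placement if name.startswith("web"))
    result = [(web, "db") for web in webs]
    if "varnish" in placement:
        result += [("varnish", web) for web in webs]
    return result


def cross_host_links(placement):
    """Return only the connections whose two ends run on different machines."""
    return [(a, b) for a, b in links(placement) if placement[a] != placement[b]]


for stage, placement in STAGES.items():
    print(stage, "->", cross_host_links(placement))
```

In this model, moving from one stage to the next changes only the placement table and the connections derived from it. Nothing about the containers themselves is rewritten.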

Having technology like Docker makes it easy to take your existing applications and preconfigure most of the settings, with only minimal changes to connect up multiple instances. Having the framework make it easy to design a distributed solution without much effort means you can design for the large scale and yet easily implement the same pattern at the small scale. This makes upgrading easy.

Observant readers will have noticed that one of the hardest areas to scale with Magento is the single master database. Scaling up hardware will get you a long way. After that, I have my eyes on “NewSQL” technologies such as Clustrix to see if they solve the database scaling issue without having to introduce more complicated techniques such as sharding into Magento. Having experts solve the database scaling problem keeps Magento application and extension development simpler.

Reasons for Distributing Magento

The above highlighted a number of reasons why distributed Magento is important:

  • Horizontal scaling for growth. VMs are typically cheaper than dedicated hardware. That means it can be more cost effective to incrementally add low cost VMs to cope with growth in load. If you have a special promotional event, you can add more VMs for the duration of the event, then let go of the hardware when the traffic dies down. Many hosting partners offer hourly as well as monthly rental plans on VMs. Dedicated hardware has longer lead time and typically longer financial commitments.
  • High availability is another reason for horizontal scaling to at least 2 nodes. If one server crashes, the other can take the load. (The database again is the most complicated problem here: if your master MySQL database goes down you cannot take orders. Again, technologies like Clustrix are interesting here, with their claims of horizontal scalability and high availability built into the core design.)
  • Geographic distribution is another benefit. It is well known that site responsiveness can improve conversion rates. To get better performance, it may be worthwhile to locate either Varnish caches or web servers in data centers closer to your target customer base.
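As a rough illustration of the first two points, the Python sketch below estimates how many web server VMs to run for a given load, never dropping below two nodes so that one can fail without taking the store down. The function name and all traffic and capacity figures are made-up assumptions for the example, not measurements of Magento.

```python
import math


def web_nodes_needed(peak_requests_per_sec, requests_per_sec_per_node, min_nodes=2):
    """Estimate the number of web nodes for a given peak load.

    min_nodes=2 reflects the high availability point above:
    if one server crashes, the other can take the load.
    """
    if requests_per_sec_per_node <= 0:
        raise ValueError("requests_per_sec_per_node must be positive")
    needed = math.ceil(peak_requests_per_sec / requests_per_sec_per_node)
    return max(needed, min_nodes)


# Hypothetical numbers: each VM is assumed to handle 50 requests per second.
normal_nodes = web_nodes_needed(40, 50)   # everyday traffic
promo_nodes = web_nodes_needed(400, 50)   # special promotional event
extra_for_event = promo_nodes - normal_nodes

print(normal_nodes, promo_nodes, extra_for_event)
```

With hourly VM rental, the extra nodes only need to be paid for while the event runs, which is the cost argument made in the first bullet.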

I also like making it easier to migrate your application to a different hosting partner. Any move has its risks, so it is not something you would ever do frequently, but hosting partners are not silly. They know that keeping your business is important to them. Having an application that is easier to migrate helps keep hosting partners focused on doing a good job to keep your business.

Coming Next

In part 2 of this post I will talk about managing a distributed installation and where I see some of the gaps between where we are today and where I can see Docker support for Magento going.

Want to give Magento 2 on Docker a try? Please see my previous post, Magento 2 on Docker and Panamax. If you do give it a try, please leave feedback on that post. I would love to hear the honest real-world experiences of others: was it a positive, useful experience for you?
