There is a lot of noise these days around containerization technologies like Docker. In this post I explore why I believe such technologies are narrowing the gap with SaaS-based solutions. I focus on SaaS and Docker-based solutions in particular, while aware that there are myriad other options, such as dedicated managed hosting. My goal is to show how the gap between containerized solutions and SaaS is narrowing, not to provide a one-size-fits-all answer.
SaaS (Software as a Service)
In a nutshell, SaaS is where a single running application provides a service to many customers. Gmail is a great example of a SaaS solution. Customers do not need to be concerned with how the service is implemented; they just know they can use it.
The following are particular characteristics of SaaS solutions that help drive efficiencies, and hence lower costs.
Multi-tenancy: SaaS-based solutions are typically multi-tenant. That is, a single set of machines supports multiple external customers concurrently. Data in the database tables may carry a ‘tenant id’ or similar, and the application ensures that the tenant id is included whenever it accesses the database, so a tenant (client of the service) never sees another tenant’s data. Multi-tenancy typically delivers greater efficiency in resource consumption because it effectively allows fractions of a CPU, I/O channel, etc. to be allocated per tenant. For example, if one tenant needs the equivalent of one and a half servers and another needs only half a server, then two servers are sufficient. Without multi-tenancy, three servers would be needed (two servers for the 1.5-server load, and a separate server for the half-server load).
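The server arithmetic above can be checked with a short Python sketch. It optimistically assumes that shared load can be split perfectly across machines; the function name servers_needed is hypothetical and not taken from any library.

```python
import math


def servers_needed(loads, multi_tenant):
    """Return the number of whole servers needed for the given per-tenant loads.

    loads: per-tenant demand in units of servers (1.5 means one and a half servers).
    multi_tenant: if True, tenants share machines, so only the total is rounded up;
    otherwise each tenant's load is rounded up to whole servers separately.
    """
    if multi_tenant:
        return math.ceil(sum(loads))
    return sum(math.ceil(load) for load in loads)


loads = [1.5, 0.5]  # the two tenants from the example above
print(servers_needed(loads, multi_tenant=True))   # 2
print(servers_needed(loads, multi_tenant=False))  # 3
```

The whole saving comes from rounding up once for the shared pool instead of once per tenant.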
It is worth noting that a multi-tenant solution is likely to have more complex database queries, as the tenant id must be included in every query, used in database join operations, and so forth. So while a database may be shared across tenants, query complexity is likely to increase, which can reduce query performance.
Availability: SaaS-based solutions typically include redundancy in the architecture. Multiple servers make it possible to lose a server and still have other servers take the load. For example, if I have 10 servers and one goes down, the impact is much smaller than on a site with, say, only 2 servers. So for small sites, SaaS can offer a more cost-effective, high-availability solution by sharing redundant servers and hardware across multiple tenants.
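The difference between the 10-server and 2-server sites can be expressed as the fraction of capacity left after one failure. This sketch assumes load is spread evenly across identical servers; remaining_capacity is an illustrative name.

```python
def remaining_capacity(total_servers, failed=1):
    """Fraction of capacity still available after `failed` servers go down,
    assuming load is spread evenly across identical servers."""
    if not 0 <= failed <= total_servers:
        raise ValueError("failed must be between 0 and total_servers")
    return (total_servers - failed) / total_servers


print(remaining_capacity(10))  # 0.9
print(remaining_capacity(2))   # 0.5
```

Losing one of 10 servers costs 10% of capacity; losing one of 2 costs half.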
Less Flexibility: SaaS-based solutions typically offer less flexibility in solution design. They typically define a template for all customers to follow; this is part of how they achieve greater scale at lower cost. Making a SaaS site too flexible makes it harder to isolate the impact of changes made by one tenant from other tenants.
Containerization is the most recent trend in virtualized solutions. Virtualization allows physical servers and infrastructure to be shared between different tenants, providing OS-level access within each virtual machine. Containerization takes this a step further: developers can ship a complete container including pre-installed system software (PHP, Apache, etc.) along with configured applications. Preconfigured containers reduce the need for users to access the OS directly.
My personal expectation is that over the next year or so there will be more standardization in monitoring, health metrics, and logging support. This will allow the definition of black-box containers, with less and less need for customers to access the OS. Just as with a SaaS solution, the customer should not need to know what is inside the container; they should just be able to select the containers they want and spin them up.
The following are existing and emerging characteristics of container-based solutions that I expect will help drive efficiencies, and hence lower costs, narrowing the gap between SaaS solutions and containerized solutions.
Performance Isolation: One of the risks with SaaS is that the workload of one tenant can impact other tenants. This risk is reduced with containerization, because a customer has greater control over the hardware their containers run on. They can run them in a VM-based cloud, on dedicated hardware, and so on; the customer can choose.
Granularity: It is getting easier to develop smaller containers. With Magento, for example, the containers may include Varnish, Redis, MySQL, and an Apache web server (with the Magento application). This means a customer can start small with several containers on a single machine, and later move containers to different machines to spread the load when required. This is possible because the complete application is made up of a number of containers: each container is bigger than a single program, but smaller than the complete solution. Having smaller containers also makes it easier to create different topologies.
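As a rough illustration of starting on one machine and spreading out later, the sketch below places containers onto machines using a simple first-fit heuristic. The CPU figures (in millicores) and the 4000-millicore machine size are made-up assumptions, not Magento sizing guidance, and real schedulers weigh far more than CPU.

```python
def first_fit(containers, machine_capacity):
    """Assign containers to machines with the first-fit heuristic.

    containers: list of (name, cpu_millicores) pairs, placed in the given order.
    machine_capacity: millicores available on each identical machine.
    Returns one list of container names per machine used.
    """
    machines = []  # each entry is [remaining_millicores, [container names]]
    for name, demand in containers:
        if demand > machine_capacity:
            raise ValueError(f"{name} needs more than one machine can offer")
        for machine in machines:
            if machine[0] >= demand:  # first machine with enough room wins
                machine[0] -= demand
                machine[1].append(name)
                break
        else:  # no existing machine had room, so start a new one
            machines.append([machine_capacity - demand, [name]])
    return [names for _, names in machines]


# Hypothetical CPU demands (millicores) for a small site and the same site after growth.
small_site = [("varnish", 250), ("redis", 250), ("mysql", 1000), ("apache-magento", 1500)]
grown_site = [("varnish", 1000), ("redis", 500), ("mysql", 2000), ("apache-magento", 3000)]

print(first_fit(small_site, 4000))  # [['varnish', 'redis', 'mysql', 'apache-magento']]
print(first_fit(grown_site, 4000))  # [['varnish', 'redis', 'mysql'], ['apache-magento']]
```

With the small-site figures all four containers share one machine; once the site grows, the Apache/Magento container moves to a second machine without any change to the other containers.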
Availability: For a small site, a container-based solution is likely to provide less availability than a SaaS-based solution. However, as a site grows in capacity, it is possible to improve availability by running multiple redundant lightweight containers. Containers differ from (most) virtual machines in that a container does not have to use up a full CPU. Containers have low overheads and are finer-grained than typical virtual machines, making it feasible to have greater levels of redundancy with lower hardware overheads.
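The redundancy argument can be quantified with the standard formula for independent replicas: if each container is up with probability a, then at least one of n replicas is up with probability 1 - (1 - a)^n. The 0.99 figure below is hypothetical, and independence is an assumption that breaks down when replicas share a host, so in practice the replicas should be spread across machines.

```python
def combined_availability(single, replicas):
    """Probability that at least one of `replicas` identical containers is up.

    Assumes each replica is up with probability `single` and that replica
    failures are independent of one another.
    """
    if not 0.0 <= single <= 1.0 or replicas < 1:
        raise ValueError("single must be in [0, 1] and replicas >= 1")
    return 1.0 - (1.0 - single) ** replicas


for n in (1, 2, 3):
    # Using a hypothetical 0.99 availability per container.
    print(n, round(combined_availability(0.99, n), 6))
```

Under these assumptions, one, two, and three replicas give roughly 0.99, 0.9999, and 0.999999 availability.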
Security: Containerization provides a different security profile from SaaS-based solutions. If a hacker breaks into a SaaS-based solution, it is common for the data of all tenants to be compromised. With a container-based solution, each customer has a separate account with a hosting provider (assuming a public cloud is being used), or even potentially a private cloud. A hacker breaking into one customer’s site does not automatically gain access to the sites of other merchants.
A container-based approach may also be less enticing to a hacker, since a successful attack on a SaaS-based solution has a bigger payoff. So hackers may be more attracted to SaaS services.
Note that I say the security profile is different rather than better. A SaaS based solution may have dedicated staff to help with security issues. A container based solution depends more on the skill of the customer’s staff (or the partner they outsource such work to). So the risks are different. You need to decide what matters more in your situation.
Support Services: A SaaS-based solution is likely to offer better support services (such as issue tracking and resolution, database backups, software upgrades, and OS patching) than a container-based solution that customers set up themselves. This can be mitigated by using a higher-quality hosting partner, or a system integrator able to offer such services. Over time, I also expect containers to be designed to require less specialized human maintenance.
Hardware Costs: The cost of hardware is going down over time (especially cloud-hosted hardware). This means that the hardware savings potentially offered by a SaaS solution may translate to a smaller percentage of the total cost of ownership. That is, as hardware becomes a smaller share of the total cost, the hardware efficiency of multi-tenancy matters less.
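A small worked example shows why cheaper hardware shrinks the benefit. The cost figures are hypothetical units, and the one-third hardware saving reuses the earlier multi-tenancy example where two servers were needed instead of three.

```python
def saving_share_of_tco(hardware_cost, other_costs, hardware_saving_fraction):
    """Hardware saving from multi-tenancy as a fraction of total cost of ownership.

    hardware_cost: yearly hardware spend (hypothetical units).
    other_costs: everything else, e.g. staff, development, licenses.
    hardware_saving_fraction: share of the hardware bill that multi-tenancy saves.
    """
    total_cost = hardware_cost + other_costs
    return hardware_cost * hardware_saving_fraction / total_cost


saving = 1 / 3  # two servers instead of three, as in the multi-tenancy example
print(round(saving_share_of_tco(40, 60, saving), 3))  # today: 0.133
print(round(saving_share_of_tco(20, 60, saving), 3))  # hardware halves: 0.083
```

The same one-third hardware saving drops from about 13% to about 8% of the total cost once hardware prices halve, with every other cost unchanged.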
Magento makes an interesting case study. Magento is well known for its flexibility. It is common, for example, for a site to install a number of additional extensions, many of which add columns or tables to the existing database schema.
However, this makes it harder to provide a multi-tenant SaaS-based solution, since different tenants may need different database schemas.
For this reason, I think containerization technology such as Docker is a particularly good fit for Magento. Containerization provides great flexibility with some cost savings. SaaS-based Magento solutions are certainly possible, but typically reduce the flexibility available to customers. This tradeoff is typically best made by consulting with experts (such as a partner).
There is no doubt in my mind that SaaS is here to stay, and is a good solution in many situations. I do believe, however, that the trend towards containerization of applications is going to narrow the gap between a customer running their own separate copy of an application and using a shared SaaS solution. Today, I think the “common knowledge” is to go multi-tenant. Tomorrow, however, I wonder whether containerization may make it just as efficient to spin up containers per customer.
I also expect that infrastructure services will improve over the next year or two. Monitoring and logging are two areas where I expect movement. It is important to monitor a site to detect potential problems (this also fits into projects such as Apache Mesos), and it is important to have access to logs to diagnose issues when problems occur (using a solution such as Logstash).
The other current weakness of containers in the cloud relates to persistent storage, whether files on disk or a database. A common application design is to put all persistent storage in a database of some kind, which means application nodes can typically be added or removed at any time with minimal impact on the site. This is not true of nodes that hold persistent storage, such as the database server. If the database is hosted in the cloud, you cannot just shut down its node and start a new node somewhere else. Either the persistent storage must be separate from the server (so it can be attached to a different server upon restart), or the database technology needs to provide high availability (HA) itself, for example a clustered database that replicates data over several database server nodes so that any node can be taken down without loss of data.
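The replication option can be illustrated with a toy in-memory model: every write is copied to every live node, so removing any single node loses nothing, and a replacement node is seeded from a surviving replica. ReplicatedStore is a hypothetical class for illustration only; real clustered databases handle consistency, failure detection, and resynchronization in far more sophisticated ways.

```python
class ReplicatedStore:
    """Toy key-value store that synchronously copies every write to every live node.

    Illustration only: no networking, consistency protocol, or failure detection.
    """

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}  # node name -> local copy

    def put(self, key, value):
        if not self.nodes:
            raise RuntimeError("no live nodes to write to")
        for local_copy in self.nodes.values():
            local_copy[key] = value

    def get(self, key):
        for local_copy in self.nodes.values():
            if key in local_copy:
                return local_copy[key]
        raise KeyError(key)

    def remove_node(self, name):
        # Taking a node down throws away its local copy of the data.
        del self.nodes[name]

    def add_node(self, name):
        # A replacement node is seeded from any surviving replica.
        if name in self.nodes:
            raise ValueError(f"node {name} already exists")
        survivor = next(iter(self.nodes.values()), {})
        self.nodes[name] = dict(survivor)


store = ReplicatedStore(["db1", "db2", "db3"])
store.put("order:100", "paid")
store.remove_node("db1")       # take any one node down...
store.add_node("db4")          # ...and start a new one somewhere else
print(store.get("order:100"))  # paid
print(store.nodes["db4"])      # {'order:100': 'paid'}
```

The data survives because it never lived on only one node; the alternative of attaching separate persistent storage to a restarted server achieves the same goal by keeping the data off the node entirely.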
What is pretty clear to me is that containerized solutions such as Docker provide an interesting alternative to SaaS-based solutions. As more and more tools become available for use with containers, the relative benefits of SaaS are shrinking. As above, there are many factors to consider when picking which solution is best for you. I think the containerization industry still has a lot of exciting things coming as the field continues to mature. Interesting days!