Serverless programming is a model where you write code without having to think about which servers it runs on. You just deploy the code, and the serverless provider works out how many servers to spin up for you based on load. The provider also automatically restarts servers that fail. AWS Lambda and Google Cloud Functions are both well-known examples of the serverless model. As well as the simplicity this model offers developers, it has the advantage that you don’t pay for CPU resources while no requests are coming in. The combination can be very appealing for some use cases.
While there are many real benefits, this blog post explores some of the challenges the serverless model introduces. For larger scale applications, it is important to understand the pros and cons of the serverless model. Simplicity and consistency in infrastructure have real value at scale – they help make complex systems easier to understand and monitor.
My conclusion? TL;DR: Serverless is great in some use cases, but not a magic bullet. (What a surprise!)
Disclaimer: While I work at Google, I am not a part of the Google Cloud team. This post contains personal opinions based on generally available information. I was just thinking about this on the drive home one day so decided to write it down.
Serverless Auto-scaling
How does the auto-scaling magic happen? In practice it is not that magical. Your code still runs in a normal web server. The web server is just started up for you when a request arrives, your request is processed, and then the web server with your code is kept running for a while longer in case another request arrives. If the load on the server gets too high, an additional server instance is started up on another machine (auto-scaling). How long a web server is kept running is up to the provider. Whether redundant servers are run in the same or different data centers is up to the provider. There is an implication/assumption that the start-up time of a serverless computation unit (a “Cloud Function”) is low(ish).
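To make the mechanics concrete, here is roughly what a minimal HTTP-triggered function looks like in Python using Google’s open source functions-framework. This is a sketch only – handler signatures and deployment details vary by provider; the point is that you only supply the handler, and the provider owns the web server wrapped around it.

```python
# A minimal HTTP-triggered function. The provider (or the functions-framework
# when running locally) owns the web server; we only supply the handler.
import functions_framework


@functions_framework.http
def handle_request(request):
    # 'request' is a Flask request object supplied by the framework.
    name = request.args.get("name", "world")
    return f"Hello, {name}\n"
```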
Serverless services typically cost more than a dedicated machine if your service sustains heavy, consistent load. This is because someone has to pay for the resources (e.g. memory) the web server is using while it is still running but not serving requests. So it is worth doing your maths carefully if you are designing a solution with a heavy, consistent load.
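As a back-of-the-envelope illustration of that maths (all prices below are made-up placeholders, not any provider’s actual rates), the comparison might look something like this:

```python
# Hypothetical comparison of per-invocation serverless pricing vs. a dedicated VM.
# All prices are illustrative placeholders - substitute your provider's real
# rates before drawing any conclusions.

GB_SECOND_PRICE = 0.0000166   # hypothetical serverless price per GB-second
VM_MONTHLY_PRICE = 50.00      # hypothetical dedicated VM with enough RAM

requests_per_second = 100     # heavy, sustained load
avg_duration_seconds = 0.2
memory_gb = 0.5

seconds_per_month = 60 * 60 * 24 * 30
serverless_monthly = (requests_per_second * seconds_per_month
                      * avg_duration_seconds * memory_gb * GB_SECOND_PRICE)

print(f"Serverless: ${serverless_monthly:,.2f}/month vs VM: ${VM_MONTHLY_PRICE:,.2f}/month")
```

With these made-up numbers the serverless bill comes out several times higher than the VM – the point being that sustained load changes the economics, not that these figures are right.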
Event Processing
One conceptually clean use case for the serverless model is when you receive events, such as a message arriving on a message queue. It is conceptually simple – process events as they occur, and don’t worry about how to auto-scale to cope with load. If the arrival rate of messages goes up, the infrastructure will spin up new instances for you. When the load reduces, the number of concurrent instances decreases again. It works nicely. No monitoring required.
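A sketch of what such an event handler can look like, in the style of an SQS-triggered AWS Lambda (the exact event shape differs between providers, and `process_order` is a stand-in for your own logic):

```python
import json

# A queue-triggered function: the platform invokes the handler with a batch of
# queue records and scales the number of concurrent instances up and down with
# the message arrival rate.
def handler(event, context):
    for record in event.get("Records", []):
        message = json.loads(record["body"])
        process_order(message)


def process_order(message):
    # Placeholder for real processing of a single message.
    print(f"processing order {message.get('order_id')}")
```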
Monitoring
Regarding “no monitoring required” – okay, the reality is that this is not completely true. Typically you do want to think about monitoring, as you are still responsible to your customers. If the serverless platform is misbehaving, or your code is getting stuck or slowing down, you want to be aware of it as quickly as possible. So monitoring may be simpler under the serverless model, but it does not go away completely.
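One lightweight approach is to emit your own latency and failure logs from inside the function, so the provider’s log-based metrics and alerting can pick them up. A minimal sketch (the decorator and names here are illustrative, not any particular platform’s API):

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("handler")


def monitored(fn):
    """Log the duration of every invocation and any unhandled failure."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        except Exception:
            log.exception("handler failed")
            raise
        finally:
            log.info("handler took %.3f s", time.monotonic() - start)
    return wrapper


@monitored
def handle_event(event):
    # Your processing goes here.
    return {"status": "ok"}
```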
Further, for larger applications, you may find that not all of the application fits the serverless model. For example, serverless generally follows the model where there are inputs to a computation and outputs that represent the result of the computation, without side effects. (That is why they are often called serverless “functions”.) Repeating the computation should yield the same result. Serverless code is generally best kept stateless.
Databases on the other hand are all about side effects (when you are updating the database contents). So a database, or caching technology in front of a database, does not fit the serverless model well. They are services accessed from serverless functions, but better fit the more traditional long-running-service model. You don’t want to lose resources just because no requests are currently being processed. As soon as you have some components that are not serverless, that means they need to be monitored differently to serverless code.
So one consideration to take into account is whether there are parts of your solution that do not fit the serverless model. If so, ask yourself whether the benefits of serverless are worth the downside of having too many different programming models in the one application.
The Cold Start Problem
When serving HTTP requests from users’ browsers you should be aware of the “cold start” problem. A cold start is where the serverless provider has to start up a web server to run your application. If a system-generated queued message has to wait a second or two for a new server to spin up and run the code, it generally does not matter. If a user is waiting in a web browser for a response, delays are less desirable. It may be acceptable for the occasional request to be delayed, but you should at least be aware of the delay when designing your app.
One solution used for such a delay is to have a “ping” service that sends a request through periodically to keep the service warm. This stops the number of running web servers dropping to zero, which would otherwise cause a cold start on the next real request. It’s a bit hacky, but it does work.
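A minimal sketch of such a ping, assuming a scheduler (Cloud Scheduler, CloudWatch Events, cron, and so on) invokes it every few minutes; `SERVICE_URL` is a placeholder for your function’s endpoint:

```python
import urllib.request

# Placeholder URL - point this at the function you want to keep warm.
SERVICE_URL = "https://example.com/your-function/ping"


def keep_warm(event=None, context=None):
    # A lightweight GET is enough; we only care that an instance stays warm.
    with urllib.request.urlopen(SERVICE_URL, timeout=10) as response:
        return response.status
```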
In addition, I would not be surprised to see providers offer an option to guarantee at least one instance is running at all times (for a cost) to avoid the cold start delay. It goes against the grain of “pay for what you use”, but as serverless programming matures I expect these sorts of issues to come up and be resolved. Not every customer will make low cost the number one priority.
Reading various blogs on the cold start problem, it appears cold starts can also occur when additional servers are spun up. It sounds like at least some platforms will queue requests, waiting for a new server instance to start up and process them. Ideally the new web server instance should be given a chance to start up and fully warm any internal caches before requests are passed to it. Until then, requests should be given to existing servers, as they may finish their current workload before the new server is ready. The new server is then ready to cope with the predicted load of future requests.
If this is what is happening today, I expect it to be solved by the platforms. The problem can be addressed by predictive modeling based on the cold start speed of a service, traffic flow rates, and variation thereof to reduce the probability of insufficient resources being available. The problem would be simpler if spinning up a new service was instant to cope with load, but I think application developers should think about the problem more carefully if they have wildly fluctuating traffic levels. For example, is the developer trying to minimize cost, or deliver the best customer experience? By how much can requests be slowed down due to bursts in the queue?
Web Serving
For an HTTP request from a browser, the cold start problem might not be too serious if a cold start is in the order of a second or two. If your traffic rates are so low that cold starts are occurring frequently, then cold start delays may be the last of your business problems!
It is however worth measuring the performance of your code base to see how bad a cold start is. The cold start overhead is influenced by multiple factors, including the programming language your application is developed in.
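One cheap way to measure it is to exploit the fact that module-level code runs once per instance, so the first request a new instance serves can be logged as a cold start. A rough sketch (note it only measures from module import to first request, not the provider’s container provisioning time):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("coldstart")

# Module-level code runs once per instance, so the first request served by a
# new instance will see _cold == True.
_instance_started = time.monotonic()
_cold = True


def handler(request):
    global _cold
    if _cold:
        log.info("cold start: %.3f s from instance init to first request",
                 time.monotonic() - _instance_started)
        _cold = False
    return "OK\n"
```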
Fault Tolerance Considerations
There are other issues that need to be taken into account when considering fault tolerance and resiliency. What happens if a machine fails halfway through processing a request? Should the processing be restarted? If triggered from a queue event, will the event be lost? What if duplicate messages arrive on the queue? (Many queues can deliver duplicate events in order to recover from error scenarios.) Serverless programming does not remove the need for developers to think about such concepts.
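For the duplicate message case, the usual answer is to make the handler idempotent, for example by keying the work on a unique message ID. A sketch, with an in-memory set standing in for what should really be a durable store:

```python
_seen_ids = set()  # stand-in for a durable store; use a real database in practice


def already_processed(message_id):
    return message_id in _seen_ids


def mark_processed(message_id):
    _seen_ids.add(message_id)


def do_the_work(message):
    # Must itself be safe to repeat, since at-least-once delivery can retry it.
    print(f"processing {message['id']}")


def handle_message(message):
    # At-least-once queues can redeliver, so key the work on a unique message ID.
    message_id = message["id"]
    if already_processed(message_id):
        return  # duplicate delivery: safe to drop
    do_the_work(message)
    # If the instance dies between doing the work and recording it, the message
    # will be reprocessed - another reason do_the_work must tolerate retries.
    mark_processed(message_id)
```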
Cloud Neutral Serverless Open Source Solutions
One interesting development is the emergence of open source platforms for building the serverless model on top of Kubernetes. That provides a degree of cloud provider independence to your code base, but you do lose some of the serverless benefits of not having to worry about servers (you have to manage the Kubernetes cluster).
A more serious issue is whether the serverless platform has good integration with cloud services, such as message queues. If you cannot easily trigger a serverless function when a message arrives on a queue (or from other cloud provider generated events), such a serverless platform is of less overall value.
Conclusions
All up, the serverless model is very interesting and continues to mature. It certainly “feels easier” to get a simple application built. But for larger products where aspects such as performance matter, it is also important to understand the limitations of the serverless model. As always, it is not a silver bullet. Use it only when appropriate. And it does not eliminate the need to think about concepts such as eventual consistency, event sourcing, caching, monitoring, logging, and disaster recovery.
So do I personally like the serverless programming model? It appeals to me for asynchronous, event-triggered background processing, but it is less clear whether it suits a request/response RPC call pattern (including the HTTP requests of a web browser) where returning a fast response matters. For example, for an e-commerce site, keeping the code path thin between a web client and the database holding product data makes sense. This may also be true for shipping, tax, and cart discount computations.
On the other hand, background processing of orders, inventory management, stock level updates, and product data updates all seem reasonable candidates for reliable queues joined with serverless functions. (Reliable queues are needed to make sure events are never lost until successfully processed by a serverless function.) Don’t worry about when your physical stores’ point-of-sale systems call in with stock updates from in-store sales – your backend will scale to ingest the updates quickly, then spin back down. Over time, more such systems can move from batch to real-time, providing more opportunities for stock level optimizations across multiple sales channels (e.g. less “buffer stock” is required).
I think life will get more interesting when you can combine services like serverless functions with cheaper offline compute costs. “Cloud provider, please give me a discount in return for me letting you defer some of my compute jobs by up to an hour, so you can load balance across the data center better.” Such services are easier when the cloud provider is given more control over the scheduling of workloads.