Modern web application design patterns bring exciting new opportunities for faster and better user experiences. Client Side Rendering (CSR) allows web pages to be more interactive in the browser; Server Side Rendering (SSR), for the first page of a CSR site, can improve indexability and first page load times; Progressive Web Applications (PWAs) can support push notifications, offline caching, add-to-home-screen installation, and more.
But have you taken into account the change in external traffic patterns that is likely as a result? Will your backend server cope with the increased number of API calls being made? Do you need to introduce a server side API caching strategy to ensure the backend does not become the bottleneck limiting the speed of your shiny new web application?
This post discusses a number of server side caching considerations. This is distinct from web browser caching which is equally important for performance and must work in sync with server side caching. See this post for an excellent description of HTTP caching strategies and how web browsers and servers work together to minimize the number of HTTP requests a browser needs to make.
A Brief History
Originally, websites returned static HTML files from disk to the browser for display. Programs may have been used to generate the HTML files, but the files themselves were static.
Websites became more interactive with scripts reacting to user requests and dynamically generating HTML content from database content or similar.
To improve throughput, reverse proxy caching (e.g. Varnish) was put in front of the web server so if different visitors access the same page, a cached version could be returned without hitting the web server. This can improve latency as any page assembly code in the web server does not have to be executed per request. It also reduces server load meaning less hardware is required to run the site.
However, not all HTML pages are safe to cache. Other internal backend techniques are also used, such as using in-memory caches (e.g. Memcached or Redis) to reduce database accesses but still assemble the final HTML page dynamically.
Content Delivery Networks (CDNs) also came into being, distributing caches around the world to reduce the network distance between end user devices and cached content, and thereby reduce network latency. Using a CDN may eliminate the need for a reverse proxy cache sitting next to the web server, but with some CDNs multiple locations will each hit the web server independently (so retaining the web server cache may still be desirable).
And yes, I left out lots of details.
HTTP Page Cache Invalidation Strategies
But how do you invalidate the contents of an HTTP page cache? If pages are dynamically generated, how do you ensure that when there is new content to return (e.g. product information in a database is updated), any relevant cached content is invalidated? This is one of the great challenges of computer science – designing a good cache invalidation strategy.
There are multiple approaches, and sites need to decide which is most appropriate for their use case. But first let’s quickly review some challenges.
- If a page is requested that is not in the cache, the browser must wait for the web server to finish executing any code to build the page and respond. Depending on the page, this can add noticeable delay to the user experience. That is, caching not only saves costs by reducing server load, it also reduces latency for requests. Which is more important for your site?
- What is the acceptable latency between data being updated in a database and being visible to site visitors? It may not be critical for other users to see perfectly synchronized product descriptions, but if a product is added to a cart it should appear instantly, guaranteed.
- What is a good time-to-live for cached pages? How stale can data be? Product descriptions, for example, can have a longer time-to-live than stock level data or pricing. Some sites simplify their architecture by using very short cache times (seconds) and avoiding the need to explicitly invalidate cache contents, at the expense of potentially higher CPU requirements.
- During a flash sale a product may be added when the sale starts, with many different users requesting the same page at the same time. Because it takes time to create the page and it is not yet in the cache, some caches will forward multiple concurrent requests to the web server for the same page. This risks overloading the server until the cache is populated with a copy of the page. Some CDNs attempt to avoid this problem by only forwarding one request to the web server (they dedupe requests); others do not (each geo where the CDN is located may independently ask for a copy).
- How to cope with global cache wipes? If a new site upgrade is released, you may wish to flush all caches, resulting in a massive upsurge in site traffic until the cache starts to refill. Do you size your backend hardware based on peak loads? Or can you scale up the backend capacity during such events, then scale back down once the system stabilizes?
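The flash-sale scenario above (sometimes called a cache stampede) can also be mitigated in application code by coalescing in-flight requests. A minimal sketch in TypeScript, where `fetchPage` is a hypothetical stand-in for whatever expensive page-build call your server makes:

```typescript
// Coalesce concurrent requests for the same key: while a fetch for a key
// is in flight, later callers await the same promise instead of
// triggering another backend call.
const inFlight = new Map<string, Promise<string>>();

async function coalesced(
  key: string,
  fetchPage: (key: string) => Promise<string> // hypothetical expensive call
): Promise<string> {
  const existing = inFlight.get(key);
  if (existing) return existing; // reuse the request already in flight
  const p = fetchPage(key).finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}
```

This is essentially what the deduping CDNs described above do internally, applied at the origin instead.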
Common cache invalidation strategies include:
- Don’t cache. Just pay for more hardware if needed for good performance. This may not be as crazy as it first seems for some applications. For example, cart contents cannot be cached – they are specific to each user. And if you have a large number of pages that are accessed infrequently, a cache may not actually improve performance.
- Set a cache “time to live” (TTL) duration, so the cache automatically discards content after a period of time. Simple, but you have to get the TTL value right. (Too long and you will serve lots of stale content; too short and you won’t get the full benefit of the cache.)
- Consider using a shorter TTL with a cache warming strategy (write code to fetch pages to load them into the cache, even if a user has not requested it yet). Cache warming is also useful for popular pages when bringing a new server into production to avoid it being overwhelmed.
- Have a stale-while-revalidate policy to return a stale page from the cache while fetching the latest content in the background. This avoids the latency hit that would otherwise result from a cache miss. There are two TTL settings in this case – how long to cache the page before fetching new content, and how stale a cached page is allowed to be and still be returned.
- Attach tags to returned pages and then allow purging of pages via tag. You may tag pages with products that appear on the page, categories that appear, and so on. Then when the backend updates a product or category, it can tell the cache to purge all content with those tags. Cache tags allow the cache to hold pages longer, but if you do frequent global catalog updates, this can result in large scale cache invalidations.
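To make the TTL and tag-purging strategies concrete, here is a toy in-memory sketch in TypeScript. It is an illustration only, not production code (real systems like Varnish or CDN surrogate keys work similarly in spirit); all names are invented:

```typescript
type Entry = { body: string; tags: string[]; expiresAt: number };

// Minimal page cache supporting a per-entry TTL and tag-based purging.
class TaggedCache {
  private entries = new Map<string, Entry>();

  set(url: string, body: string, ttlMs: number, tags: string[] = []): void {
    this.entries.set(url, { body, tags, expiresAt: Date.now() + ttlMs });
  }

  get(url: string): string | undefined {
    const e = this.entries.get(url);
    if (!e) return undefined;
    if (Date.now() > e.expiresAt) { // TTL expired: treat as a miss
      this.entries.delete(url);
      return undefined;
    }
    return e.body;
  }

  // Invalidate every cached page carrying the given tag, e.g. call
  // purgeTag("product-42") after product 42 is updated in the backend.
  purgeTag(tag: string): void {
    for (const [url, e] of this.entries) {
      if (e.tags.includes(tag)) this.entries.delete(url);
    }
  }
}
```

A page listing several products would be stored with one tag per product, so an update to any of them purges the page.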
The above discussion focussed on HTML page caches. Caching inside the web server during page assembly (e.g. using Memcached or Redis) is generally easier as the cache is internal to the code base. Similar issues exist for invalidating old content from the cache, but the data being cached is generally more granular. It is often acceptable to adopt more conservative caching strategies for such data, and it is generally easier for the application to proactively invalidate content in the cache.
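This internal pattern is often called cache-aside: check the cache first, otherwise compute the value, store it with a TTL, and return it. A sketch, with a plain Map standing in for Memcached or Redis and an invented key name:

```typescript
// Cache-aside lookup. A Map stands in for an external cache like Redis;
// in a real deployment the get/set calls would be network operations.
const store = new Map<string, { value: string; expiresAt: number }>();

async function getOrCompute(
  key: string,
  ttlMs: number,
  compute: () => Promise<string> // e.g. a database query
): Promise<string> {
  const hit = store.get(key);
  if (hit && Date.now() < hit.expiresAt) return hit.value; // cache hit
  const value = await compute(); // cache miss: do the expensive work
  store.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

Because the application owns the keys, it can also delete them proactively (e.g. remove `"product:42"` when that product is updated), which is the easier invalidation path mentioned above.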
Modern Web Applications
All of the previous discussion is in common practice today. But what new challenges do modern web application architectures bring? What changes in network traffic occur that may impact the effectiveness of caching strategies?
If you have a Client Side Rendered (CSR) web application (e.g. built using a framework like React, Vue, or Angular), the source code for the application will most likely be statically compiled and stored in files on disk for the web server to return. It will cache well. But CSR applications still need to fetch product data or similar from the web server (e.g. as JSON) so they can render it.
The CSR web application may include caches in the browser to avoid requesting the same data multiple times. (Service workers are a great technology for this.) But websites receive requests from multiple users. With a CSR web app, there will be fewer requests for complete HTML pages, but more API requests for data. This reduces the execution time on the server as most page manipulation is now done on the client – but there is still execution time required to fetch the data to render.
This implies that all of the caching strategies discussed previously for HTML pages and images may also be relevant for API calls. That is, to speed up API calls you may wish to apply cache tags to API call responses as well, so you know when to invalidate the cached response data.
GraphQL introduces new challenges here. GraphQL allows a single request to aggregate data from multiple sources, improving performance. For example, when fetching an order, the request may ask for stock level data or current pricing to be retrieved at the same time. GraphQL can collect data from a range of different sources, as determined by the client query. This means the client request may influence how long the response can be cached for: stock levels generally cannot be cached as long as product details. This may mean caching logic needs to be pushed deeper into the scatter-gather layer of the API implementation, rather than relying on an HTTP caching layer in front of the API.
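One pragmatic approach (an assumption on my part, not something GraphQL servers do out of the box) is to assign each field a maximum cache lifetime and cache the whole response for the minimum lifetime across the fields the client requested. A sketch with invented field names and TTL values:

```typescript
// Hypothetical per-field cache lifetimes, in seconds. Stock levels and
// prices change often; descriptions rarely.
const fieldTtl: Record<string, number> = {
  description: 3600,
  price: 60,
  stockLevel: 5,
};

// The cacheable lifetime of a response is bounded by its most volatile
// field; fields with no declared TTL are treated as uncacheable (0).
function responseTtl(requestedFields: string[]): number {
  return requestedFields.reduce(
    (min, f) => Math.min(min, fieldTtl[f] ?? 0),
    Infinity
  );
}
```

A query asking only for `description` could be cached for an hour; add `stockLevel` to the same query and the response is effectively uncacheable beyond a few seconds.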
Another consideration is that it may be more efficient to fetch relatively static product descriptions in separate API calls from more dynamic data such as product pricing and inventory levels. While this adds additional network calls, it can result in better backend caching, which may provide a better overall result.
Finally, it is worth remembering that if a generic caching strategy is too difficult to implement, it is still a valid approach to only cache specific queries coming from clients. This is particularly effective if the client and backend developers work closely together, as the number of distinct client requests is typically limited by the application code.
Server Side Rendering
Add Server Side Rendering (SSR) to the above mix and the next wave of challenges arrives. SSR is where the first HTML page of a CSR web application is generated on the server, ready for immediate download and display by the client. This normally delivers superior first page load time performance. The rest of the CSR web application code can then be downloaded in the background while the user is viewing the first page.
The SSR engine typically works by making the same API calls as the CSR application would. The assumption is that the total CPU time for a user to see a page is reduced if the server generates the HTML DOM tree. There are several reasons why this is frequently the case:
- If users are on low end phones, the CPU in the phone may be much slower than the server.
- After being generated the first time, an SSR-rendered page can be cached, resulting in the same performance as a static page on the site.
Implicit in this, however, is the importance of keeping the SSR page cache and the API cache consistent, to avoid strange site experiences.
There are also lots of variants, such as not performing a full server side render of a page but instead including the responses of API calls in the returned page. When the CSR application is ready to render, the responses to its API calls are already in a local cache, avoiding the need for external API calls.
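This embedding is often done by serializing the pre-fetched API responses into a script tag (the `window.__PRELOADED_STATE__` convention is common in the React world) so the client can seed its local cache without extra round trips. A sketch with invented function names:

```typescript
// Embed pre-fetched API responses in the SSR page so the client can
// hydrate its local cache without repeating the API calls.
function renderShell(
  appHtml: string,
  apiResponses: Record<string, unknown>
): string {
  // Escaping "<" as \u003c prevents a value containing "</script>"
  // from terminating the inline script tag early.
  const state = JSON.stringify(apiResponses).replace(/</g, "\\u003c");
  return [
    "<!DOCTYPE html><html><body>",
    `<div id="app">${appHtml}</div>`,
    `<script>window.__PRELOADED_STATE__ = ${state};</script>`,
    "</body></html>",
  ].join("");
}
```

On load, the CSR code would check `window.__PRELOADED_STATE__` before issuing any API request, keyed by the same URLs it would otherwise fetch.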
This blog post provided a quick overview of the evolution of website caching strategies. The objective was to highlight the importance of server side caching as a part of your overall web application design. Client side caching is also important, but typically more obvious to web application developers: they know reducing the number of network requests will improve the performance of their application.
So if you are a backend developer, web application front end architectures matter to you as well. It is important to understand the likely changes in traffic patterns to your web server as the web application architecture changes. CSR web applications are likely to move traffic volumes from full HTML page requests to API calls – do you have a caching strategy to cope? SSR pages mean HTML page caching is not going away. Do you have a strategy to keep your SSR cached pages and API calls in sync?
All these problems are solvable. But it is better to plan for such changes as a part of your adoption of modern web application design patterns rather than be forced to react when your server melts down under unexpected load from uncached API calls.