This is the second in a series of posts explaining how we got web infrastructure to handle a year's worth of traffic in a single week.
In the first post we looked at the exact issue at hand: websites aren't hosted in some magical place, they run on specialised computers, and those computers often slow down when you ask them to handle lots of traffic.
Efficiency through server-side caching
In the process of building a web page for the user, the web server draws upon a number of different resources. It draws upon the database for page copy, event results and other data. It draws upon the file system for images and documents. All of these resources, and the processing involved in accessing them, take time and server power.
This is where caching comes in. Caching is saving the result of a computation and reusing that result across multiple users and requests.
So, instead of:
- Server gets a user request
- Server processes the resource
- Server sends the result to the user's computer
You change the code to be something like:
- Server gets a user request
- Server looks in the cache
- If the resource is in the cache, the server sends it straight to the user's computer
- If the resource is not in the cache, the server processes it, sends it to the user's computer and at the same time saves it in the cache (ready for the next request)
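In code, that cache-aside flow looks something like the sketch below. This is a minimal illustration in Python rather than our actual implementation; the in-memory `_cache` dictionary, the `CACHE_TIMEOUT` value and the `process_resource()` function are all stand-ins for whatever store and processing a real site would use.

```python
import time

# A hypothetical in-memory cache: {key: (value, expiry_timestamp)}.
# In a real site this would more likely be a shared store such as
# Memcached or Redis.
_cache = {}

CACHE_TIMEOUT = 300  # seconds a cached entry is treated as fresh


def process_resource(key):
    """Stand-in for the expensive work: database queries, file reads, etc."""
    time.sleep(1)  # simulate slow processing
    return f"rendered content for {key}"


def handle_request(key):
    """Serve a resource, preferring the cache when possible."""
    entry = _cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value  # cache hit: skip the expensive processing

    # Cache miss (or expired entry): do the work, then store the result
    # so the next request for the same key can reuse it.
    value = process_resource(key)
    _cache[key] = (value, time.time() + CACHE_TIMEOUT)
    return value
```

In practice the cache usually lives in a dedicated store shared by every web process, so that one request's work benefits all the others.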
Caching is only valuable if you have lots of users accessing the same resource in the same way, and if storing and retrieving from the cache is quicker than just doing the processing straight up.
A good example is delivering dynamically resized images. A user requests an image, but the server needs to resize it on the fly to fit a specific spot on the page before sending it to the user's computer. Such a resize can take a long time and a lot of the server's processing power, all of which adds up to inefficiency. By comparison, fetching an already resized image from the server's cache is very fast and uses virtually no processing power.
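As a rough illustration of how that might work, the sketch below caches resized copies on disk, keyed on the source file and the target dimensions. The cache directory and the use of the Pillow imaging library are assumptions made for the example, not a description of how we actually do it.

```python
import os
from PIL import Image  # Pillow, used here purely for illustration

CACHE_DIR = "/tmp/resize-cache"  # hypothetical cache location


def resized_image_path(source_path, width, height):
    """Return a path to a resized copy, generating it only on a cache miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)

    # The cache key must include the resize parameters, otherwise two
    # different sizes of the same image would collide.
    name = os.path.basename(source_path)
    cached = os.path.join(CACHE_DIR, f"{width}x{height}-{name}")

    if not os.path.exists(cached):
        # Expensive path: decode, resize and re-encode the image.
        with Image.open(source_path) as img:
            img.thumbnail((width, height))
            img.save(cached)

    # Cheap path on every subsequent request: just serve the existing file.
    return cached
```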
Caching has its downsides though. Anything stored in the cache can go stale: there can be a delay between when the original resource is updated (e.g. within SproutCMS) and when the change actually appears on the website. This happens because a cache works best if you keep a stored copy for a set period of time without checking for an updated version. That period is known as the cache timeout, and it can be frustrating for the client, especially when the timeout is set to a long period.
The complexity of the code for accessing resources is greatly increased as well; not only does it need to process the resource, it also needs to manage cache storage, retrieval and expiry.
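One piece of that extra bookkeeping is expiring the matching cache entry whenever the original resource is saved, so stale content doesn't linger until the timeout runs out. A minimal sketch, assuming the same hypothetical cache shape as the earlier example:

```python
# Hypothetical in-memory cache, same shape as the earlier sketch:
# {key: (value, expiry_timestamp)}.
_cache = {}


def invalidate(key):
    """Drop a cached entry so the next request rebuilds it from scratch."""
    _cache.pop(key, None)


def update_page_copy(key, new_content, save_to_database):
    """Persist an edit, then evict any stale cached copy straight away
    instead of waiting for the cache timeout to expire."""
    save_to_database(key, new_content)  # write the change to the database
    invalidate(key)                     # then clear the stale cache entry
```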
We need more power
Another way to solve performance issues is to get more power (remember that websites run on computers, so for more grunt you can go out and buy a faster computer). Most server time on the internet is rented from organisations that specialise in infrastructure management. Larger companies like Google have their own in-house teams to manage such infrastructure, but at the scale we are working at, that is simply overkill.
A common way to set this up is using a Virtual Private Server (VPS). The infrastructure organisation you rent power from sets up physical machines which are much more powerful (and much more expensive) than any single client is expected to need. It then uses software called a hypervisor to split each machine into multiple independent slices, each of which behaves like a separate computer even though it is really only a virtual one. These slices are then rented out to different clients.
When using a VPS you can resize the virtual computer to have more or less power without too much effort. You can contact the hosting company and they reconfigure the virtual machine to be a larger slice of the host computer and bill you for the difference. This works really well as long as there is excess capacity on the host computer. If the host is fully utilised, you cannot size up the virtual machine any further.
While simply getting a bigger box can be an easy solution to implement, you can get trapped if the host machine doesn't have enough free capacity. If that happens, you will really feel the sting when you need to resize up right in the middle of a peak usage week. There is also some downtime involved in these resizes, so you really don't want to be doing an emergency resize at a critical moment.
It's still not enough
Over time we realised that these solutions are useful but a little naïve, and they don't really solve all of the problems very well. Resizing virtual servers can be slow, and it's difficult to keep the approach cost-effective for the client. There are other problems too: a single server is a single point of failure, and if it goes down you're left with nothing.
We continued searching for answers - take a read of part three!