As Oscar Wilde once said, “there is only one thing worse than being talked about and that is not being talked about.”
If he were alive today, he would probably rephrase that as “there is only one thing worse than being talked about, and that is your website going down so that nobody can see why they were talking about you in the first place.”
Many things can cause sudden spikes in traffic, including breaking world news or new releases of hotly-anticipated products. Any of these scenarios can cause your steady stream of website traffic to morph into an unmanageable torrent that can slow your site down to a crawl, or even worse, cause it to stop functioning altogether.
What kind of measures can you take to prevent this situation from occurring? What are the important factors you should consider to prevent this embarrassing scenario from scuppering your marketing plans?
Today, the cloud is no longer something that only technical people understand; it is a normal part of how most businesses deploy their sites. With the cloud, you can configure the hosting environment to react to conditions you define: more than X people looking at your site, more than X concurrent users submitting forms, CPU utilization on the server above X%, or the application consuming more than X GB of RAM. When one of these conditions is met, the system automatically expands, that is, it scales.
When the spike subsides, the cloud shrinks the environment again, removing the extra servers so that you are no longer paying for them. So the best way to prepare is to take advantage of the cloud’s elastic infrastructure, which, based on simple rules, expands or shrinks the number of servers running your application.
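The kind of rule described above can be sketched in a few lines of Python. This is only an illustration: the metric names, the threshold values, and the `desired_servers` function are hypothetical and do not correspond to any particular cloud provider's API, which would normally be configured rather than coded.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """A snapshot of the environment (all field names are illustrative)."""
    concurrent_users: int
    cpu_percent: float
    ram_gb: float


@dataclass
class Rules:
    """Hypothetical thresholds standing in for the 'X' values in the text."""
    max_users: int = 5000
    max_cpu_percent: float = 75.0
    max_ram_gb: float = 12.0
    min_servers: int = 2
    max_servers: int = 10


def desired_servers(current: int, m: Metrics, r: Rules) -> int:
    """Return how many servers should run after applying the rules."""
    # Scale out as soon as any single metric crosses its threshold.
    overloaded = (
        m.concurrent_users > r.max_users
        or m.cpu_percent > r.max_cpu_percent
        or m.ram_gb > r.max_ram_gb
    )
    # Scale in only when every metric is below half its threshold,
    # so the environment does not flip back and forth between sizes.
    idle = (
        m.concurrent_users < r.max_users / 2
        and m.cpu_percent < r.max_cpu_percent / 2
        and m.ram_gb < r.max_ram_gb / 2
    )
    if overloaded:
        return min(current + 1, r.max_servers)
    if idle:
        return max(current - 1, r.min_servers)
    return current
```

For example, with these assumed thresholds, two servers facing 6,000 concurrent users would grow to three, and three servers facing 1,000 users at 20% CPU would shrink back to two.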
In the past, there was resistance to the cloud in general, driven by concerns about the privacy and security of sensitive data exposed to a third party. Most cloud providers responded by introducing the concept of a private cloud, where you keep sensitive data in-house behind a firewall, out of reach of third parties. Your applications then run in the cloud and consume the data from your private data center, as a hybrid. But it is important that the application you want to run supports auto scaling.
The web farm concept is based on the fact that you have multiple servers running separate instances of the same application.
If the nature of the application requires those separate instances to be synchronized, you have to have a way to keep them in sync. Kentico, as an app, stores some of its data in memory on the server. So if you have three servers all running Kentico, you have three separate memory spaces, one on each server, each holding its own local copy of that data.
If you change the data on one of the servers, you need to make sure that all those other servers know that the data changed there, and as all servers are supposed to display the same content and provide the same experience, the other servers need to refresh their data to reflect whatever is there. So you have to consider how you are going to synchronize the memory of the application that is distributed in the web farm environment, and how to synchronize whatever is stored on the file system on those separate servers.
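To make the memory synchronization problem concrete, here is a minimal sketch of one common approach: the server that changes the data tells the others to drop their stale copies. It is not Kentico's actual mechanism; the `WebFarm` and `Server` classes are hypothetical, and a plain dictionary stands in for the shared database.

```python
class WebFarm:
    """Keeps track of the servers that must be told about changes."""

    def __init__(self):
        self.servers = []

    def register(self, server):
        self.servers.append(server)

    def broadcast_invalidation(self, key, origin):
        # Notify every server except the one that made the change.
        for server in self.servers:
            if server is not origin:
                server.invalidate(key)


class Server:
    def __init__(self, name, farm, database):
        self.name = name
        self.farm = farm
        self.database = database  # shared store, here a plain dict
        self.cache = {}           # this server's private memory
        farm.register(self)

    def read(self, key):
        # Serve from memory when possible, otherwise reload from the shared store.
        if key not in self.cache:
            self.cache[key] = self.database[key]
        return self.cache[key]

    def write(self, key, value):
        self.database[key] = value
        self.cache[key] = value
        self.farm.broadcast_invalidation(key, origin=self)

    def invalidate(self, key):
        # Drop the stale copy; the next read fetches the fresh value.
        self.cache.pop(key, None)
```

With three servers registered in one farm, a write on the first removes the cached value on the other two, so their next read returns the updated content. Files stored on each server's file system would need a comparable mechanism, which this sketch does not cover.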
You need to decide which key metrics you will use to understand what is going on in your environment. You have to know what the red flags are. Noticing that the site is not responding is too late! You need to know what precedes that breakdown of the system. Typically it is constantly high CPU utilization, overly busy server disks, or too much memory being consumed. It could also be the length of the queue of page requests waiting for processing: if the queue keeps getting longer, something is going wrong, for example the server may be too slow to handle all those requests. So you need to know how to identify that there is a problem with your environment.
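As a rough illustration, these warning signs could be checked with a small function like the one below. The function name, its parameters, and the default limits are assumptions made for this sketch; real values depend on your environment and on the monitoring tool you use.

```python
def red_flags(cpu_samples, disk_busy_percent, memory_used_percent, queue_lengths,
              cpu_limit=85.0, disk_limit=90.0, memory_limit=90.0):
    """Return the names of the warning signs present in recent measurements."""
    flags = []
    # "Constantly high" CPU: every recent sample is above the limit.
    if cpu_samples and min(cpu_samples) > cpu_limit:
        flags.append("cpu")
    if disk_busy_percent > disk_limit:
        flags.append("disk")
    if memory_used_percent > memory_limit:
        flags.append("memory")
    # A request queue that grows with every sample means requests
    # are arriving faster than the servers can process them.
    growing = len(queue_lengths) >= 2 and all(
        later > earlier
        for earlier, later in zip(queue_lengths, queue_lengths[1:])
    )
    if growing:
        flags.append("queue")
    return flags
```

A single CPU spike does not raise a flag here, because only sustained load counts; a queue that goes from 3 to 5 to 9 waiting requests does.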
Estimate what your realistic expectations are. Do you usually have 100,000 users a day? Based on historical performance, you may know that as soon as you run a campaign through a certain channel, you can expect two or three times more traffic. What triggers those spikes? What is typically the highest load you need to be able to handle? You need to load test the environment to confirm that it can handle the peak you estimated from past experience. Online tools like Load Impact can help with distributed load testing. You also need to establish a baseline, which is what your environment is able to handle under normal circumstances, and your new web farm environment needs to be able to handle at least that.
Now you are almost ready to implement your new web farm environment, but first you need to make some calculations.
Say you are currently running on two servers, and together they can handle 50,000 page views/hour. Using simple mathematics, you can work out that one server = 25,000 page views/hour. So if you need to handle 100,000 page views/hour, you need four servers. Be aware, though, that the relationship between the number of servers and the number of page views is not always linear. If the weakest part of the infrastructure is the SQL DB, adding web servers to the environment will not really help.
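The arithmetic above can be written as a small helper. The function name is made up for this sketch, and it deliberately assumes linear scaling, which, as noted, does not always hold.

```python
import math


def servers_needed(current_servers, current_capacity, target_load):
    """Estimate the server count for target_load, assuming linear scaling.

    current_capacity and target_load are in page views per hour.
    """
    per_server = current_capacity / current_servers
    # Round up: a fraction of a server still means one more machine.
    return math.ceil(target_load / per_server)


# Two servers handle 50,000 page views/hour; the target is 100,000.
print(servers_needed(2, 50_000, 100_000))  # prints 4
```

Treat the result as a starting point for load testing rather than a guarantee, since a shared bottleneck such as the database is not part of this estimate.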
Based on this calculation, you set up what you estimate to be sufficient to handle the expected peak, and then you load test and stress test the environment. It is not enough to reach the highest expected peak; you need to go above it. Keep pushing the architecture and extending it until it can handle the expected level plus a buffer zone. 15–20% more than what is expected can be considered safe. If the environment is not sufficient, add another server and test again until you know it can handle the load.
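This test-and-extend loop can be sketched as follows. The `run_load_test` argument is a hypothetical callable that stands in for whatever load-testing tool you use: given a number of servers, it is assumed to return the page views/hour the environment sustained. The 20% default buffer comes from the range mentioned above.

```python
def size_environment(expected_peak, run_load_test, start_servers=1,
                     buffer=0.20, max_servers=50):
    """Add servers until the measured capacity covers the peak plus a buffer."""
    target = expected_peak * (1 + buffer)
    servers = start_servers
    while servers <= max_servers:
        # Measure rather than calculate: scaling is not always linear.
        if run_load_test(servers) >= target:
            return servers
        servers += 1
    raise RuntimeError(
        "Target not reached; the bottleneck may be elsewhere, "
        "for example in the database."
    )
```

If each server really did add 25,000 page views/hour, an expected peak of 100,000 with a 20% buffer would need five servers rather than four. If the measured capacity stops growing as servers are added, the loop gives up instead of adding machines that will not help.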
Of course, setting up the environment can be easy, or it can become more complicated if the application does not contain built-in support for running in a web farm.
After that, all that's left is to roll it out.