Romans Malinovskis is Founder and CTO of Agile Technologies, an eBusiness solution company based in Dublin. In this article he introduces the challenges of web software scalability, drawing on his company's experience with high-volume projects.
What is software scalability?
Scalability is the ability of software to handle a growing amount of work. In web applications this usually means a growing number of users or visitors, but it can also be caused by a growing amount of data. At some point one of the resources (CPU, memory, disk or network) reaches the limit of the hardware; that resource is called a bottleneck.
In reality, the limit is usually reached because the software uses resources inefficiently. Well-designed systems demand fewer resources and perform better overall.
What is non-scalable software?
In the 1990s, when the web was just starting, developers had no established techniques or frameworks for building web applications at their disposal.
It was common to find a CGI application which would work in the following way:
- Launch a new process for each web request
- Lock the data file and read it into memory
- Execute the request, updating the data in memory
- Write the data back, overwriting the file, then unlock the file
Such applications might perform well initially, but as the number of requests grows you run into multiple problems. Starting a new process for every request involves expensive memory operations and increases system load. Locking the data file means only one request can be processed at a time, so the others form a queue. As the file grows, it takes longer to read and more memory to hold, and if a fault occurs while it is being written, the data file becomes corrupted.
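The CGI pattern described above can be sketched in Python to show where the time goes. This is a minimal illustration, not a reconstruction of any real application: it assumes a Unix system (for fcntl.flock) and a JSON file as the data store, and the function name handle_request is hypothetical. In a real CGI setup each call would additionally run in a freshly started process.

```python
import fcntl
import json
import os
import tempfile


def handle_request(data_file: str, key: str) -> int:
    """One request in the naive CGI style: lock, read everything, update, rewrite.

    Hypothetical example; assumes a Unix system where fcntl.flock is available.
    """
    fd = os.open(data_file, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Exclusive lock: every concurrent request blocks here, one at a time.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            raw = f.read()                        # the whole file is read into memory
            data = json.loads(raw) if raw else {}
            data[key] = data.get(key, 0) + 1      # the actual work is tiny
            f.seek(0)
            f.truncate()                          # a crash after this line leaves a corrupt file
            json.dump(data, f)
            f.flush()
            return data[key]
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.json")
    for _ in range(3):
        print(handle_request(path, "hits"))
```

Because the lock is held for the entire read-modify-write cycle, concurrent requests are served strictly one after another, and the time spent holding the lock grows with the size of the file.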
Examples of scalable applications
Generally, all the popular on-line resources you know have been built in a very scalable way. Wikipedia's servers handle a total of around 500,000 requests per second using 300 servers. Amazon serves around 16,000 requests per second. Google, Hotmail, Bebo, MySpace: all of them have been set up in a scalable way. A growing number of visitors can be handled rather easily by simply adding more computers to the cluster.
If you have the software but no contingency plan for when the web server can no longer handle the volume of requests, you might be in trouble!
How can non-scalable applications be dangerous?
At first, you would see slightly slower response times. You might think the system will keep working for a while, but then it suddenly stops responding. You won't be able to log in to the server any more, and all you can do is a hard reboot, which will probably result in another lock-up. Eventually a significant number of users abandon your service, and only then does the system begin to work properly again.
Such a bottleneck is difficult to find unless you have experience with scalability optimisation.
At this point you might realise it's the end of the line. We know. We have helped many clients who found themselves in exactly this situation because they had not prepared. Their on-line business began losing money and reputation; they were panicking and needed an immediate solution.
You need to know whether you will ever hit a scalability problem. Many projects are developed without scalability in mind and still work fine. You can find out fairly easily by looking at your statistics.
This is a common picture of a healthy on-line product. You notice a spike in December. What happened? Perhaps your product was featured in an article and therefore got a lot of immediate attention. Often people do not consider the possibility of such sudden attention.
An article about your site on digg.com might mean instant death for it!
In other words, while you sleep in a different time zone, hundreds of thousands of people interested in your site will be greeted by a broken page.
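In computer science, computational complexity is the study of the relationship between the size of the input data and the resources required to solve a problem.
A classic example is sorting. If you have 10 books, how long does it take you to arrange them in alphabetical order? Say 10 seconds. If you had 100 books, would it take you only ten times as long, about a minute and a half? No: the effort of sorting grows faster than the number of items. An efficient method needs on the order of n log n comparisons, so ten times as many books means roughly twenty times the work; a naive method needs on the order of n² comparisons, roughly a hundred times the work.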
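The difference can be made concrete by counting comparisons. The sketch below uses two textbook algorithms chosen for illustration (insertion sort, which needs on the order of n² comparisons, and merge sort, which needs on the order of n log n); the function names are hypothetical. Reversed input is used because it is the worst case for insertion sort.

```python
from typing import List, Tuple


def insertion_sort_comparisons(items: List[int]) -> Tuple[List[int], int]:
    """Sort a copy of `items` with insertion sort; return (sorted list, comparisons)."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] <= a[j]:
                break                          # already in place
            a[j - 1], a[j] = a[j], a[j - 1]    # shift the item one step left
            j -= 1
    return a, comparisons


def merge_sort_comparisons(items: List[int]) -> Tuple[List[int], int]:
    """Sort with top-down merge sort; return (sorted list, comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, count_left = merge_sort_comparisons(items[:mid])
    right, count_right = merge_sort_comparisons(items[mid:])
    merged: List[int] = []
    comparisons = count_left + count_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


if __name__ == "__main__":
    for n in (10, 100, 1000):
        books = list(range(n))[::-1]           # reversed order: worst case for insertion sort
        _, slow = insertion_sort_comparisons(books)
        _, fast = merge_sort_comparisons(books)
        print(f"{n:>5} books: insertion sort {slow:>7} comparisons, merge sort {fast:>6}")
```

On reversed input insertion sort makes exactly n(n-1)/2 comparisons: 45 for 10 books and 4,950 for 100, a hundredfold increase. Merge sort never needs more than n⌈log₂ n⌉ comparisons, which is at most 700 for 100 books.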
An improperly chosen algorithm or database schema might force you to alter the software at a later date.
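A database schema problem often shows up as a missing index. The following sketch uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN to show a lookup changing from a full table scan to an index search once an index exists. The table users, the index idx_users_email and the helper names are hypothetical, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3
from typing import Tuple


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan as one string (the 'detail' column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


def compare_plans(n_rows: int = 10_000) -> Tuple[str, str]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        ((f"user{i}@example.org", f"User {i}") for i in range(n_rows)),
    )
    sql = "SELECT * FROM users WHERE email = ?"
    args = ("user42@example.org",)
    before = query_plan(conn, sql, args)   # no index: every row must be examined
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    after = query_plan(conn, sql, args)    # with the index: a B-tree search instead
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("without index:", before)
    print("with index:   ", after)
```

A scan costs time proportional to the number of rows, while the index search grows only logarithmically, which is why a schema without the right indexes may work on day one and fail once the table is large.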
We have already noted that in the real world our resources are limited, and we can't stop users from visiting our website too often. So as the number of requests grows beyond what your system can handle, it eventually stops responding.
Take a close look at the graph on the right. Notice that the system can handle up to 15 requests per second without major effort; however, as the number of requests grows to 20, the system load jumps up instantly.
Below right is a picture of a different system, which does not show any excessive load despite a rising number of requests.
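The sudden jump matches what basic queueing theory predicts. As a simplified model (an assumption; the system in the graph is not necessarily an M/M/1 queue), treat the server as a queue that can complete μ = 15 requests per second while requests arrive at λ per second. The mean time a request spends in the system is W = 1/(μ - λ) while λ < μ, so it climbs steeply as λ approaches μ; once λ ≥ μ the queue never drains. The function names are hypothetical.

```python
def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda), in seconds."""
    if arrival_rate >= service_rate:
        return float("inf")  # overloaded: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


def backlog_after(seconds: float, arrival_rate: float, service_rate: float) -> float:
    """Requests still waiting after `seconds` of sustained load (fluid approximation)."""
    return max(0.0, (arrival_rate - service_rate) * seconds)


if __name__ == "__main__":
    capacity = 15.0  # requests per second the server can complete (assumed)
    for rate in (5, 10, 14, 14.9, 15, 20):
        w = mean_response_time(rate, capacity)
        print(f"{rate:>5} req/s -> mean response {w:.2f} s, "
              f"backlog after 60 s: {backlog_after(60, rate, capacity):.0f}")
```

Under these assumptions, going from 10 to 14 requests per second raises the mean response time from 0.2 s to 1 s, and at 20 requests per second the server falls 300 requests further behind every minute.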
To discover whether your application has this kind of fault, you need to perform stress-testing. Reproducing high system load artificially is always very difficult, and a test might not reveal all the possible faults.
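A very small stress test can be scripted with the Python standard library alone. The sketch below is hypothetical (load_test, fetch and start_local_server are illustrative names), loosely modelled on the total-requests and concurrency options of ab; it is far less capable than a dedicated tool and, like any artificial test, will not reproduce every real-world fault. A throwaway local server is included so it can be tried without hitting a real site.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any proxy settings so that the target server itself is measured.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url: str, timeout: float) -> bool:
    """Perform one GET request; True if the server answered with a 2xx status."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()
            return 200 <= resp.status < 300
    except OSError:  # refused connections, timeouts, HTTP errors (URLError is an OSError)
        return False


def load_test(url: str, total: int = 30, concurrency: int = 5, timeout: float = 10.0) -> dict:
    """Send `total` GET requests with at most `concurrency` in flight at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: fetch(url, timeout), range(total)))
    elapsed = time.perf_counter() - start
    ok = sum(results)
    return {"ok": ok, "failed": total - ok, "requests_per_second": total / elapsed}


class _HelloHandler(BaseHTTPRequestHandler):
    """Trivial handler so the tool can be tried against a local server."""

    def do_GET(self) -> None:
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:  # silence per-request logging
        pass


def start_local_server() -> ThreadingHTTPServer:
    """Start a throwaway HTTP server on a free port of 127.0.0.1."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), _HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_local_server()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    print(load_test(url, total=30, concurrency=5))
    server.shutdown()
```

Only point a tool like this at servers you are responsible for; against a production site it is, in effect, a small denial-of-service attack.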
Here is what you can expect from a non-scalable application on a single server:
- Up to 800 static file requests per second
- Up to 60 Mbps of data transfer
- Up to 300 simultaneous connections to your web server
- Up to 5,000 user logins/day
- Up to 20,000 total users
- WordPress: up to 5 requests per second without caching
- Ruby on Rails: up to 3 requests per second without optimisation
- Simple PHP / ASP application: up to 50 requests per second
- Agile Toolkit (our own PHP5 framework): up to 40 requests per second
The table below shows how different measurements stack up in an average situation. Note that some applications can be more 'bandwidth consuming' or 'request intensive.'
Have a plan
As a business owner, you need to have a growth plan. It might be a good time to pull out your business plan and match up the numbers. Are you going to hit scalability issues? Have you considered the associated expenses?
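Remember: preventing problems is much cheaper than dealing with the consequences.
Many Unix systems come with, or can easily install, a utility called "ab" (ApacheBench, part of the Apache HTTP Server tools; sometimes installed as ab2), which lets you benchmark your website. The -n option sets the total number of requests and -c the number of requests made concurrently: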
$ ab2 -n 30 -c5 'http://www.rte.ie/'
Requests per second: 24.23 [#/sec] (mean)
$ ab2 -n 30 -c5 'http://www.locle.com/'
Requests per second: 51.44 [#/sec] (mean)
$ ab2 -n 30 -c5 'http://irishdev.com/'
Requests per second: 54.09 [#/sec] (mean)
$ ab2 -n 30 -c5 'http://irishdev.com/Search.html?q=asp'
Requests per second: 25.61 [#/sec] (mean)
You will notice that certain requests to your website are much slower than others. Point ab at an image and you should get over 500 requests per second; then try pointing it at your log-in form result or a dynamic search page.
However, unlike the ab program, each of your visitors will send several dozen requests per page, for all the images and scripts.
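To see how many requests one page view really costs, you can count the resources a page references. This rough sketch uses Python's standard html.parser; the class and function names are hypothetical, and it counts only img and script src attributes and stylesheet links, so CSS background images, fonts and files loaded by scripts are missed.

```python
from html.parser import HTMLParser
from typing import List


class ResourceCounter(HTMLParser):
    """Collects the URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif (tag == "link" and (attributes.get("rel") or "").lower() == "stylesheet"
              and attributes.get("href")):
            self.resources.append(attributes["href"])


def requests_per_page_view(html: str) -> int:
    """The page itself plus each distinct resource (repeats are normally cached)."""
    counter = ResourceCounter()
    counter.feed(html)
    counter.close()
    return 1 + len(set(counter.resources))


if __name__ == "__main__":
    page = """<html><head>
      <link rel="stylesheet" href="/style.css">
      <script src="/app.js"></script>
    </head><body>
      <img src="/logo.png"> <img src="/logo.png"> <img src="/photo.jpg">
    </body></html>"""
    print(requests_per_page_view(page))  # 5: the page, style.css, app.js, logo.png, photo.jpg
```

Comparing this number with the rates ab reports for your dynamic pages and your static files gives a first idea of how many full page views per second your server can really deliver.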
How fast is that working for you?
Test your website and see how it compares with others.
Choosing an expert
Now that you have put all the pieces together, calculate how many visitors your website could potentially accommodate. How many do you have now? Look at your growth. Keep in mind that there are peaks when your website has the most visitors and periods when it is not requested at all.
Doing this will help you understand when it's the best time to get advice and look for a scalability expert.
Handling scalability is often a very challenging task. As you patch up one problem, another appears; as you remove one obstacle, a new one gets in the way.
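The calculation can be sketched as follows. Everything in it is an assumption for illustration: the function names, the figures, and the simplification that every request costs as much as the page you benchmarked (pessimistic, since images are much cheaper than dynamic pages). The peak factor expresses how much busier your busiest second is than the daily average.

```python
import math


def daily_visitor_capacity(capacity_rps: float, requests_per_view: float,
                           views_per_visit: float, peak_factor: float) -> float:
    """Visitors per day a site can serve when its busiest second carries
    `peak_factor` times the day's average load."""
    average_rps_allowed = capacity_rps / peak_factor
    requests_per_visit = requests_per_view * views_per_visit
    return average_rps_allowed * 86_400 / requests_per_visit   # 86,400 seconds per day


def months_until_full(current_daily: float, capacity_daily: float,
                      monthly_growth: float) -> int:
    """Whole months of compound growth until daily visitors reach capacity."""
    if current_daily >= capacity_daily:
        return 0
    if monthly_growth <= 0:
        raise ValueError("traffic is not growing, so capacity is never reached")
    return math.ceil(math.log(capacity_daily / current_daily) / math.log(1 + monthly_growth))


if __name__ == "__main__":
    # Hypothetical figures: 50 req/s measured with ab, 30 requests per page view,
    # 5 page views per visit, and a peak ten times busier than the daily average.
    capacity = daily_visitor_capacity(50, 30, 5, 10)
    print(capacity)                                   # 2880.0
    print(months_until_full(1_000, capacity, 0.10))   # 12
```

With those assumed figures the site tops out at about 2,880 visitors a day, and a site with 1,000 visitors a day growing 10% a month reaches that limit within 12 months.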
Making software ready
If your software is not capable of working on multiple computers, the software design has to be revised to make it enterprise-ready.
Common challenges are:
- Using a scalable framework
- Relying on an effective web programming language
- Optimising queries and indexing the database properly
- Caching and clearing cache
- Session handling across multiple web-servers
- Handling uploads and data exchange between multiple servers
- Notifying distributed components and keeping data in sync
- Well-designed and documented expansion system
- Making software capable of handling overloads and component faults
- Avoiding single tables (especially in MySQL) with over a million records
- Using proper algorithms, especially in geo-location, match-making and media processing
- Error handling and reporting, with a pleasant page for the user
- Cluster proxying and load distribution (nginx, lighttpd)
- SQL cluster or Master/Slave setup
- Increasing the concurrent connections limit, and decreasing communication time-outs
- Getting ready for Denial of Service attacks
- Handling multiple requests per process instance (FastCGI, Tomcat)
- Distributed systems, service-oriented architecture, vertical and horizontal scalability
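Of the challenges above, caching and clearing the cache is easy to get subtly wrong. The following minimal in-process sketch (the class TTLCache is hypothetical) shows the two usual ways an entry leaves a cache: it expires after a fixed time, or it is invalidated explicitly when the underlying data changes. On several web servers the same logic would have to live in a shared cache such as memcached, so that clearing an entry takes effect everywhere.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache: entries expire after `ttl` seconds or when invalidated."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock                      # injectable so expiry can be tested
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                     # fresh hit: skip the expensive work
        value = compute()                       # miss or stale entry: recompute
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Clear one entry, e.g. right after the underlying data changes."""
        self._store.pop(key, None)
```

In a web application, compute would be the expensive step, such as rendering a page or running a database query, and invalidate would be called by the code that updates the data.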
Making hardware ready
Server hardware is much more expensive than a common PC. This is because of faster input/output between the hardware components and the duplication of certain components (such as coolers).
As you plan your expansion, you have to look seriously at several aspects of your servers:
- Lack of memory. You never want your server to start swapping.
- Hard drive activity interfering with CPU use (common with IDE hard drives, fixed in SCSI and SATA)
- Hard drive and bus speed. How fast can you read and write data? What is the seek time? Are you using RAID arrays?
- Network card speed, latency and quality. Poor network cards might load the CPU at high speeds.
- Expansion ability. Will you have extra ports in your switch? What's your up-link bandwidth limit?
- Hardware fault tolerance: multiple power supplies, multiple coolers, multiple network cards.
- Maximum RAM. Will your system support more than 1 GB?
- 32- or 64-bit operating system. 32-bit addressing is limited to 4 GB; will your operating system support more?
- Availability of replacement hardware components. How fast can you expand your RAID array?
- Backup and its impact on your system's performance while it runs. How fast can you restore?
- Inter-server traffic. If you split SQL and web servers, will your 100 Mbit network card handle it?
- Hardware RAID or software RAID? Maybe an external RAID array? How about software network RAID?
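Two of the questions above reduce to simple arithmetic. The sketch below (hypothetical function names) assumes, purely for illustration, that each page view pulls 50 KB of data from the SQL server, and it ignores protocol overhead, so real limits are lower.

```python
def max_requests_per_second(link_mbit: float, bytes_per_request: float) -> float:
    """Upper bound on requests per second a link can carry, ignoring protocol overhead."""
    link_bytes_per_second = link_mbit * 1_000_000 / 8   # 100 Mbit/s = 12.5 MB/s
    return link_bytes_per_second / bytes_per_request


def address_space_gib(bits: int) -> float:
    """Memory addressable with `bits`-bit addresses, in GiB (2**30 bytes)."""
    return 2 ** bits / 2 ** 30


if __name__ == "__main__":
    # Hypothetical: each page view pulls 50 KB from the SQL server.
    print(max_requests_per_second(100, 50_000))  # 250.0 page views per second at most
    print(address_space_gib(32))                  # 4.0
```

If your traffic plan calls for more page views per second than the first figure, the link between web and SQL servers becomes the bottleneck before either machine does.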
Network and infrastructure
Having a separate internal network lets you exchange data between computers without clogging your upstream. Other things worth looking at:
- Upstream limitations. Is your provider using a 100 Mbit router for your up-link? Are there soft bandwidth limits? What backbone is it connected to?
- Consider fibre optics, but keep costs in mind. Look at professional data storage solutions.
- Disk space allocation, logical volume manager. How will you handle a hard drive upgrade or expansion?
- Setting up and integrating a new server. If your technical person is in Hawaii, will someone else be capable of doing it?
- Security between servers. If one server gets compromised, what risks are you facing?
- Transmission errors. Even on a local network, packets may be lost.
- Should server hardware fail, will you have another server with all software restored?
- If your hosting company promises you 100% fault tolerance, ask for a test.
- Consider human error. It might be your administrator who makes a mistake and wipes a hard drive.
- Network filesystems: consider file locks, network failures, reboots and attacks.
- Monitoring, SNMP, alerts, Nagios
- Multiple data copies, for situations when storage is not quick enough (150 Mbit and up)
- Virtual networks (VPN) for additional security.
- Overseas latency - if your market is U.S. consider hosting there.
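Overseas latency has a hard physical floor. As a rough sketch (the 200 km per millisecond figure for light in fibre is an approximation, and the distance and number of round trips are hypothetical), a server about 5,000 km from its users, roughly the distance from Ireland to the U.S. east coast, cannot answer in less than about 50 ms per round trip, however fast the server itself is.

```python
# Light in optical fibre travels at roughly two thirds of its vacuum speed: about 200 km per ms.
FIBRE_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Physical lower bound on round-trip time; real routes are longer and add queuing."""
    return 2 * distance_km / FIBRE_KM_PER_MS


def page_latency_floor_ms(distance_km: float, sequential_round_trips: int) -> float:
    """Minimum network wait for a page that needs several back-to-back round trips
    (for example: TCP handshake, the HTML request, then resources discovered in it)."""
    return sequential_round_trips * min_round_trip_ms(distance_km)


if __name__ == "__main__":
    # Hypothetical: about 5,000 km between the server and its users, 4 sequential round trips.
    print(min_round_trip_ms(5_000))           # 50.0
    print(page_latency_floor_ms(5_000, 4))    # 200.0
```

No amount of server tuning removes this delay, which is why hosting close to your market matters.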
Virtualisation - things to watch out for
With virtual servers itʼs much easier to expand your infrastructure, but it does bring new challenges.
- Is someone else using your hardware? What if their system clogs up?
- Can the available system memory be consumed by another instance?
- Using AWS? What if your instance gets terminated?
- Backups and snapshots. Will this cause database malfunction?
Tales of Scales
An on-line ticket distribution network promised that tickets for a popular event would be available for purchase on-line on one day only. The evening before, a large number of people wanting to buy tickets started camping on the website.
The next day the website was unavailable for the whole day. Fans kept hitting "refresh" in their browsers complicating the problem even further.
Microsoft bought Hotmail in 1997. For several years they put extensive resources into migrating Hotmail from FreeBSD to Windows 2000, but it simply wouldn't scale. Eventually they succeeded, but a service like Hotmail is not something you can run on stock software.
Twitter was blown away within a few months of its deployment. Some people blamed Ruby on Rails, but in reality any framework or programming language would have been knocked out by such rapid growth. The initial implementation could handle 3 requests per second; by changing their infrastructure (moving to an asynchronous messaging model, adding 3 levels of cache, and rewriting their middleware in a mixture of C and Scala on the JVM) they boosted it to 150 requests per second.
Know any other scalability tales? Let us know. Email: firstname.lastname@example.org
Romans Malinovskis is the Founder of Dublin-based Agile Technologies
Copyright Agile Technologies
Credits: Gita Malinovska, Faycal Charabi, Agile Technologies Developers