Functionality is Easy, Performance is Hard, Part Two: Wait Time

CPU performance; Wall time, CPU time, wait time

Last time we looked at how a typical system deals with purely CPU-bound applications, and how the webserver, OS and CPU cores work together to produce the maximum throughput for your system. We started by looking at how we measure the ‘speed’ of your application by taking the ‘wall time’ (i.e. the time it takes to run your app as measured by your wall clock) and then dividing it into ‘CPU time’ (i.e. the amount of time the CPU is actually spending cycles on your app) and ‘wait time’ (the amount of time the CPU is waiting for something to happen). In this post we’ll look at the wait time of an application, so if you haven’t already done so you should read the first post on CPU time.

What is wait time?

Simply put, the CPU is waiting around for something to happen. That’s not to say you have a feral CPU; rather, you have told it to wait for something. For example, it could be waiting for:

  • A disk to read or write some data
  • An event to trigger (perhaps from the user)
  • Another process to return
  • A database query to respond
  • A response from a web-service or REST API

Probably the most common form of waiting in a web app is when you issue a request across a network to retrieve some data, be that from a database or some sort of API. Once your app has established a connection to the remote server and initiated the request, the CPU doesn’t really have to do anything; it’s just sat there waiting for the network to respond. As we will see, this kind of behaviour scales very differently to the CPU time we looked at last time.
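
To make that concrete, here’s a minimal sketch of a wait-bound page; the API URL is hypothetical and simply stands in for any database query or web-service call:

<?php
// Hypothetical wait-bound page: nearly all of the response time is spent
// waiting on the network, not burning CPU cycles.
$start = microtime(true);

// Stand-in for a database query or web-service call; the URL is illustrative.
$data = file_get_contents('http://api.example.com/orders/recent');

$elapsed = (microtime(true) - $start) * 1000;
printf("Fetched %d bytes in %.0f ms\n", strlen((string) $data), $elapsed);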

Scaling Wait Time

To simulate waiting for a network resource we can create a simple script that ‘sleeps’:


<?php
// Sleep for 200,000 microseconds (200ms) to stand in for waiting on the network
usleep(200000);

This will make the script sleep for 200ms, and that’s it! If you run this script using curl and time then you should see approximately 200ms reported:

rathers@geoff:~$ time curl http://127.0.0.1/wait.php

real    0m0.214s
user    0m0.009s
sys     0m0.003s

If you remember from last time we had a formula that could work out the max requests per second for our application based on response time and number of cores:

Max Req/s = Number of Cores / Response Time

So again using my 4 core system we should expect to see 4 / 0.2 = 20 requests per second. Right? Well let’s find out! First let’s try it with a concurrency of 4, which gave us 20 req/s last time:

rathers@geoff:~$ ab -c 4 -n 100 http://127.0.0.1/wait.php
...
Requests per second:    19.87 [#/sec] (mean)

So far so good! So last time even when we doubled the concurrency to 8 we still only managed 20 req/s. What about now?

rathers@geoff:~$ ab -c 8 -n 100 http://127.0.0.1/wait.php
...
Requests per second:    39.72 [#/sec] (mean)

Much better! So how high will this go? Will req/s keep on doubling as I double the concurrency?

rathers@geoff:~$ ab -c 32 -n 1000 http://127.0.0.1/wait.php
...
Requests per second:    158.22 [#/sec] (mean)
rathers@geoff:~$ ab -c 128 -n 10000 http://127.0.0.1/wait.php
...
Requests per second:    627.36 [#/sec] (mean)
rathers@geoff:~$ ab -c 256 -n 10000 http://127.0.0.1/wait.php
...
Requests per second:    1189.24 [#/sec] (mean)
rathers@geoff:~$ ab -c 512 -n 10000 http://127.0.0.1/wait.php
...
Requests per second:    663.02 [#/sec] (mean)

Basically, yes! We have linear scalability!

This graph nicely shows what is going on (click the graph to go to the raw data). The CPU is having no problems whatsoever initially; response time remains a constant 200ms and req/s scales up linearly. That is, until we hit a concurrency of 256. At this point the req/s starts tailing off and response times start climbing rapidly.

[Graph: requests per second and response time against concurrency]

If you remember from last time, when we had reached our maximum throughput of 20 req/s our CPU cores were totally maxed out and we could see that in dstat. This time even at nearly 1200 req/s my CPU cores are still nowhere near maxed out as this dstat trace shows:

-------cpu0-usage--------------cpu1-usage--------------cpu2-usage--------------cpu3-usage------ 
usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq|
 23   9  65   0   1   2: 22   9  65   0   1   3: 25   9  63   0   1   2: 25   8  64   0   1   2|
 19  12  64   0   2   3: 16  12  66   0   1   5: 20  11  65   0   1   3: 21  10  64   0   1   4|
 20  12  62   0   1   5: 26  12  57   0   1   4: 31  12  48   3   1   4: 24  12  60   0   1   3|
 18  22  55   0   2   3: 23  12  61   0   1   3: 21  11  61   0   2   5: 21  14  60   0   1   5|
 31  16  47   0   2   4: 29  11  55   0   1   4: 26  13  56   0   1   4: 27  13  54   0   2   4|
 21  13  59   0   2   5: 22  12  60   0   2   4: 20  15  59   0   1   5: 20  14  60   0   1   5|
 29  13  53   0   1   4: 29  13  53   0   1   4: 28  12  55   0   1   4: 24  11  60   0   1   4|
 35  11  49   0   2   4: 23  13  59   0   1   4: 31  11  53   0   1   4: 23  12  59   1   1   4|

All 4 cores are around 60% idle! Hopefully this illustrates how different CPU-time and wait-time are.

The Bottleneck

So what is so special about the number 256? Things were exactly as you might expect up to this number; the CPU was quite happy to sit and do 256 waits in parallel, waiting is after all pretty easy to do! Even I could probably manage to do 256 lots of nothing in parallel! The answer has nothing really to do with the application, or the number of CPUs or RAM or anything like that; it’s simply an Apache config value. The Apache config on my system limits Apache to 256 worker processes. This means that as soon as you ask for more than 256 requests in parallel things start to go wrong, as all the worker processes are busy and cannot accept the extra work.
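
If you want to confirm the limit on your own box, something along these lines will do it (the paths are assumptions that vary by distro: Debian/Ubuntu use /etc/apache2, Red Hat-style systems use /etc/httpd):

# Find the configured worker limit (the directive is MaxClients on the prefork MPM in Apache 2.2)
grep -ri "maxclients" /etc/apache2/

# Watch how many worker processes are actually running while you benchmark
watch -n 1 "ps aux | grep apache | wc -l"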

Optimising Webserver Performance for Wait Time

As we’ve seen so far, because our application is just waiting, we are limited by the number of processes that the webserver will give us. This value becomes the maximum number of requests that our application can handle concurrently.

So how do we scale further? Well it really depends on your webserver. In all the examples so far I have been using the Apache webserver. To scale our waiting app further we need to look into some of the internals of Apache and something that Apache calls the ‘Multi-Processing Module’ or MPM.

Apache MPMs

If you’re using Apache in a standard configuration, the likelihood is that you’re using the ‘prefork’ MPM already. This is a simple but robust MPM that uses a single master process that forks worker processes. Each worker process is single threaded.

The distinction between threads and processes is important here. A process is a standalone task governed by the OS. It is given its own block of memory that only it can access, and if it tries to access the memory belonging to another process the OS will shut it down; this is known as a segmentation fault (or segfault, illegal operation, crash etc). Many processes can run in parallel, as the OS will manage swapping processes on and off the various CPU cores based on demand and relative process priorities. A thread is analogous to a process but it is a stream of execution within the process itself. The process is responsible for managing the threads within it. So the OS can manage multiple processes, and a process can manage multiple threads within it.
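
As a rough illustration of that isolation (a sketch only; it assumes PHP’s pcntl extension and only runs from the command line), a forked child gets its own copy of the parent’s memory:

<?php
// Requires the pcntl extension; CLI only. The child receives a *copy* of the
// parent's memory, so changes made in the child never affect the parent.
$counter = 0;

$pid = pcntl_fork();
if ($pid === 0) {
    // Child process: modify our own copy of $counter and exit
    $counter = 100;
    exit(0);
}

// Parent process: wait for the child, then check our value
pcntl_waitpid($pid, $status);
echo $counter . "\n"; // still 0; the child's write happened in its own memory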

So when we say the prefork MPM produces single-threaded processes, this means that each process can only do one thing at once. Therefore to serve an HTTP request, a spare process must be available to accept and process the request. Initially Apache will start with very few worker processes (5 on my system). When it detects that it is running out of processes it will spawn more. Because it is computationally expensive to create more processes, Apache limits the rate at which more are spawned. However, the rate increases exponentially until the number of processes reaches a hard limit set by the MaxClients directive (256 on my system). You can see this happen quite clearly if you use the tools I gave you in my previous post. If you use apachebench to batter Apache with loads of requests and monitor the number of processes running you will see it slowly creep up, then the rate of increase will keep rising until eventually the number of processes plateaus.

Tuning prefork MPM

For the most part Apache will take care of itself. The only value you really need to worry about is the MaxClients directive which, as we’ve seen, caps the number of worker processes that can be spawned. Set this value too low and you won’t be able to serve enough traffic; set it too high and you’ll run out of RAM. How you tune this value also depends on your application and your machine. If you recall our CPU-limited example from last time, you’ll remember that no matter how hard we tried we couldn’t squeeze more than 20 req/s out of it. If this were all we had to serve there wouldn’t be much point in setting MaxClients much beyond 20, as the CPU simply can’t cope with more than 20 requests in parallel. Allowing it to go higher is only going to make things worse. Now consider our wait-limited example above. This happily handles 256 requests in parallel, and that ceiling is set by MaxClients itself. So it seems logical to allow MaxClients to go higher if we have enough RAM to do so.
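
For reference, the relevant prefork directives look something like this in an Apache 2.2 config (the values are purely illustrative, not recommendations, and the file they live in varies by distro):

<IfModule mpm_prefork_module>
    # Workers spawned at startup, and the idle pool Apache tries to maintain
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    # Hard ceiling on concurrent worker processes. ServerLimit caps MaxClients,
    # so raising MaxClients beyond 256 means raising ServerLimit too.
    ServerLimit         512
    MaxClients          512
    # Recycle a worker after this many requests (0 = never)
    MaxRequestsPerChild   0
</IfModule>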

Other Apache MPMs

The main MPMs besides prefork are called worker, event and winnt. As event is experimental and winnt is for use on Windows (and running servers on Windows is the preserve of the insane), we shall look briefly at the worker MPM.

Worker MPM is essentially the same as prefork MPM in terms of how the apache processes are spawned and killed but each process can have multiple threads running within it. This allows you to have fewer processes running to achieve the same level of concurrency as with prefork. The problem with this approach though is that many of the scripting languages that can be embedded into apache (PHP in our case) aren’t guaranteed to be thread-safe and so to use worker MPM you need to interface with your scripting language over FastCGI or similar.

Worker MPM is not a panacea, however; you will still have a maximum number of threads that can be in operation at any one time, as set by your Apache config. For a wait-heavy application you will probably be able to squeeze more out of your server using worker than with prefork, but you’ll still have a limit.

Event based servers

If your application does a lot of waiting then you might want to consider using an event-driven platform such as node.js. The event-driven architecture takes a fundamentally different approach to dealing with having to wait for things to happen. Rather than making a whole process or thread block until something has happened, you instead register event handlers. When something happens that you are interested in (e.g. your API query has got a response) then an event is raised and any registered event handlers are executed. This gives a totally non-linear, asynchronous flow and makes good use of system resources when dealing with lots of waiting around.
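
As a rough sketch (in JavaScript, since node.js is the example here), a single node process can hold hundreds of our 200ms ‘waits’ open at once without tying up a worker per connection; the timer stands in for a slow database query or API call:

// wait.js: an event-driven equivalent of our usleep(200000) page.
const http = require('http');

http.createServer((req, res) => {
  setTimeout(() => {
    res.end('done waiting\n');
  }, 200); // nothing blocks here; other requests are served while we wait
}).listen(8080);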

Real-world Applications

The examples I have used across this post and the last were deliberately simple and one-dimensional, but real-world applications are much more complicated. That said, they are essentially made up of these two components: time when you are hammering the CPU and time when you’re waiting around. Hopefully you now have an understanding of the effect these two things have on your server and how to recognise them.

Optimising Applications

I started my previous post with the hypothetical situation of having to support 1000 req/s for a given application. So if you benchmark your app and you’re only getting 500 req/s out of it, how do you squeeze the extra performance out? Well, you first need to work out what your limits are. If at that 500 req/s limit your CPUs are maxed out then you’re probably limited by CPU time. If the CPUs aren’t maxed out but all your Apache processes are busy then you are likely limited by wait time.
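
In practice that diagnosis just means combining the tools from these two posts while you apply load (the URL and concurrency below are illustrative):

# Terminal 1: watch per-core CPU utilisation
dstat -f

# Terminal 2: watch how many Apache worker processes are running
watch -n 1 "ps aux | grep apache | wc -l"

# Terminal 3: apply the load
ab -c 64 -n 10000 http://127.0.0.1/myapp.php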

The details of specific optimisations are beyond the scope of this article, but to reduce your CPU time you’re going to have to look at using fewer CPU cycles somehow. That may be by streamlining your code, efficiency improvements, caching or maybe even compiling your PHP code. To reduce your wait time you’ll need to look at increasing the speed of whatever is holding you up. That could be a disk RAID setup, database or query optimisation, a faster network between you and an API server, etc., or if you can you should look at caching the output of the slow parts so you don’t have to wait for them all the time.

Premature Optimisation

It is important to know what the limits of your application are, and the nature of them, before you start to optimise it. Imagine your code does lots of disk writes and is not meeting your non-functional requirements for throughput in req/s. You might at first think you are CPU-time limited, so you start stripping out for loops and using more efficient algorithms in your code. It won’t make any difference and is arguably wasted effort. If you had monitored your server while benchmarking you would have been able to see the massive amount of data going to disk, and you would have seen the write speed of the disk plateau as you push the load upwards. So what you should really be looking at is making the disks go faster!

Conclusion

We’ve seen how CPUs and webservers respond to high load in terms of being asked to do a lot of stuff at once and also being told to wait around for other stuff to happen. We’ve looked at a few simple tools to monitor what your system is doing and how to optimise some aspects of the server for your situation. Next time you are asked to make something ‘faster’, ask yourself what that actually means in terms of your application and be sure to establish exactly what the limiting factors are before you try and optimise around them.


Functionality is Easy, Performance is Hard, Part One: CPU Time

Make It Work

Software developers are really good at getting stuff working, they really are! Even when faced with absurd deadlines, scatter-brained product owners and conflicting demands. Sure, some people need a little guidance here and there, and some may produce more bugs than others, but on the whole functionality is delivered without too much fuss. A lot of the applications that we build are relatively simple in terms of functionality, so I have confidence in most developers being able to get stuff working to meet the functional requirements of a project. A working system, fully tested, fully signed off, perfect, job done. Oh what’s that? It needs to serve 1000 pages per second? Errr, help! This sort of non-functional requirement often gets overlooked until the last minute, and dealing with it, in my experience, involves a lot more fuss than the functional stuff, especially when left this late in the day.

Make it scale

In the web development industry it’s easy to see how people are more adept at making things work than making them scale. Perhaps you start out doing front-end stuff, then move onto some backend programming using PHP or Python, maybe moving onto Java or C#. Maybe you start by building sites for your mates, then get a job writing business process systems. It is probably fairly rare that you get asked to support massive volumes of traffic, to query huge volumes of data, to reduce page response times or to reduce server resource utilisation. ‘The Server’ may well be a black box that just copes with everything magically.

Rather than sit here whining I’ve decided to explain some of the core concepts behind performance-tuning of web applications. First off; how we make the most of the CPU.

Performance Tuning CPU Utilisation in Web Applications

The CPU is the heart of our server and for many applications it is likely to be the first bottleneck that you hit. First let’s examine how we measure the CPU performance of our application.

Wall Time

Simply put this is the time it takes to render a particular page of your application, as you might measure it with your wall clock. You can easily get a rough idea of this by doing something like this:


time curl http://localhost/myapp.php

(I’m assuming many things here, like you’re using Linux, PHP, you’re hosting your app locally etc)

There are two main components to the wall time that we’ll look at. ‘CPU time’ is how much time your process was actually swapped onto the CPU consuming its cycles. ‘I/O time’ or ‘wait time’ is the amount of time that your process was basically waiting for something to happen, be that waiting for a response from a database or API, or waiting for local resources to become available (e.g. disk, CPU etc).

Performance tuning CPU time

An application that uses pure CPU is fairly rare but a very simplistic app would be something that just counts up to a very big number:


<?php
// Burn CPU cycles by counting up to a billion
for ($i = 0; $i < 1000000000; $i++) {
    // do nothing
}

If you run this code you will see your CPU ‘spike’ to 100% (or as near as damn it). If you have a multi-core system you will notice it only spikes one core (more on that later). Let’s turn the big number down a bit to simulate a realistic application. If you change it to 4 million then on the system I’m currently using this takes about 200ms of wall time. Because this is basically a pure CPU application we can assume that the CPU time is essentially the same as the wall time (not strictly true, but true enough for this illustration), so we’re consuming 200ms of CPU time to render this page. That’s still fairly quick, right? A user probably wouldn’t notice 200ms of delay; with all the network latency and asset loading the web page has to do, this will pale into insignificance, right? Plus the page isn’t doing much so we’ll be able to serve loads of them, right? Nope.
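
For the benchmarks that follow, the script looks something like this (the iteration count is whatever burns roughly 200ms on your hardware; 4 million here, but treat the exact number as illustrative):

<?php
// cpu.php: burn roughly 200ms of CPU time per request.
// Tune ITERATIONS so the wall time on *your* hardware comes out at ~200ms.
const ITERATIONS = 4000000;

for ($i = 0; $i < ITERATIONS; $i++) {
    // do nothing; we just want the CPU busy
}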

Remember that we’re using an entire CPU core for 200ms. Nothing else can realistically use that CPU core for a period of 200ms. So how many of these things can we actually serve up? Well, doing some simple maths, if we continually hit this page, serially requesting a new page as soon as one has been served up, then we’ll be constantly eating up the CPU time of a single core. With purely serial requests like this against a single core you’ll be able to get up to 5 requests per second. Simple really: it’s just 1 / 0.2.

5 requests per second ain’t so great. That’s basically saying that some guy could melt your server by just pressing F5 as fast as he can. As soon as your server is trying to serve more than 5 req/s it is going to be in trouble if that is sustained over a long period.

So how do we scale it? Well the obvious thing is to get a faster CPU. If I double my CPU speed then my CPU time is going to come down to 100ms and I’ll be able to serve 10 req/s. Awesome! But there is a limit to how high you can go with this kind of ‘vertical’ scaling. More prudent would be to add more cores. If I add a second core then the operating system will be able to run two of these page requests in parallel at the same speed as before. My CPU time will remain at 200ms but I’ll be able to do 2 at a time so I’ll be able to get to 10 req/s just as with the faster CPU scenario. This is horizontal scaling and it is much more sustainable. Push it up to 8 cores and I’ve got 40 req/s. 32 cores (not unrealistic for some servers!) and I’ve got 160 req/s. We’re flying now! For a purely CPU-bound application the following formula quite neatly sums this up:

Max Req/s = Number of Cores / Response Time

Monitoring CPU time

When you have multiple cores and multiple parallel requests attacking them it can get quite complex to work out what is going on from the point of view of the user agent, you really need to get inside the server and have a look (from an OS perspective that is!). To illustrate the scaling I’ve just described working in practice there are some handy tools we can use. For flinging lots of requests at our page we can use apachebench. To monitor CPU utilisation you can use any of the standard linux tools such as vmstat, top, iostat, but I personally recommend dstat. These both work fantastically on Linux. If you’re using Mac or Windows you’re on your own I’m afraid; you should have installed a proper OS! 😛

If you run dstat -f you’ll see loads of numbers printing down the screen (make sure your terminal is wide enough to fit it all in). Mine looks like this:

-------cpu0-usage--------------cpu1-usage--------------cpu2-usage--------------cpu3-usage------ --dsk/sda-----dsk/sdb-- --net/eth0- ---paging-- ---system--
usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ: read  writ| recv  send|  in   out | int   csw 
  7   3  87   4   0   0:  7   2  90   1   0   0:  6   1  92   1   0   0:  6   2  90   2   0   0| 416k   40k:  36k   28B|   0     0 |   0     0 |1529  2318 
  6   0  93   0   1   0:  5   1  93   0   0   1:  4   2  93   0   1   0:  5   1  94   0   0   0|   0     0 :   0     0 |   0     0 |   0     0 |1143  1887 
  7   3  90   0   0   0:  4   1  95   0   0   0:  6   1  93   0   0   0:  7   2  91   0   0   0|   0     0 :   0     0 |   0     0 |   0     0 |1224  1920 
  8  13  79   0   1   0:  5   2  92   0   1   0:  7   1  92   0   0   0:  3   0  97   0   0   0|   0     0 :   0     0 |   0     0 |   0     0 |1454  2230 
 10   3  87   0   1   0:  5   1  94   0   0   0:  5   1  94   0   0   0:  4   1  95   0   0   0|   0     0 : 132k    0 | 196B   98B|   0     0 |1395  2289

There’s a lot to take in there but we’re really only interested in a few of the columns. Dstat is printing a line every second with loads of info about what is going on in your computer. Stuff like CPU usage, disk and network activity. If you look at the first row you’ll see the top level headings and you’ll see one for each CPU core you have (you can see that I have 4 cores). Each core then lists 6 different columns, most important of which is ‘idl’ which is a percentage measure of how much time the CPU is idling. If you run the cpu-spiking loop we made before but with the number shoved back up to a billion or so you’ll be able to see the spike in dstat:

-------cpu0-usage--------------cpu1-usage--------------cpu2-usage------
usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq
  5   1  94   0   0   0:  9   1  90   0   0   0:  3   1  96   0   0   0
  4   2  93   0   1   0: 18   0  82   0   0   0:  6   0  93   0   1   0
  6   1  93   0   0   0:100   0   0   0   0   0:  5   2  93   0   0   0
 10   0  90   0   0   0:100   0   0   0   0   0: 10   0  90   0   0   0
  6   1  92   0   1   0: 99   0   0   0   1   0:  6   1  93   0   0   0
  7   1  92   0   0   0:100   0   0   0   0   0:  3   1  95   0   0   1
  6   1  92   0   1   0: 67   1  32   0   0   0:  4   1  95   0   0   0
  2   3  95   0   0   0:  3   1  96   0   0   0:  4   0  95   0   1   0
  5   9  85   0   0   1:  7   0  93   0   0   0:  3   1  96   0   0   0

Watch cpu1 as the ‘idl’ value drops to 0 and the ‘usr’ value goes up to 100. This is the process ‘spiking’ that CPU. Notice how the other CPUs aren’t bothered by this at all; they sit there mostly idling. If you run the big loop script in parallel in two different terminals you’ll see that it will spike two CPUs, 3 loop processes will max out 3 CPUs etc.

If you stick the loop script under a webserver such as apache we can now watch the scaling example I presented in practice. Based on our previous numbers we should expect to be able to serve 5 requests per second per core with our ~200ms script. On a 4 core box like mine we’d expect to see four times that number, i.e. about 20 req/s. So let’s see! Apachebench is a simple application that allows you to fire requests at a URL and control how many requests to complete and also how many to send in parallel. Let’s try only 1 at a time and 100 requests:

rathers@geoff:~$ ab -n 100 http://127.0.0.1/cpu.php                                                                                                             
This is ApacheBench, Version 2.3                                                                                                               
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/                                                                                            
Licensed to The Apache Software Foundation, http://www.apache.org/                                                                                                  
                                                                                                                                                                    
Benchmarking 127.0.0.1 (be patient).....done


Server Software:        Apache/2.2.22
Server Hostname:        127.0.0.1
Server Port:            80

Document Path:          /cpu.php
Document Length:        0 bytes

Concurrency Level:      1
Time taken for tests:   19.835 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      20200 bytes
HTML transferred:       0 bytes
Requests per second:    5.04 [#/sec] (mean)
Time per request:       198.353 [ms] (mean)
Time per request:       198.353 [ms] (mean, across all concurrent requests)
Transfer rate:          0.99 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   191  198   4.0    197     211
Waiting:      191  198   4.0    197     211
Total:        191  198   4.0    197     211

Percentage of the requests served within a certain time (ms)
  50%    197
  66%    197
  75%    198
  80%    201
  90%    205
  95%    206
  98%    211
  99%    211
 100%    211 (longest request)

A lot of output, but the figure we’re most interested in at the moment is “Requests per second”. This is at 5.04 req/s which is quite a bit lower than our estimate of 20! This is because we’re only running one request in parallel. Apachebench is firing off a request and waiting for that one to complete before it sends another. We are only allowing the server to process 1 loop at a time so 5 req/s is our theoretical max. We still have lots of headroom with our additional cores though so we should be able to get much higher. By doubling the concurrency we should get to 10 req/s:

rathers@geoff:~$ ab -c 2 -n 100 http://127.0.0.1/cpu.php

Requests per second:    10.09 [#/sec] (mean)

Nice. Let’s keep going. Concurrency of 3:

rathers@geoff:~$ ab -c 3 -n 100 http://127.0.0.1/cpu.php

Requests per second:    15.08 [#/sec] (mean)

Ok, what about concurrency of 4? This should give us 20 in theory, which also should be our theoretical max:

rathers@geoff:~$ ab -c 4 -n 100 http://127.0.0.1/cpu.php

Requests per second:    20.23 [#/sec] (mean)

Pretty close! So what happens if we push it too far? What if we try 8 concurrency?

rathers@geoff:~$ ab -c 8 -n 100 http://127.0.0.1/cpu.php

Requests per second:    20.33 [#/sec] (mean)

It’s basically the same! How can this be so? Well if you look at some of the other stats you’ll see why. If you look at the “Time per request” metric you’ll see that the mean was 200ms when concurrency was 4, but when concurrency was 8 the time per request was 400ms; double the amount! So we see that the CPU can’t churn out any more than 20 pages per second but if you ask it to do more in parallel it will just take longer to complete the requests. The OS is using time multiplexing to share 4 cores out amongst 8 processes. Each process only gets the CPU for half the time so it takes twice as long to complete its task.

Will this keep going forever? What if we hit it with a concurrency of 10 times our sweet spot? 100 times? Well you’ll not get above 20 req/s but the average request time will just keep going up and up. Eventually you’ll start hitting bottlenecks in other areas and the OS will be spending an increasing amount of resources swapping processes on and off the CPU cores. This means it becomes increasingly inefficient. So you’ll see the number of req/s actually start to decrease as you push the concurrency higher.

Here is a graph of actual results gathered on my box using our example script:

[Graph: requests per second and mean response time against concurrency]

As you can see we hit the 20 req/s plateau very quickly and the throughput remains pretty much constant no matter what concurrency we throw at it, although you can start to see it tailing off slightly at higher concurrencies. The response time, on the other hand, starts off on the 200ms plateau but very soon starts rising linearly. The full results are publicly available on Google Docs.

Dstat and apachebench Together

If you monitored dstat while doing the above tests you’ll see that the single concurrency example doesn’t just use one core, and the 2 concurrency example uses more than 2 cores. This is because the OS is choosing a different CPU core to process your request each time so on average with the single concurrency example you should see 25% utilisation across four cores rather than 100% on one and 0% on the others.

Monitoring Apache worker processes

When we were hitting the server with a concurrency greater than 1 the OS was spreading the load across multiple processes across multiple cores. This is because apache is (in some configurations anyway) set to spawn multiple webserver processes. Each process is effectively independent of the others and your OS can move them around CPU cores as it pleases. Because they are separate processes as far as the OS is concerned you can quite easily ask the OS to tell you how many apache processes are running. Run this command to see all your apache processes (you might need to grep for ‘httpd’ rather than apache on some systems):

rathers@geoff:~$ ps aux |grep apache
root      1369  0.0  0.2 127408  7964 ?        Ss   11:36   0:00 /usr/sbin/apache2 -k start
www-data  1391  0.0  0.1 127432  5012 ?        S    11:36   0:00 /usr/sbin/apache2 -k start
www-data  1392  0.0  0.1 127432  5012 ?        S    11:36   0:00 /usr/sbin/apache2 -k start
www-data  1393  0.0  0.1 127432  5012 ?        S    11:36   0:00 /usr/sbin/apache2 -k start
www-data  1394  0.0  0.1 127432  5012 ?        S    11:36   0:00 /usr/sbin/apache2 -k start
www-data  1395  0.0  0.1 127432  5012 ?        S    11:36   0:00 /usr/sbin/apache2 -k start
rathers   3191  0.0  0.0  28076   844 pts/1    S+   12:26   0:00 grep --color=auto apache

This should show you the grep command itself, the master apache process (running as root) and the worker processes (running as a low-privilege user). On my system I have 5 worker processes. So to monitor how many apache worker processes are running at any one time you can run this nifty little command:

watch -n 1 "ps aux |grep apache |wc -l"

This actually reports a slightly inflated number (the watch command, the shell it spawns and the grep itself all contain the word ‘apache’ and so match the pattern), but it gives you a good tool to be able to monitor the increase and decrease in the number of Apache processes. If you set this running and then run one of the high concurrency apachebench tests (concurrency of 20, say) then you’ll see the number of processes steadily rise until reaching a plateau. Once apachebench has finished its test then you’ll see the number slowly decrease back down to a low plateau. I’ll look at this in more detail later, but this should give you an idea of how Apache is spawning more processes to deal with the concurrency (the number of worker processes should roughly match the concurrency) and how the OS is distributing the processes around your CPU cores.
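
A common trick to make the count a little more honest (an assumption on my part rather than something from the original post) is to stop the pattern matching those helper processes:

watch -n 1 "ps aux | grep [a]pache | wc -l"

The character class [a]pache still matches ‘apache’ in the worker processes, but the literal string ‘[a]pache’ in the watch, shell and grep command lines no longer matches.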

Summary

We’ve seen how to spike a single CPU core and then watch how the OS starts to spread the work amongst less busy cores when further CPU-intensive processes are started. We’ve seen the effect of pure CPU time on the requests per second that an app can achieve and its relationship to response time. We then experimented with scaling our CPUs vertically and horizontally. Finally we looked briefly at how Apache and the OS work together to spawn worker processes and spread them out across available CPUs in order to make the most of your system’s resources.

In my next post we’ll look at the performance implication of wait time which is a totally different beast to CPU time.

We are People not Resources; The poisonous ‘R’ word

Factory workers on a production line

Disclaimer: The slightly vitriolic parts of this post are not aimed at anyone in particular! 🙂

What is a ‘resource’

Where I work many of the staff have a habit of referring to people who ‘do’ stuff (like software developers, testers, web designers etc) as ‘resources’. They don’t use it as some benign collective noun, it is used to refer to individual people explicitly! Here are some examples of usage that I hear daily:

  • How many resources are working on that?
  • We need some more design resource for that project.
  • I have a spare resource this week, what should he work on?
  • Which resource is picking up this bug?
  • I need to refer to my resource plan

These sorts of phrases are uttered by people as though it were perfectly normal. The terminology is so conventional they might as well be asking what you had for breakfast. As much as I would like to remain professional, my response to people who talk like this is F**K YOU! Quite frankly to refer to people who are educated, qualified, intelligent, experienced and passionate as ‘resources’ is incredibly offensive to say the least. This demeaning term implies that we are:

  • On a par with ‘materials’
  • As disposable as paper clips
  • All the same
  • Infinitely interchangeable and replaceable
  • Little more than numbers on a balance sheet

Clearly this is not the case.

There is much literature written about the sociological aspects of software development teams which is beyond the scope of this post but it is immediately apparent to anyone who has worked in a software development team that none of the above statements are true. So why does this offensive practice continue?

My view is that the term is a symptom of a fundamentally broken but deep rooted approach to managing projects and people. The approach is an attempt to simplify software development and model it as though it were a factory production line. You’ve got your business analysts shoving requirements in at one end, the developers in the middle making the ‘goods’ which are then passed to the testers (or should I say test resources?) who allow a production quality product out the other end. Laughably my department was even once re-branded as a ‘Software Factory’. It’s enough to make you weep!

In this model the developers and testers are simply ‘resources’; generic units of man-power. The more you assign to a project, the faster the backlog of work is completed. With the factory model this is probably correct. If you have 5 people packing fudge and they can pack 500 boxes per hour between them, you would reasonably expect 10 fudge packers to be able to do 1000 boxes per hour. Same with software development right? We aren’t getting through the work quickly enough, so let’s shove another 5 development resources on it, problem solved. Or not. In 1975 Fred Brooks published a book called “The Mythical Man-Month” that discussed the fallacy inherent in this ideology, yet nearly 40 years later this is still going on and appears to be rife among companies that are household names.

Poisonous

The ‘R’ word is also poisonous because it seeks to widen the gap between the shop floor drones (i.e. the highly skilled software developers) and the ‘management’. Managers discuss what their resources are working on behind closed doors, shuffling people around on their ‘resource plans’ and hopelessly trying to make sense of the scheduling mess they have created with their flawed ideology. To me the usage of the word resource implies a certain ignorance of the fine art of scheduling software development projects.

At one point I got so sick of this that I launched a fight back against the ‘R’ word. I raised awareness of the word’s implicit evilness and explained that quite simply:

We are people not resources!

I put the word ‘resource’ on a par with the worst swear words and made people cough up a pound every time they uttered it. I’m not sure how much was collected in the end but it certainly got people thinking and the team had a nice bit of cash to spend at the Xmas party!

So please join me in the fight against this most disgusting of terms and tell your management that you’re a person not a resource! Correct them when they use the term in your presence. Tell them that Dilbert is satire, not an instruction manual. Ask them to read Brooks’ book and step out of the 1970s.

Coding Heroics vs Testing; Delivering software efficiently and effectively.

Superman vs software testers
How much process do you need to run a software development team? Or a technical organisation in general? Some? Lots? None? Maybe you’d opt for the likes of ITIL and Prince2? Or something lighter weight and more pragmatic like Scrum? What about testing of software? How much should you do?

I do not claim to know the answers to these questions and obviously the answers depend a lot on your team, your organisation and the product you are developing. However I have noticed widely differing attitudes towards testing amongst engineers. The two angles:

Test Everything – Process wins

How can you be sure that you are releasing working code unless you test it? You’ll need to do user acceptance testing at a minimum, preferably system/integration testing too. If your application requires it, then performance testing, security testing and penetration testing. Ideally all developers should be utilising test-driven development and writing suites of unit tests that cover their code 100%. Ideally all of this stuff will be automated, but would have at least one round of manual testing too to ensure the tests themselves are correct (testing the tests).

When code is changed you can never be sure what might be affected, so you’ll have to run all the tests every time new code is prepared for release. Code must be tested first by the developers in a dev environment, then passed to a testing or integration environment where most tests are run again. Then finally everything is released to a live-like pre-production environment where regression testing is performed. Then you can put the code live.

Test Nothing – Heroics win

How can I be sure that I’m releasing working code? Because I’ve checked it in my dev environment and it works. I have thought about the edge cases and defensively coded around them. I have had to refactor some things but I am confident in my approach and there should be no effect on the refactored code. I know my code will still be as secure and scalable as the previous release because I haven’t written anything that could introduce any issues. When I release this code I will monitor the production system carefully for unforeseen errors. Because I have access to the production environment and a quick, lightweight release procedure I can easily roll back if things go horribly wrong. Any minor bugs that are found in live I can fix later.

Everyone in my team knows the entire codebase just as well as I do so we can all be confident in each other’s ability to code and release like this. When we have new starters we spend a long time training them and gradually easing them into doing releases that increase in complexity with their level of experience.

Which test strategy is for me?

Obviously if I were developing a core accounts system for a bank the heroic method probably isn’t going to work too well. Conversely if I were developing a low-traffic company intranet the process approach would be overkill.

I have seen both approaches work within the same team. The heroics approach is wonderfully efficient when it works as there is very little to get in the way of writing useful functional code and getting it released. You can achieve a staggering throughput with relatively few people. It can be scary doing releases but it’s probably worth it in order to get more stuff out the door.

The process based approach gives you a lot of confidence in the code you are releasing; heart-in-mouth moments and failed releases are almost non-existent, bugs are reduced in scale and severity, sleeping at night is easier! Once the effort has been put in to writing tests it is fairly simple to then automate them. Running the full suite of tests every time someone commits a change to the code allows you to get rapid feedback on whether it has affected anything else and seeing a green report that all tests have passed gives you a lot of confidence. When paired with manual testing you gain even more confidence that your tests are valid and appropriate.

The heroic approach totally relies on having very technically strong, multi-skilled developers. As soon as even an average developer enters the team and tries to play the same game, problems start to appear. Anyone who cannot adequately assess the impact of their code changes will encounter problems, big problems. It also doesn’t scale very well as it relies on everyone knowing about all areas of your systems. Not only does this experience take a long time to gather but it becomes exponentially harder as the team grows: The more team members you have, the more code you produce, and the longer it takes to gain experience of it. As the team grows you’ll be spending more time learning existing code and less time writing new code, or more likely you’ll ignore the learning bit and just write more code. The team becomes less effective, bugs and outages increase.

The process approach requires a great effort put into testing. Writing unit tests, integration tests, user acceptance tests, however much you want to do. Writing tests that give good coverage is a significant effort and in some cases may be comparable to the effort to actually write the code in the first place, perhaps even more. In order to be able to rely on your automated tests to give you confidence you have to make sure that all new code has good automated test coverage and you have to maintain all your existing tests too. If you fall behind maintaining decent coverage the value of your test back-catalogue is eroded and you lose confidence in your tests; effectively defeating the purpose of having them in the first place.

Clearly it isn’t a case of picking one or the other option, there are as many shades of grey between those extremes as there are development teams in the world. The main point is to think where on the spectrum your team should sit. This will depend upon many things such as:

  • How risk averse your company/team is
  • Stability requirements of your product
  • Size and make up of the team
  • Maturity of the team
  • Strength and style of developers in the team

It is useful to bear in mind this spectrum when responding to production failures. When you release something that experiences problems and you inevitably get asked the question “what are you going to do to stop this happening again”, the answer you might naturally turn to is “do more testing”. However in this case it may well be better to convince stakeholders that it would be more beneficial to take this sort of thing on the chin and to do less testing, not more.

Thanks to Cartoon Tester for the testing cartoon 🙂

The Coding Paradox; Write less to write more

Man programming a punch card machine

I Just Want To Write Code!

That is something I hear from a lot of developers; they aren’t interested in any of the other activities involved with delivering a software project. They don’t want to speak to people, gather requirements, design architecture, attend planning meetings, write documentation, fix bugs, monitor production services etc. I totally get this, I feel the same way to some extent. Those of us that fell in love with coding by sitting in our bedrooms until the early hours writing hundreds of lines of code will probably get it. You fall in love with coding. No one falls in love with planning meetings (if you do then you need to see a doctor!).

The above attitude is obviously not conducive to having a successful team of engineers with a breadth of experience, nor would it do your career any good if you adopted it. But that is not what I want to discuss here. Rather, I have noticed an amusing paradox in the above attitude: the more code you write, the less code you can write in future.

If your organisation works anything like mine then for every line of code that you write there will be a future cost or liability associated with it. Imagine if you’re so wrapped up in the code for a specific project that you voluntarily spend your entire weekend working on it, getting some of the high-importance low-urgency items off the backlog. Maybe you completed that bit of hairy refactoring work on the core objects that we’d been putting off for months? Brilliant, fantastic! That will make our lives much easier! 🙂

Or will it…

Now you have changed all that code the following things will probably happen:

  • You’ll have to explain and document the changes for the test team so your changes can be put into production
  • Assuming your changes alter the way in which the app behaves in production, even slightly, then you’ll have to explain and document the changes for the support team so they can still effectively support it.
  • Assuming you’re using Scrum or similar, you will have to demo your changes to the whole team.
  • Maybe your changes affect other applications that use those objects in some way. You may have to refactor those apps too (and here we’re into recursion of all the other points!), or at the very least explain and document the changes for other teams so they understand how to modify their app to accommodate your changes.
  • You will have to explain (and possibly document) the changes for the rest of the developers on your project so they understand what you have done, why you’ve done it and how it may affect them.
  • You will become the de facto technical authority on these changes so if anyone works in this area in future they will come to you asking for advice.
  • Any future projects building on the foundations you have created may well pick you out as a design authority, dragging you into early project discussions around feasibility and architecture.
  • Your proactive stance on pushing the team forward may well push you into a de facto technical lead role meaning more and more people will be coming to you for advice, mentoring, code reviews etc.

Not all of the above will apply to every commit you ever make but the principle holds that the more code you write, the more of these other tasks will crop up on your TODO list, which is not what you wanted is it? You just wanted to write code!

So next time you’re on an all-night coding binge, have a think about the horrid work you are queuing up for yourself!