I develop a client-server, database-backed system, and I need to devise a way to stress/load test it. Customers inevitably want to know such things as:
• How many clients can a server support?
• How many concurrent searches can a server support?
• How much data can we store in the database?
• Etc.
Key to all these questions is response time. We need to be able to measure how response time and performance degrade as new load is introduced, so that we can, for example, produce some kind of pretty graph to throw at clients to give them an idea what kind of performance to expect with a given hardware configuration.
Right now we just put our fingers in the air and make educated guesses based on what we already know about the system from experience. As the product is put under more demanding conditions, though, this is proving inadequate for our needs going forward.
I've been given the task of devising a method to get such answers in a meaningful way. I realise that this is not a question anyone can answer definitively, but I'm looking for suggestions about how people have gone about doing such work on their own systems.
One thing to note is that we have full access to our client API via the Python language (courtesy of SWIG), which is a lot easier to work with than C++ for this kind of work.
So there we go, I throw this to the floor: really interested to see what ideas you guys can come up with!
Test 1: Connect and disconnect clients like mad, to see how well you handle session setup and teardown and how much your server will survive under spikes. While doing this, measure how many clients fail to connect. That is very important.
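For example, a connect/disconnect churn script might look something like this. (Here `client_api`, `connect` and `disconnect` are stand-ins for whatever your SWIG-wrapped API actually exposes.)

```python
import threading
import time

import client_api  # hypothetical: your SWIG-wrapped client API

HOST = "testserver"        # assumed test server
ATTEMPTS_PER_THREAD = 200
THREADS = 50

failures = []
lock = threading.Lock()

def churn():
    # Hammer the server with rapid connect/disconnect cycles.
    for _ in range(ATTEMPTS_PER_THREAD):
        try:
            session = client_api.connect(HOST)  # assumed API call
            session.disconnect()                # assumed API call
        except Exception as exc:
            with lock:
                failures.append(exc)

threads = [threading.Thread(target=churn) for _ in range(THREADS)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

total = THREADS * ATTEMPTS_PER_THREAD
print("%d connect/disconnect cycles in %.1fs (%.1f/s), %d failed"
      % (total, elapsed, total / elapsed, len(failures)))
```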
Test 2: Connect clients and keep them logged on for, say, a week, doing random actions (a fuzz test). Time the round trip of each action. Also keep a record of the order of actions, because this way your "clients" will find loopholes in your use cases (very important, and VERY hard to test rationally).
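A skeleton for that kind of fuzz client, again with hypothetical API calls, could be:

```python
import random
import time

import client_api  # hypothetical SWIG-wrapped API

# Hypothetical client actions to pick from at random.
ACTIONS = {
    "search": lambda s: s.search("term%d" % random.randint(0, 9999)),
    "insert": lambda s: s.insert_record({"value": random.random()}),
    "fetch":  lambda s: s.fetch(random.randint(1, 100000)),
}

def fuzz_session(duration_secs, log_path):
    session = client_api.connect("testserver")  # assumed API call
    end = time.time() + duration_secs
    with open(log_path, "w") as log:
        while time.time() < end:
            name = random.choice(list(ACTIONS))
            start = time.time()
            try:
                ACTIONS[name](session)
                status = "ok"
            except Exception as exc:
                status = "error: %s" % exc
            # Record the order of actions and each round trip so that
            # any failure sequence can be replayed later.
            log.write("%f %s %.4f %s\n"
                      % (start, name, time.time() - start, status))
    session.disconnect()

fuzz_session(60 * 60, "fuzz_client_1.log")  # one hour; scale up to a week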
Test 3 & 4: Determine major use cases for your system, and write up scripts that do these tasks. Then run several clients doing same task(test 3), and also several clients doing different tasks(test 4).
Series: Now, the other dimension you need here is the number of clients. A nice series would be: 5, 10, 50, 100, 500, 1000, 5000, 10000, ...
This way you get data for each series of tests across a range of workloads.
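A simple driver for the series can just launch your use-case scripts in increasing batches; `usecase_search.py` here is a placeholder for one of your test 3/4 scripts:

```python
import subprocess

CLIENT_COUNTS = [5, 10, 50, 100, 500, 1000]

for n in CLIENT_COUNTS:
    # Launch n copies of one workload script in parallel.
    procs = [subprocess.Popen(["python", "usecase_search.py",
                               "--log", "run_%d_%d.log" % (n, i)])
             for i in range(n)]
    for p in procs:
        p.wait()
    # Aggregate the run_<n>_*.log timing files into one data point for this n.
```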
Also, congrats on SWIGing your client API to Python! That is a great way to get things ready.
Note: IBM has a sample of fuzz testing in Java which, while unrelated to your case, will help you design a good fuzz test for your system.
If you are comfortable coding tests in Python, I've found funkload to be very capable. You don't say whether your server is HTTP-based, so you may have to adapt its test facilities to your own client/server style.
Once you have a test in Python, funkload can run it on many threads, monitoring response times and summarizing them for you at the end of the test.
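For instance, a minimal funkload test case looks roughly like this, assuming an HTTP front end (check the funkload docs for the exact API; the URL and config key here are illustrative):

```python
from funkload.FunkLoadTestCase import FunkLoadTestCase

class SearchTest(FunkLoadTestCase):
    """Minimal funkload test; assumes an HTTP front end."""

    def setUp(self):
        # 'main.url' comes from the matching SearchTest.conf file.
        self.server_url = self.conf_get('main', 'url')

    def test_search(self):
        # funkload records the response time of each request.
        self.get(self.server_url + '/search?q=foo',
                 description='Run a simple search')
```

You run it once with `fl-run-test` to check it works, then benchmark it with something like `fl-run-bench -c 10:50:100 test_Search.py SearchTest.test_search`, where the cycle list plays the role of the client-count series described above.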
For performance you are looking at two things: latency (the responsiveness of the application) and throughput (how many operations per interval). For latency you need an acceptable benchmark; for throughput you need a minimum acceptable figure.
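As a concrete example, once each client logs its per-operation round-trip times, you can reduce a run to a handful of numbers with plain Python:

```python
def summarize(latencies, wall_secs):
    """Reduce per-operation round-trip times (seconds) from one run."""
    lat = sorted(latencies)
    n = len(lat)
    print("operations:      %d" % n)
    # Throughput is measured against wall-clock time of the whole run,
    # since requests from many clients overlap.
    print("throughput:      %.1f ops/s" % (n / float(wall_secs)))
    print("median latency:  %.3fs" % lat[n // 2])
    print("95th percentile: %.3fs" % lat[min(int(n * 0.95), n - 1)])

summarize([0.12, 0.10, 0.31, 0.09, 0.15], wall_secs=2.0)
```

The 95th percentile is usually a more honest number to show a client than the mean, since a few slow outliers are exactly what users complain about.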
Those are your starting points. To tell a client how many xyz's you can do per interval, you are going to need to know the hardware and software configuration. Knowing the production hardware is important to getting accurate figures. If you do not know the hardware configuration, you need to devise a way to map your figures from the test hardware to the eventual production hardware.
Without knowledge of the hardware, you can really only observe trends in performance over time rather than absolutes.
Knowing the software configuration is equally important. Do you have a clustered server configuration? Is it load balanced? Is there anything else running on the server? Can you scale your software, or do you have to scale the hardware to meet demand?
To know how many clients you can support, you need to understand what a standard set of operations looks like. A quick test is to replace the client with a stub client and then spin up as many of these as you can, having each one connect to the server. You will eventually reach the server's connection resource limit; without connection pooling or better hardware you can't get higher than this. Often you will hit an architectural issue before that point, but in either case you have an upper bound.
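A stub-client probe for that connection ceiling can be as simple as the following (`client_api` is again a stand-in for your real API):

```python
import client_api  # hypothetical SWIG-wrapped API

sessions = []
try:
    # Keep opening connections until something gives way.
    while True:
        sessions.append(client_api.connect("testserver"))  # assumed API call
        if len(sessions) % 100 == 0:
            print("%d connections open" % len(sessions))
except Exception as exc:
    print("hit the ceiling at %d connections: %s" % (len(sessions), exc))
finally:
    for s in sessions:
        s.disconnect()
```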
Take this information and design a script that your client can enact. You need to map how long your script takes to perform the actions against how long the expected user would take. Then start increasing your client numbers as mentioned above until you hit the point where adding clients causes a disproportionate drop in performance.
There are many ways to stress test, but the key is understanding expected load. Ask your client about their expectations: what is the expected demand per interval? From there you can work out upper loads.
You can do a soak test, with many clients operating continuously for many hours or days. You can also try to connect as many clients as fast as you can, to see how well your server handles high demand (essentially a DoS attack).
Concurrent searches should be done through your standard behaviour searches acting on behalf of the client, or you can write a script that establishes a semaphore on which many threads wait, so you can release them all at once. This is fun and punishes your database. When performing searches you need to take into account any caching layers that may exist: test both with caching and without it (in scenarios where everyone makes unique search requests).
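Here's one way to do the release-all-at-once trick in Python using a `threading.Event` (the `connect`/`search` calls are placeholders for your real API):

```python
import threading
import time

import client_api  # hypothetical SWIG-wrapped API

go = threading.Event()
results = []
lock = threading.Lock()

def searcher(term):
    session = client_api.connect("testserver")  # assumed API call
    go.wait()                                   # every thread parks here
    start = time.time()
    session.search(term)                        # assumed API call
    with lock:
        results.append(time.time() - start)
    session.disconnect()

# Unique terms defeat any caching layer; use one shared term to test the cache.
threads = [threading.Thread(target=searcher, args=("term%d" % i,))
           for i in range(100)]
for t in threads:
    t.start()
time.sleep(2)   # let every thread connect and block on the event
go.set()        # release them all at once
for t in threads:
    t.join()

print("100 concurrent searches, worst case %.3fs" % max(results))
```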
Database storage is bounded by physical space; you can determine row size from the field lengths and expected data population. Extrapolate this out statistically, or create a data generation script (useful for your load-testing scenarios, and it should become an asset to your organisation) and then map the generated data to business objects. Your clients will care about how many "business objects" they can store, while you will care about how much raw data can be stored.
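A back-of-the-envelope capacity calculation plus a trivial record generator might look like this; the field widths and overhead factor are made-up assumptions, so substitute your real schema:

```python
import random
import string

# All widths and the overhead factor below are assumptions for illustration.
ROW_BYTES = 4 + 64 + 256 + 8       # id + name + description + timestamp
OVERHEAD = 1.4                     # guessed index/page overhead factor
DISK_BYTES = 100 * 1024 ** 3       # 100 GB of database storage

print("roughly %d business objects fit"
      % int(DISK_BYTES / (ROW_BYTES * OVERHEAD)))

def random_record():
    """One plausible record for populating the database in volume tests."""
    return {
        "name": "".join(random.choice(string.ascii_lowercase)
                        for _ in range(12)),
        "description": "".join(random.choice(string.ascii_letters + " ")
                               for _ in range(200)),
    }
```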
Other things to consider: what is the expected availability? What about how long it takes to bring a server back online? 99.9% availability is not much good if it takes two days to bring the server back up the one time it does go down. On the flip side, lower availability is more acceptable if recovery takes 5 seconds and you have failover.