By Glenn, with help from Johan, Dimitri & Guido
“We’re gonna performance test our system so we’re good to go (live).” Something pretty much every one of us has said at some point. What happens next is unpredictable; many different paths of execution follow such a statement. At De Persgroep IT we’ve learned that performance testing is important, even though not everybody has an idea of where to get started. People often start Googling for “load testing tools”, expecting to find a silver bullet solution that involves hitting a single button and announcing that everything is good. Allow me to give some insight into how to deal with the concept of performance testing.
Why spend resources on performance testing?
Let’s take a few steps back from installing a load testing tool. First ask yourself why you want to performance test in the first place. You probably want answers to questions such as:
- What’s the performance of our system?
- What’s the response time of our system?
- What’s the capacity of our system?
- What’s the throughput of our system?
- How many (concurrent) users do you support?
- How do you scale efficiently?
- Are you ready to release this product in the wild?
- How does provisioned performance relate to cost?
The questions are all related to each other, even though the testing strategy might deviate a bit. So after reminding yourself why you want to performance test, take a look at how you want your system to actually behave. Because, you know: how can you cross the finish line without knowing its location?
This brings me to the architectural quality attributes of the system. What do they say about performance, availability, and capacity? Many projects without proper architectural work will already hit a barrier at this point. Are the attributes even defined? Or are they still spread out in different people’s heads in a non-scientific manner? Not having these specifications should set off an alarm: go and get them. Good, so you agreed upon an SLA of 250 ms for the API HTTP response? And an SLA of triple nines for availability? And even an SLA for support of 500,000 concurrent users? Did you challenge your stakeholders enough before agreeing on those SLAs? Can you safely multiply the expected numbers in case your stakeholders under-provisioned, and by how much? I guess you’re ready to continue.
Where do you get off?
So how do you go about performance testing? The most important thing to mention is that performance testing is a process, probably with a lot of bumps along the way. Something that’s widely misunderstood: running a performance test is not a solution, but rather a way to bring out problems involving speed and capacity. Performance testing leads to more work in the short term, but less in the long term. And a chance to save face. I can’t stress this enough. I often hear people planning to run “a performance test” the day before the deadline, as if it’ll magically prevent problems from occurring on the deadline. Anyone calling out such a plan should be redirected to “doing computers 101”. Yes, I feel strongly about that.
A process involving bumps, where every bump is actually a chance to learn about your system and how it behaves. Every bump is probably something unanticipated and surprising that needs to be tackled. You’ll apply fixes, hurdle your performance test over to the next bump, and happily accept that bump being present. Because, you know: you just got an opportunity to fix it before it showed up in production. Remember that it’s easy to judge whether tackling yet another bump is worth the effort: just take a look at what your system is expected to be capable of. Are you checking all the boxes in the quality attributes? Well, there you go. Your system exudes enough quality to halt right there. Never forget it’s completely unnecessary to deliver a system capable of landing the Mars Rover when you’re building a mobile app for personal use to control your entertainment system at home.
Having a management that actually supports your performance testing efforts is a blessing. It shows software delivery maturity. Remember: the sooner in the process you start performance testing, the less likely you’ll be surprised during production operation of your system. Also, if it weren’t clear yet: a single performance test will not do the trick.
What does it look like?
Let’s take a look at what a performance test actually looks like. They come in many shapes and forms. A very basic performance test may just be a looping curl or ab (Apache Benchmark). But there are also custom Python scripts, complex JMeter profiles, and fancy cloud solutions simulating traffic. In the end they all do pretty much the same thing, though: they fire requests and await a response. Choosing the appropriate tool basically depends on how complex your performance test will be. For example, are you going to:
- test multiple endpoints?
- use Keep-Alive connections?
- use SSL/TLS or HTTP/2 connections?
- need data sets of IDs to cycle through?
- need a distributed performance test setup involving multiple agents?
- simulate any traffic other than simple GET requests?
You should be aware that most cloud solutions are just reselling existing tools at scale. Let’s say you don’t have a decent systems guy at hand to set up a distributed environment; then you could opt for a paid cloud solution. Say you’ll only be evaluating the performance of a single GET request without any ID cycling; then you could just as well go for the ab tool. You might end up with a JMeter XML file, a bash script looping curls, or a saved cloud test. Whatever it is, it’ll be a description of the requests that need to happen.
Each such description will have some configuration options, the most obvious being the duration of the performance test. Other typical configuration options are the number of concurrent threads firing requests, the amount of think time in between requests, the ramp-up period, the ramp-up scheme, and the location of the data sets containing IDs to cycle through. Or maybe you want to limit traffic to a certain level of throughput? Setting the connection timeout and request timeout is another rather important one. Even more complex setups may grant you a choice of how many different servers requests will be fired from. Configuration options will vary depending on the description you’ve built for your specific use case.
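To make a few of those options concrete, here is a minimal sketch of such a description as a bash loop around curl. The ids.txt data set and the /api/articles endpoint are hypothetical and the numbers are arbitrary; only the curl flags themselves are real options.

```bash
#!/usr/bin/env bash
# Illustrative sketch: cycle through a (hypothetical) data set of IDs,
# with a connection timeout, a request timeout, and some think time between requests.
DURATION=300      # total test duration in seconds
THINK_TIME=1      # think time between requests, in seconds
END=$(( $(date +%s) + DURATION ))

while [ "$(date +%s)" -lt "$END" ]; do
  while read -r id; do
    # --connect-timeout is the connection timeout, --max-time the overall request timeout
    curl --silent --output /dev/null \
         --connect-timeout 2 --max-time 5 \
         "https://www.persgroep.be/api/articles/${id}"
    sleep "$THINK_TIME"
  done < ids.txt
done
```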
Let me show you the simplest form of a performance test:
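With ab (Apache Benchmark), for example, that could be a single command:

```bash
ab -k -c 5 -n 100 https://www.persgroep.be/
```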
It says:
- Get me / on https://www.persgroep.be
- With Keep-Alive enabled (-k)
- With 5 concurrent threads (-c)
- For a total of 100 requests (-n)
It doesn’t get much simpler than that, right? You’ll probably find yourself running a lot more complex performance tests to actually be able to sign off on the required quality attributes, but this gives a basic idea of what a performance test actually is. For the purpose of this article, I’m not going to dig deeper into more complex setups.
The traffic it produces
A more complete performance test will probably involve multiple requests: you’ll test multiple features of the system in a single performance test. Even though we often see microservice architectures, these small components still often provide multiple features. Your system might find some features more difficult to process than others. You could see very high throughput on feature X and much less throughput on feature Y. I don’t think I’m surprising anyone here. This leads me to correct load distribution. Performance tests should be realistic. The traffic you simulate should be as close as possible to what it will be in real production operation. This means balancing traffic between the different features in a scientifically sound manner, not just on gut feeling.
There are two possibilities: either you’re replacing an existing system, or you’re releasing a brand new system. It’s rather easy to get insight into real production traffic if you’re replacing an existing system. Just process the access logs or analytics of that existing system. The performance tests should basically be a replay of that traffic.
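As a hedged example, assuming the existing system writes access logs in the common/combined log format (where the request path is the seventh whitespace-separated field), a one-liner like this gives you the relative load per endpoint to mirror in your test:

```bash
# Count requests per path and sort by volume (assumes common/combined access log format).
awk '{ print $7 }' access.log | sort | uniq -c | sort -rn | head -20
```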
If you’re building a brand new system, you have some calculation on your hands. Although the outcome will never exactly match real production operation, you should be able to approximate it rather closely. A common tactic is to produce user journeys based on actual user testing. You’ll have to translate these user journeys into exact traffic for your system. This isn’t a trivial approach, but it’s perfectly possible. I’ve often found our user journey mappings to be very close to reality.
The different performance tests
We often hear different words when talking about performance testing. Some talk about load testing, others about soak and endurance testing. We sometimes even talk about spike and stress testing. Is there a difference?
It all starts with the performance test, which you’ll use to validate the speed. After that you’ll hit your system with more traffic, hence starting a load test to validate whether it remains performant under more traffic. You could induce short peaks in traffic to again validate whether it remains performant; this would be a spike test. Soak or endurance testing would be running any test for a longer period of time, say 24 hours. Finding the breaking point of your current system by steadily increasing traffic is a stress test. As you can see, all these kinds of tests are very similar. In the end they’re the same test description, as described above, with different configuration properties.
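Taking the ab example from earlier, the difference really is just the configuration; a hedged sketch, with arbitrary numbers:

```bash
# The same test description with different configuration properties (numbers are illustrative):
ab -k -c 5   -n 1000   https://www.persgroep.be/   # performance test: modest, steady traffic
ab -k -c 200 -n 50000  https://www.persgroep.be/   # load test: the same requests, more concurrency
ab -k -c 50  -t 3600 -n 10000000 https://www.persgroep.be/   # soak test: time-capped run (the large -n keeps ab from stopping early)
# A stress test would re-run the same command with a steadily increasing -c until something breaks;
# a spike test alternates short bursts of a high -c with the normal level.
```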
The finish line is in sight!
When do you know that you’ve hit the target you craved? When do you send the “Great Success!” emails? Well, you need metrics. Many metrics. Metrics are cheap. Often you have many more at your disposal than you might think, really. Now this is where it gets technical and where you need the right people to read and interpret them.
Obviously, for the pure performance test you want to monitor response times. How fast did the responses get to you? How many ended up in an error? Are you on spiky Wi-Fi? Are you passing through a CDN, a reverse proxy, or a machine doing NAT? Every single machine along the network path will have a potential impact on your response times. Where’s the bottleneck? You might want metrics for each layer there.
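For a single request, curl can already break the response time down per phase, which helps locating the bottleneck along that path. A small sketch:

```bash
# Break one request's response time down into phases (DNS, TCP connect, TLS, first byte, total).
curl --silent --output /dev/null \
  --write-out 'dns: %{time_namelookup}s  connect: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s  total: %{time_total}s\n' \
  https://www.persgroep.be/
```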
As soon as you ramp up the load for your load test, you’ll want to start monitoring metrics such as CPU load averages, CPU utilization, memory and swap usage, network bandwidth usage, thread limitations, CPU steal, and a lot of other metrics. At the risk of getting too technical, I won’t dive deeper into what’s a good value and what’s a bad value for each of them. Again, you probably need a guy with enough system engineering affinity to interpret these correctly. Having visualizations of these metric values in graphs will help in interpreting them as well.
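As a rough sketch of what collecting a few of those metrics could look like on a typical Linux server while the test runs (the interval and file name are arbitrary):

```bash
# Sample a few server-side metrics every 5 seconds during the test run.
while true; do
  date
  uptime                 # CPU load averages
  free -m                # memory and swap usage
  vmstat 1 2 | tail -1   # CPU utilization, steal time, swapping activity
  sleep 5
done >> perf-metrics.log
```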
You should also not forget that metrics are collected per machine. If your system involves several copies of the same machine (you know, to load balance and provide high availability), you should be aware of the aggregate results of these metrics. A single bad value from a single machine might be obscured by the aggregate as a whole.
If your system depends on other systems in real time, you should be aware of the impact these dependencies have on you. You should have insight into their behavior in order to correlate potential bad results with it. Also, you should have agreed upon SLAs before accepting a dependency into your system.
Let’s also not forget about the simplest check possible: check the front-facing facade of your system. How does it behave under load? Do you see some obvious silliness right there?
To wrap up
A system typically consists of multiple moving parts. They will impact each other, and many metrics will show related behavior as well. During the performance testing process you might see strange behavior that you’ll find difficult to reproduce consistently. A classic example is latency directly impacting volume throughput. Analyzing the metrics and application logs closely enough will show you what you need to act on in order to create consistency.
Don’t forget that any environment other than the one you’ll be using for production operation might behave subtly differently from the real production environment. In the end, the only environment that can be trusted is production. Machines might be scaled down, stores might have less capacity, and your third-party dependencies might be scaled down as well. Performance testing in production is perfectly possible, but you should avoid putting it under breaking stress. It might take some experience to deal with an existing production environment.
Remember that the performance test results of your system will change over time. Features will be added or refactors might occur. It’s not always clear what the impact is on performance. Re-run performance tests. It should never stop.
It’s important not to change performance test configurations until the metrics are all showing green. Keep the previously defined performance test as it is, and look for the problems that stand between where you are and where you want to be. As mentioned in the beginning, we’re trying to take away the surprise factor during production operation.
I hope this article gives a respectable overview of how you go about performance testing and what it entails. If you’ve found this interesting, you might find our post about chaos engineering an interesting read.