Strategy for Performance Testing & Improvement
The nitty-gritty of measuring and improving the performance of an application
Software applications today are built for scale. On the internet, your application can potentially reach billions of users across the globe. So it isn’t enough for your application to be logically correct; it must also be able to handle huge traffic.
The methodology for testing your application under the stress of huge traffic and load is commonly known as performance testing or load testing. Its outcome can lead to multiple corrective measures, which are out of scope for this article. Here, we will focus on what we intend to check as part of performance testing.
Key Tenets of Performance Test
Memory : The amount of memory your application can consume on a machine is limited; the allocated portion is commonly known as heap memory. If your application consumes more than the allocated heap memory, it will crash or turn unresponsive.
CPU : The number of processors/cores on a machine is finite, and your application needs to be mindful of it. As CPU utilization reaches close to 100%, your application will turn unresponsive.
Latency : The time taken by your application to respond to a request. The lower the latency, the better.
Fluidity : Applicable if your application has a user interface (UI). Is your UI responsive to user actions such as fast/slow scrolls and navigation between screens? Because of its close tie to the UI, fluidity depends directly on where your application runs, such as Android, iOS or a web browser. It gets very subjective, so we will park it as out of scope for this article.
Compromising any of the above tenets can lead to a bad customer experience.
Standard Operating Procedures (SOP) to measure the tenets
Every application’s use case is different, and hence the way to test it will vary too. The following are general guidelines.
Memory/CPU : One of the most common mistakes is measuring at a single point in time, which doesn’t give the true picture. What’s more critical is measuring over a period of time as time-series data. In the lifecycle of your application, define a start and end time (system steady state). Measure memory and CPU at periodic intervals (say, every 3 seconds) within this window, and visualize the result as a line graph. Key things to focus on:
Maximum memory
Minimum memory
Time taken for the application to reach a steady state
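The sampling loop above can be sketched in Python. Here `probe` is a placeholder for whatever memory or CPU reading your platform provides (an RSS query, a JMX metric, and so on); the function names and the 3-second default are illustrative, not from any specific tool.

```python
import time


def sample(probe, duration_s, interval_s=3.0):
    """Call probe() every interval_s seconds and record (elapsed, value) pairs."""
    samples = []
    start = time.monotonic()
    while (elapsed := time.monotonic() - start) < duration_s:
        samples.append((round(elapsed, 2), probe()))
        time.sleep(interval_s)
    return samples


def summarize(samples):
    """Reduce the time series to the peaks the article asks us to focus on."""
    values = [value for _, value in samples]
    return {"max": max(values), "min": min(values)}
```

Plotting the raw `(elapsed, value)` pairs as a line graph also shows how long the application takes to settle into its steady state.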
Latency can be measured in one of the following ways:
User Perceived Latency (UPL) from the UI : Perform the user action and measure the time taken, exactly as your customer would experience it.
API time from logs : Add info log messages at the start and end of your API and calculate the latency from the difference between the log timestamps.
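A minimal sketch of the log-based approach in Python; `handle_request` and `do_work` are hypothetical stand-ins for your API and its body, and the log message format is illustrative.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("api")


def do_work(payload):
    # Stand-in for the real API body being measured.
    return payload.upper()


def handle_request(payload):
    start = time.perf_counter()
    log.info("handle_request start")
    result = do_work(payload)
    latency_ms = (time.perf_counter() - start) * 1000
    # The latency is the difference between the start and end timestamps.
    log.info("handle_request end latency_ms=%.2f", latency_ms)
    return result
```

In production you would typically parse the two timestamps out of the log lines rather than compute the difference inline, but the principle is the same.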
Next steps towards fixing
There is no one-size-fits-all solution, as fixes are very specific to each application, but there are general best-practice guidelines for approaching the not-so-straightforward performance fixes.
Memory
Is the maximum/peak memory consumed within acceptable limits?
If yes, then we are good. Otherwise, take a heap dump when memory is at its peak, analyze the objects held in memory, and explore opportunities to optimize your code to reduce it.
Does the memory drop back to acceptable limits in steady state?
If yes, then we are good. Otherwise, take a heap dump in steady state and analyze the objects held in memory.
Explore opportunities to optimize your code to reduce memory, adhering to best coding practices. A few basic recommendations:
Lazy loading of objects
Reduce the scope of objects. Avoid global scope, such as static objects; wherever possible, create objects in method scope rather than class scope.
Limit the size of your in-memory cache (if any).
Close connections and streams diligently.
Prefer encapsulation to sharing objects.
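The bounded-cache recommendation can be sketched with a small LRU cache built on the standard library's `OrderedDict`; the `BoundedCache` name and eviction policy are illustrative choices, not a prescription.

```python
from collections import OrderedDict


class BoundedCache:
    """In-memory cache capped at max_entries; evicts the least recently used."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._data)
```

Because the cache can never grow past `max_entries`, its memory footprint stays bounded no matter how much traffic the application serves.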
CPU
Are you abusing threads?
Keep it simple; prefer sequential over parallel. If sequential operations fulfill your requirement within permissible limits, don’t try to perform them in parallel with multiple threads. Apart from consuming CPU, multithreading also makes the code harder to read and debug.
Can synchronous call be converted to asynchronous?
We often end up making calls to other applications or processes. Identify your requirement and make a wise choice between synchronous and asynchronous invocation. A synchronous call always blocks, holding up the calling thread and its resources until the operation completes, unlike an asynchronous invocation.
Is asynchronous always better than synchronous?
No, not always. For operations that are time-sensitive and do not wait on another process or resource, asynchronous is overkill. For example, consider broadcasts in Android: listeners periodically listen for broadcasts, and when there are many broadcasts and listeners, applications compete for a finite pool of threads. For small operations, this can cost more in CPU and latency than it saves.
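The synchronous/asynchronous trade-off can be illustrated with Python's standard thread pool; `fetch` is a hypothetical stand-in for a call to another process or service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(x):
    time.sleep(0.01)  # simulate waiting on another process or service
    return x * 2


# Synchronous: the caller blocks until fetch returns.
sync_result = fetch(21)

# Asynchronous: submit the call and keep working; block only when the
# result is actually needed.
with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(fetch, 21)
    # ... the caller is free to do other work here ...
    async_result = future.result()
```

Note that the asynchronous version still costs a thread from a finite pool, which is exactly the overhead the article warns about for small, time-sensitive operations.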
Latency
Fixes are typically not isolated to a particular tenet of performance; if you compromise or improve one, you will notice a ripple effect on the others.
Break latency down into sub-KPIs
Breaking latency down to a more granular level helps you identify the time taken by each component in the chain. Once you have a breakdown at the most granular level, explore and analyze the code at each level for opportunities to improve.
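Such a breakdown can be sketched with a small timing context manager; the component names (`parse`, `db_query`, `render`) are hypothetical stages in a request's chain.

```python
import time
from contextlib import contextmanager

timings_ms = {}


@contextmanager
def timed(name):
    """Record how long the enclosed block took, keyed by component name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[name] = (time.perf_counter() - start) * 1000


# Hypothetical components in a request's chain.
with timed("parse"):
    time.sleep(0.005)
with timed("db_query"):
    time.sleep(0.010)
with timed("render"):
    time.sleep(0.002)
```

Sorting `timings_ms` by value immediately points at the component contributing most to the end-to-end latency, which is where the code-level analysis should start.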
Summary
Performance is a key aspect of every application and has many dimensions. Unlike many problems, you need both the high-level and the low-level picture to address and improve an application’s performance. It isn’t straightforward or easy, but it is definitely interesting and rewarding. The solutions aren’t isolated to a single dimension; rather, each change can cause a ripple effect across all dimensions.