Sunday, December 23, 2012

Dealing with Stress


Stress failures and bug advocacy - looking at stress tests from a value perspective

Stress is part of your test strategy. You use it as a tool to test your product and find bugs. This is one of the “non-functional” test categories you run. But have you devoted time to thinking about what your stress tests are actually testing?

You may continue to test without answering this question, but when the time comes for bug advocacy, you will have to defend your test strategy and findings, and that may force you to search for an answer.

What are we stressing for?
1) Statistical failure – stress increases the chance of exposing a sporadic defect, simply because it executes a flow many times.
2) Run stability tests in a shorter time – stress compresses the time factor: it quickly reveals a defect that a system running under normal conditions (typical amounts of data, number of simultaneous actions, etc.) would hit only after a much longer run. A common example of such a failure is a memory leak found using the stress setup (see the sketch after this list).
3) Load (sometimes defined as a category by itself) – when we test how our system scales with multiple calls, a large amount of data, or both. Here, the failure reveals the point at which the system fails to handle the load.
4) Any combination of 1, 2, and 3.
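
To make the second category concrete, here is a minimal sketch of a harness that samples process memory while a stress loop runs, which is one way a leak surfaces under stress. The run_use_case() hook and the trend interpretation are assumptions for illustration; only the psutil calls are real library API.

    # Leak-watch sketch: sample resident memory across stress iterations.
    # run_use_case() is a hypothetical hook that drives one iteration of
    # the system under test; a steady upward trend in RSS (not a single
    # spike) is what suggests a leak worth investigating.
    import psutil

    def watch_for_leak(pid, run_use_case, iterations=1000):
        proc = psutil.Process(pid)
        baseline = proc.memory_info().rss
        samples = []
        for _ in range(iterations):
            run_use_case()
            samples.append(proc.memory_info().rss)
        growth = (samples[-1] - baseline) / baseline
        print(f"RSS growth over {iterations} iterations: {growth:.1%}")
        return samples

If memory climbs steadily across iterations, “the system died after hours of stress” becomes a trend you can show a developer.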

In a utopian scenario, when a stress-related defect is reported, it follows the path of debugging, root-cause analysis, and fix. But in many cases, we will need our bug advocacy skills in order to convince our stakeholders of the need to fix the defect.

A typical bug discussion can start like this:
Developer: “A stress run of 4½ hours with 5MB data files is not normal usage of our system. A typical use case takes 15 minutes and a smaller amount of data. We should reject this bug.”
This point in the discussion can reveal whether you did your homework or not.
To decide that the failure belongs to the first classification – statistical – we need to decompose the stress test into a meaningful use case and run it over and over, bringing the system back to a clean state between runs. Automation can be a big help here.
If we succeed in reproducing the failure under such conditions, our report transforms from a stress failure report into a use case failure report with a reproduction rate. Once we have a sufficient statistical sample, the impact is clear.
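
As a sketch of such an automated decomposition, assuming hypothetical run_use_case() and reset_system() hooks into the system under test, a reproduction-rate loop could look like this:

    # Statistical-reproduction sketch: run one decomposed use case many
    # times from a clean state and report the reproduction rate.
    # run_use_case() and reset_system() are hypothetical hooks, named
    # here for illustration only.
    def measure_reproduction_rate(run_use_case, reset_system, runs=200):
        failures = 0
        for i in range(runs):
            reset_system()            # clean state between runs
            try:
                run_use_case()
            except Exception as exc:  # a real harness would match the
                failures += 1         # specific failure signature
                print(f"run {i}: failed with {exc!r}")
        rate = failures / runs
        print(f"reproduction rate: {failures}/{runs} ({rate:.1%})")
        return rate

With such a harness, the bug report can say “fails in N out of M clean runs” instead of “fails under hours of stress” – a much stronger position in the discussion above.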

Pinpointing whether the failure is related to time or to load is more complex, as we need to “play” with both factors in order to reach a conclusion about how much time, how much load, or what combination of the two is needed to bring the system to its failure point. Awareness of the possible options is an important tool in bug advocacy. For example, it can broaden a stakeholder’s perspective when you are able to say, “we’re not sure yet, but it is possible that we will see the failure in normal conditions after a long period of time.”
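
One way to “play” with the two factors is a small sweep that varies duration and load independently and records where the system first fails. The run_stress(minutes, clients) hook and the grid values below are assumptions for illustration:

    # Duration-vs-load sweep sketch: vary each factor independently to
    # see whether time, load, or only their combination triggers the
    # failure. run_stress(minutes, clients) is a hypothetical hook that
    # returns True on success and False on failure.
    import itertools

    def sweep(run_stress):
        durations = [15, 60, 270]    # minutes; 15 = the "typical" use case
        loads = [1, 10, 100]         # simulated concurrent clients
        results = {}
        for minutes, clients in itertools.product(durations, loads):
            ok = run_stress(minutes, clients)
            results[(minutes, clients)] = ok
            print(f"{minutes} min x {clients} clients -> "
                  f"{'pass' if ok else 'FAIL'}")
        return results

If only the long, heavy cells fail, you are likely in category 2 or 3; if even the short, light cells fail occasionally, you are back to category 1.
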
Doing complete research before reporting a stress failure can consume a lot of resources and time, so I don’t suggest delaying the report until the tester has all of the answers. Many times, we can reach faster and better conclusions about the failure through a focused code review or a debug-log analysis.

I would like to suggest the following: learn to classify your stress failures. When you see and report a stress failure, treat it as the start of classification and investigation. While sometimes the report will be enough to call for a bug fix, many times it will serve as a call for investigation. During the investigation, make clear to stakeholders what you already know and what you don’t know yet. Make sure that new findings are added to the bug, and don’t be afraid to change its title to reflect them.

There is much more to learn than the basics I summarized in this post. Learning more about stress in general, and about your specific system, can help you classify and investigate your stress failures and, no less important, plan your stress tests better.