System stress and load testing

System stress testing is run after integration testing to shake out bugs before they become critical field problems. The main objective is to "burn in" the system in the lab environment. If the system is subjected to conditions harsher than those in the field, there is a good chance that all the show-stopper bugs will be caught before the system is deployed in the field.

System stress testing can be divided into the following steps:

Feature Interference Tests
Interference Load Tests
Stress Load Tests

Feature Interference Tests

End-to-end testing of features is generally covered fairly well during integration testing. However, interactions between features are usually left out at this stage. Feature interference testing fills this gap: it tests each feature offered by the system in the presence of every other feature.

The best way to develop feature interference tests is to produce a feature interference matrix. The feature interference matrix is discussed below.

Feature Interference Matrix

The feature interference matrix is produced by crossing each feature offered by the system with every feature, including itself. Draw a table with the list of all system features as both the first row and the first column. Each box in the table then corresponds to the cross of the two features given by its row and column, and a test should be developed for each box. The completed table is the feature interference test matrix.
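
Laying out the matrix is mechanical enough to script. The Python sketch below enumerates every ordered (row, column) pair, diagonal included, and assigns test IDs in the same "Test row-column" style as the example matrix. The function name build_interference_matrix is hypothetical, and each box would still need a hand-written test description.

```python
from itertools import product


def build_interference_matrix(features):
    """Map every ordered (row_feature, column_feature) pair to a test ID.

    The diagonal is included: crossing a feature with itself still needs a
    test, e.g. two simultaneous originating calls.
    """
    numbered = list(enumerate(features, start=1))
    return {
        (row, col): f"Test {r}-{c}"
        for (r, row), (c, col) in product(numbered, repeat=2)
    }


features = ["Originating Subscriber Call", "Terminating Subscriber Call", "Operator Commands"]
matrix = build_interference_matrix(features)
print(len(matrix))                                                   # 9 boxes for 3 features
print(matrix[("Terminating Subscriber Call", "Operator Commands")])  # Test 2-3
```

With N features the matrix has N x N boxes, so the six-feature switching system example that follows needs 36 tests.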

The feature interference matrix can be populated by considering the row and column features for each box. A test case is identified by assuming that the row feature executes first and the column feature starts executing while the row feature is still in progress. The following example should clarify how the matrix is populated. The matrix was produced by taking a sub-list of the features offered by a switching system.

Features	Originating Subscriber Call	Terminating Subscriber Call	Switching Processor Failure	Central Processor Failure	Subscriber Port Failure	Operator Commands
Originating Subscriber Call	Test 1-1: Verify handling of two originating calls from a single PBX.	Test 1-2: Verify that a terminating call is rejected if an originating call has been set up for the subscriber.	Test 1-3: Verify system behavior when the Switching processor fails after an originating call has been set up.	Test 1-4: Verify system behavior when the Central processor fails after an originating call has been set up.	Test 1-5: Verify handling of an originating call when the subscriber port fails.	Test 1-6: Verify handling of an originating call when the operator puts the subscriber port out-of-service.
Terminating Subscriber Call	Test 2-1: Verify that an originating call is rejected if a terminating call has been set up for the subscriber.	Test 2-2: Verify handling of two terminating calls to a single PBX.	Test 2-3: Verify system behavior when the Switching processor fails after a terminating call has been set up.	Test 2-4: Verify system behavior when the Central processor fails after a terminating call has been set up.	Test 2-5: Verify handling of a terminating call when the subscriber port fails.	Test 2-6: Verify handling of a terminating call when the operator puts the subscriber port out-of-service.
Switching Processor Failure	Test 3-1: Verify that an originating call cannot be set up when the Switching processor for the subscriber port fails.	Test 3-2: Verify that a terminating call cannot be set up when the Switching processor for the subscriber port fails.	Test 3-3: Verify handling of calls when multiple Switching processors fail at the same time.	Test 3-4: Verify system behavior when a Central processor fails after a Switching processor has already failed.	Test 3-5: Verify system behavior when a subscriber port fails while its Switching processor is down. The system should detect the port failure when the Switching processor recovers.	Test 3-6: Verify that operator commands for a failed Switching processor are rejected by the OMC.
Central Processor Failure	Test 4-1: Verify that an originating call can be set up when a Central processor has failed.	Test 4-2: Verify that a terminating call can be set up when a Central processor has failed.	Test 4-3: Verify system behavior when a Switching processor fails after a Central processor has already failed.	Test 4-4: Verify that no calls can be supported when a Central processor fails after another Central processor has already failed.	Test 4-5: Verify system behavior when a subscriber port fails after a Central processor has failed.	Test 4-6: Verify that operator commands for a failed Central processor are rejected by the OMC.
Subscriber Port Failure	Test 5-1: Verify that call setups are rejected on a failed port.	Test 5-2: Verify that call termination is rejected on a failed port.	Test 5-3: Reboot the Switching processor and verify that the number of failed ports on that Switching processor is the same before and after the reboot.	Test 5-4: Reboot the Central processor and verify that the number of failed ports in the system is the same before and after the reboot.	Test 5-5: Verify that simultaneous failure of two ports on the same Switching processor is handled correctly.	Test 5-6: Verify that a failed port can be put out-of-service by the operator.
Operator Commands	Test 6-1: Verify that a subscriber does not get dial tone and the originating call fails if the operator has put the subscriber port out-of-service.	Test 6-2: Verify that a terminating call fails if the operator has put the subscriber port out-of-service.	Test 6-3: Reboot the Switching processor while an operator command for a subscriber port on that Switching processor is in progress. Verify that the command failure is reported with the correct reason.	Test 6-4: Verify that all calls are cleared when one Central processor fails while the other one has already been put out-of-service by the operator.	Test 6-5: Verify that a subscriber port failure is handled even when an operator command is in progress for the same port.	Test 6-6: Verify that the system can handle simultaneous commands from two different operators for the same entity. The commands should be executed one after the other.

Interference Test Procedures

Once you have developed the feature interference matrix, define detailed test procedures from the matrix. The test procedures can be divided into two categories:

Simple tests involving initiating a row feature followed by a column feature
Load tests involving multiple instances of row and column features. These are discussed in the next section.
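
A simple interference test needs the column feature to start only after the row feature is under way. The Python sketch below, assuming both features can be driven from a script, coordinates this with two threading.Event objects. The names run_interference_test, originating_call and port_out_of_service are hypothetical stand-ins for hooks into the system under test; the example mimics Test 1-6.

```python
import threading


def run_interference_test(row_feature, column_feature, timeout_s=5.0):
    """Start row_feature, then run column_feature while row_feature is in progress.

    row_feature(in_progress, column_done) must set in_progress once it is
    under way, and may wait on column_done before completing.
    """
    in_progress = threading.Event()
    column_done = threading.Event()
    results = {}

    def row_worker():
        results["row"] = row_feature(in_progress, column_done)

    row_thread = threading.Thread(target=row_worker)
    row_thread.start()
    try:
        if not in_progress.wait(timeout_s):
            raise TimeoutError("row feature never reached its in-progress state")
        results["column"] = column_feature()   # starts while the row feature is active
    finally:
        column_done.set()                      # always let the row feature finish
        row_thread.join(timeout_s)
    return results


log = []


def originating_call(in_progress, column_done):   # hypothetical row feature
    log.append("call set up")
    in_progress.set()                  # the call is now established
    column_done.wait(5.0)              # hold the call while the column feature runs
    log.append("call released")
    return "released"


def port_out_of_service():             # hypothetical column feature
    log.append("port taken out-of-service")
    return "command accepted"


results = run_interference_test(originating_call, port_out_of_service)
print(results)
print(log)
```

Using events rather than fixed sleeps makes the ordering deterministic, so a failure points at the system under test rather than at test timing.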

Interference Load Tests

Interference load tests are identified from the feature interference matrix. In essence, you run load for several features simultaneously. Here, load does not just mean subscriber traffic; it can also mean repeatedly executing operator commands via a script or periodically rebooting boards.

Interference load tests are best explained by examples from the above matrix:

Run subscriber load (originating to terminating calls) and operator command load overnight.
Run subscriber load with periodic Central and Switching processor failures.
Run subscriber load and periodically inject faults in subscriber ports.
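
Interference load tests like these amount to running several independent load drivers side by side for hours. The Python sketch below assumes each load type can be expressed as a hypothetical callable that performs one unit of load (place a call, issue an operator command, inject a fault) and runs each in its own thread at a fixed period until a stop event fires. Failures are counted rather than allowed to end the run, so an overnight test keeps going and the counts can be reviewed afterwards.

```python
import threading
import time


def run_interference_load(drivers, duration_s):
    """Run several load drivers concurrently for duration_s seconds.

    drivers maps a name to (action, period_s). Each action is a hypothetical
    hook that performs one unit of load. Returns per-driver ok/failed counts.
    """
    stop = threading.Event()
    stats = {name: {"ok": 0, "failed": 0} for name in drivers}

    def loop(name, action, period_s):
        # Each thread updates only its own entry in stats.
        while not stop.is_set():
            try:
                action()
                stats[name]["ok"] += 1
            except Exception:
                stats[name]["failed"] += 1   # keep the load running; review later
            stop.wait(period_s)              # pause, but wake at once on stop

    threads = [
        threading.Thread(target=loop, args=(name, action, period_s))
        for name, (action, period_s) in drivers.items()
    ]
    for thread in threads:
        thread.start()
    time.sleep(duration_s)
    stop.set()
    for thread in threads:
        thread.join()
    return stats


# Placeholder actions; a real run would drive the load generator, OMC
# scripts and board reboots, with duration_s set to a whole night.
drivers = {
    "subscriber_calls": (lambda: None, 0.01),
    "operator_commands": (lambda: None, 0.05),
    "processor_reboots": (lambda: None, 0.2),
}
stats = run_interference_load(drivers, duration_s=0.5)
print(stats)
```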

Stress Load Tests

Stress load tests are the final step of system stress testing. Here the system is subjected to field-like conditions; in fact, the test conditions are harsher than what the system will have to handle after deployment.

Stress Load Testing Guidelines

Overload the system. During stress load tests, subject the system to harsher conditions than the field environment. This builds confidence that the system will run stably for extended periods: a weekend stress test under overload might give you confidence that the system would survive a month of regular operation.
Load test the system with a field-type traffic mix. Run a traffic mix that is close to the expected load under field conditions. Field traffic mix data can often be obtained from studies and papers on the subject.
Load test the system with traffic that varies over time. Once deployed in the field, the system will be subjected to large fluctuations in traffic, so simulate such fluctuations in the lab. Keep in mind that some bugs show up only with fluctuating traffic; most systems have bugs in handling both high-load and low-load conditions.
Load test the system with events that have random inter-arrival times. Run load such that the inter-arrival times of events are randomly distributed. This exercises code paths you might never have anticipated. To make random tests reproducible, seed the random number generator with a known value before the load test; feeding the same seed recreates the exact test conditions.
Load test the system with events that have random service times. Do not run load with a fixed call/session duration; use random session durations instead. For best results, use a load generator that produces Poisson arrivals (exponentially distributed inter-arrival times) and exponentially distributed service times.
Load test everything. Do not restrict load testing to subscriber load. Load testing fault conditions and operator commands helps expose memory leaks and other slowly building faults in those features.
Measure load performance. Load runs are not just for verifying system stability. Always measure and plot system performance throughout the load test, ideally with tools that graph performance over time.
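
Several of the guidelines above (field-type traffic mix, time-varying traffic, random inter-arrival and service times, a known seed) can be combined in one load schedule. The Python sketch below is one way to do this, not a prescribed design: it generates arrivals as a non-homogeneous Poisson process by thinning (candidates drawn at the peak rate, each kept with probability rate(t)/peak), draws exponential holding times, and picks each event's type from a weighted mix. The function name generate_load, the sinusoidal rate profile and the example mix are assumptions for illustration, not field data.

```python
import math
import random


def generate_load(seed, duration_s, peak_rate, mean_hold_s, mix, period_s=3600.0):
    """Return a reproducible list of (start_time_s, hold_time_s, event_type).

    Arrivals follow a non-homogeneous Poisson process whose rate swings
    between 0.2x and 1.0x of peak_rate over period_s, generated by thinning.
    Holding times are exponential with mean mean_hold_s; event types are
    drawn from the weighted traffic mix.
    """
    rng = random.Random(seed)              # private generator: same seed, same load
    types, weights = zip(*mix.items())

    def rate_at(t):
        # Assumed profile: a sinusoidal swing between quiet and busy periods.
        return peak_rate * (0.6 + 0.4 * math.sin(2 * math.pi * t / period_s))

    events = []
    t = 0.0
    while True:
        t += rng.expovariate(peak_rate)    # next candidate at the peak rate
        if t >= duration_s:
            return events
        if rng.random() < rate_at(t) / peak_rate:   # thinning step
            hold = rng.expovariate(1.0 / mean_hold_s)
            kind = rng.choices(types, weights=weights)[0]
            events.append((t, hold, kind))


# Assumed traffic mix, not field data.
mix = {"originating_call": 0.45, "terminating_call": 0.45, "operator_command": 0.10}
run_a = generate_load(seed=42, duration_s=600, peak_rate=5.0, mean_hold_s=120, mix=mix)
run_b = generate_load(seed=42, duration_s=600, peak_rate=5.0, mean_hold_s=120, mix=mix)
print(len(run_a), run_a == run_b)
```

To overload the system, choose peak_rate so that peak_rate * mean_hold_s, the peak offered load in Erlangs, exceeds the expected field busy-hour load; the example's 5 calls/s with a 120 s mean hold gives 600 Erlangs. Using a private random.Random(seed) rather than the module-level generator keeps the schedule reproducible even if other code also draws random numbers.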