Failover and Recovery Testing

Lecture



Failover and recovery testing evaluates the product's ability to withstand and recover successfully from failures caused by software errors, hardware faults, or communication problems (for example, a network outage). Its purpose is to verify the recovery systems (or the systems that duplicate the main functionality) that, in the event of a failure, ensure the safety and integrity of the product's data.

Failover and recovery testing is especially important for systems that operate around the clock (“24x7”). If you are building a product that will run, for example, on the Internet, you cannot do without this type of testing, because every minute of downtime or any data lost to an equipment failure can cost you money, customers, and market reputation.

The method is to simulate various failure conditions and then study and evaluate how the protective systems respond. These checks determine whether the required degree of system recovery was achieved after the failure occurred.
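The simulate-then-evaluate approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SimulatedFailure`, `process_batch`, and `run_with_recovery` are hypothetical names, the "failure" is an exception injected by the test rather than a real crash, and the checkpoint lives in memory, whereas a real system would persist it.

```python
class SimulatedFailure(Exception):
    """Stands in for a failure injected by the test harness (hypothetical)."""


def process_batch(items, checkpoint, fail_at=None):
    """Process items in order, recording progress in `checkpoint`.

    Raises SimulatedFailure just before the item at index `fail_at`.
    """
    for index in range(checkpoint["next"], len(items)):
        if index == fail_at:
            raise SimulatedFailure(f"injected failure at item {index}")
        checkpoint["done"].append(items[index].upper())
        checkpoint["next"] = index + 1
    return checkpoint["done"]


def run_with_recovery(items, fail_at):
    """Inject one failure, then resume from the last recorded checkpoint."""
    checkpoint = {"next": 0, "done": []}
    try:
        process_batch(items, checkpoint, fail_at=fail_at)
    except SimulatedFailure:
        # Recovery step: continue from the checkpoint with no fault injected.
        process_batch(items, checkpoint)
    return checkpoint["done"]
```

A test would then compare the recovered result with the result of an uninterrupted run: if the two match, the required degree of recovery was achieved for this scenario.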

For clarity, consider some scenarios for such testing and general methods for implementing them. In most cases, the objects of testing are the operational problems most likely to occur, such as:

  • Power failure on the server computer.
  • Power failure on the client computer.
  • Incomplete data processing cycles (an interrupted data filter, an interrupted synchronization).
  • Declaring or inserting impossible or erroneous elements in data arrays.
  • Failure of storage media.

These situations can be reproduced once development has reached the point where all the recovery or duplication systems are ready to perform their functions. Technically, the tests can be implemented in the following ways:

  • Simulate a sudden power failure on the computer (cut power to the computer).
  • Simulate loss of communication with the network (unplug the network cable, cut power to the network device).
  • Simulate media failure (cut power to external storage media).
  • Simulate the presence of incorrect data in the system (using a special test suite or database).
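When physically cutting power is impractical, a rough software approximation is to terminate the process abruptly in the middle of a transaction and then check what a restarted application sees. The sketch below assumes a hypothetical `orders` table in an SQLite file and uses `os._exit` in a child process as the simulated crash. It exercises transaction rollback only; it does not reproduce a true power loss, because operating system buffers survive a process kill.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Child workload (hypothetical): commit three rows, start a second
# transaction, then die abruptly without committing it.
CHILD_SCRIPT = """
import os, sqlite3, sys
conn = sqlite3.connect(sys.argv[1])
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("d",), ("e",)])
os._exit(1)  # simulated crash: no commit, no cleanup handlers run
"""


def run_crash_scenario(db_path):
    """Run the workload in a child process that terminates abruptly."""
    result = subprocess.run([sys.executable, "-c", CHILD_SCRIPT, db_path])
    return result.returncode


def count_rows_after_restart(db_path):
    """Reopen the database the way a restarted application would."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    finally:
        conn.close()


def crash_and_recover():
    """Return the child's exit code and the row count seen after restart."""
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, "orders.db")
        exit_code = run_crash_scenario(db_path)
        rows = count_rows_after_restart(db_path)
    return exit_code, rows
```

In this scenario the committed rows should survive and the two uncommitted rows should be gone after the restart. Network and media failures need their own tooling and are not covered by this sketch.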

Once the failure conditions have been reproduced and the recovery systems have produced their results, you can evaluate the product in terms of failure testing. In all of the cases above, once the recovery procedures are complete, the product data should reach a defined required state:

  • Data loss or corruption stays within acceptable limits.
  • A report or reporting system identifies the processes or transactions that were not completed because of the failure.
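Both criteria can be checked mechanically after a recovery run. The following is a minimal sketch, assuming transactions are identified by IDs and that the acceptable loss is expressed as a ratio; `RecoveryVerdict`, `evaluate_recovery`, and `max_loss_ratio` are illustrative names, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class RecoveryVerdict:
    lost: set          # submitted transactions missing after recovery
    unreported: set    # lost transactions the failure report did not mention
    loss_ratio: float  # share of submitted transactions that were lost
    passed: bool


def evaluate_recovery(submitted, recovered, reported_incomplete, max_loss_ratio):
    """Compare the post-recovery state with what was submitted before the failure."""
    submitted = set(submitted)
    lost = submitted - set(recovered)
    unreported = lost - set(reported_incomplete)
    loss_ratio = len(lost) / len(submitted) if submitted else 0.0
    # Pass only if the loss is within the limit and every loss was reported.
    passed = loss_ratio <= max_loss_ratio and not unreported
    return RecoveryVerdict(lost, unreported, loss_ratio, passed)
```

For example, if ten transactions were submitted, nine survived, and the failure report lists the tenth, a run with `max_loss_ratio=0.1` passes. The same run fails if the report is empty, because the lost transaction went unreported.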

Note that failover and recovery testing is highly product-specific. Test scenarios should be developed with all the features of the system under test in mind. Given how harsh the failure-injection methods are, it is also worth assessing whether this type of testing is feasible for a specific software product.

