Improving software reliability through redundancy

Lecture



In real-time software systems, high reliability requires detecting distortions as quickly as possible and restoring normal operation. Errors are inevitable in complex software systems, so the program execution process and data integrity must be checked regularly and automatically. Developers are expected to create reliable programs that withstand various adverse disturbances and maintain sufficient quality of results under actual operating conditions. The causes of distortions are unpredictable and diverse, so they do not have to be established immediately; the main task is to restore normal functioning as quickly as possible and limit the consequences of defects. Ensuring highly reliable operation therefore requires computing resources for the fastest possible detection of defect manifestations, together with automated measures that rapidly restore normal operation.
For these purposes, the following operational methods of improving reliability are used:
1. Time redundancy;
2. Information redundancy;
3. Software redundancy.
Time redundancy consists in using part of the computer's performance to monitor program execution and to restore (restart) the computational process. To this end, systems should be designed with a performance margin reserved for monitoring and for promptly improving operational reliability. The amount of time redundancy depends on the reliability requirements and ranges from 5-10% of processor performance to a 3-4 fold duplication of the performance of an individual machine in multiprocessor computing complexes.
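As an illustration of spending a reserved time margin on monitoring and restart, here is a minimal sketch in Python. The function name run_with_restart, its parameters, and the flaky_task example are hypothetical and are not taken from the lecture; a real real-time system would use its own scheduler and watchdog facilities.

```python
import time


def run_with_restart(task, max_attempts=3, time_budget_s=1.0):
    """Run task(); if it fails, restart it while attempts and time remain.

    Returns (result, attempt_number). The time budget plays the role of
    the performance margin reserved for control and recovery.
    """
    start = time.monotonic()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        if time.monotonic() - start > time_budget_s:
            break  # the reserved time margin is used up
        try:
            return task(), attempt
        except Exception as exc:  # a detected distortion of the computation
            last_error = exc
    raise RuntimeError("recovery failed within the time budget") from last_error


# Hypothetical task that fails on the first call and succeeds after a restart.
calls = {"n": 0}


def flaky_task():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ValueError("transient fault")
    return 42


result, attempts = run_with_restart(flaky_task)
```

In this sketch the restart does not try to establish why the first run failed; it only restores the computation, which matches the priority on fast recovery over diagnosis.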
Information redundancy is the duplication of the source and intermediate data processed by programs. It is used to preserve the integrity of the data that most affect the normal functioning of the software system and take considerable time to recover. Such data are protected by 2-3 fold duplication with periodic updating.
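A threefold duplication of this kind could look roughly like the following Python sketch. The class TripleStore and its methods are illustrative assumptions, not part of the lecture, and the sketch assumes the stored values are hashable so they can be counted.

```python
from collections import Counter


class TripleStore:
    """Keeps three copies of a critical value and repairs a damaged copy on read."""

    def __init__(self, value):
        self.copies = [value, value, value]

    def write(self, value):
        # Updating refreshes all copies at once.
        self.copies = [value, value, value]

    def read(self):
        # Majority vote: the value held by at least two copies wins.
        winner, votes = Counter(self.copies).most_common(1)[0]
        if votes < 2:
            raise ValueError("no two copies agree; data cannot be recovered")
        self.copies = [winner] * 3  # restore the distorted copy
        return winner


store = TripleStore(1500)
store.copies[1] = -1  # simulate a distortion of one copy
recovered = store.read()
```

With only two copies a mismatch can be detected but not resolved, which is one reason a third copy may be worth its cost for the most critical data.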
Software redundancy is used to monitor and ensure the reliability of the most important information-processing solutions. It consists in comparing the results of processing the same source data by programs that differ in the algorithms they use, and in eliminating the distortion when the results do not match. Software redundancy is also needed to implement the automatic control and data recovery programs that rely on information redundancy, and to run all the reliability tools that rely on time redundancy.
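The comparison of results from programs built on different algorithms can be sketched as follows. The example computes an integer square root in three independent ways and takes a majority vote; the choice of problem, the function names, and the deliberately faulty version are assumptions made only for illustration.

```python
import math
from collections import Counter


def isqrt_newton(n):
    """Integer square root by Newton's iteration."""
    if n < 2:
        return n
    x = n
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y


def isqrt_bisect(n):
    """Integer square root by bisection."""
    lo, hi = 0, n + 1  # invariant: lo*lo <= n < hi*hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def isqrt_faulty(n):
    """Hypothetical defective version, used here to simulate a distortion."""
    return math.isqrt(n) + 1


def vote(versions, n):
    """Run every version on the same input and return the majority result."""
    results = [version(n) for version in versions]
    winner, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        raise RuntimeError(f"no majority among results {results}")
    return winner, results


value, all_results = vote([isqrt_faulty, isqrt_newton, isqrt_bisect], 10)
```

Here the two correct versions outvote the faulty one, so the mismatch is both detected and eliminated. With only two versions, a mismatch could be detected but some other rule would be needed to choose the result.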
Operational control tools run after application and service programs have executed. As a result, they usually cannot detect the immediate cause of a distortion of the computational process or data, that is, the primary error; they record only the consequences of the primary distortion, that is, the secondary error. The consequences of the primary distortion can be catastrophic if it is detected and localized late.
To ensure reliability, defects must be detected with minimal delay and at minimal cost in computer resources. Therefore, hierarchical control schemes are used, in which several methods are applied in sequence, with progressively deeper control and higher cost, until the source of the distortion is identified. It is advisable to concentrate resources on the potentially most dangerous defects and on the recovery modes that occur fairly often: distortions of programs and data, performance overloads, and parallel execution of programs.
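One possible reading of such a hierarchical scheme is sketched below in Python: a cheap overall checksum is checked first, per-block checksums are consulted only if it fails, and an element-by-element comparison with a backup copy is done only inside the suspect block. The three levels, the block size, and all names are assumptions chosen for the example, not a scheme prescribed by the lecture.

```python
import zlib

BLOCK = 4  # hypothetical block size, chosen only for the example


def crc(items):
    """CRC-32 of the textual form of a list of items."""
    return zlib.crc32(repr(list(items)).encode())


def make_control_info(data):
    """Record control information while the data is known to be good."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return {
        "total": crc(data),
        "blocks": [crc(block) for block in blocks],
        "backup": list(data),
    }


def locate_distortion(data, info):
    """Return the index of the first distorted element, or None if intact."""
    # Level 1 (cheapest): one checksum over all the data.
    if crc(data) == info["total"]:
        return None
    backup = info["backup"]
    # Level 2: per-block checksums narrow the search to a single block.
    for b, expected in enumerate(info["blocks"]):
        start = b * BLOCK
        if crc(data[start:start + BLOCK]) != expected:
            # Level 3 (most expensive): compare elements with the backup copy.
            for i in range(start, min(start + BLOCK, len(backup))):
                if i >= len(data) or data[i] != backup[i]:
                    return i
    return len(backup)  # the data only has extra elements past the backup


data = list(range(10, 20))
info = make_control_info(data)
intact = locate_distortion(data, info)
data[6] = 99  # simulate a distortion of one element
bad_index = locate_distortion(data, info)
```

In the common case of intact data only the level 1 check runs, so the deeper and costlier levels are paid for only after a distortion has already been detected.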


