3. Ensuring the reliability of complex software tools.

Lecture

3.1. Ensuring quality and reliability in the development of complex software tools

3.1.1. Requirements for technology and automation of complex software development

The content of the stages and work involved in creating components and software systems as a whole is defined in software life-cycle standards and models. The required reliability is built in and ensured at each stage and is finally confirmed by tests and documents when the stage is completed. To ensure software quality and reliability, the standards recommend formulating requirements for:

the object of development at the given stage: its software and information components, and the interfaces between them and with the external environment;

the process, technology and organization of work at each stage;

the methods and characteristics of work automation;

the methods and means of controlling, measuring and documenting the quality of the processes and of the results of the work performed.

Compliance with these requirements should be monitored at each stage. Quality indicators are recorded and compared with the specified values. When deviations from the requirements are detected, measures should be taken either to improve the actual indicators or to adjust the requirements for them.
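The comparison of recorded indicators with specified values can be sketched as follows. This is a minimal illustration only: the `Indicator` class, the indicator names and all numbers are hypothetical and are not taken from any standard.

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str
    required: float               # value specified in the requirements (hypothetical)
    measured: float               # value recorded at the current stage (hypothetical)
    higher_is_better: bool = True

def deviations(indicators):
    """Return the indicators whose measured value does not meet the requirement."""
    failed = []
    for ind in indicators:
        if ind.higher_is_better:
            ok = ind.measured >= ind.required
        else:
            ok = ind.measured <= ind.required
        if not ok:
            failed.append(ind)
    return failed

stage_report = [
    Indicator("availability", required=0.999, measured=0.9995),
    Indicator("mean_recovery_time_s", required=5.0, measured=7.2, higher_is_better=False),
]
failed_names = [ind.name for ind in deviations(stage_report)]
print(failed_names)
```

Each name in `failed_names` marks a deviation for which either the actual indicator has to be improved or the requirement has to be adjusted.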

The requirements for automation tools used to develop complex software systems (PS) are set out most fully in the IEEE 1209 standard. This standard formulates requirements for:

- the technological environment and CASE development tools,
- project management tools for complex PS,
- the means of maintaining technological and operational documentation for a set of programs,
- the analysis of the correctness and reliability of the software components included in the PS.

3.1.2. Planning and managing program quality and reliability assurance

Measures that ensure the reliability and safety of programs should cover the entire software life cycle. Planning and managing quality and reliability requires two categories of specialists. Specialists of the first category, who manage software quality assurance, should know the standards and methods that support the recording, control and documentation of quality indicators, and the management of actions that affect them, at all stages of program creation. They should ensure that all deviations from the specified quality indicators are identified and should analyze the possible consequences of those deviations.

Specialists of the second category directly create the components and the software as a whole with the specified quality and reliability indicators and prepare all the source and reporting documents.

This division of specialists provides independent and reliable quality control of the development results. For the monitoring of PS characteristics at intermediate stages to be purposeful, reference values are needed that the developers should strive to achieve. The software quality indicators are progressively refined and corrected as the customer and the developer interact, taking into account objective changes in the characteristics of the project.

The organizational basis for the quality management of PS is a plan for ensuring the specified quality indicators at all stages of the development of a set of programs. Guidance on quality assurance planning for PS is given in the ANSI/IEEE 983 standard. According to this standard, a plan for ensuring and managing the quality and reliability of a set of programs should cover the following:

quality management objectives, a nomenclature of quality indicators and requirements for their values;

methods for managing and achieving the specified quality values, the organization of the developers and the software development technology;

basic documents and standards used to ensure quality at all stages of development;

development automation tools used to achieve and measure the specified values of the quality indicators;

the structure and content of reporting documents certifying the achievement of a certain quality and reliability of the PS.

Having a quality assurance plan for the PS does not guarantee that the specified characteristics will be achieved. Constraints on the resources used during development, changes in the external environment and changes in customer requirements lead to deviations from the plan. These deviations should be recorded in a dedicated document and communicated to all specialists.

3.1.3. Resources needed to ensure the reliability of the software

The resources allocated for the development of PS are of several types:

allowable financial and economic costs;

professional staff;

computing resources.

The amounts of available resources are criteria that influence the choice of development methods and the quality and reliability of the PS that can be achieved. A distinction is made between the resources required to solve the main functional tasks of the PS directly and the resources required to ensure the reliability and safety of its operation. The ratio between these two kinds of resources depends on the complexity of the tasks and on the reliability and safety requirements for the information system as a whole. In different PS, the resources for ensuring reliability can range from 5–20% to 100–300% of the resources used to solve the functional tasks.
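The effect of these ratios on the total budget can be shown with a short calculation. This is an illustrative sketch: the function `total_resources` and the figure of 200 units are hypothetical, and only the percentages come from the text above.

```python
def total_resources(functional, reliability_percent):
    """Functional resources plus the reliability share, given as a percentage of them."""
    return functional * (100 + reliability_percent) / 100

functional = 200  # hypothetical budget for the functional tasks, e.g. person-months
for percent in (5, 20, 100, 300):
    print(percent, total_resources(functional, percent))
```

At the lower end of the range the reliability measures add only a small margin to the budget; at the upper end they cost several times more than the functional tasks themselves.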

3.2. Types of testing to ensure the reliability of software

Systematizing the types of testing and conducting them in an orderly sequence can significantly improve the reliability of complex software. Different types of testing are aimed at detecting particular classes of errors. For each type of testing, a procedure is developed that specifies the monitored parameters and the expected results. A rational testing sequence for complex real-time software systems can consist of the following types of testing:

Testing the completeness of the solution of functional tasks with typical input data. This type of testing is designed to detect defects in operation under normal conditions, as defined by the technical specifications for the base version of the PS.

Testing the functioning of programs in situations that are critical with respect to the conditions and logic of problem solving. It checks the execution of programs in emergency situations that occur rarely but are important for the reliability of data processing systems.

Testing to measure the achieved reliability of the base versions of the PS. This type of testing is intended to determine the main reliability indicators during the actual operation of the programs.

Testing the correctness of the use of the memory and performance of the computing system. This type of testing is used to assess the reliability of program execution when memory or processing capacity is overloaded.

Testing of parallel execution of programs is used to detect loss of reliability caused by uncoordinated use of source and intermediate data, or of computing system devices, when programs run in parallel.

Testing the effectiveness of protection against distortion of the source data is used to identify defects and errors in programs that show up when the data are false or distorted.

Testing to assess the effectiveness of protection against hardware failures and against undetected defects and errors in programs and data is used to check the quality of the software means of monitoring and of promptly recovering from various unforeseen distortions in the functioning of the PS.

Testing the convenience of operation and of human interaction with the PS is designed to detect errors in the display and use of source and resulting data that are difficult to formalize. During testing, the volume of the source data entered by a person, the convenience of presenting and checking them, the displayed result data, and the ease of analysis and reliability of use are evaluated.

Testing of the base versions of the PS when they are ported and configured for the equipment is used to detect errors that occur when the composition or characteristics of the components of the computing system or of the external environment change.
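Three of the testing types listed above (typical input data, critical situations, and distorted source data) can be illustrated on a single small function. This is only a sketch: `average_speed` and the helper `rejects` are hypothetical names invented for the example, and a real PS would use a full test procedure with documented parameters and expected results.

```python
import math

def average_speed(distance_km, time_h):
    """Hypothetical function under test: validates its inputs before computing."""
    for value in (distance_km, time_h):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            raise ValueError("distorted input")
    if distance_km < 0 or time_h <= 0:
        raise ValueError("input outside the permitted range")
    return distance_km / time_h

def rejects(*args):
    """Return True if the function refuses the given input with ValueError."""
    try:
        average_speed(*args)
    except ValueError:
        return True
    return False

# Typical input data: the function solves its task under normal conditions.
typical_ok = average_speed(120, 2) == 60
# Critical situation: a rarely occurring boundary condition (zero travel time).
critical_ok = rejects(120, 0)
# Distorted source data: false or corrupted values are rejected, not processed.
distorted_ok = rejects(float("nan"), 2) and rejects("120", 2) and rejects(-5, 1)
print(typical_ok, critical_ok, distorted_ok)
```

Each group of checks targets a different class of errors, which is why the groups are kept separate rather than merged into one test.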

3.3. Certification to ensure the reliability of software

Certification is a procedure for confirming that a product complies with specified requirements. Certification is performed to protect the interests of the user. The main purpose of software certification is to guarantee high quality, reliability and safety of use.

The international standard ISO / IEEE 0002 defines certification of conformity as an action by a third party demonstrating that there is the necessary assurance that a product, process or service conforms to a particular standard or other normative document. The result of this action is a certificate of conformity. Such a certificate usually remains valid either for a limited period or until the product is significantly modified.

Certification can be voluntary or mandatory. Mandatory certification is required for PS that perform critical functions, in which errors or failures can cause great damage or endanger human life and health. Voluntary certification is used to attest the quality of PS in order to increase their competitiveness, expand their scope of use and obtain additional economic benefits (larger production runs, a longer life cycle with many versions, tax relief for high quality, increased profit).

The following initial data should be prepared for certification tests:

criteria and clearly defined values of the quality indicators to be achieved;

the ranges of the source data and results within which the specified quality indicators must be satisfied;

the standards, regulatory documents and methods required for certification that allow the quality indicators, and the composition and values of the source and resulting data, to be measured reliably.

The organizational structure of a certification system usually includes the state certification body, departmental bodies that manage the certification of products of particular classes and purposes, and testing laboratories. The main functions of the state certification body are the organization and coordination of testing and certification, their scientific, methodological, informational, regulatory and technical support, and the accreditation of laboratories for certification testing within the powers of the certification body. The departmental certification bodies perform the same functions to a limited extent for specific product classes. Specialized certification laboratories are set up to conduct certification tests of PS. Such laboratories are independent of developers and customers and may have international, state, departmental or corporate status.

3.4. Improving the reliability of the software through redundancy

In real-time software systems, to ensure high reliability of their operation, it is necessary to detect distortions as quickly as possible and restore normal operation.

In complex software systems, errors are inevitable; therefore, regular automated verification of program execution and data integrity is necessary. Developers are required to create reliable programs that are resistant to various negative disturbances and can maintain a sufficient quality of results in actual operating conditions. The causes of distortions are unpredictable and diverse, so establishing them immediately is not the priority. The main task is to restore normal functioning as quickly as possible and to limit the consequences of defects. To ensure highly reliable PS operation, computing resources are required both for detecting the manifestation of defects as quickly as possible and for automated measures that rapidly restore normal PS operation. For these purposes, the following types of redundancy are used:

time redundancy

information redundancy

software redundancy.

Time redundancy (also called temporal redundancy or time reserve) consists in using part of the computer's performance to monitor the execution of programs and to restore (restart) the computational process. To this end, when an information system is designed, a performance margin should be provided, which is then used for monitoring and for promptly restoring reliable operation. The amount of time redundancy depends on the reliability requirements and ranges from 5–10% of the computer's performance to a three- to fourfold duplication of the performance of an individual machine in multiprocessor computing systems. The time reserve is used to monitor and detect distortions, to diagnose them and decide how to restore the computational process or the information, and to carry out the recovery operations.
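The restart of a computational process within a time reserve can be sketched as follows. The names `run_with_time_reserve` and `make_flaky`, the use of `RuntimeError` as the sign of a detected distortion, and the limits chosen are all assumptions made for illustration.

```python
import time

def run_with_time_reserve(task, reserve_s, max_restarts=3):
    """Run task; if it fails, restart it while the time reserve lasts.

    Returns (result, number_of_restarts). Re-raises the error once the
    restart limit or the time reserve is exhausted.
    """
    deadline = time.monotonic() + reserve_s
    restarts = 0
    while True:
        try:
            return task(), restarts
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts or time.monotonic() >= deadline:
                raise

def make_flaky(failures):
    """Build a hypothetical task that fails a given number of times, then succeeds."""
    state = {"left": failures}
    def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient distortion")
        return "ok"
    return task

result, restarts = run_with_time_reserve(make_flaky(2), reserve_s=1.0)
print(result, restarts)
```

The time spent on the failed attempts and restarts is exactly the performance margin that has to be planned for at design time.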

Information redundancy is the duplication of the accumulated source and intermediate data processed by programs. It is used to preserve the integrity of the data that most affect the normal functioning of the PS and that take considerable time to recover. Such data are protected by two- or threefold duplication with periodic updating.
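Threefold duplication with recovery by majority vote can be sketched as follows. The class `DuplicatedValue` is hypothetical, the copies are held in one list purely for brevity (real copies would live in separate storage), and stored values are assumed to be hashable.

```python
from collections import Counter

class DuplicatedValue:
    """Keeps several copies of a value and repairs them by majority vote."""

    def __init__(self, value, copies=3):
        self._copies = [value] * copies

    def write(self, value):
        # Periodic update: all copies are refreshed together.
        self._copies = [value] * len(self._copies)

    def read(self):
        value, votes = Counter(self._copies).most_common(1)[0]
        if votes <= len(self._copies) // 2:
            raise ValueError("no majority among the copies")
        # Restore the damaged copies from the majority value.
        self._copies = [value] * len(self._copies)
        return value

cell = DuplicatedValue(42)
cell._copies[1] = 999      # simulate the distortion of one copy
recovered = cell.read()
print(recovered)
```

With three copies, one distorted copy is outvoted and repaired; with only two copies a mismatch can be detected but not resolved without additional information.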

Software redundancy is used to check and ensure the reliability of the most important information processing results. It consists in comparing the results obtained from the same source data by programs that use different algorithms, and in eliminating the distortion when the results do not match. Software redundancy is also needed to implement the automatic control and data recovery programs that rely on information redundancy, and to operate all the reliability tools that rely on time redundancy.
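The comparison of results from programs that use different algorithms can be sketched with a majority vote over three versions of an integer square root. The function names are hypothetical, and `isqrt_faulty` is deliberately wrong in order to simulate a defective version.

```python
import math
from collections import Counter

def isqrt_binary_search(n):
    """Integer square root by binary search."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def isqrt_newton(n):
    """Integer square root by Newton's iteration."""
    if n < 2:
        return n
    x = n
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y

def isqrt_faulty(n):
    """Deliberately defective version, used here to simulate a distortion."""
    return math.isqrt(n) + 1

def vote(versions, n):
    """Run every version on the same input and return the majority result."""
    results = [version(n) for version in versions]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        raise RuntimeError("versions disagree and there is no majority")
    return value, results

value, results = vote([isqrt_binary_search, isqrt_newton, isqrt_faulty], 17)
print(value, results)
```

The vote masks the defective version only because the other two were built on independent algorithms; identical copies of one program would reproduce the same error.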

The means of operational program control are invoked after application and service programs have run. As a result, they usually cannot directly detect the cause of a distortion of the computational process or data and only record the consequences of the primary distortion, i.e. a secondary error. The results of a primary distortion can become catastrophic if its detection and localization are delayed. To ensure reliability, defects must be detected with minimal delay and, preferably, with minimal hardware resources. For this reason hierarchical control schemes are used, in which several methods are applied in sequence, with increasing depth of checking and increasing cost, until the source of the distortion is found. It is advisable to concentrate resources on the potentially most dangerous defects and on the most frequent recovery modes: distortions of programs and data, performance overloads, and parallel use of programs.
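A two-level hierarchical check can be sketched as follows: a cheap check over the whole data set detects that something is wrong, and a more detailed check is run only afterwards to localize the damage. The functions `make_reference` and `hierarchical_check` and the sample records are hypothetical; the per-record CRC merely stands in for a costlier diagnostic.

```python
import zlib

def make_reference(records):
    """Compute reference checksums while the data are known to be intact."""
    return {
        "total_crc": zlib.crc32(b"".join(records)),
        "record_crcs": [zlib.crc32(record) for record in records],
    }

def hierarchical_check(records, reference):
    """Return the indices of damaged records (an empty list if none are found)."""
    # Level 1: one comparison for the whole data set (detection).
    if zlib.crc32(b"".join(records)) == reference["total_crc"]:
        return []
    # Level 2: per-record comparison, run only after a distortion is detected
    # (localization).
    return [
        index
        for index, (record, crc) in enumerate(zip(records, reference["record_crcs"]))
        if zlib.crc32(record) != crc
    ]

records = [b"alpha", b"beta", b"gamma"]
reference = make_reference(records)
intact = hierarchical_check(records, reference)

records[1] = b"bexa"       # simulate the distortion of one record
damaged = hierarchical_check(records, reference)
print(intact, damaged)
```

In the common case where the data are intact, only the first level runs, so the cost of detailed diagnosis is paid only when a distortion has actually been detected.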
