2.3. TYPES OF FAILURES, FEATURES OF THEIR APPEARANCE AND DETECTION

Lecture



2.3.1. MAIN TYPES OF ERRORS AND PRINCIPAL APPROACHES TO THEM

All faults that for one reason or another occur in the PC or affect its operation are caused by errors that can be classified into the following main types:

- errors in the programs;
- erroneous operator actions;
- errors in the storage and transmission devices;
- equipment errors:

  • errors in the logic equipment,
  • control system errors,
  • faults in power and cooling systems.

Detecting errors in programs means that the system detects violations of formalized rules by the program, violations that lead to errors in the calculations. Examples include accessing invalid or prohibited addresses and encountering invalid operation codes, i.e., anything that can be formalized and built into the detection system as a requirement to verify. Obviously, such protection can identify only elementary errors in a program, because it is difficult to build a reasonably simple system that detects errors in the logic of solving the problem.

Errors in program logic are more easily detected by the programmers themselves, or by operators who run the program according to its instructions, than by the machine.
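
The formal checks described above can be illustrated with a minimal sketch. The toy machine below (its memory size, protected cells and opcode set) is entirely hypothetical and serves only to show how invalid addresses and invalid operation codes can be caught before an instruction is executed.

```python
# Hypothetical toy machine: 16 memory cells, three valid opcodes.
MEMORY_SIZE = 16
PROTECTED = range(0, 4)          # cells the program may not write to (assumed)
VALID_OPCODES = {"LOAD", "STORE", "HALT"}


class ProgramFault(Exception):
    """Raised when an instruction violates a formal rule."""


def check_instruction(opcode, address=None):
    """Verify the formal requirements before an instruction is executed."""
    if opcode not in VALID_OPCODES:
        raise ProgramFault(f"invalid operation code: {opcode}")
    if opcode == "HALT":
        return
    if address is None or not 0 <= address < MEMORY_SIZE:
        raise ProgramFault(f"invalid address: {address}")
    if opcode == "STORE" and address in PROTECTED:
        raise ProgramFault(f"write to prohibited address: {address}")


def find_faults(program):
    """Return a list of (index, message) for every formal violation."""
    faults = []
    for i, (opcode, address) in enumerate(program):
        try:
            check_instruction(opcode, address)
        except ProgramFault as exc:
            faults.append((i, str(exc)))
    return faults
```

Note that a program that loads from cell 5 when it should have loaded from cell 6 passes every check here: the instruction is formally valid, so the logical error goes unnoticed.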

Erroneous operator actions are difficult to predict. The operator can start the wrong program, not to mention “minor” errors such as pressing the wrong button or transferring control incorrectly. The difficulty is that operator errors are caused not so much by inattention as by fatigue at work and by the operator’s internal state.

Recent studies have clearly shown the need for special attention to the reliability of the human factor in control systems of varying complexity and purpose. The effectiveness of man-machine systems drops sharply as the operator’s ability to cope with assigned responsibilities decreases. The operator’s ability to perform the required functions accurately and on time over a given period is influenced by many factors, of which perhaps the most essential are the psychophysical characteristics that determine the operator’s condition. Therefore, eliminating operator errors depends both on creating optimal working conditions and on formalizing the operator’s actions, which makes it possible to introduce criteria for evaluating them. However, determining which part of the operator’s activity can be formalized for error detection remains an unsolved problem.

Errors in data that is written to memory and stored there are eliminated either by an error-correction scheme before recording or by restoring the information in memory after an error signal is received. To this end, the original information is kept for a certain time so that data distorted by an error can be corrected. In some machines, the information is stored with redundant bits, which make correction easier. Various codes are used for this purpose in computer memory devices.
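
As an illustration of storing information with redundant bits, here is a minimal sketch of the classic Hamming(7,4) code, which adds three parity bits to four data bits and can correct any single flipped bit. The text does not say which code a particular machine uses, so this choice is an assumption made only for the example.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits with 3 parity bits."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Bit positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return (data_bits, error_position)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3       # 0 means no error was detected
    if pos:
        c[pos - 1] ^= 1              # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], pos


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                       # simulate a single-bit memory error
data, position = hamming74_decode(stored)
print(data, position)                # [1, 0, 1, 1] 5
```

The syndrome (s1, s2, s3) read as a binary number gives the position of the damaged bit, which is why the parity bits sit at positions 1, 2 and 4.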

Errors in the transmission of information over communication channels are similar to errors in storage devices. They are either corrected during transmission (using special error-correcting codes) or the information is restored in memory, usually by retransmitting the data that was received with an error.
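
The retransmission method can be sketched as follows. The example assumes a CRC-32 checksum (computed with Python's standard zlib.crc32) for error detection; the channel model, the function names and the retry limit are hypothetical and chosen only for illustration.

```python
import random
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum (4 bytes, big-endian) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    payload, received_crc = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == received_crc:
        return payload
    return None


def noisy_channel(frame: bytes, rng: random.Random, error_rate: float) -> bytes:
    """Hypothetical channel: flips one random bit with probability error_rate."""
    if rng.random() >= error_rate:
        return frame
    damaged = bytearray(frame)
    bit = rng.randrange(len(damaged) * 8)
    damaged[bit // 8] ^= 1 << (bit % 8)
    return bytes(damaged)


def send_with_retransmission(payload, rng, error_rate=0.5, max_attempts=10):
    """Resend the frame until it arrives intact or the attempts run out."""
    frame = make_frame(payload)
    for attempt in range(1, max_attempts + 1):
        received = check_frame(noisy_channel(frame, rng, error_rate))
        if received is not None:
            return received, attempt
    raise IOError(f"frame still corrupted after {max_attempts} attempts")
```

Unlike a correcting code, the checksum here only detects the error; recovery relies on the sender still holding the original data so that it can be sent again.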

If an error occurs in the logic circuits of the machine and it is a single (transient) error, a restart is performed. If the error recurs or persists, the system is repaired or reconfigured, i.e. the faulty unit is excluded while the rest of the system continues to operate.
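
The restart-then-reconfigure policy can be expressed as a short sketch. The `LogicFault` exception, the idea of modelling units as callables and the `max_restarts` parameter are assumptions introduced for the example, not part of any real machine interface.

```python
class LogicFault(Exception):
    """Signals an error reported by the control circuits (assumed interface)."""


def run_with_recovery(units, task, max_restarts=1):
    """Run `task` on the first working unit.

    `units` maps a unit name to a callable. A single fault triggers a
    restart on the same unit; a repeated fault excludes that unit and
    the next one takes over (reconfiguration). Returns (result, log).
    """
    log = []
    for name, unit in units.items():
        for attempt in range(max_restarts + 1):
            try:
                result = unit(task)
            except LogicFault:
                log.append(f"{name}: fault on attempt {attempt + 1}")
                continue
            log.append(f"{name}: completed")
            return result, log
        log.append(f"{name}: excluded as faulty")
    raise RuntimeError("no working unit left; repair is required")
```

A transient fault therefore costs one restart, while a persistent fault moves the work to a spare unit and leaves the faulty one for repair.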

If errors occur in the control circuits themselves, the operator must choose the further mode of operation. If the calculations have to continue, the operator must remember that the machine is unprotected for that period.

Malfunctions of control circuits are of two types: a false error signal, and the absence of a signal when an error occurs in the monitored circuit.

Faults of the first type are detected in two ways: by stopping the device when an error is signaled and then analyzing the state of the monitored circuit to reach a conclusion, or by running a special test program that diagnoses the signals of the control circuit during its operation.

Malfunctions of the second type are more dangerous than false error signals. Therefore, control circuits are either checked periodically with a test program or, if a particular control circuit cannot be covered by such a check, duplicated.
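
A test program for a control circuit can be sketched as feeding it words that are known to be good and words that are known to be bad, then recording both kinds of malfunction. The example assumes the control circuit is an even-parity checker modelled as a function that returns True when it raises an error signal; all names are hypothetical.

```python
def parity_ok(word: int) -> bool:
    """Reference rule: a valid word has an even number of 1 bits."""
    return bin(word).count("1") % 2 == 0


def diagnose_checker(checker, good_words, bad_words):
    """Run known test words through a control circuit.

    `checker(word)` should return True when it raises an error signal.
    Returns the words that caused each of the two malfunction types.
    """
    false_alarms = [w for w in good_words if checker(w)]
    missed_errors = [w for w in bad_words if not checker(w)]
    return {"false_alarms": false_alarms, "missed_errors": missed_errors}


GOOD = [0b0000, 0b0011, 0b1111]      # even parity, no signal expected
BAD = [0b0001, 0b0111]               # odd parity, a signal is expected

healthy = lambda w: not parity_ok(w)
stuck_silent = lambda w: False       # second type: never signals an error

print(diagnose_checker(healthy, GOOD, BAD))
print(diagnose_checker(stuck_silent, GOOD, BAD))
```

The silent checker produces no false alarms at all, which is exactly why this type of malfunction goes unnoticed in normal operation and has to be found by deliberate testing or covered by duplication.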

Faults in the power, cooling, or mechanical systems of the machine can cause erroneous results, just like faults in the logic circuits. In this case, the machine must be stopped and the fault eliminated. Faults in power and cooling systems are detected by sensors and monitoring instruments. Faults in the mechanical devices of the machine are more difficult to detect, so the main guarantee of their serviceability is timely preventive maintenance that keeps these devices in sound technical condition.
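
Sensor-based monitoring of power and cooling amounts to comparing readings against permitted limits. The sketch below assumes a 5 % tolerance band around the nominal supply voltages and hypothetical temperature and fan-speed limits; real limits must be taken from the equipment documentation.

```python
# Illustrative limits only; real values come from the equipment documentation.
NOMINAL_VOLTS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
VOLT_TOLERANCE = 0.05      # assumed +/-5 % band around nominal
MAX_TEMP_C = 85.0          # hypothetical temperature limit
MIN_FAN_RPM = 500          # hypothetical minimum fan speed


def check_sensors(readings):
    """Return a list of fault messages for one set of sensor readings."""
    faults = []
    for rail, nominal in NOMINAL_VOLTS.items():
        volts = readings[rail]
        if abs(volts - nominal) > nominal * VOLT_TOLERANCE:
            faults.append(f"{rail} rail out of range: {volts:.2f} V")
    if readings["temp_c"] > MAX_TEMP_C:
        faults.append(f"overheating: {readings['temp_c']:.1f} C")
    if readings["fan_rpm"] < MIN_FAN_RPM:
        faults.append(f"fan too slow: {readings['fan_rpm']} rpm")
    return faults
```

An empty list means all monitored values are within limits; any entry is a reason to stop the machine and eliminate the fault.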

2.3.2. BASIC DIRECTIONS FOR TROUBLESHOOTING

Before troubleshooting, you must perform a series of actions that will localize the source of the error.

  1. Turn off the computer and all connected devices. Disconnect all external devices except the keyboard and monitor.
  2. Check that the computer is properly connected to the mains (power outlet).
  3. Check the correct connection of the keyboard and monitor. Turn on the monitor and set the brightness and contrast controls to the position of 2/3 of the maximum. In some monitors, these parameters are set using the buttons and the on-screen menu. A description of how to configure a monitor can be found in its documentation.
  4. If the computer boots from the hard disk, then check that there is no floppy disk in the drive. You can put into the drive a known working boot floppy or a floppy disk with a diagnostic program.
  5. Turn on the computer. Watch the fans of the power supply, processor and other components (if present), and note the front panel lights. If the fans do not rotate and the power indicator does not light, the problem is most likely in the power supply or the system board.
  6. Watch the power-on self-test (POST). If there are no problems, the system emits a single beep and starts booting. Non-fatal errors are reported as codes on the monitor screen. Fatal errors are signaled by beeps. The error codes and beep patterns depend on the BIOS in use.
  7. Wait for the successful launch of the operating system.
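
The observations made in steps 5 and 6 can be turned into a simple decision rule. The function below is only a sketch of that reasoning; its name, parameters and the wording of its results are hypothetical, and the meaning of a specific code or beep pattern still has to be looked up for the BIOS in use.

```python
def triage(fans_spin: bool, power_led: bool, beeps: int, screen_message: str = ""):
    """Map power-on observations to the next troubleshooting step."""
    if not fans_spin and not power_led:
        return "suspect the power supply or the system board"
    if screen_message:
        return f"non-fatal POST error: look up '{screen_message}' for this BIOS"
    if beeps == 1:
        return "POST passed: wait for the operating system to start"
    if beeps > 1:
        return f"fatal POST error: look up the {beeps}-beep pattern for this BIOS"
    return "inconclusive: repeat the checks from steps 1-4"


print(triage(fans_spin=False, power_led=False, beeps=0))
print(triage(fans_spin=True, power_led=True, beeps=1))
```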

POST issues

Errors during the power-on self-test are most often caused by an incorrect hardware configuration. If a POST error occurs, check the following:

  1. Are all cables connected correctly?
  2. Are the device parameters configured correctly in the BIOS?
  3. Are all devices properly installed?
  4. Are the switches and jumpers set correctly?
  5. Is there a device conflict, i.e. do any devices use the same system resources?
  6. Is the 110/220 V voltage switch on the power supply set correctly?
  7. Are all the boards installed correctly?
  8. Is the keyboard connected?
  9. Is a bootable hard disk installed?
  10. Does the BIOS support installed devices?
  11. Is there a boot diskette in the disk drive?
  12. Are the memory modules installed correctly?
  13. Is the operating system installed?

Hardware issues after boot

Sometimes problems arise after the system has booted, even though neither the hardware nor the software has been changed. To eliminate such errors, perform the following actions.

  1. Reinstall the software that causes the errors.
  2. Reset BIOS settings.
  3. Check cables, connectors, and other items that may have come loose.
  4. Use measuring instruments to check the computer's power supply. An unstable power supply can cause unexpected restarts, monitor flickering, or a complete freeze.
  5. Check that the memory modules are seated properly.

Software problems

Software (especially the latest versions) can cause errors. Most often this is due to incompatibility between software and hardware. Check the following:

  1. Does the system meet the minimum requirements of the software? The answer can be found in the documentation supplied with the program.
  2. Check that the program is installed correctly. Reinstall it if necessary.
  3. Check that the latest device drivers are installed.
  4. Check your system for viruses using up-to-date antivirus software.
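
The first check in the list, comparing the system against the program's minimum requirements, can be sketched as a simple comparison. The requirement names and all figures below are hypothetical; in practice they come from the program's documentation and from the system's own reports.

```python
def unmet_requirements(system, minimum):
    """Return {name: (have, need)} for every requirement the system fails."""
    return {
        name: (system.get(name, 0), need)
        for name, need in minimum.items()
        if system.get(name, 0) < need
    }


# Hypothetical figures, as they might appear in a program's documentation.
minimum = {"ram_mb": 512, "free_disk_mb": 2000, "cpu_mhz": 1000}
system = {"ram_mb": 256, "free_disk_mb": 8000, "cpu_mhz": 1400}

print(unmet_requirements(system, minimum))   # {'ram_mb': (256, 512)}
```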

Adapter issues

Problems with adapters most often arise from improper installation or incorrect allocation of resources (interrupt request line, direct memory access channel and I/O addresses). Also, do not forget to install the latest driver for the adapter that is supported by the operating system.
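
A resource conflict between adapters can be found by comparing their settings pairwise. The sketch below assumes each device is described by a dictionary with hypothetical keys for its interrupt, DMA channel and I/O address range; the device names and values are invented for the example.

```python
from itertools import combinations


def find_conflicts(devices):
    """Return a list of (device_a, device_b, resource) for every clash.

    Each device is a dict with assumed keys: "name", "irq" (int or None),
    "dma" (int or None) and "io" (an inclusive (start, end) address range).
    """
    conflicts = []
    for a, b in combinations(devices, 2):
        if a["irq"] is not None and a["irq"] == b["irq"]:
            conflicts.append((a["name"], b["name"], f"IRQ {a['irq']}"))
        if a["dma"] is not None and a["dma"] == b["dma"]:
            conflicts.append((a["name"], b["name"], f"DMA {a['dma']}"))
        (a_lo, a_hi), (b_lo, b_hi) = a["io"], b["io"]
        if a_lo <= b_hi and b_lo <= a_hi:        # the two ranges overlap
            conflicts.append((a["name"], b["name"], "I/O range"))
    return conflicts


devices = [
    {"name": "sound", "irq": 5, "dma": 1, "io": (0x220, 0x22F)},
    {"name": "network", "irq": 5, "dma": None, "io": (0x300, 0x31F)},
    {"name": "scsi", "irq": 11, "dma": None, "io": (0x310, 0x313)},
]
for conflict in find_conflicts(devices):
    print(conflict)
```

Here the sound and network adapters share an interrupt, and the network and SCSI adapters have overlapping I/O ranges; either conflict would be resolved by reassigning the resource on one of the two adapters.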


