Race condition in multi-threaded applications

Lecture



A race condition is a design error in a multi-threaded system or application in which the behavior of the system or application depends on the order in which parts of the code are executed. The error takes its name from a similar design error in electronic circuits (see signal race).

A race condition is an intermittent error (a heisenbug): it appears at random times and “disappears” when you try to isolate it.

Example

Consider the following sample code (in Java).

  volatile int x;
  // Thread 1:
  while (!stop)
  {
    x++;
    ...
  }
  // Thread 2:
  while (!stop)
  {
    if (x % 2 == 0)
      System.out.println("x = " + x);
    ...
  }

Let x = 0. Suppose the program runs in this order:

  1. The if statement in thread 2 checks whether x is even.
  2. The “x++” statement in thread 1 increments x by one.
  3. The output statement in thread 2 prints “x = 1”, even though the variable had apparently just been checked for evenness.
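Because the failure normally depends on timing, it is hard to observe on demand. The sketch below forces the three steps above to happen in exactly that order, using two latches, so the stale check shows up on every run. The class name ParityRaceDemo and its helper methods are hypothetical, introduced only for this illustration.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo: latches force the unlucky ordering from the list above.
final class ParityRaceDemo {
    static volatile int x = 0;

    // Returns the text that thread 2 would print.
    static String runForcedInterleaving() {
        x = 0;
        CountDownLatch checked = new CountDownLatch(1);     // thread 2 has tested x
        CountDownLatch incremented = new CountDownLatch(1); // thread 1 has changed x
        String[] printed = new String[1];

        Thread t1 = new Thread(() -> {
            awaitQuietly(checked);
            x++;                          // step 2: lands between the check and the use
            incremented.countDown();
        });
        Thread t2 = new Thread(() -> {
            if (x % 2 == 0) {             // step 1: x is 0, so the check passes
                checked.countDown();
                awaitQuietly(incremented);
                printed[0] = "x = " + x;  // step 3: x is read again and is now 1
            }
        });
        t1.start();
        t2.start();
        joinQuietly(t1);
        joinQuietly(t2);
        return printed[0];
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runForcedInterleaving()); // prints "x = 1"
    }
}
```

In real code nothing forces this ordering, which is exactly why the error appears only occasionally.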

Solutions

Local copy

The easiest solution is to copy the variable x to a local variable. Here is the corrected code:

  // Thread 2:
  while (!stop)
  {
    int cached_x = x;
    if (cached_x % 2 == 0)
      System.out.println("x = " + cached_x);
    ...
  }

Naturally, this method works only when there is a single variable and it can be copied atomically (in one machine instruction).

Synchronization

A more complicated but also more general solution is thread synchronization:

  int x;
  // Thread 1:
  while (!stop)
  {
    synchronized (SomeObject)
    {
      x++;
    }
    ...
  }
  // Thread 2:
  while (!stop)
  {
    synchronized (SomeObject)
    {
      if (x % 2 == 0)
        System.out.println("x = " + x);
    }
    ...
  }

Here the happens-before semantics of synchronized blocks make the volatile keyword unnecessary.
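A runnable version of the synchronized approach might look like the sketch below. The names SynchronizedParityDemo, LOCK and run are assumptions made for the example, and the loops are bounded instead of using a stop flag so that the program terminates. The reader records each value it would have printed, so the result can be checked afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: the check and the use of x happen under the same lock.
final class SynchronizedParityDemo {
    private static final Object LOCK = new Object();
    private static int x = 0; // not volatile: every access is guarded by LOCK

    // Runs a writer and a reader; returns every value the reader reported as even.
    static List<Integer> run(int increments, int samples) {
        synchronized (LOCK) {
            x = 0;
        }
        List<Integer> reported = new ArrayList<>(); // touched only by the reader thread
        Thread writer = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                synchronized (LOCK) {
                    x++;
                }
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < samples; i++) {
                synchronized (LOCK) {
                    if (x % 2 == 0) {
                        reported.add(x); // x cannot change between the test and this read
                    }
                }
            }
        });
        writer.start();
        reader.start();
        joinQuietly(writer);
        joinQuietly(reader);
        return reported;
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<Integer> reported = run(100_000, 100_000);
        boolean allEven = reported.stream().allMatch(v -> v % 2 == 0);
        System.out.println("all reported values even: " + allEven);
    }
}
```

Because the writer cannot enter its block while the reader holds the lock, every recorded value is even regardless of how the threads are scheduled.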

Combined method

Suppose there are two variables (so the volatile keyword does not help), and that thread 2 performs more complex processing in place of System.out.println. In this case both methods are unsatisfactory: the first because one variable can change while the other is being copied, the second because too much code ends up synchronized.

The two methods can be combined by copying the “dangerous” variables inside a synchronized block. This removes the single-machine-instruction restriction and also avoids overly large synchronized blocks.

  volatile int x1, x2;
  // Thread 1:
  while (!stop)
  {
    synchronized (SomeObject)
    {
      x1++;
      x2++;
    }
    ...
  }
  // Thread 2:
  while (!stop)
  {
    int cached_x1, cached_x2;
    synchronized (SomeObject)
    {
      cached_x1 = x1;
      cached_x2 = x2;
    }
    if ((cached_x1 + cached_x2) % 100 == 0)
      DoSomethingComplicated(cached_x1, cached_x2);
    ...
  }
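The combined method can be exercised with a small self-checking sketch. The class SnapshotDemo and the method countTornSnapshots are hypothetical names, and the “complicated processing” is reduced to a consistency check performed outside the lock. Since the writer always increments both variables together, a consistent snapshot must have equal values.

```java
// Hypothetical demo: copy both variables under the lock, process the copies outside it.
final class SnapshotDemo {
    private static final Object LOCK = new Object();
    private static int x1 = 0; // guarded by LOCK
    private static int x2 = 0; // guarded by LOCK

    // Returns how many snapshots saw x1 and x2 out of step (expected: 0).
    static int countTornSnapshots(int increments, int samples) {
        synchronized (LOCK) {
            x1 = 0;
            x2 = 0;
        }
        int[] torn = new int[1]; // written only by the reader thread
        Thread writer = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                synchronized (LOCK) {
                    x1++;
                    x2++;
                }
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < samples; i++) {
                int cachedX1;
                int cachedX2;
                synchronized (LOCK) { // short critical section: just the two copies
                    cachedX1 = x1;
                    cachedX2 = x2;
                }
                // Stand-in for the long processing step, done without holding the lock.
                if (cachedX1 != cachedX2) {
                    torn[0]++;
                }
            }
        });
        writer.start();
        reader.start();
        joinQuietly(writer);
        joinQuietly(reader);
        return torn[0];
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("torn snapshots: " + countTornSnapshots(100_000, 100_000));
    }
}
```

The lock is held only long enough to copy two integers, so the writer is never blocked by the processing step.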

There is no obvious way to detect and fix race conditions. The best way to avoid them is to design the multitasking system correctly.

The case of Therac-25

Therac-25 was the first medical device in the United States to entrust safety solely to software. The unit operated in three modes:

  1. Electron therapy: the electron gun irradiates the patient directly; the computer sets the electron energy between 5 and 25 MeV.
  2. X-ray therapy: the electron gun irradiates a tungsten target, and the patient is irradiated with X-rays that pass through a cone-shaped diffuser. In this mode the electron energy is fixed at 25 MeV.
  3. In the third mode there is no radiation. A steel reflector sits in the path of the electrons (in case of an accident), and the radiation is simulated by light. This mode is used to aim the beam precisely at the treatment site.

The three modes were selected by a rotating disk that carried an opening with deflecting magnets for electron therapy and a target with an X-ray diffuser. Because of a race condition between the control program and the keyboard handler, the disk was sometimes in the “electron therapy” position while the machine was in X-ray therapy mode, and the patient was irradiated directly by a 25 MeV electron beam, which resulted in an overdose. In this case the sensors reported a “zero dose”, so the operator could repeat the procedure, making matters worse. As a result, at least four patients died.

Some of the code was taken from Therac-6 and Therac-20. However, Therac-6 had no electron therapy mode (it produced X-rays only), and Therac-20 had hardware safety interlocks that prevented the beam from being turned on when the disk was in the wrong position.

Hacking by exploiting race conditions

There is a class of errors (and of attacks that exploit them) that allows an unprivileged program to influence the operation of other programs by modifying shared resources. These are usually temporary files (a “/tmp race” is a race condition in the temporary directory) that, through a programmer's error, are writable by all or some of the system's users.

An attacking program can destroy the contents of the file, causing the victim program to crash, or substitute its own data, forcing the victim to perform some action with the victim's privileges.

For this reason, software with serious security requirements, such as a web browser, names its temporary files using cryptographic-quality random numbers.
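In Java, one way to avoid choosing a predictable name yourself is to let the standard library create the file. The sketch below uses java.nio.file.Files.createTempFile, which generates the name and creates a new empty file in one step. The class TempFileDemo, the method writeToTempFile and the "demo-" prefix are hypothetical. How unpredictable the generated name is depends on the JDK in use, so treat cryptographic quality as an assumption to verify rather than a guarantee.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo: let the library pick the temporary file name and create the file.
final class TempFileDemo {
    // Writes data to a freshly created temporary file and returns its path.
    static Path writeToTempFile(String data) {
        try {
            // The library generates the name between the given prefix and suffix
            // and creates a new empty file in the default temporary directory.
            Path tmp = Files.createTempFile("demo-", ".tmp");
            Files.write(tmp, data.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = writeToTempFile("hello");
        System.out.println("wrote " + tmp);
        tmp.toFile().delete(); // clean up the demo file
    }
}
```

The important property is that the name is not chosen by the program in a guessable way and that the file is created fresh, so another user cannot prepare a file or link under that name in advance.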

See also

  • Therac-25
  • Semaphore (programming)
  • Mutex
  • Deadlock
  • ABA problem

Operating Systems and System Programming