Threads, Processes, Contexts / Thread Classification

Lecture

This article looks at the threading models implemented in modern operating systems (preemptive and cooperative threads). We will also take a brief look at how threads and synchronization tools are implemented in the Win32 API and POSIX Threads. Scripting languages may be more popular on Habr, but everyone should know the basics ;)

Threads, processes, contexts

System call (syscall). You will meet this concept quite often in this article, but despite the imposing name, its definition is quite simple :) A system call is a call from a user application into a kernel function. Kernel mode is code that runs in the processor's protection ring zero (ring0) with maximum privileges. User mode is code that runs in the processor's third protection ring (ring3) with lower privileges. If code in ring3 uses one of the prohibited instructions (for example, rdmsr/wrmsr, in/out, or an attempt to read the cr3 or cr4 register), a hardware exception occurs, and in most cases the user process whose code the processor was executing is terminated. A system call performs the transition from user mode to kernel mode by executing the syscall/sysenter instruction, int 2Eh on Windows 2000, int 80h on Linux, and so on.
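To make the user-to-kernel transition concrete, here is a minimal sketch that enters the kernel through the generic syscall() wrapper. It assumes Linux with glibc; the function name raw_getpid is made up for this illustration, and which instruction the wrapper actually executes depends on the architecture.

```c
#define _GNU_SOURCE        /* exposes syscall() in glibc headers */
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Enter the kernel through the generic syscall() wrapper and return our PID.
 * raw_getpid is a hypothetical helper name used only in this sketch. */
long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0);       /* the getpid system call does not fail on Linux */
    return pid;
}
```

Calling raw_getpid() should give the same value as the ordinary getpid() library function, which ends up making the same kernel call.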

So, what is a thread? A thread is an operating system entity: the process of executing a set of instructions, or more precisely program code, on a processor. The general purpose of threads is the parallel execution of two or more different tasks on a processor. As you can guess, threads were the first step towards a multitasking OS. The OS scheduler, guided by thread priority, distributes time slices among threads and puts them on the processor.

Alongside the thread there is another entity, the process. A process is nothing more than an abstraction that encapsulates all of its resources (open files, memory-mapped files, ...) and their descriptors, its threads, and so on. Each process has at least one thread. Each process also has its own virtual address space and execution context, and the threads of one process share the process address space.
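The difference between sharing an address space and owning one can be shown with a small sketch. It assumes a POSIX system; the function and variable names are invented for the example. A thread writes to a global variable and its creator sees the change, while a child process created with fork() changes only its own copy.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value;   /* lives in the process address space */

static void *writer_thread(void *arg)
{
    (void)arg;
    shared_value = 42;     /* same address space as the creator */
    return NULL;
}

/* Value the creator sees after a thread of the same process wrote 42. */
int value_after_thread_write(void)
{
    pthread_t t;

    shared_value = 0;
    if (pthread_create(&t, NULL, writer_thread, NULL) != 0)
        return -1;
    int rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return shared_value;
}

/* Value the parent sees after a child process wrote 42 to its own copy. */
int value_after_child_write(void)
{
    shared_value = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_value = 42; /* modifies the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return shared_value;
}
```

Under these assumptions the first function returns 42 and the second returns 0.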

Every thread, like every process, has its own context. A context is a structure in which the following elements are stored:

CPU registers.
Pointer to a thread / process stack.

Note also that when a thread makes a system call and moves from user mode to kernel mode, its stack changes from the user stack to the kernel stack. When switching from a thread of one process to a thread of another, the OS also updates the processor registers responsible for the virtual memory mechanisms (for example, CR3), since different processes have different virtual address spaces. I deliberately do not go into kernel-mode details here, since such things are specific to each individual OS.

In general, the following recommendations hold:

If your task requires intensive parallelization, use threads of a single process instead of multiple processes. Switching a process context is much slower than switching a thread context.
When using threads, try not to overuse synchronization tools that require system calls into the kernel (for example, mutexes). Switching to kernel mode is a costly operation!
If you are writing code that runs in ring0 (for example, a driver), try to avoid using additional threads, since switching thread context is an expensive operation.
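As an illustration of the second recommendation, a shared counter can often be updated with C11 atomics instead of a kernel-backed lock. This is only a sketch: the names and the thread and iteration counts are arbitrary, and whether atomics are actually faster than a given mutex implementation depends on the platform and on contention.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

static atomic_long counter;

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
        atomic_fetch_add(&counter, 1);   /* done entirely in user mode */
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be created. */
long count_with_atomics(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0, failed = 0;

    atomic_store(&counter, 0);
    while (started < NUM_THREADS) {
        if (pthread_create(&threads[started], NULL, bump, NULL) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return failed ? -1 : atomic_load(&counter);
}
```

With no lost updates, the result equals NUM_THREADS * INCREMENTS.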

A fiber is a lightweight thread that runs in user mode. A fiber requires significantly fewer resources, and in some cases it minimizes the number of system calls and therefore improves performance. Fibers usually run in the context of the thread that created them and only require saving the processor registers on a switch. Be that as it may, fibers never became popular. They were implemented at one time in various BSD systems but were eventually dropped. The Win32 API also implements a fiber mechanism, but it is used only to ease porting of software written for other operating systems. Note that fibers are switched either by a process-level scheduler or by the application itself, in other words manually :)
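Manual fiber switching can be sketched with the ucontext functions (getcontext, makecontext, swapcontext). This is not the Win32 fiber API; it assumes a POSIX system such as Linux with glibc, where these functions are available although newer POSIX editions mark them obsolescent. All names and the stack size are choices made for this example. Two fibers hand control to each other explicitly and record the order in which they run.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define FIBER_STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, fiber_ctx[2];
static char fiber_stack[2][FIBER_STACK_SIZE];
static char trace[8];
static size_t trace_len;

static void record(char c)
{
    if (trace_len + 1 < sizeof trace)
        trace[trace_len++] = c;
}

static void fiber_a(void)
{
    record('A');
    swapcontext(&fiber_ctx[0], &fiber_ctx[1]);   /* hand control to B */
    record('a');
    swapcontext(&fiber_ctx[0], &fiber_ctx[1]);   /* B finishes; A is not resumed */
}

static void fiber_b(void)
{
    record('B');
    swapcontext(&fiber_ctx[1], &fiber_ctx[0]);   /* hand control back to A */
    record('b');
}   /* returning resumes uc_link, that is, main_ctx */

/* Runs both fibers on the calling thread and returns the recorded order. */
const char *run_fibers(void)
{
    memset(trace, 0, sizeof trace);
    trace_len = 0;
    for (int i = 0; i < 2; i++) {
        int rc = getcontext(&fiber_ctx[i]);
        assert(rc == 0);
        (void)rc;
        fiber_ctx[i].uc_stack.ss_sp = fiber_stack[i];
        fiber_ctx[i].uc_stack.ss_size = sizeof fiber_stack[i];
        fiber_ctx[i].uc_link = &main_ctx;
    }
    makecontext(&fiber_ctx[0], fiber_a, 0);
    makecontext(&fiber_ctx[1], fiber_b, 0);
    swapcontext(&main_ctx, &fiber_ctx[0]);
    return trace;
}
```

The recorded order is "ABab": each fiber runs only when another one switches to it, and nothing interrupts a fiber between its own swapcontext calls.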

Thread classification

Since the classification of threads is an ambiguous question, I propose to classify them as follows:

By mapping to the kernel: 1:1, N:M, N:1.
By multitasking model: preemptive multitasking, cooperative multitasking.
By implementation level: kernel mode, user mode, hybrid implementation.

Thread classification by mapping to kernel mode

As I mentioned, threads can be created not only in kernel mode but also in user mode. An OS can have several thread schedulers:

The central kernel-mode OS scheduler, which distributes time among all threads in the system.
The thread library scheduler. A user-mode thread library can have its own scheduler, which distributes time among the threads of different user-mode processes.
The process thread scheduler. The fibers we have already looked at are scheduled in exactly this way. For example, every Mac OS X process written with the Carbon library has its own Thread Manager.

So, the 1:1 model is the simplest one. Under it, any thread created in any process is managed directly by the OS kernel scheduler. That is, each user process thread maps to exactly one kernel thread. This model has been implemented in Linux since the 2.6 kernel, and also in Windows.
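On Linux the 1:1 mapping can be observed directly: every pthread has its own kernel thread ID, while all threads of a process report the same process ID. The sketch below is Linux-specific (it uses the gettid system call through syscall()), and the struct and function names are made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>   /* SYS_gettid */
#include <unistd.h>

struct ids {
    long pid;   /* process ID as seen from the thread */
    long tid;   /* kernel thread ID of the thread */
};

static void *report_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = (long)getpid();
    out->tid = syscall(SYS_gettid);
    return NULL;
}

/* Fills *out with the IDs observed inside a new thread; returns 0 on success. */
int ids_in_new_thread(struct ids *out)
{
    pthread_t t;

    assert(out != NULL);
    if (pthread_create(&t, NULL, report_ids, out) != 0)
        return -1;
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

Comparing the result with getpid() and syscall(SYS_gettid) in the calling thread should show an equal pid and a different tid.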

The N:M model maps N user process threads onto M kernel-mode threads. Simply put, it is a hybrid system in which some threads are scheduled by the OS scheduler while most are scheduled by the process thread scheduler or the thread library. GNU Portable Threads is one example. This model is rather difficult to implement, but it performs better, since a significant number of system calls can be avoided.

The N:1 model. As you have probably guessed, many user process threads are mapped to a single OS kernel thread. Fibers are an example.

Thread classification by multitasking model

In the days of DOS, when single-tasking operating systems ceased to satisfy users, programmers and architects set out to implement a multitasking OS. The simplest solution was this: take the total number of threads, define some minimum interval for executing one thread, and divide the execution time equally among all the sibling threads. This is how the concept of cooperative multitasking appeared: all threads run in turn, with equal execution time, and no other thread can preempt the currently running one. This very simple and obvious approach was used in all versions of Mac OS before Mac OS X, and in Windows before Windows 95 and Windows NT. Cooperative multitasking is still used in Win32 to run 16-bit applications. Also for compatibility, it is used by the Thread Manager in Carbon applications on Mac OS X.
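The idea of running threads strictly in turn, with no preemption, can be sketched as a toy round-robin loop. This is not how any particular OS implemented it: the task structure and function names are invented for the example, and a task "yields" here simply by returning from its slice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct task {
    char name;        /* label written to the trace on every slice */
    int  steps_left;  /* slices of work still to do */
};

/* One slice of work. The task gives up the CPU simply by returning. */
static void run_slice(struct task *t, char *trace, size_t cap)
{
    size_t len = strlen(trace);
    if (len + 1 < cap)
        trace[len] = t->name;
    t->steps_left--;
}

/* Runs every unfinished task in turn until all of them are done. */
void run_cooperative(struct task *tasks, size_t n, char *trace, size_t cap)
{
    int any_ran = 1;

    assert(cap > 0);
    memset(trace, 0, cap);
    while (any_ran) {
        any_ran = 0;
        for (size_t i = 0; i < n; i++) {
            if (tasks[i].steps_left <= 0)
                continue;
            run_slice(&tasks[i], trace, cap);
            any_ran = 1;
        }
    }
}
```

For tasks A, B, and C with 2, 3, and 1 slices of work, the trace reads "ABCABB". A task that never returned from run_slice would stall everything else, which is the weakness of the cooperative model.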

Over time, however, cooperative multitasking proved inadequate. The volume of data stored on hard drives grew, and network transfer speeds increased as well. It became clear that some threads should have a higher priority, such as threads servicing device interrupts, processing synchronous I/O operations, and so on. At that point every thread and process in the system acquired a property called priority. You can read more about thread and process priorities in the Win32 API in Jeffrey Richter's book; we will not dwell on it here ;) Thus, a thread with a higher priority can preempt a thread with a lower one. This principle became the basis of preemptive multitasking. All modern operating systems now use this approach, with the exception of user-mode fiber implementations.
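The priority rule described above can be reduced to two small decisions: which ready thread runs next, and whether a newly ready thread displaces the running one. The sketch below is a deliberately simplified model, not the algorithm of any real scheduler; the structure and function names are hypothetical, and larger numbers are assumed to mean higher priority.

```c
#include <assert.h>
#include <stddef.h>

struct thread_info {
    int priority;   /* larger value means higher priority (assumption) */
    int ready;      /* non-zero if the thread can run */
};

/* Index of the ready thread with the highest priority, or -1 if none is ready. */
int pick_next(const struct thread_info *threads, size_t n)
{
    int best = -1;

    assert(threads != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!threads[i].ready)
            continue;
        if (best < 0 || threads[i].priority > threads[best].priority)
            best = (int)i;
    }
    return best;
}

/* A thread that becomes ready preempts the running one only if its
 * priority is strictly higher. */
int should_preempt(const struct thread_info *running,
                   const struct thread_info *woken)
{
    return woken->ready && woken->priority > running->priority;
}
```

In a real kernel this decision is triggered by timer interrupts and wake-up events rather than by an explicit call.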

Thread classification by implementation level

As we have already discussed, the thread scheduler can be implemented at different levels. So:

Thread implementation at the kernel level. Simply put, this is the classic 1:1 model. This category includes:
- Win32 threads.
- The Linux POSIX Threads implementation, the Native POSIX Thread Library (NPTL). Before kernel version 2.6, pthreads on Linux was implemented entirely in user mode (LinuxThreads). LinuxThreads implemented the 1:1 model as follows: when creating a new thread, the library made the clone system call and created a new process that nevertheless shared a single address space with its parent. This caused many problems. For example, threads had different process identifiers, which contradicted parts of the POSIX standard concerning the scheduler, signals, and synchronization primitives. The thread preemption model also misbehaved in many cases, so it was decided to move pthread support into the kernel. Two efforts in this direction were started at once, by IBM and by Red Hat. IBM's implementation (NGPT) never became popular and was not included in any distribution, because IBM suspended further development and support of the library. NPTL later became part of glibc.
- Lightweight kernel threads (Light Weight Kernel Threads, LWKT), for example in DragonFlyBSD. They differ from other kernel-mode threads in that lightweight kernel threads can preempt other kernel threads. DragonFlyBSD has many kernel threads, such as a hardware interrupt service thread, a software interrupt service thread, and so on. All of them run with a fixed priority, so an LWKT can preempt them. These are rather specific matters that one could discuss endlessly, but I will give two more examples. In Windows, all kernel threads run either in the context of the thread that initiated the system call or I/O operation, or in the context of a thread of the System process. Mac OS X has an even more interesting design. The kernel has only the notion of a task. All kernel operations are performed in the context of kernel_task. Handling a hardware interrupt, for example, happens in the context of the driver thread that services that interrupt.
Thread implementation in user mode. Since a system call and a context switch are fairly heavy operations, the idea of implementing threads in user mode has been in the air for a long time. Many attempts have been made, but the technique never became popular:
- GNU Portable Threads, a POSIX Threads implementation in user mode. Its main advantage is high portability; in other words, it can easily be ported to other operating systems. The thread preemption problem was solved very simply in this library: its threads are not preempted :) And of course there can be no talk of multiprocessing. This library implements the N:1 model.
- Carbon Threads, which I have already mentioned more than once, and RealBasic Threads.
Hybrid implementation. An attempt to combine the advantages of the first and second approaches, but as a rule such mutants have far more disadvantages than advantages. One example is the POSIX Threads implementation on NetBSD, which used the N:M model and was later replaced with a 1:1 system. For more details, see the paper Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism.

Win32 API Threads

If you are not tired yet, here is a short overview of the API for working with threads and synchronization tools in the Win32 API. If you already know the material, feel free to skip this section ;)

Threads in Win32 are created with the CreateThread function, which takes a pointer to a function (let's call it the thread function) that will run in the new thread. A thread is considered finished when its thread function returns. If you need to force a thread to finish, you can use the TerminateThread function, but do not abuse it! This function "kills" the thread, and does not always do so cleanly. The ExitThread function is called implicitly when the thread function returns, or you can call it yourself. Its main job is to free the thread's stack and its handle, that is, the kernel structures that serve this thread.

A thread in Win32 can be in a suspended state. You can suspend a thread by calling SuspendThread and wake it by calling ResumeThread; you can also create a thread already suspended by passing the CREATE_SUSPENDED flag to CreateThread. Do not be surprised if you do not find this functionality in cross-platform libraries such as boost::thread and Qt. The reason is simple: pthreads does not support it.

Synchronization tools in Win32 come in two kinds: those implemented at the user level and those implemented at the kernel level. The first kind is the critical section; the second includes mutexes, events, and semaphores.

Critical sections are a lightweight synchronization mechanism that works at the user process level and does not use heavy system calls. It is based on mutual exclusion locks, or spin locks. A thread that wants to protect certain data from race conditions calls EnterCriticalSection or TryEnterCriticalSection. If the critical section is free, the thread takes it; if not, EnterCriticalSection blocks the thread (that is, it does not run and consumes no processor time) until another thread releases the section by calling LeaveCriticalSection, while TryEnterCriticalSection returns immediately without taking it. These functions are atomic, so you need not worry about the integrity of your data ;)
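The spin lock idea mentioned above can be sketched in portable C11 with an atomic_flag. This is not the Win32 CRITICAL_SECTION implementation: the names are invented for the example, and unlike a real critical section this lock only busy-waits and never puts the waiting thread to sleep. POSIX threads are used to exercise it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define SPIN_THREADS 4
#define SPIN_ROUNDS  50000

static atomic_flag section = ATOMIC_FLAG_INIT;
static long guarded_counter;   /* plain variable, protected only by the lock */

static void spin_enter(void)
{
    while (atomic_flag_test_and_set_explicit(&section, memory_order_acquire))
        ;   /* busy-wait in user mode until the holder leaves */
}

static void spin_leave(void)
{
    atomic_flag_clear_explicit(&section, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < SPIN_ROUNDS; i++) {
        spin_enter();
        guarded_counter++;     /* the protected region */
        spin_leave();
    }
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be created. */
long count_with_spin_lock(void)
{
    pthread_t threads[SPIN_THREADS];
    int started = 0, failed = 0;

    guarded_counter = 0;
    while (started < SPIN_THREADS) {
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return failed ? -1 : guarded_counter;
}
```

Because only one thread at a time is inside the protected region, the result equals SPIN_THREADS * SPIN_ROUNDS.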

Much has been said about mutexes, events, and semaphores, so I will not go into detail. Note that all these mechanisms share common features:

They use kernel primitives, that is, system calls, which hurts performance.
They can be named or unnamed; each such synchronization object can be given a name.
They work at the system level rather than the process level, so they can serve as an interprocess communication (IPC) mechanism.
A single function family is used to wait for and acquire the primitive: WaitForSingleObject / WaitForMultipleObjects.

POSIX Threads, or pthreads

It is hard to imagine a *nix-like OS that does not implement this standard. Note that pthreads is also used in various real-time operating systems (RTOS), so the requirements on this library (or rather, the standard) are stricter. For example, a pthreads thread cannot be suspended. There are also no events in pthreads, but there is a much more powerful mechanism, condition variables, which more than covers every need.

Let's talk about the differences. For example, a thread in pthreads can be canceled, that is, simply removed from execution with the pthread_cancel call, while it is blocked at a cancellation point: waiting on a condition variable, inside a pthread_join call (the calling thread blocks until the thread the function was called for has finished), and so on. There are separate calls for working with mutexes and semaphores, such as pthread_mutex_lock / pthread_mutex_unlock.
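Cancellation and joining can be sketched as follows. The example assumes the default deferred cancellation mode, in which a request takes effect when the target thread reaches a cancellation point such as sleep(); the function names are made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static void *sleeper(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);          /* sleep() is a cancellation point */
    return NULL;           /* not reached */
}

/* Returns 1 if the joined thread reports that it was canceled. */
int cancel_and_join(void)
{
    pthread_t t;
    void *result = NULL;

    if (pthread_create(&t, NULL, sleeper, NULL) != 0)
        return -1;
    if (pthread_cancel(t) != 0)          /* request cancellation */
        return -1;
    int rc = pthread_join(t, &result);   /* block until the thread is gone */
    assert(rc == 0);
    (void)rc;
    return result == PTHREAD_CANCELED;
}
```

A canceled thread delivers the special value PTHREAD_CANCELED to whoever joins it, which is how the caller can tell cancellation from a normal return.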

Condition variables (cv) are usually used together with mutexes in more complex cases. While a mutex simply blocks a thread until another thread releases it, a cv lets a thread block itself until some unblocking condition occurs. For example, the cv mechanism helps emulate events in a pthreads environment. The pthread_cond_wait call waits until the thread is notified that a specific event has occurred. pthread_cond_signal notifies one thread in the queue that the cv has fired. pthread_cond_broadcast notifies all threads that called pthread_cond_wait that the cv has fired.
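One possible way to emulate an event on top of a mutex and a condition variable is sketched below. The struct event type and the event_* functions are hypothetical names for this example, not part of pthreads, and the behavior is loosely modeled on an event that stays signaled once set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
};

void event_init(struct event *e)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

void event_wait(struct event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_set(struct event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);    /* wake every waiting thread */
    pthread_mutex_unlock(&e->lock);
}

static struct event ready;
static int observed;

static void *waiter(void *arg)
{
    int *payload = arg;
    event_wait(&ready);                  /* blocks until event_set() is called */
    observed = *payload;
    return NULL;
}

/* Returns the value the waiter saw after the event fired, or -1 on failure. */
int wait_for_event_demo(void)
{
    pthread_t t;
    int payload = 0;

    event_init(&ready);
    observed = -1;
    if (pthread_create(&t, NULL, waiter, &payload) != 0)
        return -1;
    payload = 7;                         /* written before the event is set */
    event_set(&ready);
    int rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return observed;
}
```

The flag checked in a loop is what makes this reliable: the waiter does not miss a notification sent before it started waiting, and it ignores spurious wakeups.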

What are threads?

To structure your understanding of what threads are (the word is translated into Russian as "нити" almost everywhere except in books on the Win32 API, where it is translated as "потоки") and how they differ from processes, you can use the following two definitions:

A thread is a virtual processor with its own set of registers, analogous to the registers of a real CPU. One of the most important registers of a virtual processor, as of a real one, is its own pointer to the current instruction (for example, its own EIP register on x86 processors).
A process is first of all an address space, which in modern architectures the OS kernel creates by manipulating page tables. Secondarily, a process should be seen as an anchor point for OS "resources". When analyzing an aspect such as multitasking in order to understand the essence of threads, we do not yet need to think about OS "resources" such as files and what they are attached to.

It is very important to understand that a thread is conceptually a virtual processor. When we write an implementation of threads in the OS kernel or in a user-level library, the task we are solving is exactly that of "multiplying" the central processor into many virtual instances that logically, or even physically (on SMP, SMT, and multi-core CPU platforms), run in parallel with one another.
At the basic, conceptual level there is no "context". A context is simply the name of the data structure in which the OS kernel or our library (the one implementing threads) saves the registers of a virtual processor when it switches between them, emulating their parallel operation. Context switching is a way of implementing threads, not a more fundamental concept through which a thread should be defined.
When the notion of a thread is defined by analyzing the API of a particular OS, too many entities are usually introduced: processes, address spaces, contexts, switches between those contexts, timer interrupts, time slices with priorities, and even "resources" attached to processes (as opposed to threads). All of this is tangled into one knot, and we often find ourselves going in circles while reading the definitions. Unfortunately, this is a common way of explaining threads in books, but it badly confuses beginner programmers and ties their understanding to implementation specifics.
Of course, all these terms have a right to exist and did not appear by accident; behind each of them stands some important entity. But among them one must separate the primary ones from the secondary ones (those introduced to implement the primary entities, or layered on top of them at the next levels of abstraction).
The main idea of a thread is the virtualization of the CPU registers: emulating, on one physical processor, several logical processors, each of which has its own register state (including the instruction pointer) and runs in parallel with the others.
The key property of a process, in the context of this discussion, is that it has its own page tables, which form its individual address space. A process is not, in itself, something that executes.
The definition does not need to say that "every process in the system always has at least one thread". An address space is simply of no use to the user unless at least one virtual processor (thread) can see it, so it is logical that all modern OSs destroy the address space (terminate the process) when the last thread running in it exits. That is a consequence, not part of the definition of a process. Moreover, at the lower system level a process can (as a rule) exist as an OS object even with no threads in it.
If you look at the sources of, say, the Windows kernel, you will see that the address space and the other process structures are constructed before the initial thread of the process is created. In essence, a process initially has no threads at all. In Windows you can even create a thread in another process's address space through the user-level API…
If you look at a thread as a virtual processor, then binding it to an address space amounts to loading the right value into the virtual page-table base register. :) All the more so because this is exactly what happens at the low level: every time it switches to a thread that belongs to another process, the OS kernel reloads the page-table pointer register (on processors that do not support working with many address spaces simultaneously in hardware).
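The "virtual processor" view can be illustrated with a toy model in which each thread is nothing more than a saved register set. Everything here is hypothetical and greatly simplified: the two-register structure, the tiny instruction set, and the function names exist only for this sketch, and switching between virtual processors is reduced to choosing which structure the next instruction operates on.

```c
#include <assert.h>
#include <stddef.h>

/* The "context": the registers of one virtual processor. */
struct vcpu {
    size_t pc;    /* instruction pointer */
    long   acc;   /* a single general-purpose register */
};

enum op { OP_ADD, OP_HALT };

struct insn {
    enum op op;
    long    arg;
};

/* Executes one instruction on the given virtual CPU; returns 0 once halted. */
static int step(struct vcpu *cpu, const struct insn *program)
{
    const struct insn *i = &program[cpu->pc];

    if (i->op == OP_HALT)
        return 0;
    cpu->acc += i->arg;
    cpu->pc++;
    return 1;
}

/* Interleaves n virtual CPUs on one real flow of control,
 * one instruction per turn, until all of them have halted. */
void run_interleaved(struct vcpu *cpus, size_t n, const struct insn *program)
{
    int running = 1;

    assert(cpus != NULL && program != NULL);
    while (running) {
        running = 0;
        for (size_t k = 0; k < n; k++)
            running |= step(&cpus[k], program);
    }
}
```

Two virtual processors can run the same program from different starting points and with different register values without disturbing each other, because each one's state lives entirely in its own struct vcpu.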
