What are threads in a processor? What is a process? What is a thread?


Good afternoon. Today I would like to look at what threads are in a processor - the very feature whose purpose and capabilities most people have no idea about, yet love to show off to others.

The point of the feature is that one core can process several threads simultaneously. While the first thread is idle and the second is doing calculations, a running application can use the vacant logical capacity for its own purposes. As a result, interruptions occur much less frequently, and you do not feel slowdowns or other inconveniences while working.

The disadvantages of the technology are as follows:

  • both threads access the same level 2 and level 3 caches;
  • heavy computational processes can cause contention in the system.

To put it very roughly, the bricks can be carried from one place to another in one hand (one thread) or in two hands (two threads), but there is still only one person (one core), and he gets tired equally either way, even though his productivity effectively doubles. In other words, we still depend on the performance of the CPU, and more specifically on its frequency.

In this article we will cover topics such as processes and threads and process descriptors, talk about thread synchronization, and touch on everyone's favorite Windows Task Manager.

Throughout its existence, a process may be interrupted and resumed many times. To resume a process, the state of its operating environment must be restored. That state consists of the contents of the registers and the program counter, the processor operating mode, pointers to open files, information about unfinished I/O operations, error codes of system calls performed by the process, and so on. This information is called the process context.

In order for the OS to manage processes, it must have all the information necessary for this. For this purpose, a descriptor is created for each process.

A descriptor is a special information structure created for each process (also called a task descriptor or task control block).

In general, the descriptor contains the following information:

  1. Process ID.
  2. A process type (or class) that defines some resource provisioning rules for the supervisor.
  3. Process priority.
  4. A state variable that indicates what state the process is in (ready to run, running, waiting for an I/O device, etc.).
  5. A protected memory area (or the address of such an area) in which the current values of the processor registers are stored if the process is interrupted before completing its work. This information is called the task context.
  6. Information about resources that the process owns and/or has the right to use (pointers to open files, information about pending I/O operations, etc.).
  7. A place (or its address) for organizing communication with other processes.
  8. Startup-time parameters (the point in time when the process should be activated and how often this should happen).
  9. If there is no file management system, the address of the task image on disk in its initial state, and the disk address to which it is written when it is unloaded from RAM to make room for another task.
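As an illustration, here is a minimal sketch in C of what such a descriptor might look like. All names and field sizes here are invented for the sketch; real descriptors (Linux's task_struct, for example) are far larger:

    #include <stdint.h>

    /* Hypothetical process states (item 4 above). */
    typedef enum { READY, RUNNING, WAITING } proc_state_t;

    /* A minimal, illustrative process descriptor; the numbers in the
       comments refer to the items in the list above. */
    typedef struct pcb {
        uint32_t      pid;             /* 1. process identifier */
        uint32_t      class_id;        /* 2. process type/class */
        int           priority;        /* 3. scheduling priority */
        proc_state_t  state;           /* 4. current state */
        uint64_t      saved_regs[32];  /* 5. saved registers (task context) */
        int          *open_files;      /* 6. owned resources, e.g. open files */
        void         *ipc_channel;     /* 7. communication with other processes */
        uint64_t      start_time;      /* 8. startup-time parameters */
        uint64_t      swap_address;    /* 9. disk address when swapped out */
        struct pcb   *next;            /* link for the supervisor's queues */
    } pcb_t;

The next pointer is what lets the supervisor string descriptors into the ready and wait queues described below.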

Compared to the context, the descriptor contains more operational information, which must be easily accessible to the process-scheduling subsystem. The process context contains less frequently used information and is consulted by the operating system only after a decision has been made to resume the interrupted process.

Descriptors, as a rule, reside permanently in RAM in order to speed up the work of the supervisor, which organizes them into lists (queues) and reflects a change in a process's state by moving the corresponding descriptor from one list to another.

For each state (except the running state on a single-processor system), the OS maintains a corresponding list of tasks that are in that state. For the waiting state, however, there may be not one list but many - one for each kind of resource that can cause a wait.

For example, there can be as many waiting states for an I/O operation to complete as there are I/O devices in the system.

Processes and Threads

To support multiprogramming, the OS must define for itself the internal units of work among which the processor and other computer resources will be divided. Currently, most operating systems define two types of units of work:

  • The process (the larger unit of work).
  • The thread (the smaller unit of work that a process needs in order to run).

When we talk about processes, we want to emphasize that the OS keeps them isolated: each process has its own virtual address space, and each process is assigned its own resources - files, windows, and so on. Such isolation is needed to protect one process from another, since, while sharing all the resources of the computing system, they compete with each other.

In the general case, processes are simply unrelated to one another and may even belong to different users sharing the same computer system. In other words, the OS treats processes as completely independent, and it is the OS that arbitrates their competition for resources.

To speed up processes, the parallelism inside the processes themselves can be exploited.

For example, some operations performed by an application may require a lot of CPU time to complete. In that case, during interactive work with the application, the user is forced to wait a long time for the requested operation to finish and cannot control the application until the operation completes. Such situations occur quite often, for example when processing large images in graphics editors. If the software modules that perform such lengthy operations are designed as independent "subprocesses" (threads) that execute in parallel with the other "subprocesses", the user gains the ability to perform several operations in parallel within one application (process).

The following differences between threads and processes can be identified:

  • For threads, the OS does not need to set up a full-fledged virtual machine.
  • Threads have no resources of their own; they develop in the same virtual address space and may use the same files, virtual devices, and other resources as their parent process.
  • The only resource threads need is the CPU. On a single-processor system, threads share CPU time among themselves just as ordinary processes do, but on a multiprocessor system they can run truly simultaneously, provided they do not contend for other resources.

The main thing multithreading provides is the ability to perform several kinds of operations at the same time within one application program. As a result, CPU resources are used more efficiently, and the total execution time of tasks decreases.

For example, if a spreadsheet or word processor is designed with multithreading in mind, the user can request a recalculation of the worksheet or a merge of several documents and at the same time continue filling in a table or open the next document for editing.
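As a rough sketch of this idea in C with POSIX threads (the "recalculation" here is just a stand-in loop; build with cc -pthread):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* The lengthy operation (think: recalculating a worksheet), moved
       into its own thread so the rest of the program stays responsive. */
    static void *recalculate(void *arg) {
        long sum = 0;
        for (long i = 0; i < 100000000L; i++)   /* stand-in for heavy work */
            sum += i;
        printf("recalculation finished: %ld\n", sum);
        return NULL;
    }

    int main(void) {
        pthread_t worker;
        pthread_create(&worker, NULL, recalculate, NULL);

        /* Meanwhile the "interactive" part of the application keeps going. */
        for (int i = 0; i < 3; i++) {
            printf("still handling user input...\n");
            sleep(1);
        }

        pthread_join(worker, NULL);   /* wait for the background operation */
        return 0;
    }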

Windows Task Manager

Task Manager displays information about programs and processes executing on a computer. You can also view the most commonly used process performance metrics there.

Task Manager displays key indicators of computer performance. For running programs you can view their status and terminate programs that have stopped responding. You can monitor the activity of running processes using up to 15 parameters, as well as graphs and information about CPU and memory usage.

Additionally, if you are connected to a network, you can view the network status and performance parameters. If multiple users are connected to your computer, you can see their names, what tasks they are performing, and send them a message.

The Processes tab displays information about the processes running on the computer: CPU and memory usage, a process count, and some other parameters.

The Performance tab displays handle and thread counts and memory statistics.

The need for thread synchronization arises only in a multiprogramming OS and stems from the shared use of the computer's hardware and information resources. Synchronization is necessary to avoid races (see below) and deadlocks when threads exchange data, share data, or access the processor and I/O devices.

Synchronization of threads and processes consists in coordinating their speeds: a thread is suspended until a certain event occurs and is then reactivated when that event takes place.

Neglecting synchronization issues in a multithreaded system can lead to incorrect results or even a system crash.

Example. The task of maintaining a customer database for some enterprise.

Each client is assigned a separate record in the database, which contains the Order and Payment fields. The program that maintains the database is designed as a single process with several threads, including:

  • Thread A, which enters information about orders received from customers into the database.
  • Thread B, which records in the database information about customer payments for invoices.

Both of these threads work with the shared database file using the same kind of algorithm:

  1. Read the record with a given identifier from the database file into the thread's buffer.
  2. Enter a new value into the Order field (for thread A) or the Payment field (for thread B).
  3. Write the modified record back to the database file.

Let's denote steps 1-3 for thread A as A1-A3, and for thread B as B1-B3. Suppose that at some moment thread A is updating the Order field of the record for customer N. To do this, it reads the record into its buffer (step A1) and modifies the value of the Order field (step A2), but does not manage to write the record back to the database, because its execution is interrupted - for example, its time slice expires.

Suppose thread B also needs to enter payment information for the same customer N. When thread B's turn comes, it manages to read the record into its buffer (step B1) and update the Payment field (step B2), and is then interrupted as well. Note that thread B's buffer now holds a record for customer N in which the Order field still has its old, unchanged value. When the two threads later resume and each writes its buffer back (step 3), whichever record lands last overwrites the other's update, so either the order or the payment is silently lost. This is a classic race condition.
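The same lost update is easy to reproduce in C with two POSIX threads. This is a deliberately unsynchronized sketch (the record layout is invented); run it a few times and the totals usually come out short:

    #include <pthread.h>
    #include <stdio.h>

    /* Shared "database record" for customer N. */
    struct record { int order; int payment; };
    static struct record db = {0, 0};

    static void *thread_a(void *arg) {        /* enters orders */
        for (int i = 0; i < 1000000; i++) {
            struct record buf = db;           /* step A1: read into buffer */
            buf.order += 1;                   /* step A2: modify Order */
            db = buf;                         /* step A3: write record back */
        }
        return NULL;
    }

    static void *thread_b(void *arg) {        /* enters payments */
        for (int i = 0; i < 1000000; i++) {
            struct record buf = db;           /* step B1 */
            buf.payment += 1;                 /* step B2 */
            db = buf;                         /* step B3: may overwrite A's update */
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("order=%d payment=%d (expected 1000000 each)\n",
               db.order, db.payment);
        return 0;
    }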

An important concept in process synchronization is the "critical section" of a program. A critical section is the part of the program in which shared data is accessed. To eliminate races over a resource, it must be guaranteed that at any moment at most one process is inside the critical section associated with that resource. This technique is called mutual exclusion.

The simplest way to enforce mutual exclusion is to let the process inside the critical section disable all interrupts. This method is unsuitable, however, because it is dangerous to entrust a user process with control over the system: the process can occupy the processor for a long time, and if it crashes inside the critical section, the entire system crashes with it, since interrupts will never be enabled again.

Another way is to use lock variables. Each shared resource has a binary variable associated with it, which takes the value 1 if the resource is free (that is, no process is currently in the critical section associated with that resource) and the value 0 if the resource is busy. The scheme works as follows, using a lock variable F(D) to enforce mutually exclusive access to a shared resource D. Before entering the critical section, the process checks whether resource D is free. If it is busy, the check is repeated in a loop; if it is free, the value of F(D) is set to 0 and the process enters the critical section. After the process has completed all operations on the shared resource D, the value of F(D) is set back to 1.

If all processes are written following these conventions, mutual exclusion is guaranteed - provided that the operation of checking and setting the lock variable is indivisible. Here is why. Suppose that a process checks the variable and finds the resource free, but is interrupted immediately afterwards, before it has time to set the variable to 0. While it is suspended, another process takes the resource, enters its critical section, and is likewise interrupted before finishing its work with the shared resource. When control returns to the first process, it, still believing the resource free, marks it busy and begins executing its own critical section. The principle of mutual exclusion is thus violated, with potentially undesirable consequences. To avoid such situations, the machine instruction set should include a single "test-and-set" instruction, or the system should provide equivalent software primitives that disable interrupts for the whole duration of the check-and-set operation.
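Modern C exposes exactly such an indivisible test-and-set operation through <stdatomic.h>. A minimal spin-lock sketch of the F(D) scheme (note that atomic_flag uses the opposite convention - "clear" means free, "set" means busy):

    #include <stdatomic.h>

    /* One lock variable per shared resource D. */
    static atomic_flag lock_D = ATOMIC_FLAG_INIT;

    void enter_critical_section(void) {
        /* Atomically reads the old value and sets the flag; loops
           (busy-waits) while another thread holds the lock. */
        while (atomic_flag_test_and_set(&lock_D))
            ;   /* spin */
    }

    void leave_critical_section(void) {
        atomic_flag_clear(&lock_D);   /* mark resource D free again */
    }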

Implementing critical sections with lock variables has a significant drawback: while one process is inside the critical section, another process that needs the same resource keeps polling the lock variable, wasting CPU time for nothing. To eliminate this, a so-called event mechanism can be used. It solves not only mutual exclusion but also more general process-synchronization problems. Different operating systems implement the event mechanism in their own ways, but they all provide system functions of similar purpose, conventionally called WAIT(x) and POST(x), where x is the identifier of some event.

If the resource is busy, the process does not poll in a loop but calls the system function WAIT(D), where D denotes the event "resource D has been released". WAIT(D) moves the active process to the WAITING state and records in its descriptor that the process is waiting for event D. The process currently using resource D executes the system function POST(D) after leaving its critical section, whereupon the operating system scans the queue of waiting processes and moves the process waiting for event D to the READY state.
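POSIX condition variables provide roughly this WAIT/POST behavior. A hedged sketch, with the event "resource D has been released" modeled by a flag:

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  event_D = PTHREAD_COND_INITIALIZER;
    static bool d_free = true;

    /* Analog of WAIT(D): sleep, without busy polling, until D is released. */
    void acquire_D(void) {
        pthread_mutex_lock(&m);
        while (!d_free)                        /* resource busy? */
            pthread_cond_wait(&event_D, &m);   /* thread moves to WAITING */
        d_free = false;                        /* take the resource */
        pthread_mutex_unlock(&m);
    }

    /* Analog of POST(D): release D and wake a waiting thread. */
    void release_D(void) {
        pthread_mutex_lock(&m);
        d_free = true;
        pthread_cond_signal(&event_D);         /* a waiter becomes READY */
        pthread_mutex_unlock(&m);
    }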

A general means of synchronizing processes was proposed by Dijkstra, who introduced two new primitives. In abstract form these primitives, denoted P and V, operate on non-negative integer variables called semaphores. Let S be such a semaphore. The operations are defined as follows:

V(S): the variable S is increased by 1 in a single indivisible action; the fetch, increment, and store cannot be interrupted, and S is not accessible to other processes while the operation is in progress.

P(S): S is decreased by 1 if possible. If S = 0, it is impossible to decrease S while remaining in the domain of non-negative integers; in that case the process calling the P operation waits until the decrement becomes possible. The check and the decrement together are likewise an indivisible operation.

In the special case where the semaphore S can take only the values 0 and 1, it turns into a lock variable. The P operation may put the process executing it into the waiting state, while the V operation may, under some circumstances, wake another process that was suspended by a P operation.
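POSIX semaphores map directly onto Dijkstra's primitives: sem_wait is P and sem_post is V. A small sketch with the semaphore initialized to 1, so that it acts as a lock variable:

    #include <semaphore.h>
    #include <pthread.h>

    static sem_t S;   /* the semaphore guarding a shared resource */

    static void *worker(void *arg) {
        sem_wait(&S);    /* P(S): decrement, or block while S == 0 */
        /* ... critical section: use the shared resource ... */
        sem_post(&S);    /* V(S): increment, possibly waking a waiter */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        sem_init(&S, 0, 1);   /* initial value 1 => binary semaphore */
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        sem_destroy(&S);
        return 0;
    }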

Process deadlock

When organizing the parallel execution of several processes, one of the main functions of the OS is the correct distribution of resources between running processes and providing processes with means of mutual synchronization and data exchange.

When processes execute in parallel, situations can arise in which two or more of them remain blocked forever. The simplest case: each of two processes waits for a resource held by the other. Because of this waiting, neither process can continue executing and eventually release the resource the other needs. This situation is called a deadlock (also known as a deadly embrace or clinch).

In a multitasking system, a process is said to be deadlocked if it is waiting for an event that will never happen.

Deadlocks must be distinguished from ordinary queues, even though both arise when resources are shared and look similar from the outside: a process is suspended, waiting for a resource to be freed. A queue, however, is normal - an inherent sign of high resource utilization when requests arrive randomly: the resource happens to be unavailable now, but is released after a while, and the process continues. A deadlock is an unresolvable situation.

The deadlock problem includes the following tasks:

  1. Preventing deadlocks.
  2. Recognizing deadlocks.
  3. Restoring the system after a deadlock.

Deadlocks can be prevented at the stage of writing programs: programs must be written so that a deadlock cannot occur for any relative speeds of the processes. For instance, if process A and process B request resources in the same order, a deadlock is impossible in principle. The second approach to preventing deadlocks is dynamic and consists in applying certain rules when assigning resources to processes; for example, resources may be allocated only in a fixed sequence common to all processes (see the sketch below).
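The following C sketch illustrates the fixed-order rule: both threads always lock r1 before r2, so the circular wait that produces a deadlock cannot form (the resource names are invented):

    #include <pthread.h>

    static pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;

    /* Both threads follow the same global order: r1, then r2.
       If one took r1->r2 and the other r2->r1, each could end up holding
       one lock while waiting forever for the other - a deadlock. */
    static void *process_a(void *arg) {
        pthread_mutex_lock(&r1);
        pthread_mutex_lock(&r2);
        /* ... use both resources ... */
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return NULL;
    }

    static void *process_b(void *arg) {
        pthread_mutex_lock(&r1);   /* same order - never r2 first */
        pthread_mutex_lock(&r2);
        /* ... use both resources ... */
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return NULL;
    }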

In some cases, when a deadlock involves many processes and many resources, recognizing it is a non-trivial task. There are formal, software-implemented methods of deadlock detection based on maintaining resource-allocation tables and tables of requests for busy resources; analyzing these tables makes it possible to detect deadlocks.

When a deadlock does occur, it is not necessary to remove all the blocked processes from execution. You can remove only some of them to free the resources expected by the others, return some processes to the swap area, or "roll back" some processes to a so-called checkpoint - a point at which all the information needed to resume the program from that place has been saved. Checkpoints are placed in a program at points after which a deadlock may occur.

A process is a program in execution. A process can also be thought of as a unit of work for the OS. Modern operating systems also define a smaller unit of work - the thread. In other words, a process can spawn one or more threads.

What is the fundamental difference between a process and a thread? The OS treats a process as a request for all types of resources (memory, files, and so on) except one - processor time. A thread is a request for processor time.

In what follows, both the process and the thread are used as the OS's unit of work. Where the distinction does not matter, they will be called a task.

Scheduling Processes and Threads

Process and thread scheduling includes:

  • Creating and destroying processes
  • Communication between processes
  • Allocating CPU time
  • Providing processes with the necessary resources (individually or jointly)
  • Synchronization (preventing races and deadlocks)
  • "Cleanup" after a process completes, i.e. removing all traces of it from the system

Each process is isolated from the others by its virtual address space - the set of addresses that the process's program modules can manipulate. The OS maps the virtual address space onto the physical memory allocated to the process.

To communicate, processes turn to the OS, which provides the means of communication (pipes, mailboxes, shared memory sections, etc.).

The ability to parallelize computation within a process by means of threads increases efficiency. The mechanism for parallelizing computation within one application is called multithreading. All threads of a process share one virtual address space. Parallelizing speeds up execution because the OS does not have to switch from one address space to another, as it does when switching between processes. Programs also become more logically structured. The effect is especially pronounced on multiprocessor systems.

An example of multithreaded processing is the execution of MS SQL Server queries.

Creating processes

To create a process is to create a process descriptor - an information structure containing everything needed to manage the process.

Examples of descriptors for:

  • Windows NT/2000/XP - the process object
  • UNIX - the process descriptor
  • OS/2 - the Process Control Block (PCB)

Creating a process also includes the following steps:

  • find the program on disk
  • redistribute RAM
  • allocate memory to the new process
  • copy the program into the allocated memory
  • adjust some program parameters

Note. In some systems, code and data may not be placed in memory immediately but instead be written to a special area of the disk - the swap area.

Creating Threads

In a multithreaded system, at least one thread is created when a process is created. For each thread the OS generates a thread descriptor (thread identifier, information about rights, priority, thread state, etc.). The initial state of a thread is suspended.

A thread can spawn another thread - a child. Different algorithms are used when a parent thread terminates. With asynchronous completion, the child threads continue executing after the parent thread finishes. Synchronous termination of a parent thread causes all of its children to terminate (see the sketch below).
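In POSIX threads the two policies look roughly like this (a sketch, not a universal rule - the exact semantics vary by OS):

    #include <pthread.h>
    #include <stdio.h>

    static void *child(void *arg) {
        printf("child thread running\n");
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, child, NULL);

        pthread_join(t, NULL);   /* wait for the child before going on */
        /* Calling pthread_exit(NULL) here instead would end only the main
           thread and let remaining threads run on ("asynchronous" style);
           returning from main terminates every thread in the process
           ("synchronous" style). */
        return 0;
    }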

4.1 Processes

4.1.1 Process concept

A process (task) is a program in execution.

Each process is associated with its address space, from which it can read and to which it can write data.

The address space contains:

    the program itself

    data for the program

    program stack

Each process also has a set of registers associated with it, for example:

    the program counter (in the processor) - a register that holds the address of the next instruction to be executed. After an instruction is fetched from memory, the program counter is adjusted to point to the next instruction.

    stack pointer

In many operating systems, all the information about each process, apart from the contents of its own address space, is stored in the operating system's process table.

Some table fields:

Process management: registers, program counter, stack pointer, process state, priority, scheduling parameters, process ID, parent process, process group, process start time, CPU time used.

Memory management: pointer to the text segment, pointer to the data segment, pointer to the stack segment.

File management: root directory, working directory, file descriptors, user ID, group ID.

4.1.2 Process model

In a multitasking system, the real processor switches from process to process, but to simplify the model, we consider a set of processes running in parallel (pseudo-parallel).

Consider a diagram with four running programs.

Only one process is active at a time

On the right are the processes running in parallel, each with its own logical program counter. Of course, there is really only one physical program counter, into which the current process's logical program counter is loaded. When the time allocated to the current process runs out, the physical program counter is saved back into the process's logical program counter in memory.

4.1.3 Creating a process

Three main events lead to the creation of processes (the fork or CreateProcess call):

    System startup

    A running process issues a system call to create a process

    A user request to create a process

In all cases, the active current process issues a system call to create a new process.

In UNIX, each process is assigned a process identifier (PID - Process IDentifier)
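A minimal UNIX sketch: fork creates the child, and each process learns its role from the return value:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();          /* system call that creates a process */
        if (pid == 0) {
            /* Child: starts as a copy of the parent, with its own PID. */
            printf("child: pid=%d, parent=%d\n", getpid(), getppid());
        } else if (pid > 0) {
            /* Parent: fork returned the child's PID. */
            printf("parent: pid=%d, child=%d\n", getpid(), pid);
            wait(NULL);              /* reap the child when it exits */
        } else {
            perror("fork");          /* creation failed */
        }
        return 0;
    }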

4.1.4 Terminating the process

Four events cause a process to terminate (the exit or ExitProcess call):

    Planned completion (end of execution)

    Planned exit on a known error (for example, a missing file)

    Exit due to a fatal error (a program bug)

    Killed by another process

Thus, a suspended process consists of its own address space, usually called the memory image (core image), and its entry in the process table (which includes its registers).

4.1.5 Process hierarchy

In UNIX systems there is a strict hierarchy of processes. Every new process created by the fork system call is a child of the creating process. The child process receives copies of the parent's variables, registers, and so on. After the fork call, once the parent's data has been copied, subsequent changes in one process do not affect the other, but each process remembers who its parent is.

UNIX also has an ancestor of all processes - the init process.

Process tree for UNIX systems

4.1.6 Process status

Three process states:

    Running (occupying the CPU)

    Ready (the process is temporarily stopped to let another process run)

    Blocked (the process cannot run for its own internal reasons - for example, it is waiting for an I/O operation)

Possible transitions between states.

1. The process is blocked waiting for input data

2. The scheduler chooses another process

3. The scheduler selects this process

4. Input data has arrived

Transitions 2 and 3 are caused by the operating system's process scheduler, so the processes themselves are not even aware of these transitions. From the processes' own point of view there are only two states: running and blocked.

On servers, to speed up the response to client requests, several processes are often kept in the waiting state; as soon as the server receives a request, a process moves from "waiting" to "running". This transition is much faster than starting a new process.

4.2 Threads (lightweight processes)

4.2.1 The thread concept

Each process has an address space and a single thread of executing instructions. In multi-user systems, every request for the same service requires creating a new process to serve the client. This is less efficient than creating a quasi-parallel thread within an existing process, sharing a single address space.

4.2.2 The thread model

Each thread is associated with:

    Program counter

    Registers for current variables

    State

Threads share the following elements of their process:

    Address space

    Global Variables

    Open files

    Semaphores

    Statistical information

Otherwise, the model is identical to the process model.

POSIX and Windows have kernel-level support for threads.

4.2.3 Benefits of using threads

    Simpler programs in some cases, thanks to the shared address space.

    Creating a thread is roughly 100 times faster than creating a process.

    Better performance of the program itself, because computation on the processor can overlap with I/O operations. Example: a text editor with three threads can simultaneously interact with the user, format text, and write a backup copy to disk.

4.2.4 Implementation of threads in user space, kernel and mixed

A - threads in user space; B - threads in kernel space

In case A, the kernel knows nothing about threads. Each process needs its own thread table, similar to the process table.

Advantages of case A:

    Such multithreading can be implemented on a kernel that does not support multithreading

    Faster switching, creation, and termination of threads

    A process can have its own thread-scheduling algorithm

Disadvantages of case A:

    No timer interrupts within a single process

    When a blocking system call is made (the process is put into the waiting state - for example, a read from the keyboard when no data has arrived), all the other threads of the process are blocked

    Implementation complexity






