Remote Procedure Call (RPC)


The purpose of this article is to discuss terminology. It is not about how or why these technologies are used, only about the terms used to describe them. The article reflects the opinion of the author and makes no claim to scientific rigor.

Introduction

If you program distributed systems or work in systems integration, then most of the information here is not new to you.

The problem arises when people who use different technologies meet and start a technical conversation. In that case, terminology often causes mutual misunderstanding. Here I will try to bring together the terminologies used in different contexts.

Terminology

There is no clear terminology and classification in this area. The terminology used below is a reflection of the author's model, that is, it is strictly subjective. Any criticism and any discussion is appreciated.

I have divided the terminology into three areas: RPC (Remote Procedure Call), Messaging, and REST. These areas have historical roots.

RPC

RPC technologies are the oldest of the three. The most prominent representatives of RPC are CORBA and DCOM.

In those days, the main need was to interconnect systems over fast and relatively reliable local area networks. The main idea behind RPC was to make calling remote systems look very much like calling functions within a program. All the mechanics of remote calls were hidden from the programmer. At least, that was the intent. In many cases programmers were forced to work at a deeper level, where the terms marshalling and unmarshalling appeared, which essentially meant serialization and deserialization. On the caller's side, ordinary in-process function calls were handled by a Proxy, and on the side of the system performing the function, by a Dispatcher. Ideally, neither the calling system nor the processing system was concerned with the intricacies of transferring data between systems. All these subtleties were concentrated in the Proxy-Dispatcher pair, whose code was generated automatically.
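
The Proxy-Dispatcher pair can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the JSON message layout, and the fact that the "network" is just a direct function call. Real RPC frameworks generate this code from an interface description.

```python
import json


class Dispatcher:
    """Server side: decodes a request and calls the real function."""

    def __init__(self):
        self._functions = {}

    def register(self, func):
        self._functions[func.__name__] = func

    def handle(self, payload: bytes) -> bytes:
        request = json.loads(payload.decode("utf-8"))  # unmarshalling
        result = self._functions[request["method"]](*request["args"])
        return json.dumps({"result": result}).encode("utf-8")  # marshalling


class Proxy:
    """Client side: turns an ordinary-looking call into a message."""

    def __init__(self, transport):
        self._transport = transport  # any callable taking bytes, returning bytes

    def __getattr__(self, name):
        def remote_call(*args):
            request = {"method": name, "args": list(args)}
            reply = self._transport(json.dumps(request).encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]

        return remote_call


def add(a, b):
    return a + b


dispatcher = Dispatcher()
dispatcher.register(add)

# Hypothetical transport: the dispatcher is called directly, no real network.
remote = Proxy(dispatcher.handle)
print(remote.add(2, 3))
```

The caller writes `remote.add(2, 3)` exactly as it would write a local call; the serialization happens inside the Proxy and the Dispatcher.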

Therefore, you will not notice, and should not notice, any difference between a call to a local function and a call to a remote function.

Now there is a kind of RPC renaissance, whose most prominent representatives are Google ProtoBuf, Thrift, and Avro.

Messaging

Over time, it turned out that trying to shield the programmer from the fact that a remote function still differs from a local one did not lead to the desired result. The implementation details and the fundamental differences of distributed systems were too great to be resolved by automatically generated Proxy code. Gradually, it became clear that the fact that the systems are connected by an unreliable, slow medium should be explicitly reflected in the program code.

Web service technologies appeared. People started talking about the ABC: Address, Binding, Contract. It is not entirely clear why contracts appeared, since they are essentially envelopes for input arguments. Contracts often complicate the entire model rather than simplify it. But never mind.

Now the programmer explicitly created a service (Service) or a client (Client) that called the service. A service was a set of operations (Operation), each of which took a request (Request) as input and returned a response (Response). The client explicitly sent (Send) a request, the service explicitly received (Receive) it and replied (Send) with a response. The client received (Receive) the response, and the call ended.

Just as in RPC, a Proxy and a Dispatcher worked somewhere behind the scenes. And as before, their code was generated automatically and the programmer did not need to understand it. The only difference was that the client now explicitly used classes from the Proxy.

Requests and responses are explicitly mapped to a format intended for transmission over the wire. Most often this is an array of bytes. The transformation is called Serialization and Deserialization, and it is sometimes hidden in the Proxy code.

The culmination of messaging was the emergence of the ESB (Enterprise Service Bus) paradigm. No one can really articulate what it is, but everyone agrees that data on the ESB moves in the form of messages.

REST

In the constant struggle with the complexity of the code, programmers took the next step and created REST.

The basic principle of REST is that the set of operations is sharply restricted, leaving only CRUD: Create, Read, Update, Delete. In this model, all operations are always applied to some data. The operations available in CRUD are sufficient for most applications. Since REST technologies in most cases imply the use of the HTTP protocol, the CRUD operations were mapped onto HTTP methods (POST, GET, PUT, DELETE). It is constantly argued that REST is not necessarily tied to HTTP. But in practice, mapping operation signatures onto the syntax of HTTP commands is widely used. For example, a call to the function

EntityAddress ReadEntityAddress (string param1, string param2)

is expressed as follows:

GET entityAddress?param1=value1&param2=value2
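
The mapping from a function signature to an HTTP command can be sketched in a few lines. The helper name `to_get_request` is hypothetical; the sketch only builds the text of the command and does not send anything over the network.

```python
from urllib.parse import urlencode


def to_get_request(resource: str, **params: str) -> str:
    """Render a read operation on a resource as an HTTP GET command."""
    query = urlencode(params)  # escapes values and joins them with "&"
    return f"GET {resource}?{query}" if query else f"GET {resource}"


print(to_get_request("entityAddress", param1="value1", param2="value2"))
```

The function name disappears from the call: the operation is carried by the HTTP method, the data by the resource name, and the arguments by the query string.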

Conclusion

Before starting a discussion on distributed systems or integration, agree on the terminology. While Proxy means the same thing in every context, request, for example, means little in RPC terms, and marshalling will cause confusion when discussing REST technologies.

Remote Procedure Call (RPC) is a class of technologies that allow computer programs to call functions or procedures in another address space (usually on remote computers). Typically, an implementation of an RPC technology includes two components: a network protocol for client-server exchange and a language for serializing objects (or structures, for non-object RPC). Different RPC implementations have very different architectures and capabilities: some implement SOA, others CORBA or DCOM. At the transport layer, RPC mainly uses TCP and UDP; however, some implementations are built on HTTP (which violates the ISO/OSI architecture, since HTTP was not originally a transport protocol).

Implementation

There are many technologies that provide RPC:

  • Sun RPC, also known as ONC RPC (binary protocol based on TCP and UDP, with XDR encoding); see RFC 1831 and RFC 1833
  • .NET Remoting (binary protocol based on TCP, UDP, HTTP)
  • SOAP (Simple Object Access Protocol); see RFC 4227, which describes using SOAP over BEEP
  • XML-RPC (text protocol based on HTTP); see RFC 3529, which describes using XML-RPC over BEEP
  • Java RMI (Java Remote Method Invocation); see the specification: http://java.sun.com/j2se/1.5.0/docs/guide/rmi/index.html
  • JSON-RPC (JavaScript Object Notation Remote Procedure Calls; text protocol based on HTTP); see RFC 4627, which specifies the JSON format itself
  • DCE/RPC (Distributed Computing Environment / Remote Procedure Calls; binary protocol based on various transport protocols, including TCP/IP and Named Pipes from the SMB/CIFS protocol)
  • DCOM (Distributed Component Object Model), also known as "Network OLE": an object-oriented extension of DCE RPC (in Microsoft's implementation, MSRPC) that allows you to pass references to objects and call methods of objects through such references

Principle

The idea behind Remote Procedure Call (RPC) is to extend the well-known and well-understood mechanism for transferring control and data within a program running on a single machine to transfer control and data over a network. Remote procedure call facilities are designed to facilitate the organization of distributed computing and the creation of distributed client-server information systems. The greatest efficiency of using RPC is achieved in those applications in which there is interactive communication between remote components with a short response time and a relatively small amount of data transferred. Such applications are called RPC-oriented.

Implementing remote calls is much more complex than implementing local procedure calls. The following problems and tasks can be identified that need to be solved when implementing RPC:

  • Since the caller and the called procedure execute on different machines, they have different address spaces, and this creates problems when passing parameters and results, especially if the machines run different operating systems or have different architectures (for example, little-endian or big-endian byte order). Since RPC cannot rely on shared memory, RPC parameters must not contain pointers to non-stack memory locations, and parameter values must be copied from one computer to the other. To copy the procedure parameters and the result over the network, they are serialized.
  • Unlike a local call, a remote procedure call necessarily uses the transport layer of the network architecture (for example, TCP), but this remains hidden from the developer.
  • The calling program and a called local procedure on the same machine execute within a single process. An RPC, however, involves at least two processes, one on each machine. If one of them terminates abnormally, the following situations may arise: if the calling procedure crashes, the remotely called procedures become "orphans", and if the remote procedures terminate abnormally, the calling procedures become "bereaved parents" that wait in vain for a response from the remote procedures.
  • There are a number of problems associated with the heterogeneity of programming languages and operating environments: data structures and procedure call conventions supported in one programming language are not supported in the same way in all other languages. Thus, there is a compatibility problem that has not yet been resolved either by the introduction of one generally accepted standard or by the implementation of several competing standards on all architectures and in all languages.
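
The byte-order problem from the first item can be shown with Python's standard `struct` module. This is only an illustration of why an architecture-independent wire format is needed; the value 1025 is arbitrary.

```python
import struct

value = 1025  # 0x00000401

little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first (network order)

print(little.hex())  # 01040000
print(big.hex())     # 00000401

# A receiver that assumes the wrong byte order reads a different number.
misread = struct.unpack("<I", big)[0]
print(misread)  # 17039360

# Agreeing on one wire format (here big-endian) makes the round trip safe.
restored = struct.unpack(">I", big)[0]
print(restored)  # 1025
```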

Subsystems

  • Transport subsystem. Manages outgoing and incoming connections; supports the concept of a "message boundary" for transport protocols that do not provide it directly (TCP); supports guaranteed delivery for transport protocols that do not provide it directly (UDP).
  • Thread pool (callee side only). Provides an execution context for code called over the network.
  • Marshalling (also called "serialization"). Packs the call parameters into a byte stream in a standard way that does not depend on the architecture (in particular, on the byte order within a word). In particular, it can be applied to arrays, strings, and structures referenced by pointer parameters.
  • Encryption and digital signing of packets.
  • Authentication and authorization. Transmits over the network the information that identifies the subject making the call.

In some RPC implementations (.NET Remoting), the subsystem boundaries are open polymorphic interfaces, and it is possible to write your own implementation of almost every subsystem listed. In other implementations (DCE RPC on Windows) this is not the case.
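
The "message boundary" task of the transport subsystem can be sketched with a length prefix. This is one common technique, not necessarily the one any particular RPC implementation uses; the 4-byte big-endian header and the function names are assumptions. An in-memory `io.BytesIO` stands in for a TCP byte stream.

```python
import io
import struct

HEADER = struct.Struct(">I")  # assumed format: 4-byte big-endian length prefix


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(message)) + message


def read_exactly(stream, n: int) -> bytes:
    """Read n bytes, looping because a stream may return fewer per read."""
    data = b""
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            raise EOFError("stream closed in the middle of a message")
        data += chunk
    return data


def read_message(stream) -> bytes:
    (length,) = HEADER.unpack(read_exactly(stream, HEADER.size))
    return read_exactly(stream, length)


# Two messages written back to back arrive as one undivided byte stream.
stream = io.BytesIO(frame(b"first call") + frame(b"second"))
print(read_message(stream))
print(read_message(stream))
```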

Application

Many of the remote management tools of the Windows operating system (Event Viewer, Server Manager, Print Management, User Lists) use DCE RPC as the means of network communication between the managed service and the managing user interface application. DCE RPC support has been present in Windows NT since the very first version, 3.1. DCE RPC clients were also supported in the lightweight Windows 3.x/95/98/Me line of operating systems.

The Windows system libraries that provide such control and serve as the base layer for control user interface applications (netapi32.dll and partly advapi32.dll) actually contain the client code of the DCE RPC interfaces that perform this control.

This architectural design has been the subject of intense criticism directed at Microsoft. The generic marshalling procedures found in DCE RPC are very complex and have a huge potential for flaws that can be exploited over the network by sending a deliberately malformed DCE RPC packet. A significant portion of the Windows security flaws discovered from the late 1990s to the mid-2000s were bugs in the DCE RPC marshalling code.

In addition to DCE RPC, Windows actively uses DCOM technology. For example, it is used as a means of communication between the IIS web server management tools and the managed server itself. A fully functional interface for communicating with the MS Exchange Server mail system - MAPI - is also based on DCOM.

Remote Procedure Call Concept

The idea behind Remote Procedure Call (RPC) is to extend the well-known and well-understood mechanism for transferring control and data within a program running on a single machine to transfer control and data over a network. Remote procedure call facilities are designed to facilitate the organization of distributed computing. The most effective use of RPC is achieved in those applications in which there is interactive communication between remote components with short response times and relatively little data transfer. Such applications are called RPC-oriented.

Local procedure calls are characterized by:

  • Asymmetry, that is, one of the interacting parties is the initiator;
  • Synchronicity, that is, the execution of the calling procedure is suspended from the moment the request is issued and resumes only after the called procedure returns.

Implementing remote calls is significantly more complex than implementing local procedure calls. To begin with, since the caller and the callee execute on different machines, they have different address spaces, and this creates problems when passing parameters and results, especially if the machines are not identical. Since RPC cannot rely on shared memory, RPC parameters must not contain pointers to non-stack memory locations, and parameter values must be copied from one computer to the other. The next difference between RPC and a local call is that RPC necessarily uses the underlying communication system, but this should not be explicitly visible either in the definition of the procedures or in the procedures themselves.

Remoteness introduces additional problems. The calling program and a called local procedure on the same machine execute within a single process. An RPC, however, involves at least two processes, one on each machine. If one of them crashes, the following situations may arise: if the calling procedure crashes, the remotely called procedures become "orphans", and if the remote procedures terminate abnormally, the callers become "bereaved parents" that wait in vain for a response from the remote procedures.

In addition, there are a number of problems associated with the heterogeneity of programming languages and operating environments: data structures and procedure call conventions supported in one programming language are not supported in the same way in all other languages.

These and some other problems are solved by the widespread RPC technology that underlies many distributed operating systems.

Basic RPC Operations

To understand how RPC works, first consider how a local procedure call executes on a single stand-alone machine. Take, for example, the system call

count = read(fd, buf, nbytes);

where fd is an integer,
buf is an array of characters,
nbytes is an integer.

To make the call, the calling procedure pushes the parameters onto the stack in reverse order (Figure 3.1). After read has executed, it places the return value in a register, removes the return address, and returns control to the calling procedure, which pops the parameters from the stack, restoring it to its original state. Note that in C, parameters can be passed either by reference or by value. From the point of view of the called procedure, value parameters are initialized local variables. The called procedure can change them without affecting the originals of these variables in the calling procedure.

If a pointer to a variable is passed to the called procedure, then changing the value of this variable by the called procedure will change the value of this variable for the calling procedure as well. This fact is essential for RPC.

There is another parameter-passing mechanism that is not used in C. It is called call-by-copy/restore: the caller copies the variables onto the stack as values and, after the call completes, copies them back over the original values in the calling procedure.

It is up to the language designers to decide which parameter-passing mechanism to use. Sometimes it depends on the type of data being passed. In C, for example, integers and other scalar data are always passed by value, and arrays are always passed by reference.
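
The difference between passing by reference and call-by-copy/restore becomes visible when the same variable is passed twice. The sketch below imitates both mechanisms in Python using one-element lists as "variables"; the helper names are hypothetical, and a real RPC stub would perform the copying over the network.

```python
def increment_both(a, b):
    """Callee: adds 1 to the first cell of each list it receives."""
    a[0] += 1
    b[0] += 1


def call_by_reference(proc, x, y):
    proc(x, y)  # the callee works directly on the caller's objects


def call_by_copy_restore(proc, x, y):
    x_copy, y_copy = list(x), list(y)  # copy the values in
    proc(x_copy, y_copy)
    x[:] = x_copy                      # copy the results back
    y[:] = y_copy


counter = [0]
call_by_reference(increment_both, counter, counter)
print(counter[0])  # 2: both parameters name the same variable

counter = [0]
call_by_copy_restore(increment_both, counter, counter)
print(counter[0])  # 1: each copy was incremented once, then copied back
```

In most programs the two mechanisms give the same result, which is why copy/restore is a workable substitute for pointers in RPC; the aliased case above is the exception.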

Fig. 3.1. a) The stack before the read call is made;
b) The stack while the procedure is executing;
c) The stack after returning to the calling program

The idea behind RPC is to make a remote procedure call look as close as possible to a local procedure call. In other words, make RPC transparent: the caller does not need to know that the called procedure is on a different machine, and vice versa.

RPC achieves transparency in the following way. When the called procedure is actually remote, a different version of the procedure, called the client stub, is placed in the library instead of the local procedure. Like the original procedure, the stub is invoked using the normal calling sequence (as in Figure 3.1), and it likewise traps to the kernel. Unlike the original procedure, however, it does not put the parameters in registers and ask the kernel for data; instead, it builds a message to be sent to the kernel of the remote machine.

RPC steps

The interaction of the software components during a remote procedure call is illustrated in Figure 3.2. After the client stub has been called by the client program, its first task is to fill a buffer with the message to be sent. On some systems, the client stub has a single fixed-length buffer that is filled from the beginning every time a new request arrives. On other systems, the message buffer is a pool of buffers for individual message fields, some of which are already filled in. This method is especially useful when the packet format has a large number of fields, but the values of many of these fields do not change from call to call.

The parameters must then be converted to the appropriate format and inserted into the message buffer. At this point the message is ready to be sent, so a trap to the kernel is executed.

Fig. 3.2. Remote procedure call

When the kernel gains control, it switches contexts, saves the processor registers and the memory map (page descriptors), and installs a new memory map to be used in kernel mode. Since the kernel and user contexts are different, the kernel must copy the message into its own address space so that it can access it, record the destination address (and possibly other header fields), and pass the message to the network interface. This completes the work on the client side. A transmission timer is started, and the kernel can either poll in a loop for a response or pass control to the scheduler, which picks another process to run. In the first case the request completes faster, but there is no multiprogramming.

On the server side, the incoming bits are placed by the receiving hardware either in an on-board buffer or in main memory. When all the information has been received, an interrupt is generated. The interrupt handler checks the packet data for correctness and determines which stub it should be passed to. If no stub is waiting for this packet, the handler must either buffer it or discard it. If a stub is waiting, the message is copied to it. Finally, a context switch is performed, which restores the registers and the memory map to the values they had at the moment the stub made the receive call.

Now the server stub starts working. It unpacks the parameters and pushes them onto the stack in the appropriate way. When everything is ready, the call to the server procedure is made. After executing the procedure, the server sends the results to the client. For this, all the steps described above are performed, only in reverse order.

Figure 3.3 shows the sequence of commands that must be executed for each RPC call, and Figure 3.4 shows how much of the total RPC execution time is spent executing each of the 14 stages described. The research was carried out on a DEC Firefly multiprocessor workstation, and although the presence of five processors necessarily influenced the measurement results, the histogram shown in the figure provides an overview of the RPC execution process.

Fig. 3.3. Steps of an RPC

Fig. 3.4. Distribution of time among the 14 stages of an RPC

1. Call the stub
2. Prepare the buffer
3. Pack the parameters
4. Fill in the header fields
5. Compute the checksum of the message
6. Trap to the kernel
7. Queue the packet for transmission
8. Transfer the message to the controller over the QBus
9. Ethernet transmission time
10. Get the packet from the controller
11. Interrupt handling routine
12. Compute the checksum
13. Context switch to user space
14. Execute the server stub
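
Stages 2 to 5 on the client and stages 12 and 14 on the server can be sketched for the read call used earlier. The message layout here is purely an assumption made for illustration (a 2-byte procedure number, the two integer parameters, and a CRC32 trailer); it is not the format of any real RPC protocol.

```python
import struct
import zlib

# Hypothetical layout: procedure number, fd, nbytes (big-endian, no padding).
BODY = struct.Struct(">HiI")
CHECKSUM = struct.Struct(">I")
PROC_READ = 1  # assumed identifier of the remote read procedure


def client_stub_pack_read(fd: int, nbytes: int) -> bytes:
    """Client stub: fill the header, pack the parameters, add a checksum."""
    body = BODY.pack(PROC_READ, fd, nbytes)
    return body + CHECKSUM.pack(zlib.crc32(body))


def server_stub_unpack(message: bytes):
    """Server stub: verify the checksum, then unpack the parameters."""
    body, trailer = message[:BODY.size], message[BODY.size:]
    (expected,) = CHECKSUM.unpack(trailer)
    if zlib.crc32(body) != expected:
        raise ValueError("checksum mismatch, packet discarded")
    return BODY.unpack(body)


message = client_stub_pack_read(fd=3, nbytes=512)
print(len(message))                 # 14 bytes on the wire
print(server_stub_unpack(message))  # (1, 3, 512)
```

The buffer `buf` is not sent in the request: for read it is an output parameter, so only the reply would carry its contents.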

Dynamic binding

Consider the question of how the client specifies the location of the server. One way to solve this problem is to use the server's network address directly in the client program. The disadvantage of this approach is its extreme inflexibility: when the server is moved, when the number of servers is increased, when the interface is changed, and in many other cases, every program that hard-coded the server address must be recompiled. To avoid all these problems, some distributed systems use what is called dynamic binding.

The starting point for dynamic binding is the formal definition (specification) of the server. The specification contains the name of the file server, version number and a list of service procedures provided by this server for clients (Figure 3.5). For each procedure, a description of its parameters is given, indicating whether this parameter is input or output relative to the server. Some parameters can be simultaneously input and output - for example, some array that is sent by the client to the server, modified there, and then returned back to the client (copy / restore operation).

Fig. 3.5. RPC Server Specification

The formal server specification is used as input to a stub generator program that creates both the client and the server stubs. They are then placed in the appropriate libraries. When a user (client) program calls any procedure defined in the server specification, the corresponding stub procedure is linked into the program binary. Likewise, when a server is compiled, the server stubs are linked with it.

When a server starts up, its very first action is to export its interface to a special program called the binder. This process, known as server registration, involves the server sending its name, version number, unique identifier, and a handle describing the server's location. The handle is system independent and can be an IP, Ethernet, X.500, or some other address; it may also contain other information, for example related to authentication.

When the client calls one of the remote procedures for the first time, for example read, the client stub sees that it is not yet bound to a server and sends a message to the binder asking it to import the interface of the required version of the required server. If such a server exists, the binder passes the handle and the unique identifier to the client stub.

The client stub uses the handle as the address when sending a request message. The message contains the parameters and the unique identifier, which the server kernel uses to route the incoming message to the correct server if there are several of them on the machine.

This method of importing and exporting interfaces is very flexible. For example, there may be multiple servers supporting the same interface, with clients distributed among the servers at random. With this method it becomes possible to poll the servers periodically, analyze their performance, and automatically deregister a server that has failed, which increases the overall fault tolerance of the system. This method can also support client authentication. For example, a server might specify that it can only be used by clients from a specific list.
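
The export and import steps can be sketched as a small in-memory registry. The class and method names, the interface name `file_server`, and the addresses are all hypothetical; a real binder is a separate process reached over the network.

```python
import random


class Binder:
    """Registry mapping (interface name, version) to the servers exporting it."""

    def __init__(self):
        self._exports = {}

    def register(self, name, version, handle, unique_id):
        """Called by a server at startup to export its interface."""
        self._exports.setdefault((name, version), []).append((handle, unique_id))

    def deregister(self, name, version, unique_id):
        """Remove a server, for example after it failed a health check."""
        servers = self._exports.get((name, version), [])
        self._exports[(name, version)] = [s for s in servers if s[1] != unique_id]

    def lookup(self, name, version):
        """Called by a client stub on its first remote call (import)."""
        servers = self._exports.get((name, version))
        if not servers:
            raise LookupError(f"no server exports {name} version {version}")
        return random.choice(servers)  # spread clients across servers at random


binder = Binder()
binder.register("file_server", "3.1", "10.0.0.5:4000", unique_id=101)
binder.register("file_server", "3.1", "10.0.0.6:4000", unique_id=102)

handle, unique_id = binder.lookup("file_server", "3.1")
print(handle, unique_id)
```

A client compiled against an older interface would ask for a version that no server exports and receive an error instead of a handle.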

However, dynamic binding has disadvantages, such as the additional overhead (time) of exporting and importing interfaces. This cost can be significant, since many client processes are short-lived, and the interface import procedure must be performed again each time a process starts. In addition, in large distributed systems the binder can become a bottleneck, and running several binders increases the overhead of creating and synchronizing processes.

RPC semantics on failure

Ideally, RPC should function correctly in the event of a failure. Consider the following classes of failures:

The client cannot determine the location of the server, for example, if the required server fails, or because the client program was compiled a long time ago and used an old version of the server interface. In this case, a message containing an error code is received in response to the client's request. Lost client-to-server request. The simplest solution is to repeat the request after a certain time. Lost reply message from server to client. This option is more complicated than the previous one, since some procedures are not idempotent. An idempotent procedure is a procedure whose execution request can be repeated several times, and the result will not change. An example of such a procedure is reading a file. But the procedure for withdrawing a certain amount from a bank account is not idempotent, and if the answer is lost, a repeated request can significantly change the state of the client's account. One of the possible solutions is to bring all procedures to an idempotent form. However, in practice this is not always possible, so another method can be used - sequential numbering of all requests by the client kernel. The server core remembers the number of the most recent request from each of the clients, and upon receipt of each request, it analyzes whether this request is primary or repeated. The server crashed after receiving the request. The property of idempotency is also important here, but unfortunately the query numbering approach cannot be applied. In this case, it matters when the failure occurred - before or after the operation. But the client kernel cannot recognize these situations, it only knows that the response timed out. There are three approaches to this problem: Wait until the server restarts and try again. This approach ensures that the RPC has been completed at least once, and possibly more. Immediately report the error to the application. This approach ensures that the RPC has been executed no more than once. 
The third approach does not guarantee anything. When the server fails, there is no support for the client. RPC may or may not be performed at all, or many times. In any case, this method is very easy to implement.

None of these approaches are very attractive. And the ideal option that would guarantee exactly one RPC execution, in the general case, cannot be implemented for reasons of principle. Suppose, for example, a remote operation is printing some text, which includes loading the printer buffer and setting one bit in some control register of the printer, as a result of which the printer starts. A server crash can occur either a microsecond before or a microsecond after the control bit is set. The moment of failure entirely determines the recovery procedure, but the client cannot know about the moment of failure. In short, the possibility of a server crash radically changes the nature of RPC and clearly reflects the difference between a centralized system and a distributed system. In the first case, a server crash leads to a client crash, and recovery is impossible. In the second case, it is possible and necessary to perform the actions to restore the system.
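The request-numbering scheme described earlier for lost replies can be sketched in C. This is only an illustration under stated assumptions, not the way any particular RPC kernel does it: each client has at most one request outstanding and numbers its requests from 1, the server keeps a small fixed table, and the names handle_withdraw, client_state, current_balance and MAX_CLIENTS are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CLIENTS 16   /* assumption: small fixed table, for illustration only */

/* Per-client record of the most recent request that was executed. */
struct client_state {
    int      in_use;
    uint32_t client_id;
    uint32_t last_seq;     /* number of the most recent request executed */
    int32_t  last_reply;   /* reply saved in case it has to be resent */
};

static struct client_state table[MAX_CLIENTS];
static int32_t balance = 100;   /* hypothetical account held by the server */

/* A non-idempotent procedure: executing it twice changes the result. */
static int32_t withdraw(int32_t amount)
{
    balance -= amount;
    return balance;
}

int32_t current_balance(void)
{
    return balance;
}

/* Find the record for a client, creating it on first contact. */
static struct client_state *find_client(uint32_t client_id)
{
    struct client_state *free_slot = NULL;
    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (table[i].in_use && table[i].client_id == client_id)
            return &table[i];
        if (!table[i].in_use && free_slot == NULL)
            free_slot = &table[i];
    }
    if (free_slot != NULL) {
        free_slot->in_use = 1;
        free_slot->client_id = client_id;
        free_slot->last_seq = 0;      /* clients number requests from 1 */
        free_slot->last_reply = 0;
    }
    return free_slot;
}

/* Handle one request. Returns 0 and stores the reply on success,
 * or -1 if the client table is full. */
int handle_withdraw(uint32_t client_id, uint32_t seq, int32_t amount,
                    int32_t *reply)
{
    struct client_state *c = find_client(client_id);
    if (c == NULL)
        return -1;
    if (seq == c->last_seq) {
        /* Retransmission: resend the saved reply, do not execute again. */
        *reply = c->last_reply;
        return 0;
    }
    /* Original request: execute it and remember the outcome. */
    c->last_reply = withdraw(amount);
    c->last_seq = seq;
    *reply = c->last_reply;
    return 0;
}
```

Calling handle_withdraw twice with the same client and request number debits the account only once; the second call just returns the saved reply. Note that the table lives in memory, which is exactly why this technique does not survive a server crash.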

The client crashed after submitting the request. In this case, calculations are performed on the results that no one expects. Such calculations are called "orphans". The presence of orphans can cause various problems: overhead of CPU time, blocking resources, replacing the response to the current request with the response to a request that was issued by the client machine before the system was restarted.

What to do with orphans? Let's consider 4 possible solutions.

Destruction. Before the client stub sends an RPC message, it writes a log entry describing what it is about to do. The log is kept on disk or in other fault-tolerant storage. After a crash the system reboots, the log is analyzed, and the orphans are eliminated. The disadvantages of this approach are, first, the cost of writing every RPC to disk and, second, possible ineffectiveness because of second-generation orphans created by RPC calls that first-generation orphans issued.

Reincarnation. Here the problem is solved without disk writes. Time is divided into sequentially numbered periods. When a client reboots, it broadcasts a message to all machines announcing the start of a new period. On receipt of this message, all remote computations are terminated. Of course, if the network is partitioned, some orphans may survive.

Soft reincarnation. This is similar to the previous case, except that not all remote computations are found and destroyed, only those of the rebooting client.

Expiration. Each request is given a standard time interval T in which it must complete. If it does not finish in the allotted time, it must request an additional quantum, which requires extra work. On the other hand, if after a crash the client waits for an interval T before rebooting, all orphans are guaranteed to be gone.

In practice, none of these approaches is desirable; moreover, killing orphans can make the situation worse. For example, suppose an orphan has locked one or more database files. If the orphan is destroyed abruptly, these locks remain in place. In addition, destroyed orphans may have left entries in various system queues, which can later cause new processes to be started, and so on.
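To make the idea of numbered periods more concrete, here is a minimal sketch of the bookkeeping a server might keep for soft reincarnation. Everything in it is an assumption made for illustration: the task table, the function names start_task, new_epoch and active_tasks, and the idea that "terminating" a computation is just clearing a flag.

```c
#include <stdint.h>

#define MAX_TASKS 8   /* assumption: tiny fixed table, for illustration */

/* One remote computation running on behalf of some client. */
struct task {
    int      active;
    uint32_t client_id;
    uint32_t epoch;      /* the client's period number when it issued the call */
};

static struct task tasks[MAX_TASKS];

/* Record a computation started for a client.
 * Returns the slot index, or -1 if the table is full. */
int start_task(uint32_t client_id, uint32_t epoch)
{
    for (int i = 0; i < MAX_TASKS; i++) {
        if (!tasks[i].active) {
            tasks[i].active = 1;
            tasks[i].client_id = client_id;
            tasks[i].epoch = epoch;
            return i;
        }
    }
    return -1;
}

/* Called when a rebooted client announces a new period.
 * Terminates only that client's computations from earlier periods
 * and returns how many were terminated. */
int new_epoch(uint32_t client_id, uint32_t epoch)
{
    int killed = 0;
    for (int i = 0; i < MAX_TASKS; i++) {
        if (tasks[i].active &&
            tasks[i].client_id == client_id &&
            tasks[i].epoch < epoch) {
            tasks[i].active = 0;   /* a real server would stop the work here */
            killed++;
        }
    }
    return killed;
}

/* Number of computations still running. */
int active_tasks(void)
{
    int n = 0;
    for (int i = 0; i < MAX_TASKS; i++)
        n += tasks[i].active;
    return n;
}
```

Dropping the client_id comparison in new_epoch turns this into plain reincarnation, where every computation from an older period is terminated. The sketch deliberately ignores the problems mentioned above, such as locks held by a terminated orphan.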

Programs communicating over a network need a communication mechanism. At the lowest level, the arrival of a packet raises a signal that is handled by a network signal handler. At the highest level there is the rendezvous mechanism adopted in the Ada language. NFS uses a remote procedure call (RPC) mechanism in which the client communicates with the server (see Figure 1). The client first calls a procedure that sends a request to the server. When the request packet arrives, the server calls a dispatch routine, performs the requested service, and sends a response, after which control returns to the client.

The RPC interface can be thought of as having three layers:

The upper level is completely transparent. A program at this level might, for example, call rnusers(), which returns the number of users on the remote machine. The programmer does not need to know that RPC is being used, because the call is written in the program like any other.

The middle level is intended for the most common applications. RPC calls at this level are handled by the registerrpc() and callrpc() routines: registerrpc() obtains a unique system-wide number, and callrpc() executes a remote procedure call. The rnusers() call is implemented using these two routines.

The lower level is used for more complex tasks that need to change the default values of the routines' parameters. At this level you can explicitly manipulate the sockets used to transmit RPC messages.

As a general rule, you should use the upper layer and avoid using the lower layers unnecessarily.

Despite the fact that in this tutorial we consider the interface only in C, calls to remote procedures can be made from any language. The work of the RPC mechanism for organizing communication between processes on different machines does not differ from its work on the same machine.

RPC (Remote Procedure Call) is an interface between remote users and specific host programs that are invoked by those users. An RPC service on a host typically provides a suite of programs to clients. Each of these programs, in turn, consists of one or more remote procedures. For example, an NFS remote file system service built on RPC calls may consist of just two programs: one that interacts with high-level user interfaces and another that handles low-level I/O functions.

Each RPC call involves two parties: the active client, which sends the procedure call request to the server, and the server, which sends the response to the client.

Note. The terms "client" and "server" here refer to a specific transaction. A given host or piece of software (process or program) can act as both a client and a server. For example, a program that provides a remote procedure service can at the same time be a client of a network file system.

RPC is built on a remote procedure call model similar to that of local procedure calls. When you call a local procedure, you push arguments to a specific memory location, stack, or environment variables, and transfer control of the process to a specific address. After completing the work, you read the results at a specific address and continue your process.

In the case of a remote procedure, the main difference is that the remote function call is served by two processes: the client process and the server process.

The client process sends the server a message that contains the parameters of the called procedure, and waits for a response message with the results. When the response is received, the result is read and the process continues. On the server side, the call handler process is in a waiting state; when a message arrives, it reads the procedure parameters, executes the procedure, sends the response, and returns to the waiting state until the next call.

The RPC protocol imposes no requirements on additional communication between processes and does not require the functions performed to be synchronized; that is, calls can be asynchronous and mutually independent, so that the client can execute other procedures while waiting for a response. The RPC server can allocate a separate process or virtual machine for each function, so it can accept the next request immediately without waiting for previous ones to finish.

However, there are several important differences between local and remote procedure calls:

1. Error processing. The client should in any case be notified of errors that occur when calling remote procedures on the server or on the network.

2. Global variables. Because the server does not have access to the client's address space, you cannot use hidden parameters in the form of global variables in remote procedure calls.

3. Performance. The speed of execution of remote procedures, as a rule, is one or two orders of magnitude lower than the speed of execution of similar local procedures.

4. Authentication. Because remote procedure calls occur over the network, client authentication mechanisms must be used.

Principles of constructing the protocol.

The RPC protocol can use several different transport protocols. The RPC protocol itself is responsible only for specifying and interpreting messages. Reliable delivery of messages is entirely the responsibility of the transport layer.

However, RPC can control the choice of transport protocol and some of its functions. As an example of the interaction between RPC and the transport protocol, consider how an RPC port is assigned to an application process by the RPC service Portmapper.

This service dynamically (on demand) assigns a specific port to an RPC connection. Portmapper is used quite often, because the set of transport ports reserved for RPC is limited while the number of processes that can potentially run concurrently is very large. Portmapper is called, for example, when NFS client/server communication ports are selected.

The Portmapper service uses the RPC broadcast mechanism on a specific port, 111. The client broadcasts to this port a request for the port of a specific RPC service. Portmapper processes the request, determines the address of the local RPC service, and sends a response to the client. The Portmapper service can work over both TCP and UDP.

RPC can work with various transport protocols, but it never duplicates their functions: if RPC runs on top of TCP, it leaves all concerns about the reliability of the connection to TCP. If RPC runs on top of UDP, however, it can add its own functionality to guarantee message delivery.

Note. Applications can regard the RPC protocol as the network analogue of the jump-to-subroutine (JSR) instruction: a defined procedure for calling a function over the network.

For the RPC protocol to work, the following conditions must be met:

1. Unique identification of all remotely callable procedures on a given host. RPC requests contain three identifier fields: the number of the remote program (service), the version number of the remote program, and the number of the remote procedure within that program. The program number is assigned by the manufacturer of the service; the procedure number indicates a specific function of this service.

2. Identification of the RPC protocol version. RPC messages contain an RPC protocol version field. It is used to match the formats of the transmitted parameters when the client is working with different versions of RPC.

3. Mechanisms for authenticating the client to the server. The RPC protocol provides a procedure for authenticating the client to the service and, if necessary, for doing so on every request or on every response sent to the client. In addition, RPC allows various additional security mechanisms to be used.

RPC can use four types of authentication mechanisms:

AUTH_NULL - no authentication required

AUTH_UNIX - UNIX standard authentication

AUTH_SHORT - UNIX standard authentication with its own encoding structure

AUTH_DES - DES authentication

4. Identification of messages in response to the corresponding requests. RPC response messages contain the ID of the request they were based on. This identifier can be called the transaction identifier of the RPC call. This mechanism is especially useful when working in asynchronous mode and when executing a sequence of several RPC calls.

5. Identification of protocol errors. All network or server errors have unique identifiers by which each of the participants in the connection can determine the cause of the failure.
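The role of the transaction identifier from condition 4 can be illustrated with a small table of outstanding calls. This is a sketch under assumptions, not part of any RPC library: the names begin_call, deliver_reply and poll_call, the fixed table size, and the use of a single 32-bit integer as the "result" are all invented for the example.

```c
#include <stdint.h>

#define MAX_PENDING 8   /* assumption: at most 8 calls in flight */

/* One outstanding call waiting for its reply. */
struct pending_call {
    int      in_use;
    int      done;       /* set once the matching reply has arrived */
    uint32_t xid;        /* transaction identifier sent with the request */
    int32_t  result;
};

static struct pending_call calls[MAX_PENDING];
static uint32_t next_xid = 1;

/* Register a new outstanding call.
 * Returns its transaction identifier, or 0 if the table is full. */
uint32_t begin_call(void)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!calls[i].in_use) {
            calls[i].in_use = 1;
            calls[i].done = 0;
            calls[i].xid = next_xid++;
            if (next_xid == 0)      /* 0 is reserved as "no call" */
                next_xid = 1;
            return calls[i].xid;
        }
    }
    return 0;
}

/* Hand an incoming reply to the call it belongs to.
 * Returns 0 if a waiting call with this xid was found, -1 otherwise
 * (for example a late duplicate of a reply already consumed). */
int deliver_reply(uint32_t xid, int32_t result)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (calls[i].in_use && !calls[i].done && calls[i].xid == xid) {
            calls[i].result = result;
            calls[i].done = 1;
            return 0;
        }
    }
    return -1;
}

/* Check on a call: returns 1 and stores the result if the reply has
 * arrived (the slot is then released), 0 if the call is still waiting,
 * -1 if no such call is known. */
int poll_call(uint32_t xid, int32_t *result)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (calls[i].in_use && calls[i].xid == xid) {
            if (!calls[i].done)
                return 0;
            *result = calls[i].result;
            calls[i].in_use = 0;
            return 1;
        }
    }
    return -1;
}
```

Because replies are matched by identifier rather than by arrival order, the client can issue several calls in a row and pick up their results in whatever order the replies come back.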

Protocol message structures

When transferring RPC messages over a transport protocol, several RPC messages can be located within one transport packet. In order to distinguish one message from another, a record marker (RM - Record Marker) is used. Each RPC message is "marked" with exactly one RM.

An RPC message can consist of several fragments. Each fragment consists of a four-byte header followed by 0 to 2**31 - 1 bytes of data. The first bit of the header indicates whether the fragment is the last one, and the remaining 31 bits give the length of the fragment's data.
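The fragment header is simple enough to encode and decode by hand. The following sketch packs the "last fragment" flag into the highest-order bit and the length into the remaining 31 bits, and writes the header in network (big-endian) byte order. The function and macro names are made up for the example.

```c
#include <stdint.h>

#define RM_LAST_FRAGMENT 0x80000000u   /* highest-order bit of the header */
#define RM_LENGTH_MASK   0x7fffffffu   /* remaining 31 bits: data length */

/* Build the 4-byte fragment header in big-endian byte order. */
void rm_encode(uint8_t out[4], uint32_t length, int last)
{
    uint32_t h = (length & RM_LENGTH_MASK) | (last ? RM_LAST_FRAGMENT : 0u);
    out[0] = (uint8_t)(h >> 24);
    out[1] = (uint8_t)(h >> 16);
    out[2] = (uint8_t)(h >> 8);
    out[3] = (uint8_t)h;
}

/* Split a received header back into the length and the last-fragment flag. */
void rm_decode(const uint8_t in[4], uint32_t *length, int *last)
{
    uint32_t h = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                 ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    *last = (h & RM_LAST_FRAGMENT) != 0;
    *length = h & RM_LENGTH_MASK;
}
```

For a 40-byte message sent as a single fragment, rm_encode produces the bytes 80 00 00 28 (hexadecimal). A receiver on a byte stream reads four bytes, decodes them, reads that many bytes of data, and repeats until it sees a fragment with the flag set.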

The structure of RPC is formally described in the language of description and representation of data formats - XDR with additions concerning the description of procedures. You could even say that the RPC markup language is an extension of XDR, supplemented by work with procedures.

The structure of the RPC package looks like this:

struct rpc_msg {
    unsigned int xid;
    union switch (msg_type mtype) {
    case CALL:
        call_body cbody;
    case REPLY:
        reply_body rbody;
    } body;
};

where xid is the identifier of the current transaction, call_body is the request packet, reply_body is the response packet. The request structure looks something like this:

struct call_body {
    unsigned int rpcvers;
    unsigned int prog;
    unsigned int vers;
    unsigned int proc;
    opaque_auth cred;
    opaque_auth verf;
    /* procedure parameters */
};

The reply_body structure can contain either a structure passed on in case of an error (in which case it contains the error code), or a structure for successful processing of the request (in which case it contains the returned data).
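As a rough illustration of how the request structure ends up on the wire, the sketch below serializes the fixed part of a call message with no authentication. It assumes the values defined in RFC 1057 (message type CALL = 0, RPC version 2, and authentication flavor AUTH_NULL = 0 with an empty body); the function names and constant names are invented here, and a real client would normally leave this work to the RPC library.

```c
#include <stddef.h>
#include <stdint.h>

/* Values taken from RFC 1057; the identifiers themselves are made up. */
enum { MSG_CALL = 0, RPC_PROTOCOL_VERSION = 2, FLAVOR_AUTH_NULL = 0 };

/* Append one 32-bit value in big-endian order; returns the new offset. */
static size_t put_u32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off]     = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)v;
    return off + 4;
}

/* Write xid, message type and the call_body fields with empty
 * credentials and verifier. Returns the number of bytes written (40),
 * or 0 if the buffer is too small. The procedure parameters would
 * follow immediately after this header. */
size_t encode_call_header(uint8_t *buf, size_t cap, uint32_t xid,
                          uint32_t prog, uint32_t vers, uint32_t proc)
{
    size_t off = 0;
    if (cap < 40)
        return 0;
    off = put_u32(buf, off, xid);
    off = put_u32(buf, off, MSG_CALL);              /* mtype */
    off = put_u32(buf, off, RPC_PROTOCOL_VERSION);  /* rpcvers */
    off = put_u32(buf, off, prog);
    off = put_u32(buf, off, vers);
    off = put_u32(buf, off, proc);
    off = put_u32(buf, off, FLAVOR_AUTH_NULL);      /* cred: flavor */
    off = put_u32(buf, off, 0);                     /* cred: body length */
    off = put_u32(buf, off, FLAVOR_AUTH_NULL);      /* verf: flavor */
    off = put_u32(buf, off, 0);                     /* verf: body length */
    return off;
}
```

Each field occupies four bytes, so the header is ten words long. The reply is matched to this request by the xid in its first word.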

High-level programming interface.

Using subroutines in a program is the traditional way to structure a task and make it clearer. The most frequently used subroutines are collected in libraries, where they can be used by various programs. In this case we are talking about a local call, that is, both the calling and the called objects work within the same program on the same computer.

In the case of a remote invocation, a process running on one computer starts the process on the remote computer (that is, it actually runs the procedure code on the remote computer). Obviously, a remote procedure call differs significantly from a traditional local one, but from the point of view of a programmer, there are practically no such differences, that is, the architecture of a remote procedure call allows you to simulate a local call.

However, if, in the case of a local call, the program passes parameters to the called procedure and receives the result of work through the stack or shared memory areas, then in the case of a remote call, the transfer of parameters turns into a transmission of a request over the network, and the result of the work is in the received response.

This approach is a possible basis for creating distributed applications, and although many modern systems do not use this mechanism, the basic concepts and terms in many cases remain. When describing the RPC mechanism, we will traditionally refer to the calling process as the client, and the remote process that implements the procedure as the server.

A remote procedure call includes the following steps:

1. The client program makes a local call to a procedure called a stub. To the client it appears that by calling the stub it is actually calling the server procedure: the client passes the required parameters to the stub, and the stub returns the result. However, things are not quite as the client imagines. The job of the stub is to accept the arguments for the remote procedure, possibly convert them to some standard format, and form a network request. Packing the arguments and forming the network request is called marshalling.

2. The network request is sent over the network to the remote system. To do this, the stub uses the appropriate calls, for example, those discussed in the previous sections. Note that various transport protocols can be used here, not only those of the TCP/IP family.

3. On the remote host, everything happens in reverse order. The server stub waits for the request and, on receipt, retrieves the parameters — the arguments of the procedure call. Extracting (unmarshalling) can involve necessary conversions (for example, changing the order of the bytes).

4. The stub makes a call to the real server procedure to which the client's request is addressed, passing it the arguments received over the network.

5. After the procedure completes, control returns to the server stub along with the returned values. Like the client stub, the server stub converts the values returned by the procedure into a network response message, which is sent over the network to the system the request came from.

6. The operating system passes the received message to the client stub, which, after the necessary conversion, passes the values (the values returned by the remote procedure) to the client, which interprets this as a normal return from the procedure.

Thus, from the client's point of view, it makes a remote procedure call just as it would a local one. The same can be said of the server: the procedure is called in the standard way, with an object (the server stub) calling the local procedure and receiving the values it returns. The client treats the stub as the callable server procedure, and the server treats its own stub as the client.

Thus, stubs constitute the core of the RPC system, responsible for all aspects of the generation and transmission of messages between the client and the remote server (procedure), although both the client and the server assume that the calls are made locally. This is the basic concept of RPC - to completely hide the distributed (network) nature of communication in the stub code. The advantages of this approach are obvious: both the client and the server are independent of the network implementation, they both operate within a distributed virtual machine, and procedure calls have a standard interface.
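The six steps above can be compressed into a toy example in which the "network" is just a function call. It is a sketch only: the remote procedure (adding two integers), the names add, add_impl, server_stub and transport, and the fixed 8-byte request format are all assumptions made for illustration. Real stubs are generated automatically and talk to a real transport.

```c
#include <stddef.h>
#include <stdint.h>

/* The real server procedure: it knows nothing about messages. */
static int32_t add_impl(int32_t a, int32_t b)
{
    return a + b;
}

/* Marshalling helpers: 32-bit integers in big-endian byte order. */
static void put_i32(uint8_t *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    p[0] = (uint8_t)(u >> 24);
    p[1] = (uint8_t)(u >> 16);
    p[2] = (uint8_t)(u >> 8);
    p[3] = (uint8_t)u;
}

static int32_t get_i32(const uint8_t *p)
{
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return (int32_t)u;
}

/* Server stub (steps 3-5): unmarshal the arguments, call the real
 * procedure, marshal the result. Returns the reply length, or 0 if
 * the request is malformed. */
static size_t server_stub(const uint8_t *req, size_t req_len, uint8_t *reply)
{
    if (req_len != 8)
        return 0;
    put_i32(reply, add_impl(get_i32(req), get_i32(req + 4)));
    return 4;
}

/* Stand-in for the network (step 2): delivers the request to the
 * server stub and brings the reply back. */
static size_t transport(const uint8_t *req, size_t req_len, uint8_t *reply)
{
    return server_stub(req, req_len, reply);
}

/* Client stub (steps 1 and 6): looks like an ordinary local function. */
int32_t add(int32_t a, int32_t b)
{
    uint8_t req[8];
    uint8_t reply[4];
    put_i32(req, a);
    put_i32(req + 4, b);
    if (transport(req, sizeof req, reply) != 4)
        return 0;   /* a real stub would report the failure to the caller */
    return get_i32(reply);
}
```

The caller simply writes add(2, 3) and gets 5 back; nothing at the call site reveals that the arguments were packed into a message and unpacked on the other side. Replacing transport with code that sends the buffer over a socket would not change the caller at all.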

Passing parameters

Passing value parameters is straightforward. In this case, the client stub places the parameter value in the network request, possibly performing conversions to the standard form (for example, changing the byte order). The situation with passing pointers is much more complicated, when the parameter is the address of the data, and not their value. Passing an address in a request is meaningless, since the remote procedure is executed in a completely different address space. The simplest RPC solution is to prevent clients from passing parameters otherwise than by value, although this certainly imposes serious restrictions.

Binding

Before a client can call a remote procedure, it must bind to the remote system hosting the required server. The binding task thus splits into two parts:

Finding the remote host with the desired server

Finding the required server process on a given host

Various approaches can be used to find a host. A possible option is to create some kind of centralized directory in which hosts announce their servers, and where the client, if desired, can choose the host and procedure address suitable for him.

Each RPC procedure is uniquely identified by a program and procedure number. The program number defines a group of remote procedures, each of which has its own number. Each program is also assigned a version number, so that when minor changes are made to the program (for example, when a procedure is added), there is no need to change its number. Usually, several functionally similar procedures are implemented in one program module, which, when launched, becomes the server of these procedures, and which is identified by the program number.

Thus, when a client wants to call a remote procedure, he needs to know the program, version, and procedure numbers that provide the required service.

To send the request, the client also needs to know the host's network address and the port number associated with the server program that provides the required procedures. This is handled by the portmap(1M) daemon (called rpcbind(1M) on some systems). The daemon runs on the host that provides remote procedure services and uses a well-known port number. When a server process initializes, it registers its procedures and port numbers with portmap(1M). When the client then needs the port number for a particular procedure, it sends a request to the portmap(1M) server, which either returns the port number or forwards the request directly to the RPC server and returns the response to the client once it has been executed. In any case, if the required procedure exists, the client obtains the procedure's port number from the portmap(1M) server, and further requests can be sent directly to that port.
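The registration and lookup described here boil down to a table keyed by program number, version and transport protocol. The sketch below models only that table in memory; it is not the portmap daemon and does no networking. The names pm_set and pm_getport and the table size are assumptions for the example, while the convention that a lookup returns 0 for an unregistered program follows the port mapper description in RFC 1057.

```c
#include <stdint.h>

#define MAX_MAPPINGS 16   /* assumption: small fixed table */

/* IP protocol numbers, which the port mapper uses to name transports. */
enum { PROTO_TCP = 6, PROTO_UDP = 17 };

struct mapping {
    int      in_use;
    uint32_t prog;
    uint32_t vers;
    uint32_t prot;
    uint16_t port;
};

static struct mapping mappings[MAX_MAPPINGS];

/* Look up the port registered for (prog, vers, prot).
 * Returns 0 if the program is not registered. */
uint16_t pm_getport(uint32_t prog, uint32_t vers, uint32_t prot)
{
    for (int i = 0; i < MAX_MAPPINGS; i++) {
        if (mappings[i].in_use && mappings[i].prog == prog &&
            mappings[i].vers == vers && mappings[i].prot == prot)
            return mappings[i].port;
    }
    return 0;
}

/* Called by a server at start-up to announce its port.
 * Returns 1 on success, 0 if the triple is already registered
 * or the table is full. */
int pm_set(uint32_t prog, uint32_t vers, uint32_t prot, uint16_t port)
{
    if (pm_getport(prog, vers, prot) != 0)
        return 0;
    for (int i = 0; i < MAX_MAPPINGS; i++) {
        if (!mappings[i].in_use) {
            mappings[i].in_use = 1;
            mappings[i].prog = prog;
            mappings[i].vers = vers;
            mappings[i].prot = prot;
            mappings[i].port = port;
            return 1;
        }
    }
    return 0;
}
```

A server would call pm_set once when it starts, and a client would call pm_getport before its first request and then talk to the returned port directly. The port number 32771 used in a test of this sketch is arbitrary.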

Handling exceptions

Handling exceptions when calling local procedures is not particularly problematic. UNIX handles process errors such as division by zero, invalid memory accesses, and so on. Calling a remote procedure increases the likelihood of error situations. Added to server and stub errors are errors related to, for example, receiving an erroneous network message.

For example, when using UDP as the transport protocol, messages are retransmitted after a specified timeout. An error is returned to the client if, after a certain number of attempts, a response from the server has not been received. In the case where the TCP protocol is used, an error is returned to the client if the server terminated the TCP connection.
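The retransmission behaviour over UDP can be sketched as a small retry loop. The transport is passed in as a function pointer so that the example stays self-contained; send_fn, call_with_retry, lossy_link and the result codes are hypothetical names, and the "lossy link" is a fake transport that simply drops a configurable number of requests.

```c
/* Sends one request and waits up to one timeout for the reply.
 * Returns 1 if a reply arrived, 0 if the wait timed out. */
typedef int (*send_fn)(void *ctx, int request, int *reply);

enum { CALL_OK = 0, CALL_TIMED_OUT = -1 };   /* hypothetical result codes */

/* Retransmit the request until a reply arrives or the attempts run out. */
int call_with_retry(send_fn send_and_wait, void *ctx, int request,
                    int max_attempts, int *reply)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (send_and_wait(ctx, request, reply))
            return CALL_OK;
    }
    return CALL_TIMED_OUT;   /* the error reported to the client */
}

/* Fake transport for experiments: loses the first drops_left requests,
 * then answers with twice the request value. */
struct lossy_link {
    int drops_left;
    int attempts;     /* how many times the request was sent */
};

int lossy_send(void *ctx, int request, int *reply)
{
    struct lossy_link *link = ctx;
    link->attempts++;
    if (link->drops_left > 0) {
        link->drops_left--;
        return 0;     /* behaves like a timeout */
    }
    *reply = request * 2;
    return 1;
}
```

With a link that drops two requests and a limit of five attempts, the call succeeds on the third try. With a limit of three attempts and a link that drops ten, the client gets CALL_TIMED_OUT and, as discussed in the next section, cannot tell whether the procedure ran.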

Call semantics

Calling a local procedure unambiguously leads to its execution, after which control returns to the main program. The situation is different when calling a remote procedure. It is impossible to establish when exactly the procedure will be executed, whether it will be performed at all, and if so, how many times? For example, if the request is received by the remote system after the server program terminates abnormally, the procedure will not be executed at all. If the client, after not receiving a response after a certain period of time (timeout), resends the request, then a situation may arise when the response is already being transmitted over the network, and the repeated request is again accepted for processing by the remote procedure. In this case, the procedure will be performed several times.

Thus, the execution of a remote procedure can be characterized by the following semantics:

- Exactly once. This behavior (in some cases the most desirable) is difficult to guarantee because of potential server crashes.

- At most once. The procedure was either not executed at all or executed only once. This can be asserted when an error is received instead of a normal response.

- At least once. The procedure was executed at least one time, and possibly more. For normal operation in this situation the remote procedure must be idempotent, that is, repeated execution must not cause cumulative changes. For example, reading a file is idempotent, but appending text to a file is not.
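The difference between the two file operations mentioned above is easy to see on an in-memory "file". The structure and function names below are invented for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* A tiny in-memory stand-in for a file on the server. */
struct file {
    char   data[64];
    size_t len;
};

/* Idempotent: however many times the request is repeated,
 * the file stays the same and the same bytes come back. */
size_t file_read(const struct file *f, char *out, size_t cap)
{
    size_t n = f->len < cap ? f->len : cap;
    memcpy(out, f->data, n);
    return n;
}

/* Not idempotent: every repetition makes the file longer.
 * Returns 0 on success, -1 if there is no room left. */
int file_append(struct file *f, const char *text)
{
    size_t n = strlen(text);
    if (f->len + n > sizeof f->data)
        return -1;
    memcpy(f->data + f->len, text, n);
    f->len += n;
    return 0;
}
```

If a retransmitted request causes file_read to run twice, nobody notices. If it causes file_append to run twice, the text appears in the file twice, which is why "at least once" semantics is only safe for procedures of the first kind.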

Data presentation

When the client and server run on the same system on the same computer, there are no data incompatibility issues. The binary data is represented in the same way for both the client and the server. In the case of a remote call, the matter is complicated by the fact that the client and the server can run on systems with different architectures and different data representations (for example, floating point representation, byte ordering, and so on).

Most RPC implementations define some standard representation of data to which all values passed in requests and responses must be converted.

For example, the format for representing data in RPC from Sun Microsystems is as follows:

Byte order - most significant byte first (big-endian)

Floating Point Representation - IEEE

Character representation - ASCII
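As a small illustration of such a standard representation, the sketch below encodes an integer and a string the way XDR lays them out: integers as four bytes with the most significant byte first, strings as a four-byte length followed by the characters, padded with zero bytes to a multiple of four. The function names encode_int and encode_string are made up; the real XDR library offers its own routines with different signatures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a signed 32-bit integer as 4 bytes, most significant first.
 * The result is the same on any host, whatever its native byte order.
 * Returns the number of bytes written. */
size_t encode_int(uint8_t *buf, int32_t v)
{
    uint32_t u = (uint32_t)v;
    buf[0] = (uint8_t)(u >> 24);
    buf[1] = (uint8_t)(u >> 16);
    buf[2] = (uint8_t)(u >> 8);
    buf[3] = (uint8_t)u;
    return 4;
}

/* Encode a string as: 4-byte length, the characters, zero padding up
 * to a multiple of 4 bytes. Returns the number of bytes written,
 * or 0 if the buffer is too small. */
size_t encode_string(uint8_t *buf, size_t cap, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;
    size_t total = 4 + padded;
    if (cap < total)
        return 0;
    encode_int(buf, (int32_t)len);
    memcpy(buf + 4, s, len);
    memset(buf + 4 + len, 0, padded - len);
    return total;
}
```

The string "hello" therefore takes 12 bytes: the length 5, the five characters, and three bytes of padding. The receiving side performs the reverse conversion into whatever layout its own architecture uses.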

In terms of functionality, the RPC system sits between the application layer and the transport layer. In the OSI model this position corresponds to the presentation and session layers. Thus, RPC is theoretically independent of the network implementation, in particular of the transport layer protocols.

Software implementations of the system, as a rule, support one or two protocols. For example, Sun Microsystems' RPC system supports messaging using the TCP and UDP protocols. The choice of one or another protocol depends on the requirements of the application. The choice of UDP is justified for applications with the following characteristics:

Called procedures are idempotent

The size of the arguments passed and the returned result is less than the size of the UDP packet - 8 KB.

The server works with several hundred clients. With TCP the server has to maintain a connection with each active client, which takes up a significant part of its resources. UDP is less resource intensive in this regard.

On the other hand, TCP provides efficient operation of applications with the following characteristics:

The application requires a reliable transport protocol

Called procedures are not idempotent

The arguments or the returned result are larger than 8 KB
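The two lists can be folded into a simple rule of thumb. The function below is only a restatement of the criteria above in code, with an invented name and an assumed treatment of the boundary case (a message of exactly 8 KB is sent over TCP); it leaves out the point about the number of clients, which is a matter of server resources rather than correctness.

```c
#include <stddef.h>

#define UDP_MAX_MESSAGE (8 * 1024)   /* the 8 KB limit mentioned above */

/* Suggest a transport for a procedure, following the criteria above.
 * Returns "udp" or "tcp". */
const char *choose_transport(int idempotent, size_t max_message_bytes,
                             int needs_reliable_transport)
{
    if (needs_reliable_transport ||
        !idempotent ||
        max_message_bytes >= UDP_MAX_MESSAGE)
        return "tcp";
    return "udp";
}
```

For example, choose_transport(1, 512, 0) suggests "udp", while a non-idempotent procedure or a 16 KB result leads to "tcp". In Sun RPC the transport is named by a string of this kind when the client handle is created.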

The choice of protocol usually remains with the client, and the system organizes the formation and transmission of messages differently for each. With TCP, where the transmitted data is a stream of bytes, the messages must be separated from each other. This can be done, for example, with the record marking protocol described in RFC 1057, "RPC: Remote Procedure Call Protocol Specification Version 2", which prefixes each message with a 32-bit header specifying its size in bytes.

The situation is different with the semantics of the call. For example, if RPC runs over an unreliable transport protocol (UDP), the system retransmits the message at short intervals (timeouts). If the client application does not receive a response, all that can be said is that the procedure has been executed zero or more times. If a response has been received, the application can conclude that the procedure has been executed at least once. With a reliable transport protocol (TCP), if a response is received, the procedure can be said to have been executed exactly once. If no response is received, it is impossible to say for certain that the procedure was not executed.

How does it work?

Essentially, the actual RPC system is built into the client program and the server program. It is gratifying that when developing distributed applications, you do not have to delve into the details of the RPC protocol or program message processing. The system assumes the existence of an appropriate development environment, which greatly facilitates the life of the creators of application software. One of the key points in RPC is that the development of a distributed application begins with the definition of an object interface - a formal description of server functions, written in a special language. Client and server stubs are then automatically generated from this interface. The only thing that needs to be done after this is to write the actual procedure code.

As an example, consider the RPC from Sun Microsystems. The system consists of three main parts:

rpcgen(1), an RPC compiler that generates client and server stubs as C programs from the description of the remote procedure interface.

The XDR (eXternal Data Representation) library, which contains functions for converting various data types into a machine-independent form, allowing information to be exchanged between heterogeneous systems.

A library of modules that ensure the operation of the system as a whole.

Let's look at an example of a basic distributed event logging application. The client, at startup, calls the remote procedure to write a message to the log file of the remote computer.

To do this, you will have to create at least three files: the specification of the interfaces of the log.x remote procedures (in the interface description language), the actual text of the log.c remote procedures, and the text of the client's main program main () - client.c (in the C language).

The rpcgen(1) compiler generates three files from the log.x specification: the C text of the client and server stubs (log_clnt.c and log_svc.c) and the log.h definition file used by both stubs.

So, let's look at the source code of the programs.

The log.x file specifies the registration parameters of the remote procedure (the program, version, and procedure numbers) and defines the calling interface (input arguments and return values). Here it defines the RLOG procedure, which takes a string as an argument (the text to be written to the log) and whose return value, as usual, indicates the success or failure of the requested operation.

program LOG_PROG {
    version LOG_VER {
        int RLOG(string) = 1;
    } = 1;
} = 0x31234567;

The rpcgen(1) compiler generates a header file log.h where, in particular, the procedures are defined:

log.h

/*
 * Please do not edit this file.
 * It was generated using rpcgen.
 */
#ifndef _LOG_H_RPCGEN
#define _LOG_H_RPCGEN

#include <rpc/rpc.h>

/* Program number */
#define LOG_PROG ((unsigned long)(0x31234567))
#define LOG_VER ((unsigned long)(1))    /* Version number */
#define RLOG ((unsigned long)(1))       /* Routine number */

extern int *rlog_1();
/* Internal procedure - we won't have to use it */
extern int log_prog_1_freeresult();

#endif /* !_LOG_H_RPCGEN */

Let's take a closer look at this file. The compiler translates the RLOG name specified in the interface description into rlog_1, replacing uppercase characters with lowercase ones and appending the program version number after an underscore. The return type has changed from int to int *. This is the rule: RPC lets you send and receive only the addresses of the parameters declared in the interface description. The same rule applies to the string passed as an argument. Although this does not follow from the log.h file, the address of the string is in fact also what is passed as an argument to the rlog_1() function.

In addition to the header file, the rpcgen(1) compiler generates the client stub and server stub modules. Essentially, the text of these files contains all the code for the remote call.

The server stub is the head program that handles all network interaction with the client (more precisely, with its stub). To perform the operation, the server stub makes a local call to the function, the text of which must be written:

log.c

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include "log.h"

int *rlog_1(char **arg)
{
    /* The return value must be defined as static */
    static int result;
    int fd;     /* Log file descriptor */
    int len;

    result = 1;
    /* Open the log file (create it if it does not exist);
       in case of failure, return the error code result == 1.
       O_CREAT requires a file mode, so one is supplied here. */
    if ((fd = open("./server.log", O_CREAT | O_RDWR | O_APPEND, 0644)) < 0)
        return (&result);
    len = strlen(*arg);
    if (write(fd, *arg, strlen(*arg)) != len)
        result = 1;
    else
        result = 0;
    close(fd);
    return (&result);   /* Return the result - address result */
}

The client stub takes the argument passed to the remote procedure, makes the necessary conversions, issues a request to the portmap(1M) server, communicates with the remote procedure server, and finally passes the return value to the client. For the client, a remote procedure call is a call to the stub and is no different from a regular local call.

client.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "log.h"

int main(int argc, char *argv[])
{
    CLIENT *cl;
    char *server, *mystring, *clnttime;
    time_t bintime;
    int *result;

    if (argc != 2) {
        fprintf(stderr, "Call format: %s Host_address\n", argv[0]);
        exit(1);
    }
    server = argv[1];

    /* Get the client descriptor. In case of failure, report that
       a connection to the server cannot be established */
    if ((cl = clnt_create(server, LOG_PROG, LOG_VER, "udp")) == NULL) {
        clnt_pcreateerror(server);
        exit(2);
    }

    /* Allocate a buffer for the string */
    mystring = (char *)malloc(100);

    /* Determine the time of the event */
    bintime = time((time_t *)NULL);
    clnttime = ctime(&bintime);
    sprintf(mystring, "%s - Client started", clnttime);

    /* Send a message for the log: the time when the client started
       working. In case of failure, report an error */
    if ((result = rlog_1(&mystring, cl)) == NULL) {
        fprintf(stderr, "error2\n");
        clnt_perror(cl, server);
        exit(3);
    }

    /* In case of failure on the remote computer, report an error */
    if (*result != 0)
        fprintf(stderr, "Error writing to log\n");

    /* Free the descriptor */
    clnt_destroy(cl);
    exit(0);
}

The client stub log_clnt.c is compiled with the client.c module to get the client executable.

cc -o rlog client.c log_clnt.c -lnsl

The log_svc.c server stub and the log.c routine are compiled to get the server executable.

cc -o logger log_svc.c log.c -lnsl

Now the server process must be started on some host, server.nowhere.ru:

$ logger

Then, when the rlog client is started on another machine, the server will add a corresponding entry to the log file.

The scheme of RPC operation in this case is shown in Fig. 1. Modules interact as follows:

1. When the server process starts, it creates a UDP socket and binds it to an arbitrary local port. The server then calls the svc_register(3N) library function to register its program and version numbers. To do this, the function contacts the portmap(1M) process and passes it the required values. The portmap(1M) server is usually started during system initialization and binds to a well-known port. portmap(1M) now knows the port number for our program and version, and the server waits for requests. Note that all of these actions are performed by the server stub generated by the rpcgen(1) compiler.

2. When rlog starts, the first thing it does is call the library function clnt_create(3N), giving it the address of the remote system, the program and version numbers, and the transport protocol. The function sends a request to the portmap(1M) server of the remote system server.nowhere.ru and obtains the remote port number for the log server.

3. The client calls the rlog_1() routine defined in the client stub and transfers control to the stub. The stub, in turn, forms the request (converting the arguments to XDR format) as a UDP packet and sends it to the remote port obtained from the portmap(1M) server. It then waits for a response for some time and, if none arrives, resends the request. Under favorable circumstances, the request is accepted by the logger server (the server stub module). The stub determines which function was called (by the procedure number) and calls the rlog_1() function of the log.c module. After control returns to the stub, the stub converts the value returned by rlog_1() to XDR format and forms the response, also as a UDP packet. On receiving the response, the client stub extracts the returned value, converts it, and returns it to the client.




Programs communicating over a network need a communication mechanism. A low-level mechanism might raise a signal when packets arrive, causing a network signal handler to run. A high-level mechanism would be the rendezvous adopted in the Ada language. NFS uses the remote procedure call (RPC) mechanism, in which a client communicates with a server (see Figure 1). In this process, the client first calls a procedure that sends a request to the server. When the request packet arrives, the server calls a dispatch routine, performs the requested service, and sends a response, after which control returns to the client.

The RPC interface can be thought of as having three layers:

  1. The top layer is completely transparent. A program at this layer might, for example, call rnusers(), which returns the number of users on a remote machine. You do not need to be aware that RPC is involved, because you simply make the call in your program.
  2. The middle layer is intended for the most common applications. RPC calls at this layer are handled by the registerrpc() and callrpc() routines: registerrpc() obtains a unique system-wide identification number, and callrpc() executes a remote procedure call. The rnusers() call is implemented using these two routines.
  3. The lowest layer is used for more complex tasks that require changing the default values of the routines' parameters. At this layer you can explicitly manipulate the sockets used to transmit RPC messages.

As a general rule, you should use the upper layer and avoid using the lower layers unnecessarily.

Despite the fact that in this tutorial we consider the interface only in C, calls to remote procedures can be made from any language. The work of the RPC mechanism for organizing communication between processes on different machines does not differ from its work on the same machine.

RPC (Remote Procedure Call) is an interface between remote users and the specific host programs that those users invoke. An RPC service on a host typically provides a suite of programs to clients. Each of these programs, in turn, consists of one or more remote procedures. For example, an NFS remote file system service built on RPC calls may consist of just two programs: one that interacts with high-level user interfaces and another that works with low-level I/O functions.

Each RPC call involves two parties: the active client, which sends the procedure call request to the server, and the server, which sends the response to the client.

Note. The terms "client" and "server" here refer to a specific transaction. A given host or piece of software (process or program) can act as both a client and a server. For example, a program that provides a remote procedure service can at the same time be a client of the network file system.

RPC is built on a remote procedure call model similar to that of local procedure calls. When you call a local procedure, you push arguments to a specific memory location, stack, or environment variables, and transfer control of the process to a specific address. After completing the work, you read the results at a specific address and continue your process.

In the case of a remote procedure, the main difference is that the remote function call is served by two processes: the client process and the server process.

The client process sends the server a message that includes the parameters of the called procedure and waits for a response message with the results. When the response is received, the result is read and the process continues. On the server side, the call handler process waits for requests; when a message arrives, it reads the procedure parameters, executes the procedure, sends a response, and returns to waiting for the next call.

The RPC protocol imposes no requirements on additional communication between processes and does not require the functions performed to be synchronized. In other words, calls can be asynchronous and independent of one another, so the client can execute other procedures while waiting for a response. The RPC server can allocate a separate process for each function, so it can accept the next request immediately, without waiting for previous requests to finish.

However, there are several important differences between local and remote procedure calls:

  1. Error processing. The client should in any case be notified of errors that occur when calling remote procedures on the server or on the network.
  2. Global variables. Because the server does not have access to the client's address space, you cannot use hidden parameters in the form of global variables in remote procedure calls.
  3. Performance. The speed of execution of remote procedures, as a rule, is one or two orders of magnitude lower than the speed of execution of similar local procedures.
  4. Authentication. Because remote procedure calls occur over the network, client authentication mechanisms must be used.

Principles of constructing the protocol.

The RPC protocol can use several different transport protocols. The RPC protocol is responsible only for defining the message format and interpreting the messages. The reliability of message delivery is ensured entirely by the transport layer.

However, RPC can control the choice and some functions of the transport protocol. As an example of the interaction between RPC and the transport protocol, consider the procedure for assigning an RPC port for an application process via RPC - Portmapper.

This function dynamically (on demand) assigns a specific port to an RPC connection. The Portmapper feature is often used because the set of transport ports reserved for RPC is limited and the number of processes that can potentially run concurrently is very high. Portmapper, for example, is invoked when you select the communication ports of an NFS system client and server.

The Portmapper service uses the RPC broadcast message mechanism on a specific port, 111. On this port, the client broadcasts a request for the port of a specific RPC service. The Portmapper service processes this message, determines the address of the local RPC service, and sends a response to the client. The RPC Portmapper service can work with both TCP and UDP.

RPC can work with various transport protocols, but it never duplicates their functions. If RPC runs on top of TCP, it leaves all concerns about the reliability of the connection to TCP. If RPC runs on top of UDP, however, it can provide additional mechanisms of its own, such as retransmission, to make message delivery reliable.

Note. Applications can regard the RPC protocol as a well-defined way of performing a jump-to-subroutine (JSR) instruction over the network.

For the RPC protocol to work, the following conditions must be met:

  1. Unique identification of all remotely callable procedures on a given host. RPC requests contain three identifier fields: the number of the remote program (service), the version number of the remote program, and the number of the remote procedure within that program. The program number is assigned by the vendor of the service, and the procedure number indicates a specific function of that service.
  2. RPC protocol version identification. RPC messages contain an RPC protocol version field. It is used to reconcile the formats of the transmitted parameters when the client works with different versions of RPC.
  3. Mechanisms for authenticating the client to the server. The RPC protocol provides a procedure for authenticating the client to the service and, if necessary, on every request or response. In addition, RPC allows various additional security mechanisms to be used.

RPC can use four types of authentication mechanisms:

  • AUTH_NULL - no authentication required
  • AUTH_UNIX - UNIX standard authentication
  • AUTH_SHORT - a shorthand form of UNIX standard authentication with its own encoding structure
  • AUTH_DES - DES authentication

  4. Matching response messages to the corresponding requests. RPC response messages contain the identifier of the request they answer. This identifier can be called the transaction identifier of the RPC call. This mechanism is especially useful in asynchronous mode and when a sequence of several RPC calls is executed.
  5. Identification of protocol errors. All network or server errors have unique identifiers, by which each participant in the connection can determine the cause of the failure.

Protocol message structures

When transferring RPC messages over a transport protocol, several RPC messages can be located within one transport packet. In order to distinguish one message from another, a record marker (RM - Record Marker) is used. Each RPC message is "marked" with exactly one RM.

An RPC message can be composed of several fragments. Each fragment consists of a four-byte header followed by 0 to 2^31 - 1 bytes of data. The first bit of the header indicates whether the fragment is the last one, and the remaining 31 bits give the length of the fragment data.
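The fragment header described above can be sketched in a few lines of C. This is only an illustration: the function names rm_encode and rm_decode are made up for this example, and the header is assumed to be sent most significant byte first, as in XDR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: only the 4-byte layout is taken from the text. */

/* Build a fragment header: top bit = "last fragment", low 31 bits = length. */
void rm_encode(uint8_t out[4], uint32_t length, int is_last)
{
    assert(length <= 0x7FFFFFFFu);            /* length must fit in 31 bits */
    uint32_t word = length | (is_last ? 0x80000000u : 0u);
    out[0] = (uint8_t)(word >> 24);           /* most significant byte first */
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}

/* Split a received fragment header back into its two fields. */
void rm_decode(const uint8_t in[4], uint32_t *length, int *is_last)
{
    uint32_t word = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                    ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    *is_last = (word & 0x80000000u) != 0;
    *length  = word & 0x7FFFFFFFu;
}
```

A receiver would keep reading fragments and appending their data until it sees one whose "last fragment" bit is set.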

The structure of RPC messages is formally described in XDR, a language for describing and representing data formats, with additions for describing procedures. You could even say that the RPC description language is an extension of XDR, supplemented by support for procedures.

The structure of the RPC package looks like this:


The reply_body structure can contain either a structure passed on in case of an error (in which case it contains the error code), or a structure for successful processing of the request (in which case it contains the returned data).
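As an illustration of the request side, here is a minimal sketch in C that serializes the header of a call message. It assumes the field order given in RFC 1057 (transaction ID, message type, RPC version 2, program, version, procedure, then credentials and verifier) and uses AUTH_NULL, for which both authentication fields are an empty body with flavor 0. The names put_u32 and encode_call_header are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MSG_CALL = 0, RPC_VERSION = 2, AUTH_NULL_FLAVOR = 0 };

/* Store one 32-bit word, most significant byte first; return the new offset. */
static size_t put_u32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off]     = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)v;
    return off + 4;
}

/* Hypothetical helper: writes a 40-byte call header, or returns 0 if buf is too small. */
size_t encode_call_header(uint8_t *buf, size_t cap, uint32_t xid,
                          uint32_t prog, uint32_t vers, uint32_t proc)
{
    assert(buf != NULL);
    if (cap < 40)
        return 0;
    size_t off = 0;
    off = put_u32(buf, off, xid);               /* transaction identifier   */
    off = put_u32(buf, off, MSG_CALL);          /* message type: call       */
    off = put_u32(buf, off, RPC_VERSION);       /* RPC protocol version     */
    off = put_u32(buf, off, prog);              /* remote program number    */
    off = put_u32(buf, off, vers);              /* remote program version   */
    off = put_u32(buf, off, proc);              /* remote procedure number  */
    off = put_u32(buf, off, AUTH_NULL_FLAVOR);  /* credentials: flavor      */
    off = put_u32(buf, off, 0);                 /* credentials: body length */
    off = put_u32(buf, off, AUTH_NULL_FLAVOR);  /* verifier: flavor         */
    off = put_u32(buf, off, 0);                 /* verifier: body length    */
    return off;
}
```

The procedure arguments, encoded in XDR, would follow immediately after these ten words.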

High-level programming interface.

Using subroutines in a program is the traditional way to structure a task and make it clearer. The most frequently used subroutines are collected in libraries, where they can be used by various programs. In this case we are talking about a local call, that is, both the caller and the callee work within the same program on the same computer.

In the case of a remote invocation, a process running on one computer starts the process on the remote computer (that is, it actually runs the procedure code on the remote computer). Obviously, a remote procedure call differs significantly from a traditional local one, but from the point of view of a programmer, there are practically no such differences, that is, the architecture of a remote procedure call allows you to simulate a local call.

However, if, in the case of a local call, the program passes parameters to the called procedure and receives the result of work through the stack or shared memory areas, then in the case of a remote call, the transfer of parameters turns into a transmission of a request over the network, and the result of the work is in the received response.

This approach is a possible basis for creating distributed applications, and although many modern systems do not use this mechanism, the basic concepts and terms in many cases remain. When describing the RPC mechanism, we will traditionally refer to the calling process as the client, and the remote process that implements the procedure as the server.

A remote procedure call includes the following steps:

  1. The client program makes a local call to a procedure called a stub. To the client it appears that, by calling the stub, it is calling the server procedure itself: the client passes the required parameters to the stub, and the stub returns the result. In reality, things work differently. The job of the stub is to accept the arguments for the remote procedure, possibly convert them to some standard format, and form a network request. Packing the arguments and building the network request is called marshalling.
  2. The network request is sent over the network to the remote system. To do this, the stub uses the appropriate calls, for example, those discussed in the previous sections. Note that various transport protocols can be used here, not only those of the TCP/IP family.
  3. On the remote host, everything happens in reverse order. The server stub waits for a request and, on receipt, extracts the parameters, that is, the arguments of the procedure call. Extraction (unmarshalling) can involve necessary conversions (for example, changing the byte order).
  4. The stub calls the real server procedure to which the client's request is addressed, passing it the arguments received over the network.
  5. When the procedure completes, control returns to the server stub together with the return values. Like the client stub, the server stub converts the values returned by the procedure and forms a network response message, which is sent over the network to the system from which the request came.
  6. The operating system passes the received message to the client stub, which, after the necessary conversion, passes the values (the values returned by the remote procedure) to the client, which interprets this as a normal return from a procedure.
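The six steps above can be mimicked inside a single process. In the sketch below a direct function call stands in for the network, and all names (remote_add, server_stub, add_impl, PROC_ADD) and the 12-byte request layout are assumptions made for the illustration, not part of any real RPC library.

```c
#include <stddef.h>
#include <stdint.h>

enum { PROC_ADD = 1 };   /* hypothetical procedure number */

/* Marshalling helpers: a 32-bit integer as four bytes, most significant first. */
static void put_i32(uint8_t *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    p[0] = (uint8_t)(u >> 24); p[1] = (uint8_t)(u >> 16);
    p[2] = (uint8_t)(u >> 8);  p[3] = (uint8_t)u;
}

static int32_t get_i32(const uint8_t *p)
{
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return (int32_t)u;
}

/* Step 4: the real server procedure, which knows nothing about the network. */
static int32_t add_impl(int32_t a, int32_t b) { return a + b; }

/* Steps 3 to 5: the server stub unmarshals, dispatches, and marshals the result. */
static size_t server_stub(const uint8_t *req, size_t req_len, uint8_t *reply)
{
    if (req_len != 12 || get_i32(req) != PROC_ADD)
        return 0;                              /* unknown or malformed request */
    put_i32(reply, add_impl(get_i32(req + 4), get_i32(req + 8)));
    return 4;
}

/* Steps 1, 2 and 6: the client stub looks like an ordinary local function. */
int remote_add(int32_t a, int32_t b, int32_t *out)
{
    uint8_t req[12], reply[4];
    put_i32(req, PROC_ADD);                    /* step 1: marshal the request  */
    put_i32(req + 4, a);
    put_i32(req + 8, b);
    /* Step 2: a direct call stands in for sending the request over the network. */
    if (server_stub(req, sizeof req, reply) != 4)
        return -1;
    *out = get_i32(reply);                     /* step 6: unmarshal the reply  */
    return 0;
}
```

The caller of remote_add() sees only an ordinary function with an error code, which is exactly the illusion the stubs are meant to create.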

Thus, from the client's point of view, it makes a remote procedure call just as it would a local one. The same can be said about the server: the procedure is called in the standard way, and an object (the server stub) calls the local procedure and receives the values returned by it. The client treats the stub as the callable server procedure, and the server treats its own stub as the client.

Thus, stubs constitute the core of the RPC system, responsible for all aspects of the generation and transmission of messages between the client and the remote server (procedure), although both the client and the server assume that the calls are made locally. This is the basic concept of RPC - to completely hide the distributed (network) nature of communication in the stub code. The advantages of this approach are obvious: both the client and the server are independent of the network implementation, they both operate within a distributed virtual machine, and procedure calls have a standard interface.

Passing parameters

Passing parameters by value is straightforward. In this case, the client stub places the parameter value in the network request, possibly converting it to the standard form (for example, changing the byte order). Passing pointers, where the parameter is the address of the data rather than its value, is much more complicated. Passing an address in a request is meaningless, since the remote procedure executes in a completely different address space. The simplest solution used in RPC is to forbid clients from passing parameters other than by value, although this certainly imposes serious restrictions.

Binding

Before a client can call a remote procedure, it must bind to the remote system hosting the required server. Thus, the task of binding splits into two:

  1. Finding the remote host with the required server
  2. Finding the required server process on a given host

Various approaches can be used to find a host. A possible option is to create some kind of centralized directory in which hosts announce their servers, and where the client, if desired, can choose the host and procedure address suitable for him.

Each RPC procedure is uniquely identified by a program and procedure number. The program number defines a group of remote procedures, each of which has its own number. Each program is also assigned a version number, so that when minor changes are made to the program (for example, when a procedure is added), there is no need to change its number. Usually, several functionally similar procedures are implemented in one program module, which, when launched, becomes the server of these procedures, and which is identified by the program number.

Thus, when a client wants to call a remote procedure, he needs to know the program, version, and procedure numbers that provide the required service.

To send the request, the client also needs to know the host's network address and the port number associated with the server program that provides the required procedures. This is done using the portmap(1M) daemon (called rpcbind(1M) on some systems). The daemon runs on a host that provides remote procedure services and uses a well-known port number. When a server process initializes, it registers its procedures and port numbers with portmap(1M). Now, when the client needs to know the port number to call a particular procedure, it sends a request to the portmap(1M) server, which in turn either returns the port number or forwards the request directly to the RPC server and returns the response to the client once it has been executed. In any case, if the required procedure exists, the client receives the procedure's port number from the portmap(1M) server, and further requests can be made directly to this port.
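The registration and lookup just described boil down to a small table keyed by program, version, and protocol. The following C sketch models that table in memory; the names pm_set and pm_getport and the fixed table size are assumptions for this example, and it is not the code of the real portmap daemon. It follows two conventions of the portmap protocol as documented in RFC 1057: a lookup for an unregistered program yields port 0, and a second registration for the same tuple is refused.

```c
#include <stdint.h>

#define PM_MAX 16                        /* arbitrary capacity for the sketch */
enum { PROTO_TCP = 6, PROTO_UDP = 17 };  /* IP protocol numbers */

struct pm_entry { uint32_t prog, vers, proto; uint16_t port; };

static struct pm_entry pm_table[PM_MAX];
static int pm_count;

/* Look up the port registered for (prog, vers, proto); 0 means "not registered". */
uint16_t pm_getport(uint32_t prog, uint32_t vers, uint32_t proto)
{
    for (int i = 0; i < pm_count; i++)
        if (pm_table[i].prog == prog && pm_table[i].vers == vers &&
            pm_table[i].proto == proto)
            return pm_table[i].port;
    return 0;
}

/* Register a mapping; returns 1 on success, 0 if it already exists or the table is full. */
int pm_set(uint32_t prog, uint32_t vers, uint32_t proto, uint16_t port)
{
    if (port == 0 || pm_count == PM_MAX || pm_getport(prog, vers, proto) != 0)
        return 0;
    pm_table[pm_count++] = (struct pm_entry){ prog, vers, proto, port };
    return 1;
}
```

In these terms, a server calls pm_set() once at startup, and each client calls pm_getport() before its first request.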

Handling exceptions

Handling exceptions when calling local procedures is not particularly problematic. UNIX handles process errors such as division by zero, invalid memory accesses, and so on. Calling a remote procedure increases the likelihood of error situations. Added to server and stub errors are errors related to, for example, receiving an erroneous network message.

For example, when using UDP as the transport protocol, messages are retransmitted after a specified timeout. An error is returned to the client if, after a certain number of attempts, a response from the server has not been received. In the case where the TCP protocol is used, an error is returned to the client if the server terminated the TCP connection.

Call semantics

Calling a local procedure unambiguously leads to its execution, after which control returns to the main program. The situation is different when calling a remote procedure. It is impossible to establish when exactly the procedure will be executed, whether it will be performed at all, and if so, how many times? For example, if the request is received by the remote system after the server program terminates abnormally, the procedure will not be executed at all. If the client, after not receiving a response after a certain period of time (timeout), resends the request, then a situation may arise when the response is already being transmitted over the network, and the repeated request is again accepted for processing by the remote procedure. In this case, the procedure will be performed several times.

Thus, the execution of a remote procedure can be characterized by the following semantics:

  • Exactly once. This behavior (in some cases the most desirable) is difficult to guarantee because of potential server crashes.
  • At most once. This means that the procedure was either not performed at all or was performed only once. Such a statement can be made when an error is received instead of a normal response.
  • At least once. The procedure was performed at least once, and possibly more. For normal operation in such a situation, the remote procedure must be idempotent. A procedure has this property if executing it repeatedly does not cause cumulative changes. For example, reading a file is idempotent, but appending text to a file is not.
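One common way for a server to approach "at most once" behavior for a non-idempotent procedure is to remember the replies to recent requests and replay them when a retransmission arrives. The C sketch below shows the idea under simplifying assumptions: requests are identified only by a transaction ID, the cache is tiny and direct-mapped, and the names handle_request and append_line are invented for this example.

```c
#include <stdint.h>

#define CACHE_SIZE 8          /* arbitrary; a real cache would be larger */

struct cached_reply { int used; uint32_t xid; int32_t reply; };

static struct cached_reply cache[CACHE_SIZE];
static int32_t log_lines;     /* state modified by the procedure */

/* Non-idempotent procedure: every execution appends one more line. */
static int32_t append_line(void) { return ++log_lines; }

/* Execute the request with this transaction ID unless its reply is still cached.
   A retransmitted request gets the saved reply and the procedure is not rerun.
   Note: a different xid mapping to the same slot evicts the older entry. */
int32_t handle_request(uint32_t xid)
{
    struct cached_reply *slot = &cache[xid % CACHE_SIZE];
    if (slot->used && slot->xid == xid)
        return slot->reply;               /* duplicate: replay the old reply */
    slot->used  = 1;
    slot->xid   = xid;
    slot->reply = append_line();          /* first time: actually execute */
    return slot->reply;
}
```

A production server would also key the cache on the client's address, since different clients may pick the same transaction ID.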

Data presentation

When the client and server run on the same system on the same computer, there are no data incompatibility issues. Binary data is represented in the same way for both the client and the server. In the case of a remote call, the matter is complicated by the fact that the client and the server can run on systems with different architectures and different data representations (for example, floating-point representation, byte ordering, and so on).

Most RPC implementations define some standard representation of data to which all values passed in requests and responses must be converted.

For example, the format for representing data in RPC from Sun Microsystems is as follows:

  1. Byte order - most significant byte first (big-endian)
  2. Floating-point representation - IEEE
  3. Character representation - ASCII
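To make the conversion concrete, here is a small C sketch that encodes an integer and a string the way XDR (RFC 4506) lays them out: integers are four bytes with the most significant byte first, and a string is a four-byte length followed by the characters, zero-padded to a multiple of four bytes. The function names enc_int and enc_string are assumptions for this example and are not the routines of the real XDR library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a 32-bit integer, most significant byte first. Returns bytes written. */
size_t enc_int(uint8_t *buf, int32_t value)
{
    uint32_t u = (uint32_t)value;
    buf[0] = (uint8_t)(u >> 24);
    buf[1] = (uint8_t)(u >> 16);
    buf[2] = (uint8_t)(u >> 8);
    buf[3] = (uint8_t)u;
    return 4;
}

/* Encode a string as length + bytes + zero padding up to a 4-byte boundary.
   Returns the number of bytes written, or 0 if the buffer is too small. */
size_t enc_string(uint8_t *buf, size_t cap, const char *s)
{
    assert(buf != NULL && s != NULL);
    size_t len    = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;   /* round up to a multiple of 4 */
    if (len > 0x7FFFFFFFu || cap < 4 + padded)
        return 0;
    enc_int(buf, (int32_t)len);               /* length prefix */
    memcpy(buf + 4, s, len);                  /* the characters themselves */
    memset(buf + 4 + len, 0, padded - len);   /* zero padding */
    return 4 + padded;
}
```

For example, the five-character string "hello" occupies 12 bytes: four for the length, five for the characters, and three bytes of padding.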

Network

In terms of functionality, the RPC system sits between the application layer and the transport layer. In the OSI model, this position corresponds to the presentation and session layers. Thus, RPC is theoretically independent of the network implementation, in particular of the transport-layer network protocols.

Software implementations of the system, as a rule, support one or two protocols. For example, Sun Microsystems' RPC system supports messaging using the TCP and UDP protocols. The choice of one or another protocol depends on the requirements of the application. The choice of UDP is justified for applications with the following characteristics:

  • Called procedures are idempotent
  • The size of the arguments passed and the returned result is less than the size of the UDP packet - 8 KB.
  • The server works with several hundred clients. Because a server using TCP has to maintain a connection with each active client, this consumes a significant part of its resources. UDP is less resource-intensive in this regard.

On the other hand, TCP provides efficient operation of applications with the following characteristics:

  • The application requires a reliable transport protocol
  • The called procedures are not idempotent
  • The arguments or the returned result are larger than 8 KB

The choice of protocol usually rests with the client, and the system organizes the formation and transmission of messages differently in each case. When TCP is used, the transmitted data is a stream of bytes, so messages must be separated from one another. This can be done, for example, using the record marking standard described in RFC 1057, "RPC: Remote Procedure Call Protocol Specification Version 2", in which each message is preceded by a 32-bit integer specifying the message size in bytes.

The semantics of the call also differ. For example, if RPC runs over an unreliable transport protocol (UDP), the system retransmits the message at short intervals (timeouts). If the client application does not receive a response, it can only say that the procedure has been executed zero or more times. If a response has been received, the application can conclude that the procedure has been executed at least once. With a reliable transport protocol (TCP), when a response is received, the procedure can be said to have been executed exactly once. If no response is received, it is impossible to say for sure that the procedure was not executed.

How does it work?

Essentially, the actual RPC system is built into the client program and the server program. It is gratifying that when developing distributed applications, you do not have to delve into the details of the RPC protocol or program message processing. The system assumes the existence of an appropriate development environment, which greatly facilitates the life of the creators of application software. One of the key points in RPC is that the development of a distributed application begins with the definition of an object interface - a formal description of server functions, written in a special language. Client and server stubs are then automatically generated from this interface. The only thing that needs to be done after this is to write the actual procedure code.

As an example, consider the RPC from Sun Microsystems. The system consists of three main parts:

  • rpcgen (1) is an RPC compiler that generates client and server stubs as C programs based on the description of the remote procedure interface.
  • Library XDR (eXternal Data Representation), which contains functions for converting various types of data into a machine-independent form, allowing the exchange of information between heterogeneous systems.
  • A library of modules that ensure the operation of the system as a whole.

Let's look at an example of a basic distributed event logging application. The client, at startup, calls the remote procedure to write a message to the log file of the remote computer.

To do this, you will have to create at least three files: the interface specification of the remote procedures, log.x (in the interface description language), the actual text of the remote procedures, log.c, and the text of the client's main program main(), client.c (both in C).

The rpcgen(1) compiler generates three files from the log.x specification: the C text of the client and server stubs (log_clnt.c and log_svc.c) and the definitions file log.h, which is used by both stubs.

So, let's look at the source code of the programs.

The log.x file specifies the registration parameters of the remote procedure (the program, version, and procedure numbers) and defines the calling interface (input arguments and return values). Thus, the RLOG procedure is defined: it takes a string as an argument (the string that will be written to the log), and its return value, by convention, indicates the success or failure of the requested operation.


log.x

program LOG_PROG {
    version LOG_VER {
        int RLOG(string) = 1;
    } = 1;
} = 0x31234567;

The rpcgen(1) compiler generates a header file log.h where, in particular, the procedures are defined:


Let's take a closer look at this file. The compiler translates the RLOG name specified in the interface description into rlog_1, replacing uppercase characters with lowercase ones and appending the program version number after an underscore. The return type has changed from int to int *. This is the rule: RPC lets you send and receive only the addresses of the parameters declared in the interface description. The same rule applies to the string passed as an argument. Although it is not apparent from the log.h file, the rlog_1() function in fact also receives the address of the string as its argument.

In addition to the header file, the rpcgen(1) compiler generates the client stub and server stub modules. Essentially, these files contain all the code for the remote call.

The server stub is the main program that handles all network interaction with the client (more precisely, with its stub). To perform the operation, the server stub makes a local call to a function whose text you must write yourself:


The client stub takes the argument passed to the remote procedure, makes the necessary conversions, issues a request to the portmap(1M) server, communicates with the remote procedure server, and finally passes the return value to the client. For the client, a remote procedure call is just a call to the stub and is no different from a regular local call.

client.c


#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <rpc/rpc.h>
#include "log.h"

int main(int argc, char *argv[])
{
    CLIENT *cl;
    char *server, *mystring, *clnttime;
    time_t bintime;
    int *result;

    if (argc != 2) {
        fprintf(stderr, "Call format: %s Host_address\n", argv[0]);
        exit(1);
    }
    server = argv[1];
    /* Get the client descriptor. On failure, report that a connection
       with the server could not be established */
    if ((cl = clnt_create(server, LOG_PROG, LOG_VER, "udp")) == NULL) {
        clnt_pcreateerror(server);
        exit(2);
    }
    /* Allocate a buffer for the string */
    mystring = (char *)malloc(100);
    /* Determine the time of the event */
    bintime = time((time_t *)NULL);
    clnttime = ctime(&bintime);
    sprintf(mystring, "%s - Client started", clnttime);
    /* Send a message for the log: the time when the client started. On failure, report an error */
    if ((result = rlog_1(&mystring, cl)) == NULL) {
        fprintf(stderr, "error2\n");
        clnt_perror(cl, server);
        exit(3);
    }
    /* If the remote computer reports a failure, report an error */
    if (*result != 0)
        fprintf(stderr, "Error writing to log\n");
    /* Free the descriptor */
    clnt_destroy(cl);
    exit(0);
}

The client stub log_clnt.c is compiled with the client.c module to get the client executable.


Now on some host server.nowhere.ru it is necessary to start the server process:


$ logger

Then, when the rlog client is started on another machine, the server will add a corresponding entry to the log file.

The scheme of RPC operation in this case is shown in Fig. 1. Modules interact as follows:

  1. When the server process starts, it creates a UDP socket and binds it to an arbitrary local port. The server then calls the svc_register(3N) library function to register its program and version numbers. To do this, the function contacts the portmap(1M) process and passes it the required values. The portmap(1M) server is usually started during system initialization and binds to a well-known port. portmap(1M) now knows the port number for our program and version, and the server waits for requests. Note that all of these actions are performed by the server stub generated by the rpcgen(1) compiler.
  2. When the rlog program starts, the first thing it does is call the library function clnt_create(3N), giving it the address of the remote system, the program and version numbers, and the transport protocol. The function sends a request to the portmap(1M) server of the remote system server.nowhere.ru and obtains the remote port number for the log server.
  3. The client calls the rlog_1() routine defined in the client stub and transfers control to the stub. The stub, in turn, forms the request (converting the arguments to XDR format) as a UDP packet and sends it to the remote port obtained from the portmap(1M) server. It then waits for a response for some time and, if none arrives, resends the request. Under favorable circumstances, the request is accepted by the logger server (the server stub module). The stub determines which function was called (by the procedure number) and calls the rlog_1() function of the log.c module. After control returns to the stub, the stub converts the value returned by rlog_1() to XDR format and forms the response, also as a UDP packet. On receiving the response, the client stub extracts the returned value, converts it, and returns it to the client.





