Sockets TCP and UDP echo server using select function


Socket vs Socket part 2, or say “no” to the TCP protocol - Archive WASM.RU

In the first part, devoted to the basics of using MS Windows sockets in assembly programs, we talked about what sockets are, how they are created, and what parameters they take. We also mentioned in passing the connectionless UDP protocol, which guarantees neither the delivery of packets nor the order in which they arrive at their destination. The tutorial example then used our favorite TCP protocol. All was well, but in the end a number of questions remained unresolved: in particular, how to organize mutual exchange between several computers on the network, how to transmit something to many computers at once, and so on.

Generally speaking, reading the first part is not at all necessary to understand this one, although I will constantly refer to it along the way. That's just how it is. Haha...

So, let us state the problem: we have a local network of, say, a dozen computers, and we need to organize the exchange of messages between any two of them and (optionally) between one and all the others.

I hear, I hear a chorus of hints: use the built-in Windows features and type:

net send 192.168.0.4 Zhenya sends greetings to you!

net send Node4 Waiting for your answer!

There are only two objections to this. First, never mind what our operating system or other ready-made programs can do: we want to learn how to write our own programs, don’t we? Second, it is not a given that the message goes from person to person. In the general case, the operator may not know anything about it... Or even should not know anything...

For me, the most important thing in setting this task was the ability to transmit something to all computers on the network at once. Imagine that we wrote a certain program... Who said "a Trojan"? No, no and NO! No Trojans. Just a small (very small) accounting program, for example, which after some time managed to settle on many computers on our local network. And now the appointed time comes, it is time to balance the books, to sum up, so to speak, the results for the quarter... Everything must be done quickly and preferably at the same time. How to do this with the material we studied in the first part remained unclear.

The answer, as always, comes from the Windows API. We search and we find: the sendto() function, which sends data to a specified address. How then does it differ from send(), already studied in the first part? It turns out that sendto() can broadcast to a special IP address. But please note that this only works for sockets of type SOCK_DGRAM! And sockets opened with SOCK_DGRAM as the socket type parameter operate over UDP, not TCP! This explains the subtitle of this article... Of course, it is just a literary device: neither protocol is better or worse than the other, they are simply... different, that's all. Both are transport layer protocols that “...provide data transfer between application processes.” Both use a network layer protocol such as IP to transmit (receive) data, through which the data then reaches the physical layer, i.e. the transmission medium... And what kind of medium that is, who knows. Maybe it is a copper cable, or maybe not a cable at all but the airwaves...

Scheme of interaction of network protocols.

UDP - User Datagram Protocol

TCP - Transmission Control Protocol

ICMP - Internet Control Message Protocol (control message exchange protocol)

ARP - Address Resolution Protocol (address discovery protocol)

In general, if the diagram didn’t help you, it doesn’t matter. The important thing to understand is that TCP is a transport layer protocol that provides reliable data transport between application processes by setting up a logical connection (emphasis mine), whereas UDP does not. One more thing: somewhere up there, at the application level, our application will sit in one of the empty rectangles.

Let's finish the introduction here and look at how to use all this, starting from the very beginning.

To demonstrate all the material, a training example is used as usual, which can be downloaded. We skip the part common to all Windows applications and describe only what concerns working with sockets. First you need to initialize the Windows Sockets DLL with the WSAStartup() function, which returns zero on success or one of the error codes otherwise. Then, when initializing the main application window, open a socket for receiving messages:

    invoke socket, AF_INET, \
                   SOCK_DGRAM, \     ; socket type: datagram, i.e. the UDP protocol!
                   0                 ; protocol (0 = default for this socket type)
    .if eax != INVALID_SOCKET        ; if there is no error
        mov hSocket, eax             ; remember the handle
    .endif

After this, as usual, we need to tell Windows to send messages to the specified window from the socket we opened:

    invoke WSAAsyncSelect, hSocket, hWnd, WM_SOCKET, FD_READ

Where:
hSocket - the socket handle
hWnd - handle of the window whose procedure will receive the messages
WM_SOCKET - a message defined by us in the .const section
FD_READ - a mask specifying the events we are interested in; in this case, data ready to be read from the socket.

I hear, I hear a surprised chorus with despair in its voice: we were promised a hidden application, and here is a main window and all that... The fact is that you cannot do without it, because the operating system sends all messages to our application through its window procedure. The solution is simple: if necessary, hide the application's main window. How? For example, comment out the line:

    invoke ShowWindow, hwnd, SW_SHOWNORMAL

or, more correctly, use:

    invoke ShowWindow, hwnd, SW_HIDE

After this, our application will also start, the main window will be created, a WM_CREATE message will be sent to it from Windows with all the consequences... Only its window will not be visible either on the desktop or on the taskbar. If this is what you wanted, I'm glad. Anyway, let's continue...

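Next, the receiving socket has to be bound to a local port. To do this, we first convert the port number to network byte order using a special API function and fill in the address structure:

    invoke htons, Port
    mov sin.sin_port, ax
    mov sin.sin_family, AF_INET
    mov sin.sin_addr, INADDR_ANY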
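To see what "network byte order" means in practice, here is a minimal sketch in Java (chosen only for illustration; the class name `PortByteOrder` and the port 8888 are assumptions, not part of the original example). It lays out a 16-bit port number most significant byte first, which is the layout htons() produces in ax.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PortByteOrder {
    // Returns the two bytes of a port number in network (big-endian) order.
    static byte[] toNetworkOrder(int port) {
        return ByteBuffer.allocate(2)
                .order(ByteOrder.BIG_ENDIAN)
                .putShort((short) port)
                .array();
    }

    public static void main(String[] args) {
        byte[] b = toNetworkOrder(8888); // 8888 = 0x22B8
        // The high byte comes first on the wire, whatever the CPU's native order is.
        System.out.printf("%02X %02X%n", b[0] & 0xFF, b[1] & 0xFF); // prints "22 B8"
    }
}
```

On a little-endian x86 machine the same value sits in memory as B8 22, which is why the conversion is needed before the port goes into the address structure.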

A small lyrical digression, not necessary for understanding this article.

The port numbers for our sockets were discussed at the end of part one. It is difficult to give recommendations as to what they should be. The only thing that can be said is what they should not be. It is unwise to use port numbers assigned to widely used services, such as:

over TCP: 20, 21 – ftp; 23 – telnet; 25 – smtp; 80 – http; 139 – NetBIOS session service;

over UDP: 53 – DNS; 137, 138 – NetBIOS; 161 – SNMP.

Of course, the API has a special function getservbyport() , which, given a port number, returns the name of the corresponding service. More precisely, the function itself returns a pointer to a structure, inside of which there is a pointer to this name...

You can call it like this:

    invoke htons, Port              ; convert the port number to network byte order
    invoke getservbyport, ax, 0

Note what Win32 Programmer's Reference says about getservbyport:

“...returns a pointer to a structure that is allocated by Windows Sockets. An application should never attempt to modify this structure or any of its components. Additionally, only one copy of this structure is allocated per thread, so the application must copy any information it needs before issuing any other Windows Sockets API calls.”

And here is the structure itself:

    s_name    DWORD ?   ; pointer to a string with the service name
    s_aliases DWORD ?   ; pointer to an array of alternative names
    s_port    WORD  ?   ; port number
    s_proto   DWORD ?   ; pointer to a string with the protocol name

The API also has a “paired” function, so to speak: getservbyname(), which, based on the service name, returns information about the port number used.

Unfortunately, we will not be able to derive practical benefit from these functions. So, know that they exist and forget about them...

    invoke bind, hSocket, addr sin, sizeof sin
    .if eax == SOCKET_ERROR         ; if there is an error
        invoke MessageBox, NULL, addr ...
    .endif

At this point, the preparatory work of creating and configuring a receiving datagram socket can be considered complete. There is no need to put the socket into listening mode by calling listen, as we did for a SOCK_STREAM socket in the first part. Now we can add to our application's main window procedure the code that will run when a WM_SOCKET message arrives from the socket:

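    ; if a message is received from the socket (hSocket)
    .elseif uMsg == WM_SOCKET
        mov eax, lParam          ; low word: event, high word: error code
                                 ; (assuming the usual WndProc parameter name lParam)
        .if ax == FD_READ        ; data is ready to be read
            shr eax, 16          ; move the error code into ax
            .if ax == NULL       ; no error
                ; receive data (up to 64 bytes) from the socket into the BytRecu buffer
                invoke recv, hSocket, addr BytRecu, 64, 0
            .endif
        .endif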

Now let's talk about how to open a socket for sending messages. Here are all the necessary program actions:

    invoke socket, AF_INET, SOCK_DGRAM, 0
    mov hSocket1, eax               ; remember the handle of the sending socket
    invoke htons, Port
    mov sin_to.sin_port, ax
    mov sin_to.sin_family, AF_INET
    invoke inet_addr, addr AddressIP
    mov sin_to.sin_addr, eax

When it comes to transferring data, all you need to do is:

    invoke sendto, hSocket1, addr BytSend1, 64, 0, \
                   addr sin_to, sizeof sin_to

The parameters of this API call are as follows:

hSocket1 - handle of a previously opened socket
addr BytSend1 - address of the buffer containing the data to send
64 - size of the data in the buffer, in bytes
0 - flags; in the MSDN example it is simply 0
addr sin_to - pointer to the structure that contains the destination address
sizeof sin_to - size of that structure, in bytes.

If sendto() completes without errors, it returns the number of bytes sent; otherwise it returns SOCKET_ERROR in eax.

Now is the time to talk about that broadcast address mentioned at the beginning. In the sin_to structure we pre-filled the field holding the destination IP address, indicating where the data should actually be sent. If this address is 127.0.0.1, then naturally our data will go no further than our own computer. The literature states clearly that a packet sent to an address in the 127.x.x.x network is not transmitted on any network. Moreover, a router or gateway should never propagate routing information for network number 127, because this address is not a network address. To send a transmission to all computers on the local network at once, you need to use an address formed from our own IP address but with all ones in the host part; with a 255.255.255.0 mask that means 255 in the low octet, something like 192.168.0.255.
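The rule for forming the broadcast address can be sketched in a few lines of Java (used here only for illustration; the class `UdpBroadcast`, the port 8888 and the 24-bit prefix are assumptions made for this example). The sketch keeps the network bits of our own address, sets every host bit to one, and then tries to send a datagram there.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

final class UdpBroadcast {
    // Directed broadcast address: network bits kept, all host bits set to 1.
    static String broadcastAddress(String ipv4, int prefixLength) {
        long addr = 0;
        for (String part : ipv4.split("\\.")) {
            addr = (addr << 8) | Integer.parseInt(part);
        }
        long hostMask = (1L << (32 - prefixLength)) - 1; // e.g. 0xFF for a /24 network
        long b = addr | hostMask;
        return ((b >> 24) & 0xFF) + "." + ((b >> 16) & 0xFF) + "."
             + ((b >> 8) & 0xFF) + "." + (b & 0xFF);
    }

    public static void main(String[] args) {
        String target = broadcastAddress("192.168.0.4", 24);
        System.out.println(target); // prints "192.168.0.255"

        byte[] data = "hello, everyone".getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true); // explicitly allow sending to a broadcast address
            socket.send(new DatagramPacket(data, data.length,
                    InetAddress.getByName(target), 8888)); // hypothetical port
        } catch (IOException e) {
            // No suitable network interface, firewall rules, etc.
            System.out.println("send failed: " + e);
        }
    }
}
```

Whether the datagram actually reaches every machine depends on the local network and firewall settings; the address arithmetic is the part that carries over directly to the assembly version.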

That's all, actually. When the program exits, you need to close the sockets and release the Sockets DLL resources; this is done simply:

    invoke closesocket, hSocket
    invoke closesocket, hSocket1
    invoke WSACleanup

In a multithreaded application, WSACleanup terminates socket operations for all threads.

The hardest part for me in this article was deciding how best to illustrate the use of the Windows Sockets API. You have probably already seen one approach, where a socket for receiving and a socket for sending messages are used side by side in a single application. Another method seems no less attractive: the code for the two is clearly separated, even to the point of living in different applications. In the end I implemented this method as well, and it may be a little easier for beginners to understand. In the second archive...

Without this function, send() will return SOCKET_ERROR!

Finally, we can note some common problems that arise when working with sockets. To handle the window message indicating that the state of a socket has changed, we used, as usual, direct messages from Windows to the application's main window. There is another approach, in which a separate window is created for each socket.

Generally speaking, centralized message processing in the main window seems the easier method to understand, but it can still be a hassle in practice. If a program uses more than one socket at a time, it needs to keep a list of socket handles. When a message from a socket arrives, the main window procedure looks up the information associated with that socket handle in the list and forwards the state change message to the procedure intended for it, which then reacts in one way or another. This approach forces the handling of network tasks into the program core, which makes it difficult to build libraries of network functions. Each time these networking functions are used, additional code must be added to the application's main window handler.

In the second method of processing messages, the application creates a hidden window to receive them. It serves to separate the application's main window procedure from the processing of network messages. This approach can simplify the main application and make it easier to reuse existing networking code in other programs. The downside is the extra consumption of Windows USER memory, because a fairly large amount is reserved for each window created.

Which method to choose is up to you. One more thing: while experimenting, you may need to disable your personal firewall. For example, Outpost Pro 2.1.275 in learning mode did react to an attempt to send to the socket, but when the transfer was manually allowed, the data still did not arrive. So much for UDP. Although that may not be the cause. My ZoneAlarmPro 5.0.590 had no problems in the same situation.

P.S. While finishing the second part of the article, I accidentally came across the source code of a Trojan on the Internet, written in our favorite MASM. Everything assembles and runs; the one catch is that the client refuses to connect to the server, and under Windows 2000 SP4 it sometimes even crashes with an error saying that the application will be closed and all that... Personally, what I like about this Trojan is that it does not merely keep a keystroke log or “rip out” a password file and send it by email, but has a wide range of remotely controlled functions, implemented in a very original way. If we manage to bring this whole thing to life, then perhaps a third part will soon appear, devoted to describing a specific implementation... For those who have carefully read both articles and understood how the socket API functions work, there is nothing complicated there. It seems... By the way, the author himself writes in the readme that he wrote it (the Trojan) for educational purposes. Oh well. We will make use of that.

DirectOr

Sockets

A socket is one end of a two-way communication channel between two programs running on a network. By connecting two sockets together, you can transfer data between different processes (local or remote). The socket implementation encapsulates the network and transport layer protocols.

Sockets were originally developed for UNIX at the University of California, Berkeley. In UNIX, I/O follows the open/read/write/close pattern. Before a resource can be used, it must be opened with the appropriate permissions and other settings. Once a resource is open, data can be read from it or written to it. When finished with a resource, the user must call the Close() method to signal to the operating system that it is done with the resource.

When Inter-Process Communication (IPC) and networking features were added to the UNIX operating system, the familiar input/output pattern was borrowed. All resources exposed for communication in UNIX and Windows are identified by handles. These descriptors, or handles, may refer to a file, memory, or some other communication channel, but in fact point to an internal data structure used by the operating system. A socket, being a resource of the same kind, is also represented by a descriptor. Therefore, for sockets, the life of a handle can be divided into three phases: open (create) the socket, receive from or send to the socket, and finally close the socket.

The IPC interface for communication between different processes is built on top of I/O methods. They make it easier for sockets to send and receive data. Each target is specified by a socket address, so this address can be specified in the client to establish a connection to the target.

Socket types

There are two main types of sockets - stream sockets and datagram sockets.

Stream sockets

A stream socket is a connection-based socket consisting of a stream of bytes that can be bidirectional, meaning that an application can both send and receive data through this endpoint.

A stream socket ensures error correction, handles delivery, and maintains data consistency. It can be relied upon to deliver ordered, non-duplicated data. A stream socket is also suitable for transferring large amounts of data, since the overhead of establishing a separate connection for each message sent may be prohibitive for small amounts of data. Stream sockets achieve this level of quality by using the Transmission Control Protocol (TCP). TCP ensures that data reaches the other side in the correct sequence and without errors.

For this type of socket, the path is formed before messages are sent. This ensures that both parties involved in the interaction accept and respond. If an application sends two messages to a recipient, it is guaranteed that the messages will be received in the same sequence.

However, individual messages may be split into packets, and there is no way to determine the boundaries of records. When using TCP, this protocol takes care of breaking the transmitted data into packets of the appropriate size, sending them to the network and reassembling them on the other side. The application only knows that it sends a certain number of bytes to the TCP layer and the other side receives those bytes. In turn, TCP effectively breaks this data into appropriately sized packets, receives these packets on the other side, extracts the data from them, and combines them together.
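Because the byte stream carries no record boundaries, an application that needs discrete messages has to mark them itself. One common option is a length prefix; the sketch below illustrates it in Java on in-memory streams, so it runs without a network. The class `LengthPrefixedFraming` and its methods are hypothetical names for this example, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LengthPrefixedFraming {
    // Writes each message as a 4-byte big-endian length followed by its UTF-8 bytes.
    static byte[] encode(List<String> messages) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String m : messages) {
                byte[] body = m.getBytes(StandardCharsets.UTF_8);
                out.writeInt(body.length);
                out.write(body);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the messages back. readFully() keeps reading until the whole body
    // is available, however the stream was split into packets in transit.
    static List<String> decode(byte[] stream) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
            List<String> messages = new ArrayList<>();
            // available() is reliable for an in-memory stream; with a real
            // socket one would loop until end of stream instead.
            while (in.available() > 0) {
                byte[] body = new byte[in.readInt()];
                in.readFully(body);
                messages.add(new String(body, StandardCharsets.UTF_8));
            }
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(Arrays.asList("first", "second"));
        System.out.println(decode(wire)); // prints "[first, second]"
    }
}
```

With a real connection, the same DataOutputStream and DataInputStream wrappers would be placed around the socket's output and input streams.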

Streams are based on explicit connections: socket A requests a connection to socket B, and socket B either accepts or rejects the connection request.

If the data must be guaranteed to be delivered to the other side or the size of the data is large, stream sockets are preferable to datagram sockets. Therefore, if reliable communication between two applications is of utmost importance, choose stream sockets.

An email server is an example of an application that must deliver content in the correct order, without duplication or omissions. The stream socket relies on TCP to ensure messages are delivered to their destinations.

Datagram sockets

Datagram sockets are sometimes called connectionless sockets, i.e., no explicit connection is established between them - the message is sent to the specified socket and, accordingly, can be received from the specified socket.

Stream sockets do provide a more reliable method than datagram sockets, but for some applications the overhead associated with establishing an explicit connection is unacceptable (for example, a time-of-day server providing time synchronization to its clients). After all, establishing a reliable connection to the server takes time, which simply introduces service delays and defeats the purpose of the server application. To reduce overhead, you should use datagram sockets.

The use of datagram sockets requires that the transfer of data from the client to the server be handled by User Datagram Protocol (UDP). In this protocol, some restrictions are imposed on the size of messages, and unlike stream sockets, which can reliably send messages to the destination server, datagram sockets do not provide reliability. If the data is lost somewhere on the network, the server will not report errors.
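As a rough illustration of connectionless exchange, here is a minimal Java sketch that sends one datagram over the loopback interface and reads it back. The class `UdpLoopback`, the 64-byte buffer and the two-second timeout are assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

final class UdpLoopback {
    // Sends one datagram to a receiver on the loopback interface and returns what arrived.
    static String sendAndReceive(String text) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // port 0: the OS picks a free port
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so never wait forever.
            receiver.setSoTimeout(2000);

            byte[] data = text.getBytes(StandardCharsets.UTF_8);
            // No connect/accept step: the destination is named on every packet.
            sender.send(new DatagramPacket(data, data.length, loopback, receiver.getLocalPort()));

            byte[] buf = new byte[64];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            receiver.receive(packet); // throws SocketTimeoutException if nothing arrives in time
            return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("ping"));
    }
}
```

Note that the sender learns nothing about the outcome: if the datagram were lost, only the receiver's timeout would reveal it.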

In addition to the two types discussed, there is also a generalized form of sockets, which is called unprocessed or raw.

Raw sockets

The main purpose of using raw sockets is to bypass the mechanism by which the computer handles TCP/IP. This is achieved by providing a special implementation of the TCP/IP stack that overrides the mechanism provided by the TCP/IP stack in the kernel - the packet is passed directly to the application and is therefore processed much more efficiently than when passing through the client's main protocol stack.

By definition, a raw socket is a socket that accepts packets, bypasses the TCP and UDP layers in the TCP/IP stack, and sends them directly to the application.

When using such sockets, the packet does not pass through the TCP/IP filter, i.e. is not processed in any way, and appears in its raw form. In this case, it is the responsibility of the receiving application to properly process all the data and perform actions such as stripping headers and parsing fields - like including a small TCP/IP stack in the application.

However, you will rarely need a program that deals with raw sockets. Unless you are writing system software or something like a packet sniffer, you won't need to go into such detail. Raw sockets are primarily used in the development of specialized low-level protocol applications. For example, various TCP/IP utilities such as traceroute, ping, or arp use raw sockets.

Working with raw sockets requires a solid knowledge of the basic TCP/UDP/IP protocols.

Ports

Ports exist to solve the problem of communicating with several applications at once. Essentially, a port extends the concept of an IP address. A computer that runs several applications at the same time and receives a packet from the network can identify the target process by the unique port number specified when the connection was established.

A socket is made up of the machine's IP address and the port number used by the TCP application. Because an IP address is unique on the Internet and port numbers are unique on an individual machine, socket numbers are also unique across the entire Internet. This characteristic allows a process to communicate over the network with another process based solely on the socket number.

Some port numbers are reserved for particular services; these are the well-known port numbers, such as port 21, used by FTP. Your application can use any port number that has not been reserved and is not yet in use. The Internet Assigned Numbers Authority (IANA) maintains the list of well-known port numbers.

Typically a client-server application using sockets consists of two different applications - a client initiating a connection to a target (server) and a server waiting for a connection from the client.

For example, on the client side, the application must know the target address and port number. By sending a connection request, the client tries to establish a connection with the server.

If all goes well, provided that the server was started before the client attempts to connect to it, the server agrees to the connection. Having given its consent, the server application creates a new socket to interact specifically with the client that established the connection.

Now the client and server can interact with each other, reading messages each from their own socket and, accordingly, writing messages.

Working with sockets in .NET

Socket support in .NET is provided by classes in the System.Net.Sockets namespace. Let's start with a brief description of them.

Classes for working with sockets

Class - Description
MulticastOption - Sets the IP address value for joining or leaving an IP multicast group.
NetworkStream - Implements the base stream class through which data is sent and received. This is a high-level abstraction that represents a connection to a TCP/IP communication channel.
TcpClient - Builds on the Socket class to provide higher-level TCP services. TcpClient provides several methods for sending and receiving data over the network.
TcpListener - Also builds on the low-level Socket class. Its main purpose is server applications. It listens for incoming connection requests from clients and notifies the application of any connections.
UdpClient - UDP is a connectionless protocol, hence different functionality is required to implement a UDP service in .NET.
SocketException - The exception thrown when an error occurs on a socket.
Socket - The last class in the System.Net.Sockets namespace is the Socket class itself. It provides the basic functionality of a socket application.

Socket class

The Socket class plays an important role in network programming, providing both client and server functionality. Primarily, calls to methods in this class perform necessary security-related checks, including checking security permissions, after which they are forwarded to the methods' counterparts in the Windows Sockets API.

Before turning to an example of using the Socket class, let's look at some important properties and methods of this class:

Properties and methods of the Socket class

Property or method - Description
AddressFamily - Gets the address family of the socket, a value from the AddressFamily enumeration.
Available - Returns the amount of data available for reading.
Blocking - Gets or sets a value indicating whether the socket is in blocking mode.
Connected - Returns a value indicating whether the socket is connected to the remote host.
LocalEndPoint - Gets the local endpoint.
ProtocolType - Gets the protocol type of the socket.
RemoteEndPoint - Gets the remote endpoint of the socket.
SocketType - Gets the socket type.
Accept() - Creates a new socket to handle an incoming connection request.
Bind() - Binds a socket to a local endpoint to listen for incoming connection requests.
Close() - Forces the socket to close.
Connect() - Establishes a connection with a remote host.
GetSocketOption() - Returns the value of a socket option.
IOControl() - Sets low-level operating modes for the socket. This method provides low-level access to the underlying Socket class.
Listen() - Places the socket in listening (waiting) mode. This method is for server applications only.
Receive() - Receives data from a connected socket.
Poll() - Determines the status of the socket.
Select() - Checks the status of one or more sockets.
Send() - Sends data to the connected socket.
SetSocketOption() - Sets a socket option.
Shutdown() - Disables sending and receiving operations on the socket.

Hence this protocol is geared towards working with individual documents, mostly text ones. HTTP relies on TCP/IP, so let's look at what Java offers for working with the latter.

Java has a dedicated package for this, java.net, which contains the java.net.Socket class. The name was chosen by analogy with the sockets on equipment, the ones that plugs go into. Following this analogy, you can connect two sockets and transfer data between them. Each socket belongs to a specific host. Each host has a unique IP (Internet Protocol) address. At the time of writing, the Internet runs on the IPv4 protocol, where an IP address is written as four numbers from 0 to 255, for example 127.0.0.1 (for more on the allocation of IP addresses see RFC 790, RFC 1918 and RFC 2365; for IPv6 see RFC 2373).

Sockets are bound to a port on the host. A port is designated by a number from 0 to 65535 and logically indicates a place where a socket can be bound. If a port on the host is already occupied by some socket, another socket can no longer be bound there. Thus, once a socket is bound, it has a very specific address, written symbolically as, for example, 127.0.0.1:8888 (meaning that the socket occupies port 8888 on host 127.0.0.1).
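The "one socket per port" rule is easy to observe. The following sketch (the class `PortInUse` is a made-up name for this illustration) binds a server socket to a free loopback port and then tries to bind a second one to the same port; under default settings the second attempt is expected to fail with java.net.BindException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

final class PortInUse {
    // Returns true if a second socket is refused the port that a first socket already holds.
    static boolean secondBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket(0, 0, loopback)) { // port 0: the OS picks a free port
            int port = first.getLocalPort();
            System.out.println("first socket bound to "
                    + loopback.getHostAddress() + ":" + port);
            try (ServerSocket second = new ServerSocket(port, 0, loopback)) {
                return false; // not expected: the same port was handed out twice
            } catch (BindException e) {
                return true;  // "address already in use"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind refused: " + secondBindFails());
    }
}
```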

To make life easier and avoid unwieldy IP addresses, DNS (the Domain Name System) was invented. Its purpose is to map symbolic names to IP addresses. For example, the address "127.0.0.1" is associated with the name "localhost" on most computers.

Localhost, in fact, means the computer itself on which the program is running, it is also the local computer. All work with localhost does not require access to the network and communication with any other hosts.
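In Java, the mapping from a name to an address is available through java.net.InetAddress. A minimal sketch follows; the class `ResolveName` and its helper methods are assumptions for this example, and what "localhost" resolves to depends on the machine's configuration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ResolveName {
    // Returns the textual IP address for a host name, or null if it cannot be resolved.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    // True if the name or literal address refers to the local machine's loopback interface.
    static boolean isLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Typically 127.0.0.1, but this depends on the local hosts configuration.
        System.out.println("localhost -> " + resolve("localhost"));
        System.out.println("127.0.0.1 is loopback: " + isLoopback("127.0.0.1"));
    }
}
```

A literal address such as "127.0.0.1" is only parsed, with no lookup involved, which is why it works even without any name service.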

Client socket

So, let's return to the java.net.Socket class. The most convenient way to initialize it is as follows:

    public Socket(String host, int port) throws UnknownHostException, IOException

In the host string you can specify either the server's IP address or its DNS name. The program automatically picks a free port on the local computer and binds your socket to it, after which it attempts to contact the other socket, whose address is given in the constructor parameters. Two kinds of exception may occur: UnknownHostException, when the host name cannot be resolved to an address, or IOException, when the connection to that socket cannot be established.

It is also useful to know the method

    public void setSoTimeout(int timeout) throws SocketException

This method sets the read timeout for the socket. If no data arrives within this time while the program is blocked in a read, the read throws java.net.SocketTimeoutException; the socket itself stays valid. The time is given in milliseconds; a timeout of 0 means an infinite wait.

On some networks the timeout cannot be changed or can only be set within a certain range (for example, from 20 to 100 seconds). If you try to set an invalid timeout, a corresponding exception is thrown.
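The effect of the timeout can be checked with a small self-contained sketch: a local server accepts the connection but never sends anything, so the client's read has to time out. The class `ReadTimeout` and the 200 ms value are assumptions chosen for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeout {
    // Connects to a local server that stays silent and reports whether read() timed out
    // while leaving the socket open.
    static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 0, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(timeoutMillis); // milliseconds; 0 would mean "wait forever"
            try {
                client.getInputStream().read(); // blocks: the server side sends nothing
                return false;
            } catch (SocketTimeoutException e) {
                return !client.isClosed();      // timed out, and the socket is still open
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```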

The program that opens this type of socket will be considered the client, and the program that owns the socket you are trying to connect to will be called the server. In fact, by analogy with a socket-plug, the server program will be the socket, and the client is precisely the plug.

Server socket

I have just described how to establish a connection from a client to a server; now let's see how to create the socket that will serve on the server side. Java has the following class for this purpose: java.net.ServerSocket. Its most convenient constructor is:

    public ServerSocket(int port, int backlog, InetAddress bindAddr) throws IOException

As you can see, the third parameter is an object of another class, java.net.InetAddress. This class handles DNS names and IP addresses, so in programs the constructor can be used like this:

    new ServerSocket(port, 0, InetAddress.getByName(host))

For this type of socket the port is specified explicitly, so initialization may throw an exception indicating that the port is already in use or that its use is prohibited by the computer's security policy.

Once the socket has been set up, the following method is called:

    public Socket accept() throws IOException

This method makes the program wait until a client connects to the server socket. Once the connection is established, it returns a Socket object for communicating with the client.

Client-server via sockets. Example

As an example, here is a simple program that implements working with sockets.

On the client side, the program works as follows: the client connects to the server, sends data, then receives data from the server and outputs it.

From the server side it looks like this: the server binds the server socket to port 3128 and then waits for incoming connections. Having accepted a new connection, the server hands it over to a separate thread. In the new thread, the server receives data from the client, prepends the connection sequence number to it, and sends the data back to the client.


Logical structure of the example programs

Simple TCP/IP client program

(SampleClient.java)

    import java.io.*;
    import java.net.*;

    class SampleClient extends Thread {
        public static void main(String[] args) {
            try {
                // open a socket and connect to localhost:3128
                Socket s = new Socket("localhost", 3128);

                // take the output stream and write the first command-line
                // argument, followed by the socket's address and local port
                args[0] = args[0] + "\n" + s.getInetAddress().getHostAddress()
                        + ":" + s.getLocalPort();
                s.getOutputStream().write(args[0].getBytes());

                // read the answer
                byte[] buf = new byte[64 * 1024];
                int r = s.getInputStream().read(buf);
                String data = new String(buf, 0, r);

                // output the response to the console
                System.out.println(data);
            } catch (Exception e) {
                // report any exception
                System.out.println("init error: " + e);
            }
        }
    }

Simple TCP/IP server program

(SampleServer.java)

    import java.io.*;
    import java.net.*;

    class SampleServer extends Thread {
        Socket s;
        int num;

        public static void main(String[] args) {
            try {
                int i = 0; // connection counter

                // bind the server socket to localhost, port 3128
                ServerSocket server = new ServerSocket(3128, 0,
                        InetAddress.getByName("localhost"));

                System.out.println("server is started");

                // listen on the port
                while (true) {
                    // wait for a new connection, then hand the client to
                    // a new thread and increase the counter by one
                    new SampleServer(i, server.accept());
                    i++;
                }
            } catch (Exception e) {
                // report any exception
                System.out.println("init error: " + e);
            }
        }

        public SampleServer(int num, Socket s) {
            // copy the data
            this.num = num;
            this.s = s;

            // and launch a new thread (see run())
            setDaemon(true);
            setPriority(NORM_PRIORITY);
            start();
        }

        public void run() {
            try {
                // the stream of incoming data from the client
                InputStream is = s.getInputStream();
                // and the stream of data from the server to the client
                OutputStream os = s.getOutputStream();

                // 64-kilobyte data buffer
                byte[] buf = new byte[64 * 1024];
                // read up to 64 KB from the client; the result is the
                // number of bytes actually received
                int r = is.read(buf);

                // create a string containing the data received from the client
                String data = new String(buf, 0, r);

                // prepend the connection number
                data = "" + num + ": " + "\n" + data;

                // send the data back
                os.write(data.getBytes());

                // end the connection
                s.close();
            } catch (Exception e) {
                // report any exception
                System.out.println("init error: " + e);
            }
        }
    }

After compilation, we get the files SampleServer.class and SampleClient.class (all programs here and below are compiled using JDK v1.4). First start the server:

    java SampleServer

and then, after waiting for the message "server is started", start any number of clients:

    java SampleClient test1
    java SampleClient test2
    ...
    java SampleClient testN

If, during startup, the server program prints a line like

    init error: java.net.BindException: Address already in use: JVM_Bind

instead of "server is started", this means that port 3128 on your computer is already occupied by some program or is prohibited for use by the security policy.

Notes

Let us note an important feature of the server socket: it can accept connections from several clients at once. In theory the number of simultaneous connections is unlimited, but in practice everything depends on the power of the computers. Incidentally, this finite capacity is exploited in DoS attacks on servers: they are simply bombarded with so many connections that the computers cannot cope with the load and “crash”.

In this case, using the example of SampleServer, I show how to process several simultaneous connections: the socket of each new connection is handed to a separate thread for processing.

It is worth mentioning that the Socket / ServerSocket abstraction and stream-based data handling are used by C/C++, Perl, Python, and many other programming languages and operating system APIs, so much of what has been said applies not only to the Java platform.
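To illustrate that last remark, here is a minimal sketch of the same listen/accept pattern using the POSIX socket API in C. It assumes a POSIX system and the loopback interface; the helper names `open_listener`, `connect_loopback` and `serve_one` are hypothetical and not part of the original example. Unlike SampleServer, it serves a single client in the calling thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket on 127.0.0.1. Passing port 0 lets the
 * kernel pick a free port; the port actually used is stored in *port_out.
 * Returns the descriptor, or -1 on error. */
int open_listener(unsigned short port, unsigned short *port_out)
{
    assert(port_out != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Connect a TCP client to 127.0.0.1:port. Returns the descriptor or -1. */
int connect_loopback(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept one client, read once, echo the bytes back, close the connection.
 * Returns the number of bytes echoed, or -1 on error. */
ssize_t serve_one(int listen_fd)
{
    int conn = accept(listen_fd, NULL, NULL);   /* blocks until a client connects */
    if (conn < 0)
        return -1;

    char buf[1024];
    ssize_t n = read(conn, buf, sizeof(buf));
    if (n > 0 && write(conn, buf, (size_t)n) != n)
        n = -1;
    close(conn);
    return n;
}
```

A caller would create the listener once with `open_listener` and then call `serve_one` in a loop; handing each accepted descriptor to its own thread would correspond to what SampleServer does.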

It's time to use Erlang for its intended purpose - to implement a network service. Most often, such services are made on the basis of a web server, on top of the HTTP protocol. But we will take the level below - TCP and UDP sockets.

I assume you already know how the network works, what Internet Protocol, User Datagram Protocol and Transmission Control Protocol are. This topic is familiar to most programmers. But if for some reason you missed it, you will have to first catch up and then return to this lesson.

UDP socket

Let's recall in general terms what UDP is:

  • a protocol for transferring short messages (datagrams);
  • fast delivery;
  • no persistent connection between client and server, stateless;
  • message delivery and delivery order are not guaranteed.

To work with UDP, the gen_udp module is used.

Let's launch two nodes and establish communication between them.

On the 1st node, open UDP on port 2000:

    1> {ok, Socket} = gen_udp:open(2000, [binary, {active, true}]).
    {ok,#Port<0.587>}

Calling gen_udp:open/2, we pass the port number and a list of options. The list of all possible options is quite large, but we are interested in two of them:

binary: the socket is opened in binary mode. Alternatively, the socket can be opened in text mode by specifying the option list. The difference is in how we interpret the data received from the socket: as a byte stream or as text.

{active, true}: the socket is opened in active mode, which means that data arriving on the socket is delivered as messages to the mailbox of the process that owns the socket. More on this below.

On the 2nd node, open UDP on port 2001:

    1> {ok, Socket} = gen_udp:open(2001, [binary, {active, true}]).
    {ok,#Port<0.587>}

And we will send a message from the 1st node to the 2nd:

    2> gen_udp:send(Socket, {127,0,0,1}, 2001, <<"Hello from 2000">>).
    ok

Calling gen_udp:send/4, we transmit the socket, the address and port of the recipient, and the message itself.

The address can be a domain name as a string or an atom, or an IPv4 address as a tuple of 4 numbers, or an IPv6 address as a tuple of 8 numbers.

On the 2nd node we will make sure that the message has arrived:

    2> flush().
    Shell got {udp,#Port<0.587>,{127,0,0,1},2000,<<"Hello from 2000">>}
    ok

The message arrives as a tuple {udp, Socket, SenderAddress, SenderPort, Packet}.

Let's send a message from the 2nd node to the 1st:

    3> gen_udp:send(Socket, {127,0,0,1}, 2000, <<"Hello from 2001">>).
    ok

On the 1st node, we will make sure that the message has arrived:

    3> flush().
    Shell got {udp,#Port<0.587>,{127,0,0,1},2001,<<"Hello from 2001">>}
    ok

As you can see, everything is simple here.

Active and passive socket mode

Both gen_udp and gen_tcp have one important setting: the mode of working with incoming data. This can be either active mode, {active, true}, or passive mode, {active, false}.

In active mode, a process receives incoming packets as messages in its mailbox. They can be received and processed by calling receive, like any other messages.

For a UDP socket these are messages of the form:

    {udp, Socket, SenderAddress, SenderPort, Packet}

We have already seen them:

    {udp,#Port<0.587>,{127,0,0,1},2001,<<"Hello from 2001">>}

For a TCP socket the messages are similar:

    {tcp, Socket, Packet}

Active mode is easy to use, but dangerous, because a client can overflow the process's message queue, exhaust memory, and crash the node. Therefore, passive mode is recommended.

In passive mode, the data must be retrieved by calling gen_udp:recv/3 and gen_tcp:recv/3:

    gen_udp:recv(Socket, Length, Timeout) -> {ok, {Address, Port, Packet}} | {error, Reason}
    gen_tcp:recv(Socket, Length, Timeout) -> {ok, Packet} | {error, Reason}

Here we indicate how many bytes of data we want to read from the socket. If this data is there, we receive it immediately. If not, the call blocks until enough data arrives. You can specify Timeout to avoid blocking the process for a long time.

However, gen_udp:recv ignores the Length argument and returns whatever data is on the socket. Or it blocks and waits for some data if there is nothing on the socket. It is not clear why the Length argument is present in the API at all.

For gen_tcp:recv the Length argument works as expected, unless the option {packet, Size} is specified, which will be discussed below.

There is also the option {active, once}. In this case, the socket starts in active mode, delivers the first data packet as a message, and immediately switches to passive mode.

TCP socket

Let's recall in general terms what TCP is:

  • a reliable data transfer protocol that guarantees message delivery and delivery order;
  • a persistent connection between client and server, stateful;
  • additional overhead for establishing and closing connections and for transferring data.

It should be noted that maintaining persistent connections with many thousands of clients for a long time is expensive. All connections must work independently of each other, which means in different threads. For many programming languages (but not Erlang) this is a serious problem.

This is why the HTTP protocol is so popular, which, although it works on top of a TCP socket, implies a short interaction time. This allows a relatively small number of threads (tens or hundreds) to serve a significantly larger number of clients (thousands, tens of thousands).

In some cases, there remains a need to have long-lived persistent connections between the client and server. For example, for chats or for multiplayer games. And here Erlang has few competitors.

To work with TCP, the gen_tcp module is used.

Working with a TCP socket is more difficult than working with a UDP socket. We now have client and server roles that require different implementations. Let's consider a server implementation option.

    -module(server).

    -export([start/0, start/1, server/1, accept/2]).

    start() ->
        start(1234).

    start(Port) ->
        spawn(?MODULE, server, [Port]),
        ok.

    server(Port) ->
        io:format("start server at port ~p~n", [Port]),
        {ok, ListenSocket} = gen_tcp:listen(Port, [binary, {active, true}]),
        [spawn(?MODULE, accept, [Id, ListenSocket]) || Id <- lists:seq(1, 5)],
        timer:sleep(infinity),
        ok.

    accept(Id, ListenSocket) ->
        io:format("Socket #~p wait for client~n", [Id]),
        {ok, _Socket} = gen_tcp:accept(ListenSocket),
        io:format("Socket #~p, session started~n", [Id]),
        handle_connection(Id, ListenSocket).

    handle_connection(Id, ListenSocket) ->
        receive
            {tcp, Socket, Msg} ->
                io:format("Socket #~p got message: ~p~n", [Id, Msg]),
                gen_tcp:send(Socket, Msg),
                handle_connection(Id, ListenSocket);
            {tcp_closed, _Socket} ->
                io:format("Socket #~p, session closed ~n", [Id]),
                accept(Id, ListenSocket)
        end.

There are two types of socket: the Listen Socket and the Accept Socket. There is only one Listen Socket; it accepts all connection requests. You need many Accept Sockets, one for each connection. The process that creates a socket becomes its owner. If the owner process exits, the socket is closed automatically. Therefore, we create a separate process for each socket.

The Listen Socket must always be running, so its owner process must not terminate. That is why we added a call to timer:sleep(infinity) in server/1. It blocks the process and prevents it from finishing. This implementation is, of course, educational. It would be good to provide a way to stop the server correctly, but that is not possible here.

The Accept Socket and its process could be created dynamically as clients appear. First, you create one such process, call gen_tcp:accept/1 and wait for a client. This call is blocking; it returns when a client appears. Then you serve the current client in this process and create a new process that waits for the next client.

But here we have a different implementation. We create a pool of several processes in advance, and they all wait for clients. After finishing work with one client, the socket is not closed, but waits for a new one. So, instead of constantly opening new sockets and closing old ones, we use a pool of long-lived sockets.

This is more efficient when there are a large number of clients. First, because we accept connections faster. Second, because we manage sockets more carefully as a system resource.

Processes belong to the Erlang node, and we can create as many of them as we like. But sockets belong to the operating system. Their number is limited, although quite large. (This refers to the limit on the number of file descriptors that the operating system allows a user process to open, usually 2^10 to 2^16.)

Our pool is toy-sized: 5 process-socket pairs. In reality, we need a pool of several hundred such pairs. It would also be nice to be able to grow and shrink this pool at runtime in order to adapt to the current load.

The current session with the client is handled in the function handle_connection/2. The socket is in active mode, and the process receives messages of the form {tcp, Socket, Msg}, where Msg is the binary data coming from the client. We send this data back to the client, that is, we implement a trivial echo service :)

When the client closes the connection, the process receives the message {tcp_closed, _Socket}, returns to accept/2 and waits for the next client.

This is what the operation of such a server with two telnet clients looks like:

    $ telnet localhost 1234
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is "^]".
    hello from client 1
    hello from client 1
    some message from client 1
    some message from client 1
    new message from client 1
    new message from client 1
    client 1 is going to close connection
    client 1 is going to close connection
    ^]
    telnet> quit
    Connection closed.

    $ telnet localhost 1234
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is "^]".
    hello from client 2
    hello from client 2
    message from client 2
    message from client 2
    client 2 is still active
    client 2 is still active
    but client 2 is still active
    but client 2 is still active
    and now client 2 is going to close connection
    and now client 2 is going to close connection
    ^]
    telnet> quit
    Connection closed.

    2> server:start().
    start server at port 1234
    ok
    Socket #1 wait for client
    Socket #2 wait for client
    Socket #3 wait for client
    Socket #4 wait for client
    Socket #5 wait for client
    Socket #1, session started
    Socket #1 got message: <<"hello from client 1\r\n">>
    Socket #1 got message: <<"some message from client 1\r\n">>
    Socket #2, session started
    Socket #2 got message: <<"hello from client 2\r\n">>
    Socket #2 got message: <<"message from client 2\r\n">>
    Socket #1 got message: <<"new message from client 1\r\n">>
    Socket #2 got message: <<"client 2 is still active\r\n">>
    Socket #1 got message: <<"client 1 is going to close connection\r\n">>
    Socket #1, session closed
    Socket #1 wait for client
    Socket #2 got message: <<"but client 2 is still active\r\n">>
    Socket #2 got message: <<"and now client 2 is going to close connection\r\n">>
    Socket #2, session closed
    Socket #2 wait for client

Server in passive mode

This is all good, but a good server must operate in passive mode. That is, it should receive data from the client not as messages in its mailbox, but by calling gen_tcp:recv/2,3.

The nuance is that here we need to indicate how much data we want to read. How can the server know how much data the client has sent? Evidently, the client itself must say how much data it is going to send. To do this, the client first sends a small service packet stating the size of its data, and then sends the data itself.

Now we need to decide how many bytes this service packet should occupy. If it is 1 byte, then you cannot pack a number larger than 255 into it. You can pack the number 65535 into 2 bytes, and 4294967295 into 4 bytes. 1 byte is obviously not enough. It is likely that the client will need to send more than 255 bytes of data. A 2 byte header is fine. A 4-byte header is sometimes needed.

So, the client sends a 2-byte service packet indicating how much data will follow it, and then the data itself:

    Msg = <<"Hello">>,
    Size = byte_size(Msg),
    Header = <<Size:16/integer>>,
    gen_tcp:send(Socket, <<Header/binary, Msg/binary>>),

Full client code:

    -module(client2).

    -export([start/0, start/2, send/2, stop/1, client/2]).

    start() ->
        start("localhost", 1234).

    start(Host, Port) ->
        spawn(?MODULE, client, [Host, Port]).

    send(Pid, Msg) ->
        Pid ! {send, Msg},
        ok.

    stop(Pid) ->
        Pid ! stop,
        ok.

    client(Host, Port) ->
        io:format("Client ~p connects to ~p:~p~n", [self(), Host, Port]),
        {ok, Socket} = gen_tcp:connect(Host, Port, [binary, {active, true}]),
        loop(Socket).

    loop(Socket) ->
        receive
            {send, Msg} ->
                io:format("Client ~p send ~p~n", [self(), Msg]),
                Size = byte_size(Msg),
                Header = <<Size:16/integer>>,
                gen_tcp:send(Socket, <<Header/binary, Msg/binary>>),
                loop(Socket);
            {tcp, Socket, Msg} ->
                io:format("Client ~p got message: ~p~n", [self(), Msg]),
                loop(Socket);
            stop ->
                io:format("Client ~p closes connection and stops~n", [self()]),
                gen_tcp:close(Socket)
        after 200 ->
            loop(Socket)
        end.

The server first reads 2 bytes, determines the size of the data, and then reads all the data:

    {ok, Header} = gen_tcp:recv(Socket, 2),
    <<Size:16/integer>> = Header,
    {ok, Msg} = gen_tcp:recv(Socket, Size),
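The same framing scheme can be sketched independently of Erlang. The following C functions are a minimal illustration, assuming a 2-byte big-endian length header as described above; the names `frame_encode` and `frame_decode` are hypothetical. They work on memory buffers only, so the caller is still responsible for reading bytes from the socket until a whole frame has arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 2   /* 2-byte big-endian length prefix, as in the text */

/* Write header + payload into out.
 * Returns the total frame size, or 0 if the payload does not fit into a
 * 2-byte length (more than 65535 bytes) or out is too small. */
size_t frame_encode(const uint8_t *payload, size_t len,
                    uint8_t *out, size_t out_cap)
{
    assert(payload != NULL && out != NULL);
    if (len > UINT16_MAX || out_cap < HEADER_SIZE + len)
        return 0;
    out[0] = (uint8_t)(len >> 8);      /* high byte first */
    out[1] = (uint8_t)(len & 0xFF);
    memcpy(out + HEADER_SIZE, payload, len);
    return HEADER_SIZE + len;
}

/* Try to extract one frame from the start of buf, which holds the buf_len
 * bytes received so far. On success *payload points into buf, *len is the
 * payload length, and the number of bytes consumed is returned.
 * Returns 0 if the frame is not complete yet. */
size_t frame_decode(const uint8_t *buf, size_t buf_len,
                    const uint8_t **payload, size_t *len)
{
    assert(buf != NULL && payload != NULL && len != NULL);
    if (buf_len < HEADER_SIZE)
        return 0;
    size_t size = ((size_t)buf[0] << 8) | buf[1];
    if (buf_len < HEADER_SIZE + size)
        return 0;
    *payload = buf + HEADER_SIZE;
    *len = size;
    return HEADER_SIZE + size;
}
```

Encoding the 5-byte message "Hello" this way produces a 7-byte frame whose first two bytes are 0 and 5.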

In the server code the functions start/0 and start/1 have not changed; the rest has changed a little:

    server(Port) ->
        io:format("start server at port ~p~n", [Port]),
        {ok, ListenSocket} = gen_tcp:listen(Port, [binary, {active, false}]),
        [spawn(?MODULE, accept, [Id, ListenSocket]) || Id <- lists:seq(1, 5)],
        timer:sleep(infinity),
        ok.

    accept(Id, ListenSocket) ->
        io:format("Socket #~p wait for client~n", [Id]),
        {ok, Socket} = gen_tcp:accept(ListenSocket),
        io:format("Socket #~p, session started~n", [Id]),
        handle_connection(Id, ListenSocket, Socket).

    handle_connection(Id, ListenSocket, Socket) ->
        case gen_tcp:recv(Socket, 2) of
            {ok, Header} ->
                <<Size:16/integer>> = Header,
                {ok, Msg} = gen_tcp:recv(Socket, Size),
                io:format("Socket #~p got message: ~p~n", [Id, Msg]),
                gen_tcp:send(Socket, Msg),
                handle_connection(Id, ListenSocket, Socket);
            {error, closed} ->
                io:format("Socket #~p, session closed ~n", [Id]),
                accept(Id, ListenSocket)
        end.

An example of a session from the client side:

    2> Pid = client2:start().
    Client <0.40.0> connects to "localhost":1234
    <0.40.0>
    3> client2:send(Pid, <<"Hello">>).
    Client <0.40.0> send <<"Hello">>
    ok
    Client <0.40.0> got message: <<"Hello">>
    4> client2:send(Pid, <<"Hello again">>).
    Client <0.40.0> send <<"Hello again">>
    ok
    Client <0.40.0> got message: <<"Hello again">>
    5> client2:stop(Pid).
    Client <0.40.0> closes connection and stops
    ok

And from the server side:

    2> server2:start().
    start server at port 1234
    ok
    Socket #1 wait for client
    Socket #2 wait for client
    Socket #3 wait for client
    Socket #4 wait for client
    Socket #5 wait for client
    Socket #1, session started
    Socket #1 got message: <<"Hello">>
    Socket #1 got message: <<"Hello again">>
    Socket #1, session closed
    Socket #1 wait for client

This is all well and good, but there is really no need to deal with the header packet manually. This is already implemented in gen_tcp. You need to specify the size of the service header in the options when opening the socket on the client side:

    {ok, Socket} = gen_tcp:connect(Host, Port, [binary, {active, true}, {packet, 2}]),

and on the server side:

    {ok, ListenSocket} = gen_tcp:listen(Port, [binary, {active, false}, {packet, 2}]),

and the need to build and parse these headers yourself disappears.

On the client side, sending is simplified:

    gen_tcp:send(Socket, Msg),

and on the server side, receiving is simplified:

    handle_connection(Id, ListenSocket, Socket) ->
        case gen_tcp:recv(Socket, 0) of
            {ok, Msg} ->
                io:format("Socket #~p got message: ~p~n", [Id, Msg]),
                gen_tcp:send(Socket, Msg),
                handle_connection(Id, ListenSocket, Socket);
            {error, closed} ->
                io:format("Socket #~p, session closed ~n", [Id]),
                accept(Id, ListenSocket)
        end.

Now, when calling gen_tcp:recv/2, we specify Length = 0. gen_tcp itself knows how many bytes need to be read from the socket.

Working with text protocols

In addition to the service header option, there is another approach. You can read from the socket one byte at a time until a special byte is encountered, symbolizing the end of the packet. This can be a null byte, or a newline character.

This option is typical for text protocols (SMTP, POP3, FTP).
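The delimiter approach can be sketched in a few lines of C. This is only an illustration of the idea, assuming a newline as the end-of-packet byte; the function name `line_length` is hypothetical. It scans the bytes accumulated so far and reports whether a complete line is available yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look for one complete line at the start of buf, which holds the buf_len
 * bytes received so far. Returns the length of the line including its
 * terminating '\n', or 0 if no newline has arrived yet. */
size_t line_length(const char *buf, size_t buf_len)
{
    assert(buf != NULL);
    const char *nl = memchr(buf, '\n', buf_len);
    return nl ? (size_t)(nl - buf) + 1 : 0;
}
```

A receiver would append incoming bytes to a buffer, call `line_length`, and hand the first complete line to the application only when the result is non-zero.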

There is no need to write your own implementation of reading from a socket; everything is already implemented in gen_tcp. You just need to specify the option {packet, line} in the socket settings instead of {packet, 2}.

    {ok, ListenSocket} = gen_tcp:listen(Port, [binary, {active, false}, {packet, line}]),

Otherwise, the server code remains unchanged. But now we can return to the telnet client again.

    $ telnet localhost 1234
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is "^]".
    hello
    hello
    hello again
    hello again
    ^]
    telnet> quit
    Connection closed.

We will need a TCP server, a text protocol and a telnet client in our course work.

Applications that use TCP and UDP are fundamentally different, because UDP is an unreliable, connectionless datagram protocol, quite unlike the connection-oriented, reliable byte stream provided by TCP. However, there are cases where it makes sense to use UDP instead of TCP. We consider such cases in Section 22.4. Some popular applications are built using UDP, such as DNS (Domain Name System), NFS (Network File System), and SNMP (Simple Network Management Protocol).

Figure 8.1 shows the function calls for a typical UDP client-server scheme. The client does not establish a connection to the server. Instead, the client simply sends a datagram to the server using the sendto function (described in the next section), which takes the recipient (server) address as an argument. Likewise, the server does not establish a connection with the client. Instead, the server just calls the recvfrom function, which waits for data to arrive from some client. The recvfrom function returns the client address (for a given protocol) along with the datagram, so that the server can send a response to the exact client that sent the datagram.

Fig. 8.1. Socket functions for the UDP client-server model

Figure 8.1 illustrates the timeline of a typical UDP datagram exchange between a client and a server. We can compare this example with the typical TCP exchange shown in Fig. 4.1.

In this chapter, we'll describe the new functions used with UDP sockets, recvfrom and sendto, and rework our client-server model to use UDP. We'll also look at using the connect function with a UDP socket and the concept of asynchronous errors.

8.2. recvfrom and sendto functions

These two functions are similar to the standard read and write functions, but require three additional arguments.

    ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags,
                     struct sockaddr *from, socklen_t *addrlen);

    ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags,
                   const struct sockaddr *to, socklen_t addrlen);

Both functions return the number of bytes written or read on success, -1 on error.

The first three arguments, sockfd, buff, and nbytes, are identical to the first three arguments of the read and write functions: a descriptor, a pointer to the buffer to read into or write from, and the number of bytes to read or write.

We'll cover the flags argument in Chapter 14, where we look at the recv, send, recvmsg, and sendmsg functions, since we don't need them in our simple example right now. For now we will always set the flags argument to zero.

The to argument to the sendto function is a socket address structure containing the protocol address (such as IP address and port number) of the destination. The size of this socket address structure is specified by the addrlen argument. The recvfrom function fills the socket address structure pointed to by the from argument with the protocol address of the datagram's sender. The number of bytes stored in the socket address structure is also returned to the calling process as the integer pointed to by the addrlen argument. Note that the last argument to sendto is an integer value, while the last argument to recvfrom is a pointer to an integer value (a value-result argument).

The last two arguments to the recvfrom function are the same as the last two arguments to the accept function: the contents of the socket address structure upon completion tell us who sent the datagram (in the case of UDP) or who initiated the connection (in the case of TCP). The last two arguments to the sendto function are similar to the last two arguments to the connect function: we fill the socket address structure with the protocol address of the datagram recipient (in the case of UDP) or the address of the host with which the connection will be established (in the case of TCP).

Both functions return as the function value the length of the data that was read or written. In a typical use of the recvfrom function with a datagram protocol, the return value is the amount of user data in the received datagram.

The datagram can have zero length. For UDP, this results in an IP datagram containing an IP header (typically 20 bytes for IPv4 or 40 bytes for IPv6), an 8-byte UDP header, and no data. This also means that a return value of 0 from recvfrom is acceptable for a datagram protocol: it does not indicate that the other party has closed the connection, as a return value of 0 from read on a TCP socket does. Since UDP is not connection-oriented, there is no such event as connection closure.
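The zero-length case can be observed directly. The sketch below assumes a POSIX system with a working loopback interface; `udp_loopback_socket` and `recv_empty_datagram` are hypothetical helper names. A UDP socket sends an empty datagram to its own address and then reads it back.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to 127.0.0.1 on a kernel-chosen port and store the
 * resulting address in *bound. Returns the descriptor or -1. */
int udp_loopback_socket(struct sockaddr_in *bound)
{
    assert(bound != NULL);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    socklen_t len = sizeof(*bound);
    memset(bound, 0, sizeof(*bound));
    bound->sin_family = AF_INET;
    bound->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bound->sin_port = 0;                      /* let the kernel choose */

    if (bind(fd, (struct sockaddr *)bound, sizeof(*bound)) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send an empty datagram to ourselves and return what recvfrom reports.
 * Returns -2 if the setup or the send failed. */
ssize_t recv_empty_datagram(void)
{
    struct sockaddr_in addr;
    int fd = udp_loopback_socket(&addr);
    if (fd < 0)
        return -2;

    char buf[16];
    ssize_t n = -2;
    if (sendto(fd, "", 0, 0, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    close(fd);
    return n;
}
```

Here a result of 0 from `recvfrom` means "an empty datagram arrived", not "the peer closed the connection".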

If the from argument to recvfrom is a null pointer, then the corresponding length argument (addrlen) must also be a null pointer, meaning that we are not interested in the address of the sender of the data.

Both the recvfrom and sendto functions can be used with TCP, although they are usually not necessary.
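As a minimal sketch of the two calls with plain POSIX functions (no wrappers), the following C code sends a datagram between two loopback sockets and captures the sender's address through the value-result argument. It assumes a POSIX system; `udp_loopback_socket` and `send_and_receive` are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to 127.0.0.1 on a kernel-chosen port and store the
 * resulting address in *bound. Returns the descriptor or -1. */
int udp_loopback_socket(struct sockaddr_in *bound)
{
    assert(bound != NULL);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    socklen_t len = sizeof(*bound);
    memset(bound, 0, sizeof(*bound));
    bound->sin_family = AF_INET;
    bound->sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)bound, sizeof(*bound)) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send msg from socket tx to rx_addr, then receive it on socket rx.
 * On return *from holds the sender's address as reported by recvfrom.
 * Returns the number of bytes received, or -1 on error. */
ssize_t send_and_receive(int tx, int rx, const struct sockaddr_in *rx_addr,
                         const char *msg, char *buf, size_t cap,
                         struct sockaddr_in *from)
{
    assert(msg != NULL && buf != NULL && from != NULL);
    if (sendto(tx, msg, strlen(msg), 0,
               (const struct sockaddr *)rx_addr, sizeof(*rx_addr)) < 0)
        return -1;

    socklen_t fromlen = sizeof(*from);   /* in: buffer size, out: bytes stored */
    return recvfrom(rx, buf, cap, 0, (struct sockaddr *)from, &fromlen);
}
```

After the call, the address in `from` matches the address the sending socket is bound to, which is what lets a server reply to the right client.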

8.3. UDP echo server: main function

We will now rework our simple client-server model from Chapter 5 using UDP. The function calls in our UDP client and server programs are diagrammed in Fig. 8.1. Figure 8.2 shows the functions used. Listing 8.1 shows the server's main function.

Fig. 8.2. Simple client-server model using UDP

Listing 8.1. UDP echo server

//udpcliserv/udpserv01.c

     1 #include "unp.h"

     2 int
     3 main(int argc, char **argv)
     4 {
     5     int sockfd;
     6     struct sockaddr_in servaddr, cliaddr;

     7     sockfd = Socket(AF_INET, SOCK_DGRAM, 0);

     8     bzero(&servaddr, sizeof(servaddr));
     9     servaddr.sin_family = AF_INET;
    10     servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    11     servaddr.sin_port = htons(SERV_PORT);

    12     Bind(sockfd, (SA *) &servaddr, sizeof(servaddr));

    13     dg_echo(sockfd, (SA *) &cliaddr, sizeof(cliaddr));
    14 }

Create a UDP socket, bind to a known port using the bind function

7-12 We create a UDP socket by specifying SOCK_DGRAM (an IPv4 datagram socket) as the second argument of the socket function. As in the TCP server example, the IPv4 address for the bind function is specified as INADDR_ANY , and the server's known port number is the SERV_PORT constant from the unp.h header.

13 The dg_echo function is then called to process the client request by the server.

8.4. UDP echo server: dg_echo function

Listing 8.2 shows the dg_echo function.

Listing 8.2. dg_echo function: echoing strings on a datagram socket

     1 #include "unp.h"

     2 void
     3 dg_echo(int sockfd, SA *pcliaddr, socklen_t clilen)
     4 {
     5     int n;
     6     socklen_t len;
     7     char mesg[MAXLINE];

     8     for ( ; ; ) {
     9         len = clilen;
    10         n = Recvfrom(sockfd, mesg, MAXLINE, 0, pcliaddr, &len);

    11         Sendto(sockfd, mesg, n, 0, pcliaddr, len);
    12     }
    13 }

Reading datagram, reflecting to sender

8-12 This function is a simple loop in which the next datagram arriving at the server port is read by the recvfrom function and sent back using the sendto function.

Despite the simplicity of this function, there are a number of important details to consider. Firstly, this function never completes. Because UDP is a connectionless protocol, there is no equivalent of the end-of-file flag used in TCP.

Secondly, this function gives us an iterative server rather than a concurrent one, as we had in the case of TCP. Since there is no call to fork, a single server process handles all clients. In general, most TCP servers are concurrent, and most UDP servers are iterative.

For a socket at the UDP level, datagrams are implicitly buffered in the form of a queue. Indeed, every UDP socket has a receive buffer, and every datagram arriving on that socket is placed in its receive buffer. When a process calls the recvfrom function, the next datagram from the buffer is returned to the process in FIFO (First In, First Out) order. Thus, if many datagrams arrive on a socket before the process can read the data already queued for the socket, then the incoming datagrams are simply added to the socket's receive buffer. But this buffer has a limited size. We discussed this size and how to increase it using the SO_RCVBUF socket option in Section 7.5.
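The receive buffer size mentioned above can be inspected and adjusted with getsockopt and setsockopt. The following is a minimal sketch assuming a POSIX system; the function names `receive_buffer_size` and `request_receive_buffer` are hypothetical. The kernel is free to round or cap the requested size, so the value read back may differ from the one requested.

```c
#include <assert.h>
#include <sys/socket.h>

/* Return the size in bytes of fd's receive buffer, or -1 on error. */
int receive_buffer_size(int fd)
{
    int size = 0;
    socklen_t len = sizeof(size);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;
    return size;
}

/* Ask the kernel for a receive buffer of `bytes` bytes.
 * Returns 0 on success, -1 on error. */
int request_receive_buffer(int fd, int bytes)
{
    assert(bytes > 0);
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}
```

A server expecting bursts of datagrams might call `request_receive_buffer` right after creating its socket and then log the value reported by `receive_buffer_size`.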

Figure 8.3 summarizes our TCP client-server model from Chapter 5, where two clients establish connections to a server.

Fig. 8.3. Summary of the TCP client-server model with two clients

There are two connected sockets here, and each of the connected sockets on the server node has its own receive buffer. Figure 8.4 shows the case where two clients send datagrams to a UDP server.

Fig. 8.4. Summary of the UDP client-server model with two clients

There is only one server process, and it has one socket on which the server receives all incoming datagrams and from which it sends all responses. This socket has a receive buffer into which all incoming datagrams are placed.

The main function in Listing 8.1 is protocol-dependent (it creates an AF_INET family socket and then allocates and initializes an IPv4 socket address structure), but the dg_echo function is protocol-independent. The reason the dg_echo function is protocol independent is that the calling process (in our case the main function) must allocate a socket address structure of the correct size in memory, and a pointer to this structure along with its size is passed as arguments to the dg_echo function . The dg_echo function never digs into this structure: it simply passes a pointer to it to the recvfrom and sendto functions. The recvfrom function fills this structure with the client's IP address and port number, and since the same pointer (pcliaddr) is then passed to the sendto function as the destination address, the datagram is thus reflected back to the client that sent the datagram.
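For readers without the unp.h wrappers, the body of the echo loop can be sketched with plain POSIX calls. This is an illustration under stated assumptions, not the book's code: `echo_one_datagram` handles a single datagram instead of looping forever, `MAX_DGRAM` stands in for MAXLINE, and `udp_loopback_socket` is a hypothetical helper for trying it out on the loopback interface. Using struct sockaddr_storage keeps the echo function independent of the address family.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_DGRAM 4096   /* assumed buffer size, standing in for MAXLINE */

/* Receive one datagram on sockfd and send it back to its sender.
 * Returns the number of bytes echoed, or -1 on error. */
ssize_t echo_one_datagram(int sockfd)
{
    assert(sockfd >= 0);
    char mesg[MAX_DGRAM];
    struct sockaddr_storage peer;    /* large enough for IPv4 or IPv6 */
    socklen_t len = sizeof(peer);    /* must be reset before every recvfrom */

    ssize_t n = recvfrom(sockfd, mesg, sizeof(mesg), 0,
                         (struct sockaddr *)&peer, &len);
    if (n < 0)
        return -1;
    return sendto(sockfd, mesg, (size_t)n, 0, (struct sockaddr *)&peer, len);
}

/* Bind a UDP socket to 127.0.0.1 on a kernel-chosen port and store the
 * resulting address in *bound. Returns the descriptor or -1. */
int udp_loopback_socket(struct sockaddr_in *bound)
{
    assert(bound != NULL);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    socklen_t len = sizeof(*bound);
    memset(bound, 0, sizeof(*bound));
    bound->sin_family = AF_INET;
    bound->sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)bound, sizeof(*bound)) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling `echo_one_datagram` inside an endless loop gives the iterative server behaviour described above.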

8.5. UDP echo client: main function

The UDP client's main function is shown in Listing 8.3.

Listing 8.3. UDP echo client

//udpcliserv/udpcli01.c

     1 #include "unp.h"

     2 int
     3 main(int argc, char **argv)
     4 {
     5     int sockfd;
     6     struct sockaddr_in servaddr;

     7     if (argc != 2)
     8         err_quit("usage: udpcli <IPaddress>");

     9     bzero(&servaddr, sizeof(servaddr));
    10     servaddr.sin_family = AF_INET;
    11     servaddr.sin_port = htons(SERV_PORT);
    12     Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);

    13     sockfd = Socket(AF_INET, SOCK_DGRAM, 0);

    14     dg_cli(stdin, sockfd, (SA *) &servaddr, sizeof(servaddr));

    15     exit(0);
    16 }

Filling the socket address structure with the server address

9-12 The IPv4 socket address structure is filled with the IP address and port number of the server. This structure will be passed to the dg_cli function. It determines where to send datagrams.

13-14 A UDP socket is created and the dg_cli function is called.

8.6. UDP echo client: dg_cli function

Listing 8.4 shows the dg_cli function, which does most of the work on the client side.

Listing 8.4. Function dg_cli: client loop

     1 #include "unp.h"

     2 void
     3 dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
     4 {
     5     int n;
     6     char sendline[MAXLINE], recvline[MAXLINE + 1];

     7     while (Fgets(sendline, MAXLINE, fp) != NULL) {

     8         Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);

     9         n = Recvfrom(sockfd, recvline, MAXLINE, 0, NULL, NULL);

    10         recvline[n] = 0;    /* null terminate */
    11         Fputs(recvline, stdout);
    12     }
    13 }

7-12 There are four steps in the client-side processing loop: reading a line from standard input using fgets, sending the line to the server using sendto, reading the server's echoed response using recvfrom, and writing the echoed line to standard output using fputs.

Our client did not ask the kernel to assign an ephemeral (dynamically assigned) port to its socket, whereas the TCP client did so when calling connect. For a UDP socket, the kernel selects an ephemeral port the first time sendto is called, if no local port has yet been associated with that socket. As with TCP, a client can call bind explicitly, but this is rarely done.
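This behaviour can be checked with getsockname. The sketch below assumes a POSIX system with a loopback interface; `local_port` and `observe_ephemeral_port` are hypothetical names. It records the local port of a fresh UDP socket before and after its first sendto.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the local port bound to fd in host byte order
 * (0 means no port assigned yet), or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}

/* Record the local port of a fresh UDP socket before and after its first
 * sendto. Returns 0 on success, -1 on error. */
int observe_ephemeral_port(int *before, int *after)
{
    assert(before != NULL && after != NULL);
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);   /* destination of the probe */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);   /* never bound explicitly */
    struct sockaddr_in dst;
    socklen_t len = sizeof(dst);

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (rx >= 0 && tx >= 0 &&
        bind(rx, (struct sockaddr *)&dst, sizeof(dst)) == 0 &&
        getsockname(rx, (struct sockaddr *)&dst, &len) == 0) {
        *before = local_port(tx);
        if (sendto(tx, "x", 1, 0, (struct sockaddr *)&dst, sizeof(dst)) == 1) {
            *after = local_port(tx);
            rc = 0;
        }
    }
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return rc;
}
```

On the systems this sketch targets, `before` is 0 and `after` is the port the kernel picked during the first send.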

Note that when calling the recvfrom function, null pointers are specified as the fifth and sixth arguments. This tells the kernel that we are not interested in knowing who sent the response. There is a risk that any process, whether on the same node or on any other, can send a datagram to the client's IP address and port, which will be read by the client, assuming that it is a response from the server. We will consider this situation in section 8.8.

As with the dg_echo server function, the dg_cli client function is protocol independent, but the client main function is protocol dependent. The main function allocates and initializes a socket address structure of a particular protocol type, and then passes the dg_cli function a pointer to the structure along with its size.

8.7. Lost Datagrams

The UDP client and server in our example are not reliable. If the client's datagram is lost (say it is discarded by some router between the client and the server), the client blocks forever in its call to recvfrom inside the dg_cli function, waiting for a server reply that will never arrive. Likewise, if the client's datagram reaches the server but the server's reply is lost, the client again blocks forever in its call to recvfrom. The usual way to prevent this is to place a timeout on the client's call to recvfrom. We'll look at this in section 14.2.

Simply putting a timeout in the recvfrom function call is not a complete solution. For example, if the specified timeout expires and no response is received, we cannot say for sure what is wrong - either our datagram did not reach the server, or the server's response did not come back. If the client's request contained a request like "transfer a certain amount of money from account A to account B" (as opposed to the case with our simple echo server), then there would be a big difference between losing the request and losing the response. We'll talk more about adding reliability to the UDP client-server model in Section 22.5.
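One way to place such a timeout, sketched here under stated assumptions, is the SO_RCVTIMEO socket option. It is only one of several techniques and not necessarily the one the text goes on to use; the helper names recv_with_timeout and udp_loopback_socket are hypothetical. As explained above, an expired timeout says only that no reply arrived, not whether the request or the reply was lost.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Receive one datagram, giving up after timeout_ms milliseconds.
 * Returns the datagram length, or -1 with errno set; an expired timeout
 * is reported as EAGAIN or EWOULDBLOCK. A timeout_ms of 0 disables the
 * timeout, so the call would block indefinitely.
 */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;
    return recvfrom(fd, buf, len, 0, NULL, NULL);
}

/* Helper for experiments: a UDP socket bound to an ephemeral loopback port. */
int udp_loopback_socket(void)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick the port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A client loop could call recv_with_timeout in place of Recvfrom and treat a return of -1 with errno equal to EAGAIN or EWOULDBLOCK as "no reply yet", leaving the decision to retry or give up to the application.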

8.8. Checking the received response

At the end of section 8.6, we mentioned that any process that knows the client's dynamically assigned port number can send datagrams to our client, and they will be mixed in with the server's normal responses. All we can do is modify the recvfrom function call in Listing 8.4 to return the IP address and port of the sender of the response, and ignore any datagrams that come from a server other than the one to which we are sending the datagram. However, there are several pitfalls here, as we will see.

First, we modify the main client function (see Listing 8.3) to work with the standard echo server (see Table 2.1). We simply replace the assignment

servaddr.sin_port = htons(SERV_PORT);

with the assignment

servaddr.sin_port = htons(7);

Now we can use any node running a standard echo server with our client.

We then rewrite the dg_cli function to allocate a second socket address structure to hold the address returned by recvfrom. Listing 8.5 shows the result.

Listing 8.5. A version of the dg_cli function that checks the returned socket address

//udpcliserv/dgcliaddr.c

 1 #include "unp.h"
 2 void
 3 dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
 4 {
 5     int n;
 6     char sendline[MAXLINE], recvline[MAXLINE + 1];
 7     socklen_t len;
 8     struct sockaddr *preply_addr;
 9     preply_addr = Malloc(servlen);
10     while (Fgets(sendline, MAXLINE, fp) != NULL) {
11         Sendto(sockfd, sendline, strlen(sendline), 0, pservaddr, servlen);
12         len = servlen;
13         n = Recvfrom(sockfd, recvline, MAXLINE, 0, preply_addr, &len);
14         if (len != servlen || memcmp(pservaddr, preply_addr, len) != 0) {
15             printf("reply from %s (ignored)\n", Sock_ntop(preply_addr, len));
16             continue;
17         }
18         recvline[n] = 0; /* null terminate */
19         Fputs(recvline, stdout);
20     }
21 }

Placing a different socket address structure in memory

9 We allocate another socket address structure by calling malloc. Note that the dg_cli function is still protocol independent: because we do not care what type of socket address structure we are dealing with, we use only its size in the call to malloc.

Comparison of returned addresses

12-13 In the recvfrom function call, we tell the kernel to return the source address of the datagram. We first compare the length returned by the recvfrom function as a value-result argument, and then compare the socket address structures themselves using the memcmp function.
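For comparison, here is a hedged sketch of an IPv4-only alternative to the memcmp test. It compares the family, address, and port fields individually, so padding bytes in the structure cannot affect the result, but it gives up the protocol independence that the listing preserves. The function names same_ipv4_endpoint and make_ipv4_endpoint are hypothetical and do not come from the book.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Return 1 if both structures name the same IPv4 address and port, else 0. */
int same_ipv4_endpoint(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return a->sin_family == AF_INET && b->sin_family == AF_INET &&
           a->sin_port == b->sin_port &&
           a->sin_addr.s_addr == b->sin_addr.s_addr;
}

/*
 * Fill *sa from a dotted-decimal string and a port in host byte order.
 * Returns 0 on success, -1 if the string is not a valid IPv4 address.
 */
int make_ipv4_endpoint(struct sockaddr_in *sa, const char *ip, unsigned short port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}
```

With the addresses from the transcript that follows, an endpoint built from 135.197.17.100 port 7 and one built from 172.24.37.94 port 7 compare as different, which is exactly the situation in which the client ignores the reply.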

The new version of our client works great if the server is on a host with a single IP address. But this program may not work if the server has several network interfaces (multihomed server). We run this program by accessing the freebsd4 node, which has two interfaces and two IP addresses:

macosx% host freebsd4

freebsd4.unpbook.com has address 172.24.37.94

freebsd4.unpbook.com has address 135.197.17.100

macosx% udpcli02 135.197.17.100

reply from 172.24.37.94:7 (ignored)

Fig. 1.7 shows that we specified an IP address from a different subnet. This is usually acceptable. Most IP implementations accept an incoming IP datagram destined for any of the host's IP addresses, regardless of the interface on which it arrives. RFC 1122 calls this the weak end system model. A system that implemented what the RFC calls the strong end system model would accept an incoming datagram only if it arrived on the interface to which it was addressed.

The IP address returned by the recvfrom function (the source IP address of the UDP datagram) is not the IP address to which we sent the datagram. When the server sends its response, the destination is the client's IP address, and the kernel routing function on node freebsd4 selects the interface with address 172.24.37.94 as the outgoing interface. Since the server has not bound an IP address to its socket (it has bound the wildcard address, which we can verify by running netstat on freebsd4), the kernel chooses the source address of the IP datagram: the primary IP address of the outgoing interface. For the same reason, if we send the datagram to an address other than the primary IP address of an interface (that is, to an alias), our test in Listing 8.5 will also fail.

One solution is for the client to verify the responding host's domain name instead of its IP address. To do this, the client looks up the server's name in DNS (see Chapter 11) from the IP address returned by recvfrom. Another solution is for the UDP server to create one socket for every IP address configured on the host, bind that IP address to the socket, use select across all these sockets (waiting for any one to become readable), and then reply from the socket that is readable. Since the socket used for the reply is bound to the IP address that was the destination address of the client's request (otherwise the datagram would not have been delivered to that socket), we can be sure that the source address of the reply is the same as the destination address of the request. We show these examples in Section 22.6.

NOTE

On a Solaris system with multiple network interfaces, the source IP address of the server response is the recipient IP address of the client request. The scenario described in this section applies to Berkeley-derived implementations that select a source IP address based on the outgoing interface.

8.9. Starting the client without starting the server

The next scenario we'll look at is starting the client without starting the server. If we do this and enter one line on the client side, nothing will happen. The client is forever blocked in its recvfrom function call, waiting for a server response that never comes. But that doesn't matter in this example because we're now looking to gain a deeper understanding of the protocols and what's going on with our network application.

We first run tcpdump on the macosx host, and then run the client on the same host, setting the server host to freebsd4. Then we enter one line, but this line is not reflected by the server.

macosx% udpcli01 172.24.37.94

hello, world we enter this line,

but we get nothing in response

Listing 8.6 shows the output of tcpdump.

Listing 8.6. tcpdump output when the server process is not running on the server node

01 0.0 arp who-has freebsd4 tell macosx

02 0.003576 (0.0036) arp reply freebsd4 is-at 0:40:5:42:d6:de

03 0.003601 (0.0000) macosx.51139 > freebsd4.9877: udp 13

04 0.009781 (0.0062) freebsd4 > macosx: icmp: freebsd4 udp port 9877 unreachable

The first thing we notice is that an ARP request and reply are exchanged before the client host can send the UDP datagram to the server host. (We left this exchange in the output to re-emphasize that an ARP request and reply may be needed before an IP datagram can be sent.)

On line 3 we see that the client datagram is sent, but the server host responds on line 4 with an ICMP port unreachable message. (A length of 13 includes 12 characters plus a newline.) However, this ICMP error is not returned to the client process for reasons we'll briefly list below. Instead, the client is permanently blocked in the recvfrom function call in Listing 8.4. We also note that ICMPv6 has a "Port Unreachable" error similar to ICMPv4 (see Tables A.5 and A.6), so the results presented here are similar to those for IPv6.

This ICMP error is an asynchronous error. The error was caused by the sendto function, but sendto returned successfully. Recall from Section 2.9 that a successful return from a UDP output operation means only that the datagram has been added to the link-layer output queue. The ICMP error is not returned until some time later (about 6 ms later in Listing 8.6), which is why it is called asynchronous.

The basic rule is that asynchronous errors are not returned for a UDP socket unless the socket has been connected. We show how to call the connect function on a UDP socket in Section 8.11. Not everyone understands why this decision was made when sockets were first implemented. (Implementation considerations are discussed on pages 748-749.)

Consider a UDP client that sends three datagrams in a row to three different servers (that is, three different IP addresses) on a single UDP socket. The client then enters a loop that calls recvfrom to read the replies. Two of the datagrams are delivered correctly (the server was running on two of the three hosts), but the third host was not running the server and responds with an ICMP port unreachable message. This ICMP error message contains the IP header and UDP header of the datagram that caused the error. (ICMPv4 and ICMPv6 error messages always contain the IP header and all or part of the UDP header, so that the receiver of the error can determine which socket caused it. This is shown in Figures 28.5 and 28.6.) To know which of its three datagrams caused the error, the client needs the destination of the offending datagram. But how can the kernel return this information to the process? The only thing recvfrom can return is an errno value; it has no way of returning the destination IP address and port number of the datagram in error. The decision was therefore made to return these asynchronous errors to the process only if the process has connected the UDP socket to exactly one peer.

NOTE

Linux returns most ICMP destination unreachable errors even for an unconnected socket, as long as the SO_BSDCOMPAT socket option is not enabled. All the destination unreachable errors shown in Table A.5 are returned, except those with codes 0, 1, 4, 5, 11, and 12.

We'll return to the issue of asynchronous errors with UDP sockets in Section 28.7 and show a simple way to obtain these errors on an unconnected socket using our own daemon.

8.10. Final example of a UDP client-server

In Fig. 8.5, the large black dots show the four values that must be specified or chosen when a client sends a UDP datagram.

Fig. 8.5. Summarizing the UDP client-server model from the client's point of view

The client must specify the server's IP address and port number in the call to sendto. Normally the client's IP address and port number are chosen automatically by the kernel, although we noted that the client can call bind. We also noted that if these two values are chosen by the kernel, the client's dynamically assigned port is chosen once, on the first call to sendto, and never changes. The client's IP address, however, can change for every UDP datagram the client sends, assuming the client does not bind a specific IP address to the socket. The reason is shown in Fig. 8.5: if the client host has several network interfaces, the client can alternate between them (in Fig. 8.5, one address belongs to the link layer shown on the left, the other to the one shown on the right). In the worst case, the client's IP address, chosen by the kernel based on the outgoing link layer, would change for every datagram.

What happens if a client binds an IP address to its socket, but the kernel decides that the outgoing datagram should be sent from some other link layer? In this case, the IP datagram will contain a source IP address that is different from the outgoing link layer IP address (see Exercise 8.6).

Fig. 8.6 shows the same four values, but from the server's point of view.

Fig. 8.6. Summarizing the UDP client-server model from the server's point of view

The server can learn at least four pieces of information about each datagram received: the source IP address, the destination IP address, the source port number, and the destination port number. The calls that return this information to TCP and UDP servers are shown in Table 8.1.

Table 8.1. Information available to the server from an incoming IP datagram

A TCP server always has easy access to all four pieces of information for a connected socket, and these four values remain constant for the lifetime of the connection. With a UDP socket, however, the destination IP address can be obtained only by setting the IP_RECVDSTADDR socket option for IPv4 or the IPV6_PKTINFO socket option for IPv6 and then calling recvmsg instead of recvfrom. Because UDP is connectionless, the destination IP address can change for each datagram sent to the server. A UDP server can also receive datagrams destined for one of the host's broadcast addresses or for a multicast address, as we discuss in Chapters 20 and 21. We show how to determine the destination address of a UDP datagram in Section 20.2, after we describe the recvmsg function.

8.11. connect function for UDP

Table 8.2

Fig. 8.7. Connected UDP socket

Fig. 8.8

Calling connect multiple times on a UDP socket

A process with a connected UDP socket can call the connect function again on that socket to do one of two things:

- specify a new IP address and port;
- disconnect the socket.

The first case, specifying a new peer for a connected UDP socket, differs from using the connect function with a TCP socket: for a TCP socket, the connect function can only be called once.

To disconnect a UDP socket, we call the connect function, but set the family element of the socket address structure (sin_family for IPv4 or sin6_family for IPv6) to AF_UNSPEC . This may result in an EAFNOSUPPORT error, but this is normal. It is the process of calling the connect function on an already connected UDP socket that allows the socket to be disconnected.
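The disconnect step can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's code: the helper names udp_connect_loopback, udp_disconnect, and udp_has_peer are hypothetical, the peer is assumed to be on the loopback address, and an EAFNOSUPPORT result is treated as success, in line with the paragraph above.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect a UDP socket to 127.0.0.1:port (host byte order). Returns 0 or -1. */
int udp_connect_loopback(int fd, unsigned short port)
{
    struct sockaddr_in peer;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return connect(fd, (struct sockaddr *)&peer, sizeof(peer));
}

/* Dissolve the association by connecting to an AF_UNSPEC address. */
int udp_disconnect(int fd)
{
    struct sockaddr_in unspec;

    memset(&unspec, 0, sizeof(unspec));
    unspec.sin_family = AF_UNSPEC;
    if (connect(fd, (struct sockaddr *)&unspec, sizeof(unspec)) < 0 &&
        errno != EAFNOSUPPORT)         /* EAFNOSUPPORT is acceptable here */
        return -1;
    return 0;
}

/* Return 1 if the socket currently has a peer recorded, 0 otherwise. */
int udp_has_peer(int fd)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);

    return getpeername(fd, (struct sockaddr *)&peer, &len) == 0;
}
```

After udp_disconnect returns 0, udp_has_peer is expected to report 0 again, and the socket can be used with sendto toward any destination.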

NOTE

The BSD manual for the connect function traditionally stated: "Datagram sockets can break connections by connecting to invalid addresses, such as empty addresses." Unfortunately, neither manual says what constitutes an "empty address", nor does it mention that an error is returned as a result (which is normal). The POSIX standard explicitly states that the address family must be set to AF_UNSPEC, but then states that this call to connect may or may not return an EAFNOSUPPORT error.

Performance

When an application calls the sendto function on an unconnected UDP socket, Berkeley-derived kernels temporarily connect the socket, send the datagram, and then disconnect the socket. Calling sendto to send two datagrams in succession on an unconnected socket therefore involves the following six steps in the kernel:

- connect the socket;
- output the first datagram;
- disconnect the socket;
- connect the socket;
- output the second datagram;
- disconnect the socket.

NOTE

Another point to consider is the number of lookups in the routing table. The first temporary connection looks up the destination IP address in the routing table and stores (caches) this information. The second temporary connection notes that the recipient address matches the cached address from the routing table (we assume that both sendto functions are given the same recipient) and does not need to look up the routing table again.

When an application knows that it will send multiple datagrams to the same peer, it is more efficient to connect the socket explicitly. A call to connect followed by two calls to write then involves the following steps in the kernel:

- connect the socket;
- output the first datagram;
- output the second datagram.

In this case, the kernel copies the socket address structure containing the destination IP address and port only once, whereas with two calls to sendto the copy is made twice. It has been reported that temporarily connecting an unconnected UDP socket accounts for approximately one-third of the cost of each UDP transmission.

8.12. dg_cli function (continued)

Let's go back to the dg_cli function shown in Listing 8.4 and rewrite it to call the connect function. Listing 8.7 shows the new function.

Listing 8.7. The dg_cli function calling the connect function

//udpcliserv/dgcliconnect.c

 1 #include "unp.h"
 2 void
 3 dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
 4 {
 5     int n;
 6     char sendline[MAXLINE], recvline[MAXLINE + 1];
 7     Connect(sockfd, (SA *) pservaddr, servlen);
 8     while (Fgets(sendline, MAXLINE, fp) != NULL) {
 9         Write(sockfd, sendline, strlen(sendline));
10         n = Read(sockfd, recvline, MAXLINE);
11         recvline[n] = 0; /* null terminate */
12         Fputs(recvline, stdout);
13     }
14 }

The changes from the previous version are the added call to connect and the replacement of the calls to sendto and recvfrom with calls to write and read. The dg_cli function remains protocol independent because it does not look inside the socket address structure that it passes to connect. Our client's main function, shown in Listing 8.3, remains the same.

If we run the program on a macosx host, specifying the IP address of the freebsd4 host (which does not run our server on port 9877), we get the following output:

macosx% udpcli04 172.24.37.94

hello, world

read error: Connection refused

The first thing we notice is that we do not receive the error when we start the client process. The error occurs only after we send the first datagram to the server. It is sending this datagram that elicits the ICMP error from the server host. By contrast, when a TCP client calls connect and names a server host that is not running the server process, connect itself returns the error, because the call to connect sends the first packet of TCP's three-way handshake, and that packet elicits an RST from the peer (see section 4.3).
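The same behavior can be reproduced on a single host. The sketch below is an illustration under stated assumptions rather than the book's program: the helper names unused_udp_port and probe_connected_udp are hypothetical, the peer is the loopback address, and a one-second receive timeout is added so the probe cannot block forever on kernels that do not deliver the ICMP error to the socket.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Find a loopback UDP port with no listener: bind to port 0, note it, close. */
int unused_udp_port(void)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);
    int port, fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&a, sizeof(a)) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) < 0) {
        close(fd);
        return -1;
    }
    port = ntohs(a.sin_port);
    close(fd);                         /* the port is free again */
    return port;
}

/*
 * Connect a UDP socket to 127.0.0.1:port, write one datagram, then read.
 * Returns 0 if a reply arrived, otherwise the errno of the failing call.
 */
int probe_connected_udp(unsigned short port)
{
    struct sockaddr_in peer;
    struct timeval tv = { 1, 0 };      /* stop waiting after one second */
    char buf[64];
    int err = 0, fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return errno;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    /* connect() is purely local and succeeds; the error, if the kernel
       reports it, surfaces on a later call, usually read() */
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0 ||
        write(fd, "hello\n", 6) < 0 ||
        read(fd, buf, sizeof(buf)) < 0)
        err = errno;
    close(fd);
    return err;
}
```

On a kernel that returns the ICMP error to a connected UDP socket, probe_connected_udp would be expected to return ECONNREFUSED; on one that does not, the timeout expires and the result is EAGAIN or EWOULDBLOCK.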

Listing 8.8 shows the output of tcpdump.

Listing 8.8. Output from tcpdump when running dg_cli function

macosx% tcpdump

01 0.0 macosx.51139 > freebsd4.9877: udp 13

02 0.006180 (0.0062) freebsd4 > macosx: icmp: freebsd4 udp port 9877 unreachable

Table A.5 also shows that the kernel maps this ICMP error to ECONNREFUSED, which corresponds to the message string "Connection refused" printed by the err_sys function.

NOTE

Unfortunately, not all kernels return ICMP messages to a connected UDP socket as we have shown in this section. Berkeley-derived kernels normally return the error, but System V kernels do not. For example, if we run the same client on a Solaris 2.4 host and connect to a host that is not running our server, tcpdump confirms that the server host returns the ICMP port unreachable error, but the client's call to read never returns. This was fixed in Solaris 2.5. UnixWare does not return the error, while AIX, Digital Unix, HP-UX, and Linux do.

8.13. Lack of flow control in UDP

Listing 8.9

//udpcliserv/dgcliloop1.c

1 #include "unp.h"

8 char sendline;

Listing 8.10

//udpcliserv/dgecholoop1.c

1 #include "unp.h"

3 static int count;

7 socklen_t len;

8 char mesg;

11 len = clilen;

17 recvfrom_int(int signo)

Listing 8.11. Output on the server node

freebsd % netstat -s -p udp

71208 datagrams received

0 with incomplete header

0 with bad data length field

0 with bad checksum

0 with no checksum

832 dropped due to no socket

0 not for hashed pcb

137685 datagrams output

freebsd % udpserv06 launch our server

client sends datagrams

^C

freebsd % netstat -s -p udp

73208 datagrams received

0 with incomplete header

0 with bad data length field

0 with bad checksum

0 with no checksum

832 dropped due to no socket

16 broadcast/multicast datagrams dropped due to no socket

0 not for hashed pcb

137685 datagrams output

aix % udpserv06

^?

received 2000 datagrams

UDP socket receive buffer

The number of UDP datagrams queued for a given socket is limited by the size of its receive buffer. We can change this with the SO_RCVBUF socket option, as we showed in section 7.5. On FreeBSD, the default UDP socket receive buffer size is 42,080 bytes, which leaves room for only 30 of our 1400-byte datagrams. If we increase the size of the socket receive buffer, we can expect the server to receive additional datagrams. Listing 8.12 is a modified version of the dg_echo function from Listing 8.10 that increases the socket receive buffer to 240 KB. If we run this server on a Sun system and the client on an RS/6000 system, the count of received datagrams is 103. Since this is only slightly better than the earlier example with the default buffer size, we clearly have not yet solved the problem.

Listing 8.12. dg_echo function, which increases the size of the socket receive buffer

//udpcliserv/dgecholoop2.c

 1 #include "unp.h"
 2 static void recvfrom_int(int);
 3 static int count;
 4 void
 5 dg_echo(int sockfd, SA *pcliaddr, socklen_t clilen)
 6 {
 7     int n;
 8     socklen_t len;
 9     char mesg[MAXLINE];
10     Signal(SIGINT, recvfrom_int);
11     n = 240 * 1024;
12     Setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &n, sizeof(n));
13     for ( ; ; ) {
14         len = clilen;
15         Recvfrom(sockfd, mesg, MAXLINE, 0, pcliaddr, &len);
16         count++;
17     }
18 }
19 static void
20 recvfrom_int(int signo)
21 {
22     printf("\nreceived %d datagrams\n", count);
23     exit(0);
24 }

NOTE

Why do we set the socket receive buffer size to 240 × 1024 bytes in Listing 8.12? The default maximum socket receive buffer size in BSD/OS 2.1 is 262,144 bytes (256 × 1024), but because of the way the buffer is allocated in memory (described in Chapter 2), it is actually limited to 246,723 bytes. Many earlier systems based on 4.3BSD limited the socket receive buffer size to approximately 52,000 bytes.
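Because the kernel may grant a different size from the one requested, it can be useful to read the option back after setting it. The following is a minimal sketch with hypothetical helper names (request_rcvbuf and datagrams_that_fit); the value reported by getsockopt is system dependent, and the capacity estimate deliberately ignores any per-datagram overhead inside the kernel.

```c
#include <sys/socket.h>

/*
 * Ask for a receive buffer of `bytes` and return the size the kernel
 * reports afterwards, or -1 on error. The reported size may be smaller
 * or larger than the request, depending on the system.
 */
int request_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}

/* Rough capacity estimate: whole datagrams of dgram_len bytes in buf_bytes. */
int datagrams_that_fit(int buf_bytes, int dgram_len)
{
    return dgram_len > 0 ? buf_bytes / dgram_len : 0;
}
```

For the figures quoted above, datagrams_that_fit(42080, 1400) gives 30, matching the default FreeBSD buffer described in the text.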

8.14. Defining the outgoing interface for UDP

A connected UDP socket can also be used to determine the outgoing interface that will be used to send datagrams to a particular destination. This relies on a side effect of the connect function when applied to a UDP socket: the kernel chooses the local IP address (assuming the process has not already called bind to assign it explicitly). The local address is chosen by looking up the destination address in the routing table and taking the primary IP address of the interface through which, according to the table, datagrams will be sent.

Listing 8.13 shows a simple UDP program that connects to the given IP address with the connect function and then calls getsockname, printing the local IP address and port.

Listing 8.13. UDP program using connect function to determine outgoing interface

//udpcliserv/udpcli09.c

 1 #include "unp.h"
 2 int
 3 main(int argc, char **argv)
 4 {
 5     int sockfd;
 6     socklen_t len;
 7     struct sockaddr_in cliaddr, servaddr;
 8     if (argc != 2)
 9         err_quit("usage: udpcli <IPaddress>");
10     sockfd = Socket(AF_INET, SOCK_DGRAM, 0);
11     bzero(&servaddr, sizeof(servaddr));
12     servaddr.sin_family = AF_INET;
13     servaddr.sin_port = htons(SERV_PORT);
14     Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);
15     Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));
16     len = sizeof(cliaddr);
17     Getsockname(sockfd, (SA *) &cliaddr, &len);
18     printf("local address %s\n", Sock_ntop((SA *) &cliaddr, len));
19     exit(0);
20 }

If we run the program on a freebsd host with multiple network interfaces, we will get the following output:

freebsd % udpcli09 206.168.112.96

local address 12.106.32.254:52329

freebsd % udpcli09 192.168.42.2

local address 192.168.42.1:52330

freebsd % udpcli09 127.0.0.1

local address 127.0.0.1:52331

Fig. 1.7 shows that the first two times we run the program, the command-line argument is an IP address on a different Ethernet. The kernel assigns as the local IP address the primary address of the interface on the corresponding Ethernet. Calling connect on a UDP socket sends nothing to that host; it is an entirely local operation that saves the peer's IP address and port. We also see that calling connect on an unbound UDP socket assigns a dynamically assigned (ephemeral) port to the socket.

NOTE

Unfortunately, this technique does not work on all implementations, especially kernels derived from SVR4. For example, it does not work on Solaris 2.5, but it does work on AIX, Digital Unix, Linux, MacOS X, and Solaris 2.6.

8.15. TCP and UDP echo server using select function

We will now combine our concurrent TCP echo server from Chapter 5 and our iterative UDP echo server from this chapter into a single server that uses the select function to multiplex a TCP socket and a UDP socket. Listing 8.14 shows the first part of this server.

Listing 8.14. The first part of the echo server processing TCP and UDP sockets using the select function

//udpcliserv/udpservselect01.c

 1 #include "unp.h"
 2 int
 3 main(int argc, char **argv)
 4 {
 5     int listenfd, connfd, udpfd, nready, maxfdp1;
 6     char mesg[MAXLINE];
 7     pid_t childpid;
 8     fd_set rset;
 9     ssize_t n;
10     socklen_t len;
11     const int on = 1;
12     struct sockaddr_in cliaddr, servaddr;
13     void sig_chld(int);
14     /* create a TCP listening socket */
15     listenfd = Socket(AF_INET, SOCK_STREAM, 0);
16     bzero(&servaddr, sizeof(servaddr));
17     servaddr.sin_family = AF_INET;
18     servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
19     servaddr.sin_port = htons(SERV_PORT);
20     Setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
21     Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
22     Listen(listenfd, LISTENQ);
23     /* create a UDP socket */
24     udpfd = Socket(AF_INET, SOCK_DGRAM, 0);
25     bzero(&servaddr, sizeof(servaddr));
26     servaddr.sin_family = AF_INET;
27     servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
28     servaddr.sin_port = htons(SERV_PORT);
29     Bind(udpfd, (SA *) &servaddr, sizeof(servaddr));

Creating a TCP Listening Socket

14-22 A listening TCP socket is created and bound to the server's well-known port. We set the SO_REUSEADDR socket option in case connections already exist on this port.

Creating a UDP socket

23-29 A UDP socket is also created and associated with the same port. Even if the same port is used for TCP and UDP sockets, there is no need to set the SO_REUSEADDR socket option before this call to bind because TCP ports do not depend on UDP ports.

Listing 8.15 shows the second part of our server.

Listing 8.15. The second half of the echo server processing TCP and UDP using the select function

//udpcliserv/udpservselect01.c

30     Signal(SIGCHLD, sig_chld); /* must call waitpid() */
31     FD_ZERO(&rset);
32     maxfdp1 = max(listenfd, udpfd) + 1;
33     for ( ; ; ) {
34         FD_SET(listenfd, &rset);
35         FD_SET(udpfd, &rset);
36         if ( (nready = select(maxfdp1, &rset, NULL, NULL, NULL)) < 0) {
37             if (errno == EINTR)
38                 continue; /* back to for() */
39             else
40                 err_sys("select error");
41         }
42         if (FD_ISSET(listenfd, &rset)) {
43             len = sizeof(cliaddr);
44             connfd = Accept(listenfd, (SA *) &cliaddr, &len);
45             if ( (childpid = Fork()) == 0) { /* child process */
46                 Close(listenfd); /* close the listening socket */
47                 str_echo(connfd); /* process the request */
48                 exit(0);
49             }
50             Close(connfd); /* parent closes the connected socket */
51         }
52         if (FD_ISSET(udpfd, &rset)) {
53             len = sizeof(cliaddr);
54             n = Recvfrom(udpfd, mesg, MAXLINE, 0, (SA *) &cliaddr, &len);
55             Sendto(udpfd, mesg, n, 0, (SA *) &cliaddr, len);
56         }
57     }
58 }

Installing the SIGCHLD signal handler

30 A handler is installed for the SIGCHLD signal because TCP connections will be handled by the child process. We showed this signal handler in Listing 5.8.

Preparing to Call the Select Function

31-32 We initialize a descriptor set for the select function and calculate the maximum of the two descriptors for which we will wait.

Calling the select function

34-41 We call the select function, waiting only for readability on the listening TCP socket or the UDP socket. Since our sig_chld signal handler can interrupt the call to select, we handle the EINTR error.

Handling a new client connection

42-51 When the listening TCP socket is readable, we accept the new client connection with the accept function, call fork to spawn a child process, and call our str_echo function in the child. This is the same sequence of steps we followed in Chapter 5.

Processing an incoming datagram

52-57 If the UDP socket is readable, a datagram has arrived. We read it with the recvfrom function and send it back to the client with the sendto function.

8.16. Summary

Converting our echo client and echo server to use UDP instead of TCP was easy. But in doing so we lost many of the features that TCP provides: detecting lost packets and retransmitting them, verifying that replies come from the correct peer, and so on. We'll return to this topic in Section 22.5 and see how we can improve the reliability of a UDP application.

UDP sockets can generate asynchronous errors, that is, errors that are reported some time after the packet has been sent. TCP sockets always report them to the application, but with UDP the socket must be connected to receive these errors.

UDP lacks flow control, which is very easy to demonstrate. This is usually not a problem, since many UDP applications are built on a request-response model and are not intended to transfer large amounts of data.

There are a number of other things to consider when writing UDP applications, but we'll cover them in Chapter 22, after covering the interface functions, broadcasting, and multicasting.

Exercises

1. Let's say we have two applications, one uses TCP and the other uses UDP. The receive buffer for a TCP socket contains 4096 bytes of data, and the receive buffer for a UDP socket contains two datagrams of 2048 bytes. A TCP application calls the read function with a third argument of 4096, and a UDP application calls the recvfrom function with a third argument of 4096. Is there any difference between these calls?

2. In Listing 8.2, what happens if we replace the last argument of the sendto function (which we labeled len) with the argument clilen?

3. Compile and run the UDP server from Listings 8.1 and 8.2, and then the client from Listings 8.3 and 8.4. Verify that the client and server work together.

4. Run the ping program in one window, specifying the -i 60 option (send one packet every 60 seconds; some systems use -I instead of -i), the -v option (print all received ICMP error messages), and the loopback address (usually 127.0.0.1) as the destination. We will use this program to see the ICMP port unreachable error returned by the server host. Then run our client from the previous exercise in another window, specifying the IP address of some host that is not running the server. What happens?

5. Looking at Fig. 8.3, we said that each connected TCP socket has its own receive buffer. Do you think the listening socket has its own receive buffer?

6. Use the sock program (see Section B.3) and a tool such as tcpdump (see Section B.5) to verify the statement in Section 8.10: if the client uses the bind function to bind an IP address to its socket, but sends a datagram originating on a different interface, then the resulting datagram contains the IP address that was associated with the socket, even if it does not match the originating interface.

7. Compile the programs from Section 8.13 and run the client and server on different hosts. Add a printf to the client each time a datagram is written to the socket. Does this change the percentage of packets received? Why? Add a printf to the server each time a datagram is read from the socket. Does this change the percentage of packets received? Why?

8. What is the largest length we can pass to the sendto function for a UDP/IPv4 socket, that is, what is the largest amount of data that can fit in a UDP/IPv4 datagram? What changes in the case of UDP/IPv6?

Modify Listing 8.4 to send a single UDP datagram of the maximum size, read it back, and print the number of bytes returned by the recvfrom function.

9. Modify Listing 8.15 to conform to RFC 1122: IP_RECVDSTADDR should be used for a UDP socket.

At the end of Section 8.9, we mentioned that asynchronous errors are not returned on a UDP socket unless the socket has been connected. We can indeed call the connect function on a UDP socket (see Section 4.3), but this does not result in anything like a TCP connection: there is no three-way handshake. The kernel simply checks for immediate errors (such as a destination known to be unreachable), records the peer's IP address and port number, which are contained in the socket address structure passed to connect, and returns immediately to the calling process.

NOTE

Overloading the connect function with this new capability for UDP sockets can be confusing. If the convention is that sockname is the local protocol address and peername is the remote protocol address, then a better name for this function would have been setpeername. Likewise, a better name for the bind function would have been setsockname.
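The behaviour of connect on a UDP socket described above can be sketched as follows. This is a minimal illustration, not code from the book: the helper name udp_connect_peer_port is hypothetical, and the example assumes a POSIX system with IPv4 loopback available. It connects a UDP socket to an address where no server needs to be running and then asks the kernel, via getpeername, which peer it recorded.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect a new UDP socket to ip:port and return the peer port that the
 * kernel recorded (host byte order), or -1 on any error.
 * udp_connect_peer_port is a hypothetical helper name. */
int udp_connect_peer_port(const char *ip, unsigned short port)
{
    struct sockaddr_in peer, recorded;
    socklen_t len = sizeof(recorded);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);

    /* connect() on a UDP socket sends nothing on the wire; it only stores
     * the peer address, which getpeername() then reports back. */
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0 ||
        getpeername(fd, (struct sockaddr *)&recorded, &len) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return ntohs(recorded.sin_port);
}
```

Calling udp_connect_peer_port("127.0.0.1", 9999) is expected to succeed even when nothing is listening on that port, because no handshake takes place.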

With this in mind, it is necessary to understand the difference between the two types of UDP sockets.

- An unconnected UDP socket is the default when we create a UDP socket.

- A connected UDP socket is the result of calling the connect function on a UDP socket.

A connected UDP socket differs from the default unconnected socket in three ways.

1. We can no longer specify the destination IP address and port for an output operation. That is, we use the write or send function instead of the sendto function. Anything written to a connected UDP socket is automatically sent to the protocol address (that is, the IP address and port) specified by the connect function.

NOTE

As with TCP, we can call the sendto function on a connected UDP socket, but we cannot specify a destination address. The fifth argument to sendto (a pointer to the socket address structure) must be a null pointer, and the sixth argument (the size of the socket address structure) should be 0. The POSIX standard specifies that when the fifth argument is a null pointer, the sixth argument is ignored.

2. Instead of the recvfrom function, we use the read or recv function. The only datagrams returned by the kernel for an input operation on a connected UDP socket are those arriving from the protocol address specified in the connect function. Datagrams destined for the connected UDP socket's local protocol address (IP address and port) but arriving from a protocol address other than the one to which the socket was connected are not passed to the connected socket. This limits a connected UDP socket to exchanging datagrams with one and only one peer.

NOTE

More precisely, datagrams are exchanged with only one IP address rather than with one peer, since that address can be a multicast address and thus represent a group of peers.

3. Asynchronous errors are returned to the process only for operations on a connected UDP socket. Consequently, as we have already said, an unconnected UDP socket does not receive any asynchronous errors.
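The first and third differences in the list above can be observed with a small experiment. The sketch below is an assumption-laden illustration rather than the book's code: the helper names unused_loopback_addr and udp_error_after_send are hypothetical, the 200 ms receive timeout is an arbitrary choice to avoid blocking forever, and the "unused" port is found by binding to port 0 and closing the socket again, which is very likely but not guaranteed to leave the port free. The function sends one datagram to a port with no listener and returns the errno that the following recv reports.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Fill *addr with a loopback address whose port is (very probably) unused:
 * bind to port 0, read back the chosen port, then close the socket. */
static int unused_loopback_addr(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int ok, fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    ok = bind(fd, (struct sockaddr *)addr, sizeof(*addr)) == 0 &&
         getsockname(fd, (struct sockaddr *)addr, &len) == 0;
    close(fd);
    return ok ? 0 : -1;
}

/* Send one datagram to a port with no listener, then try to read.
 * Returns the errno reported by recv(), 0 if recv() succeeds,
 * or -1 if the setup fails. */
int udp_error_after_send(int use_connect)
{
    struct sockaddr_in dst;
    struct timeval tv = { 0, 200000 };   /* 200 ms, arbitrary */
    char buf[16];
    int err, fd;

    if (unused_loopback_addr(&dst) < 0)
        return -1;
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    if (use_connect) {
        if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            close(fd);
            return -1;
        }
        send(fd, "ping", 4, 0);          /* no destination argument */
    } else {
        sendto(fd, "ping", 4, 0, (struct sockaddr *)&dst, sizeof(dst));
    }

    err = (recv(fd, buf, sizeof(buf), 0) < 0) ? errno : 0;
    close(fd);
    return err;
}
```

On a system that behaves as the text describes, udp_error_after_send(1) would be expected to return ECONNREFUSED (the ICMP port unreachable error delivered to the connected socket), while udp_error_after_send(0) simply times out with EAGAIN or EWOULDBLOCK because the unconnected socket never sees the error.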

Table 8.2 summarizes the first point in the list above as it applies to 4.4BSD.

Table 8.2. TCP and UDP sockets: can the destination protocol address be specified

NOTE

POSIX specifies that an output operation that does not specify a destination address on an unconnected UDP socket should return ENOTCONN, not EDESTADDRREQ.

Solaris 2.5 permits a sendto that specifies a destination address on a connected UDP socket. POSIX specifies that EISCONN should be returned in this situation.
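Because the error returned in these cases differs between systems, it can be useful to probe the local behaviour directly. The following sketch does that; the function names errno_send_unconnected and errno_sendto_connected and the port numbers 50001 and 50002 are arbitrary choices for illustration, not anything defined by the text.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* errno from send() on an unconnected UDP socket (no destination given).
 * Returns -1 if the socket cannot be created. */
int errno_send_unconnected(void)
{
    int err, fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    err = (send(fd, "x", 1, 0) < 0) ? errno : 0;
    close(fd);
    return err;
}

/* errno from sendto() with an explicit destination on a connected UDP
 * socket; 0 means the system accepted the call. Returns -1 on setup
 * failure. */
int errno_sendto_connected(void)
{
    struct sockaddr_in peer, other;
    int err, fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    peer.sin_port = htons(50001);        /* arbitrary example port */
    other = peer;
    other.sin_port = htons(50002);       /* a different destination */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        close(fd);
        return -1;
    }
    err = (sendto(fd, "x", 1, 0,
                  (struct sockaddr *)&other, sizeof(other)) < 0) ? errno : 0;
    close(fd);
    return err;
}
```

The first function should report either EDESTADDRREQ or ENOTCONN, depending on which convention the system follows. The second reports EISCONN on systems that follow POSIX here, or 0 on systems that, like the Solaris release mentioned above, accept the call.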

Figure 8.7 summarizes the information about a connected UDP socket.

Fig. 8.7. Connected UDP socket

The application calls the connect function, specifying the IP address and port number of its peer. It then uses the read and write functions to exchange data with the peer.

Datagrams arriving from any other IP address or port (shown as "???" in Figure 8.7) are not passed to the connected socket, because either the source IP address or the source UDP port does not match the protocol address to which the socket is connected. These datagrams may be delivered to some other UDP socket on the host. If there is no other matching socket for the arriving datagram, UDP discards it and generates an ICMP port unreachable message.
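The filtering described above can be checked on the loopback interface with three sockets. In this sketch (hypothetical helper names bound_udp and connected_udp_filters_strangers, an arbitrary 200 ms receive timeout, IPv4 loopback assumed), a client socket is connected to one peer, a "stranger" socket and the peer each send it a datagram, and the client then reads whatever the kernel is willing to deliver.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Create a UDP socket bound to 127.0.0.1 on a kernel-chosen port and
 * store the resulting address in *addr. Returns the descriptor or -1. */
static int bound_udp(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if the connected socket receives only the peer's datagram,
 * 0 if it receives anything else, -1 on setup failure. */
int connected_udp_filters_strangers(void)
{
    struct sockaddr_in peer_addr, stranger_addr, client_addr;
    struct timeval tv = { 0, 200000 };   /* 200 ms, arbitrary */
    char buf[32];
    ssize_t n;
    int result = -1;
    int peer = bound_udp(&peer_addr);
    int stranger = bound_udp(&stranger_addr);
    int client = bound_udp(&client_addr);

    if (peer < 0 || stranger < 0 || client < 0)
        goto out;
    if (connect(client, (struct sockaddr *)&peer_addr, sizeof(peer_addr)) < 0)
        goto out;
    setsockopt(client, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    /* The stranger sends first, then the connected peer. */
    sendto(stranger, "stranger", 8, 0,
           (struct sockaddr *)&client_addr, sizeof(client_addr));
    sendto(peer, "peer", 4, 0,
           (struct sockaddr *)&client_addr, sizeof(client_addr));

    /* The first datagram delivered must be the peer's ... */
    n = recv(client, buf, sizeof(buf), 0);
    result = (n == 4 && memcmp(buf, "peer", 4) == 0);
    if (result) {
        /* ... and nothing else may be queued behind it. */
        n = recv(client, buf, sizeof(buf), 0);
        result = (n < 0);
    }
out:
    if (peer >= 0) close(peer);
    if (stranger >= 0) close(stranger);
    if (client >= 0) close(client);
    return result;
}
```

If the kernel filters as described, the function returns 1: the stranger's datagram is never delivered, even though it was addressed to the client's local port and arrived first.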

To summarize, a UDP client or server can call the connect function only if that process uses the UDP socket to communicate with exactly one peer. Normally it is the UDP client that calls connect, but there are applications in which the UDP server communicates with a single client for an extended period of time (TFTP, for example); in that case both the client and the server can call connect.

Another example of long-term interaction is DNS (Figure 8.8).

Fig. 8.8. Example of DNS clients and servers and the connect function

A DNS client can be configured to use one or more servers, typically by listing the servers' IP addresses in the /etc/resolv.conf file. If only one server is listed in this file (the leftmost rectangle in the figure), the client can call the connect function, but if several servers are listed (the second rectangle from the right), the client cannot call connect. In addition, a DNS server normally handles requests from any client, so the servers cannot call connect either.

We now examine how the lack of any flow control in UDP affects an application. First we modify our dg_cli function so that it sends a fixed number of datagrams. It no longer reads from standard input. Listing 8.9 shows the new version of the function. It sends 2000 UDP datagrams of 1400 bytes each to the server.

Listing 8.9. The dg_cli function sends a fixed number of datagrams to the server

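//udpcliserv/dgcliloop1.c

#include "unp.h"

#define NDG   2000   /* number of datagrams to send */
#define DGLEN 1400   /* length of each datagram */

void
dg_cli(FILE *fp, int sockfd, const SA *pservaddr, socklen_t servlen)
{
    int  i;
    char sendline[DGLEN];   /* contents are irrelevant for this test */

    /* send NDG datagrams back to back, without waiting for any reply */
    for (i = 0; i < NDG; i++) {
        Sendto(sockfd, sendline, DGLEN, 0, pservaddr, servlen);
    }
}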

We then modify the server to receive datagrams and count the number of datagrams received. The server no longer reflects datagrams back to the client. Listing 8.10 shows the new dg_echo function. When we terminate the server process by pressing the interrupt key on the terminal (which causes a SIGINT signal to be sent to the process), the server prints the number of datagrams received and exits.

Listing 8.10. The dg_echo function, which counts received datagrams

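//udpcliserv/dgecholoop1.c

#include "unp.h"

static void recvfrom_int(int);
static int count;

void
dg_echo(int sockfd, SA *pcliaddr, socklen_t clilen)
{
    socklen_t len;
    char mesg[MAXLINE];

    Signal(SIGINT, recvfrom_int);

    /* receive datagrams forever and count them; nothing is echoed back */
    for ( ; ; ) {
        len = clilen;
        Recvfrom(sockfd, mesg, MAXLINE, 0, pcliaddr, &len);

        count++;
    }
}

/* SIGINT handler: print the number of datagrams received and exit */
static void
recvfrom_int(int signo)
{
    printf("\nreceived %d datagrams\n", count);
    exit(0);
}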

We now run the server on the host freebsd, which is a slow SPARCStation. We run the client on a much faster RS/6000 system running AIX. The two hosts are directly connected by a 100 Mbit/s Ethernet link. In addition, we run netstat -s on the server host both before and after the test, since the statistics it prints show how many datagrams were lost. Listing 8.11 shows the output on the server.

Listing 8.11. Output on the server node

freebsd % netstat -s -p udp
    71208 datagrams received
    0 with incomplete header
    0 with bad data length field
    0 with bad checksum
    0 with no checksum
    832 dropped due to no socket
    16 broadcast/multicast datagrams dropped due to no socket
    1971 dropped due to full socket buffers
    0 not for hashed pcb
    137685 datagrams output

freebsd % udpserv06        (start our server)
                           (the client sends its datagrams here)
^C                         (type the interrupt key after the client has finished)

received 30 datagrams

freebsd % netstat -s -p udp
    73208 datagrams received
    0 with incomplete header
    0 with bad data length field
    0 with bad checksum
    0 with no checksum
    832 dropped due to no socket
    16 broadcast/multicast datagrams dropped due to no socket
    3941 dropped due to full socket buffers
    0 not for hashed pcb
    137685 datagrams output

The client sent 2000 datagrams, but the server application received only 30 of them, a loss rate of about 98%. Neither the server nor the client is given any indication that these datagrams were lost. As we said, UDP has no flow control and is unreliable. As this test shows, it is easy for a UDP sender to overrun the receiver's buffer.
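A similar overrun can be provoked on a single machine by making the receiver infinitely slow: it does not read at all until the sender has finished. The sketch below is not the book's test setup; it assumes IPv4 loopback, the system's default socket receive buffer, and support for the MSG_DONTWAIT flag, and the function name flood_then_count is hypothetical. It reuses the NDG and DGLEN values from Listing 8.9.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NDG   2000   /* datagrams to send, as in Listing 8.9 */
#define DGLEN 1400   /* length of each datagram */

/* Send NDG datagrams over loopback to a receiver that does not read until
 * the sender has finished, then count what is left in its receive buffer.
 * Returns the number of datagrams read, or -1 on setup failure. */
int flood_then_count(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[DGLEN];
    int i, count = 0;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto fail;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) < 0)
        goto fail;

    /* The sender gets no feedback about how much the receiver can hold. */
    memset(buf, 0, sizeof(buf));
    for (i = 0; i < NDG; i++)
        sendto(tx, buf, DGLEN, 0, (struct sockaddr *)&addr, sizeof(addr));

    /* Drain without blocking: recv() fails once the buffer is empty. */
    while (recv(rx, buf, sizeof(buf), MSG_DONTWAIT) > 0)
        count++;

    close(rx);
    close(tx);
    return count;

fail:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return -1;
}
```

With a default-sized receive buffer, the returned count is normally well below 2000, and the exact value depends on the system's buffer size, which matches the point made above: the datagrams that do not fit are dropped without any error being reported to the sender.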

If we look at the netstat output, we see that the total number of datagrams received by the server host (not the server application) is 2000 (73208 - 71208). The "dropped due to full socket buffers" counter shows how many datagrams were received by UDP but discarded because the receiving socket's receive buffer was full. This value is 1970 (3941 - 1971), which, when added to the count printed by the application (30), gives the 2000 datagrams received by the host. Unfortunately, netstat's counter of datagrams discarded because of a full buffer is system-wide. There is no way to determine which applications (e.g., which UDP ports) are affected.
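The arithmetic in the preceding paragraph can be written out explicitly. The two small functions below are illustrative helpers (their names are not from the text); they compute the difference between two counter snapshots and the resulting loss percentage.

```c
#include <assert.h>

/* Difference between two snapshots of a counter that only increases. */
unsigned long counter_delta(unsigned long before, unsigned long after)
{
    assert(after >= before);   /* counters only grow between snapshots */
    return after - before;
}

/* Percentage of sent datagrams that never reached the application. */
double loss_percent(unsigned long sent, unsigned long delivered)
{
    assert(sent > 0 && delivered <= sent);
    return 100.0 * (double)(sent - delivered) / (double)sent;
}
```

With the values from Listing 8.11, counter_delta(71208, 73208) is 2000 datagrams received by the host, counter_delta(1971, 3941) is 1970 datagrams dropped because of full socket buffers, and loss_percent(2000, 30) is 98.5, which the text rounds to 98%.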

The number of datagrams received by the server in this example is non-deterministic. It depends on many factors, such as network load, client node and server node load.

If we run the same client and the same server, but this time the client is on a slow Sun system and the server is on a fast RS/6000 system, no datagrams are lost.

aix % udpserv06
^?                         (type the interrupt key after the client has finished)

received 2000 datagrams








2024 gtavrl.ru.