The TCP three-way handshake. Key questions in computer networks and telecommunications


No process can be sped up without a detailed understanding of its internal structure. Likewise, speeding up the Internet is impossible without understanding (and properly configuring) its fundamental protocols, IP and TCP. Let's look at the features of these protocols that affect Internet speed.

IP (Internet Protocol) provides host-to-host routing and addressing. TCP (Transmission Control Protocol) provides an abstraction in which a network operates reliably over an inherently unreliable channel.

The TCP/IP protocols were proposed by Vint Cerf and Bob Kahn in the paper "A Protocol for Packet Network Intercommunication," published in 1974. The original proposal, registered as RFC 675, was revised several times, and in 1981 version 4 of the TCP/IP specification was published as two separate RFCs:

  • RFC 791 – Internet Protocol
  • RFC 793 – Transmission Control Protocol

Since then, several improvements have been made to TCP, but its foundation remains the same. TCP quickly displaced other protocols and now underpins the core of what we think of as the Internet: websites, email, file transfer, and more.

TCP provides the necessary abstraction of network connections so that applications do not have to deal with the associated problems themselves: retransmitting lost data, delivering data in order, ensuring data integrity, and so on. When you work with a TCP stream, you know that the bytes received will be identical to the bytes sent, and that they will arrive in the same order. In other words, TCP is tuned for correctness of delivery rather than for speed, and this creates a number of problems when it comes to optimizing website performance.

The HTTP standard does not require TCP as the transport protocol. If we wanted, we could carry HTTP over a datagram socket (UDP, the User Datagram Protocol) or over any other transport. In practice, though, all HTTP traffic travels over TCP because of the latter's convenience.

Therefore, it is necessary to understand some of the internal mechanisms of TCP in order to optimize sites. You likely won't be working with TCP sockets directly in your application, but some of your application design decisions will dictate the performance of the TCP over which your application will run.

Three-way handshake

All TCP connections begin with a three-way handshake (Figure 1). Before the client and server can exchange any application data, they must agree on the starting sequence numbers for their packets, along with a number of other variables associated with the connection. The initial sequence numbers are chosen randomly on both sides for security reasons.

SYN

The client picks a random number X and sends a SYN packet, which may also carry additional TCP flags and option values.

SYN ACK

The server picks its own random number Y, acknowledges X + 1, adds its own flags and options, and sends a SYN ACK response.

ACK

The client increments both values (its sequence number becomes X + 1, its acknowledgment Y + 1) and completes the handshake by sending an ACK packet.



Fig. 1. The three-way handshake.
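The sequence-number bookkeeping of the three steps can be sketched as a toy model (plain dictionaries standing in for packets; this is an illustration, not a real TCP implementation):

```python
import random

def three_way_handshake():
    """Toy model of the numbers exchanged during the handshake."""
    x = random.randrange(2**32)   # client's random initial number X
    y = random.randrange(2**32)   # server's random initial number Y
    syn     = {"flags": {"SYN"}, "seq": x}                        # step 1
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": y, "ack": x + 1}   # step 2
    ack     = {"flags": {"ACK"}, "seq": x + 1, "ack": y + 1}      # step 3
    return syn, syn_ack, ack

syn, syn_ack, ack = three_way_handshake()
```

Note that each side acknowledges the other's number plus one, which is exactly how the peers prove they received each other's initial sequence numbers.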

Once the handshake completes, data exchange can begin. The client may send a data packet immediately after its ACK; the server must wait for the ACK before it can start sending data. This process happens on every TCP connection and poses a serious challenge for website performance, since every new connection adds network latency.

For example, if the client is in New York and the server is in London, creating a new TCP connection takes 56 milliseconds: 28 milliseconds for a packet to travel one way, and the same again to return to New York. Channel bandwidth plays no role here. Because creating TCP connections is expensive, connection reuse is an important optimization for any TCP-based application.
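A back-of-the-envelope sketch of that cost (the 28 ms figure is the illustrative one-way New York to London delay from the text; the request count is made up):

```python
def handshake_cost_ms(requests, one_way_ms, reuse_connection):
    """Pure handshake latency for a series of requests: every new TCP
    connection pays one full round trip before any data can flow."""
    rtt_ms = 2 * one_way_ms
    handshakes = 1 if reuse_connection else requests
    return handshakes * rtt_ms

# New York - London, ~28 ms one way, 10 requests
cost_new    = handshake_cost_ms(10, 28, reuse_connection=False)
cost_reused = handshake_cost_ms(10, 28, reuse_connection=True)
```

Ten requests over fresh connections pay 560 ms in handshakes alone, against 56 ms when the connection is reused.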

TCP Fast Open (TFO)

Loading a page can mean downloading hundreds of resources from different hosts. This may require the browser to open dozens of new TCP connections, each paying the latency of a handshake. Needless to say, this hurts page load times, especially for mobile users.

TCP Fast Open (TFO) is a mechanism that reduces this latency by allowing data to be carried inside the SYN packet. It has its limitations, though: a cap on the amount of data inside the SYN packet, support for only certain types of HTTP requests, and it works only for repeat connections, because it relies on a cookie obtained on a previous one.

Using TFO requires explicit support on the client, the server, and in the application. It works on servers running Linux kernel 3.7 or later with a compatible client (Linux; iOS 9 and later; OS X 10.11 and later), and the application must also enable the appropriate socket flags.

Google experts have determined that TFO can reduce network latency for HTTP requests by 15%, speed up page loading by 10% on average, and in some cases by up to 40%.

Congestion control

In early 1984, John Nagle described a network condition he called "congestion collapse," which can form in any network where the link bandwidths between nodes are unequal.

When the round-trip time of packets exceeds the maximum retransmission interval, hosts begin injecting copies of the same datagrams into the network. Buffers fill up and packets are dropped, so hosts send packets repeatedly, and only after several attempts do they reach the target. This is "congestion collapse."

Nagle showed that congestion collapse was not a problem for the ARPANET at the time, because its nodes had equal bandwidths and the backbone had excess capacity. On the modern Internet, this is no longer the case. Back in 1986, when the number of nodes on the network exceeded 5000, a series of congestion collapses occurred; in some cases network throughput dropped by a factor of 1000, rendering it virtually unusable.

To cope with this problem, TCP implements several mechanisms: flow control, congestion control, and congestion avoidance. They determine the rate at which data can be transferred in each direction.

Flow control

Flow control keeps a sender from overwhelming the receiver with more data than it can process. To achieve this, each side of a TCP connection advertises the amount of buffer space it has available for incoming data. This value is the "receive window" (rwnd).

When a connection is established, both sides set their rwnd values from their system defaults. Opening a typical web page means sending a lot of data from server to client, so the client's receive window is usually the main limiter. If the client is sending a lot of data to the server, for example when uploading a video, the server's receive window becomes the limiting factor instead.

If for some reason one side cannot cope with the incoming data stream, it must report a reduced value for its receive window. If the receive window reaches 0, this signals the sender not to send any more data until the receiver's buffer is cleared at the application level. This sequence is repeated continuously on every TCP connection: each ACK packet carries a fresh rwnd value for both parties, allowing them to dynamically adjust the data rate according to the capabilities of the recipient and sender.



Fig. 2. Advertising the receive window value.
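The mechanism can be sketched as a toy model (the byte counts and the sequence of rwnd values below are made up for illustration; a real stack also tracks in-flight, unacknowledged bytes):

```python
def send_with_flow_control(total_bytes, rwnd_updates):
    """Toy flow control: each ACK carries a fresh rwnd value, and the
    sender never sends more than the advertised window per round.
    A zero window stalls the sender entirely."""
    sent = 0
    chunks = []
    for rwnd in rwnd_updates:
        chunk = min(rwnd, total_bytes - sent)  # 0 when rwnd == 0: stall
        sent += chunk
        chunks.append(chunk)
        if sent == total_bytes:
            break
    return sent, chunks

# 10,000 bytes to send; the third ACK advertises a zero window
sent, chunks = send_with_flow_control(10_000, [4096, 4096, 0, 4096])
```

The zero-window round transfers nothing, exactly as described above: the sender waits until the receiver's buffer drains and a non-zero window is advertised again.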

Window scaling (RFC 1323)

The original TCP specification allotted 16 bits to the advertised receive window. This imposed a hard upper bound: the receive window could not exceed 2^16 − 1, or 65,535 bytes. That is often too small for optimal performance, especially on networks with a large "bandwidth-delay product" (BDP).

To address this, RFC 1323 introduced the TCP window scaling option, which raises the maximum receive window from 65,535 bytes to 1 gigabyte. The window scale parameter is sent during the three-way handshake and specifies the number of bits by which the 16-bit receive window value in subsequent ACK packets is shifted left.
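The shift arithmetic is easy to check (a sketch; the 100 Mbit/s bandwidth and 100 ms RTT figures are illustrative, chosen to show a BDP well above the unscaled limit):

```python
def scaled_window(window_field, shift):
    """RFC 1323: the effective receive window is the 16-bit header
    field shifted left by the scale factor from the handshake."""
    assert 0 <= window_field < 2**16 and 0 <= shift <= 14
    return window_field << shift

# Without scaling the window tops out at 65,535 bytes...
plain = scaled_window(65535, 0)
# ...with the maximum shift of 14 it approaches 1 GB
scaled = scaled_window(65535, 14)

# Why this matters: a 100 Mbit/s path with 100 ms RTT has a
# bandwidth-delay product of 1,250,000 bytes, far above 65,535
bdp_bytes = int(100e6 / 8 * 0.100)
```

Whenever the BDP exceeds the unscaled 65,535-byte ceiling, the connection cannot keep the pipe full without window scaling.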

Today, receive window scaling is enabled by default on all major platforms. However, intermediate hosts, routers, and firewalls can rewrite or even strip this option. If your connection cannot fully utilize the available bandwidth, checking the receive window values is the place to start. On Linux, the window scaling option can be checked and set like this:

$> sysctl net.ipv4.tcp_window_scaling
$> sysctl -w net.ipv4.tcp_window_scaling=1

In the next part, we will figure out what TCP Slow Start is, how to optimize the data transfer rate and increase the initial window, and also put together all the recommendations for optimizing the TCP/IP stack.

Transport layer functions

  • provides a logical connection between applications;
  • implements reliable data transmission;
  • provides control of data transfer speed.

Sockets

A socket is a data structure that identifies a network connection.

Why are sockets needed? A server program can maintain many simultaneous TCP connections with other computers through the same standard port number. How can this be implemented? The task could be left to the programmer: pick packets out of the network-layer receive buffer, look at who sent each one, and respond accordingly. But all of this can be made much more convenient.

Each connection gets its own associated stream, which can be written to and read from. Each stream has its own remote IP address and remote port. The data structure corresponding to each such stream is what we call a socket. A server can thus be compared to a power strip with many outlets into which clients plug themselves.

With this design, instead of sorting through a pile of assorted packets in the network-layer receive buffer, the server reads from streams, each corresponding to a different client. Data from different clients is not heaped together but distributed across the socket streams. Responsibility for this distribution falls not on the programmer but on the operating system's transport-layer driver.

Sockets were developed at the University of California, Berkeley, and became the de facto standard, in contrast to the OSI TLI (Transport Layer Interface).

Historical reference. UNIX split

BSD UNIX, created at the University of California, Berkeley, dates back to 1978; its author was Bill Joy. In the early 1980s, AT&T, which owned Bell Labs, recognized the value of UNIX and began creating a commercial version. An important reason for the UNIX split was the implementation of the TCP/IP protocol stack in 1980. Before that, machine-to-machine communication in UNIX was in its infancy: the most significant mechanism was UUCP, a facility for copying files from one UNIX system to another that originally operated over telephone networks using modems.

These two operating systems implemented two different network application programming interfaces: Berkeley sockets (TCP/IP) and TLI (OSI), the Transport Layer Interface. The Berkeley sockets interface was developed at the University of California, Berkeley, and used the TCP/IP protocol stack developed there. TLI was created by AT&T according to the transport layer definition of the OSI model; initially it implemented neither TCP/IP nor other network protocols, though such implementations were provided by third parties. This, along with other (mostly market) considerations, caused the final split between the two branches of UNIX: BSD (Berkeley) and System V (the commercial version from AT&T). Many companies subsequently licensed System V from AT&T and developed their own commercial UNIX versions, such as AIX, HP-UX, IRIX, and Solaris.

Socket primitives

SOCKET create a new (empty) socket
BIND the server binds its local address (port) to the socket
LISTEN the server allocates memory for the client connection queue (TCP)
ACCEPT the server waits for a client to connect, or accepts the first connection in the queue (TCP). To block waiting for incoming connections, the server executes the ACCEPT primitive. Upon receiving a connection request, the OS transport module creates a new socket with the same properties as the original and returns a file handle for it. The server can then fork a process or thread to handle the connection on the new socket while continuing to wait for the next connection on the original one
CONNECT client requests connection (TCP)
SEND / SEND_TO send data (TCP/UDP)
RECEIVE / RECEIVE_FROM receive data (TCP/UDP)
DISCONNECT request disconnect (TCP)
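The primitives above map directly onto the Berkeley sockets API. A minimal sketch in Python (an echo server on the loopback interface; the port number is chosen by the OS, and each primitive is marked in a comment):

```python
import socket
import threading

def echo_server(ready):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
    srv.bind(("127.0.0.1", 0))                               # BIND (port 0: any free port)
    srv.listen(1)                                            # LISTEN
    ready["addr"] = srv.getsockname()
    ready["event"].set()
    conn, _ = srv.accept()                                   # ACCEPT returns a new socket
    conn.sendall(conn.recv(1024))                            # RECEIVE, then SEND the echo
    conn.close()                                             # DISCONNECT
    srv.close()

ready = {"event": threading.Event()}
threading.Thread(target=echo_server, args=(ready,), daemon=True).start()
ready["event"].wait()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)      # SOCKET
cli.connect(ready["addr"])                                   # CONNECT (runs the handshake)
cli.sendall(b"hello")                                        # SEND
reply = cli.recv(1024)                                       # RECEIVE
cli.close()                                                  # DISCONNECT
```

Note how accept() hands back a brand-new socket for the client while the original listening socket keeps waiting, exactly as described under ACCEPT above.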

Multiplexing and demultiplexing

Multiplexing is the collection of messages from the sockets of all applications and the addition of headers.

Demultiplexing is the distribution of incoming data across sockets.

For UDP, the target socket is determined by the destination port number alone; for TCP, it is determined by the destination port number together with the sender's IP address and port number.
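This rule can be sketched with two lookup tables (the addresses, ports, and labels below are made up for illustration):

```python
# Toy demultiplexing tables. A UDP socket is identified by the
# destination port alone; a TCP socket by the destination port plus
# the sender's IP address and port, so one listening port can serve
# many clients at once.
udp_table = {53: "dns socket"}

tcp_table = {
    (80, "10.0.0.1", 50001): "socket for client 1",
    (80, "10.0.0.2", 50001): "socket for client 2",
}

def demux_udp(dst_port):
    return udp_table[dst_port]

def demux_tcp(dst_port, src_ip, src_port):
    return tcp_table[(dst_port, src_ip, src_port)]
```

Both clients connect to port 80, yet their segments land in different sockets because the key includes the sender's address.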

Transport layer protocols

There are two protocols at the transport layer: TCP (reliable) and UDP (unreliable).

UDP protocol

UDP (User Datagram Protocol) performs a bare minimum of work, letting the application operate almost directly on top of the network layer. It is much faster than TCP because there is no connection to establish and no delivery confirmations to wait for. Segments may be lost. UDP does verify the integrity of the transmitted data (via a checksum).

UDP segment structure

The header is only 8 bytes: source port, destination port, length, and checksum, each 16 bits wide.
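The layout is simple enough to parse by hand. A sketch using Python's struct module (the port numbers and payload are made up; the checksum is left at zero for simplicity):

```python
import struct

def parse_udp_header(datagram):
    """The entire UDP header is four 16-bit big-endian fields:
    source port, destination port, length (header + data), checksum."""
    src, dst, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {"src_port": src, "dst_port": dst,
            "length": length, "checksum": checksum}

# A made-up datagram: 24 bytes of payload sent to port 53
payload = b"x" * 24
datagram = struct.pack("!HHHH", 50000, 53, 8 + len(payload), 0) + payload
header = parse_udp_header(datagram)
```

The length field covers the header plus the data, so a 24-byte payload yields a length of 32.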

Principles of reliable data transmission

Let's design the myTCP protocol, gradually complicating it.

  • myTCP 1.0: transmission over a completely reliable channel
    [Figure: myTCP 1.0 sender state diagram]
    [Figure: myTCP 1.0 receiver state diagram]
  • myTCP 2.0: transmission over a channel that allows bit corruption; no packet loss
    [Figure: myTCP 2.0 sender state diagram]
    [Figure: myTCP 2.0 receiver state diagram]

But receipts can be damaged in transit too. If a receipt arrives corrupted, the sender resends the packet, so the receiver must decide how to process duplicate packets (a new piece of state is needed: whether the previous packet was handed to the application or not).

In TCP/IP, the role of these "repeat"/"new" markers is played by packet sequence numbers (which are needed anyway, since packets can still be lost).

  • myTCP 2.1: transmission over a channel that allows bit corruption; no packet loss
    [Figure: myTCP 2.1 sender state diagram]
    [Figure: myTCP 2.1 receiver state diagram]

The main difference between the receiver's states is how duplicate packets are handled: in the "last packet was delivered to the application" state duplicates are discarded, while in the "last packet was not delivered to the application" state they are accepted and passed to the application.

Now it's time to remember that packets can get lost.

  • We need a way to determine that a packet has been lost, for example by tracking the time elapsed since it was sent.
  • Packets need to be numbered.
  • Receipts must carry the number of the packet they acknowledge.

Thus we arrive at the need for a timer: if a certain time passes without a confirmation, the message is resent. The timeout interval is kept short, since losses, although relatively rare, do occur (even over a good Wi-Fi connection).

Disadvantages of protocols that wait for confirmations

Let's look at an example. Suppose there is a 1 Gbit/s channel between Rostov and Moscow (roughly 1000 km). The time to transmit 1000 bytes (8000 bits):

8000 bits / 1 Gbit/s = 8 µs

Signal propagation time:

1000 km / 300,000 km/s ≈ 3333 µs

Total: the next 1000 bytes can be sent no earlier than about 6674 µs later (the transmission time plus the round trip of the packet and its receipt: 8 + 2 × 3333 µs).

Conclusion: the channel sits idle 99.9% of the time.
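The utilization figure can be checked with a short calculation (a sketch of the stop-and-wait arithmetic above, using the same illustrative numbers):

```python
def stop_and_wait_utilization(packet_bits, rate_bps, distance_km,
                              speed_km_s=300_000):
    """Fraction of time a stop-and-wait sender actually transmits:
    one transmission time out of every send-plus-round-trip cycle."""
    t_send = packet_bits / rate_bps        # transmission time, seconds
    t_prop = distance_km / speed_km_s      # one-way propagation, seconds
    cycle = t_send + 2 * t_prop            # send, then wait for the receipt
    return t_send / cycle

# 1000-byte packets on a 1 Gbit/s Rostov-Moscow link (~1000 km)
u = stop_and_wait_utilization(8000, 1e9, 1000)
```

The result is about 0.12%, matching the claim that the channel is idle 99.9% of the time.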

One solution is to increase the packet size. But if even 1 bit is corrupted, the whole packet is discarded. What then?

Sliding window protocols

The solution: allow the sender to transmit not one frame but several before stopping to wait for confirmations (receipts). This technique is called pipelining.

In the figure, green marks packets whose receipts have already arrived, yellow marks packets sent but not yet acknowledged, blue marks packets ready to send, and white marks packets that cannot be sent until receipts arrive for the yellow ones. The window consists of the yellow and blue packets, i.e. those that may be transmitted without waiting for receipts. The first white packet can be sent only once the first yellow one has been acknowledged; the window then slides one position to the right.

One might ask: why limit the window size at all, rather than transmit every packet and then wait for confirmations? Because doing so can easily congest the network.

There are two ways to solve the problem of errors when pipelining frames:

  • GBN (Go-Back-N);
  • SR (Selective Repeat).
GBN

The receiver sends only positive receipts, and only for packets satisfying the condition that all packets with lower numbers have already arrived. GBN thus uses cumulative acknowledgment: receiving a receipt numbered i tells the sender that all packets up to and including i were delivered successfully. If no receipt arrives within the timeout, the sender retransmits all N packets starting from the one after the last acknowledged packet.

GBN is inefficient when the window is large and packets take a long time to propagate over a lossy network. Example: we send 1000 packets and the second one does not arrive; we must retransmit everything starting from the second, clogging the network with useless traffic.
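The wasted retransmissions are easy to count in a toy model (a sketch that assumes exactly one packet is lost on its first transmission, and ignores receipt loss and timer details):

```python
def go_back_n_sends(total, lost_index, window):
    """Count transmissions in a toy Go-Back-N run where the packet at
    lost_index (0-based; -1 for no loss) is lost once, then delivered
    on retransmission. Cumulative ACKs stop at the lost packet, so the
    sender goes back and resends everything from it onward."""
    sends = 0
    base = 0
    while base < total:
        end = min(base + window, total)
        sends += end - base            # transmit the current window
        if base <= lost_index < end:
            base = lost_index          # go back to the lost packet...
            lost_index = -1            # ...which succeeds this time
        else:
            base = end
    return sends
```

With 10 packets, a window of 5, and packet 1 lost, GBN transmits 14 packets in total: 4 already-delivered packets are resent needlessly, which is exactly the inefficiency described above.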

SR

This approach sends a receipt for every packet. The receiver buffers all valid frames received after a corrupted or lost one; the corrupted frame itself is discarded. When the receipt timeout for a frame expires, the sender retransmits that frame alone, without repeating all the subsequent ones. If the retry succeeds, the receiver's accumulated packets can be passed up to the network layer, after which it acknowledges the frame with the highest number.

The selective method is often combined with the receiver sending a negative acknowledgment (NAK, Negative Acknowledgement) when it detects an error (for example, a bad checksum), which further improves efficiency.

With a large window, the SR approach may require a significant buffer size.

TCP protocol

TCP Segment Format

A TCP segment consists of several header fields and a data field. The data field carries a chunk of the data passed between processes; its size is limited by the MSS (maximum segment size). When TCP transfers a large file, it typically splits the data into MSS-sized chunks (except for the last chunk, which is usually smaller). Interactive applications, by contrast, often exchange data far smaller than the MSS: remote-access applications such as Telnet may hand the transport layer just 1 byte of data. Since the TCP segment header is typically 20 bytes (12 bytes more than UDP's), the total segment size in that case is 21 bytes.

As in the UDP protocol, the header includes the source and destination port numbers for data multiplexing and demultiplexing procedures, as well as a checksum field. In addition, the TCP segment includes some other fields.

  • 32-bit sequence number and acknowledgment number fields. Necessary for reliable data transfer.
  • The 4-bit header length field specifies the length of the TCP header in 32-bit words. The minimum size is 5 words and the maximum is 15, which is 20 and 60 bytes respectively. The TCP header can be of variable length due to the options field described below (typically the options field is empty; this means the header is 20 bytes long).
  • The flags field consists of 6 bits. The acknowledgment bit (ACK) indicates that the value contained in the receipt is correct. The RST, SYN and FIN bits are used to establish and terminate a connection. When the PSH bit is set, it instructs the receiver to push the data accumulated in the receive buffer to the user application. The URG bit indicates that the segment contains data designated as "urgent" by the top layer. The location of the last byte of the urgent data is indicated in the 16-bit urgent data pointer field. At the receiving end, TCP must notify the upper layer that there is urgent data in the segment and pass it a pointer to the end of that data. In practice, the PSH, URG flags and the urgent data pointer field are not used. We mentioned them only for completeness of description.
  • A 16-bit receive window is used for data flow control. It contains the number of bytes that the receiving side is capable of receiving.
  • The urgent pointer is a 16-bit positive offset from the sequence number of the segment, indicating the sequence number of the octet that ends the urgent data. The field is honored only in packets with the URG flag set.
  • The optional parameters field is used when the sending and receiving sides negotiate the maximum segment size, or to scale the window in high-speed networks; it also defines the timestamp option. More information can be found in RFC 854 and RFC 1323.
Sequence and acknowledgment numbers

The sequence number of a segment is the number of the first byte of that segment.

The acknowledgment number is the sequence number of the next expected byte.

The sequence number and acknowledgment number fields are the most important fields in the TCP segment header because they play a key role in the functioning of the reliable data transfer service. However, before considering the role of these fields in the reliable transmission mechanism, let's look at the values ​​that TCP places in these fields.

The TCP protocol treats data as an unstructured but ordered stream of bytes. This is reflected in the fact that TCP assigns sequence numbers not to segments but to each transmitted byte; the sequence number of a segment is therefore the sequence number of its first byte. Consider an example. Host A wants to send a data stream to host B over a TCP connection, and TCP on the sending side implicitly numbers every byte of the stream. Suppose the file is 500,000 bytes, the MSS is 1000 bytes, and the first byte of the stream has sequence number 0. TCP splits the stream into 500 segments: the first is assigned sequence number 0, the second 1000, the third 2000, and so on. Each number is placed in the sequence number field of its TCP segment.
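The numbering scheme from this example can be sketched directly:

```python
def segment_sequence_numbers(stream_bytes, mss, first_seq=0):
    """TCP numbers every byte of the stream; a segment's sequence
    number is simply the number of its first byte."""
    return [first_seq + off for off in range(0, stream_bytes, mss)]

# The example from the text: a 500,000-byte file, MSS of 1000 bytes
seqs = segment_sequence_numbers(500_000, 1000)
```

This produces 500 segments numbered 0, 1000, 2000, ... up to 499,000, matching the example above.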

Now let's look at acknowledgment numbers. Recall that TCP provides duplex transmission: over a single TCP connection, data can flow between hosts A and B in both directions simultaneously. Each segment originating from host B carries a sequence number for the data flowing from B to A. The acknowledgment number that host A places in its segment is the sequence number of the next byte that host A expects from host B. Consider an example. Suppose host A has received all bytes numbered 0 through 535 sent by host B and is forming a segment to send to host B. Host A expects the next bytes from host B to be numbered starting at 536, so it places 536 in the acknowledgment number field of its segment.

Let's consider another situation. Suppose host A receives two segments from host B, the first containing bytes 0 through 535 and the second containing bytes 900 through 1000. For some reason, bytes 536 through 899 have not reached host A. In this case host A, still waiting for the missing bytes, again places 536 in the acknowledgment number field of its segment. Because TCP acknowledges only the data up to the first missing byte, it is said to use cumulative acknowledgment.

This last example demonstrates an important aspect of how TCP works. The third segment (bytes 900-1000) reached host A before the second (bytes 536-899), that is, out of order. How does TCP respond to this? If a received segment carries a sequence number higher than expected, its data is buffered, but the acknowledged sequence number does not change. If a segment matching the expected sequence number arrives later, the order of the data is restored automatically from the sequence numbers in the segments. Thus TCP is an SR-style protocol that nevertheless uses cumulative acknowledgments, as GBN does. Nor is the SR behavior entirely pure: if the sending side receives several (three) duplicate acknowledgments for the same segment x, it infers that network congestion has occurred and that segments x+1, x+2, x+3, ... were not delivered either, and it resends the whole series starting from x, as GBN protocols do.

Problems with maximum segment size

TCP requires an explicit maximum segment size when the virtual connection traverses a network segment whose maximum transmission unit (MTU) is smaller than the standard Ethernet MTU of 1500 bytes. In tunneling protocols such as GRE and IPIP, as well as PPPoE, the tunnel MTU is smaller than the standard one, so maximum-size TCP segments produce packets longer than the MTU. Since fragmentation is prohibited in the vast majority of cases, such packets are discarded.

This problem manifests as "hanging" connections, which can stall at seemingly arbitrary moments: namely, whenever the sender uses segments larger than the permissible size. To solve the problem, routers apply firewall rules that rewrite the MSS option in all connection-initiating packets so that senders use segments of a valid size. MSS can also be controlled through operating system parameters.

Three-way handshake

To establish a connection, host 2 passively waits for an incoming connection by executing the ACCEPT primitive.

Host 1 executes the CONNECT primitive, specifying the IP address and port it wants to connect to, the maximum TCP segment size, and so on. The CONNECT primitive sends a TCP "connection request" segment with the SYN bit set (SYN=1) and the ACK bit cleared (ACK=0), then waits for a response. With this segment, host 1 announces x, the initial sequence number of the byte stream from host 1 to host 2.

Host 2 sends back a "connection accepted" confirmation (the accept function), a segment with SYN=1 and ACK=1: host 2 announces y, the initial sequence number of the byte stream from host 2 to host 1, and indicates that it expects the host 1 stream to continue from byte number x+1. The sequence of TCP segments exchanged in the normal case is shown in the figure.

Host 1 (connect) then sends a confirmation that it has received consent to establish the connection.

TCP Congestion Control

When any network receives more data than it can handle, the network becomes congested. The Internet in this sense is no exception. Although the network layer also tries to deal with congestion, the main contribution to solving this problem, which is to reduce the data transfer rate, is made by the TCP protocol.

Theoretically, congestion can be dealt with using a principle borrowed from physics - the law of conservation of packets. The idea is to not send new packets onto the network until the old ones have left (that is, been delivered). The TCP protocol attempts to achieve this goal by dynamically controlling the window size.

The first step in dealing with congestion is detecting it. A couple of decades ago this was difficult: it was hard to tell why a packet had not been delivered on time. Besides congestion, there was also a high probability of packet loss due to noise on the line.

Nowadays, packet loss during transmission is relatively rare, since most long-distance communication lines are fiber optic (although in wireless networks the percentage of packets lost due to interference is quite high). Accordingly, most lost packets on the Internet are caused by congestion. All Internet TCP algorithms assume that packet losses are caused by network congestion and watch for timeouts as a warning sign of problems.

Before discussing how TCP responds to congestion, let us describe how the protocol tries to avoid it. When congestion is detected, an appropriate window size must be chosen. The receiver can specify a window based on the free space in its buffer. If the sender honors this window, buffer overflow at the receiver ceases to be a problem, but congestion may still arise somewhere in the network between the sender and the receiver.


Let us illustrate the problem with a plumbing analogy. In figure (a) we see a thick pipe leading to a receiver with a small container: as long as the sender pours no more water than the bucket can hold, none is lost. In figure (b), the limiting factor is not the capacity of the bucket but the capacity of the network itself: if water flows from the tap into the funnel too quickly, the level in the funnel rises and, eventually, some water spills over the edge.

The Internet's solution is to recognize that there are two separate potential problems, limited network capacity and limited receiver capacity, and to handle each separately. Each sender therefore maintains two windows: the window granted by the receiver and the congestion window. Each expresses a number of bytes the sender is allowed to transmit, and the sender uses the minimum of the two. For example, if the receiver says "send 8 KB" but the sender knows that anything over 4 KB will congest the network, it sends only 4 KB. If the sender knows the network can carry more, say 32 KB, it transmits what the receiver asked for (that is, 8 KB).
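The rule is simply a minimum of the two windows:

```python
def effective_window(receiver_window, congestion_window):
    """The sender transmits no more than the smaller of the window
    granted by the receiver and its own congestion window."""
    return min(receiver_window, congestion_window)

KB = 1024
# Receiver offers 8 KB, but the network tolerates only 4 KB: send 4 KB
a = effective_window(8 * KB, 4 * KB)
# Network could carry 32 KB, receiver still offers 8 KB: send 8 KB
b = effective_window(8 * KB, 32 * KB)
```

This one-line rule is why flow control and congestion control can be designed independently yet act on the sender together.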

When a connection is established, the sender sets the congestion window to the size of the maximum segment used on the connection and transmits one maximum segment. If an acknowledgment of that segment arrives before the timeout expires, the sender adds one segment's worth of bytes to the window, doubling the congestion window, and sends two segments. Each acknowledged segment expands the congestion window by one maximum segment. With a window of n segments, if all n acknowledgments arrive on time, the window grows by the equivalent of n segments. In effect, each acknowledged burst of segments doubles the congestion window.

This exponential growth continues until the receiver's window size is reached or a timeout occurs, signaling network congestion. For example, if bursts of 1024, 2048, and 4096 bytes reach the receiver successfully but no acknowledgment arrives in time for an 8192-byte transmission, the congestion window is set to 4096 bytes. While the congestion window stays at 4096 bytes, no larger bursts are sent, regardless of the window the receiver provides. This algorithm is called slow start, though it is not slow at all (Jacobson, 1988): it is exponential. All TCP implementations are required to support it.

Let us now consider the congestion control mechanism used on the Internet. In addition to the receiver and congestion windows, it uses a third parameter, a threshold, initially set to 64 KB. When a timeout occurs (an acknowledgment does not come back in time), the new threshold is set to half the current congestion window, and the congestion window is reduced to one maximum segment. Slow start is then used, as before, to quickly probe the network's capacity. This time, however, the exponential growth of the window stops when it reaches the threshold, after which the window grows linearly, by one maximum segment per round of acknowledgments. In essence, the assumption is that it is safe to halve the congestion window and then grow it back gradually.
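The growth pattern just described is easy to model. The sketch below is a simplified illustration, not real TCP stack code: the window is counted in bytes, every burst is assumed to be acknowledged on time, and ssthresh stands in for the threshold parameter just described.

```python
def cwnd_growth(mss, ssthresh, rwnd, rounds):
    """Model congestion-window growth: exponential below ssthresh
    (slow start), linear above it (congestion avoidance), always
    capped by the receiver-advertised window rwnd."""
    cwnd = mss                      # start with one maximum segment
    history = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            cwnd *= 2               # each acknowledged burst doubles the window
        else:
            cwnd += mss             # past the threshold: one segment per round
        cwnd = min(cwnd, rwnd)      # never exceed the receiver's window
        history.append(cwnd)
    return history

# With a 1-KB segment, a 16-KB threshold and a 64-KB receiver window:
print(cwnd_growth(1024, 16384, 65536, 8))
```

The printed history shows the doubling phase (1024, 2048, ..., 16384) switching to linear growth (17408, 18432, ...) once the threshold is reached.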

Reliable transmission mechanisms. Generalization

Checksum - detects bit corruption in a received packet.
Timer - counts down the timeout interval and signals when it expires, which most likely means that a packet or its acknowledgment was lost in transit. If a packet is delayed but not lost (a premature timeout), or an acknowledgment is lost, retransmission produces a duplicate packet on the receiving side.
Sequence numbers - sequential numbering of the packets sent by the transmitting side. "Gaps" in the numbers of received packets indicate lost data; identical sequence numbers indicate duplicate packets.
Positive and negative acknowledgments - generated by the receiving side to tell the sender that a packet or group of packets was, or was not, received. Typically an acknowledgment carries the sequence numbers of successfully received packets. Depending on the protocol, acknowledgments may be individual or cumulative.
Window/pipelining - limits the range of sequence numbers that may be in flight at once. Pipelined transmission can greatly increase protocol throughput compared to waiting for each acknowledgment. The window size can be chosen based on the receiving side's reception and buffering capacity, as well as on the network load.
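The checksum entry above is, in TCP/IP, the 16-bit ones'-complement sum defined in RFC 1071. A minimal sketch of how such a checksum is computed and verified:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum used by IP, TCP and UDP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # 16-bit big-endian words
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return (~total) & 0xFFFF

# Verification: recomputing over data plus its own checksum yields 0.
msg = b"12345678"
c = internet_checksum(msg)
print(internet_checksum(msg + c.to_bytes(2, "big")))
```

The self-check property (a correct packet sums to zero) is exactly what lets the receiver detect bit corruption.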

Programming Features

  1. Over a TCP stream connection
    • situation a) is possible on a poor link, when the interval between arrivals of groups of network-layer datagrams is large:
      • computer1 calls the send function once;
      • computer2 does not receive all of the data in one recv call (several calls are needed).
    • situation b) is possible when the interval between send calls is small and the amounts of data are small:
      • computer1 calls the send function several times;
      • computer2 receives all of the data in a single recv call.
  2. Over UDP
    • situation a) is impossible:
      • computer1 calls the send function once; at the network layer the UDP datagram may be fragmented into several IP packets;
      • computer2 always receives the whole datagram with a single recv call, and only once all of its IP fragments have arrived.
    • situation b) is impossible:
      • each sendto call on computer1 produces a separate UDP datagram, matched by a separate recvfrom call on computer2.
  3. If the buffer passed to recv or recvfrom is smaller than the data that was sent, then with UDP the excess data is lost, while with TCP the remainder is kept for a subsequent recv call.
  4. A UDP server has one socket, while a TCP server has many (one per simultaneously connected client), each receiving its own data.
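Point 3 above can be demonstrated with a short loopback experiment. This Python sketch assumes POSIX/Linux behavior, where a too-small recvfrom buffer silently truncates a UDP datagram; other systems may report an error instead:

```python
import socket, time

def tcp_vs_udp_buffers():
    """With a too-small buffer, TCP keeps the remaining bytes for the
    next recv, while UDP discards them (loopback demo, Linux-style
    truncation semantics assumed)."""
    payload = b"0123456789"                       # 10 bytes

    # TCP: the leftover bytes survive for a later recv call.
    lst = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lst.bind(("127.0.0.1", 0)); lst.listen(1)
    cli = socket.create_connection(lst.getsockname())
    srv, _ = lst.accept()
    cli.sendall(payload)
    time.sleep(0.1)                               # let the data arrive
    first = srv.recv(4)                           # ask for only 4 bytes
    rest = b""
    while len(first) + len(rest) < len(payload):  # the rest is still buffered
        rest += srv.recv(64)
    for s in (cli, srv, lst): s.close()

    # UDP: the excess part of the datagram is lost.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.sendto(payload, rx.getsockname())
    time.sleep(0.1)
    truncated, _ = rx.recvfrom(4)                 # 4 bytes kept, 6 discarded
    tx.close(); rx.close()
    return first, rest, truncated
```

Running the function shows the TCP receiver eventually collecting all 10 bytes across several calls, while the UDP receiver gets only the first 4.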

Introduction

TCP is a connection-oriented protocol. Before either party can send data to the other, a connection must be established between them. In this chapter, we will look in detail at how a TCP connection is established and how it is terminated.

Because TCP requires a connection to be established between the two ends before it can operate, it differs from connectionless protocols such as UDP. As we saw earlier, with UDP each side simply sends datagrams to the other without first establishing a connection.

Establishing and terminating a connection

In order to see what happens when a TCP connection is established and terminated, we executed the following command on the svr4 system:

svr4% telnet bsdi discard
Trying 192.82.148.3 ...
Connected to bsdi.
Escape character is "^]".
^] type Control, right square bracket to talk to the Telnet client
telnet> quit and tell the Telnet client to close the connection
Connection closed.

The telnet command establishes a TCP connection to the bsdi host on the port corresponding to the discard service (Chapter 1, section). This is exactly the type of service we need to see what happens when a connection is established and broken, but without exchanging data.

tcpdump output

Figure 18.1 shows tcpdump output for the segments generated by this command.

1 0.0 svr4.1037 > bsdi.discard: S 1415531521:1415531521 (0)
win 4096
2 0.002402 (0.0024) bsdi.discard > svr4.1037: S 1823083521:1823083521 (0)
ack 1415531522 win 4096

3 0.007224 (0.0048) svr4.1037 > bsdi.discard: . ack 1823083522 win 4096
4 4.155441 (4.1482) svr4.1037 > bsdi.discard: F 1415531522:1415531522 (0)
ack 1823083522 win 4096
5 4.156747 (0.0013) bsdi.discard > svr4.1037: . ack 1415531523 win 4096
6 4.158144 (0.0014) bsdi.discard > svr4.1037: F 1823083522:1823083522 (0)
ack 1415531523 win 4096
7 4.180662 (0.0225) svr4.1037 > bsdi.discard: . ack 1823083523 win 4096

Figure 18.1 tcpdump output for establishing and tearing down a TCP connection.

These seven TCP segments contain only TCP headers. There was no data exchange.

For TCP segments, each output line begins with

source > destination: flags

where flags are four of the six flag bits of the TCP header (). Figure 18.2 shows five different symbols that correspond to flags and may appear in the output.

Flag   3-character abbreviation   Description

S      SYN                        synchronize sequence numbers
F      FIN                        the sender has finished sending data
R      RST                        reset the connection
P      PSH                        push data to the receiving process as quickly as possible
.      -                          none of the four flags is set

Figure 18.2 Flag characters output by tcpdump for flag bits in the TCP header.

In this example we see the S, F, and dot flags. Two more flags (R and P) will appear later. The other two flag bits in the TCP header, ACK and URG, are printed by tcpdump in a different way, as separate fields of the output.

More than one of the four flag bits shown in Figure 18.2 may be present in a single segment, however, usually only one flag is set.

RFC 1025 [Postel 1987] calls a segment with the maximum combination of flag bits set simultaneously (SYN, URG, PSH, FIN, plus one byte of data) a Kamikaze packet (several other English names exist for such a packet, among them "nastygram" and "Christmas tree packet").

In line 1, the field 1415531521:1415531521 (0) means that the sequence number of the packet is 1415531521 and the segment carries 0 bytes of data. tcpdump prints the starting sequence number, a colon, the implied ending sequence number, and the number of data bytes in parentheses. The implied ending sequence number differs from the starting one only when the number of bytes is greater than 0. tcpdump prints this field (1) if the segment contains one or more bytes of user data, or (2) if the SYN, FIN, or RST flag is set. In lines 1, 2, 4, and 6 of Figure 18.1 the field appears because of the flag bits; no data was exchanged in this example.

In line 2, the field ack 1415531522 is the acknowledgment number. It is printed only when the ACK flag is set. The field win 4096 on every line of output shows the window size announced by the sender. In this example, where no data was exchanged, the window size stayed at its default value of 4096. (We'll cover the TCP window size in a section of Chapter 20.)

The last field in the output of Figure 18.1 shows the maximum segment size (MSS), an option set by the sender: the sender does not want to receive TCP segments larger than this value. This is usually done to avoid fragmentation (Chapter 11, section). We'll cover the maximum segment size, and the format of the various TCP options, in later sections of this chapter.

Timing diagrams

Figure 18.3 shows the timing diagram corresponding to this packet exchange. (We described some basic characteristics of timing diagrams when we first looked at .) The figure shows which end sends each packet, together with the tcpdump output (which prints SYN instead of S). The window sizes have been omitted from this timing diagram, as they are not relevant to the discussion.

Connection Establishment Protocol

Now let's return to the details of the TCP protocol shown in Figure 18.3. To establish a TCP connection:

  1. The requesting end (usually called the client) sends a SYN segment specifying the port number of the server that the client wants to connect to, and the client's initial sequence number (ISN, 1415531521 in this example). This is segment 1.
  2. The server responds with its own SYN segment containing the server's initial sequence number (segment 2). The server also acknowledges the client's SYN by ACKing the client's ISN plus one. A SYN consumes one sequence number.
  3. The client must acknowledge the server's SYN by ACKing the server's ISN plus one (segment 3).

These three segments are enough to establish a connection. This is often called a three-way handshake.
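The sequence-number bookkeeping of these three segments can be modeled in a few lines. The ISNs below are the ones from Figure 18.1; the function is purely illustrative, not part of any sockets API:

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as (sender, flags, seq, ack).
    A SYN consumes one sequence number, so each end acknowledges
    the other's ISN plus one."""
    return [
        ("client", "SYN",     client_isn,     None),
        ("server", "SYN+ACK", server_isn,     client_isn + 1),
        ("client", "ACK",     client_isn + 1, server_isn + 1),
    ]

for segment in three_way_handshake(1415531521, 1823083521):
    print(segment)
```

The acknowledgment numbers printed here, 1415531522 and 1823083522, are exactly the ack values seen in lines 2 and 3 of Figure 18.1.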

Figure 18.3 Timing diagram of connection establishment and termination.

The end that sends the first SYN performs an active open. The other end, which receives this first SYN and sends the next one, performs a passive open. (Later in this chapter we will describe a simultaneous open, in which both ends are considered active when the connection is established.)

When each end sends its SYN to establish the connection, it chooses an initial sequence number (ISN) for that connection. The ISN must change over time, so that every connection has a different ISN. RFC 793 [Postel 1981c] specifies that the ISN is a 32-bit counter that is incremented by one every 4 microseconds. Thanks to sequence numbers, packets that linger in the network and are delivered late are not mistaken for part of an existing connection.

How is the sequence number selected? In 4.4BSD (and most Berkeley implementations), the initial sequence number is set to 1 when the system is initialized. This practice is discouraged by the Host Requirements RFC. This value then increases by 64000 every half second and returns to 0 every 9.5 hours. (This corresponds to a counter that increases by one every 8 microseconds, rather than every 4 microseconds.) Additionally, each time a connection is established, this variable is incremented by 64000.
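These numbers are easy to check. A small sketch of the Berkeley-style ISN clock just described (the variable names are ours, for illustration only):

```python
# Berkeley-style ISN clock: 64000 added every half second.
RATE = 64000 / 0.5                          # counter units per second

# One unit is added roughly every 8 microseconds (RFC 793 says 4 us):
microseconds_per_unit = 1_000_000 / RATE    # 7.8125, about 8

# The 32-bit counter wraps past 2**32 roughly every 9.5 hours:
hours_to_wrap = 2**32 / RATE / 3600         # about 9.3

# About 38 minutes after a reboot the counter is near the value
# 291008001 seen later in this chapter (Figure 18.6):
isn_after_38_min = int(38 * 60 * RATE)      # 291,840,000

print(microseconds_per_unit, hours_to_wrap, isn_after_38_min)
```

The per-connection increment of 64000 is ignored here; with only a few connections since boot it barely shifts the estimate.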

The 4.1 second gap between segments 3 and 4 corresponds to the time between establishing a connection and entering the telnet quit command to terminate the connection.

Disconnect Protocol

Establishing a connection takes 3 segments, but terminating one takes 4. This is because a TCP connection can be half-closed. Since a TCP connection is full duplex (data flows in each direction independently of the other direction), each direction must be shut down independently. The rule is that either end sends a FIN when it is done sending data. When a TCP receives a FIN, it notifies the application that the remote end has finished sending data in that direction. A FIN is usually sent when the application closes its end of the connection.

We can say that the side that closes the connection first (sends the first FIN) performs an active close, and the other side (which received that FIN) performs a passive close. Typically, one side performs an active close and the other does a passive close, however, in a section of this chapter, we will see that both sides can perform an active close.

Segment 4 in Figure 18.3 closes the connection and is sent when the Telnet client terminates, that is, when we type quit. This causes the client's TCP to send a FIN, closing the flow of data from client to server.

When the server receives the FIN, it sends back an ACK of the received sequence number plus one (segment 5). A FIN consumes one sequence number, just like a SYN. At this point the server's TCP also delivers an end-of-file indication to its application (the discard server). The server application then closes its end of the connection, causing its TCP to send a FIN (segment 6), which the client acknowledges with an ACK of the received sequence number plus one (segment 7).

Figure 18.4 shows a typical exchange of segments when closing a connection. Sequence numbers are omitted. In this figure, FINs are sent due to applications closing their connections, while the ACK for these FINs is generated automatically by the TCP software.

Connections are usually established by the client, that is, the first SYN travels from client to server. Either end, however, can actively close the connection (send the first FIN). Often it is still the client that decides when the connection should be closed, since client processes are largely driven by a user who types something like "quit" to finish. In Figure 18.4 we could swap the labels at the top, calling the left side the server and the right side the client, and everything would still work exactly as shown. (The first example in Chapter 14, for instance, showed the time server closing the connection.)

Figure 18.4 Normal segment exchange when closing a connection.

Normal tcpdump output

Because the task of sorting through a huge number of sequence numbers is quite complex, tcpdump's output contains the complete sequence numbers for SYN segments only, and all subsequent sequence numbers are shown as relative offsets from the original sequence numbers. (To get the output shown in Figure 18.1, we had to specify the -S option.) The normal tcpdump output corresponding to Figure 18.1 is shown in Figure 18.5.

1 0.0 svr4.1037 > bsdi.discard: S 1415531521:1415531521(0)
win 4096
2 0.002402 (0.0024) bsdi.discard > svr4.1037: S 1823083521:1823083521(0)
ack 1415531522
win 4096
3 0.007224 (0.0048) svr4.1037 > bsdi.discard: . ack 1 win 4096
4 4.155441 (4.1482) svr4.1037 > bsdi.discard: F 1:1 (0) ack 1 win 4096
5 4.156747 (0.0013) bsdi.discard > svr4.1037: . ack 2 win 4096
6 4.158144 (0.0014) bsdi.discard > svr4.1037: F 1:1 (0) ack 2 win 4096
7 4.180662 (0.0225) svr4.1037 > bsdi.discard: . ack 2 win 4096

Figure 18.5 Typical tcpdump command output corresponding to connection establishment and termination.

Unless we have a need to show full sequence numbers, we will use this form of output in all of the following examples.

Timeout when establishing a connection

There are several reasons why a connection cannot be established. For example, the host (server) is turned off. To simulate a similar situation, we executed the telnet command after disconnecting the Ethernet cable from the server. Figure 18.6 shows the output of the tcpdump command.

1 0.0 bsdi.1024 > svr4.discard: S 291008001:291008001(0)
win 4096
2 5.814797 (5.8148) bsdi.1024 > svr4.discard: S 291008001:291008001(0)
win 4096
3 29.815436 (24.0006) bsdi.1024 > svr4.discard: S 291008001:291008001(0)
win 4096

Figure 18.6 Output of the tcpdump command to establish a connection that has timed out.

Notice in this output how often the client's TCP sends a SYN trying to establish the connection: the second segment is sent 5.8 seconds after the first, and the third 24 seconds after the second.

It should be noted that this example was run approximately 38 minutes after the client was rebooted, which is why the initial sequence number is 291008001 (approximately 38 x 60 x 2 x 64000). At the beginning of the chapter we said that typical Berkeley systems initialize the sequence number to 1 and then increment it by 64000 every half second.

It should also be noted that this is the first TCP connection since the system was rebooted, since the client port number is 1024.

However, Figure 18.6 does not show how long the TCP client made retransmissions before abandoning its attempt. In order to view these temporary values, we must execute the telnet command as follows:

bsdi % date ; telnet svr4 discard ; date
Thu Sep 24 16:24:11 MST 1992
Trying 192.82.148.2...
telnet: Unable to connect to remote host: Connection timed out
Thu Sep 24 16:25:27 MST 1992

The time is 76 seconds. Most Berkeley systems set a limit of 75 seconds for establishing a new connection. In a section of Chapter 21 we will see that the third packet sent by the client would have timed out at approximately 16:25:29, that is, 48 seconds after it was sent, had the client not given up after 75 seconds.

The first timeout

In Figure 18.6, notice that the first timeout, 5.8 seconds, is close to, but not exactly, 6 seconds, while the second timeout is almost exactly 24 seconds. Ten more such tests were run; the first timeout ranged from 5.59 to 5.93 seconds, while the second was always exactly 24.00 seconds.

This is because BSD TCP implementations start a timer every 500 milliseconds. This 500 millisecond timer is used for various TCP timeouts, all of which will be described in the following chapters. When we enter the telnet command, the initial 6 second timer is set (12 clock ticks), however it can expire anywhere between 5.5 and 6 seconds. Figure 18.7 shows how this happens.

Figure 18.7 500-ms TCP timer.

Since the timer is set to 12 ticks, the first decrement can occur anywhere between 0 and 500 milliseconds after the timer is set. From then on, the timer is decremented approximately every 500 milliseconds, but that first interval can vary. (We say "approximately" because the moment TCP gets control every 500 milliseconds can be delayed while the kernel handles other interrupts.)

When this 6-second timer expires at the tick labeled 0 in Figure 18.7, the timer is reset to 24 seconds (48 ticks). This next timer is exactly 24 seconds, because it is set at a moment when the kernel's 500-millisecond TCP timer routine is invoked, not at an arbitrary moment chosen by the user.
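The half-second spread of the first timeout follows directly from this tick counting. A small illustrative sketch (our own function, not kernel code):

```python
def timeout_range(ticks, tick_ms=500):
    """A timer armed for `ticks` ticks expires after its counter has been
    decremented `ticks` times. The first decrement happens anywhere from
    0 to tick_ms after arming, so the real delay spans a full tick."""
    longest = ticks * tick_ms / 1000      # first decrement a full tick later
    shortest = longest - tick_ms / 1000   # first decrement immediately
    return shortest, longest

print(timeout_range(12))   # the initial 6-second connection timer
print(timeout_range(48))   # the 24-second retransmission timer
```

timeout_range(12) gives the (5.5, 6.0) second spread observed for the first timeout. The 24-second timer shows no such spread in practice because, as noted above, it is armed from inside the tick routine itself.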

Service type field

The trace in Figure 18.6 also reflects the type-of-service (TOS) field of the IP datagram. The Telnet client in BSD/386 sets this field to request minimum delay.

Maximum segment size

The maximum segment size (MSS) is the largest chunk of data that TCP will send to the other end. When a connection is established, each side can announce its MSS. The values we saw were 1024. The resulting IP datagram is normally 40 bytes larger: 20 bytes for the TCP header and 20 bytes for the IP header.

Some publications say that this option is negotiated "by agreement". In reality there is no negotiation: when a connection is established, each side simply announces the MSS it is prepared to receive. (The MSS option can only appear in a SYN segment.) If one side does not receive an MSS option from the other, a default of 536 bytes is assumed. (In this case, with a 20-byte IP header and a 20-byte TCP header, the IP datagram is 576 bytes.)

In general, the larger the MSS the better, as long as fragmentation does not occur. (This is not always true. Refer to and to see why.) A larger segment size lets more data be sent per segment, reducing the relative overhead of the IP and TCP headers. When TCP sends a SYN segment, either because a local application wants to open a connection or because a connection request arrived from a remote host, it can set the MSS to the MTU of the outgoing interface minus the fixed sizes of the TCP and IP headers. For Ethernet this gives an MSS of up to 1460 bytes; with IEEE 802.3 encapsulation (Chapter 2, section) the MSS can be up to 1452 bytes.

The value of 1024 seen in this chapter corresponds to connections involving BSD/386 and SVR4, because most BSD implementations require the MSS to be a multiple of 512. Other systems, such as SunOS 4.1.3, Solaris 2.2, and AIX 3.2.2, announce an MSS of 1460 when both ends are on the same Ethernet. Calculations in [Mogul 1993] show that an MSS of 1460 gives better Ethernet performance than an MSS of 1024.

If the destination IP address is "non-local", the MSS is usually set to the default of 536. Whether the final destination is local or non-local can be determined as follows. A destination whose IP address has the same network ID and the same subnet ID as the sender is local; a destination whose IP address has a completely different network ID is non-local; a destination with the same network ID but a different subnet ID can be either. Most implementations provide a configuration option that lets the system administrator specify which subnets are local and which are non-local. This option determines the maximum announced MSS (which can then be as large as the MTU of the outgoing interface); otherwise the default value of 536 is used.

MSS allows the host to set the size of datagrams that will be sent by the remote party. If you take into account the fact that the host also limits the size of the datagrams it sends, this avoids fragmentation when the host is connected to a network with a smaller MTU.
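The MTU/MSS arithmetic used throughout this section reduces to a few constants. A sketch, assuming 20-byte IP and TCP headers with no options:

```python
IP_HDR = 20
TCP_HDR = 20

def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one unfragmented datagram."""
    return mtu - IP_HDR - TCP_HDR

def datagram_size(mss):
    """Size of the IP datagram carrying a full-sized segment."""
    return mss + IP_HDR + TCP_HDR

print(mss_for_mtu(1500))   # Ethernet MTU: MSS of 1460
print(mss_for_mtu(296))    # the SLIP link of Figure 18.8: MSS of 256
print(datagram_size(536))  # the default MSS: a 576-byte datagram
```

The 1452-byte figure for IEEE 802.3 follows the same rule with 8 bytes of extra encapsulation overhead subtracted from the 1500-byte MTU.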

Imagine our host slip, which has a SLIP channel with an MTU of 296, connected to a bsdi router. Figure 18.8 shows these systems and the host sun.

Figure 18.8 TCP connection from sun to slip and MSS values.

We established a TCP connection from sun to slip and scanned the segments using tcpdump. Figure 18.9 shows only the connection establishment (window size declarations removed).

1 0.0 sun.1093 > slip.discard: S 517312000:517312000(0)

2 0.10 (0.00) slip.discard > sun.1093: S 509556225:509556225(0)
ack 517312001
3 0.10 (0.00) sun.1093 > slip.discard: . ack 1

Figure 18.9 Tcpdump output for establishing a connection from sun to slip.

The important thing to note here is that sun cannot send a segment carrying more than 256 bytes of data, since it received an MSS of 256 (line 2). Moreover, since slip knows that its outgoing interface's MTU is 296, even if sun announced an MSS of 1460, slip would never send more than 256 bytes of data, so as to avoid fragmentation. A system may always send less data than the MSS announced by the remote end.

Fragmentation can only be avoided in this way if the host is directly connected to a network with an MTU of less than 576. If both hosts are connected to Ethernet and both advertise an MSS of 536, but the intermediate network has an MTU of 296, fragmentation will occur. The only way to avoid this is to use the transport MTU discovery mechanism (Chapter 24, section).

Half-closed TCP

TCP provides the ability for one end of a connection to stop sending data while still receiving data from the other end. This is called a TCP half-close. As we mentioned earlier, few applications take advantage of this capability.

To use this feature, the programming interface must let the application say: "I have finished sending data, so send an end-of-file (FIN) to the remote end, but I still want to receive data from the remote end, until it sends me an end-of-file (FIN)."

The sockets API supports the half-close: instead of calling close, the application calls shutdown with a second argument of 1. Most applications, however, terminate both directions of the connection by calling close.
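A minimal loopback sketch of this interface in Python, with an invented sort_server helper standing in for a sorting service: the client sends all of its data, half-closes the connection with shutdown, and then keeps reading until the server's reply ends. Note that socket.SHUT_WR is the constant with value 1 mentioned above.

```python
import socket, threading

def sort_server(lst):
    """Read lines until EOF (the client's FIN), sort them, send them back."""
    conn, _ = lst.accept()
    data = b""
    while chunk := conn.recv(4096):       # b"" means the peer sent a FIN
        data += chunk
    result = b"\n".join(sorted(data.splitlines())) + b"\n"
    conn.sendall(result)
    conn.close()

def sort_lines_via_half_close(lines):
    lst = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lst.bind(("127.0.0.1", 0)); lst.listen(1)
    t = threading.Thread(target=sort_server, args=(lst,)); t.start()

    cli = socket.create_connection(lst.getsockname())
    cli.sendall(b"\n".join(lines) + b"\n")
    cli.shutdown(socket.SHUT_WR)          # half-close: send FIN, keep receiving
    reply = b""
    while chunk := cli.recv(4096):
        reply += chunk
    cli.close(); t.join(); lst.close()
    return reply.splitlines()

print(sort_lines_via_half_close([b"pear", b"apple", b"plum"]))
```

Without the shutdown call the server would never see end-of-file and both ends would deadlock, which is precisely the problem the half-close solves.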

Figure 18.10 shows a typical half-close scenario. The client is shown on the left initiating the half-close, although either end can do so. The first two segments are the same as before: a FIN from the initiator, followed by an ACK and a FIN from the receiver. The scenario differs from Figure 18.4, however, because the end that receives the half-close can still send data. We show only one data segment followed by an ACK, but any number of data segments could be sent. (We'll cover the exchange of data segments and acknowledgments in more detail in .) When the end that received the half-close has finished sending data, it closes its half of the connection, causing a FIN to be sent and an end-of-file indication to be delivered to the application that initiated the half-close. When this second FIN is acknowledged, the connection is completely closed.

Figure 18.10 TCP in half-closed mode.

What can the half-close be used for? One example is the Unix command rsh(1), which executes a command on another system. The command

sun % rsh bsdi sort < datafile

will run the sort command on the bsdi host, with the standard input of the rsh command read from the file named datafile. rsh creates a TCP connection between itself and the program to be executed on the remote host. rsh then works quite simply: it copies its standard input (datafile) into the connection, and copies data from the connection to its standard output (our terminal). Figure 18.11 shows how this happens. (Remember that a TCP connection is full duplex.)

Figure 18.11 The command: rsh bsdi sort < datafile.

On the remote host bsdi, the rshd server runs the sort program with its standard input and standard output both attached to the TCP connection. Chapter 14 describes in detail the structure of the Unix processes involved; here we are interested in how the TCP connection and the half-close are used.

The sort program cannot begin producing output until it has read all of its input. All the data arriving over the connection from the rsh client becomes the sort server's input, the file to be sorted. When the end of the input file (datafile) is reached, the rsh client performs a half-close of the TCP connection. The sort server then receives an end-of-file on its standard input (the TCP connection), sorts the file, and writes the result to its standard output (the TCP connection). The rsh client keeps reading its end of the TCP connection, copying the sorted file to its standard output.

Without the half-close, some additional technique would be needed for the client to tell the server that it has finished sending data while still being able to receive data from the server. Alternatively, two connections could be used, but the half-close is preferable.

TCP State Transition Diagram

We have described several rules for establishing and terminating a TCP connection. These rules are summarized in the state transition diagram shown in Figure 18.12.

Note that this is a fairly standard state transition diagram. The normal client transitions are marked with solid bold arrows, and the normal server transitions with dashed bold arrows.

The two transitions leading into the ESTABLISHED state correspond to opening a connection, and the two transitions leading out of it correspond to terminating one. The ESTABLISHED state is where data can be transferred between the two ends in both directions. Later chapters describe what happens in this state.

We have collected the four boxes in the lower left of the diagram within a dashed box labeled "active close". Two other boxes (CLOSE_WAIT and LAST_ACK) are collected in a dashed box labeled "passive close".

The names of the 11 states (CLOSED, LISTEN, SYN_SENT, and so on) in this figure are chosen to correspond to the states output by the netstat command. The names of netstat, in turn, are almost identical to the names described in RFC 793. The CLOSED state is not really a state, but it is the starting and ending point for the diagram.

Changing state from LISTEN to SYN_SENT is theoretically possible, but is not supported in Berkeley implementations.

A transition from SYN_RCVD back to LISTEN is valid only if SYN_RCVD was entered from LISTEN (the usual scenario), not from SYN_SENT (a simultaneous open). This means that if we perform a passive open (enter LISTEN), receive a SYN, send a SYN with an ACK (entering SYN_RCVD), and then receive a reset instead of an ACK, the endpoint returns to LISTEN and waits for another connection request.

Figure 18.12 TCP state change diagram.

Figure 18.13 shows a typical TCP connection establishment and termination. The different states that the client and server go through are also detailed.

Figure 18.13 TCP states corresponding to normal connection opening and closing.

In Figure 18.13, we assumed that the client on the left side is performing an active open, and the server on the right is performing a passive open. We also showed that the client is performing an active close (as we mentioned earlier, either party can perform an active close).

You should trace the state changes in Figure 18.13 through the state transition diagram of Figure 18.12 to understand why each change occurs.

The 2MSL wait state

The TIME_WAIT state is also sometimes called the 2MSL wait state. Every implementation chooses a value for the maximum segment lifetime (MSL), the longest time a segment can exist in the network before being discarded. We know this time is bounded, because TCP segments are carried in IP datagrams, and each IP datagram has a TTL field that limits its lifetime.

RFC 793 [Postel 1981c] specifies an MSL of 2 minutes. Implementations, however, commonly use 30 seconds, 1 minute, or 2 minutes.

Recall that in practice the lifetime of an IP datagram is limited by a hop count, not by a timer.

Given the MSL, the rule is as follows: when TCP performs an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the MSL. This allows TCP to resend the final ACK if it is lost (in which case the other end times out and retransmits its final FIN).

Another effect of the 2MSL wait is that while the TCP connection waits, the socket pair defining it (client IP address, client port number, server IP address, server port number) cannot be reused. The pair can be reused only after the 2MSL wait expires.

Unfortunately, most implementations (Berkeley-derived ones among them) impose a stricter constraint: by default, a local port number cannot be reused at all while it is the local port of a socket pair in the 2MSL wait. We will see examples of this below.

Some implementations and APIs provide ways around this restriction. With the sockets API, the SO_REUSEADDR socket option can be specified. It lets the caller bind a local port number that is in the 2MSL wait; we will see, however, that TCP's rules still prevent that port from being used for a connection whose socket pair is in the 2MSL wait.

Any delayed segment that arrives for a connection in the 2MSL wait is discarded. Since the socket pair defining the connection cannot be reused during this period, any packets from the old connection that were delayed in the network die off before a new connection with the same socket pair can be established. This ensures that late packets are not mistaken for part of the new connection. (A connection is defined by its socket pair; a new connection with the same pair is called a reincarnation of the old connection.)

As Figure 18.13 showed, it is usually the client that performs the active close and enters the TIME_WAIT state. The server usually performs the passive close and does not go through the TIME_WAIT state. The implication is that if we terminate a client and immediately restart it, the new client cannot reuse the same local port number. This is not a problem, since clients normally use dynamically assigned ports and do not care which dynamically assigned port is in use.

However, things are different for a server, since servers use well-known ports. If we shut down a server that has an established connection and immediately try to restart it, the server cannot assign its well-known port as the endpoint of the connection, because that port number is part of a connection that is in the 2MSL wait state. It may therefore take from 1 to 4 minutes before the server can be restarted.

You can observe a similar scenario using the sock program. We started the server, connected the client to it, and then turned off the server:

sun % sock -v -s 6666
(start the client on bsdi, which will connect to this port)
connection on 140.252.13.33.6666 from 140.252.13.35.1081
^? enter the interrupt character to shut down the server
sun % sock -s 6666 and try to immediately restart the server on the same port
can't bind local address: Address already in use
sun % netstat let's try to check the connection status
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 sun.6666 bsdi.1081 TIME_WAIT
many lines deleted

When we try to restart the server, the program prints an error message indicating that it cannot bind its well-known port, because that port is already in use (in the 2MSL wait state).

We then immediately run netstat to look at the state of the connection and verify that it is indeed in the TIME_WAIT state.

If we keep trying to restart the server and note when it finally succeeds, we can measure the value of 2MSL. In SunOS 4.1.3, SVR4, BSD/386 and AIX 3.2.2 the server can be restarted after 1 minute, implying an MSL of 30 seconds. In Solaris 2.2 the restart takes 4 minutes, implying an MSL of 2 minutes.

We can see the same error generated by the client if the client tries to take over a port that is part of a connection in the 2MSL wait state (something a client normally does not do):

sun % sock -v bsdi echo we start the client that connects to the echo server
connected on 140.252.13.33.1162 to 140.252.13.35.7
hello there print this line
hello there it echoes from the server
^D enter the end-of-file character to turn off the client
sun % sock -b1162 bsdi echo
can't bind local address: Address already in use

When the client was first launched, the -v option was specified, which allows you to see what local port number is being used (1162). The second time the client was run, the -b option was specified, which tells the client to assign itself a local port number of 1162. As we expected, the client cannot do this, since this port number is part of a connection that is in the 2MSL state.

There is one feature of the 2MSL wait state that needs to be mentioned here, which we will return to when we talk about the File Transfer Protocol (FTP). As mentioned earlier, a pair of sockets (consisting of a local IP address, a local port, a remote IP address, and a remote port) remains in the 2MSL wait state. However, while many implementations allow a process to reuse a port number that is part of a connection that is in 2MSL mode (usually using the SO_REUSEADDR option), TCP may not allow a new connection to be created to the same socket pair. This can be proven using the following experiment:

sun % sock -v -s 6666 start of the server listening on port 6666
(we launch the client on bsdi, which connects to this port)
connection on 140.252.13.33.6666 from 140.252.13.35.1098
^? enter the interrupt character to shut down the server
sun % sock -b6666 bsdi 1098 we start the client with local port 6666
can't bind local address: Address already in use
sun % sock -A -b6666 bsdi 1098 try again, this time with the -A option
active open error: Address already in use

The first time, we ran our sock program as a server on port 6666 and connected a client to it from the bsdi host. The client's dynamically assigned port number was 1098. We shut down the server, so it performed the active close. As a result, the socket pair 140.252.13.33 (local IP address), 6666 (local port), 140.252.13.35 (remote IP address), 1098 (remote port) on the server entered the 2MSL wait state.

The second time we ran this program as a client, specifying local port number 6666, it attempted to connect to the bsdi host on port 1098. When we tried to reuse local port 6666, an error was generated because this port is in the 2MSL state.

To avoid this error, we ran the program again with the -A option, which enables the SO_REUSEADDR option. This allowed the program to assign itself a port number of 6666, but it received an error when the program attempted to do an active open. Even if the program can assign itself port number 6666, it will not be able to create connections to port 1098 on the bsdi host because the socket pair defining this connection is in the 2MSL wait state.

What if we try to establish a connection from another host? First, we must restart the server on sun with the -A flag, since the port it needs (6666) is part of a connection that is in the 2MSL wait state:

sun % sock -A -s 6666 start the server listening on port 6666

Then, before the 2MSL wait state ends on sun, we start the client on bsdi:

bsdi % sock -b1098 sun 6666
connected on 140.252.13.35.1098 to 140.252.13.33.6666

It works! This is a violation of the TCP specification, but it is supported by most Berkeley-derived implementations. These implementations accept a new connection request for a connection that is in the TIME_WAIT state if the new initial sequence number is greater than the last sequence number used on the previous connection. In this case the ISN for the new connection is set to the last sequence number of the previous connection plus 128,000. The appendix of RFC 1185 describes the possible pitfalls of this technique.

This implementation feature lets a client and server reuse the same port numbers to re-establish the same connection, but only if the server did not perform the active close. We will see another example of the 2MSL wait state when we discuss FTP.

Quiet time concept

The 2MSL wait state provides protection against late packets belonging to earlier connections, so that they will not be interpreted as part of a new connection that uses the same local and remote IP addresses and port numbers. However, this only works if the host with the connection in 2MSL state has not failed.

What if a host with ports in the 2MSL state crashed, rebooted during MSL, and immediately established new connections using the same local and remote IP addresses and port numbers corresponding to the local ports that were in the 2MSL state before the failure? In this case, late segments from a connection that existed before the failure may be misinterpreted as belonging to a new connection created after the reboot. This may occur regardless of what initial sequence number is selected after a reboot.

To protect against such unwanted scenarios, RFC 793 specifies that TCP should not create new connections until the MSL expires after the boot time. This is called quiet time.

In some implementations, hosts wait even longer than the MSL time after a reboot.

The FIN_WAIT_2 state

In the FIN_WAIT_2 state we have sent our FIN and the remote end has acknowledged it. Unless we have performed a half-close, we are waiting for the application on the remote end to recognize the end-of-file, close its side of the connection, and send us its FIN. Only when the remote process performs this close does our end move from the FIN_WAIT_2 state to the TIME_WAIT state.

This means that our side of the connection can remain in this state forever. The remote side is still in the CLOSE_WAIT state and can remain there forever, until its application decides to close.

Most Berkeley-derived implementations prevent this infinite wait in the FIN_WAIT_2 state as follows: if the application that performed the active close did a full close rather than a half-close (a half-close would indicate that it expects to receive data), a timer is set. If the connection is idle for 10 minutes plus 75 seconds, TCP moves the connection to the CLOSED state. A comment in the code acknowledges that this behavior violates the protocol specification.
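The full-close versus half-close distinction can be sketched with the sockets API (Python here; the helper names are ours): shutdown(SHUT_WR) sends our FIN but keeps the read side open, which is exactly what leaves the remote end in CLOSE_WAIT while we sit in FIN_WAIT_2.

```python
import socket
import threading

def echo_once(listener: socket.socket) -> None:
    """Accept one connection, read until EOF (the client's FIN),
    then send a reply and close: the passive-close side."""
    conn, _ = listener.accept()
    data = b""
    while chunk := conn.recv(4096):
        data += chunk
    conn.sendall(data.upper())   # we can still send after receiving the FIN
    conn.close()

def half_close_demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    t = threading.Thread(target=echo_once, args=(listener,))
    t.start()
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(listener.getsockname())
    c.sendall(b"hello")
    c.shutdown(socket.SHUT_WR)   # half-close: our FIN goes out, read side stays open
    reply = c.recv(4096)         # we can still receive the server's response
    c.close()
    t.join()
    listener.close()
    return reply
```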

Reset segments

We mentioned earlier that there is a bit in the TCP header called RST, for "reset." In general, a reset is sent by TCP whenever arriving segments do not appear correct for the referenced connection. (We use the term "referenced connection" to mean the connection identified by the destination IP address and port number together with the source IP address and port number; RFC 793 calls this a socket.)

Connection request on a non-existent port

The most common case in which a reset is generated is when a connection request arrives and no process is listening on the destination port. For UDP, as we saw in Chapter 6, an ICMP port unreachable error is generated when a datagram arrives for an unused destination port. TCP uses a reset instead.
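Any sockets API shows the same symptom. In this Python sketch (ours), a connect() to an unused port fails with "connection refused" (ECONNREFUSED), which is how the arriving RST is reported to the application:

```python
import socket

def try_connect(host: str, port: int) -> str:
    """Attempt a TCP connection: a listening process yields 'connected';
    an unused port yields 'refused' (the peer's TCP answered our SYN
    with an RST)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(2.0)
    try:
        s.connect((host, port))
        return "connected"
    except ConnectionRefusedError:
        return "refused"
    finally:
        s.close()
```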

We'll give a simple example using a Telnet client, specifying a port number that is not used at the destination:

bsdi % telnet svr4 20000 port 20000 is not used
Trying 140.252.13.34...
telnet: Unable to connect to remote host: Connection refused

An error message is reported to the Telnet client immediately. Figure 18.14 shows the packet exchange corresponding to this command.

1 0.0 bsdi.1087 > svr4.20000: S 297416193:297416193(0)
win 4096
2 0.003771 (0.0038) svr4.20000 > bsdi.1087: R 0:0 (0) ack 297416194 win 0

Figure 18.14 Generating a reset when trying to open a connection to a non-existent port.

The values we need to examine in this figure are the sequence number and acknowledgment number fields of the reset. Because the ACK bit was not set in the arriving segment, the sequence number of the reset is set to 0 and its acknowledgment number is set to the incoming initial sequence number (ISN) plus the number of data bytes in the segment. Although no real data is present in the arriving segment, the SYN bit logically occupies 1 byte of sequence number space; thus, in this example, the acknowledgment number in the reset is set to the ISN plus the data length (0) plus one for the SYN bit.

Aborting a connection

Earlier in this chapter we saw that the usual way to terminate a connection is for one side to send a FIN. This is sometimes called an orderly release, because the FIN is sent only after all previously queued data has been sent, and normally no data is lost. But it is also possible to terminate a connection by sending a reset instead of a FIN. This is sometimes called an abortive release.

Aborting a connection this way gives the application two features: (1) any queued data is discarded and the reset is sent immediately, and (2) the receiver of the RST can tell that the remote end aborted the connection rather than closing it normally. The application programming interface (API) used by the application must provide a way to generate such a reset instead of a normal close.
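With the sockets API this is the SO_LINGER option, which is exactly what the sock program's -L0 flag uses below. A minimal Python sketch (ours) packs l_onoff=1, l_linger=0 so that close() aborts the connection with an RST:

```python
import socket
import struct

def abortive_close(sock: socket.socket) -> None:
    """Request an abortive release: with l_onoff=1 and l_linger=0,
    close() discards any queued data and sends an RST instead of a FIN."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()
```

On the receiving side the RST surfaces as ECONNRESET ("Connection reset by peer"), just as the server output below shows.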

We can see what happens when a break like this occurs using our sock program. The sockets API provides this capability using the linger on close socket option (SO_LINGER). We specified the -L option with a delay time of 0. This means that instead of the normal FIN, a reset will be sent to close the connection. We will connect to the server version of the sock program on svr4:

bsdi % sock -L0 svr4 8888 this is the client; the server is shown next
hello, world enter one line that will be sent to the remote end
^D enter the end-of-file character to turn off the client

Figure 18.15 shows the output of the tcpdump command for this example. (We have removed all window declarations in this figure since they do not affect our reasoning.)

1 0.0 bsdi.1099 > svr4.8888: S 671112193:671112193(0)

2 0.004975 (0.0050) svr4.8888 > bsdi.1099: S 3224959489:3224959489(0)
ack 671112194
3 0.006656 (0.0017) bsdi.1099 > svr4.8888: . ack 1
4 4.833073 (4.8264) bsdi.1099 > svr4.8888: P 1:14 (13) ack 1
5 5.026224 (0.1932) svr4.8888 > bsdi.1099: . ack 14
6 9.527634 (4.5014) bsdi.1099 > svr4.8888: R 14:14 (0) ack 1

Figure 18.15 Terminating a connection using reset (RST) instead of FIN.

Lines 1-3 show normal connection setup. Line 4 sends the string of data we printed (12 characters plus the Unix newline), and line 5 receives an acknowledgment that the data was received.

Line 6 corresponds to the end-of-file character (Control-D) that we typed to terminate the client. Because we specified an abort instead of a normal close (the -L0 command-line option), TCP on bsdi sends an RST instead of the usual FIN. The RST segment carries a sequence number and an acknowledgment number. Also note that the RST does not elicit any response from the remote end: a reset is never acknowledged. The receiver of the reset aborts the connection and informs the application that the connection was reset.

We will receive the following error from the server with such an exchange:

svr4% sock -s 8888 run as a server, listen to port 8888
hello, world this is what the client sent
read error: Connection reset by peer

This server reads from the network and copies whatever it receives to standard output. Normally it terminates after receiving the end-of-file from its TCP, but here we see that it gets an error when the RST arrives. The error is what we expected: the connection was reset by the peer.

Detecting a half-open connection

A TCP connection is said to be half-open if one side has closed or aborted the connection without the knowledge of the other side. This can happen any time one of the two hosts crashes. As long as no attempt is made to transfer data across a half-open connection, the side that is still up does not detect that the other side has failed.

Another common cause of a half-open connection is a client host that is simply powered off rather than shutting down the client application first. This happens, for example, when a Telnet client runs on a PC and users switch the PC off at the end of the working day. If no data transfer was in progress when the PC was turned off, the server will never learn that the client has disappeared. When the user comes in the next morning, powers up the PC and starts a new Telnet client, a new server process is started on the server host. Because of this, many half-open TCP connections can accumulate on the server host. (We will see later a way for one end of a TCP connection to discover that the other end has disappeared: the TCP keepalive option.)
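Enabling keepalive through the sockets API is a one-line option. In this Python sketch (ours) the three fine-tuning options are Linux-specific, so they are guarded accordingly:

```python
import socket

def enable_keepalive(sock: socket.socket,
                     idle: int = 60, interval: int = 10, probes: int = 5) -> None:
    """Turn on TCP keepalive so a dead peer is eventually detected.
    The idle/interval/probes knobs (TCP_KEEPIDLE etc.) are Linux-specific."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```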

We can easily create a half-open connection. We launch the Telnet client on bsdi and connect to the discard server on svr4. We enter one line and use tcpdump to see how it goes, and then disconnect the Ethernet cable from the server host and restart it. By doing this, we simulated the failure of the server host. (We disconnected the Ethernet cable before rebooting the server to prevent it from sending FIN on an open connection, which some TCP modules do when shutting down.) After the server rebooted, we reconnected the cable and tried sending another line from the client to the server. Because the server has been rebooted and has lost all connection data that existed before the reboot, it knows nothing about the connections and is unaware of which connection the arriving segments belong to. In this case, the receiving TCP side responds with a reset.

bsdi % telnet svr4 discard client launch
Trying 140.252.13.34...
Connected to svr4.
Escape character is "^]".
hi there this line is sent normally
at this point we rebooted the server host
another line a reset was performed at this location
Connection closed by foreign host.

Figure 18.16 shows the tcpdump output for this example. (We have removed window declarations, service type information, and MSS declarations from the output since they do not affect our reasoning.)

1 0.0 bsdi.1102 > svr4.discard: S 1591752193:1591752193(0)
2 0.004811 (0.0048) svr4.discard > bsdi.1102: S 26368001:26368001(0)
ack 1591752194
3 0.006516 (0.0017) bsdi.1102 > svr4.discard: . ack 1

4 5.167679 (5.1612) bsdi.1102 > svr4.discard: P 1:11 (10) ack 1
5 5.201662 (0.0340) svr4.discard > bsdi.1102: . ack 11

6 194.909929 (189.7083) bsdi.1102 > svr4.discard: P 11:25 (14) ack 1
7 194.914957 (0.0050) arp who-has bsdi tell svr4
8 194.915678 (0.0007) arp reply bsdi is-at 0:0:c0:6f:2d:40
9 194.918225 (0.0025) svr4.discard > bsdi.1102: R 26368002:26368002(0)

Figure 18.16 Reset in response to the arrival of a data segment with a half-open connection.

Lines 1-3 perform the normal connection establishment. In line 4 the string "hi there" is sent to the discard server, and in line 5 its acknowledgment is received.

At this point we disconnected the Ethernet cable from the svr4, rebooted it and reconnected the cable. The entire procedure took approximately 190 seconds. We then printed the next line of input on the client ("another line"), and when we pressed Return, the line was sent to the server (line 6 in Figure 18.16). At the same time, a response was received from the server, however, since the server was rebooted, its ARP cache is empty, so in lines 7 and 8 we see the ARP request and response. Then a reset was sent on line 9. The client received a reset and reported that the connection was terminated by the remote host. (The last output message from the Telnet client is not as informative as it could be.)

Simultaneous open

It is possible for two applications to actively open at the same time. A SYN must be sent from each side, and these SYNs must travel through the network towards each other. It also requires each side to have a port number that is known to the other side. This is called simultaneous open.

For example, an application on host A with local port 7777 is actively opening to port 8888 of host B. An application on host B with local port 8888 is actively opening to port 7777 of host A.

This is not the same as connecting a Telnet client from Host A to a Telnet server on Host B, while a Telnet client from Host B connects to a Telnet server on Host A. In this scenario, both Telnet servers perform a passive open, rather than an active one, whereas Telnet clients assign themselves dynamically assigned port numbers, rather than ports that are known in advance to remote Telnet servers.

TCP is specifically designed to handle simultaneous openings, resulting in one connection rather than two. (In other protocol families, such as the OSI transport layer, this creates two connections rather than one.)

When a simultaneous open occurs, the state transitions differ from those shown in Figure 18.13. Both ends send a SYN at about the same time, entering the SYN_SENT state. When each end receives the peer's SYN, its state changes to SYN_RCVD (see Figure 18.12), and each end resends its SYN together with an acknowledgment of the SYN it received. When each end receives the SYN plus the ACK, its state changes to ESTABLISHED. These state changes are shown in Figure 18.17.

Figure 18.17 Exchange of segments during simultaneous opening.

A simultaneous open requires the exchange of four segments, one more than the three-way handshake. Also note that we do not call one end the client and the other the server, because here both ends act as client and server at once.

A simultaneous open is possible, but hard to arrange. Both sides must start at roughly the same time, so that the SYNs cross in the network. A long round-trip time between the two ends helps, since it widens the window in which the SYNs can cross. To achieve this, we use the host bsdi as one end of the connection and vangogh.cs.berkeley.edu as the other. Because there is a dial-up SLIP link between them, the round-trip time is large enough (several hundred milliseconds) to let the SYNs cross.

One end (bsdi) assigns itself local port 8888 (command line option -b) and actively opens to port 7777 of the other host:

bsdi % sock -v -b8888 vangogh.cs.berkeley.edu 7777
connected on 140.252.13.35.8888 to 128.32.130.2.7777
TCP_MAXSEG = 512
hello, world enter this line
and hi there this line was printed at the other end
connection closed by peer this is the output when FIN was received

The other end was started at about the same time, it assigned itself a local port number of 7777 and carried out an active open to port 8888:

vangogh % sock -v -b7777 bsdi.tuc.noao.edu 8888
connected on 128.32.130.2.7777 to 140.252.13.35.8888
TCP_MAXSEG = 512
hello, world this is entered at the other end
and hi there we printed this line
^D and then entered the end-of-file character EOF

We specified the -v flag on the sock command line to check the IP addresses and port numbers for each end of the connections. This flag also prints the MSS used at each end of the connection. We also printed one line as input on each end, which was sent to the remote end and printed there to make sure both hosts "see" each other.

Figure 18.18 shows the segment exchange for this connection. (We have removed some new TCP options that appeared in the original SYNs that came from vangogh, which is running 4.4BSD. We will describe these new options in a section of this chapter.) Note that the two SYNs (lines 1 and 2) are followed by two SYN with ACK (lines 3 and 4). In this case, a simultaneous opening occurs.

Line 5 shows the entered string "hello, world" which goes from bsdi to vangogh with confirmation on line 6. Lines 7 and 8 correspond to the string "and hi there" which goes in the other direction. Lines 9-12 show a normal connection closure.

Many Berkeley-derived implementations do not support a simultaneous open correctly. On these systems, if you manage to make the SYNs cross, you can end up with segments, each carrying a SYN and an ACK, being exchanged in both directions, because these implementations do not always make the transition from the SYN_SENT state to the SYN_RCVD state shown in Figure 18.12.

1 0.0 bsdi.8888 > vangogh.7777: S 91904001:91904001(0)
win 4096
2 0.213782 (0.2138) vangogh.7777 > bsdi.8888: S 1058199041:1058199041(0)
win 8192
3 0.215399 (0.0016) bsdi.8888 > vangogh.7777: S 91904001:91904001(0)
ack 1058199042 win 4096

4 0.340405 (0.1250) vangogh.7777 > bsdi.8888: S 1058199041:1058199041(0)
ack 91904002 win 8192

5 5.633142 (5.2927) bsdi.8888 > vangogh.7777: P 1:14 (13) ack 1 win 4096
6 6.100366 (0.4672) vangogh.7777 > bsdi.8888: . ack 14 win 8192

7 9.640214 (3.5398) vangogh.7777 > bsdi.8888: P 1:14 (13) ack 14 win 8192
8 9.796417 (0.1562) bsdi.8888 > vangogh.7777: . ack 14 win 4096

9 13.060395 (3.2640) vangogh.7777 > bsdi.8888: F 14:14 (0) ack 14 win 8192
10 13.061828 (0.0014) bsdi.8888 > vangogh.7777: . ack 15 win 4096
11 13.079769 (0.0179) bsdi.8888 > vangogh.7777: F 14:14 (0) ack 15 win 4096
12 13.299940 (0.2202) vangogh.7777 > bsdi.8888: . ack 15 win 8192

Figure 18.18 Segment exchange during simultaneous open.

Simultaneous close

As we said earlier, on one side (often, but not always, the client side) an active close is performed and the first FIN is sent. It is also possible for both parties to perform an active close, since TCP allows simultaneous closes.

In terms of Figure 18.12, both ends move from the ESTABLISHED state to the FIN_WAIT_1 state when their applications close. Both send a FIN, and the FINs may cross somewhere in the network. On receiving the peer's FIN, each end moves from FIN_WAIT_1 to the CLOSING state and sends its final ACK. On receiving that final ACK, each end moves to the TIME_WAIT state. Figure 18.19 shows these state changes.

Figure 18.19 Exchange of segments during simultaneous closure.

With simultaneous closure, the same number of packets are exchanged as with normal closure.

TCP options

The TCP header may contain options. The only options defined in the original TCP specification are: end of option list, no operation (NOP), and maximum segment size (MSS). We have seen the MSS option in almost every SYN segment in our examples.

Newer RFCs, such as RFC 1323, define additional TCP options, most of which are found only in recent implementations. (We describe the new options later.) Figure 18.20 shows the format of the current TCP options: those described in RFC 793 and RFC 1323.

Figure 18.20 TCP options.

Each option begins with a 1-byte kind that identifies the option. Options with a kind of 0 or 1 occupy a single byte. The other options have a len byte that follows the kind byte; this length is the total length of the option, including the kind and len bytes.

The no operation (NOP) option was added to let the sender pad option fields to a multiple of 4 bytes. If we establish a TCP connection from a 4.4BSD system, the options in the initial SYN segment can be observed with tcpdump.

The MSS option is set to 512, followed by NOP, followed by the window size option. The first NOP option is used to pad the 3-byte window size option to 4 bytes. Similarly, a 10-byte timestamp option is preceded by two NOPs to occupy 12 bytes.
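The kind/len layout makes the options easy to walk. Here is a Python sketch (ours) that parses an options field like the one just described (MSS 512, NOP plus window scale, two NOPs plus timestamp; the exact byte values are illustrative):

```python
def parse_tcp_options(data: bytes) -> list:
    """Walk the TCP options field: kind 0 ends the list, kind 1 (NOP) is a
    single pad byte, every other option carries a total-length byte."""
    options, i = [], 0
    while i < len(data):
        kind = data[i]
        if kind == 0:            # end of option list
            break
        if kind == 1:            # NOP: one byte of padding
            i += 1
            continue
        length = data[i + 1]     # total length, including kind and len bytes
        options.append((kind, data[i + 2:i + length]))
        i += length
    return options

# Illustrative option bytes of a SYN such as the one described above:
raw = bytes([2, 4, 0x02, 0x00,            # MSS = 512
             1, 3, 3, 0,                  # NOP + window scale, shift 0
             1, 1, 8, 10,                 # NOP, NOP, timestamp (kind 8, len 10)
             0, 0, 0, 1, 0, 0, 0, 0])     # timestamp value and echo reply
```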

The other four options, with kinds of 4, 5, 6, and 7, are called the selective ACK and echo options. We do not show them in Figure 18.20 because the echo options have been replaced by the timestamp option, and selective ACKs, as currently defined, are still under discussion and were not included in RFC 1323. Note also that the T/TCP proposal for TCP transactions (Chapter 24) specifies three more options, with kinds of 11, 12, and 13.

Implementation of a TCP server

In Chapter 1 we said that most TCP servers are concurrent. When a new connection request arrives at a server, the server accepts the connection and spawns a new process to serve the new client. Different operating systems use different methods to create the new server process. On Unix systems, a new process is created with the fork function.

We need to look at how TCP interacts with concurrent servers. Specifically: how are port numbers handled when a server accepts a new connection request from a client, and what happens if multiple connection requests arrive at about the same time?

TCP Server Port Numbers

We can see how TCP handles port numbers by watching any TCP server. We will watch the Telnet server with the netstat command. The following output is from a system with no active Telnet connections. (We have deleted all lines except the one showing the Telnet server.)

sun % netstat -a -n -f inet
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 *.23 *.* LISTEN

The -a flag reports on all network endpoints, not just those in the ESTABLISHED state. The -n flag prints IP addresses in numeric decimal notation instead of using DNS to convert addresses into names, and prints numeric port numbers (such as 23) instead of printing service names (in this case Telnet). The -f inet option reports only TCP and UDP endpoints.

The local address is shown as *.23, where the asterisk is usually called a wildcard. It means that an incoming connection request (SYN) will be accepted on any local interface. If the host were multihomed, we could specify one particular IP address as the local IP address (one of the host's IP addresses), and only connection requests arriving on that interface would be accepted. (We will see an example of this later in this section.) The local port is 23, the well-known port for Telnet.

The remote address is shown as *.*, which means that the remote IP address and remote port number are not yet known because the endpoint is in the LISTEN state, waiting for the connection request to arrive.

Now we are starting a Telnet client on the slip host (140.252.13.65), which will connect to this server. Here are the corresponding lines of output from the netstat command:


Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 140.252.13.33.23 140.252.13.65.1029 ESTABLISHED
tcp 0 0 *.23 *.* LISTEN

The first line for port 23 is the established connection (ESTABLISHED). For this connection, all four elements of the local and remote addresses are filled in: the local IP address and port number, and the remote IP address and port number. The local IP address corresponds to the interface on which the connection request arrived (Ethernet interface, 140.252.13.33).

The endpoint remained in the LISTEN state. This is the endpoint that the concurrent server uses to accept connection requests that will come in the future. In this case, the TCP module residing in the kernel created a new endpoint in the ESTABLISHED state when the incoming connection request arrived and was accepted. Also note that the port number for a connection that is in the ESTABLISHED state has not changed: it is 23, the same as for an endpoint that is in the LISTEN state.

Now we start a second Telnet client from the same host (slip) to this server. The corresponding output from the netstat command looks like this:

Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 140.252.13.33.23 140.252.13.65.1030 ESTABLISHED
tcp 0 0 140.252.13.33.23 140.252.13.65.1029 ESTABLISHED
tcp 0 0 *.23 *.* LISTEN

Now we see two established (ESTABLISHED) connections from the same host to the same server. Both have a local port number of 23. This is not a problem for TCP, since the remote port numbers are different. They must be different because every Telnet client uses a dynamically assigned port, and from the definition of a dynamically assigned port we know that only a port that is not currently in use on the host (slip) can be dynamically assigned.

This example shows that TCP demultiplexes incoming segments using all four values of the local and remote addresses: destination IP address, destination port number, source IP address, and source port number. TCP cannot determine which process gets an incoming segment by looking at the destination port alone. Also, of the three endpoints on port 23, only the one in the LISTEN state receives incoming connection requests: endpoints in the ESTABLISHED state cannot receive SYN segments, and the endpoint in the LISTEN state cannot receive data segments.
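This demultiplexing is visible from user space too. In the following Python sketch (ours), two clients connect to one listening port, and each accept() yields a connected socket whose four-tuple differs only in the remote port:

```python
import socket

def demux_demo() -> list:
    """One LISTEN endpoint, two established connections on the same local
    port: each accept() returns a socket for a distinct 4-tuple."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    addr = listener.getsockname()
    clients, conns = [], []
    for _ in range(2):
        c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        c.connect(addr)                  # each client gets its own dynamic port
        clients.append(c)
        conns.append(listener.accept()[0])
    # (local address, remote address) of each established connection
    pairs = [(conn.getsockname(), conn.getpeername()) for conn in conns]
    for s in clients + conns + [listener]:
        s.close()
    return pairs
```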

Now we start a third Telnet client, from the solaris host, which reaches sun over the SLIP link rather than over the Ethernet.

Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 140.252.1.29.23 140.252.1.32.34603 ESTABLISHED
tcp 0 0 140.252.13.33.23 140.252.13.65.1030 ESTABLISHED
tcp 0 0 140.252.13.33.23 140.252.13.65.1029 ESTABLISHED
tcp 0 0 *.23 *.* LISTEN

The local IP address for the first established (ESTABLISHED) connection now corresponds to the address of the SLIP channel interface on the multi-interface host sun (140.252.1.29).

Limiting local IP addresses

We can see what happens when a server binds one specific local interface address instead of the wildcard as its local IP address. If we give an IP address (or hostname) to our sock program when we run it as a server, that address becomes the local IP address of the listening endpoint. For example:

sun % sock -s 140.252.1.29 8888

restricts this server to only connections coming from the SLIP interface (140.252.1.29). The netstat command output will show the following:

Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 140.252.1.29.8888 *.* LISTEN

If we connect to this server via a SLIP channel from the solaris host, it will work.

Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 140.252.1.29.8888 140.252.1.32.34614 ESTABLISHED
tcp 0 0 140.252.1.29.8888 *.* LISTEN

However, if we try to connect to this server from a host on the Ethernet (140.252.13), the connection request is not accepted by the TCP module. If we watch with tcpdump, we see the SYN answered with an RST, as shown in Figure 18.21.

1 0.0 bsdi.1026 > sun.8888: S 3657920001:3657920001(0)
win 4096
2 0.000859 (0.0009) sun.8888 > bsdi.1026: R 0:0 (0) ack 3657920002 win 0

Figure 18.21 Limiting connection requests based on the server's local IP address.

An application running on the server will never see a connection request - the restriction is carried out by the TCP module in the kernel based on the local IP address specified by the application.
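The effect of binding the listening socket to one local address can be sketched in Python (loopback addresses stand in for the book's hosts; this is an illustration, not the sock program itself):

```python
import socket

# Equivalent of "sock -s 140.252.1.29 8888": bind the listening socket to
# one specific local address instead of the wildcard.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # a single local IP, kernel-chosen port
server.listen(5)
addr, port = server.getsockname()
print("listening only on", addr, port)

# A connection addressed to the bound IP is accepted by the kernel.
c = socket.create_connection(("127.0.0.1", port))
conn, peer = server.accept()
print("connected from", peer[0])

# A SYN arriving for any other local address would be answered with an
# RST by the kernel; the application would never see it (Figure 18.21).
for s in (c, conn, server):
    s.close()
```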

Remote IP Address Restriction

In Chapter 11 we saw that a UDP server can specify the foreign IP address and port number, in addition to the local IP address and port number. The interface functions shown in RFC 793 allow a server to perform a passive open based on a fully specified foreign socket (to wait for an active open from one particular client) or an unspecified foreign socket (to wait for a connection request from any client).

Unfortunately, most APIs do not provide this capability. The server must leave the foreign socket unspecified, wait for a connection to arrive, and then examine the client's IP address and port number.

Figure 18.22 shows the three combinations of addresses and ports that a TCP server can specify for itself. In all cases lport is the server's well-known port and localIP must be the IP address of a local interface. The order of the three rows in the table is the order the TCP module tries when deciding which local endpoint receives an incoming connection request: the most specific binding (the first row, if supported) is tried first, and the least specific (the last row, with both IP addresses wildcarded) is tried last.

Figure 18.22 Specifying local and remote IP addresses and port numbers for the TCP server.

Incoming connection request queue

A concurrent server invokes a new process to handle each client, so the listening server must always be ready to handle the next incoming connection request. That is the underlying reason for using concurrent servers. But it is still possible for multiple connection requests to arrive while the listening server is creating a new process, or while the operating system is busy running other, higher-priority processes. How does TCP handle these incoming connection requests while the listening application is busy?

Berkeley implementations use the following rules.

  1. Each listening endpoint has a fixed-length queue of connections that have been accepted by TCP (the three-way handshake is complete) but not yet accepted by the application. Be careful to distinguish between TCP accepting a connection and placing it on this queue, and the application taking an accepted connection off the queue.
  2. The application specifies a limit for this queue, commonly called the backlog. This limit must be in the range 0 through 5. (Most applications specify the maximum value of 5.)
  3. When a connection request (a SYN segment) arrives, TCP looks at the number of connections currently queued for the listening endpoint to decide whether to accept the connection. We would expect the backlog value specified by the application to be the maximum number of connections allowed on the queue for that endpoint, but it is not that simple. Figure 18.23 shows the relationship between the backlog value and the real maximum number of queued connections on traditional Berkeley systems and on Solaris 2.2.

    backlog value    Maximum number of queued connections
                     Traditional BSD    Solaris 2.2
    0                1                  0
    1                2                  1
    2                4                  2
    3                5                  3
    4                7                  4
    5                8                  5

    Figure 18.23 Maximum number of queued connections for a listening endpoint.

    Remember that this backlog value only indicates the maximum number of connections queued for a single listening endpoint, all of which have already been accepted by TCP and are waiting to be accepted by the application. The backlog value does not have any effect on the maximum number of connections that can be established by the system or the number of clients that a concurrent server can serve.

    The values for Solaris in this figure are what we expect. The traditional BSD values are (for some unknown reason) the backlog value times 3, divided by 2, plus 1.

  4. If there is room on the queue for this listening endpoint for the new connection (see Figure 18.23), the TCP module ACKs the incoming SYN and completes the connection. The server application with the listening endpoint will not see the new connection until the third segment of the three-way handshake is received. Also, the client may think the server is ready to receive data when the client's active open completes successfully, before the server application has been notified of the new connection. (If this happens, the server's TCP just queues the incoming data.)
  5. If there is no room on the queue for the new connection, TCP just ignores the received SYN. Nothing is sent back (not even an RST segment). If the listening server doesn't get around to accepting some of the already accepted connections that have filled its queue, the client's active open will eventually time out.
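The Berkeley backlog rule quoted in rule 3 can be written down directly; this small Python sketch just encodes the two formulas stated above (the function names are ours):

```python
# Traditional BSD: actual queue limit is backlog * 3 / 2 + 1.
def bsd_queue_limit(backlog):
    return backlog * 3 // 2 + 1

# Solaris 2.2: the backlog value is used as-is.
def solaris_queue_limit(backlog):
    return backlog

# Reproduce the two columns of Figure 18.23 for backlog 0 through 5.
for b in range(6):
    print(b, bsd_queue_limit(b), solaris_queue_limit(b))
# e.g. a backlog of 1 allows 2 queued connections on a traditional BSD system
```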

We can see this scenario using the sock program. We invoke it with a new option (-O) that tells it to pause after creating the listening endpoint, before accepting any connection requests. If we then invoke multiple clients during this pause, the server is forced to queue the accepted connections, and we can watch what happens with tcpdump.

bsdi % sock -s -v -q1 -O30 7777

The -q1 option sets the backlog of the listening endpoint to 1, which on a traditional BSD system means two connection requests can be queued (Figure 18.23). The -O30 option tells the program to sleep for 30 seconds before accepting any client connections. This gives us 30 seconds to start some clients and fill the queue. We start four clients on the host sun.
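The same queueing behavior can be sketched with Python sockets (loopback addresses stand in for the book's hosts; note that on Linux the effective queue capacity is backlog + 1, echoing the "not that simple" point above):

```python
import socket

# A listener with backlog 1 that never calls accept(), like the sock
# program sleeping under -O30. The kernel still completes the three-way
# handshake for connections that fit in the queue; on Linux the actual
# capacity is backlog + 1.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)                        # backlog of 1, as with -q1
port = server.getsockname()[1]

# Both connects return as soon as their handshakes complete, even though
# the application has not accepted anything yet.
queued = [socket.create_connection(("127.0.0.1", port), timeout=2)
          for _ in range(2)]
print("handshakes completed without accept():", len(queued))

for c in queued:
    c.close()
server.close()
```

A third connect attempted here would get no usable answer from the listener's queue and would be left to the client's retransmission timer, as Figure 18.24 shows.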

Figure 18.24 shows the tcpdump output, starting with the first SYN from the first client. (We have removed the window size advertisements and MSS announcements. We have also marked the client port numbers in bold when the TCP connection is established, that is, when the three-way handshake completes.)

The first client's connection request, from port 1090, is accepted by TCP (segments 1-3). The second client's connection request, from port 1091, is also accepted by TCP (segments 4-6). The server application is still asleep and has not accepted either connection; everything has been done by the TCP module in the kernel. Note that the two clients have successfully completed their active opens, since both three-way handshakes completed.

1 0.0 sun.1090 > bsdi.7777: S 1617152000:1617152000(0)
win 4096
2 0.002310 (0.0023) bsdi.7777 > sun.1090: S 4164096001:4164096001(0)
ack 1617152001
3 0.003098 (0.0008) sun.1090 > bsdi.7777: . ack 1
4 4.291007 (4.2879) sun.1091 > bsdi.7777: S 1617792000:1617792000(0)
5 4.293349 (0.0023) bsdi.7777 > sun.1091: S 4164672001:4164672001(0)
ack 1617792001
6 4.294167 (0.0008) sun.1091 > bsdi.7777: . ack 1
7 7.131981 (2.8378) sun.1092 > bsdi.7777: S 1618176000:1618176000(0)
8 10.556787 (3.4248) sun.1093 > bsdi.7777: S 1618688000:1618688000(0)
9 12.695916 (2.1391) sun.1092 > bsdi.7777: S 1618176000:1618176000(0)
10 16.195772 (3.4999) sun.1093 > bsdi.7777: S 1618688000:1618688000(0)
11 24.695571 (8.4998) sun.1092 > bsdi.7777: S 1618176000:1618176000(0)
12 28.195454 (3.4999) sun.1093 > bsdi.7777: S 1618688000:1618688000(0)
13 28.197810 (0.0024) bsdi.7777 > sun.1093: S 4167808001:4167808001(0)
ack 1618688001
14 28.198639 (0.0008) sun.1093 > bsdi.7777: . ack 1
15 48.694931 (20.4963) sun.1092 > bsdi.7777: S 1618176000:1618176000(0)
16 48.697292 (0.0024) bsdi.7777 > sun.1092: S 4170496001:4170496001(0)
ack 1618176001
17 48.698145 (0.0009) sun.1092 > bsdi.7777: . ack 1

Figure 18.24 Output from tcpdump for an example of using backlog.

We try to start a third client in segment 7 (port 1092) and a fourth in segment 8 (port 1093). TCP ignores both SYNs because the queue for this listening endpoint is full. Both clients retransmit their SYNs, in segments 9, 10, 11, 12, and 15. The fourth client's retransmission in segment 12 is accepted (segments 12-14) because the server's 30-second pause is over and the server has removed the two accepted connections, emptying the queue. (It looks as if this connection was accepted by the server at time 28.19, which is less than 30; this is because it took a few seconds after starting the server to start the first client [segment 1, the 0.0 time point in the output].) The third client's retransmission in segment 15 is then also accepted (segments 15-17). The fourth client's connection (port 1093) is accepted by the server before the third client's (port 1092) because of the timing interactions between the end of the 30-second pause and the clients' retransmission timers.

We would expect the queue of accepted connections to be passed to the application in FIFO (first in, first out) order. That is, after TCP accepts the connections on ports 1090 and 1091, we expect the application to receive the connection on port 1090 first, and then the one on port 1091. But a bug in many Berkeley implementations causes LIFO (last in, first out) order instead. Vendors have tried to fix this bug many times, but it still exists in systems such as SunOS 4.1.3.

TCP ignores the incoming SYN when the queue is full and does not respond with an RST, because the full queue is a soft error, not a hard one. Normally the queue is full because the application or the operating system is busy, preventing the application from servicing incoming connections, and the condition can change in a short time. But if the server's TCP responded with a reset, the client's active open would abort (which is exactly what happens when the server has not been started at all). Since the SYN is ignored, the client's TCP is forced to retransmit the SYN later, in the hope that the queue will then have room for the new connection.

At this point it is worth noting another important detail present in almost all TCP/IP implementations: TCP accepts an incoming connection request (the SYN) whenever there is room on the queue, without giving the application a chance to see whom it is from (the source IP address and source port number). This is not required by TCP; it is just the common implementation technique. If an API such as TLI (Chapter 1) notifies the application that a connection request has arrived and lets the application choose whether to accept it, then with TCP, by the time the application is told the connection has just arrived, the three-way handshake is in fact already over! Other transport layers can be implemented to distinguish between an arriving and an accepted connection (the OSI transport layer, for example), but TCP does not provide this.

Solaris 2.2 provides an option that prevents TCP from accepting an incoming connection request until the application says so (tcp_eager_listeners, described in Appendix E).

This behavior also means that a TCP server has no way to cause a client's active open to fail. When a new client's connection is passed to the server application, TCP's three-way handshake is already over and the client's active open has completed successfully. If the server then looks at the client's IP address and port number and decides it does not want to serve this client, all it can do is close the connection (sending a FIN) or reset it (sending an RST). In either case the client thought everything was fine when its active open completed, and may well have already sent a request to the server.
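A hedged Python sketch of the server's only recourse (loopback addresses and the "allowed" address 10.0.0.1 are made up for illustration): by the time accept() returns, the handshake is complete, so an unwanted client can only be closed or reset. Setting SO_LINGER with a zero timeout before close is a common way to force an RST instead of a FIN.

```python
import socket
import struct

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(5)
port = server.getsockname()[1]

client = socket.create_connection(("127.0.0.1", port))
conn, peer = server.accept()           # handshake finished before this returns

if peer[0] != "10.0.0.1":              # pretend we serve only 10.0.0.1
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))  # linger on, timeout 0 -> RST on close
    conn.close()
    print("rejected client at", peer)

client.close()
server.close()
```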

Brief conclusions

Before two processes can exchange data using TCP, they must establish a connection with each other; when the work between them is done, the connection is terminated. This chapter has detailed how connections are established using the three-way handshake and terminated with an exchange of four segments.

We used tcpdump to show all the fields in the TCP header. We also looked at how an established connection can be timed out, how a connection is reset, what happens to a half-open connection, and how TCP provides a half-closed mode, simultaneous opening and simultaneous closing.

To understand the functioning of TCP, it is necessary to consider the fundamental TCP state diagram. We examined point by point how a connection is established and terminated, and what changes in state occur during this process. We also looked at how TCP servers establish TCP connections.

TCP connections are uniquely identified by four values: local IP address, local port number, remote IP address, and remote port number. After a connection is terminated, the end that performed the active close must still remember the connection; we say it is in the TIME_WAIT state. The rule is that this end must wait for twice the implementation's MSL value before it can start a new incarnation of the same connection.

Exercises

  1. In the section we said that the initial sequence number (ISN) normally starts at 1 and is incremented by 64,000 every half-second and every time an active open is performed. This implies that the low-order three digits of the ISN would always be 001. But in Figure 18.3 those low-order three digits are 521 in each direction. What is going on?
  2. In Figure 18.15, we printed 12 characters, but saw that TCP sent 13 bytes. In Figure 18.16, we printed 8 characters, but TCP sent 10 bytes. Why was 1 byte added in the first case, and 2 bytes in the second?
  3. What is the difference between a half-open connection and a half-closed connection?
  4. If we start the sock program as a server, and then interrupt its operation (with no clients connected to it), we can immediately restart the server. This means that it will not be in the 2MSL wait state. Explain this in terms of a state transition diagram.
  5. In the section we showed that a client cannot reuse the same local port number while that port is part of a connection in the 2MSL wait. But if we run the sock program twice in a row as a client connecting to the daytime server, we can reuse the same local port number, and we create a new incarnation of a connection that should still be in the 2MSL wait. What is going on?

    sun % sock -v bsdi daytime

    Wed Jul 7 07:54:51 1993
    connection closed by peer

    sun % sock -v -b1163 bsdi daytime reusing the same local port number
    connected on 140.252.13.33.1163 to 140.252.13.35.13
    Wed Jul 7 07:55:01 1993
    connection closed by peer

  6. At the end of the section, when we described the FIN_WAIT_2 state, we stated that most implementations will move the connection from this state to the CLOSED state if the application has completed a full shutdown (not half-closed) after approximately 11 minutes. If the other side (in the CLOSE_WAIT state) waits 12 minutes before performing the close (sending its FIN), what will its TCP receive in response to the FIN?
  7. Which party in a telephone conversation does the active opening and which does the passive opening? Is simultaneous opening possible? Is it possible to close at the same time?
  8. In Figure 18.6 we did not see the ARP request or ARP response. However, the hardware address of the svr4 host must be in the bsdi ARP cache. What will change in this picture if this item is not in the ARP cache?
  9. Explain the following output from the tcpdump command. Compare it with Figure 18.13.

    1 0.0 solaris.32990 > bsdi.discard: S 40140288:40140288 (0)
    win 8760
    2 0.003295 (0.0033) bsdi.discard > solaris.32990: S 4208081409:4208081409 (0)
    ack 40140289 win 4096

    3 0.419991 (0.4167) solaris.32990 > bsdi.discard: P 1:257 (256) ack 1 win 9216
    4 0.449852 (0.0299) solaris.32990 > bsdi.discard: F 257:257 (0) ack 1 win 9216
    5 0.451965 (0.0021) bsdi.discard > solaris.32990: . ack 258 win 3840
    6 0.464569 (0.0126) bsdi.discard > solaris.32990: F 1:1 (0) ack 258 win 4096
    7 0.720031 (0.2555) solaris.32990 > bsdi.discard: . ack 2 win 9216

  10. Why wouldn't the server in Figure 18.4 combine the ACK to the client's FIN with its own FIN, thereby reducing the number of segments to three?
  11. In Figure 18.16, why is the RST sequence number 26368002?
  12. Is TCP's querying the link layer for its MTU a violation of the layering principle?
  13. It is sometimes said that incoming TCP segments are demultiplexed based only on the TCP destination port number. Is that correct?

Establishing a TCP connection

In TCP, connections are established using the three-way handshake described in the Connection Establishment section. To establish a connection, one side (say, the server) passively waits for an incoming connection by executing the LISTEN and ACCEPT primitives, either specifying a particular source or nobody in particular.

The other side (such as the client) issues a CONNECT primitive, specifying the IP address and port it wants to connect to, the maximum TCP segment size, and optionally some user data (such as a password). The CONNECT primitive sends a TCP segment with the SYN bit set and the ACK bit cleared and waits for a response.

When this segment arrives at its destination, the TCP entity checks to see if any process has executed the LISTEN primitive, specifying as a parameter the same port contained in the Destination Port field. If there is no such process, it responds by sending a segment with the RST bit set to refuse the connection.

If some process is listening on the port, the incoming TCP segment is handed to that process. The process can accept the connection or refuse it. If it accepts, it sends back an acknowledgment segment. The sequence of TCP segments sent in the normal case is shown in Fig. a. Note that a segment with the SYN bit set occupies 1 byte of sequence number space, which avoids ambiguity in its acknowledgment.
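The sequence-number arithmetic of the handshake can be written as a toy Python model (purely illustrative, not a protocol implementation): because the SYN consumes one sequence number, each side acknowledges the other's ISN + 1.

```python
# Model the three segments of the three-way handshake. Each tuple is
# (flags, sequence number[, acknowledgment number]).
def handshake(client_isn, server_isn):
    syn = ("SYN", client_isn)                       # client -> server
    syn_ack = ("SYN+ACK", server_isn, client_isn + 1)  # server -> client
    ack = ("ACK", server_isn + 1)                   # client -> server
    return [syn, syn_ack, ack]

# The ISNs from segments 1-3 of Figure 18.24:
for seg in handshake(1617152000, 4164096001):
    print(seg)
```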

If two hosts simultaneously try to establish a connection with each other, then the sequence of events that occurs will correspond to Fig. b. As a result, only one connection will be established, not two, since the pair of endpoints uniquely identifies the connection. That is, if both connections try to identify themselves using the pair (x, y), only one table entry is made for (x, y).

The initial sequence number of a connection is not zero, for the reasons discussed above. A clock-based scheme is used, with a clock tick every 4 μs. For extra reliability, a host that has crashed is prohibited from restarting sooner than the maximum packet lifetime, which guarantees that no packets from its previous connections are still wandering around the Internet.

TCP Connection Release

Although TCP connections are full-duplex, to understand how they are released, it is better to think of them as pairs of simplex connections. Each simplex connection breaks independently of its partner. To close the connection, either side can send a TCP segment with the FIN bit set to one, indicating that it has no more data to send. When this TCP segment receives an acknowledgment, this transmission direction is closed. However, data may continue to flow indefinitely in the opposite direction. The connection is broken when both directions are closed. Typically, four TCP segments are required to close a connection: one with the FIN bit and one with the ACK bit in each direction. The first ACK bit and the second FIN bit can also be contained in one TCP segment, which will reduce the number of segments to three.
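The four-segment release can be sketched in the same toy Python style as the handshake above (illustrative only): like SYN, a FIN occupies one sequence number, so each FIN is acknowledged at its sequence number plus one.

```python
# Model the normal four-segment close: each direction sends a FIN and
# receives an ACK for it, independently of the other direction.
def close_exchange(fin_seq_a, fin_seq_b):
    return [("FIN", fin_seq_a),        # A -> B: no more data from A
            ("ACK", fin_seq_a + 1),    # B -> A: A's direction is closed
            ("FIN", fin_seq_b),        # B -> A: no more data from B
            ("ACK", fin_seq_b + 1)]    # A -> B: B's direction is closed

for seg in close_exchange(1000, 2000):
    print(seg)
```

When the second and third segments are combined into one, the count drops to three, as the text notes.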

Just like a telephone conversation where both parties can say goodbye and hang up at the same time, both ends of a TCP connection can send FIN messages at the same time. They both receive the usual acknowledgments and the connection is closed. Essentially, there is no difference between simultaneous and sequential disconnects.

To avoid the two-army problem, timers are used. If a response to a transmitted FIN segment does not arrive within two maximum packet lifetimes, the sender of the FIN releases the connection. The other side will eventually notice that nobody is answering and will also disconnect. Although this solution is not perfect, perfection is unattainable, so we use what we have; in practice problems rarely arise.

TCP Transmission Control

As mentioned earlier, TCP window management is not directly tied to acknowledgments, as is done in most data transfer protocols. For example, suppose the recipient has a 4096-byte buffer. If the sender transmits a 2048-byte segment that is successfully received by the recipient, then the recipient acknowledges its receipt. However, this leaves the receiver with only 2048 bytes of free buffer space (until the application takes some data from the buffer), which it reports to the sender, indicating the appropriate window size (2048) and the number of the next expected byte.

The sender then sends another 2048 bytes, which are acknowledged, but the window size is declared to be 0. The sender must stop transmitting until the receiving host frees up buffer space and increases the window size.
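The buffer-and-window bookkeeping in this example can be modeled with a few lines of Python (a toy model of the numbers above, not a TCP implementation):

```python
# Toy model of the receiver's buffer: each arriving segment consumes
# buffer space until the application reads it, and every ACK advertises
# the remaining free space as the window.
class Receiver:
    def __init__(self, bufsize):
        self.bufsize = bufsize
        self.used = 0
        self.next_byte = 0

    def receive(self, nbytes):
        self.used += nbytes
        self.next_byte += nbytes
        return {"ack": self.next_byte, "win": self.bufsize - self.used}

    def app_reads(self, nbytes):       # the application drains the buffer
        self.used -= nbytes

r = Receiver(4096)
print(r.receive(2048))   # {'ack': 2048, 'win': 2048}
print(r.receive(2048))   # {'ack': 4096, 'win': 0} -> sender must stop
r.app_reads(2048)        # window can reopen once the application reads
```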

With a window size of zero, the sender cannot send segments, except in two cases. First, it is allowed to send urgent data, for example so that a user can kill a process running on a remote machine. Second, the sender can send a 1-byte segment asking the receiver to repeat information about the window size and the expected next byte. The TCP standard explicitly provides this feature to prevent deadlocks if the window size advertisement is lost.

Senders are not required to transmit data immediately as it comes from the application. Also, no one requires recipients to send confirmations as soon as possible. For example, a TCP entity, having received the first 2 KB of data from an application and knowing that the available window size is 4 KB, would be perfectly correct to simply store the received data in a buffer until another 2 KB of data arrives to send immediately a segment with 4 KB of payload. This discretion can be used to improve performance.

Consider a TELNET connection with an interactive editor that responds to every keystroke. In the worst case, when the character arrives at the sending TCP entity, it creates a 21-byte TCP segment and passes it to the IP layer, which in turn sends a 41-byte IP datagram.

At the receiving end, the TCP entity immediately responds with a 40-byte acknowledgment (20 bytes TCP header and 20 bytes IP header). Then, when the editor reads that byte from the buffer, the TCP entity will send an update on the buffer size, moving the window 1 byte to the right. The size of this packet is also 40 bytes. Finally, when the editor has processed this character, it sends back an echo, transmitted in a 41-byte packet. In total, for each character entered from the keyboard, four packets with a total size of 162 bytes are sent. In conditions of shortage of line capacity, this method of operation is undesirable.
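The per-keystroke overhead in this TELNET example is simple arithmetic, made explicit in the sketch below (the header sizes are the standard 20-byte TCP and IP headers mentioned in the text):

```python
IP_HDR, TCP_HDR = 20, 20

data_pkt = IP_HDR + TCP_HDR + 1        # 41 bytes: one typed character
ack_pkt = IP_HDR + TCP_HDR             # 40 bytes: bare acknowledgment

# data, ACK, window update, and the echoed character:
total = data_pkt + ack_pkt + ack_pkt + data_pkt
print(total)                           # 162 bytes per keystroke
```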

To improve the situation, many TCP implementations delay acknowledgments and window size updates by 500 ms in hopes of obtaining additional data with which to send the acknowledgment in one packet. If the editor manages to echo within 500 ms, the remote user will only need to send one 41-byte packet, thus cutting the network load in half.

Although this delay reduces the network load, the sender still uses the network inefficiently, since each byte travels in its own 41-byte packet. A method that improves efficiency is known as Nagle's algorithm (Nagle, 1984). Nagle's proposal is simple: if data arrives at the sender one byte at a time, just transmit the first byte and buffer the rest until the first byte is acknowledged. Then send all the characters accumulated in the buffer as one TCP segment and resume buffering until they are acknowledged. If the user types quickly and the network is slow, a substantial number of characters travel in each segment, greatly reducing the load on the network. The algorithm additionally allows a new packet to be sent if enough data has accumulated in the buffer to fill half the window or a maximum-size segment.
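The core of Nagle's rule can be sketched as a small Python simulation (a toy model with made-up event notation, not real TCP): letters are bytes the application hands to TCP, and '!' marks an ACK arriving for the segment in flight.

```python
def nagle(events):
    """Simulate Nagle's algorithm. events: a letter = app writes one
    byte, '!' = the ACK for the outstanding segment arrives."""
    segments, buf, unacked = [], "", False
    for e in events:
        if e == "!":
            if buf:                    # ACK arrives: flush everything
                segments.append(buf)   # coalesced so far as ONE segment
                buf = ""               # (that segment is now in flight)
            else:
                unacked = False        # nothing buffered: line goes idle
        elif unacked:
            buf += e                   # segment in flight: just buffer
        else:
            segments.append(e)         # idle: send the first byte at once
            unacked = True
    if buf:
        segments.append(buf)
    return segments

print(nagle("abcde!fg"))   # ['a', 'bcde', 'fg']
print(nagle("a!b!c"))      # ['a', 'b', 'c'] -- fast ACKs, no coalescing
```

On a slow network the ACK arrives late, so many keystrokes ride in one segment; on a fast network each byte still goes out promptly.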

Nagle's algorithm is widely used in TCP implementations, but there are situations in which it is better to disable it. In particular, when an X Window application runs over the Internet, mouse movements are sent to the remote computer. (The X Window System is the windowing system of most UNIX-like operating systems.) Buffering these movements for batch transmission makes the cursor jump erratically with long pauses, which makes the program nearly unusable.

Another problem that can significantly degrade TCP performance is known as the silly window syndrome (Clark, 1982). The essence of the problem is that the sending TCP entity transmits data in large blocks, but the receiving side of an interactive application reads the data one character at a time.

Let's look at an example - the initial state is as follows: the TCP buffer of the receiving side is full, and the sender knows this (that is, its window size is 0). The interactive application then reads one character from the TCP stream. The receiving TCP entity happily informs the sender that the window size has increased and that it can now send 1 byte. The sender obeys and sends 1 byte. The buffer is full again, which the receiver notifies by sending an acknowledgment for a 1-byte segment with a window size of zero. And this can go on forever.

David Clark proposed preventing the receiver from advertising a one-byte window. Instead, the receiver must wait until a significant amount of buffer space is free. Specifically, it should not advertise a new window until it can accept a segment of the maximum size it announced when the connection was established, or until its buffer is half empty, whichever is smaller.
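Clark's criterion boils down to one predicate, sketched here in Python (the function name is ours; this just restates the rule above):

```python
# Advertise a window update only when the free space reaches a full
# maximum-size segment or half the receive buffer, whichever is smaller.
def should_advertise(free_space, mss, bufsize):
    return free_space >= min(mss, bufsize // 2)

print(should_advertise(1, 1460, 8192))      # False: would invite 1-byte segments
print(should_advertise(1460, 1460, 8192))   # True: a full segment now fits
print(should_advertise(4096, 1460, 8192))   # True: buffer is half empty
```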

In addition, the sender itself can help by not transmitting overly small segments. Instead, it should wait until the window is large enough to send a full segment, or one at least half the size of the receiver's buffer. (The sender can estimate that size from the sequence of window announcements it has received.)

Nagle's algorithm and Clark's solution to the silly window syndrome complement each other. Nagle was trying to solve the problem of an application that hands data to TCP a character at a time; Clark was trying to solve the problem of an application that takes data from TCP a character at a time. Both solutions are valid and can operate together. The goal of both is to avoid sending or asking for data in tiny portions.

The TCP receiving entity can go even further in improving performance by simply updating the window size information in large chunks. Like the sending TCP entity, it can also buffer data and block a READ request from an application until it has a large amount of data. This reduces the number of calls to the TCP entity and hence reduces the overhead. Of course, this approach increases the response time, but for non-interactive applications, such as file transfer, reducing the time spent on the entire operation is much more important than increasing the response time for individual requests.

Another recipient problem is segments received in the wrong order. They may be retained or discarded at the discretion of the recipient. Of course, an acknowledgment can only be sent if all data up to the byte being acknowledged has been received. If the receiver receives segments 0, 1, 2, 4, 5, 6, and 7, it can acknowledge receipt of data up to the last byte of segment 2. When the sender times out, it will transmit segment 3 again. If the receiver has segments 4 through 7 buffered by the time segment 3 arrives, it can acknowledge receipt of all bytes up to the last byte of segment 7.
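The cumulative-acknowledgment rule in this example is easy to model in Python (a toy sketch with segment numbers standing in for byte ranges):

```python
# Cumulative ACK: the receiver can acknowledge only the longest
# contiguous prefix of segments it holds, however many later segments
# are sitting in its buffer.
def cumulative_ack(received, expected=0):
    have = set(received)
    while expected in have:
        expected += 1
    return expected          # the next segment number expected

print(cumulative_ack([0, 1, 2, 4, 5, 6, 7]))     # 3: segment 3 is missing
print(cumulative_ack([0, 1, 2, 3, 4, 5, 6, 7]))  # 8: gap filled, ACK jumps
```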
