Java. HTTP protocol and working with WEB


Socket vs Socket part 2, or say “no” to the TCP protocol - Archive WASM.RU

In the first part, devoted to the basics of using MS Windows sockets in assembly programs, we talked about what sockets are, how they are created, and what parameters are specified. The connectionless UDP protocol, which guarantees neither the delivery of packets nor the order in which they arrive at their destination, was mentioned only in passing. The training example used our favorite TCP protocol. All was well, but a number of questions remained unresolved: in particular, how to organize mutual exchange between several computers on the network, how to send something to many computers at once, and so on.

Generally speaking, reading the first part is not at all necessary to understand the current one, although I will constantly refer to it along the way. So it goes. Haha...

So, we pose the problem: we have a local network of, say, a dozen computers, we need to organize the exchange of messages between any two of them, and (optional) between one and all others.

I hear, I hear a chorus of hints that say, use the built-in Windows features, type:

net send 192.168.0.4 Zhenya sends greetings to you!

net send Node4 Waiting for your answer!

There are only two objections to this. First, you never know what our operating system or other ready-made programs can do, we want to learn how to write our own programs, don’t we? And secondly, it is not a fact that the message goes from person to person. In the general case, the operator may not know anything... Or even should not know anything...

For me, the most important thing in setting this task was to ensure the ability to transfer something to all computers on the network at once. Imagine that we wrote a certain program... Who said - a Trojan? No, no and NO! No Trojans. Just a small (very) accounting program, for example. Which after some time was able to settle on many computers of our local network. And now the appointed time comes, it’s time to balance the balance, to summarize, so to speak, the results for the quarter... Everything must be done quickly and preferably at the same time. How to do this within the framework of the material that we studied in the first part remained unclear.

The answer, as always, comes from the Windows API. We search and we find: the function sendto() sends data to a specified address. How does it differ from send(), the function already studied in the first part? It turns out that sendto() can broadcast to a special IP address. But, please note, this only works for sockets of the SOCK_DGRAM type! And sockets opened with SOCK_DGRAM as the socket type parameter operate over UDP, not TCP! This makes clear the meaning of the subtitle of this article... Of course, this is just a literary device: no protocol is better or worse than another, they are just... different, that's all. Both are transport layer protocols that “...provide data transfer between application processes.” Both use a network layer protocol such as IP to transmit (receive) data, through which the data then gets to the physical layer, i.e. onto the transmission medium... And what kind of medium that is, who knows. Maybe it is a copper cable, or maybe it is no cable at all but the airwaves...

Scheme of interaction of network protocols.

UDP - User Datagram Protocol

TCP - Transmission Control Protocol

ICMP - Internet Control Message Protocol

ARP - Address Resolution Protocol

In general, if the drawing didn’t help you in any way, it doesn’t matter. It is important to understand one thing: TCP is a transport layer protocol that provides reliable transport of data between application processes by setting up a logical connection (emphasis mine). UDP does not. One more thing: somewhere up there, at the application level, our application will sit in one of the empty rectangles.

Let's finish the introductory part here and move on to looking at how to use it from the very beginning.

To demonstrate all the material, a training example is used, as usual; it is available for download. We skip the part common to all Windows applications and describe only what concerns the operation of sockets. First you need to initialize the Windows Sockets DLL using the function WSAStartup(), which returns zero on success or one of the error codes otherwise. Then, when initializing the main application window, open a socket to receive messages:

invoke socket, AF_INET, \
              SOCK_DGRAM, \ ; specifies the socket type - UDP protocol!
              0             ; protocol type
.if eax != INVALID_SOCKET   ; if there is no error
    mov hSocket, eax        ; remember handle
.endif

After this, as usual, we need to tell Windows to send messages to the specified window from the socket we opened:

invoke WSAAsyncSelect, hSocket, hWnd, WM_SOCKET, FD_READ

where hSocket is the socket descriptor,
hWnd is the handle of the window whose procedure will receive the messages,
WM_SOCKET is a message defined by us in the .const section,
FD_READ is a mask that specifies the events of interest to us; in this case, that data is ready to be read from the socket.

I hear, I hear a surprised chorus with despair in its voice: we were promised a hidden application, but here is a main window and all that... The fact is that you can’t do without it, because the operating system sends all messages to our application through its window procedure. The solution is simple. If necessary, hide the application's main window. How? For example, comment out the line:

invoke ShowWindow, hwnd, SW_SHOWNORMAL

or, more correctly, use:

invoke ShowWindow, hwnd, SW_HIDE

After this, our application will also start, the main window will be created, a WM_CREATE message will be sent to it from Windows with all the consequences... Only its window will not be visible either on the desktop or on the taskbar. If this is what you wanted, I'm glad. Anyway, let's continue...

Next, the receiving socket must be bound to a local port. To do this, we first convert the port number to network byte order using a special API function and fill in the address structure:

invoke htons, Port

mov sin.sin_port, ax

mov sin.sin_family, AF_INET

mov sin.sin_addr, INADDR_ANY
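The htons call above converts a 16-bit value from host byte order into network (big-endian) byte order. As a rough illustration of the same idea in Java, here is a minimal sketch; the class and method names (PortBytes, toNetworkOrder, fromNetworkOrder) are made up for this example and are not part of any API discussed in the article.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PortBytes {
    // Returns the two bytes of a port number in network (big-endian) order,
    // which is what htons prepares on a little-endian x86 machine.
    static byte[] toNetworkOrder(int port) {
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return ByteBuffer.allocate(2)
                .order(ByteOrder.BIG_ENDIAN)
                .putShort((short) port)
                .array();
    }

    // Reads the port number back from two network-order bytes.
    static int fromNetworkOrder(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] wire = toNetworkOrder(8888); // 8888 = 0x22B8
        System.out.printf("%02X %02X%n", wire[0], wire[1]);
        System.out.println(fromNetworkOrder(wire));
    }
}
```

The high-order byte goes first on the wire regardless of the byte order of the machine that runs the code.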

A small lyrical digression, not necessary for understanding this article.

The port numbers for our sockets were discussed at the end of part one. It is difficult to give recommendations as to what they should be. The only thing that can be said is what they should not be. It is unwise to use port numbers assigned to widely used services such as:

via protocol TCP: 20, 21 – ftp; 23 – telnet; 25 – smtp; 80 – http; 139 - NetBIOS session service;

via protocol UDP: 53 – DNS; 137, 138 – NetBIOS; 161 – SNMP;

Of course, the API has a special function getservbyport() , which, given a port number, returns the name of the corresponding service. More precisely, the function itself returns a pointer to a structure, inside of which there is a pointer to this name...

You can call it like this:

invoke htons, Port; convert the port number to network byte order

invoke getservbyport, ax, 0;

Note what Win32 Programmer's Reference says about getservbyport:

“...returns a pointer to a structure that is allocated by Windows Sockets. An application should never attempt to modify this structure or any of its components. Additionally, only one copy of this structure is allocated per thread, so the application must copy any information it needs before issuing any other Windows Sockets function calls.”

And here is the structure itself:

s_name DWORD ?; pointer to a string with the service name

s_aliases DWORD ?;

s_port WORD ?; port number

s_proto DWORD ?;

The API also has a “paired” function, so to speak: getservbyname(), which, based on the service name, returns information about the port number used.

Unfortunately, we will not be able to derive practical benefit from these functions. So, know that they exist and forget about them...

invoke bind, hSocket, addr sin, sizeof sin
.if eax == SOCKET_ERROR ; if there is an error
    invoke MessageBox, NULL, addr ...

At this point, the preparatory work of creating and configuring a receiving datagram socket can be considered complete. There is no need to put the socket into listening mode with the listen function, as we did for a socket of type SOCK_STREAM in the first part. Now, in our application's main window procedure, we can add the code that will be executed when a WM_SOCKET message arrives from the socket:

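; if a message is received from a socket (hSocket)
.elseif uMsg == WM_SOCKET
    ; lParam is assumed to be the name of the window procedure's last parameter:
    ; its low word holds the event code, its high word the error code
    mov eax, lParam
    .if ax == FD_READ
        shr eax, 16         ; move the error code into ax
        .if ax == NULL      ; no error
            ; receive data (64 bytes) from the socket into the BytRecu buffer
            invoke recv, hSocket, addr BytRecu, 64, 0
        .endif
    .endif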

Now let's talk about how to open a socket for sending messages. Here is everything the program needs to do:

invoke socket, AF_INET, SOCK_DGRAM, 0

invoke htons, Port

mov sin_to.sin_port, ax

mov sin_to.sin_family, AF_INET

invoke inet_addr, addr AddressIP

mov sin_to.sin_addr, eax

When it comes to transferring data, all you need to do is:

invoke sendto, hSocket1, addr BytSend1, 64, 0, \

addr sin_to, sizeof sin_to

The parameter values when calling this API function are as follows:

hSocket1 - handle to a previously opened socket
addr BytSend1 - address of the buffer containing the data to send
64 - size of the data in the buffer, in bytes
0 - flags; in the MSDN example it’s just 0
addr sin_to - pointer to a structure that contains the destination address
sizeof sin_to - the size of this structure in bytes.

If no errors occurred while executing sendto(), it returns the number of bytes sent; otherwise it returns SOCKET_ERROR in eax.
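For comparison, the same receive-and-send pair can be sketched in Java with DatagramSocket and DatagramPacket from java.net. This is only an illustration under assumptions: the class name UdpLoopback and its method are invented for the example, both sockets live in one process, and the datagram travels over the loopback interface rather than a real network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class UdpLoopback {
    // Sends one datagram to a receiver bound on the loopback interface
    // and returns the text that the receiver got.
    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system to pick any free port.
        try (DatagramSocket receiver = new DatagramSocket(new InetSocketAddress(loopback, 0));
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // do not block forever if the datagram is lost

            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            // Like sendto(): the destination address travels with each packet,
            // no connection is set up beforehand.
            sender.send(new DatagramPacket(out, out.length, loopback, receiver.getLocalPort()));

            byte[] buf = new byte[64]; // same 64-byte buffer size as in the article
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            receiver.receive(in); // like recv() on the bound datagram socket
            return new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("Waiting for your answer!"));
    }
}
```

Note that neither side calls listen or accept: the receiver is merely bound to a port, and the sender names the destination in every packet.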

Now is the time to talk about that same broadcast address mentioned at the beginning. In the sin_to structure we pre-filled the sin_addr field with the destination IP address, indicating where the data should actually be sent. If this address is 127.0.0.1, our data will naturally go no further than our own computer. The literature clearly states that a packet sent to an address in network 127.x.x.x is never transmitted on any network. Moreover, a router or gateway should never propagate routing information for network number 127: it is not a network address. To send a “broadcast” to all computers on the local network at once, you need to use an address formed from our own IP address, but with all ones in the low octet (for a network with mask 255.255.255.0), something like 192.168.0.255.
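The rule "our own address with all ones in the host part" can be written down as a small calculation. The sketch below is an illustration only: the class BroadcastAddress and its method are hypothetical names, and the prefix length (24 for a 255.255.255.0 mask) is something the caller is assumed to know.

```java
class BroadcastAddress {
    // Computes the directed broadcast address for an IPv4 address and a prefix length:
    // every bit outside the network prefix is set to one.
    static String broadcastFor(String ipv4, int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("bad prefix length: " + prefixLength);
        }
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted IPv4 address: " + ipv4);
        }
        int addr = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet: " + part);
            }
            addr = (addr << 8) | octet;
        }
        // A shift by 32 would be a no-op in Java, so prefix length 0 is handled separately.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        int bcast = addr | ~mask;
        return ((bcast >>> 24) & 0xFF) + "." + ((bcast >>> 16) & 0xFF) + "."
                + ((bcast >>> 8) & 0xFF) + "." + (bcast & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(broadcastFor("192.168.0.4", 24)); // prints 192.168.0.255
    }
}
```

With a 16-bit prefix the two low octets become 255, so the "low octet" rule from the text is the special case of a 24-bit prefix.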

That's all, actually. When the program closes, you need to close the sockets and release the Sockets DLL resources; this is done simply:

invoke closesocket, hSocket

invoke closesocket, hSocket1

invoke WSACleanup

In multi-threaded applications, WSACleanup terminates socket operations for all threads.

The hardest part of this article for me was deciding how best to illustrate the use of the Windows Sockets API. You have probably already seen one approach, in which a single application uses both a socket for receiving and a socket for sending messages. Another method seems no less attractive: the code for each is clearly separated, even living in different applications. In the end, I implemented this method too; it may be a little easier for beginners to understand. In the second archive...

Without this function send() will produce SOCKET_ERROR!

Finally, some common problems that arise when working with sockets are worth noting. To handle the window message indicating that the state of a socket has changed, we had Windows send messages directly to the main application window, as usual. Another approach is to create a separate window for each socket.

Generally speaking, centralized message processing by the main window seems like an easier-to-understand method, but can still be a hassle in practice. If a program is using more than one socket at the same time, it needs to store a list of socket descriptors. When a message from the sockets appears, the main window procedure in the list looks for information associated with that socket descriptor and sends a state change message further to the procedure intended for this. Which already reacts in one way or another, does something there... This approach forces the processing of network tasks to be integrated into the program core, which makes it difficult to create libraries of network functions. Each time these networking functions are used, additional code must be added to the application's main window handler.
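The bookkeeping described above (a list of socket descriptors, a lookup, and forwarding to the procedure responsible) can be sketched in a few lines. This is a language-neutral illustration written in Java, not the article's MASM code; SocketDispatcher, register and dispatch are hypothetical names, and the integer handle and string event merely stand in for a socket descriptor and an FD_* notification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class SocketDispatcher {
    // Maps a socket descriptor to the procedure that handles its events.
    private final Map<Integer, Consumer<String>> handlers = new HashMap<>();

    void register(int socketHandle, Consumer<String> handler) {
        handlers.put(socketHandle, handler);
    }

    void unregister(int socketHandle) {
        handlers.remove(socketHandle);
    }

    // Plays the role of the main window procedure: looks up the descriptor
    // and forwards the event. Returns false if the descriptor is unknown.
    boolean dispatch(int socketHandle, String event) {
        Consumer<String> handler = handlers.get(socketHandle);
        if (handler == null) {
            return false;
        }
        handler.accept(event);
        return true;
    }

    public static void main(String[] args) {
        SocketDispatcher dispatcher = new SocketDispatcher();
        dispatcher.register(101, event -> System.out.println("socket 101: " + event));
        dispatcher.dispatch(101, "FD_READ");
    }
}
```

Every new kind of socket needs another register call in the central code, which is exactly the coupling the paragraph above complains about.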

In the second method of processing messages, the application creates a hidden window to receive them. This separates the application's main window procedure from the processing of network messages. This approach can simplify the main application and make it easier to reuse existing networking code in other programs. The downside is the excessive use of Windows user memory, because a fairly large amount is reserved for each window created.

Which method to choose is up to you. One more thing. While experimenting, you may need to disable your personal firewall. For example, Outpost Pro 2.1.275 in learning mode responded to an attempt to transfer to the socket, but when the transfer was manually allowed, the data still did not arrive. So much for UDP. Although this may not be the case. There were no problems with my ZoneAlarmPro 5.0.590 in the same situation.

P.S. While finishing the second part of the article, I accidentally came across the source code of a Trojan on the Internet, written in our favorite MASM language. Everything compiles and runs; the only trouble is that the client does not want to connect to the server, and under Windows 2000 SP4 it sometimes crashes with an error saying that the application will be closed and all that... Personally, what I like about this Trojan is that the program does not just keep a log of keystrokes or “rip out” a file with passwords and send it by email, but has a wide range of remotely controlled functions, implemented in a very original way. If we manage to bring this whole business to life, then perhaps a third part will soon appear, devoted to a description of a specific implementation... For those who have carefully read both articles and understood the operation of the socket API functions, there is nothing complicated there. It seems... By the way, the author himself writes in the readme that he wrote it (the Trojan) for educational purposes. Oh well. We will take advantage of that.
DirectOr

Sockets

Socket is one end of a two-way communication channel between two programs running on the network. By connecting two sockets together, you can transfer data between different processes (local or remote). The socket implementation provides encapsulation of network and transport layer protocols.

Sockets were originally developed for UNIX at the University of California, Berkeley. In UNIX, I/O for communication follows the open/read/write/close pattern. Before a resource can be used, it must be opened with the appropriate permissions and other settings. Once a resource is open, data can be read from it or written to it. After using the resource, the user must call the Close() method to signal to the operating system that work with the resource is finished.

When Inter-Process Communication (IPC) and networking facilities were added to the UNIX operating system, the familiar input-output pattern was borrowed. All resources exposed for communication in UNIX and Windows are identified by handles. These descriptors, or handles, can point to a file, memory, or some other communication channel, but actually point to an internal data structure used by the operating system. The socket, being the same kind of resource, is also represented by a descriptor. Therefore, for sockets, the life of a handle can be divided into three phases: open (create) the socket, receive from or send to the socket, and finally close the socket.

The IPC interface for communication between different processes is built on top of I/O methods. They make it easier for sockets to send and receive data. Each target is specified by a socket address, so this address can be specified in the client to establish a connection to the target.

Socket types

There are two main types of sockets - stream sockets and datagram sockets.

Stream sockets

A stream socket is a connection-based socket consisting of a stream of bytes that can be bidirectional, meaning that an application can both send and receive data through this endpoint.

A stream socket ensures error correction, handles delivery, and maintains data consistency. It can be relied upon to deliver data in order and without duplication. A stream socket is also suitable for transferring large amounts of data; for small amounts, the overhead of establishing a separate connection for each message sent may be prohibitive. Stream sockets achieve this level of quality by using the Transmission Control Protocol (TCP). TCP ensures that data reaches the other side in the correct sequence and without errors.

For this type of socket, the path is formed before messages are sent. This ensures that both parties involved in the interaction accept and respond. If an application sends two messages to a recipient, it is guaranteed that the messages will be received in the same sequence.

However, individual messages may be split into packets, and there is no way to determine the boundaries of records. When using TCP, this protocol takes care of breaking the transmitted data into packets of the appropriate size, sending them to the network and reassembling them on the other side. The application only knows that it sends a certain number of bytes to the TCP layer and the other side receives those bytes. In turn, TCP effectively breaks this data into appropriately sized packets, receives these packets on the other side, extracts the data from them, and combines them together.
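Because a byte stream carries no record boundaries, applications that need them usually add their own framing. One common convention, shown here only as a sketch and not as something the text prescribes, is to prefix each message with its length. The class name LengthPrefixedFraming is made up, and an in-memory byte array stands in for the TCP stream so that the example runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LengthPrefixedFraming {
    // Writes each message as a 4-byte big-endian length followed by the payload.
    static byte[] frame(String... messages) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String message : messages) {
                byte[] payload = message.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Recovers the original messages from the concatenated byte stream.
    static List<String> unframe(byte[] stream) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
            List<String> result = new ArrayList<>();
            // available() is a dependable end-of-data check only for this in-memory stream;
            // on a real socket one would read until end of stream is reported instead.
            while (in.available() > 0) {
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload); // blocks until the whole payload has arrived
                result.add(new String(payload, StandardCharsets.UTF_8));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unframe(frame("hello", "world!")));
    }
}
```

The receiver never relies on how the bytes were split into packets; it relies only on the length prefix.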

Streams are based on explicit connections: socket A requests a connection to socket B, and socket B either accepts or rejects the connection request.

If the data must be guaranteed to be delivered to the other side or the size of the data is large, stream sockets are preferable to datagram sockets. Therefore, if reliable communication between two applications is of utmost importance, choose stream sockets.

An email server is an example of an application that must deliver content in the correct order, without duplication or omissions. The stream socket relies on TCP to ensure messages are delivered to their destinations.

Datagram sockets

Datagram sockets are sometimes called connectionless sockets, i.e., no explicit connection is established between them - the message is sent to the specified socket and, accordingly, can be received from the specified socket.

Stream sockets do provide a more reliable method than datagram sockets, but for some applications the overhead associated with establishing an explicit connection is unacceptable (for example, a time of day server providing time synchronization to its clients). After all, establishing a reliable connection to the server takes time, which simply introduces service delays and the server application's task fails. To reduce overhead, you should use datagram sockets.

The use of datagram sockets requires that the transfer of data from the client to the server be handled by User Datagram Protocol (UDP). In this protocol, some restrictions are imposed on the size of messages, and unlike stream sockets, which can reliably send messages to the destination server, datagram sockets do not provide reliability. If the data is lost somewhere on the network, the server will not report errors.

In addition to the two types discussed, there is also a generalized form of sockets, which is called unprocessed or raw.

Raw sockets

The main purpose of using raw sockets is to bypass the mechanism by which the computer handles TCP/IP. This is achieved by providing a special implementation of the TCP/IP stack that overrides the mechanism provided by the TCP/IP stack in the kernel - the packet is passed directly to the application and is therefore processed much more efficiently than when passing through the client's main protocol stack.

By definition, a raw socket is a socket that accepts packets, bypasses the TCP and UDP layers in the TCP/IP stack, and sends them directly to the application.

When using such sockets, the packet does not pass through the TCP/IP filter, i.e. is not processed in any way, and appears in its raw form. In this case, it is the responsibility of the receiving application to properly process all the data and perform actions such as stripping headers and parsing fields - like including a small TCP/IP stack in the application.

However, it is not often that you may need a program that deals with raw sockets. Unless you're writing system software or a packet sniffer-like program, you won't need to go into such detail. Raw sockets are primarily used in the development of specialized low-level protocol applications. For example, various TCP/IP utilities such as trace route, ping, or arp use raw sockets.

Working with raw sockets requires a solid knowledge of the basic TCP/UDP/IP protocols.

Ports

Ports were introduced to solve the problem of interacting with multiple applications simultaneously. Essentially, a port extends the concept of an IP address. A computer running several applications at once can, on receiving a packet from the network, identify the target process by the unique port number specified when the connection was established.

The socket consists of the machine's IP address and the port number used by the TCP application. Because an IP address is unique on the Internet and port numbers are unique on an individual machine, socket numbers are also unique on the entire Internet. This characteristic allows a process to communicate over the network with another process based solely on the socket number.

Some port numbers are reserved for certain services; these are the well-known port numbers, such as port 21, used by FTP. Your application can use any port number that has not been reserved and is not yet in use. The Internet Assigned Numbers Authority (IANA) maintains a list of well-known port numbers.

Typically a client-server application using sockets consists of two different applications - a client initiating a connection to a target (server) and a server waiting for a connection from the client.

For example, on the client side, the application must know the target address and port number. By sending a connection request, the client tries to establish a connection with the server.

If events develop successfully, provided that the server is started before the client attempts to connect to it, the server agrees to the connection. Having given consent, the server application creates a new socket to interact specifically with the client that established the connection.

Now the client and server can interact with each other, reading messages each from their own socket and, accordingly, writing messages.

Working with sockets in .NET

Socket support in .NET is provided by classes in the System.Net.Sockets namespace. Let's start with a brief description of them.

Classes for working with sockets

Class	Description
MulticastOption	The MulticastOption class sets the IP address value for joining or leaving an IP group.
NetworkStream	The NetworkStream class implements the base stream class from which data is sent and received. This is a high-level abstraction that represents a connection to a TCP/IP communication channel.
TcpClient	The TcpClient class builds on the Socket class to provide higher-level TCP services. TcpClient provides several methods for sending and receiving data over the network.
TcpListener	This class also builds on the low-level Socket class. Its main purpose is server applications. It listens for incoming connection requests from clients and notifies the application of any connections.
UdpClient	The UdpClient class provides UDP services. Because UDP is a connectionless protocol, it requires different functionality from the TCP classes.
SocketException	This exception is thrown when an error occurs on the socket.
Socket	The last class in the System.Net.Sockets namespace is the Socket class itself. It provides the basic functionality of a socket application.

Socket class

The Socket class plays an important role in network programming, providing both client and server functionality. Primarily, calls to methods in this class perform necessary security-related checks, including checking security permissions, after which they are forwarded to the methods' counterparts in the Windows Sockets API.

Before turning to an example of using the Socket class, let's look at some important properties and methods of this class:

Properties and methods of the Socket class

Property or method	Description
AddressFamily	Gets the socket address family - a value from the AddressFamily enumeration.
Available	Returns the amount of data available for reading.
Blocking	Gets or sets a value indicating whether the socket is in blocking mode.
Connected	Returns a value indicating whether the socket is connected to the remote host.
LocalEndPoint	Gives the local endpoint.
ProtocolType	Gives the protocol type of the socket.
RemoteEndPoint	Gives the remote socket endpoint.
SocketType	Gives the socket type.
Accept()	Creates a new socket to handle an incoming connection request.
Bind()	Binds a socket to a local endpoint to listen for incoming connection requests.
Close()	Forces the socket to close.
Connect()	Establishes a connection with a remote host.
GetSocketOption()	Returns the SocketOption value.
IOControl()	Sets low-level operating modes for the socket. This method provides low-level access to the underlying Socket class.
Listen()	Places the socket in listening (waiting) mode. This method is for server applications only.
Receive()	Receives data from a connected socket.
Poll()	Determines the status of the socket.
Select()	Checks the status of one or more sockets.
Send()	Sends data to the connected socket.
SetSocketOption()	Sets the socket option.
Shutdown()	Disables sending and receiving operations on the socket.

Hence this protocol is tailored to working with individual documents, mainly text ones. HTTP relies on the capabilities of TCP/IP, so let's look at what Java provides for working with the latter.

In Java, there is a special package, java.net, for this, containing the java.net.Socket class. The name “socket” was chosen by analogy with the sockets on equipment, the very ones that plugs go into. Following this analogy, you can connect two sockets and transfer data between them. Each socket belongs to a specific host. Each host has a unique IP (Internet Protocol) address. At the moment, the Internet operates using the IPv4 protocol, where the IP address is written as four numbers from 0 to 255, for example 127.0.0.1 (read more about the allocation of IP addresses in RFC 790, RFC 1918 and RFC 2365; for IPv6, see RFC 2373)

A socket is attached to a port on the host. A port is designated by a number from 0 to 65535 and logically indicates a place where a socket can be bound. If a port on the host is already occupied by some socket, another socket cannot be bound there. Thus, once the socket is set up, it has a very specific address, written symbolically like this: 127.0.0.1:8888 (meaning that the socket occupies port 8888 on host 127.0.0.1)
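The "one socket per port" rule is easy to observe. The following sketch, with the made-up class name PortInUse, binds a listening socket to a free port on the loopback interface and then tries to bind a second one to the same port. It assumes the usual platform behaviour in which the second bind is rejected with a java.net.BindException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

class PortInUse {
    // Returns true if a second ServerSocket cannot be bound to a port
    // that another ServerSocket already occupies.
    static boolean secondBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket first = new ServerSocket(0, 0, loopback)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port, 0, loopback)) {
                return false; // the port was shared, which is not the expected outcome
            } catch (BindException e) {
                return true; // the port is already taken by "first"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind rejected: " + secondBindFails());
    }
}
```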

To make life easier and avoid using inconvenient IP addresses, the DNS (Domain Name System) was invented. The purpose of this system is to map symbolic names to IP addresses. For example, the address "127.0.0.1" on most computers is associated with the name "localhost".

Localhost, in fact, means the computer itself on which the program is running, it is also the local computer. All work with localhost does not require access to the network and communication with any other hosts.

Client socket

So, let's return to the java.net.Socket class. It is most convenient to initialize it as follows:

public Socket(String host, int port) throws UnknownHostException, IOException

In the host string, you can specify either the server's IP address or its DNS name. The program automatically selects a free port on the local computer and binds your socket to it, after which it attempts to connect to the other socket, whose address is given in the constructor parameters. Two kinds of exception may occur: UnknownHostException, when no computer with that name can be found on the network, and IOException, when a connection to that socket cannot be established.

It is also useful to know the function

public void setSoTimeout(int timeout) throws SocketException

This function sets the timeout for blocking read operations on the socket. If a read blocks for longer than the specified time, a java.net.SocketTimeoutException is thrown; the socket itself remains valid. The time is set in milliseconds; when the timeout is set to 0, reads wait indefinitely.

For some networks, changing the timeout is not possible or is set at certain intervals (for example, from 20 to 100 seconds). If you try to set an invalid timeout, an appropriate exception will be thrown.
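The effect of setSoTimeout can be observed with a small experiment. In the sketch below (ReadTimeoutDemo is a hypothetical name), a client connects to a server socket in the same process, the server side deliberately sends nothing, and the client's read is expected to give up with a SocketTimeoutException after the given number of milliseconds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutDemo {
    // Returns true if a read on a silent connection times out
    // and the client socket is still open afterwards.
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 0, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) { // held open so the peer does not close
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read(); // the server side never writes anything
                return false;
            } catch (SocketTimeoutException e) {
                // Only the read was abandoned; the socket itself stays usable.
                return !client.isClosed();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```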

The program that opens this type of socket will be considered the client, and the program that owns the socket you are trying to connect to will be called the server. In fact, by analogy with a socket-plug, the server program will be the socket, and the client is precisely the plug.

Server socket

I have just described how to establish a connection from a client to a server; now let's see how to make a socket that will serve on the server side. For this purpose, Java has the class java.net.ServerSocket. The most convenient constructor for it is the following:

public ServerSocket(int port, int backlog, InetAddress bindAddr) throws IOException

As you can see, the third parameter is an object of another class, java.net.InetAddress. This class handles DNS names and IP addresses, so in programs the constructor above can be used like this:

new ServerSocket(port, 0, InetAddress.getByName(host))

For this type of socket, the port is specified explicitly, so initialization may throw an exception indicating that the port is already in use or that its use is prohibited by the computer's security policy.

After the socket is set up, the following function is called:

public Socket accept() throws IOException

This function causes the program to wait for a client to connect to the server socket. Once the connection is established, the function returns a Socket object for communicating with the client.

Client-server via sockets. Example

As an example, here is a simple program that implements working with sockets.

On the client side, the program works as follows: the client connects to the server, sends data, then receives data from the server and outputs it.

From the server side it looks like this: the server binds a server socket to port 3128 and then waits for incoming connections. Having accepted a new connection, the server hands it over to a separate thread. In the new thread, the server receives data from the client, prepends the connection sequence number to it, and sends the data back to the client.

Logical structure of the example programs

Simple TCP/IP client program

(SampleClient.java)

import java.io.*;
import java.net.*;

class SampleClient extends Thread {
    public static void main(String[] args) {
        try {
            // open the socket and connect to localhost:3128
            Socket s = new Socket("localhost", 3128);

            // append the address of the open socket and its local port
            // to the first argument given on the command line
            args[0] = args[0] + "\n" + s.getInetAddress().getHostAddress() + ":" + s.getLocalPort();
            // take the output stream and write the result there
            s.getOutputStream().write(args[0].getBytes());

            // read the answer
            byte[] buf = new byte[64 * 1024];
            int r = s.getInputStream().read(buf);
            String data = new String(buf, 0, r);

            // output the response to the console
            System.out.println(data);
        } catch (Exception e) {
            // output exceptions
            System.out.println("init error: " + e);
        }
    }
}

Simple TCP/IP server program

(SampleServer.java)

import java.io.*;
import java.net.*;

class SampleServer extends Thread {
    Socket s;
    int num;

    public static void main(String[] args) {
        try {
            int i = 0; // connection counter

            // bind the socket to localhost, port 3128
            ServerSocket server = new ServerSocket(3128, 0,
                    InetAddress.getByName("localhost"));

            System.out.println("server is started");

            // listen on the port
            while (true) {
                // wait for a new connection, then start processing the client
                // in a new thread and increase the counter by one
                new SampleServer(i, server.accept());
                i++;
            }
        } catch (Exception e) {
            // output exceptions
            System.out.println("init error: " + e);
        }
    }

    public SampleServer(int num, Socket s) {
        // copy the data
        this.num = num;
        this.s = s;

        // and launch a new thread (see run())
        setDaemon(true);
        setPriority(NORM_PRIORITY);
        start();
    }

    public void run() {
        try {
            // take the stream of incoming data from the client socket
            InputStream is = s.getInputStream();
            // and the stream of outgoing data from the server to the client
            OutputStream os = s.getOutputStream();

            // data buffer of 64 kilobytes
            byte[] buf = new byte[64 * 1024];
            // read up to 64 KB from the client; the result is the number of bytes actually received
            int r = is.read(buf);

            // create a string containing the information received from the client
            String data = new String(buf, 0, r);

            // prepend the connection number
            data = "" + num + ": " + "\n" + data;

            // send the data back
            os.write(data.getBytes());

            // end the connection
            s.close();
        } catch (Exception e) {
            // output exceptions
            System.out.println("init error: " + e);
        }
    }
}

After compilation we get the files SampleServer.class and SampleClient.class (all programs here and below were compiled with JDK v1.4). First start the server:

java SampleServer

and then, after waiting for the message "server is started", any number of clients:

java SampleClient test1
java SampleClient test2
...
java SampleClient testN

If, on startup, the server program prints a line like

init error: java.net.BindException: Address already in use: JVM_Bind

instead of "server is started", it means that port 3128 on your computer is already occupied by some program or that its use is prohibited by the security policy.
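The bind failure described above can be reproduced on purpose. The following is only a sketch: the class name PortProbe and its method are hypothetical and not part of the original example, and it uses an ephemeral port (port 0) on the loopback address instead of 3128 so that it does not depend on what is running on the machine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of the original SampleServer example.
class PortProbe {
    // Returns true if a second listener on the same address and port is rejected.
    static boolean secondBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket first = new ServerSocket(0, 0, loopback)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port, 0, loopback)) {
                return false; // unexpected: both listeners were bound
            } catch (BindException e) {
                return true;  // the "Address already in use" case
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind rejected: " + secondBindFails());
    }
}
```

A server that must start reliably could catch BindException separately from other exceptions and report the occupied port, rather than printing the generic "init error" message.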

Notes

Let us note an important feature of the server socket: it can accept connections from several clients at once. In theory the number of simultaneous connections is unlimited, but in practice everything depends on the power of the computers. Incidentally, this limit is exploited in DoS attacks on servers: they are simply bombarded with so many connections that the computers cannot cope with the load and "crash".

In this case, using SampleServer as the example, I show how to process several simultaneous connections: the socket of each new connection is handed to a separate thread for processing.

It is worth mentioning that the Socket - ServerSocket abstraction and work with data streams are used by C/C++, Perl, Python, and many other programming languages and operating system APIs, so much of what has been said is applicable not only to the Java platform.

It's time to use Erlang for its intended purpose - to implement a network service. Most often, such services are made on the basis of a web server, on top of the HTTP protocol. But we will take the level below - TCP and UDP sockets.

I assume you already know how the network works, what Internet Protocol, User Datagram Protocol and Transmission Control Protocol are. This topic is familiar to most programmers. But if for some reason you missed it, you will have to first catch up and then return to this lesson.

UDP socket

Let's recall in general terms what UDP is:

a protocol for transferring short messages (datagrams);
fast delivery;
no persistent connection between client and server, stateless;
message delivery and delivery order are not guaranteed.

To work with UDP, the gen_udp module is used.

Let's launch two nodes and establish communication between them.

On the 1st node, open UDP on port 2000:

1> {ok, Socket} = gen_udp:open(2000, [binary, {active, true}]).
{ok,#Port<0.587>}

Calling gen_udp:open/2, we pass the port number and a list of options. The list of all possible options is quite large, but we are interested in two of them:

binary: the socket is opened in binary mode. Alternatively, the socket can be opened in text mode by specifying the option list. The difference is how the data received from the socket is interpreted: as a byte stream or as text.

{active, true}: the socket is opened in active mode, which means that data arriving on the socket is delivered as messages to the mailbox of the thread that owns the socket. More on this below.

On the 2nd node, open UDP on port 2001:

1> {ok, Socket} = gen_udp:open(2001, [binary, {active, true}]).
{ok,#Port<0.587>}

And we will send a message from the 1st node to the 2nd:

2> gen_udp:send(Socket, {127,0,0,1}, 2001, <<"Hello from 2000">>).
ok

When calling gen_udp:send/4, we pass the socket, the address and port of the recipient, and the message itself.

The address may be a domain name given as a string or an atom, an IPv4 address as a tuple of 4 numbers, or an IPv6 address as a tuple of 8 numbers.

On the 2nd node we will make sure that the message has arrived:

2> flush().
Shell got {udp,#Port<0.587>,{127,0,0,1},2000,<<"Hello from 2000">>}
ok

The message arrives as a tuple {udp, Socket, SenderAddress, SenderPort, Packet}.

Let's send a message from the 2nd node to the 1st:

3> gen_udp:send(Socket, {127,0,0,1}, 2000, <<"Hello from 2001">>).
ok

On the 1st node, we will make sure that the message has arrived:

3> flush().
Shell got {udp,#Port<0.587>,{127,0,0,1},2001,<<"Hello from 2001">>}
ok

As you can see, everything is simple here.
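For comparison with the Java part of this material, the same exchange can be sketched with java.net.DatagramSocket. This is an illustration under assumptions, not a translation of the Erlang session: the class name UdpHello is hypothetical, both sockets live in one process, and ephemeral loopback ports are used instead of 2000 and 2001.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one datagram sent between two loopback sockets.
class UdpHello {
    // Sends `text` from one UDP socket to another and returns the payload as received.
    static String roundTrip(String text) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket sender = new DatagramSocket(0, loopback);
             DatagramSocket receiver = new DatagramSocket(0, loopback)) {
            // UDP gives no delivery guarantee, so never wait forever.
            receiver.setSoTimeout(2000);

            byte[] out = text.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(out, out.length, loopback, receiver.getLocalPort()));

            byte[] buf = new byte[1500];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            receiver.receive(in); // blocks until a datagram arrives or the timeout expires
            return new String(in.getData(), in.getOffset(), in.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello from 2000"));
    }
}
```

Note that receive() here plays the role of a blocking read with a timeout, which is closer to the passive mode described in the next section than to active mode.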

Active and passive socket mode

Both gen_udp and gen_tcp have one important setting: the mode of working with incoming data. This can be either active mode {active, true} or passive mode {active, false}.

In active mode, a thread receives incoming packets as messages in its mailbox. And they can be received and processed by calling receive, like any other messages.

For a UDP socket these are messages like:

{udp, Socket, SenderAddress, SenderPort, Packet}

We have already seen them:

{udp,#Port<0.587>,{127,0,0,1},2001,<<"Hello from 2001">>}

For a TCP socket the messages are similar:

{tcp, Socket, Packet}

Active mode is easy to use, but dangerous: a client can flood the thread's message queue, exhaust memory, and crash the node. Therefore, passive mode is recommended.

In passive mode, the data must be retrieved by calling gen_udp:recv/3 and gen_tcp:recv/3:

gen_udp:recv(Socket, Length, Timeout) -> {ok, {Address, Port, Packet}} | {error, Reason}
gen_tcp:recv(Socket, Length, Timeout) -> {ok, Packet} | {error, Reason}

Here we indicate how many bytes of data we want to read from the socket. If this data is there, then we receive it immediately. If not, the call is blocked until enough data arrives. You can specify Timeout to avoid blocking the thread for a long time.

However, gen_udp:recv ignores the Length argument and returns whatever data is on the socket. Or it blocks and waits for some data if there is nothing on the socket. It is not clear why the Length argument is present in the API at all.

For gen_tcp:recv the Length argument works as expected, unless the option {packet, Size} is specified, which will be discussed below.

There is also the option {active, once}. In this case, the socket starts in active mode, delivers the first data packet as a message, and immediately switches to passive mode.

TCP socket

Let's remember in general terms what TCP is:

a reliable data transfer protocol that guarantees message delivery and delivery order;
a persistent connection between client and server, stateful;
additional overhead for establishing and closing connections and transferring data.

It should be noted that maintaining constant connections with many thousands of clients for a long time is expensive. All connections must work independently of each other, which means in different threads. For many programming languages (but not Erlang) this is a serious problem.

This is why the HTTP protocol is so popular, which, although it works on top of a TCP socket, implies a short interaction time. This allows a relatively small number of threads (tens or hundreds) to serve a significantly larger number of clients (thousands, tens of thousands).

In some cases, there remains a need to have long-lived persistent connections between the client and server. For example, for chats or for multiplayer games. And here Erlang has few competitors.

To work with TCP, the gen_tcp module is used.

Working with a TCP socket is more difficult than working with a UDP socket. We now have client and server roles that require different implementations. Let's consider a server implementation option.

-module(server).

-export([start/0, start/1, server/1, accept/2]).

start() ->
    start(1234).

start(Port) ->
    spawn(?MODULE, server, [Port]),
    ok.

server(Port) ->
    io:format("start server at port ~p~n", [Port]),
    {ok, ListenSocket} = gen_tcp:listen(Port, [binary, {active, true}]),
    %% a pool of 5 threads, each waiting for a client
    [spawn(?MODULE, accept, [Id, ListenSocket]) || Id <- lists:seq(1, 5)],
    timer:sleep(infinity),
    ok.

accept(Id, ListenSocket) ->
    io:format("Socket #~p wait for client~n", [Id]),
    {ok, _Socket} = gen_tcp:accept(ListenSocket),
    io:format("Socket #~p, session started~n", [Id]),
    handle_connection(Id, ListenSocket).

handle_connection(Id, ListenSocket) ->
    receive
        {tcp, Socket, Msg} ->
            io:format("Socket #~p got message: ~p~n", [Id, Msg]),
            gen_tcp:send(Socket, Msg),
            handle_connection(Id, ListenSocket);
        {tcp_closed, _Socket} ->
            io:format("Socket #~p, session closed ~n", [Id]),
            accept(Id, ListenSocket)
    end.

There are two types of socket: the Listen Socket and the Accept Socket. There is only one Listen Socket, and it accepts all connection requests. You need many Accept Sockets, one for each connection. The thread that creates a socket becomes its owner. If the owner thread exits, the socket is closed automatically. Therefore, we create a separate thread for each socket.

The Listen Socket must always be running, and for that its owner thread must not terminate. Therefore in server/1 we added a call to timer:sleep(infinity). This blocks the thread and prevents it from finishing. This implementation is, of course, educational. It would be good to provide a way to stop the server correctly, but that is not possible here.

The Accept Socket and the thread for it could be created dynamically as clients appear. First, you can create one such thread and call gen_tcp:accept/1 and wait for the client. This call is blocking. It ends when the client appears. Then you can serve the current client in this thread, and create a new thread waiting for a new client.

But here we have a different implementation. We create a pool of several threads in advance, and they all wait for clients. After finishing work with one client, the socket is not closed, but waits for a new one. So, instead of constantly opening new sockets and closing old ones, we use a pool of long-lived sockets.

This is more efficient when there are many clients. Firstly, because we accept connections faster. Secondly, because we manage sockets more carefully as a system resource.

Threads belong to the Erlang node, and we can create as many of them as we like. But sockets belong to the operating system. Their number is limited, although the limit is quite large. (This refers to the limit on the number of file descriptors that the operating system allows a user process to open, usually 2^10 to 2^16.)

Our pool is toy-sized: 5 thread-socket pairs. In reality, we need a pool of several hundred such pairs. It would also be nice to be able to grow and shrink this pool at runtime in order to adapt to the current load.
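The pre-created pool of acceptors can also be sketched in Java, the language of the first part of this material. This is an analogy under assumptions rather than a port of the Erlang module: the names AcceptorPool, start and echoOnce are hypothetical, operating system threads stand in for Erlang's lightweight threads, and an ephemeral loopback port replaces 1234.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a fixed pool of threads that all accept on one listening socket.
class AcceptorPool {
    // Starts `size` daemon threads sharing one listener and returns that listener.
    static ServerSocket start(int size) {
        final ServerSocket listener;
        try {
            listener = new ServerSocket(0, 0, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (int id = 1; id <= size; id++) {
            Thread worker = new Thread(() -> acceptLoop(listener), "acceptor-" + id);
            worker.setDaemon(true);
            worker.start();
        }
        return listener;
    }

    // Serves one client at a time, then goes back to waiting for the next one.
    private static void acceptLoop(ServerSocket listener) {
        while (!listener.isClosed()) {
            try (Socket client = listener.accept()) {
                echo(client);
            } catch (IOException e) {
                // accept() fails once the listener is closed; a per-client error ends only that session
            }
        }
    }

    // Echo service: send every received byte back until the client closes its side.
    private static void echo(Socket client) throws IOException {
        InputStream in = client.getInputStream();
        OutputStream out = client.getOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Test client: sends `text`, half-closes the connection, returns everything echoed back.
    static String echoOnce(ServerSocket listener, String text) {
        try (Socket s = new Socket(listener.getInetAddress(), listener.getLocalPort())) {
            s.getOutputStream().write(text.getBytes(StandardCharsets.UTF_8));
            s.shutdownOutput(); // lets the server's read() return -1
            return new String(s.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a pool of five, a sixth simultaneous client is not refused; its connection waits in the listener's backlog until one of the acceptor threads becomes free.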

The current session with the client is handled in the function handle_connection/2. You can see that the socket is in active mode and the thread receives messages of the form {tcp, Socket, Msg}, where Msg is the binary data coming from the client. We send this data back to the client, that is, we implement a trivial echo service :)

When the client closes the connection, the thread receives the message {tcp_closed, _Socket}, returns to accept/2, and waits for the next client.

This is what the operation of such a server with two telnet clients looks like:

$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello from client 1
hello from client 1
some message from client 1
some message from client 1
new message from client 1
new message from client 1
client 1 is going to close connection
client 1 is going to close connection
^]
telnet> quit
Connection closed.

$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello from client 2
hello from client 2
message from client 2
message from client 2
client 2 is still active
client 2 is still active
but client 2 is still active
but client 2 is still active
and now client 2 is going to close connection
and now client 2 is going to close connection
^]
telnet> quit
Connection closed.

2> server:start().
start server at port 1234
ok
Socket #1 wait for client
Socket #2 wait for client
Socket #3 wait for client
Socket #4 wait for client
Socket #5 wait for client
Socket #1, session started
Socket #1 got message: <<"hello from client 1\r\n">>
Socket #1 got message: <<"some message from client 1\r\n">>
Socket #2, session started
Socket #2 got message: <<"hello from client 2\r\n">>
Socket #2 got message: <<"message from client 2\r\n">>
Socket #1 got message: <<"new message from client 1\r\n">>
Socket #2 got message: <<"client 2 is still active\r\n">>
Socket #1 got message: <<"client 1 is going to close connection\r\n">>
Socket #1, session closed
Socket #1 wait for client
Socket #2 got message: <<"but client 2 is still active\r\n">>
Socket #2 got message: <<"and now client 2 is going to close connection\r\n">>
Socket #2, session closed
Socket #2 wait for client

Server in passive mode

This is all fine, but a good server should operate in passive mode. That is, it should receive data from the client not as messages in the mailbox, but by calling gen_tcp:recv/2,3.

The nuance is that here we need to indicate how much data we want to read. How can the server know how much data the client sent it? Well, apparently, the client himself must say how much data he is going to send. To do this, the client first sends a small service packet, in which it indicates the size of its data, and then sends the data itself.

Now we need to decide how many bytes this service packet should occupy. If it is 1 byte, then you cannot pack a number larger than 255 into it. You can pack the number 65535 into 2 bytes, and 4294967295 into 4 bytes. 1 byte is obviously not enough. It is likely that the client will need to send more than 255 bytes of data. A 2 byte header is fine. A 4-byte header is sometimes needed.

So, the client sends a 2-byte service packet indicating how much data will follow it, and then the data itself:

Msg = <<"Hello">>,
Size = byte_size(Msg),
Header = <<Size:16/integer>>,
gen_tcp:send(Socket, <<Header/binary, Msg/binary>>),

Full client code:

-module(client2).

-export([start/0, start/2, send/2, stop/1, client/2]).

start() ->
    start("localhost", 1234).

start(Host, Port) ->
    spawn(?MODULE, client, [Host, Port]).

send(Pid, Msg) ->
    Pid ! {send, Msg},
    ok.

stop(Pid) ->
    Pid ! stop,
    ok.

client(Host, Port) ->
    io:format("Client ~p connects to ~p:~p~n", [self(), Host, Port]),
    {ok, Socket} = gen_tcp:connect(Host, Port, [binary, {active, true}]),
    loop(Socket).

loop(Socket) ->
    receive
        {send, Msg} ->
            io:format("Client ~p send ~p~n", [self(), Msg]),
            %% 2-byte header with the payload size, then the payload itself
            Size = byte_size(Msg),
            Header = <<Size:16/integer>>,
            gen_tcp:send(Socket, <<Header/binary, Msg/binary>>),
            loop(Socket);
        {tcp, Socket, Msg} ->
            io:format("Client ~p got message: ~p~n", [self(), Msg]),
            loop(Socket);
        stop ->
            io:format("Client ~p closes connection and stops~n", [self()]),
            gen_tcp:close(Socket)
    after 200 ->
            loop(Socket)
    end.

The server first reads 2 bytes, determines the size of the data, and then reads all the data:

{ok, Header} = gen_tcp:recv(Socket, 2),
<<Size:16/integer>> = Header,
{ok, Msg} = gen_tcp:recv(Socket, Size),
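The same framing can be sketched in Java with DataOutputStream and DataInputStream, which write and read 16-bit values in big-endian order, the same byte order Erlang uses by default for an integer segment. The class name Framing and its two methods are hypothetical helpers for illustration; they work on in-memory streams, but the streams of a Socket could be wrapped in the same way.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: 2-byte length header followed by the payload.
class Framing {
    // Prefixes the payload with its length as a 2-byte big-endian unsigned integer.
    static byte[] encode(byte[] payload) {
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("payload does not fit a 2-byte header");
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(payload.length); // low 16 bits, high byte first
            out.write(payload);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one frame: 2 header bytes, then exactly that many payload bytes.
    static byte[] decode(DataInputStream in) {
        try {
            int size = in.readUnsignedShort();
            byte[] payload = new byte[size];
            in.readFully(payload); // blocks until the whole payload has arrived
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the message "Hello" the encoded frame is 7 bytes long: the header bytes 0 and 5, then the five payload bytes.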

In the server code the functions start/0 and start/1 have not changed; the rest has changed a little:

server(Port) ->
    io:format("start server at port ~p~n", [Port]),
    {ok, ListenSocket} = gen_tcp:listen(Port, [binary, {active, false}]),
    [spawn(?MODULE, accept, [Id, ListenSocket]) || Id <- lists:seq(1, 5)],
    timer:sleep(infinity),
    ok.

accept(Id, ListenSocket) ->
    io:format("Socket #~p wait for client~n", [Id]),
    {ok, Socket} = gen_tcp:accept(ListenSocket),
    io:format("Socket #~p, session started~n", [Id]),
    handle_connection(Id, ListenSocket, Socket).

handle_connection(Id, ListenSocket, Socket) ->
    %% read the 2-byte header first, then exactly Size bytes of data
    case gen_tcp:recv(Socket, 2) of
        {ok, Header} ->
            <<Size:16/integer>> = Header,
            {ok, Msg} = gen_tcp:recv(Socket, Size),
            io:format("Socket #~p got message: ~p~n", [Id, Msg]),
            gen_tcp:send(Socket, Msg),
            handle_connection(Id, ListenSocket, Socket);
        {error, closed} ->
            io:format("Socket #~p, session closed ~n", [Id]),
            accept(Id, ListenSocket)
    end.

An example of a session from the client side:

2> Pid = client2:start().
Client <0.40.0> connects to "localhost":1234
<0.40.0>
3> client2:send(Pid, <<"Hello">>).
Client <0.40.0> send <<"Hello">>
ok
Client <0.40.0> got message: <<"Hello">>
4> client2:send(Pid, <<"Hello again">>).
Client <0.40.0> send <<"Hello again">>
ok
Client <0.40.0> got message: <<"Hello again">>
5> client2:stop(Pid).
Client <0.40.0> closes connection and stops
ok

And from the server side:

2> server2:start().
start server at port 1234
ok
Socket #1 wait for client
Socket #2 wait for client
Socket #3 wait for client
Socket #4 wait for client
Socket #5 wait for client
Socket #1, session started
Socket #1 got message: <<"Hello">>
Socket #1 got message: <<"Hello again">>
Socket #1, session closed
Socket #1 wait for client

This is all well and good, but there is really no need to handle the header packet manually. This is already implemented in gen_tcp. You need to specify the size of the service packet in the options when opening the socket on the client side:

{ok, Socket} = gen_tcp:connect(Host, Port, [binary, {active, true}, {packet, 2}]),

and on the server side:

{ok, ListenSocket} = gen_tcp:listen(Port, [binary, {active, false}, {packet, 2}]),

and the need to form and parse these headers yourself disappears.

On the client side, sending is simplified:

gen_tcp:send(Socket, Msg),

and on the server side receiving is simplified:

handle_connection(Id, ListenSocket, Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Msg} ->
            io:format("Socket #~p got message: ~p~n", [Id, Msg]),
            gen_tcp:send(Socket, Msg),
            handle_connection(Id, ListenSocket, Socket);
        {error, closed} ->
            io:format("Socket #~p, session closed ~n", [Id]),
            accept(Id, ListenSocket)
    end.

Now, when calling gen_tcp:recv/2, we specify Length = 0. gen_tcp itself knows how many bytes need to be read from the socket.

Working with text protocols

In addition to the service header option, there is another approach. You can read from the socket one byte at a time until a special byte marking the end of the packet is encountered. This can be a null byte or a newline character.

This option is typical for text protocols (SMTP, POP3, FTP).

There is no need to write your own implementation of reading from the socket; everything is already implemented in gen_tcp. You just need to specify the option {packet, line} in the socket settings instead of {packet, 2}.

{ok, ListenSocket} = gen_tcp:listen(Port, [binary, {active, false}, {packet, line}]),

Otherwise, the server code remains unchanged. But now we can return to the telnet client again.

$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
hello
hello again
hello again
^]
telnet> quit
Connection closed.
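Line-delimited framing has a direct counterpart in Java: BufferedReader.readLine. The sketch below is an illustration with a hypothetical class name, LineFraming, and reads from any Reader, so it can be tried on an in-memory string or on an InputStreamReader wrapped around a socket's input stream. One difference is worth noting as an assumption to verify against the gen_tcp documentation: readLine strips the line terminator, whereas the telnet server log shown earlier suggests Erlang delivers the line together with its "\r\n".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a character stream into newline-terminated messages.
class LineFraming {
    // Returns every message in `source`; "\n" and "\r\n" both end a message.
    static List<String> readMessages(Reader source) {
        List<String> messages = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                messages.add(line); // the terminator is not included
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return messages;
    }
}
```

Unlike the length header, this scheme only works when the payload itself can never contain the delimiter, which is why it suits text protocols.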

We will need a TCP server, a text protocol and a telnet client in our course work.


Summary

The TCP socket is an instance of an endpoint, defined by an IP address and port in the context of a specific TCP connection or listening state.

Port is the virtualization identifier, which specifies the endpoint of the service (as opposed to the endpoint of the service instance or its session identifier).

TCP socket is not a connection, this is the endpoint of a specific connection.

There may be concurrent connections to the service endpoint, since the connection is identified by both local and remote endpoints, allowing traffic to be routed to a specific service instance.

There can only be one listener socket for a given address and port combination.

Exposition

This was an interesting question, and it made me reconsider some things I thought I knew inside out. You'd think a name like "socket" would be self-explanatory: it was apparently chosen to evoke the image of an endpoint into which you plug a network cable, and there are strong functional parallels. However, in the language of networking, the word "socket" carries so much baggage that a careful re-examination is necessary.

In its broadest sense, a port is a point of entry or exit. The French word porte, although not used in a networking context, literally means door or gateway, further emphasizing the fact that ports are transport endpoints, whether you're sending data or large steel containers.

For the purposes of this discussion, I will limit my consideration to the context of TCP-IP networks. The OSI model is all very well, but it has never been fully implemented, much less widely deployed in high-traffic, high-stress conditions.

The combination of an IP address and a port is strictly known as an endpoint and is sometimes called a socket. This usage originates with RFC793, the original TCP specification.

A TCP connection is defined by two endpoints aka sockets.

An endpoint (socket) is determined by a combination of a network address and a port ID. Note that the address/port does not fully identify the socket (more on this later).

The purpose of ports is to distinguish between multiple endpoints on a given network address. We can say that a port is a virtualized endpoint. This virtualization makes multiple concurrent connections possible on a single network interface.

It is the socket pair (the 4-tuple consisting of the client IP address, client port number, server IP address, and server port number) that specifies the two endpoints that uniquely identify each TCP connection in an internet. (TCP-IP Illustrated Volume 1, W. Richard Stevens)

In most C-derived languages, TCP connections are established and handled using methods on an instance of a Socket class. Although it is common to work at a higher level of abstraction, usually an instance of a NetworkStream class, that typically exposes a reference to a socket object. To the programmer, this socket object appears to represent the connection, because the connection is created and managed using the socket object's methods.

In C#, to establish a TCP connection (to an existing listener), you first create a TcpClient. If you don't specify an endpoint for the TcpClient constructor, it uses the default values - whichever way the local endpoint is determined. Then you call the Connect method on the created instance. This method requires a parameter that describes the other endpoint.

This is all a bit confusing and leads you to believe that a socket is a connection, which is nonsense. I was labouring under this misunderstanding until Richard Dorman asked the question.

Having done a lot of reading and thinking, I'm now convinced that it would make much more sense to have a TcpConnection class with a constructor that takes two arguments: LocalEndpoint and RemoteEndpoint. You could probably support a single RemoteEndpoint argument when default values for the local endpoint are acceptable. This is ambiguous on multi-homed machines, but the ambiguity can be resolved using the routing table by selecting the interface with the shortest route to the remote endpoint.

Clarity would be enhanced in other respects as well. A socket is not identified by the combination of IP address and port:

[...] TCP demultiplexes incoming segments using all four values that make up the local and foreign addresses: destination IP address, destination port number, source IP address, and source port number. TCP cannot determine which process gets an incoming segment by looking at the destination port only. Also, the only one of the [various] endpoints at [a given port number] that will receive incoming connection requests is the one in the listen state. (p255, TCP-IP Illustrated Volume 1, W. Richard Stevens)

As you can see, it's not just possible but quite likely for a network service to have many sockets with the same address/port, but only one listener socket on a particular address/port combination. Typical library implementations present a socket class, an instance of which is used to create and manage a connection. This is extremely unfortunate, since it causes confusion and has led to widespread conflation of the two concepts.

Khagrawal doesn't believe me (see comments), so here's a real sample. I connected a web browser to http://dilbert.com and then ran netstat -an -p tcp. The last six lines of output contain two examples of the fact that address and port are not enough to uniquely identify a socket. There are two distinct connections between 192.168.1.3 (my workstation) and 54.252.94.236:80

TCP 192.168.1.3:63240 54.252.94.236:80 SYN_SENT
TCP 192.168.1.3:63241 54.252.94.236:80 SYN_SENT
TCP 192.168.1.3:63242 207.38.110.62:80 SYN_SENT
TCP 192.168.1.3:63243 207.38.110.62:80 SYN_SENT
TCP 192.168.1.3:64161 65.54.225.168:443 ESTABLISHED

Since the socket is the endpoint of the connection, there are two sockets with the address/port combination 207.38.110.62:80 and two more with the address/port combination 54.252.94.236:80 .

I think Khagrawal's misunderstanding arises from my very careful use of the word "identifies". I mean "completely, unambiguously and uniquely identifies". In the example above, there are two endpoints with the address/port combination 54.252.94.236:80. If all you have is the address and port, you don't have enough information to tell these sockets apart. It is not enough information to identify a socket.
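The same observation can be reproduced locally without netstat. The sketch below is an illustration with a hypothetical class name, SharedPortDemo: it opens one listener on an ephemeral loopback port, connects two clients to it, and reports the ports of the two accepted sockets. Both accepted sockets have the listener's local port; only the remote (client) port tells them apart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo: two server-side sockets sharing one local address/port.
class SharedPortDemo {
    // Returns {listenerPort, acceptedLocalPort1, acceptedLocalPort2, remotePort1, remotePort2}.
    static int[] ports() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 0, loopback);
             // The kernel completes both handshakes into the backlog before accept() is called.
             Socket client1 = new Socket(loopback, listener.getLocalPort());
             Socket client2 = new Socket(loopback, listener.getLocalPort());
             Socket accepted1 = listener.accept();
             Socket accepted2 = listener.accept()) {
            return new int[] {
                listener.getLocalPort(),
                accepted1.getLocalPort(), accepted2.getLocalPort(),
                accepted1.getPort(), accepted2.getPort()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] p = ports();
        System.out.println("listener port:        " + p[0]);
        System.out.println("accepted local ports: " + p[1] + ", " + p[2]);
        System.out.println("remote ports:         " + p[3] + ", " + p[4]);
    }
}
```

The first three numbers are equal and the last two differ, which is the 4-tuple demultiplexing from the Stevens quotation seen from the server side.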

Addition

Paragraph two of section 2.7 of RFC793 states:

A connection is fully specified by the pair of sockets at the ends. A local socket may participate in many connections to different foreign sockets.

This definition of socket is not helpful from a programming perspective, because it is not the same as a socket object, which is the endpoint of a particular connection. To a programmer, and most of this audience are programmers, this is a vital functional difference.

Links

TCP-IP Illustrated Volume 1 Protocols, W. Richard Stevens, 1994 Addison Wesley

Socket represents a single connection between two network applications. The two applications nominally run on different computers, but sockets can also be used for interprocess communication on the same computer. Applications can create multiple sockets to communicate with each other. Sockets are bidirectional, meaning that both sides of the connection are capable of sending and receiving data. Therefore, a socket can be created theoretically at any level of the OSI model from level 2 up. Programmers often use sockets in network programming, albeit indirectly. Programming libraries such as Winsock hide many of the low-level details of socket programming. Sockets have been widely used since the early 1980s.

Port represents an endpoint or "channel" for network communication. Port numbers allow different applications on the same computer to use network resources without interfering with each other. Port numbers are most common in network programming, especially in socket programming. Sometimes, however, port numbers become visible to the casual user. For example, some websites that a person visits on the Internet use the following URL:

An analogy

Although a lot of technical detail about sockets has already been given, I'd like to add my answer in case anyone still can't feel the difference between IP, port and socket

Consider server S,

and let's say person X, Y, Z need a service (let's say a chat service) from this server S

IP address says → Who? that chat server "S" that X, Y, Z wants to contact

ok, you got "who is the server"

but suppose server "S" also provides some other services to other people, say "S" provides storage services to persons A, B, C

port says → which? The service that you (X, Y, Z) need, i.e. the chat service, not the storage service

ok... you let the server know that you need the chat service and not storage

there are three of you, and the server may want to identify all three separately

here comes the socket

Now socket says → which one? The specific connection

that is, let's say

socket 1 for person X

socket 2 for person Y

and socket 3 for person Z

I hope this helps someone who was still confused :)

First, I think we should start with a little understanding of what it takes to get a packet from A to B.

A common way to describe a network is the OSI model, which divides the network into a number of layers according to purpose. There are a few important ones that we will cover here:

Data link layer. This layer is responsible for getting packets of data from one network device to another and sits just above the layer that actually does the transmitting. It talks about MAC addresses and knows how to find hosts based on their MAC (hardware) address, but nothing more.
Network layer. This layer allows data to be transported across machines and over physical boundaries, such as physical devices. The network layer must essentially support an additional address-based mechanism that relates somehow to the physical address; enter the IP address (IPv4). An IP address can get your packet from A to B over the Internet, but knows nothing about how to traverse individual hops. This is handled by the layer above according to the routing information.
Transport layer. This layer is responsible for defining how information gets from A to B and any restrictions, checks, or errors on that behaviour. For example, TCP adds additional information to a packet so that it is possible to deduce whether packets have been lost.

TCP contains, among other things, the concept of ports. These are effectively different data endpoints on the same IP address that an Internet socket (AF_INET) can bind to.

Short answer.

A port can be described as an internal address within a host that identifies a program or process.

A socket can be described as a programming interface that allows a program to communicate with other programs or processes, over the network or locally.

Typically you will get a lot of theory, but one of the simplest ways to distinguish between these two concepts is as follows:

To get a service you need a service number. This service number is called a port. Simple as that.

For example, HTTP as a service runs on port 80.

Now many people can request the service and the client-server connection is established. There will be many connections. Each connection represents a client. To support each connection, the server creates a socket for each connection to support its client.

There seem to be a lot of answers equating a socket with a connection between two PCs. I think this is absolutely incorrect. A socket has always been an endpoint on one PC, which may or may not be connected; surely we've all used listener or UDP sockets* at some point. The important part is that it is addressable and active. Sending a message to 1.1.1.1:1234 is unlikely to work, since there is no socket defined for that endpoint.

Sockets are protocol specific, so the implementation of uniqueness that both TCP/IP and UDP/IP use* (ipaddress:port) is different from, for example, IPX (Network, Node and... ahem, socket, but a different socket than is meant by the general term "socket". IPX socket numbers are equivalent to IP ports). But they all offer a unique addressable endpoint.

Since IP has become the dominant protocol, a port (in networking terms) has become synonymous with either a UDP or TCP port number, which is a portion of the socket address.

UDP is connectionless, meaning that no virtual circuit between the two endpoints is ever created. However, we still refer to UDP sockets as the endpoint. The API functions make it clear that both are simply different types of sockets. SOCK_DGRAM is UDP (just sending a message) and SOCK_STREAM is TCP (creating a virtual circuit).

Technically, the IP header contains the IP address, and the header of the protocol on top of IP (UDP or TCP) contains the port number. This makes it possible to have other protocols (such as ICMP) that have no port numbers but do carry IP address information.
These are terms from two different domains: "port" is a concept from TCP/IP networking, while "socket" is an API (programming) concept. A "socket" is created (in code) by taking a port and a hostname or network adapter and combining them into a data structure that you can use to send or receive data.
TCP/IP connections are bidirectional paths connecting one address:port combination with another address:port combination. So whenever you open a connection from your local machine to a port on a remote server (e.g. www.google.com:80), you also associate a new port number on your machine with the connection, so that the server can send things back to you (e.g. 127.0.0.1:65234). It can be useful to use netstat to look at your machine's connections:
> netstat -nWp tcp (on OS X)
Active Internet connections
Proto Recv-Q Send-Q  Local Address        Foreign Address      (state)
tcp4       0      0  192.168.0.6.49871    17.172.232.57.5223   ESTABLISHED
...
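The same local/foreign address pairs can be read from code. The sketch below is only an illustration: it connects to a throwaway listener on 127.0.0.1 (an assumption, so that nothing outside the machine is contacted) and prints the port the OS assigned to the client side.

```python
import socket

# A throwaway local listener stands in for the remote server (demo assumption).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
listener_addr = listener.getsockname()

client = socket.create_connection(listener_addr)
server_side, _ = listener.accept()

local_addr = client.getsockname()    # address:port the OS picked for our end
remote_addr = client.getpeername()   # address:port we connected to
seen = server_side.getpeername()     # what the server sees as the client

print("client local  :", local_addr)
print("client remote :", remote_addr)
print("server sees   :", seen)

for s in (client, server_side, listener):
    s.close()
```

The client never asked for a particular local port; the OS associated one with the connection, and that is the address the server replies to.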
A socket is a special type of file descriptor that is used by a process to request network services from the operating system. The socket address is a triple: (protocol, local-address, local-process), where the local process is identified by the port number.

In the TCP/IP suite, for example:

(tcp, 193.44.234.3, 12345)

A conversation is the communication link between two processes, and thus represents the connection between them. An association is the 5-tuple that completely specifies the two processes that make up a connection: (protocol, local-address, local-process, foreign-address, foreign-process)

In the TCP/IP suite, for example:

(tcp, 193.44.234.3, 1500, 193.44.234.5, 21)

may be a valid association.

A half-association is either

(protocol, local-address, local-process)

or

(protocol, foreign-address, foreign-process)

which specify each half of a connection.
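The relationship between an association and its two halves can be written down as plain data. This is a sketch of the terminology only; the class names Association and HalfAssociation and the methods halves and mirrored are invented for the illustration, and the addresses are the ones from the example above.

```python
from typing import NamedTuple

class HalfAssociation(NamedTuple):
    """(protocol, address, process), where the process is identified by a port number."""
    protocol: str
    address: str
    port: int

class Association(NamedTuple):
    """(protocol, local-address, local-process, foreign-address, foreign-process)."""
    protocol: str
    local_address: str
    local_port: int
    foreign_address: str
    foreign_port: int

    def halves(self):
        """Split the 5-tuple into its local and foreign halves."""
        local = HalfAssociation(self.protocol, self.local_address, self.local_port)
        foreign = HalfAssociation(self.protocol, self.foreign_address, self.foreign_port)
        return local, foreign

    def mirrored(self):
        """The same association as seen from the other host."""
        return Association(self.protocol, self.foreign_address, self.foreign_port,
                           self.local_address, self.local_port)

assoc = Association("tcp", "193.44.234.3", 1500, "193.44.234.5", 21)
local_half, foreign_half = assoc.halves()
print(local_half)
print(foreign_half)
```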

A half-association is also called a socket or transport address. That is, a socket is an endpoint for communication that can be named and addressed on a network. The socket interface is one of several application programming interfaces (APIs) for communication protocols. Designed as a general-purpose communication programming interface, it was first introduced in the 4.2BSD UNIX system. Although it has not been standardized, it has become a de facto industry standard.
The port was the easy part: it is simply a unique identifier for the socket. A socket is something that processes can use to establish connections and communicate with each other. Tall Jeff had a great phone analogy that wasn't perfect, so I decided to fix it:
An application consists of a pair of processes that communicate over a network (a client-server pair). These processes send and receive messages to and from the network through a programming interface called a socket. Consider the analogy presented in the book "Computer Networking: A Top-Down Approach": there is a house that wants to communicate with another house. Here the house is analogous to the process and the door to the socket. The sending process assumes that on the other side of the door there is an infrastructure that will carry the data to its destination. Once the message arrives at the other side, it passes through the receiver's door (socket) into the house (process). This illustration from the same book may help you:
Sockets are part of the transport layer, which provides logical communication to applications. This means that, from the application's point of view, the two hosts are directly connected to each other, even though there may be many routers and/or switches between them. So the socket is not the connection itself; it is the endpoint of the connection. Transport-layer protocols are implemented only on hosts, not on intermediate routers.
Ports provide a means of internal addressing within a machine. Their main purpose is to allow multiple processes to send and receive data over the network without interfering with other processes (and their data). Every socket is given a port number. When a segment arrives at a host, the transport layer examines the segment's destination port number and then passes the segment to the corresponding socket. This job of delivering the data in a transport-layer segment to the correct socket is called demultiplexing. The segment's data is then passed to the process attached to the socket.
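Demultiplexing by destination port can be sketched as a toy model in Python. Nothing here talks to a real network stack: Segment, ToySocket and ToyTransportLayer are hypothetical names, and the model only shows the lookup step described above.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    dest_port: int
    payload: bytes

@dataclass
class ToySocket:
    port: int
    inbox: list = field(default_factory=list)  # data waiting for the attached process

class ToyTransportLayer:
    """Toy model: maps destination port numbers to sockets."""

    def __init__(self):
        self.sockets = {}

    def open_socket(self, port: int) -> ToySocket:
        if port in self.sockets:
            raise OSError(f"port {port} is already in use")
        sock = ToySocket(port)
        self.sockets[port] = sock
        return sock

    def deliver(self, segment: Segment) -> bool:
        """Demultiplex: look at the destination port and hand over the payload."""
        sock = self.sockets.get(segment.dest_port)
        if sock is None:
            return False  # no socket for this port, the segment is dropped
        sock.inbox.append(segment.payload)
        return True

layer = ToyTransportLayer()
web = layer.open_socket(80)
ftp = layer.open_socket(21)
layer.deliver(Segment(80, b"GET /"))
layer.deliver(Segment(21, b"USER anonymous"))
dropped = not layer.deliver(Segment(9999, b"?"))
print(web.inbox, ftp.inbox, dropped)
```

A real TCP implementation looks up connected sockets by the full source/destination address and port combination, not by destination port alone; the toy keeps only the part the paragraph describes.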
A socket is a structure in your software. It is more or less a file; it has operations such as read and write. It is not a physical thing; it is a way for your software to refer to physical things.

A port is a device-like thing. Each host is on one or more networks (those are physical), and the host has an address on each network. Each address can have thousands of ports.

Normally only one socket can be bound to a given port at a given address. A socket allocates a port in much the same way as a device is allocated for file system I/O. Once the port is allocated, no other socket can bind to that port. The port is released when the socket is closed.
A socket is one endpoint of a two-way communication link between two programs running on a network. The socket is bound to a port number so that the TCP layer can identify the application for which the data is destined.
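The allocation behaviour can be checked with a small experiment. This sketch assumes default socket options (no SO_REUSEADDR or SO_REUSEPORT) on the loopback address; the variable names are made up for the example.

```python
import errno
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))      # the OS picks a free port for the demo
first.listen(1)
port = first.getsockname()[1]

# A second socket tries to take the same address:port while the first holds it.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    second_bind_error = None
except OSError as exc:
    second_bind_error = exc.errno  # expected: "address already in use"
finally:
    second.close()

# Closing the first socket releases the port again.
first.close()
third = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    third.bind(("127.0.0.1", port))
    rebind_ok = True
except OSError:
    rebind_ok = False
finally:
    third.close()

print(second_bind_error == errno.EADDRINUSE, rebind_ok)
```

Note that this is about binding. Sockets returned by accept() on a listening socket all share the listening port without any conflict.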
I assume the question is about TCP/IP terminology. In layman's terms:

A PORT is like the telephone number of a specific house in a specific zip code. The zip code of the town can be thought of as the IP address of the town and of all the houses in it.

A SOCKET, on the other hand, is more like an established phone call between the telephones of a pair of houses talking to each other. Such calls can be established between houses in the same town or between two houses in different towns. It is that temporary, established path between two phones talking to each other that is the SOCKET.
The port and socket can be compared to a bank branch.

The bank's building number is analogous to the IP address. The bank has various departments, such as:
1. Savings account department
2. Personal loan department
3. Mortgage lending department
4. Complaints department
Thus, 1 (savings account department), 2 (personal loan department), 3 (mortgage lending department) and 4 (complaints department) are ports.

Now suppose you want to open a savings account. You go to the bank (IP address), then to the "savings account department" (port number 1), and there you meet one of the employees working in the savings account department. Let's call him SAVINGACCOUNT_EMPLOYEE1; he opens the account for you.

SAVINGACCOUNT_EMPLOYEE1 is your socket descriptor, and there can be employees from SAVINGACCOUNT_EMPLOYEE1 to SAVINGACCOUNT_EMPLOYEEN. These are all socket descriptors.

Likewise, the other departments have employees working in them, and these are analogous to sockets.
A socket is an endpoint of communication. A socket is not directly tied to the TCP/IP protocol family; it can be used with any protocol your system supports. The C socket API expects you to first obtain a blank socket object from the system, which you can then either bind to a local socket address (to directly receive incoming traffic for connectionless protocols, or to accept incoming connection requests for connection-oriented protocols) or connect to a remote socket address (for either kind of protocol). You can even do both if you want to control both the local socket address the socket is bound to and the remote socket address the socket is connected to.

For connectionless protocols, connecting the socket is not even necessary, but if you don't, you have to pass the destination address with every packet you send over the socket; how else would the socket know where to send the data? The advantage is that you can use a single socket to send packets to different socket addresses.

Once you have configured the socket, and perhaps even connected it, think of it as a bidirectional communication channel. You can use it to pass data to some destination, and some destination can use it to pass data back to you. What you write to the socket is sent out, and what has been received is available for reading.
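The connectionless case described above can be sketched as follows. The example assumes UDP on the loopback address with OS-chosen ports; make_receiver and the variable names are invented for the illustration.

```python
import socket

def make_receiver() -> socket.socket:
    """A UDP socket bound to a local address so it can receive datagrams."""
    r = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    r.bind(("127.0.0.1", 0))
    r.settimeout(2.0)
    return r

recv_a, recv_b = make_receiver(), make_receiver()
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Unconnected socket: the destination is passed with every datagram,
# so one socket can address several destinations.
sender.sendto(b"to A", recv_a.getsockname())
sender.sendto(b"to B", recv_b.getsockname())
got_a, from_a = recv_a.recvfrom(1024)
got_b, from_b = recv_b.recvfrom(1024)

# Connected UDP socket: connect() only records the default destination,
# after which plain send() can be used.
sender.connect(recv_a.getsockname())
sender.send(b"again to A")
got_a2, _ = recv_a.recvfrom(1024)

print(got_a, got_b, got_a2, from_a == from_b)

for s in (sender, recv_a, recv_b):
    s.close()
```

Both receivers see the same source address, because all three datagrams left through the same socket.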

Ports, on the other hand, are something that only certain protocols of the TCP/IP protocol stack have. TCP and UDP packets have ports. A port is just a number. The combination of source port and destination port identifies a communication channel between two hosts. For example, you may have a server that is supposed to be both a simple HTTP server and a simple FTP server. If a packet arrives at this server's address, how does the server know whether it is meant for the HTTP server or for the FTP server? It knows because the HTTP server runs on port 80 and the FTP server on port 21, so a packet arriving with destination port 80 is meant for the HTTP server and not for the FTP server.

The packet also has a source port, because without one a server could have only one connection to a given IP address at a time. The source port lets the server distinguish between otherwise identical connections: they all have the same destination port (for example port 80), the same destination IP address (always the server's address) and the same source IP address (because they all come from the same client), but since they have different source ports, the server can tell them apart. And when the server sends back replies, it sends them to the port the request came from, so the client can also tell apart the different replies it receives.
One port can serve one or more sockets, each connected to a different external IP address:
TCP 192.168.100.2:9001 155.94.246.179:39255 ESTABLISHED 1312
TCP 192.168.100.2:9001 171.25.193.9:61832 ESTABLISHED 1312
TCP 192.168.100.2:9001 178.62.199.226:37912 ESTABLISHED 1312
TCP 192.168.100.2:9001 188.193.64.150:40900 ESTABLISHED 1312
TCP 192.168.100.2:9001 198.23.194.149:43970 ESTABLISHED 1312
TCP 192.168.100.2:9001 198.49.73.11:38842 ESTABLISHED 1312
A socket is an abstraction that the kernel provides to user applications for data input/output. The type of a socket is determined by what it handles: a network protocol, IPC, and so on. So if someone creates a TCP socket, they can read data from the socket and write data to it using simple calls, while the lower-level protocol handling, such as the TCP processing and the forwarding of packets to lower-level network protocols, is done by the particular socket implementation in the kernel. The advantage is that the user does not have to worry about the details of specific protocols and simply reads and writes data to the socket like an ordinary buffer. The same is true for IPC: the user simply reads and writes data to the socket, and the kernel handles all the lower-level details depending on the type of socket created.