File and network sockets

Lecture



In a contest for the best computer idea of all time, sockets would surely be in the running for a prize. Like the other interprocess communication tools discussed in this series of articles, sockets were first implemented on Unix (4.2BSD). The concept of sockets as a universal means of exchanging data between processes proved so successful, however, that all modern systems support at least some subset of it. Sockets owe their success to their simplicity and versatility. Programs that exchange data through sockets can run on the same system or on different systems, communicating either through special system objects or through the network stack. Like pipes, sockets use a simple interface based on the "file" functions read(2) and write(2): when a Unix program opens a socket, it gets a file descriptor and can work with the socket using file functions. Unlike pipes, however, sockets let you transfer data in both directions, in both synchronous and asynchronous mode.

Most programmers work with sockets through high-level libraries, but such libraries generally do not expose the full power and variety of sockets. File sockets are a good example of that variety. Windows programmers will be familiar with network sockets, which typically exchange data using the TCP/IP protocol family, but Unix also has other kinds of sockets designed specifically for exchanging data between local processes.

Sockets in the file namespace

Sockets in the file namespace (also called "Unix sockets") use special file names as addresses. An important feature of these sockets is that remote applications cannot connect to them, even if the file system in which the socket was created is accessible to the remote operating system. In the following code fragment (taken from the server program fsserver.c, listed in full at the end of the article), we create a socket and bind it to the file socket.soc:
  sock = socket(AF_UNIX, SOCK_DGRAM, 0);
  if (sock < 0) {
    perror("socket failed");
    return EXIT_FAILURE;
  }
  srvr_name.sa_family = AF_UNIX;
  strcpy(srvr_name.sa_data, "socket.soc");
  if (bind(sock, &srvr_name, strlen(srvr_name.sa_data) +
           sizeof(srvr_name.sa_family)) < 0) {
    perror("bind failed");
    return EXIT_FAILURE;
  }

The constants and functions needed for working with sockets in the file namespace are declared in <sys/types.h> and <sys/socket.h>. Like files, sockets are represented in programs by descriptors. A socket descriptor is obtained with the socket(2) function. Its first parameter is the domain the socket belongs to. Here "domain" means the type of communication (not an Internet domain name, as you might think). The AF_UNIX domain corresponds to sockets in the file namespace. The second parameter of socket() sets the socket type: SOCK_DGRAM denotes a datagram socket. When data travels over a network, datagram sockets do not guarantee delivery, but they do allow broadcasting. Another commonly used type is SOCK_STREAM, which denotes stream sockets: point-to-point connections with reliable data transfer. In the file namespace, however, datagram sockets are just as reliable as stream sockets. The third parameter of socket() specifies the protocol used for data transfer; we leave it at zero. On error, socket() returns -1.

Once we have the socket descriptor, we call bind(2), which associates the socket with a given address (the server program must bind its socket to an address; the client does not have to). The first parameter of bind() is the descriptor, the second is a pointer to a sockaddr structure (the srvr_name variable) holding the address at which the server registers, and the third is the length of that address structure. Instead of the generic sockaddr structure, sockets in the file namespace can use the specialized sockaddr_un structure. The sockaddr.sa_family field specifies the address family we are using, in our case AF_UNIX, the family of Unix file sockets. An AF_UNIX address (the sa_data field) is simply the name of the socket file; sa_data holds only 14 bytes, which is enough for a short name such as socket.soc. After bind() returns, clients can reach our server at the given address (file name).
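The text mentions that sockaddr_un can replace the generic sockaddr. Below is a minimal sketch of the same binding step written with sockaddr_un, assuming a POSIX system; the helper name bind_unix_dgram() is hypothetical and not part of fsserver.c. The sun_path field is much larger than sa_data, and the length passed to bind() is simply the size of the structure. The sketch also removes a stale socket file before binding, which assumes the name belongs to this program.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: create an AF_UNIX datagram socket bound to the
 * file name `path`. Returns the descriptor, or -1 with errno set. */
int bind_unix_dgram(const char *path)
{
    struct sockaddr_un addr;
    int sock;

    /* sun_path is a fixed-size array (typically 104 or 108 bytes). */
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    sock = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (sock < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);      /* length checked above */

    /* bind() creates the socket file and fails if the name already
     * exists, so remove a file left over from an earlier run. */
    unlink(path);
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;
        close(sock);
        errno = saved;
        return -1;
    }
    return sock;
}
```

The server would then call recvfrom() on the returned descriptor and, before exiting, close() it and unlink() the path.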

With datagram sockets we exchange data not with write() and read() but with the special functions sendto(2) and recvfrom(2). These functions also work with stream sockets, but in the stream-socket example we will use the familiar read()/write() pair. To read data from a datagram socket, we use recvfrom(2), which by default blocks the program until new data arrives:

  namelen = sizeof(rcvr_name);
  bytes = recvfrom(sock, buf, sizeof(buf), 0, &rcvr_name, &namelen);

In the call to recvfrom() we pass a pointer to another sockaddr structure, in which the function returns the address of the sending client (for file sockets this address carries no useful information). The last parameter of recvfrom() is a pointer to a variable that holds the size of that structure on input and receives the length of the returned address on output. If the client's address does not interest us, we can pass NULL as the last two parameters. When we are done with the socket, we close it with the "file" function close(). Before the server program exits, it deletes the socket file created by the bind() call, which we do with unlink().
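To see sendto() and recvfrom() working together, here is a sketch that plays both roles in one process: it binds a server socket, sends a datagram from an unnamed client socket, receives it with NULL address arguments, and removes the socket file with unlink(). The function name dgram_roundtrip() and its return convention are illustrative assumptions, not code from fsserver.c or fsclient.c.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: bind a "server" datagram socket to `path`, send
 * `msg` to it from an unnamed "client" socket, and receive it back.
 * Returns the number of bytes received (out is NUL-terminated) or -1. */
ssize_t dgram_roundtrip(const char *path, const char *msg,
                        char *out, size_t outlen)
{
    struct sockaddr_un addr;
    int srv, cli, bound = 0;
    ssize_t n = -1;

    if (outlen == 0 || strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    srv = socket(AF_UNIX, SOCK_DGRAM, 0);
    cli = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (srv < 0 || cli < 0)
        goto done;

    unlink(path);                          /* remove a stale socket file */
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    bound = 1;

    /* The client is not bound to a name; sendto() carries the destination. */
    if (sendto(cli, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    /* NULL, NULL: the sender's address is of no interest here. */
    n = recvfrom(srv, out, outlen - 1, 0, NULL, NULL);
    if (n >= 0)
        out[n] = '\0';

done:
    if (cli >= 0)
        close(cli);
    if (srv >= 0)
        close(srv);
    if (bound)
        unlink(path);       /* close() does not remove the socket file */
    return n;
}
```

For example, `dgram_roundtrip("/tmp/demo.soc", "Hello, Unix sockets!", buf, sizeof(buf))` returns 20 and leaves the test string in buf.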

If the server program seemed simple, the client program (fsclient.c) is simpler still. We open a socket with socket() and send the data (a test string) to the server with sendto(2), the counterpart of recvfrom():

  srvr_name.sa_family = AF_UNIX;
  strcpy(srvr_name.sa_data, SOCK_NAME);
  strcpy(buf, "Hello, Unix sockets!");
  sendto(sock, buf, strlen(buf), 0, &srvr_name,
         strlen(srvr_name.sa_data) + sizeof(srvr_name.sa_family));

The first parameter of sendto() is the socket descriptor; the second and third give the address and length of the data buffer. The fourth parameter passes additional flags. The last two parameters carry the server address and its length, respectively. If you call connect(2) on a datagram socket (connect() is discussed below for stream sockets), you do not have to specify the destination address on every call: you give it only once, as a parameter of connect(). Before calling sendto(), we must fill the sockaddr structure (the srvr_name variable) with the server's address. When the transfer is finished, we close the socket with close(). If you start the server program and then the client program, the server prints the test string sent by the client.
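The connect() shortcut for datagram sockets can be sketched as follows. After connect(), the client uses send(), which has no address parameters, and each recv() on the server returns exactly one datagram. The helper connected_dgram_demo() is hypothetical and assumes short messages (under 256 bytes).

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: bind a server datagram socket to `path`, connect()
 * a client socket to it once, then send every message with send().
 * Returns how many messages the server received unchanged, or -1. */
int connected_dgram_demo(const char *path, const char *msgs[], int count)
{
    struct sockaddr_un addr;
    char buf[256];
    int srv, cli, bound = 0, received = -1, i;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    srv = socket(AF_UNIX, SOCK_DGRAM, 0);
    cli = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (srv < 0 || cli < 0)
        goto done;
    unlink(path);                              /* remove a stale socket file */
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    bound = 1;

    /* On a datagram socket, connect() only records the default destination. */
    if (connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    received = 0;
    for (i = 0; i < count; i++) {
        size_t len = strlen(msgs[i]);
        ssize_t n;

        if (send(cli, msgs[i], len, 0) < 0)    /* no address argument */
            break;
        n = recv(srv, buf, sizeof(buf), 0);    /* exactly one datagram */
        if (n == (ssize_t)len && memcmp(buf, msgs[i], len) == 0)
            received++;
    }

done:
    if (cli >= 0)
        close(cli);
    if (srv >= 0)
        close(srv);
    if (bound)
        unlink(path);
    return received;
}
```

Sending and receiving alternately keeps the sketch from filling the socket's receive queue, which would make send() block.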

Paired sockets

Sockets in the file namespace resemble named pipes in that a special type of file identifies them. The world of sockets also has an analogue of unnamed pipes: paired sockets (socket pairs). Like unnamed pipes, paired sockets are created in pairs and have no names. Naturally, their scope is the same as that of unnamed pipes: communication between a parent process and its child. As with an unnamed pipe, one descriptor is used by one process and the other by the other process. As an example, consider the program sockpair.c, which creates two processes with fork(). The two processes of sockpair.c use paired sockets to exchange a polite English greeting.
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <sys/wait.h>
  #include <stdlib.h>
  #include <stdio.h>
  #include <errno.h>
  #include <unistd.h>

  #define STR1 "How are you?"
  #define STR2 "I'm ok, thank you."
  #define BUF_SIZE 1024

  int main(int argc, char **argv)
  {
    int sockets[2];
    char buf[BUF_SIZE];
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sockets) < 0) {
      perror("socketpair() failed");
      return EXIT_FAILURE;
    }
    pid = fork();
    if (pid < 0) {
      perror("fork() failed");
      return EXIT_FAILURE;
    }
    if (pid != 0) {
      /* Parent: uses sockets[0] */
      close(sockets[1]);
      write(sockets[0], STR1, sizeof(STR1));
      read(sockets[0], buf, sizeof(buf));
      printf("%s\n", buf);
      close(sockets[0]);
      waitpid(pid, NULL, 0);   /* reap the child */
    } else {
      /* Child: uses sockets[1] */
      close(sockets[0]);
      read(sockets[1], buf, sizeof(buf));
      printf("%s\n", buf);
      write(sockets[1], STR2, sizeof(STR2));
      close(sockets[1]);
    }
    return EXIT_SUCCESS;
  }

Paired sockets are created with the socketpair(2) function, which takes four parameters. The first three are the same as those of socket(), and the fourth is an array of two integers in which the descriptors are returned. The descriptors returned by socketpair() are immediately ready for data transfer, so we can apply read()/write() to them right away. After fork(), each process has both descriptors and must close the one it does not use; as usual, a socket is closed with close().

Looking at the programming interface of paired sockets, you may wonder why these functions belong to sockets at all. After all, we use neither addresses nor the client-server model with them. That is true, but note that socketpair() takes domain and socket type arguments, so formally, and in terms of the system's implementation, we are using real sockets. The domain argument of socketpair() is redundant in practice, since systems typically support only the AF_UNIX domain for this call (a logical restriction, given that paired sockets have no names and are meant for data exchange between related processes).
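A quick way to check the domain restriction on a particular system is to call socketpair() with different domains. This sketch (the name socketpair_supported() is hypothetical) assumes Linux, where a request for AF_INET fails with EOPNOTSUPP.

```c
#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical helper: return 1 if socketpair() accepts `domain` for
 * stream sockets, 0 otherwise. Descriptors from a successful call are
 * closed immediately. */
int socketpair_supported(int domain)
{
    int sv[2];

    if (socketpair(domain, SOCK_STREAM, 0, sv) < 0)
        return 0;           /* e.g. EOPNOTSUPP for AF_INET on Linux */
    close(sv[0]);
    close(sv[1]);
    return 1;
}
```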

Network sockets

We now turn to the most important and universal type of sockets: network sockets. I hardly need to explain how much network sockets matter in Unix systems. Even if you are writing a set of applications meant to run on a single computer, consider using network sockets for the data exchange between them. Your software package may grow in the future, and you may need to distribute its components across several machines.

Network sockets make scaling such a project painless. They do have drawbacks, however: even when sockets are used to exchange data on a single machine, the data must pass through every layer of the network stack, which slows the transfer and increases the load on the system.

As an example, consider a pair of applications, a client and a server, that exchange data through network sockets. The full text of the server program netserver.c is listed at the end of the article; here we show some fragments. First of all, we need to obtain a socket descriptor:

  sock = socket(AF_INET, SOCK_STREAM, 0);
  if (sock < 0) {
    printf("socket() failed: %d\n", errno);
    return EXIT_FAILURE;
  }

The first parameter of socket(), AF_INET, indicates that we are opening a network (Internet) socket. The second parameter requests a stream socket. Next, just as with the socket in the file namespace, we call bind():

  serv_addr.sin_family = AF_INET;
  serv_addr.sin_addr.s_addr = INADDR_ANY;
  serv_addr.sin_port = htons(port);
  if (bind(sock, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
    printf("bind() failed: %d\n", errno);
    return EXIT_FAILURE;
  }

The serv_addr variable is a structure of type sockaddr_in, which is designed specifically for Internet addresses. The most important difference between sockaddr_in and sockaddr_un is the sin_port field, which stores the port number. The htons() function rearranges the bytes of the two-byte port value so that their order matches the one used on the Internet (see the sidebar). We set the address family to AF_INET (the Internet address family) and the address itself to the special constant INADDR_ANY. Thanks to this constant, our server program accepts connections on all addresses of the machine it runs on.
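As a sketch of how the server fills serv_addr, the hypothetical helper make_listen_addr() below stores both the address and the port in network byte order. For port 8080 (0x1F90), the first byte of sin_port in memory is 0x1F on any host. The helper also zeroes the whole structure first, as the netserver.c listing does with memset().

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill `sa` the way the server does, with any local
 * address and the given port, both in network (big-endian) byte order. */
void make_listen_addr(struct sockaddr_in *sa, uint16_t port)
{
    memset(sa, 0, sizeof(*sa));              /* also clears sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_addr.s_addr = htonl(INADDR_ANY);
    sa->sin_port = htons(port);
}
```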

Little-endians and big-endians

In Russian, the terms little-endian and big-endian are rendered as "pointy-enders" and "blunt-enders." In computer literature, these terms describe the byte order a processor uses to represent simple multibyte types (for example, a 32-bit integer). In the original source, Jonathan Swift's Gulliver's Travels (the voyage to Lilliput), they name two hostile factions that disagreed about which end of an egg should be broken. The quarrel between the Little-Endians and the Big-Endians even led to war between Lilliput and the hostile empire of Blefuscu. In the computer world, byte-order problems can grow to far from Lilliputian size. Intel processors are little-endian, while, for example, the PowerPC (used by Mac OS X) and Sun SPARC are big-endian (considering that Apple has abandoned the PowerPC and Sun is replacing its RISC machines with Opterons, the little-endians seem to be winning). Internet protocols, however, use big-endian byte order. To avoid confusion, it is recommended to call htons() on all systems, including big-endian ones. This function "knows" the host's byte order and, if necessary, converts values to the order used by the TCP/IP protocols. At one time, an article circulated on the Russian-language Internet claiming (as a joke) that the byte order alien to Intel, and the extra rearrangement it requires, were the result of a conspiracy by giant software companies. The book [1] uses the terms "direct byte order" for little-endian and "reverse byte order" for big-endian.
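The byte order of the host can also be detected at run time. In the following sketch (both function names are hypothetical), host_is_little_endian() inspects the lowest-addressed byte of a 16-bit value, and to_network_order16() produces the same result as htons() without knowing the host order, by writing the most significant byte first.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);               /* lowest-addressed byte */
    return first == 1;
}

/* Same result as htons(x) on any host: build the big-endian
 * (network order) byte sequence explicitly and reinterpret it. */
uint16_t to_network_order16(uint16_t x)
{
    unsigned char bytes[2];
    uint16_t result;

    bytes[0] = (unsigned char)(x >> 8);      /* most significant byte first */
    bytes[1] = (unsigned char)(x & 0xFF);
    memcpy(&result, bytes, sizeof(result));
    return result;
}
```

On a big-endian host to_network_order16() returns its argument unchanged, which is exactly why calling htons() everywhere is harmless.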

To understand what we need to do next, let us recall how the network subsystem of Unix (or, for that matter, of any other OS) works. A network server must be able to handle requests from several clients at once (our netserver.c can in fact serve only one client, but here we are discussing the general case). With point-to-point connections, such as those made over stream sockets, the server must open a separate socket for each client. It follows that we should not communicate with a client through the sock socket itself, which is meant for listening for incoming requests (with network sockets this is usually impossible anyway); otherwise, all other attempts to connect to the server at that address and port would be blocked. Instead, we call the listen(2) function, which puts the socket into the state of waiting for connection requests:

  listen(sock, 1);

The second parameter of listen() sets the size of the queue of pending connections, that is, how many connection requests the system will hold for the server before it accepts them. Next, we call the accept(2) function, which establishes a connection in response to a client request:

  clen = sizeof(cli_addr);
  newsock = accept(sock, (struct sockaddr *) &cli_addr, &clen);
  if (newsock < 0) {
    printf("accept() failed: %d\n", errno);
    return EXIT_FAILURE;
  }

When a connection request arrives, accept() returns a new socket open for communication with the client that requested the connection. In effect, the server redirects the requested connection to another socket, leaving sock free to listen for further connection requests. The second parameter of accept() receives the address of the client that requested the connection, and the third points to a variable holding the size of that structure. As with recvfrom(), we can pass NULL as the last two parameters. The server reads and writes data with read() and write() and, of course, closes the sockets with close(). In the client program (netclient.c), we first have to solve a problem we did not face when writing the server: converting the server's domain name into its network address. Domain name resolution is performed by gethostbyname():

  server = gethostbyname(argv[1]);
  if (server == NULL) {
    printf("Host not found\n");
    return EXIT_FAILURE;
  }

The function takes a pointer to a string containing the server's Internet name (for example, www.unix.com or 192.168.1.16) and returns a pointer to a hostent structure (the server variable) that holds the server's address in a form suitable for further use; if necessary, the function resolves the domain name into a network address along the way. Next, we fill the fields of serv_addr (a sockaddr_in structure) with the address and port:

  memset(&serv_addr, 0, sizeof(serv_addr));
  serv_addr.sin_family = AF_INET;
  /* memcpy, not strncpy: a binary address may contain zero bytes */
  memcpy(&serv_addr.sin_addr.s_addr, server->h_addr, server->h_length);
  serv_addr.sin_port = htons(port);
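Current POSIX no longer specifies gethostbyname() and recommends getaddrinfo(3) instead. A sketch of the same lookup with getaddrinfo() follows; resolve_ipv4() is a hypothetical helper restricted to IPv4 to match the sockaddr_in code above, and it takes the first result only, whereas a robust client would try each returned address in turn.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L    /* expose getaddrinfo() in strict modes */
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/* Hypothetical helper: resolve `host` (a name such as "www.unix.com" or a
 * dotted address such as "192.168.1.16") and `port` into an IPv4 address.
 * Returns 0 on success or a getaddrinfo() error code (see gai_strerror()). */
int resolve_ipv4(const char *host, const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;          /* IPv4 only, like the hostent code */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0)
        return rc;
    /* The address and port are already in network byte order. */
    memcpy(out, res->ai_addr, sizeof(*out));
    freeaddrinfo(res);
    return 0;
}
```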

The client program opens a new socket by calling socket() in the same way as the server (the descriptor returned by socket() is stored in the sock variable) and calls connect(2) to establish the connection:

  if (connect(sock, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
    printf("connect() failed: %d\n", errno);
    return EXIT_FAILURE;
  }
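The server calls (bind(), listen(), accept()) and the client call connect() can be tried together in a single process over the loopback interface. The sketch below rests on several assumptions: the helper tcp_loopback_echo() is hypothetical; it uses getsockname(), which the article does not use, to learn which port the kernel picked for port 0; and it relies on the kernel completing the connection in the listen() queue so that accept() can be called after connect() returns. For a short message on loopback a single read() normally returns all of it, but robust code would loop.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: send `msg` from a client socket to a server on
 * 127.0.0.1, echo it back from the server's accepted socket, and store the
 * echo in `reply`. Returns the number of bytes echoed, or -1 on error. */
ssize_t tcp_loopback_echo(const char *msg, char *reply, size_t replylen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    int lsock = -1, csock = -1, ssock = -1;
    char buf[256];
    ssize_t n, result = -1;

    if (replylen == 0)
        return -1;

    /* Server side: socket(), bind() to 127.0.0.1 with port 0, listen(). */
    lsock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0)
        goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                  /* 0: the kernel picks a port */
    if (bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lsock, 1) < 0 ||
        getsockname(lsock, (struct sockaddr *)&addr, &alen) < 0)
        goto done;

    /* Client side: connect() returns once the connection is queued. */
    csock = socket(AF_INET, SOCK_STREAM, 0);
    if (csock < 0 ||
        connect(csock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    /* Server side again: accept() returns a new, connected socket. */
    ssock = accept(lsock, NULL, NULL);
    if (ssock < 0)
        goto done;

    if (write(csock, msg, strlen(msg)) < 0)
        goto done;
    n = read(ssock, buf, sizeof(buf));         /* server reads the request */
    if (n <= 0 || write(ssock, buf, (size_t)n) < 0)
        goto done;
    n = read(csock, reply, replylen - 1);      /* client reads the echo */
    if (n < 0)
        goto done;
    reply[n] = '\0';
    result = n;

done:
    if (ssock >= 0)
        close(ssock);
    if (csock >= 0)
        close(csock);
    if (lsock >= 0)
        close(lsock);
    return result;
}
```

Note that the listening socket lsock is never used for data; all traffic goes through the socket returned by accept(), just as described above.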

The socket is now ready to send and receive data. The client program reads characters entered by the user in a terminal window. When the user presses <Enter>, the program sends the data to the server, waits for a response message from the server and prints it out.
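The client sends the user's line with a single write(). On a stream socket, write() may transfer fewer bytes than requested, for example when a signal interrupts it. A common remedy, sketched here as the hypothetical helper write_all(), is to loop until everything has been written.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: write all `len` bytes of `buf` to `fd`, retrying
 * after short writes and interrupted calls.
 * Returns 0 on success, -1 on error (errno set). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                         /* skip the part already written */
        len -= (size_t)n;
    }
    return 0;
}
```

In netclient.c, `write_all(sock, buf, strlen(buf))` could replace the plain write() call.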

Throughout this article we have mentioned non-blocking sockets several times, so let us dwell on them a little. The first thing to know about non-blocking sockets is that you can do without them. Thanks to multithreaded (or multiprocess) programming, blocking sockets can be used in every situation, both when several sockets must be served at once and when an operation on a socket must be interruptible. Still, let us look at the two functions needed for working with non-blocking sockets. By default, socket() creates a blocking socket. To make it non-blocking, we use the fcntl(2) function:

  sock = socket(PF_INET, SOCK_STREAM, 0);
  fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);
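fcntl() with F_SETFL replaces all file status flags at once, so a safer pattern reads the current flags with F_GETFL first. The sketch below (set_nonblocking() and read_nowait() are hypothetical names) does that and also shows how to tell "no data yet" (errno EAGAIN or EWOULDBLOCK) apart from a real error.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: switch `fd` to non-blocking mode while keeping
 * its other status flags. Returns -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read() that separates "nothing available now"
 * from real errors. Returns bytes read, 0 at end of stream, -2 if no
 * data is available yet, -1 on any other error. */
ssize_t read_nowait(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```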

After this, any call to read() on sock returns control immediately. If no data is waiting on the socket, read() returns -1 and sets errno to EAGAIN. To check the state of non-blocking sockets, you can use the select(2) function, which can check the state of several socket (or file) descriptors at once. Its first parameter is the number of descriptors to check, that is, the highest descriptor number in any of the sets plus one. The second, third, and fourth parameters are sets of descriptors to be checked for readiness for reading, readiness for writing, and exceptional conditions, respectively. select() itself is blocking: it returns when at least one of the checked descriptors is ready for the corresponding operation. As the last parameter of select(), you can pass a time interval after which it returns in any case. A call to select() that checks for incoming data on sock might look like this:

  fd_set set;
  struct timeval tv;
  FD_ZERO(&set);
  FD_SET(sock, &set);
  tv.tv_sec = 1;
  tv.tv_usec = 500000;
  ...
  select(sock + 1, &set, NULL, NULL, &tv);
  if (FD_ISSET(sock, &set)) {
    /* There is data to read */
  }

Everything related to select() is now declared in the header <sys/select.h> (previously, its declarations were scattered across <sys/types.h>, <sys/time.h>, and <unistd.h>). In the code fragment above, FD_SET and FD_ISSET are macros for working with an fd_set descriptor set.
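Combining select() with a timeout and read() gives a read that waits at most a given time. In this sketch the helper read_with_timeout() is hypothetical. Note that the fd_set is cleared with FD_ZERO before use, that the first argument is the descriptor plus one, and that the timeval is filled in on every call, because Linux may modify it.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/select.h>

/* Hypothetical helper: wait up to `ms` milliseconds for `fd` to become
 * readable, then read from it. Returns bytes read, -2 on timeout,
 * -1 on error. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, long ms)
{
    fd_set set;
    struct timeval tv;
    int rc;

    FD_ZERO(&set);                  /* an fd_set must be cleared before use */
    FD_SET(fd, &set);
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;

    /* First argument: highest descriptor in any set, plus one. */
    rc = select(fd + 1, &set, NULL, NULL, &tv);
    if (rc < 0)
        return -1;                  /* includes EINTR; a caller may retry */
    if (rc == 0)
        return -2;                  /* timed out: fd is not readable */
    return read(fd, buf, len);      /* fd is in the returned set */
}
```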

This concludes our introduction to the fascinating world of interprocess communication in Linux. The next article will be devoted to process management, signals, and threads.

Literature:

1. W. R. Stevens, UNIX: Разработка сетевых приложений (Russian edition of UNIX Network Programming), St. Petersburg: Piter, 2004.
2. W. R. Stevens, S. A. Rago, Advanced Programming in the UNIX® Environment, Second Edition, Addison-Wesley Professional, 2005.

/* fsclient.c */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SOCK_NAME "socket.soc"
#define BUF_SIZE 256

int main(int argc, char **argv)
{
    int sock;
    char buf[BUF_SIZE];
    struct sockaddr srvr_name;

    sock = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (sock < 0)
    {
        perror("socket failed");
        return EXIT_FAILURE;
    }
    srvr_name.sa_family = AF_UNIX;
    strcpy(srvr_name.sa_data, SOCK_NAME);
    strcpy(buf, "Hello, Unix sockets!");
    if (sendto(sock, buf, strlen(buf), 0, &srvr_name,
               strlen(srvr_name.sa_data) + sizeof(srvr_name.sa_family)) < 0)
        perror("sendto failed");
    close(sock);
    return 0;
}

/* fsserver.c */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SOCK_NAME "socket.soc"
#define BUF_SIZE 256

int main(int argc, char **argv)
{
    struct sockaddr srvr_name, rcvr_name;
    char buf[BUF_SIZE];
    int sock;
    socklen_t namelen;
    ssize_t bytes;

    sock = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (sock < 0)
    {
        perror("socket failed");
        return EXIT_FAILURE;
    }
    srvr_name.sa_family = AF_UNIX;
    strcpy(srvr_name.sa_data, SOCK_NAME);
    if (bind(sock, &srvr_name, strlen(srvr_name.sa_data) +
             sizeof(srvr_name.sa_family)) < 0)
    {
        perror("bind failed");
        close(sock);
        return EXIT_FAILURE;
    }
    namelen = sizeof(rcvr_name);    /* in: buffer size; out: address length */
    /* Leave room for the terminating zero byte */
    bytes = recvfrom(sock, buf, sizeof(buf) - 1, 0, &rcvr_name, &namelen);
    if (bytes < 0)
    {
        perror("recvfrom failed");
        close(sock);
        unlink(SOCK_NAME);
        return EXIT_FAILURE;
    }
    buf[bytes] = 0;
    printf("Client sent: %s\n", buf);
    close(sock);
    unlink(SOCK_NAME);              /* remove the socket file created by bind() */
    return 0;
}

/* netclient.c */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

#define BUF_SIZE 256

int main(int argc, char **argv)
{
    int sock, port;
    struct sockaddr_in serv_addr;
    struct hostent *server;
    char buf[BUF_SIZE];

    if (argc < 3)
    {
        fprintf(stderr, "usage: %s <hostname> <port_number>\n", argv[0]);
        return EXIT_FAILURE;
    }
    port = atoi(argv[2]);
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
    {
        printf("socket() failed: %d\n", errno);
        return EXIT_FAILURE;
    }
    server = gethostbyname(argv[1]);
    if (server == NULL)
    {
        printf("Host not found\n");
        return EXIT_FAILURE;
    }
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    /* memcpy, not strncpy: a binary address may contain zero bytes */
    memcpy(&serv_addr.sin_addr.s_addr, server->h_addr, server->h_length);
    serv_addr.sin_port = htons(port);
    if (connect(sock, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
    {
        printf("connect() failed: %d\n", errno);
        return EXIT_FAILURE;
    }
    printf(">");
    fflush(stdout);                 /* show the prompt before reading input */
    memset(buf, 0, BUF_SIZE);
    fgets(buf, BUF_SIZE - 1, stdin);
    write(sock, buf, strlen(buf));
    memset(buf, 0, BUF_SIZE);
    read(sock, buf, BUF_SIZE - 1);
    printf("%s\n", buf);
    close(sock);
    return 0;
}

/* netserver.c */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define BUF_SIZE 256

int main(int argc, char **argv)
{
    int sock, newsock, port;
    socklen_t clen;
    char buf[BUF_SIZE];
    struct sockaddr_in serv_addr, cli_addr;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <port_number>\n", argv[0]);
        return EXIT_FAILURE;
    }
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
    {
        printf("socket() failed: %d\n", errno);
        return EXIT_FAILURE;
    }
    memset(&serv_addr, 0, sizeof(serv_addr));
    port = atoi(argv[1]);
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(port);
    if (bind(sock, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
    {
        printf("bind() failed: %d\n", errno);
        return EXIT_FAILURE;
    }
    listen(sock, 1);
    clen = sizeof(cli_addr);
    newsock = accept(sock, (struct sockaddr *) &cli_addr, &clen);
    if (newsock < 0)
    {
        printf("accept() failed: %d\n", errno);
        return EXIT_FAILURE;
    }
    memset(buf, 0, BUF_SIZE);
    read(newsock, buf, BUF_SIZE - 1);
    buf[BUF_SIZE - 1] = 0;          /* last valid index; keeps buf terminated */
    printf("MSG: %s\n", buf);
    write(newsock, "OK", 2);
    close(newsock);
    close(sock);
    return 0;
}


