select / poll / epoll: the practical difference

When designing high-performance network applications with non-blocking sockets, you have to decide which method of monitoring network events to use. There are several, and each has its own strengths and weaknesses. Choosing the right one can be critical to your application's architecture.

In this article we will look at:

select()
poll()
epoll()
libevent

Using select()

The old, battle-tested workhorse select() was created back in the days when sockets were still called "Berkeley sockets". It was not part of the very first Berkeley sockets specification, because at that time there was no concept of non-blocking I/O. But somewhere in the 1980s non-blocking I/O appeared, and select() along with it. Its interface has not changed since.

To use select(), the developer initializes and populates several fd_set structures with the descriptors and events to monitor, and then calls select(). Typical code looks like this:

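    // Requires <sys/select.h>; sock1 and sock2 are already-open sockets
    fd_set fd_in, fd_out;
    struct timeval tv;

    // Reset the sets
    FD_ZERO( &fd_in );
    FD_ZERO( &fd_out );

    // Monitor sock1 for input events
    FD_SET( sock1, &fd_in );

    // Monitor sock2 for output events
    FD_SET( sock2, &fd_out );

    // Find the socket with the largest numeric value (select requires it)
    int largest_sock = sock1 > sock2 ? sock1 : sock2;

    // Wait up to 10 seconds
    tv.tv_sec = 10;
    tv.tv_usec = 0;

    // Call select
    int ret = select( largest_sock + 1, &fd_in, &fd_out, NULL, &tv );

    // Check whether select succeeded
    if ( ret == -1 )
    {
        // report the error and abort
    }
    else if ( ret == 0 )
    {
        // timeout; no event detected
    }
    else
    {
        if ( FD_ISSET( sock1, &fd_in ) )
        {
            // input event on sock1
        }
        if ( FD_ISSET( sock2, &fd_out ) )
        {
            // output event on sock2
        }
    }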

When select() was designed, probably no one expected that we would one day write multi-threaded applications serving thousands of connections. select() has several significant flaws that make it poorly suited to such systems. The main ones are:

select modifies the fd_set structures passed to it, so none of them can be reused. Even if nothing needs to change (for example, you received part of the data and want to wait for more), you have to reinitialize the fd_set structures, or restore them from a previously saved copy (with FD_COPY where the platform provides it, or a plain structure assignment). And this has to be done again before every select call.
To find out which descriptor raised the event, you have to test each one manually with FD_ISSET. When you monitor 2000 descriptors and an event occurs on only one of them (which, by Murphy's law, will be the last in the list), you waste a lot of CPU time.
Did I just say 2000 descriptors? I got carried away. select does not support that many, at least not on ordinary Linux with a stock kernel. The maximum number of simultaneously monitored descriptors is limited by the FD_SETSIZE constant, which on Linux is hard-coded to 1024 (more precisely, a descriptor whose numeric value is 1024 or greater cannot be placed in an fd_set at all). Some operating systems let you override FD_SETSIZE before including the sys/select.h header, but this hack is not part of any common standard, and Linux simply ignores it.
You cannot work with descriptors from the monitored set in another thread. Imagine a thread executing the code above: it starts and waits for events in select(). Now imagine another thread that monitors the overall system load and decides that no data has arrived on sock1 for too long and it is time to drop the connection. Since the socket could be reused to serve new clients, it would be good to close it properly. But the first thread is monitoring that very descriptor right now. What happens if we close it anyway? The documentation has an answer, and you will not like it: if a descriptor being monitored by select() is closed in another thread, the result is unspecified.
A similar problem arises if another thread decides to send data through sock1: there is no way to start monitoring the socket for write readiness until select returns.
The choice of events you can monitor is rather limited. For example, to detect that the remote side has closed the socket you must, first, monitor it for incoming data and, second, actually try to read that data (read returns 0 for a closed socket). That is acceptable when you are reading from the socket anyway, but what if your current task is to send data to it and you have no need to read from it?
select burdens you with computing the largest descriptor value and passing it (plus one) as a separate parameter.

Of course, none of this is news. Operating system developers recognized these problems long ago, and many of them were addressed in the design of poll. At this point you may ask why we are studying ancient history at all, and whether there is any reason to use the ancient select today. Yes, there are two such reasons. You may never need either of them, but they are worth knowing.

The first reason is portability. select() has been with us for a million years. Whatever jungle of software and hardware platforms you end up in, if it has networking, it has select. Other methods may be missing, but select is almost guaranteed to be there. And do not think I am going senile and reminiscing about punched cards and ENIAC. The more modern poll, for example, does not exist on Windows XP, but select does.

The second reason is more exotic: the select timeout is a struct timeval, so select can (theoretically) handle timeouts with microsecond precision, and its sibling pselect, which takes a struct timespec, with nanosecond precision, if the hardware allows. Plain poll and epoll_wait accept a timeout in whole milliseconds only. This hardly matters on ordinary desktops (or even servers), where you do not have a timer that precise anyway. But there are real-time systems in the world that do. So I beg you: when you write the firmware for a nuclear reactor or a rocket, do not be too lazy to measure time that precisely. You know, I want to live.

The case described above is probably the only one in which you really have no choice (only select will do). However, if you are writing a normal application for ordinary hardware and deal with a reasonable number of sockets (tens or hundreds, no more), the performance difference between poll and select will not be noticeable, so the choice will come down to other factors.
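
The difference in timeout units is visible in the call signatures themselves. The following sketch (both helper names are made up for this example) waits without watching any descriptors, once through select() with a struct timeval and once through poll() with its millisecond int. How precisely the kernel honours a sub-millisecond timeout depends on the system's timers, so treat this as an illustration of the interfaces, not of achievable accuracy.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: block for up to `usec` microseconds using select().
 * No descriptors are watched, so the call can only time out (returns 0)
 * or fail (returns -1). */
int wait_usec_select(long usec)
{
    assert(usec >= 0);
    struct timeval tv;
    tv.tv_sec = usec / 1000000;
    tv.tv_usec = usec % 1000000;      /* microsecond field */
    return select(0, NULL, NULL, NULL, &tv);
}

/* The same wait through poll(): the timeout is an int in milliseconds,
 * so anything finer has to be rounded. This sketch rounds up. */
int wait_usec_poll(long usec)
{
    assert(usec >= 0);
    int msec = (int)((usec + 999) / 1000);
    return poll(NULL, 0, msec);
}
```

A request for 250 microseconds stays 250 microseconds in the select() version, while the poll() version has no choice but to ask for a full millisecond.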

Polling with poll()

poll is a newer method of polling sockets, created after people started trying to write large, heavily loaded network services. It is much better designed and does not suffer from most of the shortcomings of select. In most cases, when writing a modern application, you will be choosing between poll and epoll/libevent.

To use poll, the developer initializes an array of pollfd structures with the descriptors and events to monitor, and then calls poll(). Typical code looks like this:

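    // Requires <poll.h>; sock1 and sock2 are already-open sockets

    // The structures for two events
    struct pollfd fds[2];

    // Monitor sock1 for input
    fds[0].fd = sock1;
    fds[0].events = POLLIN;

    // Monitor sock2 for output
    fds[1].fd = sock2;
    fds[1].events = POLLOUT;

    // Wait up to 10 seconds
    int ret = poll( fds, 2, 10000 );

    // Check whether poll succeeded
    if ( ret == -1 )
    {
        // report the error and abort
    }
    else if ( ret == 0 )
    {
        // timeout; no event detected
    }
    else
    {
        // Something happened: check revents for each descriptor
        if ( fds[0].revents & POLLIN )
        {
            fds[0].revents = 0;
            // input event on sock1
        }
        if ( fds[1].revents & POLLOUT )
        {
            fds[1].revents = 0;
            // output event on sock2
        }
    }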

poll was created to fix the problems of select. Let's see how it did:

There is no limit on the number of monitored descriptors; more than 1024 can be watched.
poll leaves the fd and events fields of the pollfd structures untouched, so the array can be reused between poll() calls. Only the revents field is written, and poll overwrites it on every call, so clearing it yourself is harmless but not required.
Monitored events are better structured. For example, you can detect that a remote client has disconnected without having to read from the socket.

We have already mentioned one drawback of poll above: it is missing on some platforms, such as Windows XP. Starting with Vista it exists, but is called WSAPoll. The prototype is the same, so for platform-independent code you can write a wrapper like this:

    #if defined (WIN32)
    static inline int poll( struct pollfd *pfd, int nfds, int timeout )
    {
        return WSAPoll( pfd, nfds, timeout );
    }
    #endif

There is also the 1 ms timeout precision, which is very rarely a problem. However, poll has other disadvantages:

As with select, you cannot tell which descriptors raised events without walking through all the monitored structures and checking the revents field of each. Worse, the kernel has to do the same full pass on its side.
As with select, you cannot dynamically modify the monitored set, or close a socket in it, while a poll call is in progress.

However, all of the above can be considered largely irrelevant for most client applications. The only exception is probably P2P protocols, where each client may be connected to thousands of others. Even most server applications can ignore these problems. So poll should be your default choice over select, unless one of the two reasons given above restricts you.

Looking ahead, I would say that poll is preferable even to the more modern epoll (discussed below) in the following cases:

You want to write cross-platform code (epoll exists only on Linux).
You do not need to monitor more than 1000 sockets (epoll will not give you anything significant in that case).
You need to monitor more than 1000 sockets, but each connection is very short-lived (here poll and epoll perform about the same: what epoll gains by returning fewer events is cancelled out by the overhead of adding and removing descriptors).
Your application is not designed to change events from one thread while another is waiting for them (or you do not need that).

Polling with epoll()

epoll is the latest and best way to wait for events on Linux (and only on Linux). Well, not exactly "latest": it has been in the kernel since 2002. It differs from poll and select in that it provides an API for adding, removing, and modifying entries in the list of monitored descriptors and events.

Using epoll requires a bit more preparation. The developer should:

Create an epoll descriptor by calling epoll_create.
Initialize an epoll_event structure with the desired events and a pointer to the connection context. The "context" can be anything; epoll simply passes this value back in the returned events.
Call epoll_ctl(... EPOLL_CTL_ADD) to add the descriptor to the monitored list.
Call epoll_wait() to wait for events, stating how many events we are prepared to receive at once (for example, 20). Unlike the previous methods, the events are returned separately rather than in fields of the input structures. If we monitor 200 descriptors and 5 of them have new data, epoll_wait returns only 5 events. If 50 events occur, the first 20 are returned and the remaining 30 wait for the next call; they are not lost.
Handle the received events. This is relatively fast, because we never look at descriptors where nothing happened.

Typical code looks like this:

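    // Requires <sys/epoll.h>; pConnection1 points to an application-defined Connection object

    // Create the epoll descriptor. One per application is enough; it monitors all sockets.
    // The argument is ignored nowadays (it used to matter), but must be greater than zero
    int pollingfd = epoll_create( 0xCAFE );

    if ( pollingfd < 0 )
    {
        // report the error
    }

    // Zero-initialize the epoll_event structure
    struct epoll_event ev = { 0 };

    // Associate the connection object with the event. Anything can be stored here;
    // epoll does not interpret it and simply hands it back with the event
    ev.data.ptr = pConnection1;

    // Monitor for input, and do not rearm the descriptor automatically after the event
    ev.events = EPOLLIN | EPOLLONESHOT;

    // Add the descriptor to the monitored list. This is safe even while
    // another thread is blocked in epoll_wait
    if ( epoll_ctl( pollingfd, EPOLL_CTL_ADD, pConnection1->getSocket(), &ev ) != 0 )
    {
        // report the error
    }

    // Receive at most 20 events per call
    struct epoll_event pevents[ 20 ];

    // Wait up to 10 seconds
    int ready = epoll_wait( pollingfd, pevents, 20, 10000 );

    // Check whether epoll_wait succeeded
    if ( ready == -1 )
    {
        // report the error and abort
    }
    else if ( ready == 0 )
    {
        // timeout; no event detected
    }
    else
    {
        // Go through the events that were returned
        for ( int i = 0; i < ready; i++ )
        {
            if ( pevents[i].events & EPOLLIN )
            {
                // Get our connection pointer back and handle the event
                Connection * c = (Connection*) pevents[i].data.ptr;
                c->handleReadEvent();
            }
        }
    }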
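
The fragment above leans on a connection class that is not shown. For a version that compiles on its own, here is a minimal sketch in plain C, Linux only. struct conn, watch_conn and handle_ready are hypothetical names invented for this example; the context pointer plays the role of the connection object, and the default level-triggered mode is used instead of EPOLLONESHOT to keep the sketch short.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection context, standing in for a connection class. */
struct conn {
    int fd;
    int reads_seen;   /* how many times this connection delivered data */
};

/* Register c->fd for input events and attach the context pointer to it. */
int watch_conn(int epfd, struct conn *c)
{
    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN;
    ev.data.ptr = c;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Wait once and handle whatever is ready. At most max_events are returned
 * per call; anything beyond that is reported by the next call.
 * Returns the value of epoll_wait(). */
int handle_ready(int epfd, int max_events, int timeout_ms)
{
    struct epoll_event events[64];
    assert(max_events > 0 && max_events <= 64);

    int n = epoll_wait(epfd, events, max_events, timeout_ms);
    for (int i = 0; i < n; i++) {
        /* Only ready descriptors show up here, each with our context. */
        struct conn *c = events[i].data.ptr;
        if (events[i].events & EPOLLIN) {
            char buf[256];
            if (read(c->fd, buf, sizeof buf) > 0)
                c->reads_seen++;
        }
    }
    return n;
}
```

If eight descriptors are registered and three of them have data, calling handle_ready with max_events set to 2 handles two of them, and the next call picks up the third.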

Let's start with the shortcomings of epoll; they are obvious from the code. This method is harder to use, requires more code, and makes more system calls.

The advantages are also evident:

epoll returns only the descriptors on which the monitored events actually occurred. There is no need to scan thousands of structures looking for the one, possibly the only one, where the expected event fired.
You can associate meaningful context with each monitored event. In the example above we used a pointer to the connection class object, which saved us another potential lookup in the connection array.
You can add sockets to the list or remove them at any time. You can even modify the monitored events. All of this works correctly and is officially supported and documented.
You can have multiple threads waiting for events from the same queue with epoll_wait, something that cannot be done properly with select or poll.

But keep in mind that epoll is not "poll improved in every respect". It also has disadvantages compared to poll:

Changing event flags (for example, switching from READ to WRITE) requires an extra epoll_ctl system call, whereas with poll you simply change a bit mask (entirely in user mode). Switching 5000 sockets from read to write requires 5000 system calls, and as many transitions into the kernel, with epoll, while with poll it is a trivial bit operation in a loop.
For each new connection you have to call accept() and epoll_ctl(): two system calls. With poll there is only one. With very short-lived connections this can make a difference.
epoll exists only on Linux. Other operating systems have similar mechanisms, but they are not identical. You cannot write code with epoll that will also build and run on, for example, FreeBSD.
Writing heavily loaded parallel code is hard. Many applications do not need such a heavyweight approach, since their load is easily handled by simpler methods.

Thus, epoll should be used only when all of the following hold:

Your application uses a thread pool to handle network connections. In a single-threaded application the gain from epoll is negligible and not worth the implementation effort.
You expect a relatively large number of connections (1000 or more). With few monitored sockets epoll gives no performance boost, and with just a handful it may even be slower.
Your connections are relatively long-lived. When a new connection transfers just a few bytes and closes immediately, poll is faster because it needs fewer system calls.
You intend to run your code on Linux and only on Linux.

If one or more of these conditions is not met, consider using poll or libevent.

libevent

libevent is a library that wraps the polling methods described in this article (as well as some others) in a unified API. The advantage is that code written once can be built and run on different operating systems. Still, it is important to understand that libevent is only a wrapper around the very same methods, with all their advantages and disadvantages. libevent will not make select monitor more than 1024 sockets, nor let epoll change the event list without an extra system call. So knowing the underlying technology still matters.

The need to support different polling methods makes the libevent API more complex. Even so, using it is simpler than writing two different event engines by hand for, say, Linux and FreeBSD (with epoll and kqueue).

Consider using libevent when both of the following apply:

You have looked at select and poll and they definitely do not suit you.
You need to support multiple operating systems.

Source: https://habr.com/ru/post/415259/
