 

How to do TCP connection pooling in C/C++

I'm designing a distributed client/server system in C++, in which many clients send requests to many servers over TCP, and the server spawns a thread to handle each request and send back its response. In my use case only a limited number of clients will access the server, and I need very high performance. The data sent between client and server is small but very frequent, so creating a connection and tearing it down after each use is expensive. I want to use connection caching to solve this: once a connection is created, it is stored in a cache for future use. (Assume that the number of clients will not exceed the size of the cache.)

My question is:

  1. I saw someone say that connection pooling is a client-side technique. If connection pooling is used only on the client side, then the first time the client connects to a server and sends data, making that connection triggers accept() on the server side, which returns a socket for receiving from that client. When the client later wants to use an existing (cached) connection, it doesn't make a new connection; it just sends data. The problem is: if no new connection is made, what triggers accept() on the server side so that it spawns a thread?
  2. If connection pooling also needs to be implemented on the server side, how can I know where a request comes from? Only accept() gives me the client address, but by then accept() has already created a new socket for that request, so there is no point in using a cached connection.

Any answers and suggestions will be appreciated. Or can anyone give me an example of connection pooling or connection caching?

asked Mar 07 '12 at 10:03 by Tony



1 Answer

I saw someone said that connection pooling is a client side technique. ... if no making connection, who would trigger accept() in server side and to throw a thread?

Firstly, connection pooling is not just a client-side technique; it's a connection-mode technique. It applies to both types of peer (the "server" and the "client").
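To make this concrete, here is a minimal POSIX sketch of my own (not from the question), using a loopback listener on a kernel-chosen port; the function name round_trips_on_one_connection is hypothetical. The client calls connect() once and the server calls accept() once; every request after that travels over the same pair of sockets, so nothing needs to "re-trigger" accept(). The socket returned by that single accept() is the server's half of the pooled connection.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch: one connect()/accept() pair over loopback, then n_requests
 * request/response round trips on those same two sockets.
 * Returns the number of round trips completed, or -1 if setup fails. */
int round_trips_on_one_connection(int n_requests) {
    int listener = -1, client = -1, server_side = -1, result = -1;
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                              /* let the kernel pick a free port */

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0
        || bind(listener, (struct sockaddr *) &addr, sizeof addr) < 0
        || listen(listener, 1) < 0
        || getsockname(listener, (struct sockaddr *) &addr, &addr_len) < 0)
        goto done;

    /* The one and only connect(): this is what makes accept() return. */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 || connect(client, (struct sockaddr *) &addr, sizeof addr) < 0)
        goto done;
    server_side = accept(listener, NULL, NULL);    /* called exactly once */
    if (server_side < 0)
        goto done;

    result = 0;
    for (int i = 0; i < n_requests; i++) {
        char request = (char) ('a' + i % 26), received, reply;
        /* Reuse the cached connection: no new connect(), no new accept(). */
        if (send(client, &request, 1, 0) != 1) break;
        if (recv(server_side, &received, 1, 0) != 1) break;
        if (send(server_side, &received, 1, 0) != 1) break;   /* echo as the response */
        if (recv(client, &reply, 1, 0) != 1 || reply != request) break;
        result++;
    }
done:
    if (server_side >= 0) close(server_side);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return result;
}
```

Calling round_trips_on_one_connection(5) should complete all five round trips with a single accept(). In a real server the accepted socket would be kept (for example in a per-client struct) and read from repeatedly, rather than closed after one exchange.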

Secondly, accept doesn't need to be called to start a thread. Programs can start threads for any reason they like... They could even start threads just to start more threads, in a massively parallelised loop of thread creation (the thread equivalent of what is called a "fork bomb" when it is done with processes).

Finally, an efficient thread-pooling implementation won't start a thread for each client. Each thread typically occupies between 512 KB and 4 MB (counting stack space and other context information), so 10000 clients with one thread each would tie up roughly 5 GB to 40 GB, which is a lot of wasted memory.
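To check the per-thread figure on a particular system, the default stack size a new thread would receive can be queried with pthread_attr_getstacksize. This small sketch assumes a POSIX threads implementation (link with -pthread where your toolchain requires it); the function name default_thread_stack_size is hypothetical. Bear in mind that much of this is reserved address space rather than memory that has actually been touched, so real usage per thread may be lower than the number suggests.

```c
#include <pthread.h>
#include <stddef.h>

/* Returns the stack size (in bytes) a new thread gets by default on this
 * system, or 0 if it cannot be determined. */
size_t default_thread_stack_size(void) {
    pthread_attr_t attr;
    size_t size = 0;
    if (pthread_attr_init(&attr) != 0)
        return 0;
    /* A freshly initialised attribute object carries the default stack size */
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

Multiplying the result by the expected number of clients gives a rough upper bound on the stack address space a thread-per-client design would reserve.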

I want to do so, but just don't know how to do it in multithreading case.

You shouldn't use multithreading here... At least, not until you have a solution that uses a single thread, and you decide that it's not fast enough. At the moment you don't have that information; you're just guessing, and guessing doesn't guarantee optimisation.

At the turn of the century there were FTP servers that solved the C10K problem; they were able to handle 10000 clients at any given time, browsing, downloading or idling as users tend to do on FTP servers. They solved that problem not by using threads, but by using non-blocking and/or asynchronous sockets and/or calls.

To clarify, those servers handled thousands of connections on a single thread! One typical way is to use select, but I'm not particularly fond of that method because it requires a rather ugly series of loops. I prefer to put each file descriptor into non-blocking mode, using ioctlsocket on Windows and fcntl on POSIX systems, e.g.:

#ifdef _WIN32
u_long non_blocking = 1;   /* nonzero enables non-blocking mode */
ioctlsocket(fd, FIONBIO, &non_blocking);
#else
fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
#endif

At this point, recv and read won't block when operating on fd; if no data is available, they return -1 immediately with errno set to EAGAIN or EWOULDBLOCK (WSAGetLastError() returns WSAEWOULDBLOCK on Windows) rather than waiting for data to arrive. That means a single loop can check many sockets in turn.
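A quick way to see this behaviour is the following POSIX sketch. It uses a socketpair (an AF_UNIX stream pair standing in for a TCP connection, purely so the example needs no network setup); the function name nonblocking_recv_demo is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if recv() on an empty non-blocking socket fails at once with
 * EAGAIN/EWOULDBLOCK and then succeeds once data has been sent, else 0. */
int nonblocking_recv_demo(void) {
    int pair[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
        return 0;
    fcntl(pair[0], F_SETFL, fcntl(pair[0], F_GETFL, 0) | O_NONBLOCK);

    char byte;
    ssize_t n = recv(pair[0], &byte, 1, 0);          /* nothing queued yet */
    int empty_ok = n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);

    send(pair[1], "x", 1, 0);                        /* queue one byte */
    n = recv(pair[0], &byte, 1, 0);                  /* returns it without waiting */
    int data_ok = n == 1 && byte == 'x';

    close(pair[0]);
    close(pair[1]);
    return empty_ok && data_ok;
}
```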

If connection pooling also need to be implemented in server side, how can I know where a request come from?

Store each client's fd alongside its struct sockaddr_storage (as filled in by accept) and any other stateful information you need about that client, in a struct declared however you like. Even if this ends up being 4KB (a fairly large struct; they rarely need to be bigger), 10000 of them occupy only about 40000KB (~40MB). Even today's mobile phones should have no trouble with that. Consider completing the following code for your needs:

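#ifdef _WIN32
#   include <winsock2.h>
#   include <ws2tcpip.h>
#else
#   include <errno.h>
#   include <fcntl.h>
#   include <sched.h>
#   include <sys/socket.h>
#   include <unistd.h>
#endif
#include <string.h>

struct client {
    struct sockaddr_storage addr;
    socklen_t addr_len;
    int fd;
    /* Other stateful information */
};

#define BUFFER_SIZE 4096
#define CLIENT_COUNT 10000

/* Put a socket into non-blocking mode */
static void set_non_blocking(int fd) {
#   ifdef _WIN32
    u_long non_blocking = 1;
    ioctlsocket(fd, FIONBIO, &non_blocking);
#   else
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
#   endif
}

/* Did the last failed call fail only because it would have blocked? */
static int would_block(void) {
#   ifdef _WIN32
    return WSAGetLastError() == WSAEWOULDBLOCK;
#   else
    return errno == EAGAIN || errno == EWOULDBLOCK;
#   endif
}

int main(void) {
    int server = -1;
    /* static: 10000 entries are too big to put on the stack safely */
    static struct client client[CLIENT_COUNT];
    size_t client_count = 0;
    /* XXX: Perform usual socket/bind/listen (after WSAStartup on Windows) */
    set_non_blocking(server);

    for (;;) {
        /* Accept a connection if one is pending and there is room for it */
        if (client_count < sizeof client / sizeof *client) {
            struct sockaddr_storage addr = { 0 };
            socklen_t addr_len = sizeof addr;
            int fd = accept(server, (struct sockaddr *) &addr, &addr_len);
            if (fd != -1) {
                set_non_blocking(fd);
                client[client_count++] = (struct client) { .addr = addr
                                                         , .addr_len = addr_len
                                                         , .fd = fd };
            }
        }
        /* Loop through clients */
        char buffer[BUFFER_SIZE];
        for (size_t index = 0; index < client_count; ) {
            /* int: recv returns int on Windows and ssize_t on POSIX */
            int bytes_recvd = (int) recv(client[index].fd, buffer, sizeof buffer, 0);
            if (bytes_recvd < 0 && would_block()) {
                index++; /* No data yet; check again on the next pass */
                continue;
            }
            if (bytes_recvd <= 0) {
                /* 0: the peer closed the connection; < 0: a real error */
#               ifdef _WIN32
                closesocket(client[index].fd);
#               else
                close(client[index].fd);
#               endif
                client_count--;
                memmove(client + index, client + index + 1,
                        (client_count - index) * sizeof *client);
                continue; /* client[index] now holds the next client */
            }
            /* XXX: Process buffer[0..bytes_recvd-1] */
            index++;
        }

        /* Give up the CPU briefly. The kernel queues incoming data regardless,
         * but without this the loop spins at 100% CPU. */
#       ifdef _WIN32
        Sleep(0);
#       else
        sched_yield();
#       endif
    }
}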

Supposing you want to pool connections on the client-side, the code would look very similar, except obviously there would be no need for the accept-related code. Supposing you have an array of clients that you want to connect, you could use non-blocking connect calls to perform all of the connections at once like this:

size_t index = 0, in_progress = 0;
for (;;) {
    /* fd == 0 is used as "no socket yet", so this assumes a zero-initialised array */
    if (client[index].fd == 0) {
        client[index].fd = socket(/* TODO */);
#       ifdef _WIN32
        u_long non_blocking = 1;
        ioctlsocket(client[index].fd, FIONBIO, &non_blocking);
#       else
        fcntl(client[index].fd, F_SETFL, fcntl(client[index].fd, F_GETFL, 0) | O_NONBLOCK);
#       endif
    }
    /* Calling connect() again on a non-blocking socket reports its progress:
     * the errors below mean "still connecting"; anything else (success,
     * already connected, or a real error; XXX: tell these apart) means done */
#   ifdef _WIN32
    in_progress += connect(client[index].fd, (struct sockaddr *) &client[index].addr, client[index].addr_len) < 0
                && (WSAGetLastError() == WSAEALREADY
                ||  WSAGetLastError() == WSAEWOULDBLOCK
                ||  WSAGetLastError() == WSAEINVAL);
#   else
    in_progress += connect(client[index].fd, (struct sockaddr *) &client[index].addr, client[index].addr_len) < 0
                && (errno == EALREADY
                ||  errno == EINPROGRESS);
#   endif
    /* After a full pass over the array, stop once nothing is still connecting */
    if (++index < sizeof client / sizeof *client) {
        continue;
    }
    index = 0;
    if (in_progress == 0) {
        break;
    }
    in_progress = 0;
}

As for optimisation, given that this should be able to handle 10000 clients with perhaps a few minor tweaks, you shouldn't need multiple threads.

Nonetheless, by associating each client with a mutex from a collection and preceding each non-blocking socket operation with a (likewise non-blocking) pthread_mutex_trylock, the above loops could be adapted to run simultaneously in multiple threads while processing the same group of sockets. This gives a working model for Windows, BSD and Linux alike (Windows is not POSIX-compliant, so there you would need a pthreads implementation or a native equivalent such as TryEnterCriticalSection), but it's not a perfectly optimal one. To achieve optimality, we must step into the asynchronous world, which varies from system to system:

  • Windows uses overlapped WSA* functions, with completion routines (call-backs) or I/O completion ports.
  • BSD and Linux use the somewhat similar kqueue and epoll, respectively.
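The pthread_mutex_trylock idea can be sketched as follows (POSIX threads plus C11 atomics). The struct slot array, the pending/processed counters and run_workers are hypothetical stand-ins: decrementing pending plays the role of one non-blocking socket operation on that client. A thread that fails to take a client's lock simply moves on to the next client instead of waiting.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOTS 1000
#define MAX_THREADS 16

/* Hypothetical per-client state: a mutex plus counters standing in for
 * "operations still to perform on this client's socket". */
struct slot {
    pthread_mutex_t lock;
    int pending;
    int processed;
};

static struct slot slots[SLOTS];
static atomic_long remaining;                /* work left across all slots */

static void *worker(void *arg) {
    (void) arg;
    while (atomic_load(&remaining) > 0) {
        for (size_t i = 0; i < SLOTS; i++) {
            /* Never wait: if another thread owns this client, skip it. */
            if (pthread_mutex_trylock(&slots[i].lock) != 0)
                continue;
            if (slots[i].pending > 0) {      /* stand-in for a non-blocking recv/send */
                slots[i].pending--;
                slots[i].processed++;
                atomic_fetch_sub(&remaining, 1);
            }
            pthread_mutex_unlock(&slots[i].lock);
        }
    }
    return NULL;
}

/* Run n_threads workers over the same slots; returns the total units
 * processed, or -1 for an unsupported thread count. */
long run_workers(int n_threads, int work_per_slot) {
    pthread_t threads[MAX_THREADS];
    int created = 0;
    if (n_threads < 1 || n_threads > MAX_THREADS)
        return -1;
    for (size_t i = 0; i < SLOTS; i++) {
        pthread_mutex_init(&slots[i].lock, NULL);
        slots[i].pending = work_per_slot;
        slots[i].processed = 0;
    }
    atomic_store(&remaining, (long) SLOTS * work_per_slot);
    for (int t = 0; t < n_threads; t++)
        if (pthread_create(&threads[created], NULL, worker, NULL) == 0)
            created++;
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    long total = 0;
    for (size_t i = 0; i < SLOTS; i++) {
        total += slots[i].processed;
        pthread_mutex_destroy(&slots[i].lock);
    }
    return total;
}
```

With four threads and three units per slot, every slot is handled exactly three times even though all threads sweep the same array. Like the single-threaded loop, the workers spin while work remains rather than sleeping.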

It may pay to codify the "non-blocking socket operation" abstraction mentioned earlier, as the two asynchronous mechanisms differ significantly in their interfaces. Unfortunately, as with everything else here, we must write abstractions so that the Windows-specific code and the POSIX code stay legible side by side. As a bonus, this'll allow us to mingle server processing (i.e. accept and anything that follows) with client processing (i.e. connect and anything that follows), so our server loop can become a client loop (or vice versa).
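On Linux, the readiness side of this looks roughly like the following sketch: sockets are registered once with an epoll instance, and epoll_wait returns only those that have data, so the loop no longer has to try every socket in turn. It is Linux-only; the socketpairs stand in for client connections, and epoll_demo and PAIRS are hypothetical names. kqueue on BSD and overlapped I/O on Windows would need different code, which is why the abstraction described above is worth having.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define PAIRS 3

/* Linux-only sketch: watch PAIRS sockets with one epoll instance, make the
 * first n_to_write of them readable, and return how many epoll_wait
 * reports as ready (or -1 on error). */
int epoll_demo(int n_to_write) {
    int pairs[PAIRS][2];
    struct epoll_event ready[PAIRS];
    int opened = 0, n, ready_count = -1;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    for (; opened < PAIRS; opened++) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, pairs[opened]) < 0)
            goto done;
        /* Ask to be told when this end becomes readable */
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = pairs[opened][0] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pairs[opened][0], &ev) < 0) {
            opened++;                                /* this pair exists; close it too */
            goto done;
        }
    }
    for (int i = 0; i < n_to_write && i < PAIRS; i++)
        if (write(pairs[i][1], "x", 1) != 1)         /* make socket i readable */
            goto done;

    n = epoll_wait(epfd, ready, PAIRS, 0);           /* timeout 0: report, don't block */
    for (int i = 0; i < n; i++) {
        char byte;
        if (read(ready[i].data.fd, &byte, 1) != 1)   /* only ready sockets are read */
            goto done;
    }
    ready_count = n;
done:
    for (int i = 0; i < opened; i++) {
        close(pairs[i][0]);
        close(pairs[i][1]);
    }
    close(epfd);
    return ready_count;
}
```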

answered Oct 06 '22 at 22:10 by autistic
