Why does TCP socket slow down if done in multiple system calls?

Why is the following code slow? And by slow I mean 100x-1000x slow. It just repeatedly performs read/write directly on a TCP socket. The curious part is that it remains slow only if I use two function calls for both read AND write as shown below. If I change either the server or the client code to use a single function call (as in the comments), it becomes super fast.

Code snippet:

int main(...) {
  int sock = ...; // open TCP socket (boilerplate elided; full version below)
  int i;
  char buf[100000];
  for(i=0;i<2000;++i)
  { if(amServer)
    { write(sock,buf,10);
      // read(sock,buf,20);  // single read instead: fast
      read(sock,buf,10);     // two reads: slow
      read(sock,buf,10);
    }else
    { read(sock,buf,10);
      // write(sock,buf,20); // single write instead: fast
      write(sock,buf,10);    // two writes: slow
      write(sock,buf,10);
    }
  }
  close(sock);
}

We stumbled on this in a larger program that was actually using stdio buffering. It mysteriously became sluggish the moment the payload size exceeded the buffer size by a small margin. Then I did some digging around with strace and finally boiled the problem down to this. I can solve it by fooling around with the buffering strategy, but I'd really like to know what on earth is going on here. On my machine, it goes from 0.030 s to over a minute (tested both locally and across remote machines) when I split the single read/write calls into two.
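
For reference, the stdio effect is easy to reproduce in isolation. The sketch below is my reconstruction, not the original program: once the payload outgrows the stdio buffer, one logical fwrite() gets split into two write() system calls, which is exactly the two-write pattern shown above.

#include <stdio.h>
#include <string.h>

int main(void)
{
    static char iobuf[4096];
    static char payload[4096 + 10]; // slightly larger than the stdio buffer
    memset(payload, 'x', sizeof(payload));

    // Force full buffering with a known buffer size
    setvbuf(stdout, iobuf, _IOFBF, sizeof(iobuf));

    // Under "strace -e trace=write ./a.out > /dev/null", this single fwrite()
    // typically shows up as two write() calls (e.g. 4096 bytes, then 10 at the fflush)
    fwrite(payload, 1, sizeof(payload), stdout);
    fflush(stdout);
    return 0;
}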

These tests were done on various Linux distros, and various kernel versions. Same result.

Fully runnable code with networking boilerplate:

#include <netdb.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/ip.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static int getsockaddr(const char* name,const char* port, struct sockaddr* res)
{
    struct addrinfo* list;
    struct addrinfo* it;
    if(getaddrinfo(name,port,NULL,&list) != 0) return -1; // returns nonzero (not necessarily negative) on error
    for(it=list;it!=NULL && it->ai_family!=AF_INET;it=it->ai_next);
    if(!it) { freeaddrinfo(list); return -1; }
    memcpy(res,it->ai_addr,it->ai_addrlen);
    freeaddrinfo(list); // free the whole list, not just the node we kept
    return 0;
}
// used as sock=tcpConnect(...); ...; close(sock);
static int tcpConnect(struct sockaddr_in* sa)
{
    int outsock;
    if((outsock=socket(AF_INET,SOCK_STREAM,0))<0) return -1;
    if(connect(outsock,(struct sockaddr*)sa,sizeof(*sa))<0) return -1;
    return outsock;
}
int tcpConnectTo(const char* server, const char* port)
{
    struct sockaddr_in sa;
    if(getsockaddr(server,port,(struct sockaddr*)&sa)<0) return -1;
    int sock=tcpConnect(&sa); if(sock<0) return -1;
    return sock;
}

int tcpListenAny(const char* portn)
{
    in_port_t port;
    int outsock;
    if(sscanf(portn,"%hu",&port)<1) return -1;
    if((outsock=socket(AF_INET,SOCK_STREAM,0))<0) return -1;
    int reuse = 1;
    if(setsockopt(outsock,SOL_SOCKET,SO_REUSEADDR,
              (const char*)&reuse,sizeof(reuse))<0) return fprintf(stderr,"setsockopt() failed\n"),-1;
    struct sockaddr_in sa = { .sin_family=AF_INET, .sin_port=htons(port)
                  , .sin_addr={INADDR_ANY} };
    if(bind(outsock,(struct sockaddr*)&sa,sizeof(sa))<0) return fprintf(stderr,"Bind failed\n"),-1;
    if(listen(outsock,SOMAXCONN)<0) return fprintf(stderr,"Listen failed\n"),-1;
    return outsock;
}

int tcpAccept(const char* port)
{
    int listenSock, sock;
    listenSock = tcpListenAny(port);
    if(listenSock<0) return -1;
    if((sock=accept(listenSock,0,0))<0) return fprintf(stderr,"Accept failed\n"),-1;
    close(listenSock);
    return sock;
}

void writeLoop(int fd,const char* buf,size_t n)
{
    // Don't even bother incrementing buffer pointer; the contents don't matter here
    while(n)
    { ssize_t r = write(fd,buf,n);
      if(r <= 0) return; // bail out on error instead of wrapping n around
      n -= r;
    }
}
void readLoop(int fd,char* buf,size_t n)
{
    while(n)
    { ssize_t r = read(fd,buf,n);
      if(r <= 0) return; // EOF or error
      n -= r;
    }
}
int main(int argc,char* argv[])
{
    if(argc<3)
    { fprintf(stderr,"Usage: round {server_addr|--} port\n");
        return -1;
    }
    bool amServer = (strcmp("--",argv[1])==0);
    int sock;
    if(amServer) sock=tcpAccept(argv[2]);
    else sock=tcpConnectTo(argv[1],argv[2]);
    if(sock<0) { fprintf(stderr,"Connection failed\n"); return -1; }

    int i;
    char buf[100000] = { 0 };
    for(i=0;i<4000;++i)
    {
        if(amServer)
        { writeLoop(sock,buf,10);
            readLoop(sock,buf,20);
            //readLoop(sock,buf,10);
            //readLoop(sock,buf,10);
        }else
        { readLoop(sock,buf,10);
            writeLoop(sock,buf,20);
            //writeLoop(sock,buf,10);
            //writeLoop(sock,buf,10);
        }
    }

    close(sock);
    return 0;
}

EDIT: This version is slightly different from the other snippet in that it reads/writes in a loop. So in this version, two separate writes automatically cause two separate read() calls, even if readLoop is called only once. But otherwise the problem remains.

Asked Aug 28 '15 by Samee



1 Answer

Interesting. You are a victim of Nagle's algorithm interacting with TCP delayed acknowledgements.

Nagle's algorithm is a mechanism used in TCP to defer transmission of small segments until enough data has accumulated to make it worth building and sending a segment over the network. From the Wikipedia article:

Nagle's algorithm works by combining a number of small outgoing messages, and sending them all at once. Specifically, as long as there is a sent packet for which the sender has received no acknowledgment, the sender should keep buffering its output until it has a full packet's worth of output, so that output can be sent all at once.

However, TCP typically employs something known as delayed acknowledgements, a technique that consists of batching ACK replies (because TCP uses cumulative ACKs) to reduce network traffic.
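
(As an aside: on Linux, the receiving side of this interaction can be influenced per-socket with the TCP_QUICKACK option. A minimal sketch, assuming Linux; note that the kernel can silently clear this flag again, so it is usually re-armed after every read:)

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <stdio.h>

// Sketch: ask the kernel to ACK immediately instead of delaying.
// Linux-specific, and not permanent; re-apply after reads if needed.
static void quickAck(int sock)
{
    int val = 1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &val, sizeof(val)) < 0)
        perror("setsockopt(TCP_QUICKACK)");
}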

That Wikipedia article further mentions this:

With both algorithms enabled, applications that do two successive writes to a TCP connection, followed by a read that will not be fulfilled until after the data from the second write has reached the destination, experience a constant delay of up to 500 milliseconds, the "ACK delay".

(Emphasis mine)

In your specific case, since the server doesn't send more data before reading the reply, the client is causing the delay: if the client writes twice, the second write will be delayed.

If Nagle's algorithm is being used by the sending party, data will be queued by the sender until an ACK is received. If the sender does not send enough data to fill the maximum segment size (for example, if it performs two small writes followed by a blocking read) then the transfer will pause up to the ACK delay timeout.

So, when the client makes 2 write calls, this is what happens:

  1. Client issues the first write.
  2. The server receives some data. It doesn't acknowledge it in the hope that more data will arrive (so it can batch up a bunch of ACKs in one single ACK).
  3. Client issues the second write. The previous write has not been acknowledged, so Nagle's algorithm defers transmission until more data arrives (until enough data has been collected to make a segment) or the previous write is ACKed.
  4. Server is tired of waiting and after 500 ms acknowledges the segment.
  5. Client finally completes the 2nd write.

With 1 write, this is what happens:

  1. Client issues the first write.
  2. The server receives some data. It doesn't acknowledge it in the hope that more data will arrive (so it can batch up a bunch of ACKs in one single ACK).
  3. The server writes to the socket. An ACK is part of the TCP header, so if you're writing, you might as well acknowledge the previous segment at no extra cost. Do it.
  4. Meanwhile, the client wrote once, so it was already waiting on the next read - there was no 2nd write waiting for the server's ACK.
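
To see the stall directly, it helps to time each round trip in the client loop. A sketch (now_ms is a hypothetical helper, not part of the program above): slow iterations land in the tens to hundreds of milliseconds, while normal ones stay well under a millisecond.

#include <stdio.h>
#include <time.h>

// Hypothetical instrumentation helper: monotonic wall time in milliseconds
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

// Used inside the client's loop like this:
//     double t0 = now_ms();
//     readLoop(sock, buf, 10);
//     writeLoop(sock, buf, 10);
//     writeLoop(sock, buf, 10); // the second small write is the one Nagle holds back
//     fprintf(stderr, "iteration: %.2f ms\n", now_ms() - t0);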

If you want to keep writing twice on the client side, you need to disable Nagle's algorithm. Here is the user-level solution proposed by the algorithm's author himself:

The user-level solution is to avoid write-write-read sequences on sockets. write-read-write-read is fine. write-write-write is fine. But write-write-read is a killer. So, if you can, buffer up your little writes to TCP and send them all at once. Using the standard UNIX I/O package and flushing write before each read usually works.

(See the citation on Wikipedia)
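
Following that advice without touching any socket options, the two small writes can also be coalesced into a single system call with writev(2). A minimal sketch (writeBoth is a hypothetical helper, not part of the code above):

#include <sys/types.h>
#include <sys/uio.h>

// Sketch: submit both small payloads in one system call, so TCP sees a
// single 20-byte chunk instead of two 10-byte ones
static ssize_t writeBoth(int sock, const char* a, size_t alen,
                         const char* b, size_t blen)
{
    struct iovec iov[2] = {
        { .iov_base = (void*)a, .iov_len = alen },
        { .iov_base = (void*)b, .iov_len = blen },
    };
    return writev(sock, iov, 2); // may still write short; loop in real code
}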

As mentioned by David Schwartz in the comments, this may not be the greatest idea for various reasons, but it illustrates the point and shows that this is indeed causing the delay.

To disable it, you need to set the TCP_NODELAY option on the sockets with setsockopt(2).

This can be done in tcpConnectTo() for the client:

int tcpConnectTo(const char* server, const char* port)
{
    struct sockaddr_in sa;
    if(getsockaddr(server,port,(struct sockaddr*)&sa)<0) return -1;
    int sock=tcpConnect(&sa); if(sock<0) return -1;

    int val = 1;
    // Disable Nagle's algorithm on the connected socket
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val)) < 0)
        perror("setsockopt(2) error");

    return sock;
}

And in tcpAccept() for the server:

int tcpAccept(const char* port)
{
    int listenSock, sock;
    listenSock = tcpListenAny(port);
    if(listenSock<0) return -1;
    if((sock=accept(listenSock,0,0))<0) return fprintf(stderr,"Accept failed\n"),-1;
    close(listenSock);

    int val = 1;
    // Disable Nagle's algorithm on the accepted socket
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val)) < 0)
        perror("setsockopt(2) error");

    return sock;
}

It's interesting to see the huge difference this makes.

If you'd rather not mess with the socket options, it's enough to ensure that the client writes once - and only once - before the next read. You can still have the server read twice:

for(i=0;i<4000;++i)
{
    if(amServer)
    { writeLoop(sock,buf,10);
        //readLoop(sock,buf,20);
        readLoop(sock,buf,10);
        readLoop(sock,buf,10);
    }else
    { readLoop(sock,buf,10);
        writeLoop(sock,buf,20);
        //writeLoop(sock,buf,10);
        //writeLoop(sock,buf,10);
    }
}
Answered Oct 04 '22 by Filipe Gonçalves