Why does TCP socket slow down if done in multiple system calls?

Why is the following code slow? And by slow I mean 100x-1000x slow. It just repeatedly performs read/write directly on a TCP socket. The curious part is that it remains slow only if I use two function calls for both read AND write as shown below. If I change either the server or the client code to use a single function call (as in the comments), it becomes super fast.

Code snippet:

int main(...) {
  int sock = ...; // open TCP socket (boilerplate elided; full version below)
  int i;
  char buf[100000];
  for(i=0;i<2000;++i)
  { if(amServer)
    { write(sock,buf,10);
      // read(sock,buf,20);  // single read instead: fast
      read(sock,buf,10);     // two reads: slow
      read(sock,buf,10);
    }else
    { read(sock,buf,10);
      // write(sock,buf,20); // single write instead: fast
      write(sock,buf,10);    // two writes: slow
      write(sock,buf,10);
    }
  }
  close(sock);
}

We stumbled on this in a larger program that was actually using stdio buffering. It mysteriously became sluggish the moment the payload size exceeded the buffer size by a small margin. Then I did some digging around with strace and finally boiled the problem down to this. I can solve it by fooling around with the buffering strategy, but I'd really like to know what on earth is going on here. On my machine, it goes from 0.030 s to over a minute (tested both locally and across remote machines) when I split the single read/write calls into two.
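
For reference, the stdio effect is easy to reproduce in isolation. The sketch below is my reconstruction, not the original program: once the payload outgrows the stdio buffer, one logical fwrite() gets split into two write() system calls, which is exactly the two-write pattern shown above.

#include <stdio.h>
#include <string.h>

int main(void)
{
    static char iobuf[4096];
    static char payload[4096 + 10]; // slightly larger than the stdio buffer
    memset(payload, 'x', sizeof(payload));

    // Force full buffering with a known buffer size
    setvbuf(stdout, iobuf, _IOFBF, sizeof(iobuf));

    // Under "strace -e trace=write ./a.out > /dev/null", this single fwrite()
    // typically shows up as two write() calls (e.g. 4096 bytes, then 10 at the fflush)
    fwrite(payload, 1, sizeof(payload), stdout);
    fflush(stdout);
    return 0;
}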

These tests were done on various Linux distros, and various kernel versions. Same result.

Fully runnable code with networking boilerplate:

#include <netdb.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/ip.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static int getsockaddr(const char* name,const char* port, struct sockaddr* res)
{
    struct addrinfo* list;
    struct addrinfo* it;
    if(getaddrinfo(name,port,NULL,&list) != 0) return -1; // returns nonzero (not necessarily negative) on error
    for(it=list;it!=NULL && it->ai_family!=AF_INET;it=it->ai_next);
    if(!it) { freeaddrinfo(list); return -1; }
    memcpy(res,it->ai_addr,it->ai_addrlen);
    freeaddrinfo(list); // free the whole list, not just the node we kept
    return 0;
}
// used as sock=tcpConnect(...); ...; close(sock);
static int tcpConnect(struct sockaddr_in* sa)
{
    int outsock;
    if((outsock=socket(AF_INET,SOCK_STREAM,0))<0) return -1;
    if(connect(outsock,(struct sockaddr*)sa,sizeof(*sa))<0) return -1;
    return outsock;
}
int tcpConnectTo(const char* server, const char* port)
{
    struct sockaddr_in sa;
    if(getsockaddr(server,port,(struct sockaddr*)&sa)<0) return -1;
    int sock=tcpConnect(&sa); if(sock<0) return -1;
    return sock;
}

int tcpListenAny(const char* portn)
{
    in_port_t port;
    int outsock;
    if(sscanf(portn,"%hu",&port)<1) return -1;
    if((outsock=socket(AF_INET,SOCK_STREAM,0))<0) return -1;
    int reuse = 1;
    if(setsockopt(outsock,SOL_SOCKET,SO_REUSEADDR,
              (const char*)&reuse,sizeof(reuse))<0) return fprintf(stderr,"setsockopt() failed\n"),-1;
    struct sockaddr_in sa = { .sin_family=AF_INET, .sin_port=htons(port)
                  , .sin_addr={INADDR_ANY} };
    if(bind(outsock,(struct sockaddr*)&sa,sizeof(sa))<0) return fprintf(stderr,"Bind failed\n"),-1;
    if(listen(outsock,SOMAXCONN)<0) return fprintf(stderr,"Listen failed\n"),-1;
    return outsock;
}

int tcpAccept(const char* port)
{
    int listenSock, sock;
    listenSock = tcpListenAny(port);
    if(listenSock<0) return -1;
    if((sock=accept(listenSock,0,0))<0) return fprintf(stderr,"Accept failed\n"),-1;
    close(listenSock);
    return sock;
}

void writeLoop(int fd,const char* buf,size_t n)
{
    // Don't even bother incrementing buffer pointer; the contents don't matter here
    while(n)
    { ssize_t r = write(fd,buf,n);
      if(r <= 0) return; // bail out on error instead of wrapping n around
      n -= r;
    }
}
void readLoop(int fd,char* buf,size_t n)
{
    while(n)
    { ssize_t r = read(fd,buf,n);
      if(r <= 0) return; // EOF or error
      n -= r;
    }
}
int main(int argc,char* argv[])
{
    if(argc<3)
    { fprintf(stderr,"Usage: round {server_addr|--} port\n");
        return -1;
    }
    bool amServer = (strcmp("--",argv[1])==0);
    int sock;
    if(amServer) sock=tcpAccept(argv[2]);
    else sock=tcpConnectTo(argv[1],argv[2]);
    if(sock<0) { fprintf(stderr,"Connection failed\n"); return -1; }

    int i;
    char buf[100000] = { 0 };
    for(i=0;i<4000;++i)
    {
        if(amServer)
        { writeLoop(sock,buf,10);
            readLoop(sock,buf,20);
            //readLoop(sock,buf,10);
            //readLoop(sock,buf,10);
        }else
        { readLoop(sock,buf,10);
            writeLoop(sock,buf,20);
            //writeLoop(sock,buf,10);
            //writeLoop(sock,buf,10);
        }
    }

    close(sock);
    return 0;
}

EDIT: This version is slightly different from the other snippet in that it reads/writes in a loop. So in this version, two separate writes automatically cause two separate read() calls, even if readLoop is called only once. But otherwise the problem remains.

Asked Aug 28 '15 by Samee



1 Answer

Interesting. You are a victim of Nagle's algorithm interacting with TCP delayed acknowledgements.

Nagle's algorithm is a mechanism used in TCP to defer transmission of small segments until enough data has accumulated to make it worth building and sending a segment over the network. From the Wikipedia article:

Nagle's algorithm works by combining a number of small outgoing messages, and sending them all at once. Specifically, as long as there is a sent packet for which the sender has received no acknowledgment, the sender should keep buffering its output until it has a full packet's worth of output, so that output can be sent all at once.

However, TCP typically employs something known as delayed acknowledgements, a technique that consists of batching ACK replies (because TCP uses cumulative ACKs) to reduce network traffic.
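
(As an aside: on Linux, the receiving side of this interaction can be influenced per-socket with the TCP_QUICKACK option. A minimal sketch, assuming Linux; note that the kernel can silently clear this flag again, so it is usually re-armed after every read:)

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <stdio.h>

// Sketch: ask the kernel to ACK immediately instead of delaying.
// Linux-specific, and not permanent; re-apply after reads if needed.
static void quickAck(int sock)
{
    int val = 1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &val, sizeof(val)) < 0)
        perror("setsockopt(TCP_QUICKACK)");
}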

That Wikipedia article further mentions this:

With both algorithms enabled, applications that do two successive writes to a TCP connection, followed by a read that will not be fulfilled until after the data from the second write has reached the destination, experience a constant delay of up to 500 milliseconds, the "ACK delay".

(Emphasis mine)

In your specific case, since the server doesn't send more data before reading the reply, the client is causing the delay: if the client writes twice, the second write will be delayed.

If Nagle's algorithm is being used by the sending party, data will be queued by the sender until an ACK is received. If the sender does not send enough data to fill the maximum segment size (for example, if it performs two small writes followed by a blocking read) then the transfer will pause up to the ACK delay timeout.

So, when the client makes 2 write calls, this is what happens:

  1. Client issues the first write.
  2. The server receives some data. It doesn't acknowledge it in the hope that more data will arrive (so it can batch up a bunch of ACKs in one single ACK).
  3. Client issues the second write. The previous write has not been acknowledged, so Nagle's algorithm defers transmission until more data arrives (until enough data has been collected to make a segment) or the previous write is ACKed.
  4. Server is tired of waiting and after 500 ms acknowledges the segment.
  5. Client finally completes the 2nd write.

With 1 write, this is what happens:

  1. Client issues the first write.
  2. The server receives some data. It doesn't acknowledge it in the hope that more data will arrive (so it can batch up a bunch of ACKs in one single ACK).
  3. The server writes to the socket. An ACK is part of the TCP header, so if you're writing, you might as well acknowledge the previous segment at no extra cost. Do it.
  4. Meanwhile, the client wrote once, so it was already waiting on the next read - there was no 2nd write waiting for the server's ACK.
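
To see the stall directly, it helps to time each round trip in the client loop. A sketch (now_ms is a hypothetical helper, not part of the program above): slow iterations land in the tens to hundreds of milliseconds, while normal ones stay well under a millisecond.

#include <stdio.h>
#include <time.h>

// Hypothetical instrumentation helper: monotonic wall time in milliseconds
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

// Used inside the client's loop like this:
//     double t0 = now_ms();
//     readLoop(sock, buf, 10);
//     writeLoop(sock, buf, 10);
//     writeLoop(sock, buf, 10); // the second small write is the one Nagle holds back
//     fprintf(stderr, "iteration: %.2f ms\n", now_ms() - t0);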

If you want to keep writing twice on the client side, you need to disable Nagle's algorithm. Here is the user-level solution proposed by the algorithm's author himself:

The user-level solution is to avoid write-write-read sequences on sockets. write-read-write-read is fine. write-write-write is fine. But write-write-read is a killer. So, if you can, buffer up your little writes to TCP and send them all at once. Using the standard UNIX I/O package and flushing write before each read usually works.

(See the citation on Wikipedia)
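
Following that advice without touching any socket options, the two small writes can also be coalesced into a single system call with writev(2). A minimal sketch (writeBoth is a hypothetical helper, not part of the code above):

#include <sys/types.h>
#include <sys/uio.h>

// Sketch: submit both small payloads in one system call, so TCP sees a
// single 20-byte chunk instead of two 10-byte ones
static ssize_t writeBoth(int sock, const char* a, size_t alen,
                         const char* b, size_t blen)
{
    struct iovec iov[2] = {
        { .iov_base = (void*)a, .iov_len = alen },
        { .iov_base = (void*)b, .iov_len = blen },
    };
    return writev(sock, iov, 2); // may still write short; loop in real code
}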

As mentioned by David Schwartz in the comments, this may not be the greatest idea for various reasons, but it illustrates the point and shows that this is indeed causing the delay.

To disable it, you need to set the TCP_NODELAY option on the sockets with setsockopt(2).

This can be done in tcpConnectTo() for the client:

int tcpConnectTo(const char* server, const char* port)
{
    struct sockaddr_in sa;
    if(getsockaddr(server,port,(struct sockaddr*)&sa)<0) return -1;
    int sock=tcpConnect(&sa); if(sock<0) return -1;

    int val = 1;
    // Disable Nagle's algorithm on the connected socket
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val)) < 0)
        perror("setsockopt(2) error");

    return sock;
}

And in tcpAccept() for the server:

int tcpAccept(const char* port)
{
    int listenSock, sock;
    listenSock = tcpListenAny(port);
    if(listenSock<0) return -1;
    if((sock=accept(listenSock,0,0))<0) return fprintf(stderr,"Accept failed\n"),-1;
    close(listenSock);

    int val = 1;
    // Disable Nagle's algorithm on the accepted socket
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val)) < 0)
        perror("setsockopt(2) error");

    return sock;
}

It's interesting to see the huge difference this makes.

If you'd rather not mess with the socket options, it's enough to ensure that the client writes once - and only once - before the next read. You can still have the server read twice:

for(i=0;i<4000;++i)
{
    if(amServer)
    { writeLoop(sock,buf,10);
        //readLoop(sock,buf,20);
        readLoop(sock,buf,10);
        readLoop(sock,buf,10);
    }else
    { readLoop(sock,buf,10);
        writeLoop(sock,buf,20);
        //writeLoop(sock,buf,10);
        //writeLoop(sock,buf,10);
    }
}
Answered Oct 04 '22 by Filipe Gonçalves