I remade this post because my title choice was horrible, sorry about that. My new post can be found here: "After sending a lot, my send() call causes my program to stall completely. How is this possible?"
Thank you very much, everyone. The problem was that the clients are actually bots and they never read from the connections. (I feel foolish.)
TCP_NODELAY might help the latency of small packets from sender to receiver, but the description you gave points in a different direction. I can imagine the following: the outgoing data piles up in the socket's send buffer (whose size is bounded by SO_SNDBUF) and causes the server process to appear "stuck" in the send(2) system call. At this point the kernel waits for the other end to acknowledge some of the outstanding data, but the receiver does not expect it, so it does not recv(2).

There are probably other explanations, but it's hard to tell without seeing the code.
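One quick way to confirm that diagnosis is to stop send(2) from blocking and look at the error instead. This is only a minimal sketch, assuming Linux and an already-connected descriptor named sock; try_send is a hypothetical helper, not anything from your code:

    #include <errno.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Attempt a non-blocking send purely for diagnosis. */
    ssize_t try_send(int sock, const void *buf, size_t len)
    {
        ssize_t n = send(sock, buf, len, MSG_DONTWAIT);
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* The send buffer (bounded by SO_SNDBUF) is full: a plain
             * blocking send() would hang right here until the peer reads. */
            fprintf(stderr, "send buffer full: the peer is not reading\n");
        }
        return n;
    }

If that message starts appearing once the clients stop reading, the full-buffer theory is confirmed.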
If send() is blocking on a TCP socket, it indicates that the send buffer is full, which in turn indicates that the peer on the other end of the connection isn't reading data fast enough. Maybe that client is completely stuck and not calling recv() often enough.
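If you want the server to notice this instead of hanging, one option (a sketch, assuming POSIX poll(2) and a connected descriptor sock; send_with_timeout is a made-up name) is to wait for writability with a timeout before sending:

    #include <poll.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Returns bytes sent, 0 on timeout (no room in the send buffer),
     * or -1 on error. */
    ssize_t send_with_timeout(int sock, const void *buf, size_t len, int timeout_ms)
    {
        struct pollfd pfd = { .fd = sock, .events = POLLOUT, .revents = 0 };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready < 0)
            return -1;   /* poll() itself failed */
        if (ready == 0)
            return 0;    /* send buffer still full: the peer isn't draining data */
        return send(sock, buf, len, 0);
    }

A timeout here is a strong hint that the client is stuck and not calling recv().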
Nagle's algorithm wouldn't cause "disappearing into the kernel", which is why disabling it doesn't help you. Nagle's algorithm just buffers data for a little while, but it will eventually send it without any prompting from the user.
There is some other culprit.
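For completeness, disabling Nagle's algorithm is just a socket option (a minimal sketch, assuming a connected TCP socket), and as said above it only affects small-packet latency, not a send() blocked on a full buffer:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Send segments immediately instead of coalescing them. This does not
     * prevent send() from blocking when the send buffer is full. */
    int disable_nagle(int sock)
    {
        int one = 1;
        return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }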
Edit for the updated question.
You must make sure that the client is receiving all of the sent data, and that it is receiving it quickly. Have each client write to a log or something to verify.
For example, if a client is stuck waiting for the server to accept its 23-byte update, then it might not be receiving the data the server is sending. That can cause the server's send buffer to fill up, which leads to degradation and eventual deadlock.
If this is indeed the culprit, the solution would be some form of asynchronous communication, such as Boost's Asio library.
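As a rough illustration of that check, each client (bot) could run a loop like the following and log how much it actually reads; this is only a sketch with made-up names, not a drop-in fix:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Keep the connection drained and log progress so you can verify the
     * client really is reading everything the server sends. */
    void drain_and_log(int sock)
    {
        char buf[4096];
        ssize_t n;
        size_t total = 0;

        while ((n = recv(sock, buf, sizeof(buf), 0)) > 0) {
            total += (size_t)n;
            fprintf(stderr, "client received %zu bytes so far\n", total);
            /* ...handle the data here... */
        }
        /* n == 0 means the server closed the connection; n < 0 is an error. */
    }

If the logged total stops growing while the server keeps sending, you've found the blocked side.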