
Are repeated recv() calls expensive?

Tags: c, tcp, sockets

I have a question about a situation that I face quite often. From time to time I have to implement various TCP-based protocols. Most of them define variable-length data packets that begin with a common header ([packet ID, length, payload] or something really similar). Obviously, there can be two approaches to reading these packets:

  1. Read header (since header length is usually fixed), extract the payload length, read the payload
  2. Read all available data and store it in a buffer; parse the buffer afterwards

The first approach is simple, but requires at least two calls to read() per packet (possibly more). The second one is slightly more complicated, but requires fewer calls.
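For reference, approach 1 might look roughly like the sketch below. The wire format (a 2-byte packet ID and a 2-byte big-endian payload length) and the names `recv_exact` and `read_packet` are assumptions made for illustration; they do not come from any particular protocol. The sketch assumes a blocking socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical header: 2-byte packet ID, 2-byte payload length (big-endian). */
enum { HEADER_SIZE = 4 };

/* Read exactly n bytes from a blocking socket.
   Returns 0 on success, -1 on error or if the peer closes early. */
int recv_exact(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;
    while (n > 0) {
        ssize_t r = recv(fd, p, n, 0);
        if (r == 0)
            return -1;              /* connection closed mid-packet */
        if (r < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Approach 1: read the fixed-size header, then the payload.
   Returns the payload length, or -1 on error or if it exceeds cap. */
ssize_t read_packet(int fd, uint16_t *id, unsigned char *payload, size_t cap)
{
    unsigned char hdr[HEADER_SIZE];
    size_t len;

    assert(id != NULL && payload != NULL);
    if (recv_exact(fd, hdr, sizeof hdr) != 0)      /* at least one recv() */
        return -1;
    *id = (uint16_t)((hdr[0] << 8) | hdr[1]);
    len = (size_t)((hdr[2] << 8) | hdr[3]);
    if (len > cap)
        return -1;
    if (recv_exact(fd, payload, len) != 0)         /* at least one more */
        return -1;
    return (ssize_t)len;
}
```

Note that even a fixed-size header can arrive in pieces on a TCP stream, which is why each step loops until it has all of its bytes.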

The question is: does the first approach affect the performance badly enough to worry about it?

Roman Dmitrienko asked Feb 24 '11 10:02




2 Answers

Yes, system calls are generally expensive compared to memory copies. IMHO this is particularly true on the x86 architecture, and arguable on RISC machines (ARM, MIPS, ...).

To be honest, unless you must handle hundreds or thousands of requests per second, you will hardly notice the difference.

Depending on what exactly the protocol is, a hybrid approach could be the best. When the protocol uses many small packets and fewer big ones, you can read the header and a limited amount of data in one call. When the packet is small, you win by avoiding a large memcpy; when the packet is big, you win by issuing a second syscall only for that case.
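One possible reading of this hybrid idea is sketched below, assuming a blocking socket and a hypothetical 4-byte header (2-byte ID, 2-byte big-endian length). The `struct reader`, the `SMALL` threshold and `read_packet_hybrid` are made-up names for illustration. Each packet starts with one recv() that asks for the header plus up to `SMALL` payload bytes; only packets larger than that need a further recv(), which writes straight into the caller's buffer. Bytes read past the end of a small packet are kept for the next call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

enum { HEADER_SIZE = 4, SMALL = 256 };   /* SMALL is an arbitrary threshold */

struct reader {
    int fd;
    unsigned char stash[HEADER_SIZE + SMALL];
    size_t have;                          /* bytes currently held in stash */
};

/* Returns the payload length, or -1 on error, EOF or oversized payload. */
ssize_t read_packet_hybrid(struct reader *r, uint16_t *id,
                           unsigned char *out, size_t cap)
{
    size_t len, in_stash, take, got;

    assert(r != NULL && id != NULL && out != NULL);

    /* Usually a single recv(): header plus whatever payload follows it. */
    while (r->have < HEADER_SIZE) {
        ssize_t n = recv(r->fd, r->stash + r->have,
                         sizeof r->stash - r->have, 0);
        if (n == 0)
            return -1;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        r->have += (size_t)n;
    }

    *id = (uint16_t)((r->stash[0] << 8) | r->stash[1]);
    len = (size_t)((r->stash[2] << 8) | r->stash[3]);
    if (len > cap)
        return -1;

    /* Small copy: the part of the payload that is already in the stash. */
    in_stash = r->have - HEADER_SIZE;
    take = in_stash < len ? in_stash : len;
    memcpy(out, r->stash + HEADER_SIZE, take);

    /* Anything beyond this packet belongs to the next one; keep it. */
    r->have = in_stash - take;
    memmove(r->stash, r->stash + HEADER_SIZE + take, r->have);

    /* Big packet: read the remainder directly into the destination. */
    for (got = take; got < len; ) {
        ssize_t n = recv(r->fd, out + got, len - got, 0);
        if (n == 0)
            return -1;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)len;
}
```

Whether this beats the plain two-read version depends on the packet size distribution, so it is worth measuring before committing to the extra bookkeeping.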

Laurent G answered Oct 19 '22 06:10


If your application is a server capable of handling multiple clients simultaneously and non-blocking sockets are used to handle multiple clients in one thread, you have little choice but to only ever issue one recv() syscall when a socket becomes ready for read.

The reason is that if you keep calling recv() in a loop while the client sends a large volume of data, the loop may keep the thread from doing anything else for a long time. E.g., recv() reads some amount of data from the socket, the code determines that there is now a complete message in the buffer and forwards that message to the callback. The callback processes the message somehow and returns. If you then call recv() once more, there can be more messages that arrived while the callback was processing the previous one. This leads to a busy recv() loop on one socket that prevents the thread from processing any other pending events.

This issue is exacerbated if the socket read buffer in your application is smaller than the kernel socket receive buffer. In other words, the whole contents of the kernel receive buffer cannot be read in one recv() call. Anecdotally, I hit this issue on a busy production system that had a 16 KB user-space buffer for a 2 MB kernel socket receive buffer. A client sending many messages in succession would block the thread in that recv() loop for minutes, because more messages would arrive while the just-read messages were being processed, leading to disruption of the service.

In such event-driven architectures it is best to have the user-space read buffer equal to the size of the kernel socket receive buffer (or the maximum message size, whichever is bigger), so that all the data available in the kernel buffer can be read in one recv() call. This works by doing one recv() call, processing all complete messages in the user-space read buffer and then returning control to the event loop. This way a connection with a lot of incoming data does not block the thread from processing other events and connections; rather, the thread round-robins the processing of all connections with incoming data available.
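A minimal sketch of such a read handler follows, assuming a hypothetical 4-byte header (2-byte ID, 2-byte big-endian length) and a non-blocking socket registered with some event loop. `struct conn`, `on_readable`, `BUF_SIZE` and the example callback are illustrative names, not an existing API. `BUF_SIZE` stands in for "kernel receive buffer size or maximum message size, whichever is bigger" and must exceed the largest possible message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

enum { HEADER_SIZE = 4, BUF_SIZE = 128 * 1024 };  /* > max message (65539) */

struct conn {
    int fd;
    size_t used;                    /* bytes buffered but not yet consumed */
    unsigned char buf[BUF_SIZE];
};

typedef void (*message_cb)(uint16_t id, const unsigned char *payload,
                           size_t len, void *ctx);

/* Call once per "socket readable" event: exactly one recv(), then handle
   every complete message already buffered and return to the event loop.
   Returns 1 to keep the connection, 0 on orderly close, -1 on error. */
int on_readable(struct conn *c, message_cb cb, void *ctx)
{
    size_t off = 0;
    ssize_t n;

    assert(c->used < sizeof c->buf);
    n = recv(c->fd, c->buf + c->used, sizeof c->buf - c->used, 0);
    if (n == 0)
        return 0;
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                   ? 1 : -1;
    c->used += (size_t)n;

    while (c->used - off >= HEADER_SIZE) {
        const unsigned char *h = c->buf + off;
        uint16_t id = (uint16_t)((h[0] << 8) | h[1]);
        size_t len = (size_t)((h[2] << 8) | h[3]);
        if (c->used - off < HEADER_SIZE + len)
            break;                  /* incomplete message, wait for more */
        cb(id, h + HEADER_SIZE, len, ctx);
        off += HEADER_SIZE + len;
    }

    /* Move the incomplete tail, if any, to the front of the buffer. */
    memmove(c->buf, c->buf + off, c->used - off);
    c->used -= off;
    return 1;
}

/* Example callback: counts messages and payload bytes. */
struct stats { size_t messages; size_t bytes; };

void count_message(uint16_t id, const unsigned char *payload,
                   size_t len, void *ctx)
{
    struct stats *s = ctx;
    (void)id;
    (void)payload;
    s->messages++;
    s->bytes += len;
}
```

With a level-triggered poll()/epoll loop, any data that did not fit into this one recv() simply makes the socket readable again on the next iteration, after the other connections have had their turn.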

Maxim Egorushkin answered Oct 19 '22 08:10