I have a C++ program running under Debian 9 (Linux). I'm doing a simple read() from a file descriptor:
int bytes_read = read(fd, buffer, buffer_size);
Imagine that I want to read some more data from the socket, but I want to skip a known number of bytes before getting to some content I'm interested in:
int unwanted_bytes_read = read(fd, unwanted_buffer, bytes_to_skip);
int useful_bytes = read(fd, buffer, buffer_size);
In Linux, is there a system-wide 'built-in' location that I can dump the unwanted bytes into, rather than having to maintain a buffer for unwanted data (like unwanted_buffer in the above example)?
I suppose what I'm looking for would be (sort of) the opposite of MSG_PEEK in the socket world, i.e. the kernel would purge bytes_to_skip from its receive buffer before the next useful call to recv.
If I were reading from a file then lseek would be enough. But this is not possible if you are reading from a socket and are using scatter/gather I/O, and you want to drop one of the fields.
I'm thinking about something like this:
// send side
int a = 1;
int b = 2;
int c = 3;
struct iovec iov[3];
ssize_t nwritten;
iov[0].iov_base = &a;
iov[0].iov_len = sizeof(int);
iov[1].iov_base = &b;
iov[1].iov_len = sizeof(int);
iov[2].iov_base = &c;
iov[2].iov_len = sizeof(int);
nwritten = writev(fd, iov, 3);
// receive side
int a = -1;
int c = -1;
struct iovec iov[3]; // you know that you'll be receiving three fields and what their sizes are, but you don't care about the second.
ssize_t nread;
iov[0].iov_base = &a;
iov[0].iov_len = sizeof(int);
iov[1].iov_base = ??? <---- what to put here?
iov[1].iov_len = sizeof(int);
iov[2].iov_base = &c;
iov[2].iov_len = sizeof(int);
nread = readv(fd, iov, 3);
I know that I could just create another b variable on the receive side, but if I don't want to, how can I read the sizeof(int) bytes that it occupies in the file but just dump the data and proceed to c? I could just create a generic buffer to dump b into; all I was asking is whether there is such a location by default.
[EDIT]
Following a suggestion from @inetknght, I tried memory mapping /dev/null and doing my gather into the mapped address:
int nullfd = open("/dev/null", O_WRONLY);
void* blackhole = mmap(NULL, iov[1].iov_len, PROT_WRITE, MAP_SHARED, nullfd, 0);
iov[1].iov_base = blackhole;
nread = readv(fd, iov, 3);
However, blackhole comes out as 0xffff and I get an errno 13 'Permission Denied'. I tried running my code as su and this doesn't work either. Perhaps I'm setting up my mmap incorrectly?
There's a tl;dr at the end.
In my comment, I suggested you mmap() the /dev/null device. However, it seems that device is not mappable on my machine (err 19: No such device). It looks like /dev/zero is mappable though. Another question/answer suggests that is equivalent to MAP_ANONYMOUS, which makes the fd argument and its associated open() unnecessary in the first place. Check out an example:
#include <iostream>
#include <cstring>
#include <cerrno>
#include <cstdlib>
extern "C" {
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h> // struct iovec, readv(), writev()
#include <fcntl.h>
}
// Build an iovec that points at scratch memory `p`; sizeof(Type) bytes land there and are ignored.
template <class Type>
struct iovec ignored(void *p)
{
    struct iovec iov_ = {};
    iov_.iov_base = p;
    iov_.iov_len = sizeof(Type);
    return iov_;
}

int main()
{
    // One anonymous, private page used as the dumping ground for unwanted fields.
    auto * p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if ( MAP_FAILED == p ) {
        auto err = errno;
        std::cerr << "mmap(MAP_PRIVATE | MAP_ANONYMOUS): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    // A connected pair of stream sockets stands in for the real connection.
    int s_[2] = {-1, -1};
    int result = socketpair(AF_UNIX, SOCK_STREAM, 0, s_);
    if ( result < 0 ) {
        auto err = errno;
        std::cerr << "socketpair(): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    // Send side: gather three ints into one writev().
    int w_[3] = {1,2,3};
    ssize_t nwritten = 0;
    auto makeiov = [](int & v){
        struct iovec iov_ = {};
        iov_.iov_base = &v;
        iov_.iov_len = sizeof(v);
        return iov_;
    };
    struct iovec wv[3] = {
        makeiov(w_[0]),
        makeiov(w_[1]),
        makeiov(w_[2])
    };
    nwritten = writev(s_[0], wv, 3);
    if ( nwritten < 0 ) {
        auto err = errno;
        std::cerr << "writev(): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    // Receive side: keep only the middle int; the first and third go to the scratch page.
    int r_ = {0};
    ssize_t nread = 0;
    struct iovec rv[3] = {
        ignored<int>(p),
        makeiov(r_),
        ignored<int>(p),
    };
    nread = readv(s_[1], rv, 3);
    if ( nread < 0 ) {
        auto err = errno;
        std::cerr << "readv(): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    // Print what was sent, then the kept int and whatever was written to the scratch page last.
    std::cout <<
        w_[0] << '\t' <<
        w_[1] << '\t' <<
        w_[2] << '\n' <<
        r_ << '\t' <<
        *(int*)p << std::endl;
    return EXIT_SUCCESS;
}
In the above example you can see that I create a private (writes won't be visible by children after fork()) anonymous (not backed by a file) memory mapping of 4KiB (one single page size on most systems). It's then used twice to provide a write destination for two ints, the later int overwriting the earlier one.
That doesn't exactly solve your question: how to ignore the bytes. Since you're using readv(), I looked into its sister function, preadv(), which on first glance appears to do what you want it to do: skip bytes. However, it seems that's not supported on socket file descriptors. The following code gives preadv(): 29: Illegal seek.
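// Reuses s_, r_ and makeiov from the example above; the offset asks preadv() to skip one int.
struct iovec single = makeiov(r_);
nread = preadv(s_[1], &single, 1, sizeof(int));
if ( nread < 0 ) {
    auto err = errno;
    std::cerr << "preadv(): " << err << ": " << strerror(err) << std::endl;
    return EXIT_FAILURE;
}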
So it looks like even preadv() relies on seeking under the hood (errno 29 is ESPIPE), which is, of course, not permitted on a socket. I'm not sure if there is (yet?) a way to tell the OS to ignore/drop bytes received in an established stream. I suspect that's because @geza is correct: the cost to write to the final (ignored) destination is extremely trivial for most situations I've encountered. And, in the situations where the cost of the ignored bytes is not trivial, you should seriously consider using better options, implementations, or protocols.
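To illustrate how cheap the "just write it somewhere and ignore it" route is, here is a minimal sketch of a discard loop. The names skip_bytes and demo_skip_middle_field are hypothetical helpers made up for this example, not part of any library, and the 256-byte scratch size is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read and throw away exactly `count` bytes from `fd`,
// reusing one small stack buffer whose contents are never inspected.
// Returns true on success, false on end-of-stream or on a read() error.
bool skip_bytes(int fd, std::size_t count)
{
    assert(fd >= 0);
    char scratch[256]; // arbitrary size, chosen for illustration only
    while (count > 0) {
        const std::size_t chunk = std::min(count, sizeof scratch);
        const ssize_t n = read(fd, scratch, chunk);
        if (n > 0) {
            count -= static_cast<std::size_t>(n); // short reads are fine, keep going
        } else if (n == 0) {
            return false; // peer closed before everything was skipped
        } else if (errno != EINTR) {
            return false; // real error; errno is left as set by read()
        }
    }
    return true;
}

// Demo over a local socket pair: send {1, 2, 3}, keep a and c, skip b.
bool demo_skip_middle_field()
{
    int s[2] = {-1, -1};
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0) {
        return false;
    }
    const int sent[3] = {1, 2, 3};
    int a = -1;
    int c = -1;
    const bool ok =
        write(s[0], sent, sizeof sent) == static_cast<ssize_t>(sizeof sent)
        && read(s[1], &a, sizeof a) == static_cast<ssize_t>(sizeof a)
        && skip_bytes(s[1], sizeof(int))
        && read(s[1], &c, sizeof c) == static_cast<ssize_t>(sizeof c);
    close(s[0]);
    close(s[1]);
    return ok && a == 1 && c == 3;
}
```

The scratch buffer lives on the stack, so there is nothing to allocate, map, or free; the price is one extra copy of bytes nobody looks at.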
tl;dr:
Creating a 4KiB anonymous private memory mapping is effectively indistinguishable from contiguous-allocation containers (there are subtle differences that aren't likely to be important for any workload outside of very high end performance). Using a standard container is also a lot less prone to allocation bugs: memory leaks, wild pointers, et al. So I'd say KISS and just do that instead of endorsing any of the code I wrote above. For example: std::array<char, 4096> ignored; or std::vector<char> ignored(4096); (note the parentheses: with braces, {4096} is treated as an initializer list of char rather than a size) and just set iovec.iov_base = ignored.data(); and set the .iov_len to whatever size you need to ignore (within the length of the container).
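As a rough sketch of that suggestion, the receive side of the question could look like the following. The function names read_a_and_c and demo_readv_discard and the 64-byte container size are assumptions made for illustration; a real receiver would also need to handle a short readv() on a stream socket, which this sketch only reports as failure.

```cpp
#include <array>
#include <cassert>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: scatter three int-sized fields from `fd`, keeping the
// first and third and dumping the second into an ordinary std::array.
// Returns true only if all three fields arrived in this single readv() call.
bool read_a_and_c(int fd, int& a, int& c)
{
    std::array<char, 64> ignored{}; // plain container used as the dumping ground
    static_assert(sizeof(int) <= 64, "discard buffer must cover the skipped field");

    struct iovec iov[3];
    iov[0].iov_base = &a;
    iov[0].iov_len = sizeof a;
    iov[1].iov_base = ignored.data(); // the unwanted field lands here
    iov[1].iov_len = sizeof(int);
    iov[2].iov_base = &c;
    iov[2].iov_len = sizeof c;

    const ssize_t nread = readv(fd, iov, 3);
    return nread == static_cast<ssize_t>(3 * sizeof(int));
}

// Demo over a local socket pair: send {1, 2, 3} and check that 1 and 3 come back.
bool demo_readv_discard()
{
    int s[2] = {-1, -1};
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0) {
        return false;
    }
    const int sent[3] = {1, 2, 3};
    int a = -1;
    int c = -1;
    const bool ok =
        write(s[0], sent, sizeof sent) == static_cast<ssize_t>(sizeof sent)
        && read_a_and_c(s[1], a, c);
    close(s[0]);
    close(s[1]);
    return ok && a == 1 && c == 3;
}
```

Because the container is an automatic variable, its lifetime is handled by the language and there is no munmap() to forget.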
Reading data from a socket is efficient when:
1. The user-space buffer is the same size or larger (SO_RCVBUF_size + maximum_message_size - 1) than the kernel socket receive buffer. You can even map the buffer memory pages twice contiguously to make it a ring buffer and avoid memmove-ing incomplete messages to the beginning of the buffer.
2. The reading is done in a single call to recv. This minimizes the number of syscalls (which are more expensive these days due to mitigations for Spectre, Meltdown, etc.). It also prevents starvation of other sockets in the same event loop, which can happen if the code repeatedly calls recv on the same socket with a small buffer size until it fails with EAGAIN. And it guarantees that you drain the entire kernel receive buffer in one recv syscall.
If you do the above, you should then interpret/decode the message from the user-space buffer, ignoring whatever is necessary.
Using multiple recv or recvmsg calls with small buffer sizes is sub-optimal with regard to latency and throughput.
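A minimal sketch of that approach, assuming the three-int message from the question in native byte order: size one user-space buffer from SO_RCVBUF, fill it with a single recv, and decode from it while stepping over the unwanted field. The names decode, demo_single_recv, and max_message_size are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct Wanted { int a; int c; };

// Decode one message from a user-space buffer, stepping over the middle int.
// Assumed wire format (from the question): three native-endian ints a, b, c.
bool decode(const char* buf, std::size_t len, Wanted& out)
{
    if (len < 3 * sizeof(int)) {
        return false; // incomplete message
    }
    std::memcpy(&out.a, buf, sizeof(int));
    std::memcpy(&out.c, buf + 2 * sizeof(int), sizeof(int)); // b is never copied
    return true;
}

// Demo over a local socket pair: one recv() into a large buffer, then decode.
bool demo_single_recv()
{
    int s[2] = {-1, -1};
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0) {
        return false;
    }

    // Size the buffer as SO_RCVBUF + maximum_message_size - 1.
    const std::size_t max_message_size = 3 * sizeof(int); // assumption for this demo
    int rcvbuf = 0;
    socklen_t optlen = sizeof rcvbuf;
    if (getsockopt(s[1], SOL_SOCKET, SO_RCVBUF, &rcvbuf, &optlen) < 0 || rcvbuf <= 0) {
        rcvbuf = 64 * 1024; // arbitrary fallback if the query fails
    }
    std::vector<char> buffer(static_cast<std::size_t>(rcvbuf) + max_message_size - 1);

    const int sent[3] = {1, 2, 3};
    Wanted w{-1, -1};
    bool ok = send(s[0], sent, sizeof sent, 0) == static_cast<ssize_t>(sizeof sent);
    const ssize_t n = ok ? recv(s[1], buffer.data(), buffer.size(), 0) : -1;
    ok = ok && n > 0 && decode(buffer.data(), static_cast<std::size_t>(n), w);

    close(s[0]);
    close(s[1]);
    return ok && w.a == 1 && w.c == 3;
}
```

The unwanted field still crosses into user space here, but it arrives as part of one large copy rather than through a syscall of its own.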