Is it possible to ask Linux to blackhole bytes during a socket read?

Tags:

c++

linux

sockets

I have a C++ program running under Linux (Debian 9). I'm doing a simple read() from a file descriptor:

int bytes_read = read(fd, buffer, buffer_size);

Imagine that I want to read some more data from the socket, but I want to skip a known number of bytes before getting to some content I'm interested in:

int unwanted_bytes_read = read(fd, unwanted_buffer, bytes_to_skip);

int useful_bytes = read(fd, buffer, buffer_size);

In Linux, is there a system-wide 'built-in' location that I can dump the unwanted bytes into, rather than having to maintain a buffer for unwanted data (like unwanted_buffer in the above example)?

I suppose what I'm looking for would be (sort of) the opposite of MSG_PEEK in the socket world, i.e. the kernel would purge bytes_to_skip from its receive buffer before the next useful call to recv.

If I were reading from a file then lseek would be enough. But this is not possible if you are reading from a socket and are using scatter/gather I/O, and you want to drop one of the fields.

I'm thinking about something like this:

// send side
int a = 1;
int b = 2;
int c = 3;
struct iovec iov[3];
ssize_t nwritten;

iov[0].iov_base = &a;
iov[0].iov_len  = sizeof(int);
iov[1].iov_base = &b;
iov[1].iov_len  = sizeof(int);
iov[2].iov_base = &c;
iov[2].iov_len  = sizeof(int);

nwritten = writev(fd, iov, 3);

// receive side
int a = -1;
int c = -1;
struct iovec iov[3]; // you know that you'll be receiving three fields and what their sizes are, but you don't care about the second.
ssize_t nread;

iov[0].iov_base = &a;
iov[0].iov_len  = sizeof(int);
iov[1].iov_base = ??? <---- what to put here?
iov[1].iov_len  = sizeof(int);
iov[2].iov_base = &c;
iov[2].iov_len  = sizeof(int);

nread = readv(fd, iov, 3);

I know that I could just create another b variable on the receive side, but if I don't want to, how can I read the sizeof(int) bytes that it occupies in the stream, discard the data, and proceed to c? I could just create a generic buffer to dump b into; all I'm asking is whether such a location exists by default.

[EDIT]

Following a suggestion from @inetknght, I tried memory mapping /dev/null and doing my gather into the mapped address:

int nullfd = open("/dev/null", O_WRONLY);
void* blackhole = mmap(NULL, iov[1].iov_len, PROT_WRITE, MAP_SHARED, nullfd, 0);

iov[1].iov_base = blackhole;    

nread = readv(fd, iov, 3);

However, blackhole comes out as 0xffff and I get an errno 13 'Permission Denied'. I tried running my code as su and this doesn't work either. Perhaps I'm setting up my mmap incorrectly?


asked May 14 '19 10:05

user12066

2 Answers

There's a tl;dr at the end.

In my comment, I suggested you mmap() the /dev/null device. However, it seems that device is not mappable on my machine (errno 19: No such device). It looks like /dev/zero is mappable, though. Another question/answer suggests that mapping /dev/zero is equivalent to using MAP_ANONYMOUS, which makes the fd argument and its associated open() unnecessary in the first place. Check out an example:

#include <iostream>
#include <cstring>
#include <cerrno>
#include <cstdlib>

extern "C" {
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>   // readv(), writev(), struct iovec
#include <fcntl.h>
}

template <class Type>
struct iovec ignored(void *p)
{
    struct iovec iov_ = {};
    iov_.iov_base = p;
    iov_.iov_len = sizeof(Type);
    return iov_;
}

int main()
{
    auto * p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if ( MAP_FAILED == p ) {
        auto err = errno;
        std::cerr << "mmap(MAP_PRIVATE | MAP_ANONYMOUS): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    int s_[2] = {-1, -1};
    int result = socketpair(AF_UNIX, SOCK_STREAM, 0, s_);
    if ( result < 0 ) {
        auto err = errno;
        std::cerr << "socketpair(): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    int w_[3] = {1,2,3};
    ssize_t nwritten = 0;
    auto makeiov = [](int & v){
        struct iovec iov_ = {};
        iov_.iov_base = &v;
        iov_.iov_len = sizeof(v);
        return iov_;
    };
    struct iovec wv[3] = {
        makeiov(w_[0]),
        makeiov(w_[1]),
        makeiov(w_[2])
    };

    nwritten = writev(s_[0], wv, 3);
    if ( nwritten < 0 ) {
        auto err = errno;
        std::cerr << "writev(): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    int r_ = {0};
    ssize_t nread = 0;
    struct iovec rv[3] = {
        ignored<int>(p),
        makeiov(r_),
        ignored<int>(p),
    };

    nread = readv(s_[1], rv, 3);
    if ( nread < 0 ) {
        auto err = errno;
        std::cerr << "readv(): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    std::cout <<
        w_[0] << '\t' <<
        w_[1] << '\t' <<
        w_[2] << '\n' <<
        r_ << '\t' <<
        *(int*)p << std::endl;

    return EXIT_SUCCESS;
}

In the above example I create a private, anonymous memory mapping of 4 KiB (a single page on most systems). Private means the mapping is copy-on-write, so writes made after fork() are not shared between parent and child; anonymous means it is not backed by a file. The mapping is then used twice as the write destination for two ints, with the later int overwriting the earlier one.

That doesn't exactly solve your question: how to ignore the bytes. Since you're using readv(), I looked into its sister function, preadv() which on first glance appears to do what you want it to do: skip bytes. However, it seems that's not supported on socket file descriptors. The following code gives preadv(): 29: Illegal seek.

struct iovec pv = makeiov(r_);   // single destination; r_ is the int declared in main()
nread = preadv(s_[1], &pv, 1, sizeof(int));
if ( nread < 0 ) {
    auto err = errno;
    std::cerr << "preadv(): " << err << ": " << strerror(err) << std::endl;
    return EXIT_FAILURE;
}

So it looks like preadv() needs a seekable file descriptor, which a socket, of course, is not. I'm not sure if there is (yet?) a way to tell the OS to ignore/drop bytes received in an established stream. I suspect that's because @geza is correct: the cost of writing to the final (ignored) destination is trivial in most situations I've encountered. And in the situations where the cost of the ignored bytes is not trivial, you should seriously consider better options, implementations, or protocols.
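For the plain read() case in the question (skip a known number of bytes, then read the useful part), the "write it somewhere and ignore it" approach can be wrapped in a small helper. This is only a sketch: discard_bytes is a hypothetical name, the 512-byte scratch size is an arbitrary assumption, and it assumes a blocking stream descriptor.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>   // socketpair(), only needed to try the helper out
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: consume and throw away exactly `count` bytes from a
// blocking stream descriptor. Returns true once all bytes were skipped,
// false on end-of-stream or on an error other than EINTR (errno is set).
bool discard_bytes(int fd, std::size_t count)
{
    assert(fd >= 0);
    char scratch[512];  // small stack buffer, overwritten on every iteration
    while (count > 0) {
        const std::size_t chunk = std::min(count, sizeof scratch);
        const ssize_t n = read(fd, scratch, chunk);
        if (n > 0) {
            count -= static_cast<std::size_t>(n);
        } else if (n == 0) {
            return false;   // peer closed before everything was skipped
        } else if (errno != EINTR) {
            return false;   // real error; errno describes it
        }
    }
    return true;
}
```

A caller would then do something like discard_bytes(fd, bytes_to_skip) followed by the read() of the useful data. The scratch buffer lives on the stack, so there is nothing to allocate or free.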

tl;dr:

Creating a 4 KiB anonymous private memory mapping is effectively indistinguishable from using a contiguous-allocation container (there are subtle differences that aren't likely to matter for any workload outside of very high-end performance). Using a standard container is also a lot less prone to allocation bugs: memory leaks, wild pointers, et al. So I'd say KISS and just do that instead of endorsing any of the code I wrote above. For example: std::array<char, 4096> ignored; or std::vector<char> ignored(4096); (parentheses, not braces: the braced form selects the initializer-list constructor, and 4096 does not fit in a char). Then set iovec.iov_base = ignored.data(); and set .iov_len to however many bytes you need to ignore (no more than the size of the container).
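Applied to the three-int layout from the question, the container approach might look roughly like this. It is a sketch, not a drop-in: read_first_and_third is a hypothetical name, and it assumes all three ints arrive in a single readv() call, which a stream socket does not guarantee.

```cpp
#include <array>
#include <cassert>
#include <sys/socket.h>   // socketpair()/send(), only needed to try the helper out
#include <sys/types.h>
#include <sys/uio.h>

// Hypothetical helper: receive three ints with one readv() call, keeping the
// first and third and routing the second into a throwaway buffer.
// Returns readv()'s byte count, or -1 on error (errno is set).
ssize_t read_first_and_third(int fd, int &a, int &c)
{
    assert(fd >= 0);
    std::array<char, 4096> ignored;   // scratch space; contents are never inspected
    static_assert(sizeof(int) <= 4096, "scratch buffer must hold the skipped field");

    struct iovec iov[3];
    iov[0].iov_base = &a;
    iov[0].iov_len  = sizeof a;
    iov[1].iov_base = ignored.data(); // the unwanted second field lands here
    iov[1].iov_len  = sizeof(int);
    iov[2].iov_base = &c;
    iov[2].iov_len  = sizeof c;

    return readv(fd, iov, 3);
}
```

The caller should compare the return value against 3 * sizeof(int); a shorter count means only part of the message has arrived so far and c may not have been filled in.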


answered Nov 07 '22 01:11

inetknght

Reading data from a socket is efficient when:

1. The user-space buffer is at least as large as the kernel socket receive buffer (SO_RCVBUF_size + maximum_message_size - 1 bytes, which leaves room for a leftover incomplete message plus a full kernel buffer). You can even map the buffer's memory pages twice, back to back, to make it a ring buffer and avoid memmove-ing incomplete messages to the beginning of the buffer.
2. The reading is done in one call to recv. This minimizes the number of syscalls (which are more expensive these days due to mitigations for Spectre, Meltdown, etc.). It also prevents starvation of other sockets in the same event loop, which can happen if the code repeatedly calls recv on the same socket with a small buffer until it fails with EAGAIN, and it guarantees that you drain the entire kernel receive buffer in one recv syscall.

If you do the above, you should then interpret/decode the message from the user-space buffer ignoring whatever is necessary.

Using multiple recv or recvmsg calls with small buffer sizes is sub-optimal with regard to latency and throughput.
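As a rough sketch of that approach for the three-int message from the question: size one large buffer from SO_RCVBUF, fill it with a single recv, and skip the unwanted field while decoding. The names user_buffer_size, fill_buffer, Fields and decode_skipping_second are hypothetical, and handling of partial messages (and the ring-buffer mapping) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>
#include <sys/socket.h>
#include <sys/types.h>

// Buffer size per the rule above: SO_RCVBUF + max_message_size - 1.
// Returns 0 if the socket option cannot be read.
std::size_t user_buffer_size(int fd, std::size_t max_message_size)
{
    assert(max_message_size > 0);
    int rcvbuf = 0;
    socklen_t optlen = sizeof rcvbuf;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &optlen) != 0 || rcvbuf <= 0)
        return 0;
    return static_cast<std::size_t>(rcvbuf) + max_message_size - 1;
}

// One recv() into the large buffer: bytes received, 0 on EOF, -1 on error.
ssize_t fill_buffer(int fd, std::vector<char> &buffer)
{
    return recv(fd, buffer.data(), buffer.size(), 0);
}

// The two fields the receiver cares about (assumed layout: three ints back to back).
struct Fields { int a; int c; };

// Decode from bytes already in user space, stepping over the middle int.
// Returns false if fewer than three ints are available.
bool decode_skipping_second(const char *buf, std::size_t len, Fields &out)
{
    if (len < 3 * sizeof(int))
        return false;
    std::memcpy(&out.a, buf, sizeof(int));
    std::memcpy(&out.c, buf + 2 * sizeof(int), sizeof(int));  // skip the second int
    return true;
}
```

Here "ignoring" the second field costs nothing extra: it is simply never copied out of the receive buffer.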

answered Nov 07 '22 01:11

Maxim Egorushkin
