Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to ask Linux to blackhole bytes during a socket read?

Tags:

c++

linux

sockets

I have a c++ program running under Linux Debian 9. I'm doing a simple read() from a file descriptor:

int bytes_read = read(fd, buffer, buffer_size);

Imagine that I want to read some more data from the socket, but I want to skip a known number of bytes before getting to some content I'm interested in:

int unwanted_bytes_read = read(fd, unwanted_buffer, bytes_to_skip);

int useful_bytes = read(fd, buffer, buffer_size);

In Linux, is there a system-wide 'built-in' location that I can dump the unwanted bytes into, rather than having to maintain a buffer for unwanted data (like unwanted_buffer in the above example)?

I suppose what I'm looking for would be (sort of) the opposite of MSG_PEEK in the socket world, i.e. the kernel would purge bytes_to_skip from its receive buffer before the next useful call to recv.

If I were reading from a file then lseek would be enough. But this is not possible if you are reading from a socket and are using scatter/gather I/O, and you want to drop one of the fields.

I'm thinking about something like this:

// send side
int a = 1;
int b = 2;
int c = 3;
struct iovec iov[3];
ssize_t nwritten;

iov[0].iov_base = &a;
iov[0].iov_len  = sizeof(int);
iov[1].iov_base = &b;
iov[1].iov_len  = sizeof(int);
iov[2].iov_base = &c;
iov[2].iov_len  = sizeof(int);

nwritten = writev(fd, iov, 3);

// receive side
int a = -1;
int c = -1;
struct iovec iov[3]; // you know that you'll be receiving three fields and what their sizes are, but you don't care about the second.
ssize_t nread;

iov[0].iov_base = &a;
iov[0].iov_len  = sizeof(int);
iov[1].iov_base = ??? <---- what to put here?
iov[1].iov_len  = sizeof(int);
iov[2].iov_base = &c;
iov[2].iov_len  = sizeof(int);

nread = readv(fd, iov, 3);

I know that I could just create another b variable on the receive side, but if I don't want to, how can I read the sizeof(int) bytes that it occupies in the file but just dump the data and proceed to c? I could just create a generic buffer to dump b into, all I was asking is if there is such a location by default.

[EDIT]

Following a suggestion from @inetknght, I tried memory mapping /dev/null and doing my gather into the mapped address:

int nullfd = open("/dev/null", O_WRONLY);
void* blackhole = mmap(NULL, iov[1].iov_len, PROT_WRITE, MAP_SHARED, nullfd, 0);

iov[1].iov_base = blackhole;    

nread = readv(fd, iov, 3);

However, blackhole comes out as 0xffff and I get an errno 13 'Permission Denied'. I tried running my code as su and this doesn't work either. Perhaps I'm setting up my mmap incorrectly?

like image 983
user12066 Avatar asked May 14 '19 10:05

user12066


People also ask

What is socket buffer size in Linux?

default The default size of the send buffer for a TCP socket. This value overwrites the initial default buffer size from the generic global /proc/sys/net/core/wmem_default defined for all protocols. The default value is 16 kB.

How does read () work in C?

The read() function reads data previously written to a file. If any portion of a regular file prior to the end-of-file has not been written, read() shall return bytes with value 0.

What is TCP socket buffer?

TCP receive buffer becomes full: Commonly caused by the receiving application not being able to extract data from the socket receive buffer quickly enough. For instance, an overloaded server, i.e. one that is receiving data at a rate greater than the rate at which it can process data, would exhibit this characteristic.

What does the read function return?

The read() function then returns the number of bytes read, and places the zero-byte message back on the STREAM to be retrieved by the next read(), readv() or getmsg(). In message-nondiscard mode or message-discard mode, a zero-byte message returns 0 and the message is removed from the STREAM.

What is a socket in Linux?

A socket is an endpoint of a software network connection, abstracted so that it can be treated as a file handle. That means it fits with the general Unix and Linux design principle of “ everything is a file .” We don’t mean the physical socket on the wall that you plug your network cable into.

What happens when a socket connection is received by a server?

If software listens for incoming socket connections, it’s acting as a server. Any data that comes over the socket connection is said to be received by the server. We can replicate this behavior very easily using nc. Any received data is displayed in the terminal window.

What are sockets and how do they work?

We’re going to call them clients and servers. Sockets are implemented as an application programming interface (API), allowing software developers to call on the socket functionality from within their code. That’s fine if you’re a programmer, but what if you’re not?


2 Answers

There's a tl;dr at the end.

In my comment, I suggested you mmap() the /dev/null device. However it seems that device is not mappable on my machine (err 19: No such device). It looks like /dev/zero is mappable though. Another question/answer suggests that is equivalent to MAP_ANONYMOUS which makes the fd argument and its associated open() unnecessary in the first place. Check out an example:

#include <iostream>
#include <cstring>
#include <cerrno>
#include <cstdlib>

extern "C" {
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <fcntl.h>
}

template <class Type>
struct iovec ignored(void *p)
{
    struct iovec iov_ = {};
    iov_.iov_base = p;
    iov_.iov_len = sizeof(Type);
    return iov_;
}

int main()
{
    auto * p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if ( MAP_FAILED == p ) {
        auto err = errno;
        std::cerr << "mmap(MAP_PRIVATE | MAP_ANONYMOUS): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    int s_[2] = {-1, -1};
    int result = socketpair(AF_UNIX, SOCK_STREAM, 0, s_);
    if ( result < 0 ) {
        auto err = errno;
        std::cerr << "socketpair(): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    int w_[3] = {1,2,3};
    ssize_t nwritten = 0;
    auto makeiov = [](int & v){
        struct iovec iov_ = {};
        iov_.iov_base = &v;
        iov_.iov_len = sizeof(v);
        return iov_;
    };
    struct iovec wv[3] = {
        makeiov(w_[0]),
        makeiov(w_[1]),
        makeiov(w_[2])
    };

    nwritten = writev(s_[0], wv, 3);
    if ( nwritten < 0 ) {
        auto err = errno;
        std::cerr << "writev(): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    int r_ = {0};
    ssize_t nread = 0;
    struct iovec rv[3] = {
        ignored<int>(p),
        makeiov(r_),
        ignored<int>(p),
    };

    nread = readv(s_[1], rv, 3);
    if ( nread < 0 ) {
        auto err = errno;
        std::cerr << "readv(): " << err << ": " << strerror(err) << std::endl;
        return EXIT_FAILURE;
    }

    std::cout <<
        w_[0] << '\t' <<
        w_[1] << '\t' <<
        w_[2] << '\n' <<
        r_ << '\t' <<
        *(int*)p << std::endl;

    return EXIT_SUCCESS;
}

In the above example you can see that I create a private (writes won't be visible by children after fork()) anonymous (not backed by a file) memory mapping of 4KiB (one single page size on most systems). It's then used twice to provide a write destination for two ints -- the later int overwriting the earlier one.

That doesn't exactly solve your question: how to ignore the bytes. Since you're using readv(), I looked into its sister function, preadv() which on first glance appears to do what you want it to do: skip bytes. However, it seems that's not supported on socket file descriptors. The following code gives preadv(): 29: Illegal seek.

rv = makeiov(r_[1]);
nread = preadv(s_[1], &rv, 1, sizeof(int));
if ( nread < 0 ) {
    auto err = errno;
    std::cerr << "preadv(): " << err << ": " << strerror(err) << std::endl;
    return EXIT_FAILURE;
}

So it looks like even preadv() uses seek() under the hood which is, of course, not permitted on a socket. I'm not sure if there is (yet?) a way to tell the OS to ignore/drop bytes received in an established stream. I suspect that's because @geza is correct: the cost to write to the final (ignored) destination is extremely trivial for most situations I've encountered. And, in the situations where the cost of the ignored bytes is not trivial, you should seriously consider using better options, implementations, or protocols.

tl;dr:

Creating a 4KiB anonymous private memory mapping is effectively indistinguishable from contiguous-allocation containers (there are subtle differences that aren't likely to be important for any workload outside of very high end performance). Using a standard container is also a lot less prone to allocation bugs: memory leaks, wild pointers, et al. So I'd say KISS and just do that instead of endorsing any of the code I wrote above. For example: std::array<char, 4096> ignored; or std::vector<char> ignored{4096}; and just set iovec.iov_base = ignored.data(); and set the .iov_len to whatever size you need to ignore (within the length of the container).

like image 165
inetknght Avatar answered Nov 07 '22 01:11

inetknght


The efficient reading of data from a socket is when:

  1. The user-space buffer size is the same or larger (SO_RCVBUF_size + maximum_message_size - 1) than that of the kernel socket receive buffer. You can even map buffer memory pages twice contiguously to make it a ring-buffer to avoid memmoveing incomplete messages to the beginning of the buffer.
  2. The reading is done in one call of recv. This minimizes the number of syscalls (which are more expensive these days due to mitigations for Spectre, Meltdown, etc..). And also prevents starvation of other sockets in the same event loop, which can happen if the code repeatedly calls recv on the same socket with small buffer size until it fails with EAGAIN. As well as guarantees that you drain the entire kernel receive buffer in one recv syscall.

If you do the above, you should then interpret/decode the message from the user-space buffer ignoring whatever is necessary.

Using multiple recv or recvmsg calls with small buffer sizes is sub-optimal with regards to latency and throughput.

like image 30
Maxim Egorushkin Avatar answered Nov 07 '22 01:11

Maxim Egorushkin