 

Zero copy using vmsplice/splice in Linux

I am trying to get zero-copy semantics working in Linux using vmsplice()/splice(), but I don't see any performance improvement. This is on Linux 3.10; I also tried 3.0.0 and 2.6.32. The following code does file writes. I have also tried network socket write()s and couldn't see any improvement there either.

Can somebody tell me what I am doing wrong?

Has anyone seen an improvement from using vmsplice()/splice() in production?

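#include <assert.h>
#include <fcntl.h>     // splice(), vmsplice(), SPLICE_F_* (GNU extensions; g++ defines _GNU_SOURCE)
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>  // S_IRUSR, S_IWUSR, ...
#include <sys/time.h>
#include <sys/uio.h>   // struct iovec
#include <unistd.h>
#include <vector>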

const char *filename = "Test-File";
const int block_size = 4 * 1024;
const int file_size = 4 * 1024 * 1024;

using namespace std;

int pipes[2];
vector<char *> file_data;

static int NowUsecs() {
  struct timeval tv;
  const int err = gettimeofday(&tv, NULL);
  assert(err >= 0);
  return tv.tv_sec * 1000000LL + tv.tv_usec;
}

void CreateData() {
  for (int xx = 0; xx < file_size / block_size; ++xx) {
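    // Allocate a page-aligned block and fill it with a per-block byte
    // pattern so that ValidateData() compares real contents.
    char *data = NULL;
    assert(posix_memalign(reinterpret_cast<void **>(&data), 4096, block_size) == 0);
    memset(data, 'a' + xx % 26, block_size);
    file_data.emplace_back(data);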
  }
}

int SpliceWrite(int fd, char *buf, int buf_len) {
  int len = buf_len;
  struct iovec iov;
  iov.iov_base = buf;
  iov.iov_len = len;

  while (len) {
    int ret = vmsplice(pipes[1], &iov, 1, SPLICE_F_GIFT);
    assert(ret >= 0);
    if (!ret)
      break;
    len -= ret;
    if (len) {
      auto ptr = static_cast<char *>(iov.iov_base);
      ptr += ret;
      iov.iov_base = ptr;
      iov.iov_len -= ret;
    }
  }

  len = buf_len;
  while (len) {
    int ret = splice(pipes[0], NULL, fd, NULL, len, SPLICE_F_MOVE);
    assert(ret >= 0);
    if (!ret)
      break;

    len -= ret;
  }

  return 1;
}

int WriteToFile(const char *filename, bool use_splice) {
  // Open and write to the file.
  mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
  int fd = open(filename, O_CREAT | O_RDWR, mode);
  assert(fd >= 0);

  const int start = NowUsecs();
  for (int xx = 0; xx < file_size / block_size; ++xx) {
    if (use_splice) {
      SpliceWrite(fd, file_data[xx], block_size);
    } else {
      assert(write(fd, file_data[xx], block_size) == block_size);
    }
  }
  const int time = NowUsecs() - start;

  // Close file.
  assert(close(fd) == 0);

  return time;
}

void ValidateData() {
  // Open and read from file.
  const int fd = open(filename, O_RDWR);
  assert(fd >= 0);

  char *read_buf = (char *)malloc(block_size);
  for (int xx = 0; xx < file_size / block_size; ++xx) {
    assert(read(fd, read_buf, block_size) == block_size);
    assert(memcmp(read_buf, file_data[xx], block_size) == 0);
  }
  free(read_buf);

  // Close file.
  assert(close(fd) == 0);
  assert(unlink(filename) == 0);
}

int main(int argc, char **argv) {
  auto res = pipe(pipes);
  assert(res == 0);

  CreateData();
  const int without_splice = WriteToFile(filename, false /* use splice */);
  ValidateData();
  const int with_splice = WriteToFile(filename, true /* use splice */);
  ValidateData();

  cout << "TIME WITH SPLICE: " << with_splice << endl;
  cout << "TIME WITHOUT SPLICE: " << without_splice << endl;

  return 0;
}
asked Sep 15 '25 at 12:09 by WhiteZ


1 Answer

I did a proof of concept some years ago where I got a 4x speedup using optimized, specially tailored vmsplice() code. This was measured against a generic socket/write()-based solution. A blog post from natsys-lab echoes my findings. But I believe you need exactly the right use case to get anywhere near that number.

So what are you doing wrong? Primarily, I think you are measuring the wrong thing. When writing directly to a file you make a single system call, write(), and the only copy is the one from your buffer into the kernel. When you already have a buffer of data that you want to write to disk, it's not going to get faster than that.

In your vmsplice/splice setup you are still copying your data into the kernel, but now it takes two system calls, vmsplice() plus splice(), to get each block to disk. That the speed is identical to write() is probably just a testament to how fast Linux system calls are :-)

A fairer setup would be to write one program that read()s from stdin and write()s the same data to stdout, and an otherwise identical program that simply splice()s stdin into a file (or into stdout, with stdout redirected to a file when you run it). That said, this setup might be too simple to really show anything.
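A minimal sketch of the two copy loops that comparison would time, written as functions rather than two separate programs. The names CopyWithReadWrite, CopyWithSplice and PipeThrough, and the 64 KiB chunk size, are illustrative choices and not from the answer. Because splice() requires at least one end to be a pipe, the splice version assumes in_fd is a pipe (as stdin is when run as `cat file | ./prog`).

```cpp
#include <cassert>
#include <fcntl.h>   // splice(), SPLICE_F_MOVE (GNU extension; g++ defines _GNU_SOURCE)
#include <stdlib.h>  // mkstemp()
#include <string>
#include <unistd.h>  // read(), write(), pipe(), pread(), close(), unlink()

// Baseline: read() into a user-space buffer, then write() it back out.
// Returns the number of bytes copied, or -1 on error.
long CopyWithReadWrite(int in_fd, int out_fd) {
  char buf[64 * 1024];
  long total = 0;
  for (;;) {
    ssize_t n = read(in_fd, buf, sizeof(buf));
    if (n < 0) return -1;
    if (n == 0) return total;  // EOF
    for (ssize_t off = 0; off < n;) {  // handle short writes
      ssize_t w = write(out_fd, buf + off, static_cast<size_t>(n - off));
      if (w < 0) return -1;
      off += w;
    }
    total += n;
  }
}

// Same job with splice(): the kernel moves data from the pipe to out_fd
// without it passing through a user-space buffer. splice() requires at least
// one end to be a pipe, so in_fd is assumed to be a pipe here.
long CopyWithSplice(int in_fd, int out_fd) {
  long total = 0;
  for (;;) {
    ssize_t n = splice(in_fd, NULL, out_fd, NULL, 64 * 1024, SPLICE_F_MOVE);
    if (n < 0) return -1;
    if (n == 0) return total;  // write end closed and pipe drained
    total += n;
  }
}

// Hypothetical demo helper: push `payload` through a pipe into a temporary
// file with one of the two loops above, then read the file back.
// The payload must fit in the pipe buffer (64 KiB by default), because it is
// written completely before anything reads from the pipe.
std::string PipeThrough(const std::string &payload, bool use_splice) {
  int p[2];
  if (pipe(p) != 0) return "";
  if (write(p[1], payload.data(), payload.size()) !=
      static_cast<ssize_t>(payload.size())) {
    close(p[0]);
    close(p[1]);
    return "";
  }
  close(p[1]);  // lets the copy loop see EOF once the pipe is drained

  char path[] = "/tmp/splice-demo-XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) {
    close(p[0]);
    return "";
  }
  unlink(path);  // the file goes away when fd is closed
  long n = use_splice ? CopyWithSplice(p[0], fd) : CopyWithReadWrite(p[0], fd);
  close(p[0]);

  std::string out(static_cast<size_t>(n > 0 ? n : 0), '\0');
  if (n > 0 && pread(fd, &out[0], out.size(), 0) != n) out.clear();
  close(fd);
  return out;
}
```

For an actual benchmark you would call one of the copy functions from main() with STDIN_FILENO and STDOUT_FILENO and time a large input; PipeThrough() only checks that both paths deliver the same bytes.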

Aside: an (undocumented?) feature of vmsplice() is that you can also use it to read data from a pipe. I used this in my old POC. It was basically just an IPC layer based on the idea of passing memory pages around using vmsplice().
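A small sketch of that read direction, assuming a Linux kernel where calling vmsplice() on the read end of a pipe transfers pipe data into the supplied iovec (the kernel's vmsplice_to_user() path). VmspliceRead() and RoundTripViaVmsplice() are hypothetical helper names for illustration only; this is not the POC code mentioned above.

```cpp
#include <cassert>
#include <fcntl.h>    // vmsplice() (GNU extension; g++ defines _GNU_SOURCE)
#include <string>
#include <sys/uio.h>  // struct iovec
#include <unistd.h>   // pipe(), write(), close()

// Read up to `len` bytes from the read end of a pipe into `buf` with
// vmsplice() instead of read(). Passing the pipe's *read* end is what selects
// this direction. Returns the byte count, 0 at EOF, or -1 on error.
ssize_t VmspliceRead(int pipe_read_fd, char *buf, size_t len) {
  struct iovec iov;
  iov.iov_base = buf;
  iov.iov_len = len;
  return vmsplice(pipe_read_fd, &iov, 1, 0);
}

// Hypothetical demo: write `msg` into a pipe, close the write end, then drain
// the pipe with VmspliceRead() in 4 KiB chunks.
std::string RoundTripViaVmsplice(const std::string &msg) {
  int p[2];
  if (pipe(p) != 0) return "";
  ssize_t written = write(p[1], msg.data(), msg.size());
  close(p[1]);
  if (written != static_cast<ssize_t>(msg.size())) {
    close(p[0]);
    return "";
  }
  std::string out;
  char buf[4096];
  for (;;) {
    ssize_t n = VmspliceRead(p[0], buf, sizeof(buf));
    if (n <= 0) break;  // 0: write end closed and pipe empty; <0: error
    out.append(buf, static_cast<size_t>(n));
  }
  close(p[0]);
  return out;
}
```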

Note: NowUsecs() overflows the int it returns; tv_sec * 1000000 for the current time is on the order of 10^15, far beyond INT_MAX.
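A sketch of a 64-bit replacement, assuming clock_gettime() with CLOCK_MONOTONIC is an acceptable substitute for gettimeofday() (a monotonic clock also avoids negative intervals if the wall clock is adjusted mid-run). The callers need the same change: WriteToFile() should keep start in an int64_t and compute the difference before narrowing it.

```cpp
#include <cassert>
#include <stdint.h>  // int64_t
#include <time.h>    // clock_gettime(), CLOCK_MONOTONIC

// Microseconds from a monotonic clock, kept in 64 bits so the
// seconds-to-microseconds multiplication cannot overflow.
static int64_t NowUsecs() {
  struct timespec ts;
  const int err = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(err == 0);
  (void)err;  // silence unused-variable warnings when NDEBUG is defined
  return static_cast<int64_t>(ts.tv_sec) * 1000000 + ts.tv_nsec / 1000;
}
```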

answered Sep 18 '25 at 06:09 by kamstrup