
efficiency of fwrite for massive numbers of small writes

I have a program that saves many large files (>1 GB) using fwrite. It works fine, but unfortunately, due to the nature of the data, each call to fwrite only writes 1-4 bytes. As a result, the write can take over an hour, with most of this time seemingly spent on syscall overhead (or at least inside the fwrite library function). I have a similar problem with fread.

Does anyone know of any existing library functions that will buffer these writes and reads behind an inline function, or is this another roll-your-own?

camelccc asked Nov 27 '12 16:11



2 Answers

First of all, fwrite() is a library function, not a system call. Secondly, it already buffers the data.

You might want to experiment with increasing the size of the buffer. This is done by using setvbuf(). On my system this only helps a tiny bit, but YMMV.
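As a minimal sketch of the setvbuf() suggestion: the helper name open_buffered and the 1 MiB size are assumptions for illustration, not values from the question. The buffer is supplied explicitly because the C standard only says the size "may" be honoured when no buffer is passed, so some libraries may ignore it.

```c
#include <stdio.h>

/* Assumed size: 1 MiB. Static so that it outlives the stream that uses it.
 * Because the buffer is shared, only one stream may use it at a time. */
static char big_buf[1 << 20];

/* Hypothetical helper: open `path` for binary writing, fully buffered
 * through big_buf. Returns NULL on failure. */
static FILE *open_buffered(const char *path)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return NULL;
    /* setvbuf() must be called before any other operation on the stream. */
    if (setvbuf(fp, big_buf, _IOFBF, sizeof big_buf) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

After this, fwrite() is used on the returned stream exactly as before. The stream must be closed with fclose() before big_buf is reused for anything else.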

If setvbuf() does not help, you could do your own buffering and only call fwrite() once you've accumulated enough data. This involves more work, but will almost certainly speed up the writing, as your own buffering can be made much more lightweight than fwrite()'s.
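A do-it-yourself buffer along those lines could look roughly like this. The names (struct wbuf, wbuf_put, wbuf_flush) and the 64 KiB capacity are made up for the example; treat it as a sketch to measure against plain fwrite(), not as a tuned implementation.

```c
#include <stdio.h>
#include <string.h>

#define WBUF_CAP (64 * 1024)    /* assumed capacity; tune by measurement */

/* Hypothetical write buffer layered on top of a FILE *. */
struct wbuf {
    FILE  *fp;
    size_t len;                 /* bytes currently held in data[] */
    int    err;                 /* sticky error flag */
    unsigned char data[WBUF_CAP];
};

static void wbuf_init(struct wbuf *b, FILE *fp)
{
    b->fp = fp;
    b->len = 0;
    b->err = 0;
}

/* Hand the accumulated bytes to fwrite() in one call. Returns 0 on success. */
static int wbuf_flush(struct wbuf *b)
{
    if (b->len > 0 && fwrite(b->data, 1, b->len, b->fp) != b->len)
        b->err = 1;
    b->len = 0;
    return b->err ? -1 : 0;
}

/* Append n bytes. The common case is just a bounds check and a memcpy(). */
static inline int wbuf_put(struct wbuf *b, const void *src, size_t n)
{
    if (n > WBUF_CAP - b->len) {        /* not enough room left */
        if (wbuf_flush(b) != 0)
            return -1;
        if (n > WBUF_CAP) {             /* too big to buffer: write through */
            if (fwrite(src, 1, n, b->fp) != n) {
                b->err = 1;
                return -1;
            }
            return 0;
        }
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

Each 1-4 byte write becomes a wbuf_put() call, and wbuf_flush() has to be called once before fclose(), otherwise the tail of the data is lost.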

edit: If anyone tells you that it's the sheer number of fwrite() calls that is the problem, demand to see evidence. Better still, do your own performance tests. On my computer, 500,000,000 two-byte writes using fwrite() take 11 seconds. That is 1 GB in 11 seconds, or a throughput of about 90 MB/s.
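A test of this kind can be sketched as follows. This is not the exact program behind the numbers above; the function name time_small_writes is hypothetical, and clock() reports CPU time rather than wall-clock time, so time spent waiting on the disk is not counted.

```c
#include <stdio.h>
#include <time.h>

/* Write `count` two-byte records to `fp`, one fwrite() call per record.
 * Returns the CPU time used in seconds, or -1.0 on a write error. */
static double time_small_writes(FILE *fp, unsigned long count)
{
    const unsigned char rec[2] = { 0xAB, 0xCD };
    clock_t start = clock();

    for (unsigned long i = 0; i < count; i++)
        if (fwrite(rec, sizeof rec, 1, fp) != 1)
            return -1.0;
    if (fflush(fp) != 0)        /* include the final flush in the timing */
        return -1.0;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling it with the stream your program really writes to and a count of 500000000UL gives a figure to compare against; for wall-clock time, run the whole program under time(1) instead.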

Last but not least, the huge discrepancy between 11 seconds in my test and one hour mentioned in your question hints at the possibility that there's something else going on in your code that's causing the very poor performance.

NPE answered Oct 25 '22 08:10


First and foremost: small fwrite() calls are slower, because each call has to check the validity of its parameters, do the equivalent of flockfile(), possibly fflush(), append the data, and return success. This overhead adds up: not as much as with tiny calls to write(2), but it is still noticeable.

Proof:

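#include <stdio.h>
#include <stdlib.h>

/* Write nbytes from buf to stdout, retrying after short writes. */
static void w(const void *buf, size_t nbytes)
{
    const char *p = buf;    /* arithmetic on void * is not standard C */

    while (nbytes > 0) {
        size_t n = fwrite(p, 1, nbytes, stdout);
        if (n == 0) {
            perror("stdout");
            exit(111);
        }
        p += n;
        nbytes -= n;
    }
}

/* Usage: time $0 $chunksize <$bigfile >/dev/null
 * A chunk size of 0 means "use the whole 32 KiB buffer". */
int main(int argc, char *argv[])
{
    char buf[32*1024];
    size_t sz;

    if (argc < 2) {
        fprintf(stderr, "usage: %s chunksize\n", argv[0]);
        return 111;
    }
    sz = strtoul(argv[1], NULL, 10);
    if (sz > sizeof(buf))
        return 111;
    if (sz == 0)
        sz = sizeof(buf);
    /* Copy stdin to stdout in chunks of sz bytes per fread()/fwrite(). */
    for (;;) {
        size_t r = fread(buf, 1, sz, stdin);
        if (r < 1)
            break;
        w(buf, r);
    }
    return 0;
}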

That being said, you could do what many commenters suggested, i.e. add your own buffering in front of fwrite(): the code is trivial, but you should test whether it really gives you any benefit.

If you don't want to roll your own, you can use e.g. the buffer interface in skalibs, but you'll probably take longer to read the docs than to write it yourself (imho).
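Since part of the per-call cost is the flockfile() equivalent, one more stock option on POSIX systems is to take the stream lock once and use putc_unlocked() inside it. This is a sketch under that assumption: write_u16_batch is a made-up name, and the little-endian 16-bit record format is only an example. Whether it helps for your data is again something to measure.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: write `count` 16-bit values, low byte first,
 * locking the stream once for the whole batch instead of once per call.
 * Returns 0 on success, -1 on a write error. */
static int write_u16_batch(FILE *fp, const unsigned short *vals, size_t count)
{
    int rc = 0;

    flockfile(fp);                      /* one lock for the whole batch */
    for (size_t i = 0; i < count; i++) {
        unsigned v = vals[i];
        if (putc_unlocked((int)(v & 0xFF), fp) == EOF ||
            putc_unlocked((int)((v >> 8) & 0xFF), fp) == EOF) {
            rc = -1;
            break;
        }
    }
    funlockfile(fp);
    return rc;
}
```

The data still goes through the ordinary stdio buffer, so this combines with setvbuf() and needs no flush logic of its own.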

loreb answered Oct 25 '22 08:10