C++ ifstream::read slow due to memcpy

Recently I decided to optimize some file reading I was doing because, as everyone says, reading a large chunk of data into a buffer and then working with it is faster than doing lots of small reads. My code certainly is much faster now, but after some profiling it appears memcpy is taking up a lot of time.

The gist of my code is...

ifstream file("some huge file");
char buffer[0x1000000];
for (yada yada) {
    int size = some arbitrary size usually around a megabyte;
    file.read(buffer, size);
    //Do stuff with buffer
}

I'm using Visual Studio 11 and after profiling my code it says ifstream::read() eventually calls xsgetn() which copies from the internal buffer to my buffer. This operation takes up over 80% of the time! In second place comes uflow() which takes up 10% of the time.

Is there any way I can get around this copying? Can I somehow tell the ifstream to buffer the size I need directly into my buffer? Does the C-style FILE* also use such an internal buffer?

UPDATE: Since people keep telling me to use cstdio... I have done a benchmark.

EDIT: Unfortunately the old benchmark was broken (it wasn't even reading the entire file!). You can see it here: http://pastebin.com/4dGEQ6S7

Here's my new benchmark:

#include <cstdio>
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
#include <windows.h>
using namespace std;

const int MAX = 0x10000;
char buf[MAX];
string fpath = "largefile";
int main() {
    {   // 1) C++ ifstream::read
        clock_t start = clock();
        ifstream file(fpath, ios::binary);
        while (!file.eof()) {
            file.read(buf, MAX);
        }
        clock_t end = clock();
        cout << end-start << endl;
    }
    {   // 2) C stdio fread (with a small stdio buffer via setvbuf)
        clock_t start = clock();
        FILE* file = fopen(fpath.c_str(), "rb");
        setvbuf(file, NULL, _IOFBF, 1024);
        while (!feof(file)) {
            fread(buf, 0x1, MAX, file);
        }
        fclose(file);
        clock_t end = clock();
        cout << end-start << endl;
    }
    {   // 3) Win32 ReadFile
        clock_t start = clock();
        HANDLE file = CreateFile(fpath.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_ALWAYS, NULL, NULL);
        while (true) {
            DWORD used;
            ReadFile(file, buf, MAX, &used, NULL);
            if (used < MAX) break;
        }
        CloseHandle(file);
        clock_t end = clock();
        cout << end-start << endl;
    }
    system("PAUSE");
}

Times are:
ifstream::read: 185
fread: 80
ReadFile: 78

Well... it looks like the C-style fread is faster than ifstream::read, while the Windows ReadFile gives only a negligible advantage over fread (I looked at the code and fread is basically a wrapper around ReadFile). Looks like I'll be switching to fread after all.

Man it is confusing to write a benchmark which actually tests this stuff correctly.

CONCLUSION: Using <cstdio> is faster than <fstream>. fstream is slower because C++ streams maintain their own internal buffer, which means an extra copy on every read/write; that copying accounts for the entire extra time taken by fstream. Even more shocking, the extra time is longer than the time spent actually reading the file.
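
For reference, a minimal sketch of the kind of fread-based loop I'd switch to (the file name and chunk size are just placeholders); it checks fread's return value instead of feof(), since feof() only becomes true after a read has already hit end-of-file:

#include <cstdio>
#include <cstddef>

int main() {
    const std::size_t CHUNK = 0x10000;                  // placeholder chunk size
    static char buf[CHUNK];
    std::FILE* file = std::fopen("largefile", "rb");    // placeholder file name
    if (!file) return 1;
    std::size_t got;
    // fread returns the number of elements actually read; 0 means EOF or error.
    while ((got = std::fread(buf, 1, CHUNK, file)) > 0) {
        // process buf[0..got)
    }
    std::fclose(file);
}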

asked Apr 25 '12 by retep998


1 Answer

Can I somehow tell the ifstream to buffer the size I need directly into my buffer?

Yes, this is what pubsetbuf() is for.
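
A minimal sketch (the buffer size and file name are placeholders; whether the implementation honors the request, and whether pubsetbuf() must be called before open(), are implementation-defined):

#include <fstream>
#include <vector>

int main() {
    std::vector<char> mybuf(0x1000000);                 // buffer we want the stream to use
    std::ifstream file;
    // Ask the filebuf to use our buffer instead of allocating its own.
    // Do this before open(); some implementations ignore the request afterwards.
    file.rdbuf()->pubsetbuf(mybuf.data(), mybuf.size());
    file.open("some huge file", std::ios::binary);
    // subsequent file.read() calls go through mybuf (if the request was honored)
}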

But if you're that concerned with copying while reading a file, consider memory mapping as well; Boost has a portable implementation.
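
A minimal sketch using Boost.Iostreams (the file name is a placeholder; this assumes boost::iostreams::mapped_file_source, which wraps the platform's mmap/MapViewOfFile):

#include <boost/iostreams/device/mapped_file.hpp>
#include <cstddef>
#include <iostream>

int main() {
    // Map the file into the process's address space; no read() into a separate buffer.
    boost::iostreams::mapped_file_source file("some huge file");
    if (file.is_open()) {
        const char* data = file.data();   // pointer to the mapped bytes
        std::size_t size = file.size();   // length of the mapping
        // work with data[0..size) directly
        std::cout << "mapped " << size << " bytes\n";
    }
}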

answered Oct 05 '22 by Cubbi