I have to read a large text file (> 10 GB) in C++. It is a CSV file with variable-length lines. When I try to read it line by line using ifstream it works, but it takes a long time. I guess this is because each time I read a line it goes to disk, which makes it very slow.
Is there a way to read in buffers, for example read 250 MB in one shot (using the read method of ifstream) and then extract lines from this buffer? I see a lot of issues with that approach, such as the buffer ending with an incomplete line.
Is there a solution in C++ which handles all these cases? Are there any open-source libraries that can do this, for example Boost?
Note: I would want to avoid C-style FILE* pointers.
Try using the Windows memory-mapped file functions. The calls are buffered and you get to treat the file as if it's just memory.
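A minimal sketch of that approach (the file name is a placeholder, and mapping a 10 GB file as a single view assumes a 64-bit build; a 32-bit process would have to map it a window at a time):

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    // Open the file for sequential read access.
    HANDLE file = CreateFileW(L"big.csv", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING,
                              FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(file, &size);

    // Create a read-only mapping of the whole file.
    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) { CloseHandle(file); return 1; }

    // Map the entire file into the address space.
    const char* data = static_cast<const char*>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
    if (!data) { CloseHandle(mapping); CloseHandle(file); return 1; }

    // Walk the mapped memory; lines are never split across a buffer
    // boundary because there is no buffer boundary.
    long long lines = 0;
    for (long long i = 0; i < size.QuadPart; ++i)
        if (data[i] == '\n') ++lines;

    std::printf("%lld lines\n", lines);

    UnmapViewOfFile(data);
    CloseHandle(mapping);
    CloseHandle(file);
}
```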
IOstreams already use buffers much as you describe (though usually only a few kilobytes, not hundreds of megabytes). You can use pubsetbuf
to make it use a larger buffer, but I wouldn't expect any huge gains. Most of the overhead in IOstreams stems from other areas (like using virtual functions), not from a lack of buffering.
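For what it's worth, a minimal sketch of enlarging the stream buffer (the file name and buffer size are placeholders; with common implementations pubsetbuf must be called before open(), and its effect is otherwise implementation-defined):

```cpp
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<char> buf(1 << 20); // 1 MB stream buffer; size is something to tune

    std::ifstream in;
    // pubsetbuf must be installed before the file is opened to take effect.
    in.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
    in.open("big.csv", std::ios::binary);

    std::string line;
    long long lines = 0;
    while (std::getline(in, line))
        ++lines;
}
```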
If you're running this on Windows, you might be able to gain a little by writing your own stream buffer and having it call CreateFile directly, passing (for example) FILE_FLAG_SEQUENTIAL_SCAN or FILE_FLAG_NO_BUFFERING. Under the circumstances, either of these may help your performance substantially.
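Here is a sketch of the chunked-read idea from the question, using CreateFile with FILE_FLAG_SEQUENTIAL_SCAN and carrying the incomplete line at the end of each chunk over to the next read. The function name, file name, and 4 MB chunk size are illustrative; FILE_FLAG_NO_BUFFERING is omitted because it additionally requires sector-aligned buffers and read sizes.

```cpp
#include <windows.h>
#include <string>
#include <vector>

// Read the file in large chunks via ReadFile and hand each complete
// line to a callback, carrying any partial line at the end of a chunk
// over to the next one.
template <class Fn>
void for_each_line(const wchar_t* path, Fn fn)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE) return;

    std::vector<char> chunk(4 << 20); // 4 MB per ReadFile call
    std::string carry;                // incomplete line from the previous chunk

    DWORD got = 0;
    while (ReadFile(file, chunk.data(), static_cast<DWORD>(chunk.size()),
                    &got, nullptr) && got > 0)
    {
        size_t start = 0;
        for (size_t i = 0; i < got; ++i)
        {
            if (chunk[i] != '\n') continue;
            carry.append(chunk.data() + start, i - start);
            if (!carry.empty() && carry.back() == '\r') carry.pop_back();
            fn(carry);       // one complete line
            carry.clear();
            start = i + 1;
        }
        carry.append(chunk.data() + start, got - start); // save the partial tail
    }
    if (!carry.empty()) fn(carry); // last line if the file lacks a final '\n'

    CloseHandle(file);
}
```

You'd call it as, for example, `for_each_line(L"big.csv", [](const std::string& line) { /* parse CSV fields */ });`.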