
Reading a large text file efficiently in C++

I have to read a large text file (> 10 GB) in C++. It is a CSV file with variable-length lines. When I try to read it line by line using ifstream it works, but it takes a long time. I guess this is because each time I read a line it goes to disk, which makes it very slow.

Is there a way to read in buffers, for example read 250 MB in one shot (using the read method of ifstream) and then extract the lines from that buffer? I see a lot of issues with this approach, such as the buffer ending with an incomplete line.
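Roughly, I am imagining something like the following (a rough sketch of the idea; the buffer size and the process_line callback are just placeholders):

#include <fstream>
#include <string>
#include <vector>

void process_line(const std::string& line) {
    // placeholder: parse the CSV line here
    // (note: CRLF files will leave a trailing '\r' in the line; strip if needed)
}

int main() {
    const std::size_t kBufSize = 250 * 1024 * 1024;  // 250 MB per read
    std::ifstream in("big.csv", std::ios::binary);
    if (!in) return 1;

    std::vector<char> buf(kBufSize);
    std::string carry;  // holds an incomplete line from the previous block

    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();
        if (got <= 0) break;

        std::size_t start = 0;
        for (std::size_t i = 0; i < static_cast<std::size_t>(got); ++i) {
            if (buf[i] == '\n') {
                // complete the pending line and hand it off
                carry.append(buf.data() + start, i - start);
                process_line(carry);
                carry.clear();
                start = i + 1;
            }
        }
        // save the trailing partial line for the next block
        carry.append(buf.data() + start, static_cast<std::size_t>(got) - start);
    }
    if (!carry.empty()) process_line(carry);  // last line without a trailing '\n'
}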

Is there a solution in C++ that handles all these cases? Are there any open source libraries that can do this, for example Boost?

Note: I would like to avoid C-style FILE* pointers.

asked Feb 01 '11 by user424060

2 Answers

Try using the Windows memory-mapped file functions. The reads are buffered by the OS, and you get to treat the file as if it were just memory.
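For example (a minimal sketch with abbreviated error handling; for a file this size you need a 64-bit build, otherwise map a sliding window of views with MapViewOfFile offsets instead of the whole file):

#include <windows.h>
#include <cstdio>

int main() {
    HANDLE file = CreateFileA("big.csv", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING,
                              FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(file, &size);

    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) { CloseHandle(file); return 1; }

    // Map the entire file as one contiguous, read-only char range.
    const char* data = static_cast<const char*>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
    if (data) {
        // Count lines as a stand-in for real CSV parsing.
        long long lines = 0;
        for (long long i = 0; i < size.QuadPart; ++i)
            if (data[i] == '\n') ++lines;
        std::printf("%lld lines\n", lines);
        UnmapViewOfFile(data);
    }
    CloseHandle(mapping);
    CloseHandle(file);
}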

answered Oct 13 '22 by Gregor Brandt

IOstreams already use buffers much as you describe (though usually only a few kilobytes, not hundreds of megabytes). You can use pubsetbuf to get it to use a larger buffer, but I wouldn't expect any huge gains. Most of the overhead in IOstreams stems from other areas (like using virtual functions), not from lack of buffering.
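For example (a quick sketch; note that many standard library implementations only honor pubsetbuf if it is called before the file is opened, so the buffer is installed on a closed stream here):

#include <fstream>
#include <string>
#include <vector>

int main() {
    std::vector<char> buf(1 << 20);  // 1 MB stream buffer instead of the default few KB
    std::ifstream in;
    in.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
    in.open("big.csv", std::ios::binary);

    std::string line;
    while (std::getline(in, line)) {
        // process line
    }
}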

If you're running this on Windows, you might be able to gain a little by writing your own stream buffer, and having it call CreateFile directly, passing (for example) FILE_FLAG_SEQUENTIAL_SCAN or FILE_FLAG_NO_BUFFERING. Under the circumstances, either of these may help your performance substantially.
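Something along these lines (an untested sketch; the class name and buffer size are my own, and it only passes FILE_FLAG_SEQUENTIAL_SCAN, since FILE_FLAG_NO_BUFFERING additionally requires sector-aligned buffers and read sizes):

#include <windows.h>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Read-only stream buffer whose underflow() refills a large buffer via
// ReadFile on a handle opened with FILE_FLAG_SEQUENTIAL_SCAN.
class win_file_buf : public std::streambuf {
public:
    explicit win_file_buf(const char* path, std::size_t buf_size = 1 << 22)
        : buf_(buf_size) {
        file_ = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                            OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    }
    ~win_file_buf() override {
        if (file_ != INVALID_HANDLE_VALUE) CloseHandle(file_);
    }
    bool is_open() const { return file_ != INVALID_HANDLE_VALUE; }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        DWORD got = 0;
        if (!ReadFile(file_, buf_.data(), static_cast<DWORD>(buf_.size()),
                      &got, nullptr) || got == 0)
            return traits_type::eof();
        setg(buf_.data(), buf_.data(), buf_.data() + got);
        return traits_type::to_int_type(*gptr());
    }

private:
    HANDLE file_ = INVALID_HANDLE_VALUE;
    std::vector<char> buf_;
};

int main() {
    win_file_buf buf("big.csv");
    if (!buf.is_open()) return 1;
    std::istream in(&buf);  // the ordinary istream interface on top

    std::string line;
    while (std::getline(in, line)) {
        // process line
    }
}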

answered Oct 13 '22 by Jerry Coffin