 

How to read a huge file in C++

Tags: c++, file, ifstream

Suppose I have a huge file (e.g. 1TB, or any size that does not fit into RAM; the file is stored on disk) that is delimited by spaces, and my RAM is only 8GB. Can I read that file with ifstream? If not, how do I read a block of the file (e.g. 4GB) at a time?

asked Jan 12 '16 by ZigZagZebra



2 Answers

There are a couple of things that you can do.

First, there's no problem opening a file that is larger than the amount of RAM you have. What you won't be able to do is copy the whole file into memory at once. The best approach is to read the file a chunk at a time and process each chunk. You can use ifstream for that purpose (with ifstream::read, for instance). Allocate, say, one megabyte of memory, read the first megabyte of the file into it, rinse and repeat:

#include <fstream>
#include <memory>

std::ifstream bigFile("mybigfile.dat", std::ios::binary);
constexpr size_t bufferSize = 1024 * 1024;  // read in 1 MiB chunks
std::unique_ptr<char[]> buffer(new char[bufferSize]);
while (bigFile)
{
    bigFile.read(buffer.get(), bufferSize);
    // gcount() is the number of bytes the last read() actually
    // produced; the final chunk is usually shorter than bufferSize
    std::streamsize bytesRead = bigFile.gcount();
    // process bytesRead bytes of data in buffer
}
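Note that since the file is space-delimited, a token can straddle two chunks; one way to deal with that is to carry the trailing partial token over into the next chunk. A rough sketch of that idea (the carry-over logic is an illustration, and the file name is a placeholder):

#include <fstream>
#include <memory>
#include <string>

int main()
{
    std::ifstream bigFile("mybigfile.dat", std::ios::binary);  // placeholder name
    constexpr size_t bufferSize = 1024 * 1024;
    std::unique_ptr<char[]> buffer(new char[bufferSize]);

    std::string carry;  // incomplete trailing token from the previous chunk
    while (bigFile)
    {
        bigFile.read(buffer.get(), bufferSize);
        std::string chunk = carry + std::string(buffer.get(),
                                                static_cast<size_t>(bigFile.gcount()));

        // Keep everything after the last space for the next iteration,
        // since that token may continue in the next chunk
        size_t lastSpace = chunk.find_last_of(' ');
        if (lastSpace == std::string::npos)
        {
            carry = chunk;  // no delimiter seen in this chunk yet
            continue;
        }
        carry = chunk.substr(lastSpace + 1);
        // ... parse the complete tokens in chunk.substr(0, lastSpace) ...
    }
    // carry now holds the final token, if any
}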

Another solution is to map the file to memory. Most operating systems will allow you to map a file to memory even if it is larger than the physical amount of memory that you have. This works because the operating system knows that each memory page associated with the file can be mapped and unmapped on-demand: when your program needs a specific page, the OS will read it from the file into your process's memory and swap out a page that hasn't been used in a while.

However, this can only work if the file is smaller than the maximum amount of memory that your process can theoretically use. This isn't an issue with a 1TB file in a 64-bit process, but it wouldn't work in a 32-bit process.

Also be aware of the spirits that you're summoning. Memory-mapping a file is not the same thing as reading from it. If the file is suddenly truncated by another program, your program is likely to crash. If you modify the data, it's possible that you will run out of memory if you can't save back to the disk. Also, your operating system's algorithm for paging memory in and out may not work in your favor. Because of these uncertainties, I would consider mapping the file only if reading it in chunks with the first solution cannot work.

On Linux/OS X, you would use mmap. On Windows, you would open the file and then call CreateFileMapping followed by MapViewOfFile.
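For reference, a rough POSIX sketch of the mmap route (error handling kept terse; the file name is a placeholder):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("mybigfile.dat", O_RDONLY);  // placeholder name
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); close(fd); return 1; }

    // Map the whole file read-only; the OS faults pages in on demand,
    // so the mapping can be far larger than physical RAM
    void* p = mmap(nullptr, static_cast<size_t>(st.st_size),
                   PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    const char* data = static_cast<const char*>(p);
    // ... scan data[0] .. data[st.st_size - 1] like one huge array ...

    munmap(p, static_cast<size_t>(st.st_size));
    close(fd);
}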

answered Sep 28 '22 by zneak


You don't have to keep the whole file in memory; typically one reads and processes a file in chunks. If you want to use ifstream, you can do something like this:

#include <fstream>

std::ifstream is("/path/to/file", std::ios::binary);
char buf[4096];
do {
    is.read(buf, sizeof(buf));
    // gcount() is the number of bytes actually read; it can be
    // smaller than sizeof(buf) on the final chunk
    process_chunk(buf, is.gcount());
} while (is);
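Since the file is space-delimited, you can also let the stream do the chunking for you: operator>> extracts one whitespace-delimited token at a time while the ifstream buffers the underlying reads internally. A rough sketch (the path is a placeholder):

#include <fstream>
#include <string>

int main()
{
    std::ifstream is("/path/to/file");  // placeholder path
    std::string token;
    // operator>> skips whitespace and extracts the next
    // space/newline-delimited token; only the current token
    // is held in memory, never the whole file
    while (is >> token)
    {
        // ... process token here ...
    }
}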
answered Sep 28 '22 by Oleg Andriyanov