
Best way to parse a large floating point file stored in ASCII?

What is the best way to parse a large floating point file stored in ASCII?

What would be the fastest way to do it? I remember someone telling me that using ifstream was bad because it only reads a small number of bytes at a time, and that it would be better to just read the whole file into memory first. Is that true?

Edit: I am running on Windows, and the file format is a point cloud stored in rows like x y z r g b. I am attempting to read them into arrays. The files are around 20 MB each, but I have around 10 GB worth of them.

Second edit: I am going to have to load the files every time I want to do a visualization, so it would be nice to have it as fast as possible. Honestly, though, if ifstream performs reasonably, I wouldn't mind sticking with readable code. It's running quite slowly right now, but that might be more of a hardware I/O limitation than anything I can fix in software. I just wanted to confirm.

Xzhsh asked Aug 03 '10 21:08



2 Answers

I think your first concern should be how large the floating point numbers are: are they all float, or can there be double data too? The traditional (C) way would be to use fscanf with the float format specifier (%f), and AFAIK it is rather fast. The iostreams do add a small overhead when parsing the data, but it is rather negligible. For the sake of brevity I would suggest you use iostreams (not to mention the usual stream features that you'd get with them).

Also, I think it would really help the community if you added the relevant numbers to your question: for example, how large is the file you are trying to parse? Is this a small-memory-footprint environment (like an embedded system)?

Gangadhar answered Sep 21 '22 13:09



It all depends on the operating system and on which C and C++ standard library implementations you use.

The days of slow ifstream are pretty much over; however, there is likely still some overhead in going through the generic C++ interfaces.

atof/strtod might be the fastest way to deal with it if the string is already in memory.
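As a rough sketch of that approach: read the file into a std::string once, then walk the buffer with std::strtod, which skips leading whitespace and reports where each number ended. The function names are hypothetical, and this assumes the default "C" locale (so '.' is the decimal point) and a file that contains nothing but whitespace-separated numbers.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Reads the whole file into memory in one go.
// Returns an empty string if the file cannot be opened.
std::string read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Parses every whitespace-separated number in the buffer.
// Stops at the end of the data or at the first token that is not a number.
std::vector<double> parse_doubles(const std::string& text) {
    std::vector<double> values;
    const char* cur = text.c_str();  // c_str() is always '\0'-terminated
    char* end = nullptr;
    for (;;) {
        const double v = std::strtod(cur, &end);
        if (end == cur) break;       // nothing was converted
        values.push_back(v);
        cur = end;
    }
    return values;
}
```

With six values per row, element 6 * i of the result is the x coordinate of row i, so the flat vector can be copied into whatever arrays the renderer wants.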

Finally, any attempt to read the whole file into memory yourself will likely be futile. Modern operating systems usually get in the way: especially if the file is larger than RAM, you will end up swapping, since the system treats your in-memory copy of the (already stored on disk) data as swappable.

If you really need to be ridiculously fast (the only places I can think of where that would be useful are HPC and Map/Reduce-based approaches), try mmap (Linux/Unix) or MapViewOfFile (Windows) to get the file prefetched into virtual memory in the most sensible way, and then use atof plus custom string handling.

If the file is really well organized for this kind of game, you can even get quirky with mmaps and pointers and make the conversion multithreaded. Sounds like a fun exercise if you have over 10 GB of floats to convert on a regular basis.
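The fiddly part of a multithreaded conversion is cutting the buffer so that no row straddles two workers. The sketch below shows only that partitioning step, under the assumption that rows are separated by '\n'; the function name is hypothetical. Each returned range could then be handed to its own thread and parsed independently.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits text into at most `parts` contiguous half-open ranges [first, second).
// Every range ends just after a '\n' or at the end of the text, so each range
// holds only whole rows.
std::vector<std::pair<std::size_t, std::size_t>>
split_at_rows(const std::string& text, std::size_t parts) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (parts == 0) parts = 1;
    const std::size_t n = text.size();
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= parts && begin < n; ++i) {
        // Tentative cut at an even byte offset; the last part takes the rest.
        std::size_t end = (i == parts) ? n : n * i / parts;
        if (end < begin) end = begin;
        if (end < n) {
            // Move the cut forward to just past the next newline.
            const std::size_t from = end > 0 ? end - 1 : 0;
            const std::size_t nl = text.find('\n', from);
            end = (nl == std::string::npos) ? n : nl + 1;
        }
        if (end > begin) {
            ranges.emplace_back(begin, end);
            begin = end;
        }
    }
    return ranges;
}
```

Fewer ranges than requested can come back when rows are long relative to the buffer, so the caller should size its worker pool from the result rather than from `parts`.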

qdot answered Sep 25 '22 13:09
