 

C++ / Fast random access skipping in a large file

I have large files, each containing a small number of large datasets. Each dataset holds a name and the dataset size in bytes, which allows skipping it and jumping to the next dataset.

I want to build an index of dataset names very quickly. An example file is about 21 MB and contains 88 datasets. Reading the 88 names with a std::ifstream, using seekg() to skip between datasets, takes about 1300 ms, which I would like to reduce.

So in effect, I'm reading 88 chunks of about 30 bytes each, at known positions in a 21 MB file, and it takes 1300 ms.

Is there a way to improve this, or is it an OS and filesystem limitation? I'm running the test under Windows 7 64-bit.

I know that having a complete index at the beginning of the file would be better, but the file format does not have this, and we can't change it.
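
For reference, the access pattern described above looks roughly like this. It is only a sketch: the file name "datasets.bin" and the record layout (a 4-byte name length, the name bytes, then an 8-byte dataset size) are assumptions for illustration, not the real format.

    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        std::ifstream in("datasets.bin", std::ios::binary);
        std::uint32_t nameLen;
        while (in.read(reinterpret_cast<char*>(&nameLen), sizeof nameLen)) {
            std::string name(nameLen, '\0');
            in.read(&name[0], nameLen);
            std::uint64_t dataSize;
            in.read(reinterpret_cast<char*>(&dataSize), sizeof dataSize);
            std::cout << name << '\n';
            // Skip the payload; each of these seeks can trigger a separate
            // disk access, which is where the 1300 ms goes.
            in.seekg(static_cast<std::streamoff>(dataSize), std::ios::cur);
        }
    }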

asked Dec 21 '16 by galinette


2 Answers

You could use a memory-mapped file interface (I recommend Boost's implementation.)

This maps the file into your application's virtual address space, so the 88 small reads become plain memory accesses served via the OS page cache, without a separate seek/read system call per dataset header.
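
A minimal sketch using boost::iostreams::mapped_file_source, reusing the hypothetical record layout from the question (4-byte name length, name bytes, 8-byte payload size); the buildIndex name and IndexEntry type are illustrative, and the parsing must be adapted to the actual format:

    #include <boost/iostreams/device/mapped_file.hpp>
    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>

    struct IndexEntry {
        std::string name;
        std::size_t offset;  // where the dataset record starts in the file
    };

    std::vector<IndexEntry> buildIndex(const std::string& path) {
        boost::iostreams::mapped_file_source file(path);
        const char* data = file.data();
        const std::size_t size = file.size();

        std::vector<IndexEntry> index;
        std::size_t pos = 0;
        while (pos + 12 <= size) {  // smallest possible record header
            std::uint32_t nameLen;
            std::memcpy(&nameLen, data + pos, sizeof nameLen);
            if (pos + 4 + nameLen + 8 > size) break;  // truncated record

            index.push_back({std::string(data + pos + 4, nameLen), pos});

            std::uint64_t dataSize;
            std::memcpy(&dataSize, data + pos + 4 + nameLen, sizeof dataSize);
            pos += 4 + nameLen + 8 + dataSize;  // jump to the next dataset
        }
        return index;
    }

Skipping is now plain pointer arithmetic; only the pages that actually hold the 88 small headers are faulted in on demand.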

answered Nov 18 '22 by mascoj


You could scan the file once and write your own header, containing each key and its offset, to a separate index file. Depending on your use case you can do this at program start and every time the big file changes. Before accessing the big data, a lookup in the smaller file gives you the needed index.
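
A minimal sketch of such a sidecar index, assuming C++17 <filesystem> is available; the .idx extension, the one-record-per-line text layout, and the function names are illustrative choices:

    #include <filesystem>
    #include <fstream>
    #include <string>
    #include <vector>

    namespace fs = std::filesystem;

    struct IndexEntry {
        std::string name;
        std::size_t offset;
    };

    // Rebuild only when the big file is newer than the index file.
    bool indexIsStale(const fs::path& data, const fs::path& idx) {
        return !fs::exists(idx) ||
               fs::last_write_time(idx) < fs::last_write_time(data);
    }

    void saveIndex(const fs::path& idx, const std::vector<IndexEntry>& entries) {
        std::ofstream out(idx);
        for (const auto& e : entries)
            out << e.offset << '\t' << e.name << '\n';
    }

    std::vector<IndexEntry> loadIndex(const fs::path& idx) {
        std::vector<IndexEntry> entries;
        std::ifstream in(idx);
        std::size_t offset;
        std::string name;
        // Assumes dataset names contain no leading whitespace.
        while (in >> offset && std::getline(in >> std::ws, name))
            entries.push_back({name, offset});
        return entries;
    }

With this, program start pays the full scan only when the big file has changed; otherwise reading the small index file replaces the 88 scattered seeks.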

answered Nov 18 '22 by norca