I have large files, each containing a small number of large datasets. Each dataset contains a name and its size in bytes, which makes it possible to skip it and jump to the next dataset.
I want to build an index of the dataset names very quickly. An example file is about 21 MB and contains 88 datasets. Reading the 88 names with a std::ifstream, using seekg() to skip between datasets, takes about 1300 ms, which I would like to reduce.
So in effect, I'm reading 88 chunks of about 30 bytes each, at known positions in a 21 MB file, and it takes 1300 ms.
Is there a way to improve this, or is it an OS and filesystem limitation? I'm running the test under Windows 7 64-bit.
I know that having a complete index at the beginning of the file would be better, but the file format does not include one, and we can't change it.
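For reference, here is a minimal sketch of the scan I'm timing. The record layout (a 4-byte name length, the name, then a 4-byte payload size) is only an assumption for illustration; the real format differs:

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct Entry { std::string name; std::streamoff offset; };

std::vector<Entry> buildIndex(const std::string& path)
{
    std::vector<Entry> index;
    std::ifstream in(path, std::ios::binary);
    while (in) {
        std::streamoff pos = in.tellg();
        std::uint32_t nameLen = 0, dataSize = 0;
        if (!in.read(reinterpret_cast<char*>(&nameLen), sizeof nameLen))
            break;                                  // end of file reached
        std::string name(nameLen, '\0');
        in.read(&name[0], nameLen);                 // dataset name
        in.read(reinterpret_cast<char*>(&dataSize), sizeof dataSize);
        index.push_back({name, pos});
        in.seekg(dataSize, std::ios::cur);          // skip payload to next dataset
    }
    return index;
}
```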
You could use a memory-mapped file interface (I recommend Boost's implementation). This maps the file into your application's virtual address space, so reading the 88 headers becomes plain pointer access: only the pages you actually touch are faulted in from disk, instead of a seek-and-read system call pair per dataset.
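A minimal sketch with boost::iostreams::mapped_file_source, reusing the same hypothetical record layout as the question's sketch (the file name and layout are assumptions):

```cpp
#include <boost/iostreams/device/mapped_file.hpp>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>

int main()
{
    boost::iostreams::mapped_file_source file("example.dat"); // file name is an assumption
    const char* p   = file.data();
    const char* end = p + file.size();
    while (p + 8 <= end) {                                 // minimal bounds check for brevity
        std::uint32_t nameLen = 0, dataSize = 0;
        std::memcpy(&nameLen, p, 4);                       // memcpy avoids unaligned reads
        std::cout << std::string(p + 4, nameLen) << '\n';  // dataset name
        std::memcpy(&dataSize, p + 4 + nameLen, 4);
        p += 4 + nameLen + 4 + dataSize;                   // jump straight to the next header
    }
}
```

Since the payloads are never touched, the OS never has to read the bulk of the 21 MB file; you only pay for the pages holding the ~30-byte headers.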
You could scan the file once and write your own header, containing each key and its offset, to a separate file. Depending on your use case, you can do this once at program start or every time the data file changes. Before accessing the big data, a lookup in the small index file gives you the offset you need.
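A hedged sketch of such a sidecar index; the ".idx" suffix and the "name offset" text format are assumptions, and names must not contain whitespace for this simple format to round-trip:

```cpp
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Write one "name offset" pair per line to a small companion file.
void writeIndex(const std::string& dataPath,
                const std::vector<std::pair<std::string, std::streamoff>>& entries)
{
    std::ofstream out(dataPath + ".idx");
    for (const auto& e : entries)
        out << e.first << ' ' << e.second << '\n';
}

// Load the companion file into a name -> offset map for fast lookups.
std::unordered_map<std::string, std::streamoff> readIndex(const std::string& dataPath)
{
    std::unordered_map<std::string, std::streamoff> index;
    std::ifstream in(dataPath + ".idx");
    std::string name;
    std::streamoff offset;
    while (in >> name >> offset)
        index[name] = offset;
    return index;
}
```

One simple way to decide when to rebuild is to compare the modification times of the data file and the index file, and regenerate the index whenever the data file is newer.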