Fast read of certain bytes of multiple files in C/C++

I've been searching the web for this question, and although there are many similar questions about reading and writing files in C/C++, I haven't found anything about this specific task.

I want to read only sizeof(double) bytes, located at a certain position, from each of many files (256x256 files). Right now my solution is, for each file:

  1. Open the file (read, binary mode):

    fstream fTest("current_file", ios_base::in | ios_base::binary);

  2. Seek the position I want to read:

    fTest.seekg(position*sizeof(test_value), ios_base::beg);

  3. Read the bytes:

    fTest.read((char *) &(output[i][j]), sizeof(test_value));

  4. And close the file:

    fTest.close();

This takes about 350 ms to run inside a for{ for {} } structure with 256x256 iterations (one for each file).
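Put together, the four steps amount to something like the following sketch. The function name `read_double_at` is only illustrative, and the element type is assumed to be `double`, as stated above.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Reads the double stored at element index `position` of a binary file.
// Returns false if the file cannot be opened or the read fails;
// `out` is only modified on success.
bool read_double_at(const std::string& path, std::size_t position, double& out) {
    // Step 1: open for reading in binary mode.
    std::ifstream file(path, std::ios_base::in | std::ios_base::binary);
    if (!file) {
        return false;
    }

    // Step 2: seek to the byte offset of the requested element.
    file.seekg(static_cast<std::streamoff>(position * sizeof(double)),
               std::ios_base::beg);

    // Step 3: read exactly sizeof(double) bytes.
    double value = 0.0;
    file.read(reinterpret_cast<char*>(&value), sizeof(value));
    if (!file) {
        return false;
    }
    out = value;

    // Step 4: the file is closed by the ifstream destructor on return.
    return true;
}
```

Inside the double loop, the call would look like `read_double_at(name, position, output[i][j])`, where `name` is whatever path belongs to file (i, j).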


Q: Do you think there is a better way to implement this operation? How would you do it?

Alejandro Cámara asked May 20 '10 14:05


1 Answer

If possible, I suggest reorganizing the data. For example, put all those doubles into one file instead of spreading them across multiple files.

If you need to run the program multiple times and the data doesn't change, you may want to create a tool that will optimize the data first.
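As a rough sketch of such a preprocessing tool, assuming every input file holds raw `double` values and the same element index is wanted from each: `pack_values` gathers one double per file into a single packed file, and `load_packed` reads them all back with one open and one sequential read. Both function names and the packed-file layout are hypothetical, not something from the question.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// One-time step: copy the double at element index `position` of every input
// file into one packed file, in the order the inputs are listed.
bool pack_values(const std::vector<std::string>& inputs, std::size_t position,
                 const std::string& packed_path) {
    std::vector<double> values;
    values.reserve(inputs.size());
    for (const std::string& path : inputs) {
        std::ifstream in(path, std::ios_base::in | std::ios_base::binary);
        double value = 0.0;
        in.seekg(static_cast<std::streamoff>(position * sizeof(double)),
                 std::ios_base::beg);
        in.read(reinterpret_cast<char*>(&value), sizeof(value));
        if (!in) {
            return false;  // missing file or short read
        }
        values.push_back(value);
    }
    std::ofstream out(packed_path, std::ios_base::out | std::ios_base::binary |
                                       std::ios_base::trunc);
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(double)));
    out.flush();
    return static_cast<bool>(out);
}

// Every later run: a single open and a single sequential read of `count` doubles.
bool load_packed(const std::string& packed_path, std::vector<double>& values,
                 std::size_t count) {
    values.assign(count, 0.0);
    std::ifstream in(packed_path, std::ios_base::in | std::ios_base::binary);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
    return static_cast<bool>(in);
}
```

With 256x256 inputs the packed file is only 65536 * 8 bytes = 512 KiB, so later runs replace 65536 open/seek/read/close cycles with one read.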

The performance problem comes from the steps required for each file, most of which are pure overhead:

  1. Ramping up the hard drive (overhead).
  2. Locating the file (overhead).
  3. Positioning within the file.
  4. Reading the data.
  5. Closing the file (this adds very little to the total time).

In most file-based systems that handle a lot of data, the time spent reading the data is much longer than any of the overhead. The requests are cached and sorted for optimal disk access. Unfortunately, in your case you are reading so little data per file that the overhead now takes longer than the reading itself.
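One way to check where the time goes is to time each phase separately. The sketch below does this for a single file with `std::chrono::steady_clock`; the `PhaseTimes` struct and the `time_single_read` function are illustrative names, and the file is assumed to hold raw `double` values.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>

// Wall-clock time spent in each phase of one open/seek/read/close cycle.
struct PhaseTimes {
    std::chrono::steady_clock::duration open{};
    std::chrono::steady_clock::duration seek_and_read{};
    std::chrono::steady_clock::duration close{};
    bool ok = false;     // true if the double was read successfully
    double value = 0.0;  // the value that was read
};

PhaseTimes time_single_read(const std::string& path, std::size_t position) {
    using clock = std::chrono::steady_clock;
    PhaseTimes t;

    const auto t0 = clock::now();
    std::ifstream in(path, std::ios_base::in | std::ios_base::binary);
    const auto t1 = clock::now();

    in.seekg(static_cast<std::streamoff>(position * sizeof(double)),
             std::ios_base::beg);
    in.read(reinterpret_cast<char*>(&t.value), sizeof(t.value));
    t.ok = static_cast<bool>(in);  // record success before closing
    const auto t2 = clock::now();

    in.close();
    const auto t3 = clock::now();

    t.open = t1 - t0;
    t.seek_and_read = t2 - t1;
    t.close = t3 - t2;
    return t;
}
```

Summing the three durations over all files shows which phase dominates. Keep in mind that the operating system caches file data, so a second run over the same files is usually much faster than the first.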

I suggest trying to queue the read operations. Take 4 threads; each opens a file, reads its double, and places it into a buffer. The idea here is to stagger the operations:

  • Thread 1 opens a file.
  • Thread 2 opens a file while thread 1 is positioning.
  • Thread 3 opens a file while thread 2 is positioning and thread 1 is reading the data.
  • Thread 4 opens a file, thread 3 positions, thread 2 reads, thread 1 closes.

Hopefully, these threads can keep the hard drive continuously busy so that it never slows down. You may want to try this in a single thread first. If you need even better performance, you could consider sending commands directly to the disk drive (after ordering them first).
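A simplified sketch of the threaded idea, using `std::thread`: it does not enforce the exact stagger described above. Instead, each worker runs its own open/seek/read/close loop over a share of the files and the operating system is left to overlap the phases. The function name `read_all_threaded` and the strided split of files between workers are assumptions made for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Reads the double at element index `position` from every file in `paths`.
// Worker t handles files t, t + num_threads, t + 2 * num_threads, ...
// Each worker writes only to its own slots of `values` and `ok`, so no
// locking is needed. ok[i] is 1 if file i was read successfully, else 0.
void read_all_threaded(const std::vector<std::string>& paths,
                       std::size_t position, unsigned num_threads,
                       std::vector<double>& values, std::vector<char>& ok) {
    values.assign(paths.size(), 0.0);
    ok.assign(paths.size(), 0);
    if (num_threads == 0) {
        num_threads = 1;
    }

    auto worker = [&](unsigned t) {
        for (std::size_t i = t; i < paths.size(); i += num_threads) {
            std::ifstream in(paths[i],
                             std::ios_base::in | std::ios_base::binary);
            double value = 0.0;
            in.seekg(static_cast<std::streamoff>(position * sizeof(double)),
                     std::ios_base::beg);
            in.read(reinterpret_cast<char*>(&value), sizeof(value));
            if (in) {
                values[i] = value;
                ok[i] = 1;
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) {
        pool.emplace_back(worker, t);
    }
    for (std::thread& th : pool) {
        th.join();  // wait until every worker has finished its share
    }
}
```

Calling it with `num_threads` set to 4 matches the suggestion above. Whether this beats a single thread depends on the drive and the file system, so it is worth measuring both.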

Thomas Matthews answered Sep 28 '22 19:09