
Decreasing performance writing large binary file

Tags:

c++

In one of our applications we create records and store them in a binary file. Once the writing operation is complete, we read this binary file back. The issue is that as long as the file stays under 100 MB the performance is good enough, but once it grows larger the performance suffers.

So I thought of splitting this large binary file (> 100 MB) into smaller ones (< 100 MB), but that does not seem to improve performance either. What would be a better approach to handle this scenario? A minimal sketch of the kind of write/read cycle involved is below.
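For reference, this is roughly how the records are written and read back. The Record layout and the file name records.bin are illustrative, not our actual format; the point is that everything goes out and comes back in large sequential chunks.

    #include <cstdint>
    #include <fstream>
    #include <vector>

    // Hypothetical fixed-size record; the real layout is application-specific.
    struct Record {
        std::uint32_t id;
        double        values[16];
    };

    // Write all records in one large sequential chunk rather than one at a time,
    // so the OS can issue big, contiguous writes.
    void writeRecords(const std::vector<Record>& records)
    {
        std::ofstream out("records.bin", std::ios::binary);
        out.write(reinterpret_cast<const char*>(records.data()),
                  static_cast<std::streamsize>(records.size() * sizeof(Record)));
    }

    // Read the file back into memory in a single sequential pass.
    std::vector<Record> readRecords(std::size_t count)
    {
        std::vector<Record> records(count);
        std::ifstream in("records.bin", std::ios::binary);
        in.read(reinterpret_cast<char*>(records.data()),
                static_cast<std::streamsize>(count * sizeof(Record)));
        return records;
    }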

Any comments or suggestions from you guys would be a great help.

Thanks

asked Jan 19 '10 by Manish

2 Answers

Maybe you could try using an SQLite database instead.
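A minimal sketch of what that could look like with the SQLite C API, storing each record as a BLOB row (the file name records.db and the schema here are just illustrative assumptions):

    #include <sqlite3.h>

    // Store one record as a BLOB row; returns 0 on success, -1 on failure.
    int storeRecord(const void* data, int size)
    {
        sqlite3* db = nullptr;
        if (sqlite3_open("records.db", &db) != SQLITE_OK)
            return -1;

        sqlite3_exec(db,
                     "CREATE TABLE IF NOT EXISTS records "
                     "(id INTEGER PRIMARY KEY, payload BLOB);",
                     nullptr, nullptr, nullptr);

        sqlite3_stmt* stmt = nullptr;
        sqlite3_prepare_v2(db, "INSERT INTO records (payload) VALUES (?);",
                           -1, &stmt, nullptr);
        sqlite3_bind_blob(stmt, 1, data, size, SQLITE_TRANSIENT);

        int rc = (sqlite3_step(stmt) == SQLITE_DONE) ? 0 : -1;

        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return rc;
    }

In practice you would keep the database open across records and wrap the bulk of the inserts in a single BEGIN/COMMIT transaction, which makes a large difference to SQLite's write throughput.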

answered Nov 16 '22 by JRL


It is always quite difficult to provide accurate answers with only a glimpse of the system, but have you actually measured the throughput you are getting?

As a first solution, I would simply recommend using a dedicated disk (so there are no concurrent reads/writes from other processes), and a fast one at that. That way it is just the cost of a hardware upgrade, and we all know hardware is usually cheaper than software ;) You could even go for a RAID controller to maximize throughput.

If you are still limited by disk throughput, there are newer technologies based on flash memory: USB keys (though they may not seem very professional) or the "new" solid-state drives may provide more throughput than a mechanical disk.

Now, if the disk approach is not fast enough or you can't get your hands on good SSDs, there are other solutions, but they involve software changes, and I propose them off the top of my head.

  • A socket approach: the second utility listens on a port and you send it the data there. On a local machine this is relatively fast, and you parallelize the work too, so even as the data grows you still start processing it fairly quickly (a sender sketch follows this list).
  • A memory-mapping approach: write to a dedicated area of live memory and have the utility read from that area (Boost.Interprocess may help; there are other solutions). A shared-memory sketch also follows this list.
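For the socket approach, here is a rough sketch of the sender side over the loopback interface, using plain POSIX sockets; the port number 5555 is an arbitrary assumption, and the reading utility would be listening on it:

    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstddef>

    // Stream a buffer of records to a local listener; returns 0 on success.
    int sendRecords(const char* data, std::size_t size)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return -1;

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(5555);            // port the reader listens on
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }

        // Push the records through the socket; on loopback this never
        // touches the disk, and the reader can process them as they arrive.
        std::size_t sent = 0;
        while (sent < size) {
            ssize_t n = send(fd, data + sent, size - sent, 0);
            if (n <= 0) { close(fd); return -1; }
            sent += static_cast<std::size_t>(n);
        }
        close(fd);
        return 0;
    }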
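And for the memory-mapping approach, a minimal sketch with Boost.Interprocess; the segment name "RecordBuffer" is an arbitrary assumption:

    #include <boost/interprocess/shared_memory_object.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <cstring>

    namespace bip = boost::interprocess;

    // Writer side: create a named shared-memory segment and copy the
    // records into it; the reader maps the same name read-only.
    void publishRecords(const void* data, std::size_t size)
    {
        bip::shared_memory_object shm(bip::create_only, "RecordBuffer",
                                      bip::read_write);
        shm.truncate(static_cast<bip::offset_t>(size));

        bip::mapped_region region(shm, bip::read_write);
        std::memcpy(region.get_address(), data, size);
        // The reader opens the segment with open_only / read_only; once both
        // sides are done, shared_memory_object::remove("RecordBuffer")
        // cleans it up.
    }

In a real design you would add some synchronization (for example a Boost.Interprocess mutex or condition variable) so the reader knows when data is ready.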

Note that if the read is sequential, I find it more "natural" to try a 'pipe' approach (à la Unix) so that the two processes execute concurrently. With a traditional pipe, the data may never hit the disk at all. A sketch of that setup follows.
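A minimal sketch of the pipe approach, assuming the same illustrative Record layout as above: the producer writes records to stdout, the consumer reads them from stdin, and the shell connects the two.

    // producer.cpp -- writes records to stdout instead of a file
    #include <cstdio>
    #include <vector>

    struct Record { int id; double values[16]; };   // illustrative layout

    int main()
    {
        std::vector<Record> records(1000);
        // ... fill records ...
        std::fwrite(records.data(), sizeof(Record), records.size(), stdout);
        return 0;
    }

    // consumer.cpp -- reads the same stream from stdin
    #include <cstdio>

    struct Record { int id; double values[16]; };

    int main()
    {
        Record r;
        while (std::fread(&r, sizeof(Record), 1, stdin) == 1) {
            // ... process r ...
        }
        return 0;
    }

Run them as ./producer | ./consumer: the two processes execute concurrently and the records flow through the kernel pipe buffer rather than through the filesystem.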

A shame, isn't it, that in this age of overwhelming processing power we are still struggling with disk I/O?

answered Nov 16 '22 by Matthieu M.