
Decreasing performance writing large binary file

Tags:

c++

In one of our applications we create records and store them in a binary file. Once the writing operation is complete, we read this binary file back. The issue is that as long as the file stays under 100 MB the performance is good enough, but once it grows larger the performance suffers.

So I thought of splitting this large binary file (> 100 MB) into smaller ones (< 100 MB), but that does not seem to improve performance either. What would be a better approach to handle this scenario? A minimal sketch of the kind of write/read cycle involved is below.
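For reference, this is roughly how the records are written and read back. The Record layout and the file name records.bin are illustrative, not our actual format; the point is that everything goes out and comes back in large sequential chunks.

    #include <cstdint>
    #include <fstream>
    #include <vector>

    // Hypothetical fixed-size record; the real layout is application-specific.
    struct Record {
        std::uint32_t id;
        double        values[16];
    };

    // Write all records in one large sequential chunk rather than one at a time,
    // so the OS can issue big, contiguous writes.
    void writeRecords(const std::vector<Record>& records)
    {
        std::ofstream out("records.bin", std::ios::binary);
        out.write(reinterpret_cast<const char*>(records.data()),
                  static_cast<std::streamsize>(records.size() * sizeof(Record)));
    }

    // Read the file back into memory in a single sequential pass.
    std::vector<Record> readRecords(std::size_t count)
    {
        std::vector<Record> records(count);
        std::ifstream in("records.bin", std::ios::binary);
        in.read(reinterpret_cast<char*>(records.data()),
                static_cast<std::streamsize>(count * sizeof(Record)));
        return records;
    }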

Any comments or suggestions from you guys would be a great help.

Thanks

asked Jan 19 '10 by Manish

2 Answers

Maybe you could try using an SQLite database instead.
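A minimal sketch of what that could look like with the SQLite C API, storing each record as a BLOB row (the file name records.db and the schema here are just illustrative assumptions):

    #include <sqlite3.h>

    // Store one record as a BLOB row; returns 0 on success, -1 on failure.
    int storeRecord(const void* data, int size)
    {
        sqlite3* db = nullptr;
        if (sqlite3_open("records.db", &db) != SQLITE_OK)
            return -1;

        sqlite3_exec(db,
                     "CREATE TABLE IF NOT EXISTS records "
                     "(id INTEGER PRIMARY KEY, payload BLOB);",
                     nullptr, nullptr, nullptr);

        sqlite3_stmt* stmt = nullptr;
        sqlite3_prepare_v2(db, "INSERT INTO records (payload) VALUES (?);",
                           -1, &stmt, nullptr);
        sqlite3_bind_blob(stmt, 1, data, size, SQLITE_TRANSIENT);

        int rc = (sqlite3_step(stmt) == SQLITE_DONE) ? 0 : -1;

        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return rc;
    }

In practice you would keep the database open across records and wrap the bulk of the inserts in a single BEGIN/COMMIT transaction, which makes a large difference to SQLite's write throughput.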

answered Nov 16 '22 by JRL


It is always quite difficult to provide accurate answers with only a glimpse of the system, but have you actually measured the throughput you are getting?

As a first solution, I would simply recommend using a dedicated disk (so there are no concurrent reads/writes from other processes), and a fast one at that. That way it is just the cost of a hardware upgrade, and we all know hardware is usually cheaper than software ;) You could even go for a RAID controller to maximize throughput.

If you are still limited by disk throughput, there are newer technologies based on flash memory: USB keys (though they may not seem very professional) or the "new" solid-state drives may provide more throughput than a mechanical disk.

Now, if the disk approach is not fast enough or you can't get your hands on good SSDs, there are other solutions, but they involve software changes, and I propose them off the top of my head.

  • A socket approach: the second utility listens on a port and you send it the data there. On a local machine this is relatively fast, and you parallelize the work too, so even as the data grows you still start processing it fairly quickly (a sender sketch follows this list).
  • A memory-mapping approach: write to a dedicated area of live memory and have the utility read from that area (Boost.Interprocess may help; there are other solutions). A shared-memory sketch also follows this list.
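For the socket approach, here is a rough sketch of the sender side over the loopback interface, using plain POSIX sockets; the port number 5555 is an arbitrary assumption, and the reading utility would be listening on it:

    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstddef>

    // Stream a buffer of records to a local listener; returns 0 on success.
    int sendRecords(const char* data, std::size_t size)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return -1;

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(5555);            // port the reader listens on
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }

        // Push the records through the socket; on loopback this never
        // touches the disk, and the reader can process them as they arrive.
        std::size_t sent = 0;
        while (sent < size) {
            ssize_t n = send(fd, data + sent, size - sent, 0);
            if (n <= 0) { close(fd); return -1; }
            sent += static_cast<std::size_t>(n);
        }
        close(fd);
        return 0;
    }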
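And for the memory-mapping approach, a minimal sketch with Boost.Interprocess; the segment name "RecordBuffer" is an arbitrary assumption:

    #include <boost/interprocess/shared_memory_object.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <cstring>

    namespace bip = boost::interprocess;

    // Writer side: create a named shared-memory segment and copy the
    // records into it; the reader maps the same name read-only.
    void publishRecords(const void* data, std::size_t size)
    {
        bip::shared_memory_object shm(bip::create_only, "RecordBuffer",
                                      bip::read_write);
        shm.truncate(static_cast<bip::offset_t>(size));

        bip::mapped_region region(shm, bip::read_write);
        std::memcpy(region.get_address(), data, size);
        // The reader opens the segment with open_only / read_only; once both
        // sides are done, shared_memory_object::remove("RecordBuffer")
        // cleans it up.
    }

In a real design you would add some synchronization (for example a Boost.Interprocess mutex or condition variable) so the reader knows when data is ready.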

Note that if the read is sequential, I find it more "natural" to try a 'pipe' approach (à la Unix) so that the two processes execute concurrently. With a traditional pipe, the data may never hit the disk at all. A sketch of that setup follows.
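A minimal sketch of the pipe approach, assuming the same illustrative Record layout as above: the producer writes records to stdout, the consumer reads them from stdin, and the shell connects the two.

    // producer.cpp -- writes records to stdout instead of a file
    #include <cstdio>
    #include <vector>

    struct Record { int id; double values[16]; };   // illustrative layout

    int main()
    {
        std::vector<Record> records(1000);
        // ... fill records ...
        std::fwrite(records.data(), sizeof(Record), records.size(), stdout);
        return 0;
    }

    // consumer.cpp -- reads the same stream from stdin
    #include <cstdio>

    struct Record { int id; double values[16]; };

    int main()
    {
        Record r;
        while (std::fread(&r, sizeof(Record), 1, stdin) == 1) {
            // ... process r ...
        }
        return 0;
    }

Run them as ./producer | ./consumer: the two processes execute concurrently and the records flow through the kernel pipe buffer rather than through the filesystem.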

A shame, isn't it, that in this age of overwhelming processing power we are still struggling with disk I/O?

answered Nov 16 '22 by Matthieu M.