Performance of copying a file with fread/fwrite to USB

Tags: c++, c

I'm looking at a piece of code that copies a file to a USB device. The following part is the important one:

while((bytesRead = fread(buf, 1, 16*1024, m_hSource)) && !bAbort) {
    // write to target
    long bytesWritten = fwrite(buf, 1, bytesRead, m_hTarget);

    m_lBytesCopied += bytesWritten;
    // ...
}

The customer says it's pretty slow compared to normal PC<->USB speed. I didn't write this code, but it's my job to optimize it.

So I was wondering whether it would be a better approach to first read the complete file and then write it in one step. But I don't know how error-prone this would be. The code also checks after each copy step whether all bytes were written correctly, so that might also slow down the process.

I'm not much of a C++ and hardware guru, so I'm asking you how I could speed things up and still keep the copying reliable.

Johannes Klauß asked Jan 16 '12 11:01


2 Answers

  1. Try reading and writing in bigger chunks. 16 MB or 32 MB is not bad for copying a file.
  2. If you just want to copy the file, you can always invoke system() with the platform's copy command. It will likely be faster.
  3. "The code also check after each copystep if all bytes where written correctly, so that might also slow down the process."

    You can verify the copy by hashing bigger chunks instead: split the file into, say, 64 MB chunks and compare the hashes of the source and target chunks. The BitTorrent protocol has this feature.

  4. If you have mmap or MapViewOfFile available, map the source file first and then write it to the USB device. This way the read operation is handled by the kernel.

  5. Kerrek just commented about using memcpy on mmap. memcpy between two mmapped files seems like a great approach.
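
As a rough illustration of point 1, here is a minimal sketch of the same fread/fwrite loop with a configurable, larger chunk size. The helper name `copyFileChunked` and its parameters are made up for this example and are not part of the code in the question; the best chunk size for a given device is an assumption you would need to measure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not from the original code): copies srcPath to dstPath
// in chunkSize-byte blocks. Returns true only if every byte that was read
// was also written and the target could be closed without error.
bool copyFileChunked(const char* srcPath, const char* dstPath,
                     std::size_t chunkSize = 16 * 1024 * 1024) {
    assert(chunkSize > 0);

    std::FILE* src = std::fopen(srcPath, "rb");
    if (!src) return false;
    std::FILE* dst = std::fopen(dstPath, "wb");
    if (!dst) { std::fclose(src); return false; }

    std::vector<char> buf(chunkSize);  // heap buffer, too big for the stack
    bool ok = true;
    std::size_t bytesRead;
    while ((bytesRead = std::fread(buf.data(), 1, buf.size(), src)) > 0) {
        // A short write means the target is full or an I/O error occurred.
        if (std::fwrite(buf.data(), 1, bytesRead, dst) != bytesRead) {
            ok = false;
            break;
        }
    }
    if (std::ferror(src)) ok = false;       // fread stopped because of an error
    if (std::fclose(dst) != 0) ok = false;  // fclose flushes stdio's buffer
    std::fclose(src);
    return ok;
}
```

Comparing the return value of fwrite with bytesRead is the cheap per-step check; a hash comparison as in point 3 would be an additional pass on top of this.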

Also note that most recent operating systems cache writes to a USB stick and only flush the data to the device later, at the latest when the stick is ejected. So a copy done by the OS may appear faster than it really is.
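
If you want your own timing to include the time the data really needs to reach the device, you can flush explicitly before stopping the clock. The following is only a sketch: `flushToDevice` is a hypothetical helper name, it assumes `fsync` on POSIX systems and `_commit` on Windows, and whether the device itself buffers further is outside the program's control.

```cpp
#include <cassert>
#include <stdio.h>
#if defined(_WIN32)
#include <io.h>      // _commit
#else
#include <unistd.h>  // fsync
#endif

// Hypothetical helper: first pushes stdio's user-space buffer to the OS,
// then asks the OS to write its cached data for this file to the device.
// Returns true if both steps succeeded.
bool flushToDevice(FILE* f) {
    assert(f != nullptr);
    if (fflush(f) != 0) return false;
#if defined(_WIN32)
    return _commit(_fileno(f)) == 0;
#else
    return fsync(fileno(f)) == 0;
#endif
}
```

Calling this once after the last fwrite (rather than after every block) keeps the measurement honest without forcing a device round trip per chunk.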

Shiplu Mokaddim answered Oct 25 '22 14:10


What about overlapping reads and writes?

In the current code, the total time is time(read original) + time(write copy). If you read the first block and then, while writing it, start reading the second block, and so on, your total time would be max(time(read original), time(write copy)), plus the time for reading the first block and writing the last block, which can't be pipelined.

It could take almost half the time if reading and writing take more or less the same time.

You can do it with two threads or with asynchronous IO. Unfortunately, threads and async IO are platform dependent, so you'll have to check your system manual or choose appropriate portable libraries.
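
With C++11 or later, std::thread is one portable option for the two-thread variant. Below is a minimal sketch under that assumption: the helper name `copyFilePipelined`, the chunk size and the queue depth are all made up for illustration, and real code would probably reuse buffers instead of allocating one per block.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: the calling thread reads blocks while a second thread
// writes them, so reading block N+1 overlaps with writing block N.
bool copyFilePipelined(const char* srcPath, const char* dstPath,
                       std::size_t chunkSize = 1024 * 1024,
                       std::size_t maxQueued = 2) {
    assert(chunkSize > 0 && maxQueued > 0);
    std::FILE* src = std::fopen(srcPath, "rb");
    if (!src) return false;
    std::FILE* dst = std::fopen(dstPath, "wb");
    if (!dst) { std::fclose(src); return false; }

    std::mutex m;
    std::condition_variable cv;
    std::queue<std::vector<char>> blocks;  // filled blocks waiting to be written
    bool readerDone = false;
    bool writeFailed = false;

    // Writer thread: takes blocks off the queue and writes them to the target.
    std::thread writer([&] {
        for (;;) {
            std::vector<char> block;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return !blocks.empty() || readerDone; });
                if (blocks.empty()) return;  // reader finished, nothing left
                block = std::move(blocks.front());
                blocks.pop();
            }
            cv.notify_all();  // a queue slot is free again
            if (std::fwrite(block.data(), 1, block.size(), dst) != block.size()) {
                { std::lock_guard<std::mutex> lock(m); writeFailed = true; }
                cv.notify_all();  // wake the reader so it can stop
                return;
            }
        }
    });

    // Reader (this thread): fetches the next block while the writer is busy.
    for (;;) {
        std::vector<char> block(chunkSize);
        std::size_t n = std::fread(block.data(), 1, block.size(), src);
        if (n == 0) break;  // end of file or read error
        block.resize(n);
        std::unique_lock<std::mutex> lock(m);
        // Bounded queue: wait until the writer has caught up.
        cv.wait(lock, [&] { return blocks.size() < maxQueued || writeFailed; });
        if (writeFailed) break;
        blocks.push(std::move(block));
        lock.unlock();
        cv.notify_all();
    }
    bool readOk = std::ferror(src) == 0;
    { std::lock_guard<std::mutex> lock(m); readerDone = true; }
    cv.notify_all();
    writer.join();

    std::fclose(src);
    bool closeOk = std::fclose(dst) == 0;
    return readOk && !writeFailed && closeOk;
}
```

The bounded queue keeps memory use at roughly maxQueued + 2 blocks while still letting the read of one block run during the write of the previous one. On some toolchains this needs the thread library linked in (for example `-pthread` with GCC).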

fortran answered Oct 25 '22 12:10