
Most efficient way to copy a file with GCD?

With Grand Central Dispatch, you can schedule reads and writes without needing to worry much about when or how they happen. Compared with my previous NSStream-based approach, this requires less management on the outside. However, my naive implementation is slower than my NSStream-based approach.

For NSStream, I queried the preferred IO size of both the source and destination (NSURLPreferredIOBlockSizeKey). Then I read in whole "preferred input sized chunks" into a buffer, and as soon as I had at least "preferred output size" bytes in the buffer I wrote whole chunks to the destination (except for the last chunk, of course). This should be pretty close to the optimum with respect to read and write performance.

However, with GCD, I don't have much influence on this. Imagine that the source has a preferred IO size of 100kB and that the target has a preferred IO size of 1MB: my naive implementation would now write 10 times as often as my NSStream-based solution.

So, what's the most efficient way to solve this with GCD? Simply write to a buffer in the reader block, and as soon as enough data has been gathered, schedule a write block of "preferred output size"? I imagine GCD might offer me a solution here that I'm not aware of yet.

Here's the most important part of my current GCD solution:

// input_ and output_ are of type dispatch_io_t

dispatch_io_read(
    input_,
    0,
    SIZE_MAX,
    dispatch_get_main_queue(),
    ^(bool done, dispatch_data_t data, int error) {
        size_t data_size;

        if (error) {
            NSLog(@"Input: error %d", error);
            [self cancel];
            return;
        }
        if (data) {
            data_size = dispatch_data_get_size(data);
            if (data_size > 0) {
                dispatch_io_write(
                    output_,
                    0,
                    data,
                    dispatch_get_main_queue(),
                    // Parameters are named differently from the read handler's to avoid shadowing.
                    ^(bool write_done, dispatch_data_t remaining, int write_error) {
                        // TODO: I don't know how to get the offset (for progress). So I need to
                        // pass it from the calling block.
                        if (write_error) {
                            NSLog(@"Output: error %d", write_error);
                            return;
                        }
                        if (write_done) {
                            bytesWritten_ += data_size;
                            // Update progress report here.
                        }
                    }
                );
            }
        }
    }
);
DarkDust asked Mar 09 '12 11:03


1 Answer

Although it should not be necessary in most situations, you can influence the IO size used by GCD with the dispatch_io_set_high_water(3) and dispatch_io_set_low_water(3) APIs.

GCD will not read or write chunks larger than a channel's high water mark. Read/write handlers will also never be called with a data object that is smaller than the low water mark.

For example, by setting the low water mark of input_ in your example to 1MB, you can ensure that your current read callback does not pass data objects smaller than 1MB to dispatch_io_write(3).

If this control doesn't suffice in your situation, you could also combine multiple data objects received from successive invocations of your read handler via dispatch_data_create_concat(3) until they reach a size big enough to pass to dispatch_io_write(3).
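
The bookkeeping for that accumulate-then-write approach might look roughly like the sketch below. It is a plain C model that only tracks byte counts; write_batcher and batcher_add are hypothetical names, not GCD APIs, and the comments mark where the real dispatch_data_create_concat(3) and dispatch_io_write(3) calls would presumably go.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for batching read chunks before writing.
 * This models sizes only; it does not hold real dispatch_data_t objects. */
typedef struct {
    size_t pending_bytes;   /* bytes gathered but not yet written */
    size_t flush_threshold; /* e.g. the destination's preferred IO size */
} write_batcher;

/* Record one chunk delivered to the read handler. Returns the number of
 * bytes that should be handed to the writer now, or 0 to keep accumulating.
 * `done` mirrors the read handler's flag so that the final, possibly
 * smaller, batch is flushed as well. */
static size_t batcher_add(write_batcher *b, size_t chunk_bytes, bool done) {
    /* With GCD: pending = dispatch_data_create_concat(pending, chunk); */
    b->pending_bytes += chunk_bytes;

    if (b->pending_bytes >= b->flush_threshold ||
        (done && b->pending_bytes > 0)) {
        size_t to_write = b->pending_bytes;
        /* With GCD: dispatch_io_write(output_, 0, pending, queue, handler); */
        b->pending_bytes = 0;
        return to_write;
    }
    return 0;
}
```

With a threshold of 1,000,000 bytes and 100,000-byte chunks, the first nine calls return 0 and the tenth returns 1,000,000, so the writer is invoked once per ten reads instead of once per read.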

Hopefully this will not be necessary, however. Setting the source side's low water mark to a multiple of the preferred source chunk size that is big enough to reach the preferred destination chunk size, and setting the destination channel's high water mark to the preferred destination chunk size (or a multiple thereof), should give you the same performance as your current NSStream-based solution.
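
As a minimal sketch of that arithmetic, assuming you have already queried both preferred block sizes (e.g. via NSURLPreferredIOBlockSizeKey): water_marks and compute_water_marks are hypothetical helper names, and the rounding rule is just one reading of "a multiple big enough to reach the destination chunk size".

```c
#include <stddef.h>

/* Hypothetical container for the two values to hand to GCD. */
typedef struct {
    size_t source_low_water; /* for dispatch_io_set_low_water(input_, ...)   */
    size_t dest_high_water;  /* for dispatch_io_set_high_water(output_, ...) */
} water_marks;

/* Returns the smallest multiple of src_block that is >= dst_block as the
 * source low water mark, and dst_block itself as the destination high
 * water mark. Returns zeros if either size is unknown (0), in which case
 * the caller should leave GCD's defaults untouched. */
static water_marks compute_water_marks(size_t src_block, size_t dst_block) {
    water_marks m = {0, 0};
    if (src_block == 0 || dst_block == 0) {
        return m;
    }
    /* Ceiling division: how many source blocks cover one destination block. */
    size_t multiples = (dst_block + src_block - 1) / src_block;
    m.source_low_water = multiples * src_block;
    m.dest_high_water = dst_block;
    return m;
}
```

For a 100 KiB source block and a 1 MiB destination block this gives a low water mark of 1,126,400 bytes (11 source blocks) and a high water mark of 1,048,576 bytes. You would then pass the results to dispatch_io_set_low_water(3) on input_ and dispatch_io_set_high_water(3) on output_ before starting the read.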

You can check out the specifics of the GCD IO buffer policy in the implementation.

In any case, please make sure to file a bug with the specifics of any case where you see performance issues with the default GCD IO buffering.

das answered Nov 06 '22 10:11