Is there a MD5 library that doesn't require the whole input at the same time?

Tags: c, objective-c, md5

I'm working on an Objective-C Cocoa application. I tested CC_MD5 in CommonCrypto, and it worked just fine; however, when I gave it a 5 gigabyte file, my whole computer froze and crashed. The MD5 algorithm processes its input in 512-bit (64-byte) blocks and doesn't really require all the input at once. Is there a library in Objective-C or C that asks for the next chunk instead of taking all the input at once?

Max Yankov asked Jun 11 '12 22:06


3 Answers

There is a great thread on calculating MD5 of large files in obj-C here: http://www.iphonedevsdk.com/forum/iphone-sdk-development/17659-calculating-md5-hash-large-file.html

Here is the solution someone came up with there:

#import <Foundation/Foundation.h>
#import <CommonCrypto/CommonDigest.h>

// Bytes read per iteration; 1 MB is an arbitrary choice, any size works.
#define CHUNK_SIZE (1024 * 1024)

+(NSString*)fileMD5:(NSString*)path
{
    NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:path];
    if( handle == nil ) return nil; // file doesn't exist or can't be opened

    CC_MD5_CTX md5;

    CC_MD5_Init(&md5);

    BOOL done = NO;
    while(!done)
    {
        // Drain the autoreleased NSData on every pass so memory use stays bounded.
        @autoreleasepool {
            NSData* fileData = [handle readDataOfLength: CHUNK_SIZE ];
            if( [fileData length] == 0 )
                done = YES; // end of file
            else
                CC_MD5_Update(&md5, [fileData bytes], (CC_LONG)[fileData length]);
        }
    }
    [handle closeFile];

    unsigned char digest[CC_MD5_DIGEST_LENGTH];
    CC_MD5_Final(digest, &md5);
    NSString* s = [NSString stringWithFormat: @"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
                   digest[0], digest[1],
                   digest[2], digest[3],
                   digest[4], digest[5],
                   digest[6], digest[7],
                   digest[8], digest[9],
                   digest[10], digest[11],
                   digest[12], digest[13],
                   digest[14], digest[15]];
    return s;
}
dodexahedron answered Sep 30 '22 19:09


CC_MD5() is designed to process all of its input in a single call. Its length parameter is a CC_LONG, a 32-bit unsigned integer, so it cannot even describe a 5 GB buffer, and reading the whole file into memory first is likely what brought your machine down. For larger data, CommonCrypto can process the input a chunk at a time if you use CC_MD5_CTX together with CC_MD5_Init(), CC_MD5_Update(), and CC_MD5_Final(). Check the CommonCrypto documentation for more info and example code.
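To illustrate what the incremental API does internally, here is a minimal, portable C sketch of the same Init/Update/Final pattern. It is not CommonCrypto's code, and the md5_* names are hypothetical. MD5 works on 512-bit (64-byte) blocks, so md5_update() buffers any partial block and hashes each full block as it arrives; the caller can feed chunks of any size, and memory use stays constant regardless of input length.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Minimal streaming MD5 (RFC 1321). The md5_* names are hypothetical,
   chosen for this sketch; they are not an existing library's API. */
typedef struct {
    uint32_t state[4];      /* chaining variables A, B, C, D */
    uint64_t length;        /* total number of bytes hashed so far */
    unsigned char buf[64];  /* pending partial 512-bit (64-byte) block */
    size_t buf_len;         /* bytes currently held in buf */
} md5_ctx;

/* Per-step additive constants T[i] from RFC 1321. */
static const uint32_t MD5_K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee, 0xf57c0faf, 0x4787c62a,
    0xa8304613, 0xfd469501, 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
    0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821, 0xf61e2562, 0xc040b340,
    0x265e5a51, 0xe9b6c7aa, 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed, 0xa9e3e905, 0xfcefa3f8,
    0x676f02d9, 0x8d2a4c8a, 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70, 0x289b7ec6, 0xeaa127fa,
    0xd4ef3085, 0x04881d05, 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039, 0x655b59c3, 0x8f0ccc92,
    0xffeff47d, 0x85845dd1, 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
    0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391 };

/* Per-step left-rotation amounts. */
static const unsigned MD5_S[64] = {
    7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
    5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
    4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
    6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 };

#define MD5_ROTL(x, c) (((x) << (c)) | ((x) >> (32 - (c))))

void md5_init(md5_ctx *ctx) {
    ctx->state[0] = 0x67452301; ctx->state[1] = 0xefcdab89;
    ctx->state[2] = 0x98badcfe; ctx->state[3] = 0x10325476;
    ctx->length = 0;
    ctx->buf_len = 0;
}

/* Process one 64-byte block (the four rounds of RFC 1321). */
static void md5_block(md5_ctx *ctx, const unsigned char *p) {
    uint32_t M[16], a = ctx->state[0], b = ctx->state[1],
             c = ctx->state[2], d = ctx->state[3];
    for (int i = 0; i < 16; i++)  /* message words are little-endian */
        M[i] = (uint32_t)p[4*i] | (uint32_t)p[4*i+1] << 8 |
               (uint32_t)p[4*i+2] << 16 | (uint32_t)p[4*i+3] << 24;
    for (int i = 0; i < 64; i++) {
        uint32_t f; int g;
        if (i < 16)      { f = (b & c) | (~b & d); g = i; }
        else if (i < 32) { f = (d & b) | (~d & c); g = (5 * i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;          g = (3 * i + 5) % 16; }
        else             { f = c ^ (b | ~d);       g = (7 * i) % 16; }
        f += a + MD5_K[i] + M[g];
        a = d; d = c; c = b;
        b += MD5_ROTL(f, MD5_S[i]);
    }
    ctx->state[0] += a; ctx->state[1] += b;
    ctx->state[2] += c; ctx->state[3] += d;
}

/* Accept a chunk of any size; only full 64-byte blocks are hashed now. */
void md5_update(md5_ctx *ctx, const void *data, size_t len) {
    const unsigned char *p = data;
    ctx->length += len;
    while (len > 0) {
        size_t take = 64 - ctx->buf_len;
        if (take > len) take = len;
        memcpy(ctx->buf + ctx->buf_len, p, take);
        ctx->buf_len += take; p += take; len -= take;
        if (ctx->buf_len == 64) { md5_block(ctx, ctx->buf); ctx->buf_len = 0; }
    }
}

/* Pad to 56 mod 64 bytes, append the bit length, and emit the digest. */
void md5_final(md5_ctx *ctx, unsigned char digest[16]) {
    static const unsigned char pad[64] = { 0x80 };
    unsigned char len_le[8];
    uint64_t bits = ctx->length * 8;  /* captured before padding changes length */
    for (int i = 0; i < 8; i++) len_le[i] = (unsigned char)(bits >> (8 * i));
    md5_update(ctx, pad, ctx->buf_len < 56 ? 56 - ctx->buf_len : 120 - ctx->buf_len);
    md5_update(ctx, len_le, 8);  /* completes the final block */
    for (int i = 0; i < 16; i++)  /* digest is A, B, C, D, each little-endian */
        digest[i] = (unsigned char)(ctx->state[i / 4] >> (8 * (i % 4)));
}

/* Format a digest as a 32-character lowercase hex string (out holds 33 bytes). */
void md5_to_hex(const unsigned char digest[16], char out[33]) {
    for (int i = 0; i < 16; i++)
        snprintf(out + 2 * i, 3, "%02x", digest[i]);
}
```

To hash a file with this sketch, call md5_update() on each buffer returned by fread() until it returns 0, then call md5_final(); the same loop shape applies to CC_MD5_Update() with CommonCrypto on macOS and iOS.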

Jonathan Grynspan answered Sep 30 '22 19:09


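Here is a better way to do it using the dispatch I/O APIs: the file is read asynchronously on a serial queue in chunks of at most 512 KB (set with dispatch_io_set_high_water), and each chunk is fed to CC_MD5_Update(), so the calling thread never blocks and the whole file is never in memory at once. I am using it in production and it's working fine!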

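#import "CalculateMD5.h" // assumed to declare the ivars MD5ChecksumOperationQueue (dispatch_queue_t) and readChannel (dispatch_io_t)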

// Cryptography
#include <CommonCrypto/CommonDigest.h>

@implementation CalculateMD5

- (id)init
{
    self = [super init];
    if (self)
    {
        MD5ChecksumOperationQueue = dispatch_queue_create("com.test.calculateMD5Checksum", DISPATCH_QUEUE_SERIAL);
    }
    return self;
}

- (void)closeReadChannel
{
    dispatch_async(MD5ChecksumOperationQueue, ^{
        dispatch_io_close(readChannel, DISPATCH_IO_STOP);
    });
}

- (void)MD5Checksum:(NSString *)pathToFile TCB:(void(^)(NSString *md5, NSError *error))tcb
{
    // Initialize the hash object
    __block CC_MD5_CTX hashObject;
    CC_MD5_Init(&hashObject);

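    // The path is opened lazily, when the first read is ready to run, so a
    // missing file usually surfaces as an error in the read handler below.
    readChannel = dispatch_io_create_with_path(DISPATCH_IO_STREAM,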
                                               pathToFile.UTF8String,
                                               O_RDONLY, 0,
                                               MD5ChecksumOperationQueue,
                                               ^(int error) {
                                                   [self closeReadChannel];
                                               });

    if (readChannel == nil)
    {
        NSError* e = [NSError errorWithDomain:@"MD5Error"
                                         code:-999 userInfo:@{
                   NSLocalizedDescriptionKey : @"failed to open file for calculating MD5."
                      }];
        tcb(nil, e);
        return;
    }

    dispatch_io_set_high_water(readChannel, 512*1024);

    dispatch_io_read(readChannel, 0, SIZE_MAX, MD5ChecksumOperationQueue, ^(bool done, dispatch_data_t data, int error) {
        if (error != 0)
        {
            NSError* e = [NSError errorWithDomain:@"MD5Error"
                                             code:error userInfo:@{
                       NSLocalizedDescriptionKey : @"failed to read from file for calculating MD5."
                          }];
            tcb(nil, e);
            [self closeReadChannel];
            return;
        }

        if (dispatch_data_get_size(data) > 0)
        {
            // Feed each contiguous region of the (possibly discontiguous)
            // data object to the hash without copying it.
            dispatch_data_apply(data, ^bool(dispatch_data_t region, size_t offset,
                                            const void *buffer, size_t size) {
                CC_MD5_Update(&hashObject, buffer, (CC_LONG)size);
                return true;
            });
        }

        if (done)
        {
            // Compute the hash digest
            unsigned char digest[CC_MD5_DIGEST_LENGTH];
            CC_MD5_Final(digest, &hashObject);

            // Compute the string result in a stack buffer, so nothing needs freeing
            char hash[2 * CC_MD5_DIGEST_LENGTH + 1];
            for (size_t i = 0; i < CC_MD5_DIGEST_LENGTH; ++i)
            {
                snprintf(hash + (2 * i), 3, "%02x", (int)(digest[i]));
            }

            tcb(@(hash), nil);

            [self closeReadChannel];
        }
    });
}


@end
Prashant Rane answered Sep 30 '22 18:09