Very slow to generate MD5 for large file using Java

I am using Java to generate the MD5 hash for some files. I need to generate one MD5 for several files with a total size of about 1 gigabyte. Here's my code:

private String generateMD5(SequenceInputStream inputStream){
    if(inputStream==null){
        return null;
    }
    MessageDigest md;
    try {
        int read =0;
        byte[] buf = new byte[2048];
        md = MessageDigest.getInstance("MD5");
        while((read = inputStream.read(buf))>0){
            md.update(buf,0,read);
        }
        byte[] hashValue = md.digest();
        return new String(hashValue);
    } catch (NoSuchAlgorithmException e) {
        return null;
    } catch (IOException e) {
        return null;
    }finally{
        try {
            if(inputStream!=null)inputStream.close();
        } catch (IOException e) {
            // ...
        }
    } 

}

This seems to run forever. How can I make it more efficient?

lucky_start_izumi asked Feb 17 '12 01:02



2 Answers

You may want to use the Fast MD5 library. It's much faster than Java's built-in MD5 provider and getting a hash is as simple as:

String hash = MD5.asHex(MD5.getHash(new File(filename)));

Be aware that the slowness may also come from slow file I/O, in which case a faster MD5 implementation will help less.
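
If adding a library is not an option, a JDK-only variant may be worth trying first. The sketch below is an illustration under stated assumptions, not a tuned implementation: the class name `CombinedMd5`, the method `md5Hex` and the 64 KiB buffer size are all hypothetical choices. It feeds several files into one `MessageDigest`, reads in larger chunks than 2048 bytes, and hex-encodes the digest instead of calling `new String` on the raw bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical helper class; the names and the buffer size are illustrative.
class CombinedMd5 {
    // 64 KiB is an assumed starting point, not a measured optimum.
    private static final int BUFFER_SIZE = 64 * 1024;

    /** Returns one MD5 hex digest over the contents of the files, in list order. */
    static String md5Hex(List<Path> files) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[BUFFER_SIZE];
        for (Path file : files) {
            // try-with-resources closes each stream even if read() throws
            try (InputStream in = Files.newInputStream(file)) {
                int read;
                while ((read = in.read(buf)) != -1) {
                    md.update(buf, 0, read);
                }
            }
        }
        // 16 raw digest bytes become 32 zero-padded hex characters.
        return String.format("%032x", new BigInteger(1, md.digest()));
    }
}
```

Whether the larger buffer helps depends on the disk and on the OS file cache, so it should be measured rather than assumed.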

Sticky answered Oct 05 '22 22:10


I rewrote your code using NIO; it looks roughly like this:

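import java.io.FileInputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

private static String generateMD5(FileInputStream inputStream){
    if(inputStream==null){
        return null;
    }
    MessageDigest md;
    try {
        md = MessageDigest.getInstance("MD5");
        FileChannel channel = inputStream.getChannel();
        ByteBuffer buff = ByteBuffer.allocate(2048);
        // read() returns -1 once the end of the file is reached
        while(channel.read(buff) != -1)
        {
            buff.flip();      // switch the buffer from filling to draining
            md.update(buff);  // digest the bytes that were just read
            buff.clear();     // make the buffer ready for the next read
        }
        byte[] hashValue = md.digest();
        // Hex-encode the 16 digest bytes; new String(hashValue) would decode
        // the raw bytes as text and produce an unreadable result.
        return String.format("%032x", new BigInteger(1, hashValue));
    }
    catch (NoSuchAlgorithmException e)
    {
        return null;
    }
    catch (IOException e)
    {
        return null;
    }
    finally
    {
        try {
            // closing the stream also closes its channel
            inputStream.close();
        } catch (IOException e) {
            // nothing useful to do if close() fails
        }
    }
}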

On my machine, it takes about 30 s to generate the MD5 for a large file. I also tested your code, and the results indicate that NIO does not improve the program's performance.

I then timed the I/O and the MD5 computation separately. The measurements show that slow file I/O is the bottleneck: about 5/6 of the time is spent on I/O.
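
A split like this can be estimated by timing the `read` calls and the `update` calls separately. The following is a minimal sketch of that idea, not the code used for the figures above; the class name `Md5Timing` and the method `measure` are hypothetical, and the 2048-byte buffer is kept only to match the code in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only.
class Md5Timing {
    /** Returns {bytesRead, nanosSpentInRead, nanosSpentInUpdate} for one file. */
    static long[] measure(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[2048]; // same buffer size as the code in the question
        long bytes = 0, readNanos = 0, digestNanos = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (true) {
                long t0 = System.nanoTime();
                int read = in.read(buf);
                long t1 = System.nanoTime();
                readNanos += t1 - t0;          // time spent waiting for data
                if (read == -1) break;         // end of file
                md.update(buf, 0, read);
                digestNanos += System.nanoTime() - t1; // time spent hashing
                bytes += read;
            }
        }
        md.digest(); // finish the hash; the value itself is not needed here
        return new long[] { bytes, readNanos, digestNanos };
    }

    public static void main(String[] args) throws Exception {
        long[] r = measure(Paths.get(args[0]));
        System.out.printf("bytes=%d read=%d ms md5=%d ms%n",
                r[0], r[1] / 1_000_000, r[2] / 1_000_000);
    }
}
```

The numbers are only a rough guide: the calls to `System.nanoTime()` add overhead of their own with such a small buffer, and a second run on the same file may be served from the OS cache and show much less I/O time.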

With the Fast MD5 library mentioned by @Sticky, generating the MD5 takes only 15 s, which is a remarkable improvement.

Alanmars answered Oct 05 '22 23:10