
Java: need to increase performance of checksum calculation

I'm using the following function to calculate checksums on files:

public static void generateChecksums(String strInputFile, String strCSVFile) {
    ArrayList<String[]> outputList = new ArrayList<String[]>();
    try {
        MessageDigest m = MessageDigest.getInstance("MD5");
        File aFile = new File(strInputFile);
        InputStream is = new FileInputStream(aFile);

        System.out.println(Calendar.getInstance().getTime().toString() +
                    " Processing Checksum: " + strInputFile);

        double dLength = aFile.length();
        try {
            is = new DigestInputStream(is, m);
            // Read the stream to EOF, one byte per call.
            int nTmp;
            double dCount = 0;
            String returned_content = "";
            while ((nTmp = is.read()) != -1) {
                dCount++;
                if (dCount % 600000000 == 0) {
                    System.out.println(". ");
                } else if (dCount % 20000000 == 0) {
                    System.out.print(". ");
                }
            }
        } finally {
            is.close();
        }
        byte[] digest = m.digest();
        BigInteger bigInt = new BigInteger(1, digest);
        String hashtext = bigInt.toString(16);
        // Zero-pad the hex string to get the full 32 characters.
        while (hashtext.length() < 32) {
            hashtext = "0" + hashtext;
        }
        String[] arrayTmp = new String[2];
        arrayTmp[0] = aFile.getName();
        arrayTmp[1] = hashtext;
        outputList.add(arrayTmp);
        System.out.println("Hash Code: " + hashtext);
        UtilityFunctions.createCSV(outputList, strCSVFile, true);
    } catch (NoSuchAlgorithmException nsae) {
        System.out.println(nsae.getMessage());
    } catch (FileNotFoundException fnfe) {
        System.out.println(fnfe.getMessage());
    } catch (IOException ioe) {
        System.out.println(ioe.getMessage());
    }
}

The problem is that the loop to read in the file is really slow:

while ((nTmp = is.read()) != -1) {
    dCount++;
    if (dCount % 600000000 == 0) {
        System.out.println(". ");
    } else if (dCount % 20000000 == 0) {
        System.out.print(". ");
    }
}

A 3 GB file that takes less than a minute to copy from one location to another takes over an hour to checksum. Is there something I can do to speed this up, or should I go in a different direction, such as using a shell command?

Update: Thanks to ratchet freak's suggestion, I changed the code to the following, which is ridiculously faster (I would guess 2048X faster...):

byte[] buff = new byte[2048];
while ((nTmp = is.read(buff)) != -1) {
    // Assumes every read fills the buffer; is.read(buff) may return fewer bytes.
    dCount += 2048;
    if (dCount % 614400000 == 0) {
        System.out.println(". ");
    } else if (dCount % 20480000 == 0) {
        System.out.print(". ");
    }
}
opike asked May 22 '11 23:05


1 Answer

Use a buffer:

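byte[] buff = new byte[2048];
while ((nTmp = is.read(buff)) != -1) {
    dCount += nTmp;
    // This exact-multiple check won't work reliably anymore, because dCount
    // now advances by up to 2048 per iteration instead of by 1.
    if (dCount % 600000000 == 0) {
        System.out.println(". ");
    } else if (dCount % 20000000 == 0) {
        System.out.print(". ");
    }
}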
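One way to keep the progress dots once reads arrive in chunks is to compare the running total against a "next mark" threshold instead of testing for an exact multiple. This is only a sketch: the class name `ProgressSketch`, the helper `readWithProgress`, and the `step` parameter are made up for illustration and are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ProgressSketch {
    // Hypothetical helper: reads the stream to EOF with a buffer, prints one
    // dot per `step` bytes consumed, and returns the number of dots printed.
    static int readWithProgress(InputStream in, int bufferSize, long step) throws IOException {
        byte[] buff = new byte[bufferSize];
        long total = 0;        // bytes read so far
        long nextMark = step;  // byte count at which the next dot is due
        int marks = 0;
        int n;
        while ((n = in.read(buff)) != -1) {
            total += n;        // n may be smaller than the buffer size
            while (total >= nextMark) {
                System.out.print(". ");
                marks++;
                nextMark += step;
            }
        }
        System.out.println();
        return marks;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a file: 10,000 bytes, one dot per 3,000 bytes.
        InputStream in = new ByteArrayInputStream(new byte[10_000]);
        int marks = readWithProgress(in, 2048, 3_000);
        System.out.println("dots printed: " + marks);
    }
}
```

Because the check is `total >= nextMark` rather than `total % step == 0`, it does not matter how many bytes each `read` call happens to return.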

edit: or if you don't need the values do


Never mind, apparently the implementers of DigestInputStream were stupid and didn't test everything properly before release.
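For reference, here is a minimal self-contained sketch of the buffered approach. It swaps in a different technique from the question: instead of wrapping the stream in `DigestInputStream`, it feeds each chunk to `MessageDigest.update` directly. The class name `BufferedChecksumSketch`, the method `md5Hex`, and the 8192-byte buffer size are assumptions chosen for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class BufferedChecksumSketch {
    // Hypothetical helper: returns the MD5 of everything left in the stream
    // as a 32-character lowercase hex string.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buff = new byte[8192];  // buffer size is an arbitrary choice
        int n;
        while ((n = in.read(buff)) != -1) {
            md.update(buff, 0, n);     // hash only the bytes actually read
        }
        // %032x zero-pads the hex form to 32 characters, replacing the manual loop.
        return String.format("%032x", new BigInteger(1, md.digest()));
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java BufferedChecksumSketch <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(md5Hex(in) + "  " + args[0]);
        }
    }
}
```

The try-with-resources block closes the file even if reading fails, which takes the place of the explicit `finally` in the question's code.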

ratchet freak answered Oct 04 '22 14:10
