I'm trying to keep track of a set of files, which may have the same name and metadata. I'd like to use a hash to differentiate them and use it as a unique ID, but I'm not sure which one to use. The files are relatively small (in the 100 KB range) and I'd like to be able to hash each one in less than 10 seconds. Which hash (that comes built in with Java 1.5) would best suit my needs?
Note that a hash of this sort can never be guaranteed to be unique, though with an effective one you stand a very good chance of never having a collision.
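To put a rough number on "a very good chance", here is a minimal sketch of the usual birthday-bound approximation, p ≈ n^2 / 2^(bits + 1), for n files and a hash of the given width. It assumes the hash behaves like a uniform random function and that nobody is deliberately crafting collisions; the class and method names are made up for illustration.

```java
// Rough birthday-bound estimate of an accidental collision among fileCount
// files: p is approximately fileCount^2 / 2^(hashBits + 1).
// Assumption: hash outputs are uniformly distributed (no attacker involved).
// The approximation is only meaningful while the result is far below 1.
final class CollisionEstimate {
    static double approxCollisionProbability(double fileCount, int hashBits) {
        return fileCount * fileCount / Math.pow(2.0, hashBits + 1);
    }

    public static void main(String[] args) {
        // One million files with a 128-bit hash (MD5-sized)
        // and with a 160-bit hash (SHA-1-sized).
        System.out.println(approxCollisionProbability(1e6, 128));
        System.out.println(approxCollisionProbability(1e6, 160));
    }
}
```

Even for a million files the estimate is vanishingly small for either width, which is why accidental collisions are rarely the deciding factor between MD5 and SHA-1.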
If you are not concerned with security (i.e. someone deliberately trying to break your hashing) then simply using the MD5 hash will give you an excellent hash with minimal effort.
It is likely that you could compute a SHA hash of 100 KB in well under 10 seconds, though, and although SHA-1 also has known weaknesses, it is stronger than MD5.
MessageDigest will get you an implementation of either.
Here are some examples of using it with streams.
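As one possible way of combining MessageDigest with streams, the sketch below wraps the input in java.security.DigestInputStream, which updates the digest as bytes are read. The class name StreamDigest, the helper name digest and the 8 KB buffer are my own choices for illustration, not something taken from the linked examples.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamDigest {
    // Reads the stream to the end and returns the digest of everything read.
    // The caller remains responsible for closing the stream.
    static byte[] digest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        DigestInputStream dis = new DigestInputStream(in, md);
        byte[] buffer = new byte[8192]; // arbitrary size, tune as needed
        while (dis.read(buffer) != -1) {
            // Nothing to do here: the wrapper feeds each chunk into md.
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes("UTF-8");
        byte[] sha1 = digest(new ByteArrayInputStream(data), "SHA-1");
        System.out.println(sha1.length + " bytes"); // SHA-1 digests are 20 bytes long
    }
}
```

For a file, you would pass a FileInputStream instead of the ByteArrayInputStream and close it in a finally block.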
Also I should note that this excellent answer from jarnbjo indicates that even the SHA implementations supplied with Java are well capable of exceeding 20 MB/s, even on relatively modest x86 hardware. At 20 MB/s, 100 KB takes about 5 ms, so you can expect performance on the order of 5-10 milliseconds for 100 KB of (in-memory) input data, and your target of under 10 seconds is a massive overestimate of the effort involved. It is likely you will be entirely limited by the rate at which you can read the files from disk rather than by any hashing algorithm you use.
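If you want to check this on your own machine, a crude timing sketch along these lines should do. It hashes a 100 KB in-memory buffer, so disk I/O is deliberately excluded; the class name HashTiming and the warm-up count are arbitrary, and a single System.nanoTime measurement is only a ballpark figure, not a proper benchmark.

```java
import java.security.MessageDigest;

final class HashTiming {
    // Returns the elapsed nanoseconds for one digest pass over the given bytes.
    static long timeDigestNanos(String algorithm, byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        long start = System.nanoTime();
        md.update(data);
        md.digest();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[100 * 1024]; // 100 KB of zeros
        // Warm up so the JIT has a chance to compile the hashing code.
        for (int i = 0; i < 200; i++) {
            timeDigestNanos("SHA-1", data);
            timeDigestNanos("MD5", data);
        }
        String[] algorithms = { "MD5", "SHA-1" };
        for (String algorithm : algorithms) {
            long nanos = timeDigestNanos(algorithm, data);
            System.out.println(algorithm + ": " + (nanos / 1000) + " microseconds");
        }
    }
}
```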
If you have a need for strong crypto hashing you should indicate this in the question. Even then, a SHA variant stronger than SHA-1 (for example SHA-256) is still likely to be your best bet, unless you wish to use an external library like Bouncy Castle, since you should never try to roll your own crypto if a well-established implementation exists.
For some reasonably efficient sample code I suggest this how-to, the salient points of which can be distilled into the following (tune the buffer size as you see fit):
import java.io.*;
import java.security.MessageDigest;

public class Checksum
{
    // Algorithm name passed to MessageDigest.getInstance
    private static final String ALGORITHM = "SHA-1"; // or "MD5" etc.

    public static byte[] createChecksum(String filename) throws Exception
    {
        InputStream fis = new FileInputStream(filename);
        try
        {
            byte[] buffer = new byte[1024];
            MessageDigest complete = MessageDigest.getInstance(ALGORITHM);
            int numRead;
            do
            {
                // Feed each chunk read from the file into the digest
                numRead = fis.read(buffer);
                if (numRead > 0)
                {
                    complete.update(buffer, 0, numRead);
                }
            } while (numRead != -1);
            return complete.digest();
        }
        finally
        {
            // Always release the file handle, even if reading fails
            fis.close();
        }
    }
}
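Since the goal in the question is a unique ID, the raw byte[] usually needs turning into something usable as a key. Below is a minimal sketch that hex-encodes a digest and uses it as a map key to spot files with identical content. The names ContentId, toHex and idOf are hypothetical, and it hashes in-memory bytes to stay self-contained; in practice you would pass the result of a method like createChecksum to toHex.

```java
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

final class ContentId {
    // Hex-encodes a digest so it can serve as a String key.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // SHA-1 of the given bytes, as a 40-character lowercase hex string.
    static String idOf(byte[] content) throws Exception {
        return toHex(MessageDigest.getInstance("SHA-1").digest(content));
    }

    public static void main(String[] args) throws Exception {
        String[] names = { "a.txt", "b.txt", "c.txt" }; // made-up file names
        byte[][] contents = {
            "same".getBytes("UTF-8"), "same".getBytes("UTF-8"), "other".getBytes("UTF-8")
        };
        Map<String, String> seen = new HashMap<String, String>();
        for (int i = 0; i < names.length; i++) {
            // put returns the name previously stored under this ID, if any.
            String previous = seen.put(idOf(contents[i]), names[i]);
            if (previous != null) {
                System.out.println(names[i] + " has the same content hash as " + previous);
            }
        }
    }
}
```

A String key is used here because byte[] does not implement content-based equals and hashCode, so raw digests do not work as HashMap keys.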
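You could use MessageDigest with SHA-1:
MessageDigest messageDigest = MessageDigest.getInstance("SHA-1");
// Buffer the stream, since reading an unbuffered file one byte at a time is slow
InputStream inputStream = new BufferedInputStream(new FileInputStream(aFile));
try {
    int res;
    while ((res = inputStream.read()) != -1) {
        messageDigest.update((byte) res);
    }
} finally {
    inputStream.close();
}
byte[] digest = messageDigest.digest();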