I have a few (38000) picture/video files in a folder. Approximately 40% of these are duplicates which I'm trying to get rid of. My question is, how can I tell if 2 files are identical? So far I tried to use a SHA1 of the files but it turns out that many duplicates files had different hashes. This is the code I was using:
public static String getHash(File doc) {
MessageDigest md = null;
try {
md = MessageDigest.getInstance("SHA1");
FileInputStream inStream = new FileInputStream(doc);
DigestInputStream dis = new DigestInputStream(inStream, md);
BufferedInputStream bis = new BufferedInputStream(dis);
while (true) {
int b = bis.read();
if (b == -1)
break;
}
inStream.close();
dis.close();
bis.close();
} catch (NoSuchAlgorithmException | IOException e) {
e.printStackTrace();
}
BigInteger bi = new BigInteger(md.digest());
return bi.toString(16);
}
Can I modify this in any way? Or will I have to use a different method?
Perceptual Hash It's fast to compute and lookup is as fast as with a file hash. The possibility to calculate a distance between two perceptual hashes allows to detect not only identical images, but also close matches with tiny changes.
Does Windows 10 have a built-in duplicate file finder app? No, Windows 10 doesn't have a built-in file finder. But, you can do this manually through the Windows photos app. You can also download duplicate file remover and run it.
AllDup for Windows The “Comparison Method” section also lets you decide whether AllDup will look for duplicates within the same folder (for when you've taken a lot similar photos at once) or only between different folders (for photos you might've stashed in multiple places).
As outlined above duplicate detection can be based on a hash. However, if you want to have near duplicate detection, which means that you are searching for images that basically show the same things, but have been scaled, rotated, etc. you might need a content based image retrieval approach. There's LIRE (https://code.google.com/p/lire/), a Java library for that, and you'll find the "SimpleApplication" in the Download section. What you then can do is to
Students of mine did it, it worked well, but I don't have the source code at hand. But rest assured, it's just a few lines and the simple application will get you started.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With