Is there a reliable way to determine whether or not two files are the same? For example, two files with the same size and type may or may not be the same binarilly (yeah, I know it's not really a word). I assume that comparing one or two checksums of the files will help, but I wonder:
Any ideas, suggestions or thoughts are appreciated!
P.S. The code for this is being written in Java running on a *nix system, but generic or platform-agnostic input is most helpful.
It's impossible to know with certainty whether or not two files are the same unless you compare them byte for byte. It's similar to how you can't guarantee that a collection does or doesn't contain a given object unless you check every item in the collection.
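For what it's worth, a byte-for-byte check is not much code in Java. The following is only a minimal sketch: the class name `FileCompare` and the method name `sameContent` are made up for illustration, and it assumes both paths point to regular, readable files.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative, not from any library.
class FileCompare {
    /** Returns true only if both files have the same length and the same bytes. */
    static boolean sameContent(Path a, Path b) throws IOException {
        // Files of different sizes can never match, so skip reading them.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int byteA;
            while ((byteA = inA.read()) != -1) {
                // Stop at the first differing byte.
                if (byteA != inB.read()) {
                    return false;
                }
            }
            // inA is exhausted; the files match only if inB is exhausted too.
            return inB.read() == -1;
        }
    }
}
```

The size check up front means most non-matching files are rejected without reading any data. If you are on Java 12 or later, `Files.mismatch(a, b)` does the same job and returns `-1` when the contents are identical.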
A checksum is basically a hash. Whether it's good enough for your purposes depends on how mission-critical your app is. It's certainly possible to create a hash function with a low risk of collision; after all, passwords are hashed, even in situations where they protect sensitive data and you wouldn't want to have a second valid password on your account. Unless you're writing code for, say, a bank, a strong checksum algorithm should provide a very good approximation.
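If you go the checksum route, the JDK's `java.security.MessageDigest` covers it. Here is a rough sketch, assuming you pass in a standard algorithm name such as `"MD5"`, `"SHA-1"` or `"SHA-256"`; the class name `FileDigest` and the method names `digest` and `probablySame` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the names are illustrative, not from any library.
class FileDigest {
    /** Streams the file through the named digest algorithm and returns the raw hash bytes. */
    static byte[] digest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // Feed the file to the digest in chunks so large files are never fully in memory.
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        return md.digest();
    }

    /** Equal digests mean "almost certainly the same"; unequal digests mean "definitely different". */
    static boolean probablySame(Path a, Path b, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(digest(a, algorithm), digest(b, algorithm));
    }
}
```

Note that hashing reads both files in full, so for a one-off comparison of two files it is no cheaper than comparing bytes directly. It pays off when you store the digests and compare many files against each other later.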
Using multiple checksums will increase reliability if and only if the different checksum algorithms use dissimilar hash functions.
Your third question has already been taken care of by leonbloy's answer; MD5 and SHA-1 are common.
1) Very reliable
2) Not theoretically
3) SHA-1