I was working with our quality team yesterday on some formal testing. Part of their procedure was to verify that all files on the test machine had been pulled from the release. They did this by checking each file's size and the date/time stamp shown in Windows Explorer. These happened to be off for an unrelated reason, which I was able to track down.
Is this a valid way to verify that two files are the same? I didn't think so and started to argue, but I'm fairly junior here, so I thought I shouldn't push it too far. I wanted to argue that they should do a binary compare of the files to verify that the contents match exactly. In my experience, time/date stamps and size attributes don't always behave as expected. Any thoughts?
The only 100% reliable way to determine whether two files are equal is to do a binary comparison of the two.
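As a minimal sketch of such a binary comparison, assuming both files are reachable from the same machine, something like the following would do. The function name `files_identical` and the chunk size are illustrative choices, not part of any existing procedure.

```python
import os


def files_identical(path_a, path_b, chunk_size=64 * 1024):
    """Return True only if both files contain exactly the same bytes."""
    # Files of different sizes can never be identical, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing chunk settles it
            if not chunk_a:
                return True  # both files ended without a difference
```

Python's standard library offers the same kind of check as `filecmp.cmp(path_a, path_b, shallow=False)`; the hand-written version just makes the byte-for-byte logic visible.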
If you can live with the risk of false positives (i.e. two files that aren't 100% identical but your code says they are), then digest and checksum algorithms can be used to reduce the work, particularly if the files live on two different machines connected by less than optimal bandwidth, so that a binary comparison is infeasible.
All digest and checksum algorithms have some chance of a false positive, but the exact chance varies with the algorithm. As a general rule, the closer the algorithm is to a cryptographic hash and the more bits it outputs, the lower the chance of a false positive.
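A minimal sketch of the digest approach, using Python's `hashlib`: each machine computes the digest of its own copy, and only the short hex strings need to be compared. The helper name `file_digest` and the choice of SHA-256 as the default are assumptions made for illustration.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=64 * 1024):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files whose digests differ are definitely different. Two files whose digests match are the same with very high probability, not with certainty.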
Even CRC-32 is fairly good for this, and code examples that implement it are easy to find online.
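For instance, Python ships a CRC-32 implementation as `zlib.crc32`, so a rough sketch needs no third-party code. The helper name `file_crc32` is made up for this example.

```python
import zlib


def file_crc32(path, chunk_size=64 * 1024):
    """Return the CRC-32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Passing the running value continues the checksum across chunks.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

With only 32 bits of output, accidental collisions are far more likely than with a longer cryptographic digest, which matches the general rule above.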
If you only do a size/timestamp comparison, then I'm sorry to say that it is easy to circumvent and gives you little certainty that the files are the same or different.
It depends, though: if you know that in your environment timestamps are preserved and only change when a file is modified, then you can use them; otherwise they offer no guarantee.
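To illustrate how little a matching size and timestamp proves, here is a small sketch. Both function names are hypothetical: `same_size_and_mtime` mimics the size/timestamp check, and `copy_timestamps` shows that modification times can simply be set to any value.

```python
import os


def same_size_and_mtime(path_a, path_b):
    """Mimic a size/timestamp check: compare size and modification time only."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    return (stat_a.st_size == stat_b.st_size
            and stat_a.st_mtime_ns == stat_b.st_mtime_ns)


def copy_timestamps(source, target):
    """Give target the same access and modification times as source."""
    stat_src = os.stat(source)
    os.utime(target, ns=(stat_src.st_atime_ns, stat_src.st_mtime_ns))
```

If two files have the same length but different contents, calling `copy_timestamps` on them is enough to make `same_size_and_mtime` return `True`, even though a byte-for-byte comparison would report a difference.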