My question concerns an idea I had: checking whether an image has already been uploaded by comparing the base64-encoded strings of the images.
An example use case would be finding duplicates in your database.
The operation would be pretty expensive, I guess: first converting each image to base64 and then using something like strcmp() to compare the strings.
I'm not sure whether this makes much sense, but what do you think of the idea?
Would it be too big of an operation? How accurate would it be? Does the idea make any sense?
Here's a function that can help you compare files faster.
Aside from checking something obvious like file size, you can experiment with comparing binary chunks.
For example, check the last n bytes as well as a chunk at a random offset.
I used checksum comparison as a last resort, because it has to read both files in full.
When optimizing the order of the checks, you can also take into account whether you generally expect the files to differ or not.
function areEqual($firstPath, $secondPath, $chunkSize = 500){
    // Cheapest check first: files of different sizes cannot be equal
    if(filesize($firstPath) !== filesize($secondPath)){
        return false;
    }

    // Compare the first $chunkSize bytes
    // This is fast, and different binary files will most likely differ here
    // 'rb' opens in binary mode so the bytes are read unchanged on every platform
    $fp1 = fopen($firstPath, 'rb');
    $fp2 = fopen($secondPath, 'rb');
    // Use === so PHP never treats numeric-looking strings as numbers
    $chunksAreEqual = fread($fp1, $chunkSize) === fread($fp2, $chunkSize);
    fclose($fp1);
    fclose($fp2);

    if(!$chunksAreEqual){
        return false;
    }

    // Compare hashes as a last resort, since this reads both files in full
    // Strict comparison avoids "0e..." hashes being compared as numbers
    return sha1_file($firstPath) === sha1_file($secondPath);
}
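The answer also suggests checking the last n bytes and a chunk at a random offset, which the function above does not do. Here is a rough sketch of how that could look. The helper name chunksMatch() and its parameters are my own invention for illustration, not part of the original answer, and it assumes PHP 7 or later (for random_int() and type declarations).

```php
<?php
// Hypothetical helper (not from the original answer): compares the last
// $chunkSize bytes and one chunk at a random offset of two files.
function chunksMatch(string $firstPath, string $secondPath, int $chunkSize = 500): bool
{
    // filesize() results are cached per path, so drop stale entries first
    clearstatcache();

    $size = filesize($firstPath);
    if ($size === false || $size !== filesize($secondPath)) {
        return false; // unreadable, or different sizes
    }
    if ($size === 0) {
        return true; // two empty files are trivially equal
    }

    $fp1 = fopen($firstPath, 'rb');
    $fp2 = fopen($secondPath, 'rb');
    if ($fp1 === false || $fp2 === false) {
        if ($fp1 !== false) { fclose($fp1); }
        if ($fp2 !== false) { fclose($fp2); }
        return false;
    }

    // Largest offset that still leaves a full chunk (0 for small files)
    $maxOffset = max(0, $size - $chunkSize);
    // First the tail of the file, then one randomly chosen position
    $offsets = [$maxOffset, random_int(0, $maxOffset)];

    $equal = true;
    foreach ($offsets as $offset) {
        fseek($fp1, $offset);
        fseek($fp2, $offset);
        if (fread($fp1, $chunkSize) !== fread($fp2, $chunkSize)) {
            $equal = false;
            break;
        }
    }

    fclose($fp1);
    fclose($fp2);
    return $equal;
}
```

A true result here only means that the sampled chunks match, so it should be used as a cheap pre-filter before the checksum step, not as a replacement for it.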