Is there a way to recursively find duplicate files in an Amazon S3 bucket? In a normal file system, I would simply use:
fdupes -r /my/directory
There is no "find duplicates" command in Amazon S3.
However, you can do the following: list all objects in the bucket and compare each object's ETag (checksum) and Size. Any objects that share both values are (extremely likely) duplicates.
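As a minimal sketch of that approach (assuming a hypothetical bucket name of "my-bucket" and default AWS credentials), a short boto3 script can list every object, group keys by (ETag, Size), and print the groups containing more than one key:

```python
from collections import defaultdict

import boto3


def find_duplicate_objects(bucket_name):
    """Group object keys by (ETag, Size) and return the likely duplicates."""
    s3 = boto3.client("s3")
    groups = defaultdict(list)

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            # Objects sharing both checksum and size are almost certainly identical.
            groups[(obj["ETag"], obj["Size"])].append(obj["Key"])

    return {key: keys for key, keys in groups.items() if len(keys) > 1}


if __name__ == "__main__":
    for (etag, size), keys in find_duplicate_objects("my-bucket").items():
        print(f"ETag {etag}, {size} bytes:")
        for key in keys:
            print(f"  {key}")
```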
Here's a git repository: https://github.com/chilts/node-awssum-scripts which has a JS script to find duplicates in an S3 bucket. I know pointing you to an external source is not recommended, but I hope it helps.