 

How to find duplicate files in an AWS S3 bucket?

Is there a way to recursively find duplicate files in an Amazon S3 bucket? In a normal file system, I would simply use:

fdupes -r /my/directory
Borealis asked Nov 30 '22

2 Answers

There is no "find duplicates" command in Amazon S3.

However, you can do the following:

  • Retrieve a list of objects in the bucket
  • Look for objects that share the same ETag (checksum) and Size

Objects that match on both are extremely likely to be duplicates. One caveat: for objects uploaded via multipart upload, the ETag is not a simple MD5 of the content, so two identical files uploaded with different part sizes can have different ETags.
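The two steps above can be sketched in Python. In practice you would retrieve the object metadata with a paginated `ListObjectsV2` call (e.g. via boto3); here the listing is stubbed with hypothetical sample data so the grouping logic is self-contained.

```python
from collections import defaultdict

def find_duplicates(objects):
    """Group S3 object records by (ETag, Size) and return the keys of
    groups containing more than one object -- the likely duplicates.

    `objects` is an iterable of dicts with 'Key', 'ETag', and 'Size',
    the shape of entries in the 'Contents' of a ListObjectsV2 response.
    """
    groups = defaultdict(list)
    for obj in objects:
        groups[(obj["ETag"], obj["Size"])].append(obj["Key"])
    return [keys for keys in groups.values() if len(keys) > 1]

# Stubbed listing standing in for a real bucket (hypothetical keys):
listing = [
    {"Key": "a/report.pdf",      "ETag": '"d41d8cd9"', "Size": 1024},
    {"Key": "b/report-copy.pdf", "ETag": '"d41d8cd9"', "Size": 1024},
    {"Key": "c/other.pdf",       "ETag": '"ffffffff"', "Size": 2048},
]
print(find_duplicates(listing))  # [['a/report.pdf', 'b/report-copy.pdf']]
```

With a real bucket, you would feed this function the accumulated `Contents` from each page of the listing, since `ListObjectsV2` returns at most 1,000 objects per call.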

John Rotenstein answered Dec 06 '22


Here's a git repository: https://github.com/chilts/node-awssum-scripts, which contains a JavaScript script for finding duplicates in an S3 bucket. I know pointing you to an external source is not ideal, but I hope it helps.

RaviTezu answered Dec 06 '22