 

How to find duplicate files in an AWS S3 bucket?

Is there a way to recursively find duplicate files in an Amazon S3 bucket? In a normal file system, I would simply use:

fdupes -r /my/directory
Borealis asked Nov 30 '22

2 Answers

There is no "find duplicates" command in Amazon S3.

However, you can do the following:

  • Retrieve a list of objects in the bucket
  • Look for objects that share the same ETag (checksum) and Size

Objects that match on both are extremely likely to be duplicates. One caveat: for objects uploaded via multipart upload, the ETag is not a simple MD5 of the content, so two identical files uploaded with different part sizes can have different ETags.
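The two steps above can be sketched in Python. In practice you would retrieve the object metadata with a paginated `ListObjectsV2` call (e.g. via boto3); here the listing is stubbed with hypothetical sample data so the grouping logic is self-contained.

```python
from collections import defaultdict

def find_duplicates(objects):
    """Group S3 object records by (ETag, Size) and return the keys of
    groups containing more than one object -- the likely duplicates.

    `objects` is an iterable of dicts with 'Key', 'ETag', and 'Size',
    the shape of entries in the 'Contents' of a ListObjectsV2 response.
    """
    groups = defaultdict(list)
    for obj in objects:
        groups[(obj["ETag"], obj["Size"])].append(obj["Key"])
    return [keys for keys in groups.values() if len(keys) > 1]

# Stubbed listing standing in for a real bucket (hypothetical keys):
listing = [
    {"Key": "a/report.pdf",      "ETag": '"d41d8cd9"', "Size": 1024},
    {"Key": "b/report-copy.pdf", "ETag": '"d41d8cd9"', "Size": 1024},
    {"Key": "c/other.pdf",       "ETag": '"ffffffff"', "Size": 2048},
]
print(find_duplicates(listing))  # [['a/report.pdf', 'b/report-copy.pdf']]
```

With a real bucket, you would feed this function the accumulated `Contents` from each page of the listing, since `ListObjectsV2` returns at most 1,000 objects per call.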

John Rotenstein answered Dec 06 '22


Here's a git repository: https://github.com/chilts/node-awssum-scripts, which contains a JavaScript script for finding duplicates in an S3 bucket. I know pointing you to an external source is not ideal, but I hope it helps.

RaviTezu answered Dec 06 '22