Finding duplicate files according to md5 with bash

Question

I want to write a bash script that finds duplicate files.

How can I add a size option?

Gilles Quenot · Accepted Answer

Don't reinvent the wheel; use the proper command:

fdupes -r dir

See http://code.google.com/p/fdupes/ (packaged on some Linux distros)
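As a rough illustration of how this relates to the size part of the question, the sketch below wraps fdupes with its -S (show size) and -n (skip empty files) options. The function name fdupes_with_sizes is hypothetical, and the sketch assumes fdupes is installed and on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: list duplicate files under a directory with fdupes,
# printing the size of each duplicate set and ignoring zero-length files.
fdupes_with_sizes() {
    local dir=${1:-.}
    # Assumption: fdupes may not be installed; fail clearly if it is missing.
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed" >&2
        return 127
    fi
    # -r recurses into subdirectories, -S shows sizes, -n excludes empty files.
    fdupes -r -S -n "$dir"
}
```

Calling fdupes_with_sizes /path/to/dir prints each group of identical files under a line giving their size.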

Alex Atkinson · Answer

find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d |\
xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum |\
sort | uniq -w32 --all-repeated=separate

This is how you'd want to do it. The code locates duplicates by size first, then by MD5 hash, so only files whose size matches at least one other file get hashed. Note the use of -size, which relates to your question. It assumes you want to search the current directory; if not, change find . to point at the directory or directories you'd like to search.
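The same size-then-hash idea can be sketched as a function with a minimum-size argument, which is one possible reading of a "size option". The names find_dupes and min_bytes are hypothetical, GNU find, xargs, md5sum and uniq are assumed, and file names containing newlines or backslashes are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find_dupes [dir] [min_bytes]
# Lists files under dir that share both size and MD5 digest.
find_dupes() {
    local dir=${1:-.} min_bytes=${2:-1} size
    # Treat anything below 1 as 1 so empty files are always skipped.
    if (( min_bytes < 1 )); then min_bytes=1; fi

    # 1. Print every file size and keep only sizes that occur more than once.
    # 2. For each repeated size, emit the NUL-terminated paths of that size.
    # 3. Hash those files and group lines that share the first 32 characters
    #    (the MD5 digest), separating groups with a blank line.
    find "$dir" -type f -size +"$(( min_bytes - 1 ))"c -printf '%s\n' |
        sort -n | uniq -d |
        while read -r size; do
            find "$dir" -type f -size "${size}c" -print0
        done |
        xargs -0 -r md5sum |
        sort | uniq -w32 --all-repeated=separate
}
```

For example, find_dupes ~/Pictures 1048576 would only consider files of at least 1 MiB, which also cuts down the number of files that need hashing.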

Drake Clarris · Answer

find /path/to/folder1 /path/to/folder2 -type f -printf "%f %s\n" | sort | uniq -d

The find command looks in two folders for files and prints each file's name (stripping leading directories) and size; sort and uniq -d then show only the duplicates. This assumes there are no newlines in the file names.
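Because that output contains only the name and size, it does not say where the matching files live. A possible variant that also prints the full paths is sketched below; same_name_and_size is a hypothetical helper, GNU find is assumed, and file names must not contain tabs or newlines. Matching name and size is only a heuristic, so candidates would still need an md5sum check to confirm identical content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same_name_and_size dir [dir...]
# Prints each repeated "name<TAB>size" key followed by the indented paths
# of the files that share it.
same_name_and_size() {
    find "$@" -type f -printf '%f\t%s\t%p\n' |
        sort |
        awk -F'\t' '
            {
                key = $1 FS $2              # base name plus size in bytes
                count[key]++
                paths[key] = paths[key] "  " $3 "\n"
            }
            END {
                for (key in count)
                    if (count[key] > 1)
                        printf "%s\n%s", key, paths[key]
            }'
}
```

Running same_name_and_size /path/to/folder1 /path/to/folder2 lists every name and size pair that appears more than once, with the locations underneath.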

Tags: bash, shell

user2913020
