 

Finding duplicate files according to md5 with bash

Tags: bash, shell

I want to write a bash script that finds duplicate files by their MD5 hash.

How can I add a size option?

user2913020 asked Oct 23 '13 20:10



3 Answers

Don't reinvent the wheel; use the proper command:

fdupes -r dir

See http://code.google.com/p/fdupes/ (packaged on some Linux distros)

Gilles Quenot answered Oct 25 '22 13:10



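# Sizes shared by several files -> files of those sizes -> group by MD5.
# -n1 dropped: GNU xargs treats -I and -n as mutually exclusive.
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d |
xargs -I{} find . -type f -size {}c -print0 | xargs -0 -r md5sum |
sort | uniq -w32 --all-repeated=separate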

This is how you'd want to do it. The code finds duplicates by size first, then by MD5 hash, so only files that share a size with at least one other file get hashed. Note the use of -size, which relates to your question. It searches the current directory; to search elsewhere, change find . to the directory (or directories) you'd like to search.
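The same size-then-hash idea can be wrapped in a function, as a rough sketch. It assumes GNU find, xargs, md5sum and uniq; the name find_dups and its optional directory argument are made up for illustration and are not part of the answer above.

```shell
#!/usr/bin/env bash
# Sketch: report duplicate files under a directory, comparing size first, then MD5.
# find_dups is a hypothetical name; GNU find, xargs, md5sum and uniq are assumed.
#   Step 1: collect byte sizes that occur more than once among non-empty files.
#   Step 2: for each such size, list the files that have exactly that size.
#   Step 3: hash only those candidates and keep groups with the same MD5.
find_dups() {
    local dir="${1:-.}"
    find "$dir" -type f -not -empty -printf '%s\n' | sort -n | uniq -d |
    while read -r size; do
        find "$dir" -type f -size "${size}c" -print0
    done |
    xargs -0 -r md5sum | sort | uniq -w32 --all-repeated=separate
}
```

Call it as find_dups /some/dir (or with no argument for the current directory). Each blank-line-separated group in the output lists files whose MD5 sums match.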

Alex Atkinson answered Oct 25 '22 13:10



find /path/to/folder1 /path/to/folder2 -type f -printf "%f %s\n" | sort | uniq -d

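The find command searches two folders for files and prints each file's name only (stripping leading directories) along with its size; sort and uniq -d then show only the entries that occur more than once. It compares name and size only, not content, and it assumes there are no newlines in the file names.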
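Since that output shows only the name and size, a possible extension is to print the full paths of each match as well. This is only a sketch: same_name_size is a hypothetical helper name, it assumes GNU find, and it additionally assumes file names contain no tabs or newlines.

```shell
#!/usr/bin/env bash
# Sketch: list full paths of files that share both base name and size.
# same_name_size is a hypothetical name; GNU find is assumed, and file names
# are assumed to contain no tabs or newlines.
#   find emits "name size<TAB>path"; sort groups equal "name size" keys;
#   awk prints the paths of every key that occurs more than once.
same_name_size() {
    find "$@" -type f -printf '%f %s\t%p\n' |
    LC_ALL=C sort -t $'\t' -k1,1 |
    awk -F'\t' '
        $1 != key { if (n > 1) print buf "\n"; key = $1; buf = ""; n = 0 }
        { buf = buf (n ? "\n" : "") $2; n++ }
        END { if (n > 1) print buf }
    '
}
```

For example, same_name_size /path/to/folder1 /path/to/folder2 prints one group of paths per matching name and size. A match here still says nothing about content, so the candidates would need a hash comparison to confirm they are true duplicates.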

Drake Clarris answered Oct 25 '22 13:10
