I want to write a bash script that finds duplicate files.
How can I add a size option?
Don't reinvent the wheel; use the proper command:
fdupes -r dir
See http://code.google.com/p/fdupes/ (packaged on some Linux distros)
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d |\
xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum |\
sort | uniq -w32 --all-repeated=separate
This is how you'd want to do it. This code locates dups based on size first, then on MD5 hash. Note the use of -size, in relation to your question. Enjoy. It assumes you want to search in the current directory. If not, change the find . to be appropriate for the directory(ies) you'd like to search.
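If you want the size option as a parameter, here is a minimal sketch of the same size-then-MD5 approach wrapped in a function. The name find_dups and its two arguments (directory, minimum size in bytes) are my own and not part of the answer above. It assumes GNU find, xargs, md5sum and uniq.

```shell
#!/usr/bin/env bash
# find_dups DIR [MIN_BYTES]   (hypothetical helper, name and arguments are assumptions)
# Prints groups of files with identical MD5 hashes, looking only at
# regular files of at least MIN_BYTES bytes (default 1, i.e. non-empty).
find_dups() {
    local dir=${1:-.} min=${2:-1}
    # -size +Nc means "more than N bytes", so use min-1 to get "at least min".
    find "$dir" -type f -size +"$((min - 1))"c -printf '%s\n' |
        sort -n | uniq -d |                  # keep only sizes that occur more than once
        while read -r size; do
            # list every file that has exactly this size
            find "$dir" -type f -size "${size}c" -print0
        done |
        xargs -0 -r md5sum |                 # hash only the same-size candidates
        sort |
        uniq -w32 --all-repeated=separate    # group lines sharing the 32-char hash
}
```

For example, find_dups ~/Pictures 1048576 would only report duplicates of 1 MiB or more, and find_dups . behaves like the one-liner above.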
find /path/to/folder1 /path/to/folder2 -type f -printf "%f %s\n" | sort | uniq -d
The find command looks in two folders for files, prints only the file name (stripping leading directories) and the size, then sorts the list and shows only the dupes. Note that this compares name and size only, not content. It also assumes there are no newlines in the file names.
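The command above only shows the name and size of each dupe, not where the copies live. As a rough sketch, the variant below also prints the full path of every match. The function name same_name_size is hypothetical. It assumes GNU find and no tabs or newlines in the file names.

```shell
#!/usr/bin/env bash
# same_name_size DIR...   (hypothetical helper, not from the original answer)
# Prints "name<TAB>size<TAB>path" for every file whose name and size
# also occur elsewhere in the given directories. Content is not compared.
same_name_size() {
    find "$@" -type f -printf '%f\t%s\t%p\n' |
        LC_ALL=C sort |   # byte-wise sort keeps equal name+size lines adjacent
        awk -F'\t' '
            { key = $1 FS $2 }
            key == prev { if (!shown) print prevline; print; shown = 1 }
            key != prev { shown = 0 }
            { prev = key; prevline = $0 }'
}
```

Usage would be same_name_size /path/to/folder1 /path/to/folder2, with the same two folders as in the command above.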