
How to find duplicate filenames (recursively) in a given directory? BASH

I need to find all duplicate filenames in a given directory tree. I don't know which directory tree the user will pass as the script argument, so I don't know the directory hierarchy. I tried this:

#!/bin/sh
find -type f | while IFS= read vo
do
echo `basename "$vo"`
done

but that's not really what I want. It finds only one duplicate and then stops, even if there are more duplicate filenames. It also doesn't print the whole path (only the filename) or the duplicate count. I wanted to do something similar to this command:

find DIRNAME | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 " 

but it doesn't work for me and I don't know why. Even if I have duplicates, it prints nothing. I use Xubuntu 12.04.

asked Apr 29 '13 10:04 by yak




2 Answers

Here is another solution (based on the suggestion by @jim-mcnamara) without awk:

Solution 1

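#!/bin/sh
dirname=/path/to/directory
# Collect the base names that occur more than once, then list every path containing each one.
find "$dirname" -type f | sed 's_.*/__' | sort | uniq -d |
while IFS= read -r fileName
do
    # -F treats the name as a fixed string, not a regular expression.
    # Note: this is a substring match, so longer names starting with the same text also match.
    find "$dirname" -type f | grep -F -- "/$fileName"
done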

However, this repeats the same search: once to collect the names and once more for every duplicated name. This can become very slow if you have to search a lot of data. Saving the "find" results in a temporary file might give better performance.

Solution 2 (with temporary file)

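#!/bin/sh
dirname=/path/to/directory
tempfile=myTempfileName
# Run find once and keep the result for both passes.
find "$dirname" -type f > "$tempfile"
sed 's_.*/__' "$tempfile" | sort | uniq -d |
while IFS= read -r fileName
do
    # -F treats the name as a fixed string, not a regular expression.
    grep -F -- "/$fileName" "$tempfile"
done
#rm -f "$tempfile"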

Since you might not want to write a temp file to the hard drive in some cases, you can choose the method that fits your needs. Both examples print the full path of each file.

Bonus question here: Is it possible to save the whole output of the find command as a list to a variable?
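On the bonus question: a minimal sketch, assuming no path contains a newline. Command substitution captures the whole output of find as one newline-separated string, which can stand in for the temporary file. The function name list_dupes and the variable fileList are made up for illustration, and the inner loop swaps grep for a case pattern so that only exact base-name matches are printed.

```shell
#!/bin/sh
# Hypothetical helper: print every path under $1 whose base name occurs more than once.
# Assumes no path contains a newline character.
list_dupes() {
    # Run find once and keep its whole output in a variable.
    fileList=$(find "$1" -type f)
    # Strip the directory part, then keep only names that appear more than once.
    printf '%s\n' "$fileList" | sed 's_.*/__' | sort | uniq -d |
    while IFS= read -r fileName
    do
        # Re-read the saved list and print paths whose last component is exactly $fileName.
        printf '%s\n' "$fileList" |
        while IFS= read -r path
        do
            case $path in
                */"$fileName") printf '%s\n' "$path" ;;
            esac
        done
    done
}
```

Calling list_dupes /path/to/directory prints one full path per line. The list is scanned once per duplicated name, so for very large trees the temporary-file version may still be the more practical choice.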

answered Oct 22 '22 18:10 by psibar


Yes, this is a really old question. But all those loops and temporary files seem a bit cumbersome.

Here's my 1-line answer:

find /PATH/TO/FILES -type f -printf '%p %f\n' | sort -k2 | uniq -f1 --all-repeated=separate

It has its limitations due to uniq and sort:

  • no whitespace (space, tab) in any file or directory name (uniq and sort will interpret it as a new field)
  • the file name has to be printed as the last, space-delimited field (uniq cannot compare just one field and is inflexible about field delimiters)

But it is quite flexible regarding its output thanks to find -printf, and it works well for me. It also seems to be what @yak tried to achieve originally.
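If whitespace in names is a concern, one possible workaround (not part of the one-liner above, and it does bring awk back in) is to print the base name first and use "/" as the field separator, since a base name can never contain a slash. This is only a sketch: it assumes GNU find for -printf and no newlines in paths, and the function name dupes_ws_safe is made up.

```shell
#!/bin/sh
# Hypothetical helper: list paths under $1 that share a base name, tolerating spaces and tabs.
# Assumes GNU find (-printf) and that no path contains a newline.
dupes_ws_safe() {
    # Emit "basename/fullpath"; the first "/" always ends the base name.
    find "$1" -type f -printf '%f/%p\n' |
    # Byte-wise sort on the first "/"-separated field groups equal base names together.
    LC_ALL=C sort -t/ -k1,1 |
    awk -F/ '
        # Appending "" forces a string comparison, so "1" and "1.0" stay distinct.
        ($1 "") == (prev "") {
            if (held != "") { print held; held = "" }   # first member of the group
            print substr($0, length($1) + 2)            # current member
            next
        }
        # New base name: remember its path until a second occurrence shows up.
        { prev = $1; held = substr($0, length($1) + 2) }
    '
}
```

dupes_ws_safe /PATH/TO/FILES prints only the full paths, one per line, with groups adjacent but not separated by blank lines.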

Demonstrating some of the options you have with this:

find  /PATH/TO/FILES -type f -printf 'size: %s bytes, modified at: %t, path: %h/, file name: %f\n' | sort -k15 | uniq -f14 --all-repeated=prepend

There are also options in sort (-f) and uniq (-i) to ignore case, as the original poster intended to achieve by piping through tr. Look them up using man uniq or man sort.
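For instance, a case-insensitive variant of the one-liner might look like the sketch below. It assumes GNU find and GNU uniq (for -printf and --all-repeated) and has the same whitespace limitation as before; the wrapper name dupes_ignore_case is made up.

```shell
#!/bin/sh
# Hypothetical wrapper: group files under $1 whose names differ only in letter case.
# Assumes GNU find and GNU coreutils; names must not contain whitespace.
dupes_ignore_case() {
    # Print "fullpath basename" so the base name is the last field.
    find "$1" -type f -printf '%p %f\n' |
    # -f folds case while sorting on field 2, so Foo.txt and foo.txt end up adjacent.
    sort -f -k2 |
    # -i ignores case, -f1 skips the path field, and groups are separated by a blank line.
    uniq -i -f1 --all-repeated=separate
}
```

With this, dupes_ignore_case /PATH/TO/FILES would report Foo.txt and foo.txt as one group even though the plain one-liner treats them as different names.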

answered Oct 22 '22 20:10 by trs