How to find duplicate filenames (recursively) in a given directory? BASH

Tags: bash, filenames, duplicates

I need to find all duplicate filenames in a given directory tree. I don't know which directory tree the user will pass as a script argument, so I don't know the directory hierarchy. I tried this:

#!/bin/sh
# print the basename of every regular file below the current directory
find . -type f | while IFS= read -r vo
do
    basename "$vo"
done

but that's not really what I want. It finds only one duplicate and then stops, even if there are more duplicate filenames. Also, it doesn't print the whole path (only the filename) or the duplicate count. I wanted to do something similar to this command:

find DIRNAME | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "

but it doesn't work for me and I don't know why: even when there are duplicates, it prints nothing. I use Xubuntu 12.04.
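The pipeline above prints nothing because find prints full paths (DIRNAME/sub/file), and full paths are all distinct: uniq -c counts every line exactly once, so grep -v " 1 " throws everything away. The counting has to be done on the basename alone. Below is a minimal sketch assuming GNU/Linux tools; dup_name_counts is a hypothetical helper name, and -type f is added so directory names are not counted.

```sh
#!/bin/sh
# Count duplicate basenames below directory $1, ignoring case.
# dup_name_counts is a hypothetical name used for illustration.
dup_name_counts() {
    find "$1" -type f |
        sed 's_.*/__' |                # keep only the basename
        tr '[:upper:]' '[:lower:]' |   # ignore case, as the tr in the question intended
        sort |
        uniq -c |                      # prefix each distinct name with its count
        grep -v '^ *1 '                # anchored: drop names that occur only once
}
```

Calling dup_name_counts "$1" from a script prints one line per duplicated name, prefixed by its count. The directory part is gone at that point, which is why the answers below look the full paths up again.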


asked Apr 29 '13 10:04

yak

2 Answers

Here is another solution (based on the suggestion by @jim-mcnamara) without awk:

Solution 1

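#!/bin/sh
dirname=/path/to/directory
# print each basename that occurs more than once
find "$dirname" -type f | sed 's_.*/__' | sort | uniq -d |
while IFS= read -r fileName
do
    # search again and print every path ending in "/<name>"
    # (regex metacharacters such as "." in the name are not escaped)
    find "$dirname" -type f | grep "/$fileName\$"
done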

However, the directory tree is searched more than once: one initial find, plus one more for every duplicated name. This can become very slow on large trees. Saving the find results in a temporary file should perform better.

Solution 2 (with temporary file)

#!/bin/sh
dirname=/path/to/directory
tempfile=$(mktemp)    # unique temporary file instead of a fixed name
find "$dirname" -type f > "$tempfile"
sed 's_.*/__' "$tempfile" | sort | uniq -d |
while IFS= read -r fileName
do
    # print every stored path ending in "/<name>"
    grep "/$fileName\$" "$tempfile"
done
rm -f "$tempfile"

Since you might not want to write a temporary file to disk in some cases, choose the method that fits your needs. Both examples print the full path of each duplicate file.

Bonus question: is it possible to save the whole output of the find command as a list in a variable?
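Command substitution can hold the whole find output in one variable, one path per line (trailing newlines are stripped). The sketch below is Solution 2 with the temporary file replaced by that variable; it assumes no newlines in file names, and list_dup_paths is a hypothetical helper name.

```sh
#!/bin/sh
# Solution 2 without a temporary file: keep the find output in $files.
# list_dup_paths is a hypothetical name; the directory is its first argument.
list_dup_paths() {
    files=$(find "$1" -type f)     # whole list, one path per line
    printf '%s\n' "$files" | sed 's_.*/__' | sort | uniq -d |
    while IFS= read -r fileName
    do
        # paths ending in "/<name>"; regex metacharacters are not escaped
        printf '%s\n' "$files" | grep "/$fileName\$"
    done
}
```

The tree is still traversed only once; the trade-off is that the whole list lives in memory instead of on disk.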


answered Oct 22 '22 18:10

psibar

Yes, this is a really old question, but all those loops and temporary files seem a bit cumbersome.

Here's my 1-line answer:

find /PATH/TO/FILES -type f -printf '%p/ %f\n' | sort -k2 | uniq -f1 --all-repeated=separate

It has its limitations due to uniq and sort:

no whitespace (space, tab) anywhere in the path or filename (uniq and sort would treat it as a field separator)
the filename must be printed as the last space-delimited field (uniq can skip leading fields with -f but cannot compare just one field, and it has no option to choose the field delimiter)

But it is quite flexible regarding its output thanks to find -printf, and it works well for me. It also seems to be what @yak tried to achieve originally.
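If the whitespace limitation listed above is a problem, a different technique (not the one from this answer) avoids field splitting entirely: read NUL-delimited paths from find -print0 and group them by basename in bash associative arrays. This is a sketch assuming bash 4 or newer; find_dup_basenames is a hypothetical name.

```bash
#!/bin/bash
# Whitespace-safe grouping of files by basename (needs bash 4+).
# find_dup_basenames is a hypothetical name; the directory is its first argument.
find_dup_basenames() {
    local path name
    local -A count=() paths=()
    while IFS= read -r -d '' path; do
        name=${path##*/}                           # basename without a subprocess
        count[$name]=$(( ${count[$name]:-0} + 1 ))
        paths[$name]+="  $path"$'\n'               # remember every path for this name
    done < <(find "${1:-.}" -type f -print0)
    for name in "${!count[@]}"; do                 # iteration order is unspecified
        if [ "${count[$name]}" -gt 1 ]; then
            printf '%s (%d)\n%s' "$name" "${count[$name]}" "${paths[$name]}"
        fi
    done
}
```

Each duplicated name is printed with its count, followed by its full paths indented by two spaces. Pipe the output through a sort if a stable order matters.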

Demonstrating some of the options you have with this:

find /PATH/TO/FILES -type f -printf 'size: %s bytes, modified at: %t, path: %h/, file name: %f\n' | sort -k15 | uniq -f14 --all-repeated=prepend

Also, sort and uniq have options to ignore case (as the topic opener intended to achieve by piping through tr): sort -f (--ignore-case) and uniq -i (--ignore-case). See man sort and man uniq for details.
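A case-insensitive version of the one-liner, wrapped so the directory comes from the caller as the question requires, might look like this. It is a sketch assuming GNU find (-printf) and GNU uniq (--all-repeated); dup_files_nocase is a hypothetical name. sort -f folds lowercase to uppercase while sorting, so names differing only in case end up adjacent, and uniq -i then treats them as equal.

```sh
#!/bin/sh
# Case-insensitive duplicate filenames below directory $1 (default: .).
# dup_files_nocase is a hypothetical name used for illustration.
dup_files_nocase() {
    find "${1:-.}" -type f -printf '%p/ %f\n' |
        sort -f -k2 |                          # fold case while sorting on the name field
        uniq -i -f1 --all-repeated=separate    # skip the path field, ignore case
}
```

In a script, dup_files_nocase "$1" runs it on the directory given as the first argument.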

answered Oct 22 '22 20:10

trs
