The following command prints a long list of hashes and file names:
md5sum *.java
I have tried unsuccessfully to list the lines where identical hashes occur, so that I can then remove the identical files.
How can I filter and delete files which have the same content?
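Something along these lines is what I am after (a sketch assuming GNU coreutils, since -w and --all-repeated are GNU uniq extensions):

# Sort by hash, then print every line whose 32-character MD5 prefix
# repeats; groups of identical files are separated by blank lines.
md5sum *.java | sort | uniq -w32 --all-repeated=separate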
fdupes and less view on duplicates

Use fdupes, which is a command-line program, like this:
fdupes -r /home/masi/Documents/ > /tmp/1
less -M +Gg /tmp/1
which finds all duplicates recursively and stores them in a file under /tmp.
The less command shows the current line number and your position in the file as a percentage.
I found fdupes from this answer, and it has a clear Wikipedia article here.
You can install it with Homebrew on OS X and with apt-get on Linux.
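For example (assuming the standard package name fdupes in both package managers):

brew install fdupes          # OS X / Homebrew
sudo apt-get install fdupes  # Debian/Ubuntu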
fdupes interactively with possible deletes

Run
fdupes -rd /home/masi/Documents
which lets you choose which copies to delete. An example view of the interactive session:
Set 4 of 2664, preserve files [1 - 2, all]: all
[+] /home/masi/Documents/Exercise 10 - 1.4.2015/task.bib
[+] /home/masi/Documents/Exercise 9 - 16.3.2015/task.bib
[1] /home/masi/Documents/Celiac_disease/jcom_jun02_celiac.pdf
[2] /home/masi/Documents/turnerWhite/jcom_jun02_celiac.pdf
Set 5 of 2664, preserve files [1 - 2, all]: 2
[-] /home/masi/Documents/Celiac_disease/jcom_jun02_celiac.pdf
[+] /home/masi/Documents/turnerWhite/jcom_jun02_celiac.pdf
where you can see that I have 2664 sets of duplicates. It would be nice to have some static file that saved my decisions about which duplicates to keep; I opened a thread about this here. For instance, I have the same .bib files in several exercises and homework sets, so fdupes should not ask a second time once the user has chosen to keep the duplicate; a partial workaround is sketched below.
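In the meantime, a non-interactive workaround is the -f (--omitfirst) flag, which lists every file in each duplicate set except the first; a minimal sketch (the paths are examples, review the list before deleting anything):

# Keep the first copy of each set and list the rest; fdupes separates
# sets with blank lines, so strip those.
fdupes -rf /home/masi/Documents | grep -v '^$' > /tmp/dupes
# After reviewing /tmp/dupes, remove each listed file; the read loop
# handles file names containing spaces.
while IFS= read -r f; do rm -v "$f"; done < /tmp/dupes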