I want to remove redundant files in a folder. Something like
cat_1.jpg
cat_2.jpg
cat_3.jpg
dog_10.jpg
dog_100.jpg
reduced to
cat_3.jpg
dog_100.jpg
That is, take only the version of each file with the highest number suffix and delete the rest.
This is very much like "list the files with minimum sequence", but the bash answer there uses a "for ... in ..." loop, and I have thousands of file names.
EDIT:
I got the file name convention wrong. There may be other underscores (e.g. cat_and_dog_100.jpg). I need it to use only the number after the last underscore.
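In bash, the two halves around the last underscore can be split with parameter expansion alone, without external commands: ${f%_*} drops the shortest suffix matching "_*", and ${f##*_} drops the longest prefix matching "*_". A small illustration on the name from the edit (f, prefix and num are just names for this example):

```bash
f=cat_and_dog_100.jpg
prefix=${f%_*}      # shortest "_*" suffix removed: cat_and_dog
num=${f##*_}        # longest "*_" prefix removed: 100.jpg
num=${num%.jpg}     # extension stripped: 100
echo "$prefix $num" # prints: cat_and_dog 100
```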
Assuming your filenames are always in the form <name>_<numbers>.jpg, here's a quick hack:
prev_prefix=""
while IFS= read -r filename; do
    prefix=${filename%%_*}    # Get text before the (first) underscore
    if [ "$prev_prefix" != "$prefix" ]; then    # we see a new prefix: highest number comes first
        echo "Keeping $filename"
        prev_prefix=$prefix
    else    # same prefix: a lower-numbered version
        echo "Deleting $filename"
        rm -- "$filename"
    fi
done < <(find . -maxdepth 1 -name "*.jpg" | LC_ALL=C sort -t'_' -k1,1 -k2,2nr)
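The loop only works if, within each <name>, the file with the highest <number> comes first. That ordering can be checked on the sample names from the question; in this sketch printf stands in for find, and LC_ALL=C just makes the comparison of the <name> key byte-wise and predictable:

```bash
# Sort by <name> (field 1), then by <number> (field 2) numerically, highest first.
sorted=$(printf '%s\n' ./cat_1.jpg ./dog_10.jpg ./cat_3.jpg ./dog_100.jpg ./cat_2.jpg |
  LC_ALL=C sort -t'_' -k1,1 -k2,2nr)
printf '%s\n' "$sorted"
# ./cat_3.jpg
# ./cat_2.jpg
# ./cat_1.jpg
# ./dog_100.jpg
# ./dog_10.jpg
```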
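How this works:
1. Sort the *.jpg files first by <name> and then by <numbers> in descending order, so the file with the highest <number> appears first in each group.
2. Go through the sorted list and keep a file only when a new <name> is found (which should be the one with the highest <number>); delete every other file with the same <name>.
Note that find is used instead of ls *.jpg so we can better handle a large number of files.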
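The loop above splits on the first underscore, so it does not cover names like cat_and_dog_100.jpg from the EDIT. Below is a minimal sketch of one way to handle them, not the answerer's method: it splits each name on the last underscore and remembers the highest number per <name> in a bash associative array, so no sort step is needed. keep_highest and DRY_RUN are hypothetical names; it assumes bash 4+ (the stock macOS bash 3.2 has no associative arrays) and a find that supports -maxdepth and -print0 (GNU and BSD find both do).

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each <name>_<number>.jpg in DIR, keep the file with
# the highest <number> and delete the rest. <name> may contain underscores.
# Dry run by default; set DRY_RUN=0 to actually delete. Requires bash 4+.
keep_highest() {
  local dir=${1:-.} dry_run=${DRY_RUN:-1}
  local -A best_num=() best_file=()   # per <name>: highest number seen, and its path
  local -a to_delete=()
  local path base prefix num cur

  while IFS= read -r -d '' path; do
    base=${path##*/}                  # file name without the directory part
    prefix=${base%_*}                 # text before the LAST underscore
    num=${base##*_}                   # text after it, e.g. "100.jpg"
    num=${num%.jpg}
    [[ -n $prefix && $num =~ ^[0-9]+$ ]] || continue   # skip names that don't fit
    num=$((10#$num))                  # force base 10 so "08" is not read as octal
    if [[ -z ${best_file[$prefix]+set} ]]; then
      best_num[$prefix]=$num          # first file seen for this <name>
      best_file[$prefix]=$path
    else
      cur=${best_num[$prefix]}
      if (( num > cur )); then
        to_delete+=("${best_file[$prefix]}")   # previous best is now redundant
        best_num[$prefix]=$num
        best_file[$prefix]=$path
      else
        to_delete+=("$path")          # lower (or equal) number: redundant
      fi
    fi
  done < <(find "$dir" -maxdepth 1 -type f -name '*_*.jpg' -print0)

  for path in "${to_delete[@]}"; do
    if [[ $dry_run == 0 ]]; then
      rm -- "$path"
    else
      printf 'Would delete %s\n' "$path"
    fi
  done
}
```

Run keep_highest /path/to/folder first to see what would be deleted, then DRY_RUN=0 keep_highest /path/to/folder to delete. The null-delimited find output keeps names with spaces or newlines intact, and only one pass over the folder is needed.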
Disclaimer: This is a rather fragile way of dealing with files and versioning, and should not be adopted as a long term solution. Do heed the comments posted on the question.