Keep only one version of each file (bash)

Question

I want to remove redundant files in a folder. Something like

cat_1.jpg
cat_2.jpg
cat_3.jpg
dog_10.jpg
dog_100.jpg

reduced to

cat_3.jpg
dog_100.jpg

That is, take only the version of each file with the highest number suffix and delete the rest.

This is very much like "list the files with minimum sequence", but the bash answer there uses a "for ... in ..." loop, and I have thousands of file names.

EDIT:

I got the file name convention wrong. There may be other underscores (e.g. cat_and_dog_100.jpg). It should only use the number after the last underscore.

Shawn Chin · Accepted Answer

Assuming your filenames are always in the form <name>_<numbers>.jpg, here's a quick hack:

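prev_prefix=
while IFS= read -r filename; do
    prefix=${filename%_*}  # Get text before the last underscore
    if [ "$prev_prefix" != "$prefix" ]; then  # we see a new prefix (highest number comes first)
        echo "Keeping $filename"
        prev_prefix=$prefix
    else  # same prefix, lower number
        echo "Deleting $filename"
        rm -- "$filename"
    fi
done < <(find . -maxdepth 1 -name "*_*.jpg" |
         # Prepend "<name> TAB <number> TAB" so sort can group by name, highest number first
         awk '/_[0-9]+\.jpg$/ {p = n = $0; sub(/_[^_]*$/, "", p); sub(/.*_/, "", n); print p "\t" n "\t" $0}' |
         LC_ALL=C sort -t$'\t' -k1,1 -k2,2nr | cut -f3-)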

How this works:

Sorts all *.jpg files first by <name> and then by <numbers> in descending order.
- All files with the same prefix are grouped together, with the highest <number> appearing first.
Iterates through the list of filenames and deletes each file unless it is the first one seen for a new <name> (which is the one with the highest <number>).
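
The ordering step can be tried on its own before anything is deleted. The following is a minimal sketch rather than the exact pipeline from the answer: order_versions is a hypothetical helper, and it assumes names end in _<number>.jpg and contain no tabs or newlines. It tags each name with its prefix and number so that sort can group by prefix and put the highest number first.

```bash
# Hypothetical helper: read file names on stdin, print them grouped by
# prefix (text before the last underscore), highest number first.
order_versions() {
    local tab
    tab=$(printf '\t')
    awk '/_[0-9]+\.jpg$/ {
             prefix = $0; sub(/_[^_]*$/, "", prefix)   # text before the last "_"
             number = $0; sub(/.*_/, "", number)       # "<number>.jpg"
             print prefix "\t" (number + 0) "\t" $0    # decorate the name
         }' |
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2nr |            # by prefix, then number descending
    cut -f3-                                           # strip the decoration again
}

printf '%s\n' cat_1.jpg cat_3.jpg cat_2.jpg dog_10.jpg dog_100.jpg | order_versions
# cat_3.jpg
# cat_2.jpg
# cat_1.jpg
# dog_100.jpg
# dog_10.jpg
```

The first line of each group is the file to keep. Names that do not end in _<number>.jpg are dropped from the list, so they are never candidates for deletion.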

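Note that find is used instead of ls *.jpg so that a large number of files can be handled: the shell expands *.jpg into one argument per file, which can fail with "Argument list too long" when there are thousands of matches, whereas find writes the names to a pipe.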
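
Because find emits one name per line, its output can also be fed to a filter that does not depend on sort order at all. This is a sketch of a different technique from the answer's loop, under the same assumptions (names end in _<number>.jpg and contain no newlines). list_obsolete is a hypothetical helper that only prints the files that would be removed, so the list can be reviewed first.

```bash
# Hypothetical helper: read file names on stdin and print every name that
# is NOT the highest-numbered version of its prefix. Nothing is deleted.
list_obsolete() {
    awk '/_[0-9]+\.jpg$/ {
        prefix = $0; sub(/_[^_]*$/, "", prefix)          # text before the last "_"
        number = $0; sub(/.*_/, "", number); number += 0  # numeric suffix
        if (!(prefix in best)) {                # first file seen for this prefix
            best[prefix] = number; keep[prefix] = $0
        } else if (number > best[prefix]) {     # new highest: previous one is obsolete
            print keep[prefix]
            best[prefix] = number; keep[prefix] = $0
        } else {                                # lower (or equal) number: obsolete
            print $0
        }
    }'
}

# Example (prints candidates for deletion in the current folder):
#   find . -maxdepth 1 -name '*_*.jpg' | list_obsolete
```

The filter keeps one entry per prefix in memory, which should be fine for thousands of names.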

Disclaimer: This is a rather fragile way of dealing with files and versioning, and should not be adopted as a long-term solution. Do heed the comments posted on the question.

Tags:

bash

Tristan Klassen
