Better way to rename files based on multiple patterns

Tags: linux, bash, shell, unix, sed

A lot of files I download have crap/spam in their filenames, e.g.

[ www.crap.com ] file.name.ext

www.crap.com - file.name.ext

I've come up with two ways for dealing with them but they both seem pretty clunky:

with parameter expansion:

if [[ ${base_name} != ${base_name//\[+([^\]])\]} ]]
then
    mv -v "${dir_name}/${base_name}" "${dir_name}/${base_name//\[+([^\]])\]}" &&
        base_name="${base_name//\[+([^\]])\]}"
fi

if [[ ${base_name} != ${base_name//www.*.com - /} ]]
then
    mv -v "${dir_name}/${base_name}" "${dir_name}/${base_name//www.*.com - /}" &&
        base_name="${base_name//www.*.com - /}"
fi

# more statements of this type, one for each frequently encountered pattern

and then with echo/sed:

tmp=`echo "${base_name}" | sed -e 's/\[[^][]*\]//g' | sed -e 's/\s-\s//g'`
mv "${base_name}" "${tmp}"

I feel like the parameter expansion is the worse of the two but I like it because I'm able to keep the same variable assigned to the file for further processing after the rename (the above code is used in a script that's called for each file after the file download is complete).

So anyway I was hoping there's a better/cleaner way to do the above that someone more knowledgeable than myself could show me, preferably in a way that would allow me to easily reassign the old/original variable to the new/renamed file.

Thanks


asked Dec 17 '13 08:12

user3100854

2 Answers

Two answers: using Perl rename or using pure bash

As some people dislike Perl, I also wrote a bash-only version.

Renaming files by using the `rename` command.

Introduction

Yes, this is a typical job for the rename command, which was designed precisely for this:

man rename | sed -ne '/example/,/^[^ ]/p'
   For example, to rename all files matching "*.bak" to strip the
   extension, you might say

           rename 's/\.bak$//' *.bak

   To translate uppercase names to lower, you'd use

           rename 'y/A-Z/a-z/' *

More targeted examples

Simply drop all spaces and square brackets:

rename 's/[ \[\]]*//g;' *.ext

Rename all .jpg files, numbering them from 1:

rename 's/^.*$/sprintf "IMG_%05d.JPG",++$./e' *.jpg

Demo:

touch {a..e}.jpg
ls -ltr
total 0
-rw-r--r-- 1 user user 0 sep  6 16:35 e.jpg
-rw-r--r-- 1 user user 0 sep  6 16:35 d.jpg
-rw-r--r-- 1 user user 0 sep  6 16:35 c.jpg
-rw-r--r-- 1 user user 0 sep  6 16:35 b.jpg
-rw-r--r-- 1 user user 0 sep  6 16:35 a.jpg
rename 's/^.*$/sprintf "IMG_%05d.JPG",++$./e' *.jpg
ls -ltr
total 0
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00005.JPG
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00004.JPG
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00003.JPG
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00002.JPG
-rw-r--r-- 1 user user 0 sep  6 16:35 IMG_00001.JPG

Full syntax matching the SO question, in a safe way

There is a robust and safe way using the rename utility:

As this is a common Perl tool, we have to use Perl syntax:

rename 'my $o=$_;
        s/[ \[\]]+/-/g;
        s/-+/-/g;
        s/^-//g;
        s/-(\..*|)$/$1/g;
        s/(.*[^\d])(|-(\d+))(\.[a-z0-9]{2,6})$/
                my $i=$3;
                $i=0 unless $i;
                sprintf("%s-%d%s", $1, $i+1, $4)
            /eg while
               $o ne $_  &&
               -f $_;
    ' *

Testing rule:

touch '[ www.crap.com ] file.name.ext' 'www.crap.com - file.name.ext'
ls -1
[ www.crap.com ] file.name.ext
www.crap.com - file.name.ext
rename 'my $o=$_; ...
    ...
    ...' *
ls -1
www.crap.com-file.name-1.ext
www.crap.com-file.name.ext

touch '[ www.crap.com ] file.name.ext' 'www.crap.com - file.name.ext'
ls -1
www.crap.com-file.name-1.ext
[ www.crap.com ] file.name.ext
www.crap.com - file.name.ext
www.crap.com-file.name.ext
rename 'my $o=$_; ...
    ...
    ...' *
ls -1
www.crap.com-file.name-1.ext
www.crap.com-file.name-2.ext
www.crap.com-file.name-3.ext
www.crap.com-file.name.ext

... and so on...

... and it is safe as long as you don't pass the -f flag to the rename command: files won't be overwritten, and you will get an error message if something goes wrong.

Renaming files by using bash and so-called bashisms:

I prefer doing this with a dedicated utility, but it can also be done in pure bash (that is, without any fork).

No binary other than bash is used to compute the new name (no sed, awk, tr or other); only mv is needed for the final rename:

#!/bin/bash

for file; do
    # Replace spaces and square brackets with dots
    newname=${file//[ \]\[]/.}
    # Strip leading dots
    while [ "$newname" != "${newname#.}" ]; do
        newname=${newname#.}
    done
    # Collapse each pair of dots/dashes into a single dash
    while [ "$newname" != "${newname//[.-][.-]/-}" ]; do
        newname=${newname//[.-][.-]/-}
    done
    if [ "$file" != "$newname" ]; then
        if [ -f "$newname" ]; then
            # Split into base, counter and extension to build an indexed name
            ext=${newname##*.}
            basename=${newname%."$ext"}
            partname=${basename%%-[0-9]}
            count=${basename#"${partname}"-}
            [ "$partname" = "$count" ] && count=0
            while printf -v newname "%s-%d.%s" "$partname" "$((++count))" "$ext" &&
                  [ -f "$newname" ]; do
                :
            done
        fi
        mv -- "$file" "$newname"
    fi
done

To be run with the files as arguments, for example:

/path/to/my/script.sh \[*

Replace spaces and square brackets with dots.
Strip leading dots.
Replace sequences of .-, -., -- or .. with a single -.
If the filename does not change, there is nothing to do.
Test whether a file already exists with the new name...
Split the filename into base, counter and extension, to build an indexed new name.
Loop while a file exists with the new name.
Finally, rename the file.
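
The question also asks how to keep the same variable pointing at the renamed file. One way to get there, sketched here under assumptions, is to isolate the name-cleaning steps above in a function that only prints the new name. The function name clean_name is hypothetical and not part of the script above; the command substitution used to capture its output does fork a subshell.

```shell
#!/bin/bash

# clean_name NAME  (hypothetical helper, not part of the original script)
#   Print a sanitized version of NAME without touching the filesystem.
clean_name() {
    local name=$1
    # Replace spaces and square brackets with dots
    name=${name//[ \]\[]/.}
    # Strip leading dots
    while [ "$name" != "${name#.}" ]; do
        name=${name#.}
    done
    # Collapse each pair of dots/dashes into a single dash
    while [ "$name" != "${name//[.-][.-]/-}" ]; do
        name=${name//[.-][.-]/-}
    done
    printf '%s\n' "$name"
}

# Reassign the original variable to the cleaned name
base_name='[ www.crap.com ] file.name.ext'
base_name=$(clean_name "$base_name")
echo "$base_name"   # prints: www.crap.com-file.name.ext
```

In a per-file script, the actual mv would go between computing the new name and reassigning base_name, so the variable is only updated when the rename succeeds.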

answered Nov 13 '22 07:11

F. Hauri

Take advantage of the following classical pattern:

 job_select /path/to/directory | job_strategy | job_process

where job_select is responsible for selecting the objects of your job, job_strategy prepares a processing plan for these objects and job_process eventually executes the plan.

This assumes that filenames contain neither a vertical bar | nor a newline character.

The job_select function

 # job_select PATH
 #  Produce the list of files to process
 job_select()
 {
   find "$1" -name 'www.*.com - *' -o -name '\[*\] *'
 }

The find command can examine all properties of the file maintained by the file system, like status-change time, access time and modification time. It is also possible to control how the filesystem is explored, by telling find not to descend into mounted filesystems or by limiting how many levels of recursion are allowed. It is common to append pipes to the find command to perform more complicated selections based on the filename.

Avoid the common pitfall of including the contents of hidden directories in the output of the job_select function. For instance, the directories CVS, .svn, .svk and .git are used by the corresponding source control management tools and it is almost always wrong to include their contents in the output of the job_select function. By inadvertently batch processing these files, one can easily make the affected working copy unusable.
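
A minimal sketch of one way to skip those directories, assuming find's -prune primary: the function name job_select_pruned is hypothetical, and the square brackets in the second pattern are escaped here so that find treats them literally.

```shell
#!/bin/bash

# job_select_pruned PATH  (hypothetical variant of job_select)
#   Produce the list of files to process, without descending into
#   directories used by source control tools.
job_select_pruned() {
    find "$1" \
        \( -name CVS -o -name .svn -o -name .svk -o -name .git \) -prune \
        -o \( -name 'www.*.com - *' -o -name '\[*\] *' \) -print
}
```

Because an explicit -prune action is present, the explicit -print is needed: it restricts the output to names matching the second group, so the pruned directories themselves are not listed.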

The job_strategy function

# job_strategy
#  Prepare a plan for renaming files
job_strategy()
{
  sed -e '
    h
    s@/www\..*\.com - *@/@
    s@/\[[^]]*\] *@/@
    x
    G
    s/\n/|/
  '
}

This command reads the output of job_select and makes a plan for our renaming job. The plan is represented by text lines having two fields separated by the character |: the first field is the old name of the file and the second is its newly computed name. With directory prefixes omitted, it looks like

[ www.crap.com ] file.name.1.ext|file.name.1.ext
www.crap.com - file.name.2.ext|file.name.2.ext

The particular program used to produce the plan is essentially irrelevant, but it is common to use sed, as in the example, or awk or perl. Let us walk through the sed script used here:

h       Replace the contents of the hold space with the contents of the pattern space.
…       Edit the contents of the pattern space.
x       Swap the contents of the pattern and hold spaces.
G       Append a newline character followed by the contents of the hold space to the pattern space.
s/\n/|/ Replace the newline character in the pattern space by a vertical bar.
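
To see the hold-space steps in action, the script can be fed a couple of sample paths directly, without running find. This is a sketch under assumptions: the sample paths are invented, and the bracket pattern is written as \[[^]]*\] so that it matches a bracketed prefix such as "[ www.crap.com ] ".

```shell
#!/bin/bash

# job_strategy
#   Read file names on stdin, print "old|new" lines on stdout.
job_strategy() {
    sed -e '
        h
        s@/www\..*\.com - *@/@
        s@/\[[^]]*\] *@/@
        x
        G
        s/\n/|/
    '
}

# Feed two sample names through the strategy stage.
printf '%s\n' \
    './[ www.crap.com ] file.name.1.ext' \
    './www.crap.com - file.name.2.ext' |
    job_strategy
# prints:
#   ./[ www.crap.com ] file.name.1.ext|./file.name.1.ext
#   ./www.crap.com - file.name.2.ext|./file.name.2.ext
```

Running the strategy stage on its own like this is also a convenient dry run: the plan can be inspected before anything is renamed.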

It can be easier to use several filters to prepare the plan. Another common case is the use of the stat command to add creation times to file names.

The job_process function

# job_process
#  Rename files according to a plan
job_process()
{
   local oldname
   local newname
   while IFS='|' read -r oldname newname; do
     mv -- "$oldname" "$newname"
   done
}

The input field separator IFS is adjusted to let the function read the output of job_strategy. Declaring oldname and newname as local is useful in large programs but can be omitted in very simple scripts. The job_process function can be adjusted to avoid overwriting existing files and report the problematic items.
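
One possible adjustment along those lines is sketched below. The function name job_process_safe and the wording of the message are hypothetical; the check and the mv are separate steps, so this is not safe against another process creating the target in between.

```shell
#!/bin/bash

# job_process_safe  (hypothetical variant of job_process)
#   Rename files according to a plan read on stdin, skipping any
#   rename whose target already exists and reporting it on stderr.
job_process_safe() {
    local oldname newname
    while IFS='|' read -r oldname newname; do
        if [ -e "$newname" ]; then
            printf 'skipped (target exists): %s\n' "$oldname" >&2
        else
            mv -- "$oldname" "$newname"
        fi
    done
}
```

It drops into the same pipeline in place of job_process, for instance as job_select /path/to/directory | job_strategy | job_process_safe.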

About data structures in shell programs

Note the use of pipes to transfer data from one stage to the other: apprentices often rely on variables to represent such information, but this turns out to be a clumsy choice. Instead, it is preferable to represent data as tabular files or as tabular data streams moving from one process to the other. In this form, data can be easily processed by powerful tools like sed, awk, join, paste and sort, to cite only the most common ones.
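
As a small illustration of what the tabular form buys, here is a sketch of an extra stage that lists target names claimed by more than one line of a plan, using only cut, sort and uniq. The stage name job_collisions and the sample plan are hypothetical.

```shell
#!/bin/bash

# job_collisions  (hypothetical extra stage)
#   Read a plan ("old|new" per line) on stdin and print each new name
#   that appears as the target of more than one rename.
job_collisions() {
    cut -d'|' -f2 | sort | uniq -d
}

# Sample plan in which two different files map to the same new name.
printf '%s\n' \
    './[ www.crap.com ] a.ext|./a.ext' \
    './www.crap.com - a.ext|./a.ext' \
    './www.crap.com - b.ext|./b.ext' |
    job_collisions
# prints: ./a.ext
```

Doing the same check with shell variables would require arrays and explicit loops; with a stream, it is a three-command pipeline.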

answered Nov 13 '22 09:11

Michaël Le Barbier
