batch renaming 100K files with python

I have a folder with over 100,000 files, all numbered with the same stub but without leading zeros, and the numbers aren't always contiguous (usually they are, but there are gaps), e.g.:

file-21.png, 
file-22.png,  
file-640.png, 
file-641.png, 
file-642.png, 
file-645.png, 
file-2130.png, 
file-2131.png, 
file-3012.png, 

etc.

I would like to batch-process these to get zero-padded, contiguous filenames, e.g.:

file-000000.png, 
file-000001.png, 
file-000002.png, 
file-000003.png, 

When I loop over the folder with "for filename in os.listdir('.'):", the files don't come up in the order I'd like them to. Understandably, they come up in string order:

 file-1, 
 file-1x, 
 file-1xx, 
 file-1xxx,

etc. then

 file-2, 
 file-2x, 
 file-2xx, 

etc. How can I get it to go through them in numeric order? I am a complete Python noob, but looking at the docs I'm guessing I could use map to create a new list containing only the numeric part, then sort that list and iterate over it? With over 100K files this could be heavy. Any tips welcome!

asked Jun 20 '10 at 00:06 by memo


2 Answers

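import os
import re

# Matches e.g. "file-640.png" and captures the number ("640")
thenum = re.compile(r'^file-(\d+)\.png$')

def bynumber(fn):
    # Sort key: the integer inside the filename
    return int(thenum.match(fn).group(1))

# Keep only names that match the pattern, so every sort key is an int
allnames = [fn for fn in os.listdir('.') if thenum.match(fn)]
allnames.sort(key=bynumber)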

Now you have the files in the order you want them and can loop

for i, fn in enumerate(allnames):
  ...

using the progressive number i (which will be 0, 1, 2, ...) padded as you wish in the destination-name.
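A minimal sketch of the whole job, with the loop body filled in as a rename. The helper names plan_renames and apply_renames, the six-digit width, and the file-NNNNNN.png target pattern are illustrative choices based on the question's example, not part of the answer above. Building the list of (old, new) pairs first makes it easy to inspect the plan before touching any files.

```python
import os
import re

# Matches names like "file-640.png" and captures the number ("640").
PATTERN = re.compile(r'^file-(\d+)\.png$')


def plan_renames(names, width=6):
    """Return (old, new) pairs: matching names in numeric order, renumbered 0, 1, 2, ..."""
    numbered = []
    for name in names:
        m = PATTERN.match(name)
        if m:  # skip anything that is not file-<number>.png
            numbered.append((int(m.group(1)), name))
    numbered.sort()  # sorts by the integer, not by the string
    # Hypothetical target pattern: zero-padded position, e.g. file-000000.png
    return [(old, f'file-{i:0{width}d}.png') for i, (_, old) in enumerate(numbered)]


def apply_renames(pairs, folder='.'):
    """Rename each old name to its new name inside folder, refusing to overwrite."""
    for old, new in pairs:
        if old == new:
            continue  # already has its target name
        src = os.path.join(folder, old)
        dst = os.path.join(folder, new)
        if os.path.exists(dst):
            # os.rename would silently replace dst on POSIX systems
            raise FileExistsError(dst)
        os.rename(src, dst)
```

Usage, run from inside the folder: apply_renames(plan_renames(os.listdir('.'))). Because the original numbers are distinct and processed in ascending order, each original number is at least as large as its new position, so no rename lands on a name still held by a file that hasn't been renamed yet; the exists() check is only a safety net, for example against leftovers from an earlier partial run.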

answered Sep 30 '22 at 17:09 by Alex Martelli


There are three steps. The first is getting all the filenames. The second is converting the filenames. The third is renaming them.

If all the files are in the same folder, then glob should work.

import glob
filenames = glob.glob("/path/to/folder/*.txt")

Next, you want to build the new name. You can do this by zero-padding the number with str.rjust:

>>> filename = "file-338.txt"
>>> import os
>>> fnpart = os.path.splitext(filename)[0]
>>> fnpart
'file-338'
>>> _, num = fnpart.split("-")
>>> num.rjust(5, "0")
'00338'
>>> newname = "file-%s.txt" % num.rjust(5, "0")
>>> newname
'file-00338.txt'
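str.rjust is one way to pad; str.zfill and a format specification give the same result for a plain digit string. A small comparison using the same number and width of 5 as the session above:

```python
num = "338"

# Three equivalent ways to left-pad a digit string to width 5 with zeros
padded_rjust = num.rjust(5, "0")
padded_zfill = num.zfill(5)
padded_format = f"{int(num):05d}"  # converts to int first, then formats

print(padded_rjust, padded_zfill, padded_format)  # prints: 00338 00338 00338
```

The format-spec version works directly on an int, which is convenient when the number is a counter rather than a substring of the old name.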

Now, you need to rename them all. os.rename does just that.

os.rename(filename, newname)

To put it together:

import glob
import os

for filename in glob.glob("/path/to/folder/*.txt"):  # loop through each file
    newname = make_new_filename(filename)  # create a function that does step 2, above
    os.rename(filename, newname)
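The answer leaves make_new_filename to the reader. One possible version, assuming names shaped like file-338.txt as in the session above; the width parameter is a hypothetical addition. It splits off the directory that glob returns first, so a "-" elsewhere in the path does not confuse the split. Note that this only pads the existing numbers, so any gaps in the sequence remain; the enumerate-based loop in the first answer is what makes the numbering contiguous.

```python
import os


def make_new_filename(filename, width=5):
    """Turn '/path/to/folder/file-338.txt' into '/path/to/folder/file-00338.txt'."""
    folder, base = os.path.split(filename)  # keep the directory unchanged
    stem, ext = os.path.splitext(base)      # 'file-338', '.txt'
    prefix, num = stem.rsplit("-", 1)       # 'file', '338' (split on the last dash only)
    return os.path.join(folder, "%s-%s%s" % (prefix, num.rjust(width, "0"), ext))
```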
answered Sep 30 '22 at 17:09 by Ryan Ginstrom