Lets say I have three files in a folder: file9.txt, file10.txt and file11.txt and i want to read them in this particular order. Can anyone help me with this?
Right now I am using the code
import glob, os
for infile in glob.glob(os.path.join( '*.txt')):
print "Current File Being Processed is: " + infile
and it reads first file10.txt then file11.txt and then file9.txt.
Can someone help me how to get the right order?
In Python, the os module provides a function listdir(dir_path), which returns a list of file and sub-directory names in the given directory path. Then using the filter() function create list of files only. Then sort this list of file names based on the name using the sorted() function.
Python can search for file names in a specified path of the OS. This can be done using the module os with the walk() functions. This will take a specific path as input and generate a 3-tuple involving dirpath, dirnames, and filenames. In the below example we are searching for a file named smpl.
Files on the filesystem are not sorted. You can sort the resulting filenames yourself using the sorted()
function:
for infile in sorted(glob.glob('*.txt')): print "Current File Being Processed is: " + infile
Note that the os.path.join
call in your code is a no-op; with only one argument it doesn't do anything but return that argument unaltered.
Note that your files will sort in alphabetical ordering, which puts 10
before 9
. You can use a custom key function to improve the sorting:
import re numbers = re.compile(r'(\d+)') def numericalSort(value): parts = numbers.split(value) parts[1::2] = map(int, parts[1::2]) return parts for infile in sorted(glob.glob('*.txt'), key=numericalSort): print "Current File Being Processed is: " + infile
The numericalSort
function splits out any digits in a filename, turns it into an actual number, and returns the result for sorting:
>>> files = ['file9.txt', 'file10.txt', 'file11.txt', '32foo9.txt', '32foo10.txt'] >>> sorted(files) ['32foo10.txt', '32foo9.txt', 'file10.txt', 'file11.txt', 'file9.txt'] >>> sorted(files, key=numericalSort) ['32foo9.txt', '32foo10.txt', 'file9.txt', 'file10.txt', 'file11.txt']
You can wrap your glob.glob( ... )
expression inside a sorted( ... )
statement and sort the resulting list of files. Example:
for infile in sorted(glob.glob('*.txt')):
You can give sorted
a comparison function or, better, use the key= ...
argument to give it a custom key that is used for sorting.
Example:
There are the following files:
x/blub01.txt
x/blub02.txt
x/blub10.txt
x/blub03.txt
y/blub05.txt
The following code will produce the following output:
for filename in sorted(glob.glob('[xy]/*.txt')):
print filename
# x/blub01.txt
# x/blub02.txt
# x/blub03.txt
# x/blub10.txt
# y/blub05.txt
Now with key function:
def key_func(x):
return os.path.split(x)[-1]
for filename in sorted(glob.glob('[xy]/*.txt'), key=key_func):
print filename
# x/blub01.txt
# x/blub02.txt
# x/blub03.txt
# y/blub05.txt
# x/blub10.txt
EDIT: Possibly this key function can sort your files:
pat=re.compile("(\d+)\D*$")
...
def key_func(x):
mat=pat.search(os.path.split(x)[-1]) # match last group of digits
if mat is None:
return x
return "{:>10}".format(mat.group(1)) # right align to 10 digits.
It sure can be improved, but I think you get the point. Paths without numbers will be left alone, paths with numbers will be converted to a string that is 10 digits wide and contains the number.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With