
Reading files in a particular order in Python

Let's say I have three files in a folder: file9.txt, file10.txt and file11.txt, and I want to read them in this particular order. Can anyone help me with this?

Right now I am using this code:

import glob, os
for infile in glob.glob(os.path.join( '*.txt')):
    print("Current File Being Processed is: " + infile)

and it reads file10.txt first, then file11.txt, and then file9.txt.

How can I get the right order?

user1620012 asked Aug 23 '12 14:08



2 Answers

Files on the filesystem are not sorted, and glob.glob() returns names in whatever order the filesystem provides them. You can sort the resulting filenames yourself using the sorted() function:

for infile in sorted(glob.glob('*.txt')):
    print("Current File Being Processed is: " + infile)

Note that the os.path.join call in your code is a no-op; with only one argument it doesn't do anything but return that argument unaltered.
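A quick way to confirm the no-op is to compare the result with the input. This is only a small sketch; the directory name 'data' is a made-up example and does not come from the question.

```python
import os

# With a single argument, os.path.join returns that argument unchanged.
single = os.path.join('*.txt')
print(single == '*.txt')

# The call only does something with two or more components.
# 'data' is a hypothetical directory name used for illustration.
joined = os.path.join('data', '*.txt')
print(joined == 'data' + os.sep + '*.txt')
```

Both comparisons print True, so the call can simply be dropped when there is no directory part to prepend.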

Note that your files will sort in lexicographic (character-by-character) order, which puts 10 before 9 because the character '1' sorts before '9'. You can use a custom key function to improve the sorting:

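import glob
import re

numbers = re.compile(r'(\d+)')

def numericalSort(value):
    # Split into alternating text and digit runs, e.g. ['file', '10', '.txt'].
    parts = numbers.split(value)
    # Convert the digit runs (odd positions) to ints so they compare numerically.
    parts[1::2] = map(int, parts[1::2])
    return parts

for infile in sorted(glob.glob('*.txt'), key=numericalSort):
    print("Current File Being Processed is: " + infile)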

The numericalSort function splits out every run of digits in a filename, turns each run into an actual number, and returns the resulting list as the sort key:

>>> files = ['file9.txt', 'file10.txt', 'file11.txt', '32foo9.txt', '32foo10.txt']
>>> sorted(files)
['32foo10.txt', '32foo9.txt', 'file10.txt', 'file11.txt', 'file9.txt']
>>> sorted(files, key=numericalSort)
['32foo9.txt', '32foo10.txt', 'file9.txt', 'file10.txt', 'file11.txt']
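To make the mechanism more concrete, here is a sketch of the same idea that prints the keys themselves. The names natural_key and _DIGITS are mine, not from the answer, and the list comprehension is just an alternative spelling of the map() call.

```python
import re

_DIGITS = re.compile(r'(\d+)')

def natural_key(name):
    # re.split with a capturing group keeps the digit runs, so the
    # result alternates: text, digits, text, digits, ...
    parts = _DIGITS.split(name)
    # Odd positions hold the digit runs; convert them to ints.
    parts[1::2] = [int(p) for p in parts[1::2]]
    return parts

print(natural_key('file10.txt'))   # ['file', 10, '.txt']
print(natural_key('32foo9.txt'))   # ['', 32, 'foo', 9, '.txt']

files = ['file9.txt', 'file10.txt', 'file11.txt']
ordered = sorted(files, key=natural_key)
print(ordered)                     # ['file9.txt', 'file10.txt', 'file11.txt']
```

Because the split always starts with a (possibly empty) text piece, even positions are always strings and odd positions are always ints. Two keys are therefore compared string against string or int against int, which matters on Python 3, where comparing a str with an int raises TypeError.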
Martijn Pieters answered Sep 28 '22 01:09


You can wrap your glob.glob( ... ) expression inside a sorted( ... ) call to sort the resulting list of files. Example:

for infile in sorted(glob.glob('*.txt')):

You can give sorted a comparison function (the cmp argument, available in Python 2 only) or, better, use the key= ... argument to give it a custom key that is used for sorting.

Example:

Suppose the following files exist:

x/blub01.txt
x/blub02.txt
x/blub10.txt
x/blub03.txt
y/blub05.txt

Sorting the full paths produces the output shown in the comments:

for filename in sorted(glob.glob('[xy]/*.txt')):
    print(filename)
# x/blub01.txt
# x/blub02.txt
# x/blub03.txt
# x/blub10.txt
# y/blub05.txt

Now with a key function that sorts by file name only:

def key_func(x):
    # Use only the last path component, ignoring the directory.
    return os.path.split(x)[-1]

for filename in sorted(glob.glob('[xy]/*.txt'), key=key_func):
    print(filename)
# x/blub01.txt
# x/blub02.txt
# x/blub03.txt
# y/blub05.txt
# x/blub10.txt

EDIT: Possibly this key function can sort your files:

pat = re.compile(r"(\d+)\D*$")
...
def key_func(x):
    mat = pat.search(os.path.split(x)[-1])  # match the last group of digits
    if mat is None:
        return x
    return "{:>10}".format(mat.group(1))  # right-align to 10 characters

It can certainly be improved, but I think you get the point. Paths without numbers are left alone; paths with numbers are converted to a string that is 10 characters wide, with the number right-aligned and padded with spaces, so that the strings compare in numeric order.
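One possible improvement, offered only as a sketch: return a tuple containing the number as an int instead of a padded string. This removes the 10-character limit, but it is a variation rather than the code above. The name last_number_key and the choice to place files without a number last are my assumptions.

```python
import os
import re

_LAST_DIGITS = re.compile(r"(\d+)\D*$")  # last run of digits in the name

def last_number_key(path):
    name = os.path.basename(path)
    match = _LAST_DIGITS.search(name)
    if match is None:
        # No number: sort after all numbered files, ordered by path.
        return (1, 0, path)
    # Numbered: compare by integer value, with the path as a tie-breaker.
    return (0, int(match.group(1)), path)

# Hypothetical paths modelled on the example files above.
paths = ['x/blub10.txt', 'y/blub05.txt', 'x/readme.txt', 'x/blub01.txt']
ordered = sorted(paths, key=last_number_key)
print(ordered)
# ['x/blub01.txt', 'y/blub05.txt', 'x/blub10.txt', 'x/readme.txt']
```

Every key is a tuple of the same shape (int, int, str), so no comparison ever mixes strings with integers.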

hochl answered Sep 28 '22 02:09