I have a bunch of numbered files in a folder, but when I sort the output of glob.glob I never get them in the right order.
File examples and expected sort order:
folder
------
C:\Users\user\Desktop\folder\1 sample.mp3
C:\Users\user\Desktop\folder\2 sample.mp3
C:\Users\user\Desktop\folder\3 sample.mp3
C:\Users\user\Desktop\folder\4 sample.mp3
C:\Users\user\Desktop\folder\5 sample.mp3
... over 800 files...
This is what I tried, but the output order seems random:
files = sorted(glob.glob(f'{os.getcwd()}/*.mp3'), key=lambda x: (os.path.splitext(os.path.basename(x))[0]))
C:\Users\user\Desktop\folder\1 speech.mp3
C:\Users\user\Desktop\folder\10 speech.mp3
C:\Users\user\Desktop\folder\100 speech.mp3
C:\Users\user\Desktop\folder\101 speech.mp3
C:\Users\user\Desktop\folder\102 speech.mp3
C:\Users\user\Desktop\folder\103 speech.mp3
C:\Users\user\Desktop\folder\104 speech.mp3
C:\Users\user\Desktop\folder\105 speech.mp3
C:\Users\user\Desktop\folder\106 speech.mp3
C:\Users\user\Desktop\folder\107 speech.mp3
C:\Users\user\Desktop\folder\108 speech.mp3
C:\Users\user\Desktop\folder\109 speech.mp3
C:\Users\user\Desktop\folder\11 speech.mp3
Sorting by date or size is not an option.
UPDATE: all of the answers worked great:
l = sorted(glob.glob(f'{os.getcwd()}/*.mp3'), key=len)
l = sorted(glob.glob(f'{os.getcwd()}/*.mp3'), key=lambda x: int(os.path.basename(x).split(' ')[0]))
def get_key(fp):
    filename = os.path.splitext(os.path.basename(fp))[0]
    int_part = filename.split()[0]
    return int(int_part)
l = sorted(glob.glob(f'{os.getcwd()}/*.mp3'), key=get_key)
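For context, the original attempt used the file name as a string key, so names were compared character by character and "10 speech" landed before "2 speech". A minimal sketch of the difference, using made-up file names rather than the real folder (the list `names` is purely illustrative):

```python
import os

# Hypothetical file names for illustration only.
names = ["10 speech.mp3", "2 speech.mp3", "1 speech.mp3", "100 speech.mp3"]

# String key: compared character by character, so "10" sorts before "2".
as_text = sorted(names, key=lambda x: os.path.splitext(x)[0])

# Integer key: the leading number is compared numerically.
as_number = sorted(names, key=lambda x: int(x.split(" ")[0]))

print(as_text)
print(as_number)
```

The integer key assumes every name starts with a number followed by a space; a name without one would raise ValueError.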
The general answer is to capture the number with re.match(), convert it from a string to an integer with int(), and use these integers as the sort key in sorted():
import re
import math
from pathlib import Path
file_pattern = re.compile(r'.*?(\d+).*?')
def get_order(file):
    match = file_pattern.match(Path(file).name)
    if not match:
        return math.inf
    return int(match.groups()[0])
sorted_files = sorted(files, key=get_order)
Consider random files with one integer number in any part of the filename:
├── 012 some file.mp3
├── 1 file.txt
├── 13 file.mp3
├── 2 another file.txt
├── 3 file.csv
├── 4 file.mp3
├── 6 yet another file.txt
├── 88 name of file.mp3
├── and final 999.txt
├── and some another file7.txt
├── some 5 file.mp3
└── test.py
get_order() can be used to sort the files by passing it to the sorted() builtin function as the key argument:
In [1]: sorted(files, key=get_order)
Out[1]:
['C:\\tmp\\file_sort\\1 file.txt',
'C:\\tmp\\file_sort\\2 another file.txt',
'C:\\tmp\\file_sort\\3 file.csv',
'C:\\tmp\\file_sort\\4 file.mp3',
'C:\\tmp\\file_sort\\some 5 file.mp3',
'C:\\tmp\\file_sort\\6 yet another file.txt',
'C:\\tmp\\file_sort\\and some another file7.txt',
'C:\\tmp\\file_sort\\012 some file.mp3',
'C:\\tmp\\file_sort\\13 file.mp3',
'C:\\tmp\\file_sort\\88 name of file.mp3',
'C:\\tmp\\file_sort\\and final 999.txt',
'C:\\tmp\\file_sort\\test.py']
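Files without any number (such as test.py above) all receive the same key, so their relative order is not controlled by the key. One possible way to add a tie-breaker is sketched below. This is not part of the answer's code: `get_order_with_fallback` and `number_pattern` are hypothetical names, and PureWindowsPath is used only so that the backslash paths are split the same way on any operating system.

```python
import re
from pathlib import PureWindowsPath

number_pattern = re.compile(r"\d+")

def get_order_with_fallback(file):
    # Look only at the file name, so digits in folder names are ignored.
    name = PureWindowsPath(file).name
    match = number_pattern.search(name)
    if match is None:
        # No number: place after all numbered files, ordered by name.
        return (1, 0, name.lower())
    # Numbered files: order by the first number, then by name.
    return (0, int(match.group()), name.lower())

# Hypothetical sample paths for illustration only.
sample = [
    r"C:\tmp\file_sort\test.py",
    r"C:\tmp\file_sort\notes.txt",
    r"C:\tmp\file_sort\13 file.mp3",
    r"C:\tmp\file_sort\some 5 file.mp3",
]
ordered = sorted(sample, key=get_order_with_fallback)
print([PureWindowsPath(p).name for p in ordered])
```

Returning a tuple keeps every key comparable: the first element separates numbered from unnumbered files, and the remaining elements order each group.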
re.compile is used to give a small speed boost (when matching the same pattern multiple times).
re.match is used to match the regular expression pattern against the start of the file name.
.*? means any character (.), zero or more times (*), non-greedily (?). \d+ matches one or more digits, and the parentheses capture that match in groups().
If the file name contains no digits, match will be None and get_order returns infinity; these files are placed last and keep their original relative order (sorted() is stable), but one could add logic for them (this was not asked in the question).
The sorted() function takes a key argument, which should be a callable that takes one argument: the item in the list. In this case, that is one of the file strings (the full file path).
Path(file).name takes just the file name part (including the suffix) from the full file path.
Try this:
import glob
import os
files = sorted(glob.glob(f'{os.getcwd()}/*.txt'), key=len)
print(files)
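Note that key=len relies on the names differing only in the number of digits: paths of equal length keep whatever order glob.glob returned, which is not guaranteed to be sorted. A hedged sketch of a more general "natural sort" key follows; `natural_key` is a hypothetical helper, not a standard library function, and the file names are made up.

```python
import re

def natural_key(name):
    # re.split with a capturing group returns alternating chunks:
    # even indexes are text, odd indexes are runs of digits.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

# Hypothetical file names for illustration only.
names = ["10 speech.mp3", "9 speech.mp3", "2 speech.mp3", "1 speech.mp3"]

by_len = sorted(names, key=len)           # equal lengths keep input order
natural = sorted(names, key=natural_key)  # digit runs compared as integers

print(by_len)
print(natural)
```

Because text and digit chunks always sit at the same positions in every key, integers are only ever compared with integers and strings with strings.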