
How to sort glob.glob numerically?

I have a bunch of numerically named files in a folder. When I try to sort the result of glob.glob, I never get the files in the right order.


Example files and the expected sort order

folder
------
C:\Users\user\Desktop\folder\1 sample.mp3
C:\Users\user\Desktop\folder\2 sample.mp3
C:\Users\user\Desktop\folder\3 sample.mp3
C:\Users\user\Desktop\folder\4 sample.mp3
C:\Users\user\Desktop\folder\5 sample.mp3
... over 800 files...

What I tried (the output seems random):

files = sorted(glob.glob(f'{os.getcwd()}/*.mp3'), key=lambda x: (os.path.splitext(os.path.basename(x))[0]))


C:\Users\user\Desktop\folder\1 speech.mp3
C:\Users\user\Desktop\folder\10 speech.mp3 
C:\Users\user\Desktop\folder\100 speech.mp3
C:\Users\user\Desktop\folder\101 speech.mp3
C:\Users\user\Desktop\folder\102 speech.mp3
C:\Users\user\Desktop\folder\103 speech.mp3  
C:\Users\user\Desktop\folder\104 speech.mp3
C:\Users\user\Desktop\folder\105 speech.mp3
C:\Users\user\Desktop\folder\106 speech.mp3
C:\Users\user\Desktop\folder\107 speech.mp3
C:\Users\user\Desktop\folder\108 speech.mp3
C:\Users\user\Desktop\folder\109 speech.mp3
C:\Users\user\Desktop\folder\11 speech.mp3 
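The order above is not random: the key is a string, so names are compared character by character and "10" sorts before "2". A minimal sketch with made-up names (not the real folder contents) showing the difference between a string key and an integer key:

```python
# Hypothetical file names standing in for the real folder contents.
names = ["1 speech.mp3", "10 speech.mp3", "2 speech.mp3", "100 speech.mp3", "11 speech.mp3"]

# String keys compare character by character, so "10 ..." comes before "2 ...".
as_text = sorted(names)

# Converting the leading number to int compares by numeric value instead.
as_numbers = sorted(names, key=lambda name: int(name.split()[0]))

print(as_text)
print(as_numbers)
```

The integer key assumes every name starts with a number followed by whitespace; int() raises ValueError otherwise.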

Sorting by date or size is not an option.

UPDATE: all of the suggested answers worked great:

l = sorted(glob.glob(f'{os.getcwd()}/*.mp3'), key=len)
l = sorted(glob.glob(f'{os.getcwd()}/*.mp3'), key=lambda x: int(os.path.basename(x).split(' ')[0]))

def get_key(fp):
    filename = os.path.splitext(os.path.basename(fp))[0]
    int_part = filename.split()[0]
    return int(int_part)

l = sorted(glob.glob(f'{os.getcwd()}/*.mp3'), key=get_key)
Y4RD13 asked May 14 '26 13:05


2 Answers

The general answer is to capture the number with re.match() and convert it from a string to an integer with int(). Then use these numbers as the key when sorting the files with sorted().

Code

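import glob
import math
import os
import re
from pathlib import Path

# All files in the current directory (the question's glob, without the .mp3 filter)
files = glob.glob(f'{os.getcwd()}/*')

# Lazily skip characters until the first run of digits, and capture that run
file_pattern = re.compile(r'.*?(\d+).*?')

def get_order(file):
    # Match against the file name only, not the directories in the path
    match = file_pattern.match(Path(file).name)
    if not match:
        # No digits in the name: sort these files last
        return math.inf
    return int(match.group(1))

sorted_files = sorted(files, key=get_order)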

Example input

Consider a folder of files, most of them with one integer somewhere in the filename:

├── 012 some file.mp3
├── 1 file.txt
├── 13 file.mp3
├── 2 another file.txt
├── 3 file.csv
├── 4 file.mp3
├── 6 yet another file.txt
├── 88 name of file.mp3
├── and final 999.txt
├── and some another file7.txt
├── some 5 file.mp3
└── test.py

Example output

get_order() sorts the files when it is passed as the key argument of the built-in sorted() function:

In [1]: sorted(files, key=get_order)
Out[1]:
['C:\\tmp\\file_sort\\1 file.txt',
 'C:\\tmp\\file_sort\\2 another file.txt',
 'C:\\tmp\\file_sort\\3 file.csv',
 'C:\\tmp\\file_sort\\4 file.mp3',
 'C:\\tmp\\file_sort\\some 5 file.mp3',
 'C:\\tmp\\file_sort\\6 yet another file.txt',
 'C:\\tmp\\file_sort\\and some another file7.txt',
 'C:\\tmp\\file_sort\\012 some file.mp3',
 'C:\\tmp\\file_sort\\13 file.mp3',
 'C:\\tmp\\file_sort\\88 name of file.mp3',
 'C:\\tmp\\file_sort\\and final 999.txt',
 'C:\\tmp\\file_sort\\test.py']
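To reproduce this ordering without creating the files, the same key can be run on the bare file names from the example input. This is a sketch: the list below is typed in by hand rather than read with glob, and the directory part is left out so that it behaves the same on any operating system.

```python
import math
import re
from pathlib import Path

file_pattern = re.compile(r'.*?(\d+).*?')

def get_order(file):
    # First run of digits in the file name, or infinity if there is none.
    match = file_pattern.match(Path(file).name)
    if not match:
        return math.inf
    return int(match.group(1))

# File names from the example input, in the order shown in the listing.
files = [
    '012 some file.mp3', '1 file.txt', '13 file.mp3', '2 another file.txt',
    '3 file.csv', '4 file.mp3', '6 yet another file.txt',
    '88 name of file.mp3', 'and final 999.txt',
    'and some another file7.txt', 'some 5 file.mp3', 'test.py',
]

# Print each key next to its file name, in sorted order.
for name in sorted(files, key=get_order):
    print(get_order(name), name)
```

Note that "012" becomes 12 after int(), which is why that file lands between 7 and 13.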

Short explanation

  • re.compile is used to give a small speed boost (when matching the same pattern many times).
  • The compiled pattern's match() method, like re.match(), tries to match the pattern starting at the beginning of the string.
  • In the regex pattern, .*? means any character (.), zero or more times (*), non-greedily (?). \d+ matches one or more digits, and the parentheses capture that match so it appears in the groups() tuple.
  • If there is no match (no digits in the filename), match is None and get_order returns infinity. These files end up last, in the same relative order as in the input, because sorted() is stable; one could add more logic for them (this was not asked in the question).
  • The sorted() function takes a key argument, which should be a callable that takes one argument: an item of the list. In this case, that item is one of the file strings (a full file path).
  • Path(file).name takes just the filename part of the full file path, including the suffix (Path(file).stem would drop the suffix).
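One possible way to add that extra logic for files without a number is to return a tuple instead of infinity, so that such files go last and are ordered by name among themselves. This is only a sketch, not part of the original answer: get_order_with_fallback and number_pattern are made-up names, and the sketch uses Path.stem rather than Path.name so that the digit in an extension such as .mp3 is not picked up as the sort number.

```python
import re
from pathlib import Path

number_pattern = re.compile(r'\d+')

def get_order_with_fallback(file):
    """Hypothetical key: numbered files first (by number), the rest by name."""
    stem = Path(file).stem            # file name without the last suffix
    match = number_pattern.search(stem)
    if match is None:
        # No digits in the stem: sort after every numbered file, by name.
        return (1, 0, stem.lower())
    return (0, int(match.group()), stem.lower())

# Hypothetical file names, two of them without a number.
files = ['10 b.mp3', 'intro.mp3', '2 a.mp3', 'credits.mp3', '1 c.mp3']
print(sorted(files, key=get_order_with_fallback))
```

With the original get_order, a name like 'intro.mp3' would get the key 3, taken from its extension, because the pattern is matched against the name including the suffix.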
np8 answered May 16 '26 03:05


Try this:

import glob
import os
files = sorted(glob.glob(f'{os.getcwd()}/*.txt'), key=len)
print(files)
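Sorting by length works here because every name has the same text after the number, so a longer path means a number with more digits. Names of equal length keep whatever order glob.glob returned them in, and that order is not guaranteed to be sorted. A small sketch, assuming that naming scheme, that adds the path itself as a tie-breaker:

```python
# Hypothetical names in a deliberately shuffled order, as glob.glob might return them.
files = ['12 speech.mp3', '3 speech.mp3', '100 speech.mp3', '11 speech.mp3', '1 speech.mp3']

# Length first, then plain string comparison among names of equal length.
# Only valid while the text after the number is the same for every file.
ordered = sorted(files, key=lambda path: (len(path), path))
print(ordered)
```

If the text after the number varies in length, an integer key such as the one in the other answer is the safer choice.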
