I'm running Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32.
When I ask Python
>>> "u11-Phrase 099.wav" < "u11-Phrase 1000.wav"
True
That's fine. When I ask
>>> "u11-Phrase 100.wav" < "u11-Phrase 1000.wav"
True
That's fine, too. But when I ask
>>> "u11-Phrase 101.wav" < "u11-Phrase 1000.wav"
False
So according to Python, "u11-Phrase 100.wav" comes before "u11-Phrase 1000.wav", but "u11-Phrase 101.wav" comes after "u11-Phrase 1000.wav"! This is a problem for me because I'm trying to write a file renaming program, and this kind of sorting breaks it.
What can I do to overcome this? Should I write my own cmp function and test for edge cases or is there a much simpler shortcut to give me the ordering I want?
On the other hand, if I pad the number in the first string with a leading zero, I get the result I expect:
>>> "u11-Phrase 0101.wav" < "u11-Phrase 1000.wav"
True
However, those strings come from a directory listing, like this:
import glob
files = glob.glob('*.wav')
files.sort()
for file in files:
    ...
So I'd rather not do surgery on the strings after glob has created them. And no, I don't want to change the original filenames in that folder, either.
Any hints?
You are looking for human sorting.
The reason 101.wav is not less than 1000.wav is that computers (not just Python) sort strings character by character, and the first difference between these two strings is where the first string has a '1' and the second string has a '0'. '1' is not less than '0', so the strings compare as you have seen.
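To make the character-by-character comparison concrete, here is a small sketch that finds the first position where the two names differ. The helper name first_difference is made up for this illustration and is not part of Python.

```python
def first_difference(a, b):
    """Return the index of the first position where a and b differ.

    Returns None if one string is a prefix of the other (or they are equal).
    Hypothetical helper, written only to illustrate the comparison.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


a = "u11-Phrase 101.wav"
b = "u11-Phrase 1000.wav"

i = first_difference(a, b)
# The result of a < b is decided entirely by the characters at index i.
print("index %d: %r vs %r, a < b is %r" % (i, a[i], b[i], a < b))
```

The same reasoning explains why "u11-Phrase 100.wav" compares as less: there the first difference is '.' against '0', and '.' sorts before the digits.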
People naturally parse those strings into their components, and interpret the numbers numerically, not lexically. The code I linked to above will do that same sort of parsing.
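As a rough sketch of that kind of parsing (one common way to do it, not necessarily the linked code): split each name into runs of digits and runs of everything else, convert the digit runs to integers, and sort on the resulting list. The name natural_key is an assumption for this example, and the file list is hard-coded here in place of the glob call.

```python
import re


def natural_key(s):
    """Build a sort key that compares digit runs as numbers.

    re.split with a capturing group returns the pieces in alternating
    order: text, digits, text, digits, ... so the odd positions are
    always the digit runs.
    """
    parts = re.split(r'(\d+)', s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


# Stand-in for glob.glob('*.wav'); the original strings are not modified.
files = [
    "u11-Phrase 1000.wav",
    "u11-Phrase 101.wav",
    "u11-Phrase 099.wav",
    "u11-Phrase 100.wav",
]
files.sort(key=natural_key)

for name in files:
    print(name)
```

Because the key is computed only for the comparison, the filenames themselves stay exactly as glob returned them, and nothing on disk needs to be renamed.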