Python sorts "u11-Phrase 1000.wav" before "u11-Phrase 101.wav"; how can I overcome this?

Tags: python, sorting

I'm running Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32.

When I ask Python

>>> "u11-Phrase 099.wav" <  "u11-Phrase 1000.wav"
True

That's fine. When I ask

>>> "u11-Phrase 100.wav" <  "u11-Phrase 1000.wav"
True

That's fine, too. But when I ask

>>> "u11-Phrase 101.wav" <  "u11-Phrase 1000.wav"
False

So according to Python, "u11-Phrase 100.wav" comes before "u11-Phrase 1000.wav", but "u11-Phrase 101.wav" comes after "u11-Phrase 1000.wav"! This is a problem for me because I'm trying to write a file renaming program, and this kind of sorting breaks it.

What can I do to overcome this? Should I write my own cmp function and test for edge cases, or is there a much simpler shortcut that gives me the ordering I want?

On the other hand, if I modify the strings by zero-padding the number, the comparison comes out the way I want:

>>> "u11-Phrase 0101.wav" <  "u11-Phrase 1000.wav"
True

However, those strings come from a directory listing such as:

import glob

files = glob.glob('*.wav')
files.sort()
for file in files:
    ...

So I'd rather not do surgical operations on the strings after they have been created by glob. And no, I don't want to change the original filenames in that folder either.

Any hints?

Emre Sevinç, asked Dec 21 '09 13:12


1 Answer

You are looking for human sorting.

The reason 101.wav is not less than 1000.wav is that computers (not just Python) sort strings character by character, and the first difference between these two strings is where the first string has a '1' and the second string has a '0'. '1' is not less than '0', so the strings compare as you have seen.
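
To make the character-by-character comparison concrete, here is a small sketch that finds the first position where the two filenames from the question differ. The variable names (a, b, first_diff, differing_pair) are just for illustration.

```python
a = "u11-Phrase 101.wav"
b = "u11-Phrase 1000.wav"

# Walk both strings in parallel and stop at the first differing position.
first_diff = None
for i, (x, y) in enumerate(zip(a, b)):
    if x != y:
        first_diff = i
        break

# The characters at that position decide the whole comparison.
differing_pair = (a[first_diff], b[first_diff])  # ('1', '0')

# '1' is not less than '0', so a < b is False.
result = a < b
```

Everything after that position, including the fact that one number has more digits than the other, is never looked at.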

People naturally parse those strings into their components, and interpret the numbers numerically, not lexically. The code I linked to above will do that same sort of parsing.
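
As a rough illustration of that parsing (a minimal sketch, not necessarily the same as the linked code), one common approach is to split each name into alternating text and digit runs and turn the digit runs into integers, then use that as a sort key. The function name natural_key is made up for this example, and it assumes a Python version whose sort accepts key=.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks for sorting.

    'u11-Phrase 101.wav' -> ['u', 11, '-Phrase ', 101, '.wav']
    """
    # re.split with a capturing group keeps the digit runs in the result,
    # so odd indices are always digit runs and even indices are plain text.
    parts = re.split(r'(\d+)', name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

files = [
    "u11-Phrase 1000.wav",
    "u11-Phrase 101.wav",
    "u11-Phrase 099.wav",
    "u11-Phrase 100.wav",
]

# The original strings are untouched; only the comparison key changes.
ordered = sorted(files, key=natural_key)
# ['u11-Phrase 099.wav', 'u11-Phrase 100.wav',
#  'u11-Phrase 101.wav', 'u11-Phrase 1000.wav']
```

With the glob listing from the question, this would be files.sort(key=natural_key) in place of the plain files.sort(), so neither the strings nor the files on disk need to change.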

Ned Batchelder, answered Oct 16 '22 12:10