 

Is Python's sort function the same as Linux's sort with LC_ALL=C?

I'm porting a Bash script to Python. The script sets LC_ALL=C and uses the Linux sort command to ensure the native byte order instead of locale-specific sort orders (http://stackoverflow.com/questions/28881/why-doesnt-sort-sort-the-same-on-every-machine).

In Python, I want to use Python's list sort() or sorted() functions (without the key= option). Will I always get the same results as Linux sort with LC_ALL=C?

tahoar asked Jan 08 '12 10:01



2 Answers

In Python 2, sorting should behave as you expect if you pass locale.strcoll as the cmp argument to list.sort() or sorted():

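import locale
# Switch to the "C" locale so strcoll compares by raw character value.
locale.setlocale(locale.LC_ALL, "C")
# Python 2 only: the cmp argument was removed in Python 3.
yourList.sort(cmp=locale.strcoll)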

In Python 3, where the cmp argument no longer exists, wrap the comparison function with functools.cmp_to_key (from this answer):

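import locale
from functools import cmp_to_key
# Switch to the "C" locale so strcoll compares by raw character value.
locale.setlocale(locale.LC_ALL, "C")
# cmp_to_key adapts the old-style comparison function to a key function.
yourList.sort(key=cmp_to_key(locale.strcoll))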
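To see how the locale-based sort relates to a plain sorted() call, here is a minimal sketch assuming Python 3 and ASCII-only input. The helper name c_locale_sorted and the sample words are made up for illustration. It switches only LC_COLLATE to "C" and restores the previous setting afterwards, so the rest of the program keeps its locale.

```python
import locale
from functools import cmp_to_key


def c_locale_sorted(lines):
    """Sort strings with locale.strcoll under the "C" collation locale.

    Hypothetical helper: restores the previous LC_COLLATE setting on exit.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    locale.setlocale(locale.LC_COLLATE, "C")
    try:
        return sorted(lines, key=cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# ASCII-only sample data: digits < uppercase < underscore < lowercase.
words = ["banana", "Zebra", "apple", "_under", "10", "9"]

by_strcoll = c_locale_sorted(words)
by_default = sorted(words)

print(by_strcoll)
print(by_strcoll == by_default)
```

For this ASCII sample both calls give the same order, which is also the order that sort produces under LC_ALL=C.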
Frédéric Hamidi answered Oct 05 '22 23:10



Since you can supply a comparison function, you can make sure that the sort is equivalent to LC_ALL=C. From the docs, though, it looks like plain sort() compares strings by character value and does not consult the locale, so it already sorts in this manner by default, at least when all the characters are 7-bit.

If you have 8-bit or Unicode characters, locale-specific sorting makes a lot of sense.
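For non-ASCII data, a quick check shows how plain sorted() relates to byte order. This is a minimal sketch assuming Python 3 and UTF-8 text. The helper name byte_order_sorted and the sample words are hypothetical. Python 3 compares str values by code point, and UTF-8 preserves code point order in its byte encoding, so sorting by the encoded bytes (the comparison sort uses under LC_ALL=C) should give the same order as the default sort.

```python
def byte_order_sorted(lines):
    """Sort strings by their UTF-8 byte sequences (hypothetical helper)."""
    return sorted(lines, key=lambda s: s.encode("utf-8"))


# Mixed ASCII and non-ASCII sample data (assumed to be precomposed characters).
words = ["zebra", "Émile", "apple", "éclair", "Zulu", "日本"]

by_code_point = sorted(words)              # default: compares code points
by_utf8_bytes = byte_order_sorted(words)   # explicit: compares raw bytes

print(by_code_point)
print(by_code_point == by_utf8_bytes)
```

This does not hold for other encodings such as Latin-1 files read as UTF-8, or for Python 2 unicode versus str mixes, so test it against your real input.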

Petesh answered Oct 05 '22 23:10
