I'm porting a Bash script to Python. The script sets LC_ALL=C and uses the Linux sort command to ensure the native byte order instead of locale-specific sort orders (http://stackoverflow.com/questions/28881/why-doesnt-sort-sort-the-same-on-every-machine).
In Python, I want to use Python's list.sort() or sorted() functions (without the key= option). Will I always get the same results as Linux sort with LC_ALL=C?
In Python 2, sorting should behave as you expect if you pass locale.strcoll as the cmp argument to list.sort() and sorted():
import locale
locale.setlocale(locale.LC_ALL, "C")  # C locale: collate by plain character values
yourList.sort(cmp=locale.strcoll)  # Python 2 only; cmp= was removed in Python 3
In Python 3 the cmp argument no longer exists, so wrap locale.strcoll with functools.cmp_to_key (from this answer):
import locale
from functools import cmp_to_key
locale.setlocale(locale.LC_ALL, "C")  # C locale: collate by plain character values
yourList.sort(key=cmp_to_key(locale.strcoll))  # adapt the old-style cmp function to a key
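A self-contained Python 3 sketch of the same idea, for trying it out. The word list is made-up sample data, and the locale.strxfrm key is an added alternative (a standard function in the locale module), not something the answer itself uses. For ASCII input in the C locale, all three orderings should coincide.

```python
import locale
from functools import cmp_to_key

# The "C" locale is always available, so this call should not raise.
locale.setlocale(locale.LC_ALL, "C")

# Hypothetical sample data: mixed case, digits and punctuation.
words = ["banana", "Apple", "apple", "_underscore", "Banana", "10", "9"]

# Python 3 has no cmp= argument, so wrap strcoll with cmp_to_key.
by_strcoll = sorted(words, key=cmp_to_key(locale.strcoll))

# strxfrm maps each string to a key that sorts the way strcoll compares,
# so no Python-level comparison call is needed per pair.
by_strxfrm = sorted(words, key=locale.strxfrm)

# Plain sorted() compares by code point and ignores the locale entirely.
by_default = sorted(words)

print(by_default)
# ['10', '9', 'Apple', 'Banana', '_underscore', 'apple', 'banana']
print(by_strcoll == by_default, by_strxfrm == by_default)  # True True
```

Note that setlocale() changes process-wide state. Only LC_COLLATE affects strcoll and strxfrm, so in a larger program you could set just that category, saving the previous value first (locale.setlocale(locale.LC_COLLATE) with no second argument returns it) and restoring it afterwards.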
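Since you can supply a comparison function, you can make sure the sort is equivalent to LC_ALL=C. Without one, though, the default comparison does not consult the locale at all: text strings compare by Unicode code point and byte strings compare byte by byte. For 7-bit ASCII that is exactly the LC_ALL=C order, and because UTF-8 byte order matches code-point order, it also matches LC_ALL=C sort on UTF-8 text. Locale-specific ordering only happens when you ask for it, e.g. through strcoll.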
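A quick way to check how the default ordering relates to LC_ALL=C, as a sketch. It assumes the text is valid Unicode (no lone surrogates) and that no line contains a newline, since sort works line by line. The sample list and the system_c_sort helper are hypothetical; the helper shells out to whatever sort is on PATH and is skipped if there is none.

```python
import os
import shutil
import subprocess


def system_c_sort(lines):
    """Sort lines with the external sort(1) under LC_ALL=C.

    Returns None when no sort binary is on PATH. Assumes no line
    contains a newline, because sort(1) splits its input on newlines.
    """
    if shutil.which("sort") is None:
        return None
    env = dict(os.environ, LC_ALL="C")
    data = "".join(line + "\n" for line in lines).encode("utf-8")
    result = subprocess.run(["sort"], input=data, stdout=subprocess.PIPE,
                            env=env, check=True)
    # Drop the empty string left over after the final newline.
    return result.stdout.decode("utf-8").split("\n")[:-1]


# Hypothetical sample mixing case and non-ASCII characters.
sample = ["zebra", "Zebra", "éclair", "eclair", "Ωmega", "apple", "Apple"]

# Default sorted() on str: plain code-point order, no locale involved.
by_code_point = sorted(sample)

# Sorting the UTF-8 encoded bytes gives the same sequence.
by_utf8_bytes = [b.decode("utf-8") for b in sorted(s.encode("utf-8") for s in sample)]

print(by_code_point)
# ['Apple', 'Zebra', 'apple', 'eclair', 'zebra', 'éclair', 'Ωmega']
print(by_code_point == by_utf8_bytes)  # True

external = system_c_sort(sample)
if external is not None:
    print(external == by_code_point)
```

This only covers a plain sort invocation; options such as -f, -n or -k change the comparison. If the original script handled bytes that are not valid UTF-8, sorting the bytes objects directly (for example, lines read from a file opened in "rb" mode) keeps the same byte order without decoding.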
If you have 8-bit or Unicode characters, locale-specific sorting makes a lot of sense.