
Sorting a list of strings with a specific locale in Python

I work on an application that handles text in several languages, so for viewing or reporting purposes some strings need to be sorted according to the rules of a specific language.

Currently I have a workaround messing with the global locale settings, which is bad, and I don't want to put it in production:

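import locale

# Collation locale in effect when this module is loaded.
default_locale = locale.getlocale(locale.LC_COLLATE)

def sort_strings(strings, locale_=None):
    if locale_ is None:
        return sorted(strings)

    # Workaround: switch the process-wide collation locale, sort, switch back.
    locale.setlocale(locale.LC_COLLATE, locale_)
    try:
        # strxfrm maps each string to a key that compares like strcoll.
        sorted_strings = sorted(strings, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, default_locale)

    return sorted_strings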

The official Python locale documentation explicitly says that saving and restoring the locale is a bad idea, but does not suggest an alternative: http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats

Asked by vonPetrushev on Jun 20 '12 at 14:06



1 Answer

You could use PyICU's collator to avoid changing global settings:

import icu # PyICU

def sorted_strings(strings, locale=None):
    if locale is None:
        return sorted(strings)
    collator = icu.Collator.createInstance(icu.Locale(locale))
    return sorted(strings, key=collator.getSortKey)

Example:

>>> L = [u'sandwiches', u'angel delight', u'custard', u'éclairs', u'glühwein']
>>> sorted_strings(L)
['angel delight', 'custard', 'glühwein', 'sandwiches', 'éclairs']
>>> sorted_strings(L, 'en_US')
['angel delight', 'custard', 'éclairs', 'glühwein', 'sandwiches']

Disadvantages: this adds a dependency on the PyICU library, and the resulting order can differ slightly from locale.strcoll.
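To see why the first call in the example above puts 'éclairs' last, it may help to look at what plain sorted() compares. The sketch below uses only the standard library and the same word list; the variable names are just for illustration.

```python
words = ['sandwiches', 'angel delight', 'custard', 'éclairs', 'glühwein']

# Without a collator, str comparison goes code point by code point.
# 'é' is U+00E9 (233), which is above every ASCII letter ('z' is 122),
# so 'éclairs' ends up after 'sandwiches'.
first_code_points = {word: ord(word[0]) for word in words}
default_order = sorted(words)

print(first_code_points)
print(default_order)
```

A collator (ICU's or the C library's) replaces this code point comparison with language-specific rules, which is why the 'en_US' result places 'éclairs' between 'custard' and 'glühwein'.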


I don't know of a way to get a locale.strxfrm function for a given locale name without changing the locale globally. As a hack, you could run your function in a separate child process:

import multiprocessing

pool = multiprocessing.Pool()
# ...
# locale_aware_sort sets the locale and sorts; it runs in a worker process
pool.apply(locale_aware_sort, [strings, loc])

Disadvantage: this might be slow and resource-hungry.
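A minimal, self-contained sketch of the child-process idea follows. Note that it swaps multiprocessing.Pool for the standard subprocess module, launching a fresh interpreter per call, so that nothing has to be pickled; that is a substitution, not the code from the answer. The names sort_in_child and _CHILD_SCRIPT are hypothetical, and the sketch assumes the requested locale is installed on the system.

```python
import json
import subprocess
import sys

# Script run by the child interpreter. It changes LC_COLLATE only inside
# that process, so the parent's locale settings are never touched.
_CHILD_SCRIPT = """
import json, locale, sys
request = json.load(sys.stdin)
locale.setlocale(locale.LC_COLLATE, request["locale"])
json.dump(sorted(request["strings"], key=locale.strxfrm), sys.stdout)
"""


def sort_in_child(strings, loc):
    """Sort strings by the collation rules of loc in a throwaway process."""
    # json.dumps escapes non-ASCII by default, so the pipes carry plain ASCII.
    payload = json.dumps({"strings": list(strings), "locale": loc})
    result = subprocess.run(
        [sys.executable, "-c", _CHILD_SCRIPT],
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # a failing child (e.g. unknown locale) raises here
    )
    return json.loads(result.stdout)
```

Calling it could look like sort_in_child(strings, 'de_DE.UTF-8'), provided that locale exists on the machine; if it does not, the locale.Error raised in the child surfaces in the parent as subprocess.CalledProcessError. Starting an interpreter for every sort is exactly the cost the disadvantage above refers to.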


Using an ordinary threading.Lock won't work unless you control every place where locale-aware functions could be called from multiple threads (such functions are not limited to the locale module; re, for example, is also affected).


You could compile your function with Cython and rely on the GIL to synchronize access. The GIL will make sure that no other Python code can be executed while your function is running.

Disadvantage: not pure Python

Answered by jfs on Oct 06 '22 at 01:10