Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Memory efficient sort of a list of tuples by two elements

I have a very large list of tuples that I would like to sort by two elements. For example:

List = [('chr1', 34234, 'extrainfo'), ('chr1', 1234, 'extrainfo'), ('chr3', 4234, 'extrainfo'), ('chr1', 3241, 'extrainfo')]

This is a really large list and I wanted to sort using:

List = sorted(List, key=lambda i: (i[0], int[1])))

This works well when using smaller lists such as the above example. However, when I run my code using my much larger datasets I get memory errors:

Python(32306) malloc: *** mmap(size=34684928) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "MyCode.py", line 139, in <module>
    List = sorted(List, key=lambda i: (i[0], int(i[1])))
MemoryError
like image 855
user2165857 Avatar asked Apr 30 '14 21:04

user2165857


People also ask

How do you sort a list of tuples based on two values?

Method #1: Using the Bubble Sort Using the technique of Bubble Sort to we can perform the sorting. Note that each tuple is an element in the given list. Access the second element of each tuple using the nested loops. This performs the in-place method of sorting.

How do you sort two tuples in Python?

Using sorted() In Python, use the sorted() built-in function to sort a Tuple. The tuple should be passed as an argument to the sorted() function. The tuple items are sorted (by default) in ascending order in the list returned by the function. We can use a tuple to convert this list data type to a tuple ().

Which is more efficient sort () or sorted ()?

sort is slightly faster than sorted and consumes around 24% less memory. However, keep in mind that list. sort is only implemented for lists, whereas sorted accepts any iterable.


1 Answers

Some things you can try, roughly in order of difficulty/desirability.

  • Don't create a sorted copy of the list using sorted(). Instead, sort the list in place using List.sort().

  • Sort the list twice, first with key=lambda i: i[1] and then with key=lambda i: i[0]. This will take longer, but the list of keys will require less space on each pass. Python`s sort is guaranteed stable in v2.2 and later. Sorting on the keys in the reversed order of their importance is the way we used to do it back when we could only sort on one key at a time.

  • Don't use a key function at all. Sorting by the items in a tuple in order is the default behavior! You don't care about the order of the third and subsequent items, so why not just let Python go ahead and sort on them? They'll be in order too, but that's as good as any order. (This won't work if the other elements are types that don't support comparison.)

  • Use a cmp function rather than a key function if your version of Python is old enough to support it. This will avoid generating a list of keys, but will be slower and won't work in Python 3.

  • Use a 64-bit version of Python on a 64-bit OS on a machine with plenty of memory.

  • Implement your own sort.

like image 146
kindall Avatar answered Nov 15 '22 00:11

kindall