I have a very large list of tuples that I would like to sort by two elements. For example:
List = [('chr1', 34234, 'extrainfo'), ('chr1', 1234, 'extrainfo'), ('chr3', 4234, 'extrainfo'), ('chr1', 3241, 'extrainfo')]
This is a really large list and I wanted to sort using:
List = sorted(List, key=lambda i: (i[0], int(i[1])))
This works well when using smaller lists such as the above example. However, when I run my code using my much larger datasets I get memory errors:
Python(32306) malloc: *** mmap(size=34684928) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
File "MyCode.py", line 139, in <module>
List = sorted(List, key=lambda i: (i[0], int(i[1])))
MemoryError
Method #1: Using Bubble Sort. We can sort the list with the Bubble Sort technique. Each tuple is one element of the given list; nested loops access the second element of each tuple and swap adjacent tuples that are out of order. This sorts the list in place.
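As a rough illustration of the bubble sort approach described above, here is a minimal sketch that orders the example tuples by their second element. The function name bubble_sort_by_second and the variable records are made up for this example. Bubble sort needs a number of comparisons that grows with the square of the list length, so it is shown only to make the idea concrete, not as a fix for the very large list in the question.

```python
def bubble_sort_by_second(items):
    """Sort a list of tuples in place by each tuple's second element.

    Hypothetical helper for illustration only.
    """
    n = len(items)
    for i in range(n):
        swapped = False
        # After pass i, the last i elements are already in their final place.
        for j in range(n - 1 - i):
            if items[j][1] > items[j + 1][1]:
                # Swap adjacent tuples that are out of order.
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:
            break  # a pass with no swaps means the list is sorted
    return items


records = [
    ('chr1', 34234, 'extrainfo'),
    ('chr1', 1234, 'extrainfo'),
    ('chr3', 4234, 'extrainfo'),
    ('chr1', 3241, 'extrainfo'),
]
bubble_sort_by_second(records)
```

After the call, records itself has been reordered; no second list is created.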
Using sorted(). In Python, the built-in sorted() function can sort a tuple. Pass the tuple as the argument to sorted(); the function returns a new list with the items in ascending order by default. To get a tuple back, pass that list to tuple().
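A small sketch of the round trip described above. The variable names positions, as_list and as_tuple are chosen here for illustration.

```python
positions = (34234, 1234, 4234, 3241)

# sorted() accepts any iterable and always returns a new list.
as_list = sorted(positions)

# Convert back if a tuple is required.
as_tuple = tuple(as_list)
```

The original tuple is left unchanged, because tuples are immutable and sorted() never modifies its input.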
list.sort is slightly faster than sorted and consumes around 24% less memory. However, keep in mind that list.sort is only implemented for lists, whereas sorted accepts any iterable.
Some things you can try, roughly in order of difficulty/desirability.
Don't create a sorted copy of the list using sorted(). Instead, sort the list in place using List.sort().
Sort the list twice, first with key=lambda i: i[1] and then with key=lambda i: i[0]. This will take longer, but the list of keys will require less space on each pass. Python's sort is guaranteed stable in v2.2 and later. Sorting on the keys in the reversed order of their importance is the way we used to do it back when we could only sort on one key at a time.
Don't use a key function at all. Sorting by the items in a tuple in order is the default behavior! You don't care about the order of the third and subsequent items, so why not just let Python go ahead and sort on them? They'll be in order too, but that's as good as any order. (This won't work if the other elements are types that don't support comparison.)
Use a cmp function rather than a key function if your version of Python is old enough to support it. This will avoid generating a list of keys, but will be slower and won't work in Python 3.
Use a 64-bit version of Python on a 64-bit OS on a machine with plenty of memory.
Implement your own sort.
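The first three suggestions above might look something like this on the small example list from the question. This is only a sketch: make_records is a helper invented here so that each variant starts from the same unsorted data, and whether any variant fits in memory on the real dataset still has to be measured.

```python
def make_records():
    """Return a fresh copy of the example data (hypothetical helper)."""
    return [
        ('chr1', 34234, 'extrainfo'),
        ('chr1', 1234, 'extrainfo'),
        ('chr3', 4234, 'extrainfo'),
        ('chr1', 3241, 'extrainfo'),
    ]


# Suggestion 1: sort in place instead of building a sorted copy.
in_place = make_records()
in_place.sort(key=lambda i: (i[0], int(i[1])))

# Suggestion 2: two passes with single-value keys. The sort is stable,
# so sort by the less important key first, then by the more important one.
two_pass = make_records()
two_pass.sort(key=lambda i: int(i[1]))
two_pass.sort(key=lambda i: i[0])

# Suggestion 3: no key at all. Tuples compare element by element.
# This matches the others only if the second element is already an int;
# strings would compare lexicographically ('34234' < '4234').
no_key = make_records()
no_key.sort()
```

On this data all three variants produce the same order: the three chr1 entries by ascending position, followed by the chr3 entry.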