I'm reading in a file and pulling in data that includes some strings and some numbers, in Python. I'm storing this information as lists of lists, like this:
dataList = [
['blah', 2, 3, 4],
['blahs', 6, 7, 8],
['blaher', 10, 11, 12],
]
I want to keep dataList sorted by the second element of the sub list: dataList[][1]
I thought I could use insort or bisect right when I want to add them in, but I cannot figure out how to make it look at the second element of the sub list.
Any thoughts here? I was just appending data to the end and then doing a linear sort to find things back later on. But, throw a few 10's of thousands of sub-lists in here and then search for 100k items and it takes a while.
You will use these extensively in your Python programming. One of the chief characteristics of a list is that it is ordered. The order of the elements in a list is an intrinsic property of that list and does not change, unless the list itself is modified.
The original list is not changed. It's most common to pass a list into the sorted() function, but in fact it can take as input any sort of iterable collection.
Depends on a few things, but the first thing that comes to mind is using the heapq module:
import heapq
heap = []
for row in rows:
heapq.heappush(heap, (row[1], row))
This would create a heap full of tuples, where the first element is the element you want to sort by, and the second element is the row.
The simplest method to read them back from the heap would be to copy it then pop items:
new_heap = list(heap)
while new_heap:
_, row = heapq.heappop(new_heap)
print row
The runtime of inserting each item into the heap is O(lg N)
, so creating the heap will require O(N lg N)
time, and popping items from the heap also requires O(lg N)
time, so O(N lg N)
time will be required to traverse it.
If these tradeoffs are not ideal, you could use a binary search tree (none exist in the standard library, but they are easy to find), or as other commenters have suggested, sort the rows after reading them: rows.sort(key=lambda row: row[1])
.
Now, in practice, unless you're dealing with a very large number of rows, it will almost certainly be faster to sort the list in-place after loading it (ie, using the .sort()
method)… So try a few things out and see what works best.
Finally, bisect
is a poor idea, because inserting into Python lists requires O(N)
time, so inserting items with bisect would require O(N lg N)
time per item, so a total time of O((N lg N) * N) = O(N**2)
time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With