
Can I build a list, and sort it at the same time?

I'm working on a script for a piece of software that doesn't give me direct access to the data I need. Instead, I have to query for each piece of information and build a list from the results. For various reasons, the list needs to be sorted. It's very easy to build the list, sort it, and then do stuff with it. However, I assume it would be faster to run through everything once, rather than build the list and then sort it.

So, at the moment I've basically got this:

my_list = []

for item in "query for stuff":
    my_list.append("query for %s data" % item)

my_list.sort()

do_stuff(my_list)

The "query for stuff" bit is the query interface with the software, which will give me an iterable. my_list needs to contain a list of data from the contents of said iterable. By doing it like this, I'm querying for the first list, then looping over it to extract the data and put it into my_list. Then I'm sorting it. Lastly, I'm doing stuff to it with the do_stuff() method, which will loop over it and do stuff to each item.

The problem is that I can't do_stuff() to it before it's sorted, as the list order is important for various reasons. I don't think I can get away from having to loop over lists twice (once to build the list and once to do stuff to each item in it), as we won't know in advance if a recently added item at position N will stay at position N after we've added the next item. Still, it seems cleaner to insert each item in a sorted fashion, rather than just appending them at the end. Kind of like this:

for item in "query for stuff":
    my_list.append_sorted(item)

Is it worth bothering trying to do it like this, or should I just stick to building the list, and then sorting it?

Thanks!

Simon Lundberg asked Nov 05 '11 15:11



2 Answers

The short answer is: it's not worth it.

Have a look at insertion sort. The worst-case running time is O(n^2) (average case is also quadratic). On the other hand, Python's sort (also known as Timsort) will take O(n log n) in the worst case.
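To make the quadratic worst case concrete, here is a minimal sketch that builds a list with a hand-written insertion sort and counts how many existing elements have to move. The function name `insertion_sort_shifts` and the reverse-sorted test input are made up for illustration; they are not part of the question's software.

```python
def insertion_sort_shifts(values):
    """Insertion-sort a copy of values; return (sorted_list, shifts)."""
    result = []
    shifts = 0
    for value in values:
        # Walk left from the end until the right slot for value is found.
        pos = len(result)
        while pos > 0 and result[pos - 1] > value:
            pos -= 1
        # Everything from pos onward has to move one slot to the right.
        shifts += len(result) - pos
        result.insert(pos, value)
    return result, shifts


worst_case = list(range(1000, 0, -1))  # reverse-sorted: every insert lands at index 0
sorted_list, shifts = insertion_sort_shifts(worst_case)
print(shifts)  # n * (n - 1) / 2 = 499500 for n = 1000
```

Doubling the input size roughly quadruples the shift count in this worst case, which is what O(n^2) means in practice.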

Yes, it does "seem" cleaner to keep the list sorted as you're inserting, but that's a fallacy. There is no real benefit to it. The only time you'd consider using insertion sort is when you need to show the sorted list after every insertion.
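For the case where the list really does have to be sorted after every insertion, the standard library's `bisect.insort` is probably the closest thing to the hypothetical `append_sorted` from the question. A minimal sketch, with made-up sample values standing in for the query results:

```python
import bisect

incoming = [5, 1, 4, 2, 3]  # made-up sample values, not real query results
kept_sorted = []
snapshots = []

for value in incoming:
    # insort finds the slot by binary search, then inserts in place.
    bisect.insort(kept_sorted, value)
    snapshots.append(list(kept_sorted))  # the list is sorted at every step
    print(kept_sorted)
```

Each printed line is already in order, which is the one thing the build-then-sort approach cannot offer.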

mpenkov answered Sep 21 '22 14:09



The two approaches are asymptotically equivalent.

Sorting is O(n lg n) (Python uses Timsort by default, except for very small arrays), and inserting in a sorted list is O(lg n) (using binary search), which you would have to do n times.
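The O(lg n) figure covers finding the insertion point. A rough way to see it is to count comparisons during a binary search with `bisect.bisect_left`. The `Counted` wrapper class below is a hypothetical helper written just for this sketch.

```python
import bisect


class Counted:
    """Wraps a value and counts every '<' comparison made on it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


sorted_items = [Counted(v) for v in range(1024)]  # already in order
position = bisect.bisect_left(sorted_items, Counted(700))

# position is 700; the comparison count is on the order of
# log2(1024) = 10, not 1024.
print(position, Counted.comparisons)
```

This only measures the search. It says nothing about the cost of actually placing the item once the position is known.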

In practice, one method or the other may be slightly faster, depending on how much of your data is already sorted.

EDIT: I assumed that inserting in the middle of a sorted list after you've found the insertion point would be constant time (i.e. the list behaved like a linked list, which is the data structure you would use for such an algorithm). This probably isn't the case with Python lists, as pointed out by Sven. This would make the "keep the list sorted" approach O(n^2), i.e. insertion sort.

I say "probably" because some list implementations switch from array to linked list as the list grows, the most notable example being CFArray/NSArray in CoreFoundation/Cocoa. This may or may not be the case with Python.
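For what it's worth, CPython's `list` is array-backed at every size, so an insert in the middle has to shift the elements after it. A small timing harness is one way to check the effect on a given machine. This is only a sketch: the function names `build_then_sort` and `build_sorted`, the random input, and the sizes are all assumptions for illustration, and the timings will vary.

```python
import bisect
import random
import timeit


def build_then_sort(items):
    """Append everything, then sort once."""
    result = []
    for item in items:
        result.append(item)
    result.sort()
    return result


def build_sorted(items):
    """Keep the list sorted after every insertion."""
    result = []
    for item in items:
        bisect.insort(result, item)
    return result


random.seed(0)
data = [random.random() for _ in range(20_000)]

# Both approaches must produce the same list.
assert build_then_sort(data) == build_sorted(data)

for func in (build_then_sort, build_sorted):
    seconds = timeit.timeit(lambda: func(data), number=5)
    print(f"{func.__name__}: {seconds:.3f} s")
```

Growing the input size should make any gap between the two more visible, since the shifting cost grows with the length of the list.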

Can Berk Güder answered Sep 19 '22 14:09
