I am looking for a linked list and related algorithms implementation for Python. Everyone I ask just recommends using built in Python lists, but performance measurements indicate that list insertion and removal is a bottleneck for our application. It's trivial to implement a simple linked list, but I wonder if there is a mature library which includes some operations like sort, merge, splice, search, lower/upper bound, etc...
I know this is a dupe, but searching for python list on any search engine gives predictably poor results, with most people just saying that linked lists are unneeded in python (pfft!).
PS: I need to insert and remove from anywhere in the list, not just the ends.
OK, you asked for it: I need to maintain an ordered list of several hundred thousand entries. I will iterate over the list forwards (one by one), using a visitor on each entry, starting from the beginning or a position found by a binary search. When an entry matching a predicate is found it is removed from the list, and then, another binary search is performed on a subset of the list beginning from the removed entry's previous position, until a position determined statistically beforehand. Ignoring the error condition, the modified entry may be used to create another linked list which is spliced into the new position found through the second binary search. Iteration is continued from the position where the entry was removed. On occasion, several thousand contiguous ordered entries may be added to/removed from any place in the list. Sometimes several thousand non-contiguous entries must be searched for and removed incrementally.
python's list is unnacceptable as the cost of insertion/removal is prohibitive, and the minor gains in speed for the binary search are totally irrelevant to the total cost. Our in house tests confirm this.
if I have neglected any detail perhaps I can e-mail you a copy of my company's non-disclosure agreement and I can privately correspond with you on the matter. sarcasm.end().
Here's a blog post sharing your pain. It includes an implementation of a linked list and a performance comparison.
Perhaps blist
would be better, though (from here)?
The use cases where the BList is slightly slower than Python's list are as follows (O(log n) vs. O(1)):
- A large list that never changes length.
- A large lists where inserts and deletes are only at the end of the list (LIFO).
With that disclaimer out of the way, here are some of the use cases where the BLists is dramatically faster than the built-in list:
- Insertion into or removal from a large list (O(log n) vs. O(n))
- Taking large slices of large lists (O(log n) vs O(n))
- Making shallow copies of large lists (O(1) vs. O(n))
- Changing large slices of large lists (O(log n + log k) vs. O(n + k))
- Multiplying a list to make a large, sparse list (O(log k) vs. O(kn))
Note that it's actually implemented as a B+ tree, allowing great performance for all those operations.
Python lists are O(1) for operations at the end of the list. If you'll be doing all your inserting in a semi-sequential fashion--by analogy to C, only keeping a single pointer into the middle of the list as a "cursor" of sorts--you could save yourself a lot of effort by just using two Python lists. One list for what's before the cursor, one for what's after; moving the cursor involves pulling the next item from one list and appending it to the other. That gives you arbitrary O(1) inserting at the cursor location with far less effort and wheel reinventing than making an entire new data structure, letting you reuse a lot of the existing list functions.
For the fully general case allowing multiple references into the list, though, you're probably stuck making a linked list of some sort.
Edit: You're not seriously thinking of actually doing a "binary search" on a linked list, are you? Binary search doesn't even make sense on an intrinsically sequential data structure...
Anyway, if you're okay with linear-time searching and your insertions will always preserve the list order without re-sorting, then a simple linked list might be all you need. If you do as much searching as iterating you should consider something with fast indexing, and if resorting might be required something like a tree would be better.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With