I've been trying to performance optimize a BFS implementation in Python and my original implementation was using deque to store the queue of nodes to expand and a dict to store the same nodes so that I would have efficient lookup to see if it is already open.
I attempted to optimize (simplicity and efficiency) by moving to an OrderedDict. However, this takes significantly more time. 400 sample searches done take 2 seconds with deque/dict and 3.5 seconds with just an OrderedDict.
My question is, if OrderedDict does the same functionality as the two original data structures, should it not at least be similar in performance? Or am I missing something here? Code examples below.
Using just an OrderedDict:
open_nodes = OrderedDict() closed_nodes = {} current = Node(start_position, None, 0) open_nodes[current.position] = current while open_nodes: current = open_nodes.popitem(False)[1] closed_nodes[current.position] = (current) if goal(current.position): return trace_path(current, open_nodes, closed_nodes) # Nodes bordering current for neighbor in self.environment.neighbors[current.position]: new_node = Node(neighbor, current, current.depth + 1) open_nodes[new_node.position] = new_node
Using both a deque and a dictionary:
open_queue = deque() open_nodes = {} closed_nodes = {} current = Node(start_position, None, 0) open_queue.append(current) open_nodes[current.position] = current while open_queue: current = open_queue.popleft() del open_nodes[current.position] closed_nodes[current.position] = (current) if goal_function(current.position): return trace_path(current, open_nodes, closed_nodes) # Nodes bordering current for neighbor in self.environment.neighbors[current.position]: new_node = Node(neighbor, current, current.depth + 1) open_queue.append(new_node) open_nodes[new_node.position] = new_node
Intent signaling: If you use OrderedDict over dict , then your code makes it clear that the order of items in the dictionary is important. You're clearly communicating that your code needs or relies on the order of items in the underlying dictionary.
No it won't become redundant in Python 3.7 because OrderedDict is not just a dict that retains insertion order, it also offers an order dependent method, OrderedDict. move_to_end() , and supports reversed() iteration*. Two relevant questions here and here. Thank you very much for the explanation.
An OrderedDict is a dictionary subclass in which the order of the content added is maintained.
Both deque and dict are implemented in C and will run faster than OrderedDict which is implemented in pure Python.
The advantage of the OrderedDict is that it has O(1) getitem, setitem, and delitem just like regular dicts. This means that it scales very well, despite the slower pure python implementation.
Competing implementations using deques, lists, or binary trees usually forgo fast big-Oh times in one of those categories in order to get a speed or space benefit in another category.
Update: Starting with Python 3.5, OrderedDict() now has a C implementation. And though it hasn't been highly optimized like some of the other containers. It should run much faster than the pure python implementation. Then starting with Python 3.6, regular dictionaries has been ordered (though the ordering behavior is not yet guaranteed). Those should run faster still :-)
Like Sven Marnach said, OrderedDict
is implemented in Python, I want to add that it is implemented using dict
and list
.
dict
in python is implemented as hashtable. I am not sure how deque
is implemented, but documentation says that deque
is optimized for quick adding or accessing first/last elements, so I guess that deque
is implemented as linked-list.
I think when you do pop
on OrderedDict, python does hashtable look-up which is slower compared to linked-list which has direct pointers to last and first elements. Adding an element to the end of linked-list is also faster compared with hash-table.
So primary cause why OrderDict
in your example is slower, is because it is faster to access last element from linked-list, than to access any element using hash-table.
My thoughts are based on information from book Beautiful Code, it describes implementation details behind dict
, however I do not know much details behind list
and deque
, this answer is just my intuition of how things work, so in case I am wrong, I really deserve down-votes for talking things which I am not sure about. Why I talk things on which I am not sure? -Because I want to test my intuition :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With