Python: Priority queue doesn't keep order on same priority elements

I'm using Python's Queue.PriorityQueue and ran into the following problem: when inserting several elements with the same priority into the queue, I would expect the queue to serve them in order of insertion (FIFO). For some reason this is not the case:

>>> from Queue import PriorityQueue
>>>
>>> j1 = (1, 'job1')
>>> j2 = (1, 'job2')
>>> j3 = (1, 'job3')
>>> j4 = (1, 'job4')
>>> 
>>> q = PriorityQueue()
>>> q.put(j1)
>>> q.put(j2)
>>> q.put(j3)
>>> q.put(j4)
>>> q.queue
[(1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4')]
>>> q.get()
(1, 'job1')
>>> q.queue
[(1, 'job2'), (1, 'job4'), (1, 'job3')]

As can be seen from the example, the order has been mixed up after one get(). What's the reason? How can I overcome this (keep the order of elements with the same priority)?

EDIT:

I was asked to add an example showing that q.get() actually messes up the FIFO ordering, so here's a more elaborate example:

class Job(object):
    def __init__(self, type_, **data):
        self.type_ = type_
        self.priority = 0 if self.type_ == 'QUIT' else 1
        self.data = data

    def __cmp__(self, other):
        return cmp(self.priority, other.priority)

    def __repr__(self):
        return 'Job("' + self.type_ + '", data=' + repr(self.data) + ')' 

q = PriorityQueue()
q.put(Job('Build'))
q.put(Job('Clean'))
q.put(Job('QUIT'))
q.put(Job('Create'))
q.put(Job('Build'))
q.put(Job('Clean'))

Now I'll dequeue the elements one by one. The expected result: QUIT goes out first, and then the rest, FIFO ordered: Build, Clean, Create, Build, Clean:

>>> q.get()
Job("QUIT", data={})
>>> q.get()
Job("Build", data={})
>>> q.get()
Job("Clean", data={})
>>> q.get()
Job("Build", data={}) # <<---
>>> q.get()
Job("Clean", data={})
>>> q.get()
Job("Create", data={})
Asked by Omer Dagan, Dec 25 '17 14:12

2 Answers

Priority queues "are often implemented with heaps" and Python is no exception. As the documentation says, it's "using the heapq module". And heaps don't naturally offer stability. That's also why heapsort "is not a stable sort". If you want stability, you'll need to enforce it yourself. Fortunately it's as simple as storing entries "as 3-element list including the priority, an entry count, and the task".
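A minimal sketch of the entry-count idea, written for Python 3 (where the module is named queue rather than Queue). It uses tuples instead of the quoted 3-element lists, and the names counter and served are only illustrative:

```python
import itertools
from queue import PriorityQueue  # Python 3 name; in Python 2 the module is Queue

counter = itertools.count()  # monotonically increasing entry count
q = PriorityQueue()

jobs = [(1, 'Build'), (1, 'Clean'), (0, 'QUIT'),
        (1, 'Create'), (1, 'Build'), (1, 'Clean')]
for priority, task in jobs:
    # The count breaks ties between equal priorities in insertion order,
    # so the task itself is never compared.
    q.put((priority, next(counter), task))

served = []
while not q.empty():
    priority, count, task = q.get()
    served.append(task)

print(served)  # ['QUIT', 'Build', 'Clean', 'Create', 'Build', 'Clean']
```

Because every count is unique, two entries never compare equal, which is what makes the result deterministic.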

Note that you give Python's priority queue pairs of priority and task, but the queue doesn't care. It doesn't think of the two values as priority and task. It just thinks of the pair as one "item" and it never even looks into it. Only we the users think of the pair as priority and task. So you could also give it task strings alone, without extra priorities. The queue wouldn't even notice. It doesn't try to extract some priority. For its prioritization it just asks the whole item whether it's smaller than another. That's why, when you want to prioritize tasks not just by their natural order (e.g., the string 'job1' being smaller than the string 'job2'), you use a tuple of priority and task. Tuples are ordered lexicographically, so (a, b) is smaller than (c, d) if a is smaller than c or if they're equal and b is smaller than d. So when the queue asks such a tuple whether it's smaller than another, it's the tuple that looks into itself and considers the priority first and then potentially the task second.
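The lexicographic tuple comparison described above can be checked directly. This is just an illustration; the variable names are made up:

```python
a = (1, 'job2')
b = (2, 'job1')
c = (1, 'job3')

# The first elements differ, so they decide; the strings are not consulted.
first_decides = a < b

# The first elements are equal, so the second elements break the tie.
tie_broken = a < c

# Plain strings work as queue items too; they are ordered by their natural order.
plain_strings = sorted(['job3', 'job1', 'job2'])

print(first_decides, tie_broken, plain_strings)
# True True ['job1', 'job2', 'job3']
```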

Also, with q.queue you're inspecting the queue's underlying data structure. You shouldn't care about that. Not sure why it's even accessible. But if you do inspect it, you need to look at it as the heap it is, not think of it as a sorted list. It's not that "the order has been mixed" as you put it, it's that you misinterpreted that list.

Anyway... the order you should instead care about is the order you actually get. With q.get(). If you just get all four items of that example with q.get(), you'll see that it does give them to you in your insertion order. Although that's because you're inserting them in sorted order and they only have one possible order, as there are no equal items. You'll get (1, 'job1') first not because it was inserted first but because it's the smallest of the four tuples (because the priorities are the same and 'job1' is the smallest of the four strings). And you'll get (1, 'job2') second not because it was inserted second but because it's the second-smallest item. And so on. If you inserted them in any other order, you'd still get them in order (1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4').
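To see that the result depends on the items and not on the insertion order, here is a small sketch (Python 3 module name assumed) that inserts the same four tuples in a scrambled order:

```python
from queue import PriorityQueue  # Python 3 name; in Python 2 the module is Queue

q = PriorityQueue()
# Deliberately not inserted in sorted order.
for item in [(1, 'job3'), (1, 'job1'), (1, 'job4'), (1, 'job2')]:
    q.put(item)

# Drain the queue through its public interface rather than peeking at q.queue.
result = [q.get() for _ in range(4)]
print(result)
# [(1, 'job1'), (1, 'job2'), (1, 'job3'), (1, 'job4')]
```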

About your added example: Your Job objects only compare themselves by their priority. And those Build, Clean, Create, Build and Clean objects all have the same priority. So as far as the queue can tell, they're all equal! That's not like your first example, where your four tuples only allow one possible order. So we're back at what I said at the start: heaps don't naturally offer stability, and if you want stability, you should add an entry count. Check out the explanation and recipe I linked there. It uses a list as heap and uses heapq functions, but you can easily adapt it to use a PriorityQueue instead. Though instead of those separate top-level helper functions, it may be better to define your own StablePriorityQueue class, as a subclass or wrapper of PriorityQueue.
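One way such a StablePriorityQueue might look, as a rough sketch rather than a definitive implementation. It assumes Python 3 and assumes callers pass (priority, task) pairs; the class body and the _counter attribute are hypothetical, not part of the standard library:

```python
import itertools
from queue import PriorityQueue  # Python 3 name; in Python 2 the module is Queue


class StablePriorityQueue(PriorityQueue):
    """Hypothetical subclass: equal priorities are served in insertion order."""

    def __init__(self, maxsize=0):
        super().__init__(maxsize)
        self._counter = itertools.count()  # entry count used as tie-breaker

    def put(self, item, block=True, timeout=None):
        priority, task = item  # assumption: items are (priority, task) pairs
        entry = (priority, next(self._counter), task)
        super().put(entry, block, timeout)

    def get(self, block=True, timeout=None):
        priority, _count, task = super().get(block, timeout)
        return priority, task  # hide the entry count from the caller


q = StablePriorityQueue()
for name in ['Build', 'Clean', 'QUIT', 'Create', 'Build', 'Clean']:
    q.put((0 if name == 'QUIT' else 1, name))

order = [q.get()[1] for _ in range(6)]
print(order)  # ['QUIT', 'Build', 'Clean', 'Create', 'Build', 'Clean']
```

Since the entry count is unique, the task is never compared, so tasks do not need to define any ordering of their own.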

Answered by Stefan Pochmann, Oct 13 '22 10:10



As explained here, the Python PriorityQueue is implemented with a binary heap.

A binary heap (here, a min-heap) is a binary tree where each node's value is less than or equal to the values of both its children. Hence in a binary heap the root always contains the minimum value. Once you remove the minimum node, the heap is reorganized so that the basic heap property is still in effect.

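A heap is usually implemented using an array. With 0-based indexing, as used by Python's heapq, a[k] is the parent of a[2*k+1] and a[2*k+2] (with 1-based indexing, the children are a[2*k] and a[2*k+1]). In Python, q.queue is this array. After you remove an element from the heap, the array is reordered in a way that restores the heap property but doesn't necessarily preserve the original order.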
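The array layout can be checked with a short sketch. It works on a plain list with heapq, where indexing is 0-based, so the parent of a[k] sits at a[(k - 1) // 2]. The helper is_min_heap is a made-up name for illustration:

```python
import heapq


def is_min_heap(a):
    """Check that every element is >= its parent in the 0-based array layout."""
    return all(a[(k - 1) // 2] <= a[k] for k in range(1, len(a)))


heap = []
for item in [5, 3, 8, 1, 9, 2]:
    heapq.heappush(heap, item)

valid_before = is_min_heap(heap)
smallest = heapq.heappop(heap)   # removes the root, then restores the heap property
valid_after = is_min_heap(heap)

print(valid_before, smallest, valid_after)  # True 1 True
```

The list is a valid heap both before and after the pop, but nothing requires it to be sorted or to keep its earlier arrangement.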

Answered by zmbq, Oct 13 '22 08:10
