Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are heap queues for?

Tags:

python

Reading Guido's infamous answer to the question Sorting a million 32-bit integers in 2MB of RAM using Python, I discovered the module heapq.

I also discover I didn't understand jack about it, nor did I know what I could do with it.

Can you explain to me (with the proverbial 6 years old target) what is the heap queue algorithm for and what you can do with it ?

Can you provide a simple Python snippet where using it (with the heapq module) solves a problem that will be better solved with it and not with something else ?

like image 370
e-satis Avatar asked Dec 09 '12 14:12

e-satis


People also ask

What is the purpose of heaps?

A heap is a binary tree data structure (see BinaryTrees) in which each element has a key (or sometimes priority) that is less than the keys of its children. Heaps are used to implement the priority queue abstract data type (see AbstractDataTypes), which we'll talk about first.

Why is heap a priority queue?

The priority queue is the queue data structure and the heap is the tree data structure that operates and organizes data. The priority queue is based on a queue data structure working as a queue with a priority function. The heap is a tree data structure uses for sorting data in a specific order using an algorithm.

Where is heap used in real life?

Taking an example from real life —Heap Sorting can be applied to a sim card store where there are many customers in line. The customers who have to pay bills can be dealt with first because their work will take less time. This method will save time for many customers in line and avoid unnecessary waiting.


2 Answers

heapq implements binary heaps, which are a partially sorted data structure. In particular, they have three interesting operations:

  • heapify turns a list into a heap, in-place, in O(n) time;
  • heappush adds an element to the heap in O(lg n) time;
  • heappop retrieves the smallest element off the heap in O(lg n) time.

Many interesting algorithms rely on heaps for performance. The simplest one is probably partial sorting: getting the k smallest (or largest) elements of a list without sorting the entire list. heapq.nsmallest (nlargest) does that. The implementation of nlargest can be paraphrased as:

def nlargest(n, l):
    # make a heap of the first n elements
    heap = l[:n]
    heapify(heap)

    # loop over the other len(l)-n elements of l
    for i in xrange(n, len(l)):
        # push the current element onto the heap, so its size becomes n+1
        heappush(heap, l[i])
        # pop the smallest element off, so that the heap will contain
        # the largest n elements of l seen so far
        heappop(heap)

    return sorted(heap, reverse=True)

Analysis: let N be the number of elements in l. heapify is run once, for a cost of O(n); that's negligible. Then, in a loop running N-n = O(N) times, we perform a heappop and a heappush at O(lg n) cost each, giving a total running time of O(N lg n). When N >> n, this is a big win compared to the other obvious algorithm, sorted(l)[:n], which takes O(N lg N) time.

like image 199
Fred Foo Avatar answered Nov 15 '22 18:11

Fred Foo


For example: you have a set of 1000 floating-point number. You want to repeatedly remove the smallest item from the set and replace it with a random number between 0 and 1. The fastest way to do it is with the heapq module:

heap = [0.0] * 1000
# heapify(heap)   # usually you need this, but not if the list is initially sorted
while True:
    x = heappop(heap)
    heappush(head, random.random())

This takes a time per iteration that is logarithmic in the length of the heap (i.e. around 7 units, for a list of length 1000). Other solutions take a linear time (i.e. around 1000 units, which is 140 times slower, and gets slower and slower when the length increases):

lst = [0.0] * 1000
while True:
    x = min(lst)    # linear
    lst.remove(x)   # linear
    lst.append(random.random())

or:

lst = [0.0] * 1000
while True:
    x = lst.pop()   # get the largest one in this example
    lst.append(random.random())
    lst.sort()      # linear (in this case)

or even:

lst = [0.0] * 1000
while True:
    x = lst.pop()   # get the largest one in this example
    bisect.insort(lst, random.random())   # linear
like image 21
Armin Rigo Avatar answered Nov 15 '22 19:11

Armin Rigo