Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What would you use the heapq Python module for in real life?

After reading Guido's Sorting a million 32-bit integers in 2MB of RAM using Python, I discovered the heapq module, but the concept is pretty abstract to me.

One reason is that I don't understand the concept of a heap completely, but I do understand how Guido used it.

Now, beside his kinda crazy example, what would you use the heapq module for?

Must it always be related to sorting or minimum value? Is it only something you use because it's faster than other approaches? Or can you do really elegant things that you can't do without?

like image 712
e-satis Avatar asked Dec 24 '11 22:12

e-satis


People also ask

What is Heapq used for?

The Python heapq module is part of the standard library. It implements all the low-level heap operations as well as some high-level common uses for heaps. A priority queue is a powerful tool that can solve problems as varied as writing an email scheduler, finding the shortest path on a map, or merging log files.

Is Python Heapq stable?

This is similar to sorted(iterable), but unlike sorted(), this implementation is not stable.

What does Heapq merge do?

merge(*iterables) : Merge multiple sorted inputs into a single sorted output (for example, merge timestamped entries from multiple log files). Returns an iterator over the sorted values.

How do I use Heapq in Python for max heap?

Using Library functions: We use heapq class to implement Heap in Python. By default Min Heap is implemented by this class. But we multiply each value by -1 so that we can use it as MaxHeap.


1 Answers

The heapq module is commonly use to implement priority queues.

You see priority queues in event schedulers that are constantly adding new events and need to use a heap to efficiently locate the next scheduled event. Some examples include:

  • Python's own sched module: http://hg.python.org/cpython/file/2.7/Lib/sched.py#l106
  • The Tornado web server: https://github.com/facebook/tornado/blob/master/tornado/ioloop.py#L260
  • Twisted internet servers: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L712

The heapq docs include priority queue implementation notes which address the common use cases.

In addition, heaps are great for implementing partial sorts. For example, heapq.nsmallest and heapq.nlargest can be much more memory efficient and do many fewer comparisons than a full sort followed by a slice:

>>> from heapq import nlargest
>>> from random import random
>>> nlargest(5, (random() for i in xrange(1000000)))
[0.9999995650034837, 0.9999985756262746, 0.9999971934450994, 0.9999960394998497, 0.9999949126363714]
like image 58
Raymond Hettinger Avatar answered Oct 03 '22 03:10

Raymond Hettinger