itertools.imap vs map over the entire iterable

Question

I'm curious about a statement from http://docs.python.org/2/library/itertools.html#itertools.imap, namely it describes

sum(imap(operator.mul, vector1, vector2))

as an efficient dot-product. My understanding is that imap gives a generator instead of a list, and while I understand how it would be faster/consume less memory if you're only considering the first few elements, with the surrounding sum(), I don't see how it behaves any differently than:

sum(map(operator.mul, vector1, vector2))

Max Noel · Accepted Answer

The difference between map and imap becomes clear when you start increasing the size of what you're iterating over:

# xrange object, takes up no memory
data = xrange(1000000000)

# Tries to builds a list of 1 billion elements!
# Therefore, fails with MemoryError on 32-bit systems.
doubled = map(lambda x: x * 2, data)

# Generator object that lazily doubles each item as it's iterated over.
# Takes up very little (and constant, independent of data's size) memory.
iter_doubled = itertools.imap(lambda x: x * 2, data)

# This is where the iteration and the doubling happen.
# Again, since no list is created, this doesn't run you out of memory.
sum(iter_doubled)

# (The result is 999999999000000000L, if you're interested.
# It takes a minute or two to compute, but consumes minimal memory.)

Note that in Python 3, the built-in map behaves like Python 2's itertools.imap (which was removed because it's no longer needed). To get the "old map" behaviour, you'd use list(map(...)), which is another good way to visualize how Python 2's itertools.imap and map differ from each other.

BartoszKP · Answer

The first line will calculate the sum accumulating items one by one. The second one will at first calculate the whole dot-product, and after that, having the whole result in memory it would proceed to calculate the sum. So there is a memory complexity gain.

Tim Peters · Answer

One other thing to note is that "uses a lot less memory" often implies "runs faster" too. The lazy (iterator) version consumes each product as soon as it's computed, adding it into the running sum. The product and running sum are both almost certainly in L1 cache. If you compute all the products first, then depending on the number of elements it's certain that the first products computed will be kicked out of L1 cache, and then out of L2 cache, and ... so that when a second pass is made to finally add them together, all the products are low in the memory hierarchy (and in the extreme, have to be read up from a paging file).

But it's unclear to me what you mean by "don't see how it behaves any differently than". The final computed result is the same either way.

Kirk Strauser · Answer

The difference is that the entire output of imap(...) or map(...) is passed to sum(). You're write that imap returns a generator, but I think you might be under the impression that sum(map(...)) has some shortcut that does the same thing. It doesn't. map() will construct an entire list of results before anything gets passed to sum().

itertools.imap vs map over the entire iterable

Tags:

python

python-2.x

itertools

Joe

4 Answers

Max Noel

BartoszKP

Tim Peters

Kirk Strauser

Recent Activity

Donate For Us

itertools.imap vs map over the entire iterable

Tags:

python

python-2.x

itertools

Joe

4 Answers

Max Noel

BartoszKP

Tim Peters

Kirk Strauser

Related questions

Recent Activity

Donate For Us