I recently read that one benefit of map in Python 3 is that it is lazy. That means it is better to do

    map(lambda x: x**2, range(10**100))

rather than

    [x**2 for x in range(10**100)]
What I'm curious about is how I can put this laziness to use. If I generate the map object, how can I, for example, access a particular element of the result? In almost every piece of documentation on map I've seen, they do something like print(map(...)) or for i in map(...), which (as far as I understand it) relinquishes the laziness, as it implicitly converts the map to a list.
I suppose what I'm looking for is the ability to use map objects in a similarly lazy fashion to range, where I can do x = range(10**100) and lazily evaluate x[10000] without a huge computational load.
If this concept doesn't exist, what is the benefit of map being lazy? If you always need to convert it to some non-lazy object like a list, why does it matter that map is lazy?
You are comparing apples to oranges here. range is not just a lazy iterable. It is a specific object whose contents satisfy specific laws that allow it to support many operations without actually building a huge sequence in memory. That's because the nth element of range is basically just start + n*step (modulo stop, signs, etc.).
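As a quick illustration of that arithmetic:

```python
# range computes the nth element in closed form, so indexing is O(1)
# even for an astronomically large range.
r = range(0, 10**100, 3)  # start=0, step=3

# A single computation (start + n*step), no iteration:
assert r[10**50] == 3 * 10**50
```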
However, map is meant to work with any function f. In particular, functions may have shared/global state, which already defeats any chance of computing map(f, something)[100] without performing the 100 preceding function calls. Not doing so would break the correctness of the result.
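For instance, with a hypothetical stateful function, skipping the earlier calls changes the answer:

```python
# A function whose result depends on how many times it has been
# called before, not only on its argument.
calls = 0

def f(x):
    global calls
    calls += 1
    return x + calls

results = list(map(f, range(5)))  # [1, 3, 5, 7, 9]
# f(4) on the 5th call yields 9; jumping straight to "element 4"
# without the first four calls would have produced 5 instead.
```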
That map is lazy simply means it doesn't immediately build a complete list of results, but waits for you to request the next result before calling f and producing it. This avoids building unnecessary lists in code like:
for x in map(f, iterable):
# do something with x
where, if map were eager, it would consume twice the memory of iterable to do the loop; with a lazy map, the only space required is basically that of x.
Moreover, it allows calling map on infinite iterables like count(). This obviously results in a program that never finishes, unless at some point you stop consuming the map. An eager map cannot handle this case at all.
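For instance, combined with itertools.count and itertools.islice, a lazy map over an infinite iterable can be sampled and then abandoned:

```python
from itertools import count, islice

squares = map(lambda x: x ** 2, count())  # conceptually infinite stream
first_five = list(islice(squares, 5))     # take only what we need, then stop
# An eager map over count() would never return at all.
```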
If you want your own restricted map that works only on pure functions and that allows random access, you could write your own class:
    class PureMap:
        def __init__(self, function, sequence):
            self._f = function
            self._sequence = sequence

        def __iter__(self):
            return map(self._f, self._sequence)

        def __getitem__(self, i):
            return self._f(self._sequence[i])

        # etc.
However even in this case you have some problems:
If sequence is actually an iterable rather than a sequence, then to obtain the nth element you have to consume the first n elements. After that you'd have to store them as a sequence in your class for future use. But this already defeats the purpose of the whole thing, since doing PureMap(f, sequence)[1000] requires you to store 1000 elements in memory anyway, even though it avoids 999 calls to f.
You want to avoid calling f multiple times on the same element. This means you'd also have to keep track of which elements have already been computed and which have not.
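One hypothetical way to do that bookkeeping (the class name is illustrative) is a per-index cache:

```python
class CachedMap:
    """Like PureMap, but remembers results so f runs at most once per index."""
    def __init__(self, function, sequence):
        self._f = function
        self._sequence = sequence
        self._cache = {}  # index -> computed result

    def __getitem__(self, i):
        if i not in self._cache:
            self._cache[i] = self._f(self._sequence[i])
        return self._cache[i]

calls = []
cm = CachedMap(lambda x: calls.append(x) or x * 2, range(100))
cm[7]; cm[7]; cm[7]
# The function ran only once for index 7, despite three accesses.
```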
The only situation where you could achieve what you want is the following: the function is pure, and the underlying sequence is one, like range, that allows random access without having to produce the other elements. When all those assumptions are met, you can have a map object that "works like range".
First, please note that range (xrange in Python 2) is a special case. It's not a simple generator, nor does it just return a list. It also supports in operations, which is not a standard feature of iterables or iterators.
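A quick demonstration of that special-casing: both membership and indexing are answered arithmetically, without producing any elements.

```python
r = range(0, 10**100, 2)  # the even numbers below 10**100

assert 10**50 in r          # even, so answered instantly by arithmetic
assert (10**50 + 1) not in r
assert r[10**30] == 2 * 10**30
```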
Consider that map(func, iterable) could be called on an infinite iterable, or on an iterable where fetching the next value is a time-consuming process. If map were eager, you'd need to be aware that your code might receive these kinds of values and remember to reach for a lazy alternative (like itertools.imap in Python 2) in those cases. Since it's basically impossible to determine, even at runtime, that an iterator is infinite, shouldn't the builtin function behave correctly for the widest range of inputs?
Not every use case requires random access, and those that do must fully instantiate the iterable or use another itertools function like islice.
There are many benefits; for example, it makes it easier to write memory-efficient code.
    def take_up_a_lot_of_memory(*args):
        """
        A contrived example of a function that uses a lot of memory
        """
        return sum([i ** 2 for i in range(10 ** 6)])

    megasum = sum(map(take_up_a_lot_of_memory, range(1000)))
Also, sometimes you can terminate a calculation early without iterating through all of the map results, and thus avoid redundant work.
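A small sketch of that early termination, using next() with a hypothetical expensive function:

```python
calls = []

def expensive(x):
    calls.append(x)  # track how many times we actually run
    return x * x

# Find the first square over 50; because map is lazy, it stops
# producing results as soon as next() is satisfied.
first = next(r for r in map(expensive, range(1000)) if r > 50)
# Only 9 calls were made (for 0..8), not 1000.
```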