Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lazy evaluation of map

I recently read that one benefit of map in Python 3 was that it is lazy. That means, it is better to do

map(lambda x: x**2, range(10**100))

rather than

[x**2 for x in range(10**100)]

What I'm curious about, is how I can put this laziness to use. If I generate the map object, how can I, for example, access a particular element in the resulting operation/list. In almost every documentation on map I've seen, they will do something like print(map(...)) or for i in map(...) which (as far as I understand it) relinquishes the lazy concept as it implicitly converts the map to a list.

I suppose what I'm looking for is the ability to use map objects in a similarly lazy fashion as range where I can do x = range(10**100) and lazily generate x[10000] without huge computational loads.

If this concept doesn't exist, what is the benefit then of having map be lazy? If you always need to convert it to some non-lazy object like a list, why does it matter that map is lazy?

like image 590
zephyr Avatar asked May 24 '16 14:05

zephyr


3 Answers

You are comparing apples to oranges here. range is not just a lazy iterable. It is a specific object whose contents satisfy specific laws that allow to support many operations without actually building a huge sequence in memory. That's because the nth element of range is basically just start + n*step (modulo stop, signs etc.)

However map is meant to work with any function f. In particular functions may have shared/global state which already defeats any chance of being able to do map(f, something)[100] without performing 100 function calls. Not doing so breaks the correctness of the result.

map is lazy simply means it doesn't immediately build a complete list of results but waits for you to require the next result before doing the call to f and produce it. This avoid building unneccessary lists in code like:

for x in map(f, iterable):
    # do something with x

where if map was eager it would consume twice the memory of iterable to do the loop, with a lazy map the only space required is that of x basically.

Moreover it allows to call map on infinite iterables like count(). This obviously result in a never ending program doing something, or at some point you can just stop looking into map. An eager map cannot handle this case.

If you want to use your own restricted map that works only on pure fuctions and that allow random access you could write your own class:

class PureMap:
    def __init__(self, function, sequence):
        self._f = function
        self._sequence = sequence

    def __iter__(self):
        return map(self._f, self._sequence)
    def __getitem__(self, i):
        return self._f(self._sequence[i])
    # etc.

However even in this case you have some problems:

  1. If sequence is actually an iterable to obtain the nth element you have to consume the first n elements. After that you'd have to store them as a sequence in your class for future use. But this already defeats the purpose of the whole thing since doing PureMap(f, sequence)[1000] requires you to store 1000 elements in memory anyway, even though it avoids 999 calls to f.

  2. You want to avoid calling f multiple times on the same element. This means you'd also have to keep track of which element was already computed and which not.

The only situation where you could achieve what you want is the following:

  • The function being called is pure
  • The iterable argument is something like range that allows random access without having to produce other elements
  • The function you call is fast so that you can recompute it on the various elements without worrying too much about performance.

When all those assumptions are met you can have a map object that "works like range".

like image 58
Bakuriu Avatar answered Oct 22 '22 09:10

Bakuriu


First, please note that range (xrange in Python 2) is a special case. It's not a simple generator, nor does it just return a list. It supports in operations as well, which is not a standard feature of iterables or iterators.

Consider that map(func, iterable) could be called on an infinite iterable, or an iterable where the process of fetching the next value is a time consuming process.

You'd need to be aware that your function might deal with these types of values, and make sure to use a lazy function, like itertools.imap otherwise. Since its basically impossible to determine that an iterator is infinite, even at runtime, shouldn't the builtin function behave correctly for the widest range of inputs?

Not every use-case requires random access, and those that do must fully instantiate the iterable or use another itertools function like islice.

like image 30
mobiusklein Avatar answered Oct 22 '22 07:10

mobiusklein


There are many benefits; for example, it makes it easier to write memory efficient code.

def take_up_a_lot_of_memory(*args):
    """
    A contrived example of a function that uses a lot of memory
    """
    return sum([i ** 2 for i in range(10 ** 6)])

megasum = sum(map(take_up_a_lot_of_memory, range(1000)))

Also, sometimes you can terminate a calculation early without iterating through all of the map results, and so avoid redundancy.

like image 1
hilberts_drinking_problem Avatar answered Oct 22 '22 08:10

hilberts_drinking_problem