Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python 2, map not equivalent to list comprehension in simple case; length dependent

In python 2, the built in function map seems to call the __len__ when length is overwritten. Is that correct - if so, why are we computing the length of the iterable to map? Iterables don't need to have length overwritten (e.g.), and the map function works even when length is not pre-defined by the iterable.

Map is defined here; it does specify that there is length-dependent functionality in the event that multiple iterables are passed. However,

  • I'm interested in the case that only one iterable is passed
  • Even if multiple iterables were passed (not my question), it seems like an odd design choice to explicitely check the length, instead of just iterating until you run out and then returning None

I am concerned because according to several 1 2 extremely highly upvoted questions,

map(f, iterable)

is basically equivalent to:

[f(x) for x in iterable]

But I am running into simple examples where that isn't true.

For Example

class Iterable:

    def __iter__(self):
        self.iterable = [1,2,3,4,5].__iter__()
        return self

    def next(self):
        return self.iterable.next()

   #def __len__(self):
   #     self.iterable = None
   #    return 5


def foo(x): return x

print( [foo(x) for x in Iterable()] )
print( map(foo,Iterable()) )

Behaves as it should, but if you uncomment the overloading of len, it very much does not.

In this case, it raises an AttributeError because the iterable is None. While the unit behaviour is silly, I see no requirement of invariance in the specification of len. Surely, it's good practice to not modify the state in a call to len, but the reason should not be because of unexpectable behaviour in builtin functions. In more realistic cases, my len function may just be slow, and I don't expect to worry about it being called by map, or maybe it isn't thread safe, etc..


Implementation Dependent?

Since map is a builtin function, it may have implementation-specific features outside the spec, but cpython implements it on line 918 of bltinmodule.c, which indeed states:

/* Do a first pass to obtain iterators for the arguments, and set len
 * to the largest of their lengths.
 */

And then calls _PyObject_LengthHint, which is defined in Object/abstract.c, and indeed seems to look for an overwritten len. This doesn't clarify to me whether this is just implementation dependent, or if I'm missing some reason that map purposefully looks for the iterable's length against my instinct.


(Note I haven't tested this in python 3, that is why I specified python 2. In python3, map returns a generator, so at least a few of my claims aren't true)

like image 991
en_Knight Avatar asked Dec 02 '15 19:12

en_Knight


People also ask

Does Python 2 support list comprehension?

List comprehensions were added with Python 2.0. Essentially, it is Python's way of implementing a well-known notation for sets as used by mathematicians. In mathematics the square numbers of the natural numbers are, for example, created by { x2 | x ∈ ℕ } or the set of complex integers { (x,y) | x ∈ ℤ ∧ y ∈ ℤ }.

Is map more efficient than list comprehension?

Map function is faster than list comprehension when the formula is already defined as a function earlier. So, that map function is used without lambda expression.

Is map more efficient than for loop Python?

map() works way faster than for loop. Considering the same code above when run in this ide.

What is faster than list comprehension?

For loops are faster than list comprehensions to run functions.


Video Answer


1 Answers

I am concerned because according to several 1 2 extremely highly upvoted questions,

map(f, iterable)

is basically equivalent to:

[f(x) for x in iterable]

But I am running into simple examples where that isn't true.

But calling _PyObject_LengthHint is supposed to be basically equivalent to not calling it. An object's __len__ or __length_hint__ is not supposed to mutate the object like this. You might as well say that map(f, iterable) and [f(x) for x in iterable] are inequivalent because if f uses stack inspection to determine whether it's being called from map and does something different, the two snippets behave differently.

As for why map does this, it's trying preallocate the list to the right size to avoid needing to resize the list. Resizes only slow things down by a constant factor, but if you can avoid the constant factor, why not? It would be perfectly reasonable for list comprehensions to do this in a future Python version.

like image 83
user2357112 supports Monica Avatar answered Oct 22 '22 14:10

user2357112 supports Monica