Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of map vs starmap?

I was trying to make a pure-python (without external dependencies) element-wise comparison of two sequences. My first solution was:

list(map(operator.eq, seq1, seq2))

Then I found starmap function from itertools, which seemed pretty similar to me. But it turned out to be 37% faster on my computer in worst case. As it was not obvious to me, I measured the time necessary to retrieve 1 element from a generator (don't know if this way is correct):

from operator import eq
from itertools import starmap

seq1 = [1,2,3]*10000
seq2 = [1,2,3]*10000
seq2[-1] = 5

gen1 = map(eq, seq1, seq2))
gen2 = starmap(eq, zip(seq1, seq2))

%timeit -n1000 -r10 next(gen1)
%timeit -n1000 -r10 next(gen2)

271 ns ± 1.26 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each)
208 ns ± 1.72 ns per loop (mean ± std. dev. of 10 runs, 1000 loops each)

In retrieving elements the second solution is 24% more performant. After that, they both produce the same results for list. But from somewhere we gain extra 13% in time:

%timeit list(map(eq, seq1, seq2))
%timeit list(starmap(eq, zip(seq1, seq2)))

5.24 ms ± 29.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.34 ms ± 84.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I don't know how to dig deeper in profiling of such nested code? So my question is why the first generator so faster in retrieving and from where we gain extra 13% in list function?

EDIT: My first intention was to perform element-wise comparison instead of all, so the all function was replaced with list. This replacement does not affect the timing ratio.

CPython 3.6.2 on Windows 10 (64bit)

like image 280
godaygo Avatar asked Sep 12 '17 08:09

godaygo


People also ask

How does pool starmap work?

It uses the Pool. starmap method, which accepts a sequence of argument tuples. It then automatically unpacks the arguments from each tuple and passes them to the given function: import multiprocessing from itertools import product def merge_names(a, b): return '{} & {}'.

What is Python starmap?

The starmap() method of the itertools module returns an iterator that applies to the given function by passing the elements of the specified iterable as arguments to the function.

What does starmap return Python?

We can then call the starmap() function on the process pool to apply our task() function to each tuple of arguments in the prepared list. This returns an iterator over the results returned from the task() function, in the order that function calls are completed.

What does a map function do in Python?

Python's map() is a built-in function that allows you to process and transform all the items in an iterable without using an explicit for loop, a technique commonly known as mapping. map() is useful when you need to apply a transformation function to each item in an iterable and transform them into a new iterable.


2 Answers

There are several factors that contribute (in conjunction) to the observed performance difference:

  • zip re-uses the returned tuple if it has a reference count of 1 when the next __next__ call is made.
  • map builds a new tuple that is passed to the "mapped function" every time a __next__ call is made. Actually it probably won't create a new tuple from scratch because Python maintains a storage for unused tuples. But in that case map has to find an unused tuple of the right size.
  • starmap checks if the next item in the iterable is of type tuple and if so it just passes it on.
  • Calling a C function from within C code with PyObject_Call won't create a new tuple that is passed to the callee.

So starmap with zip will only use one tuple over and over again that is passed to operator.eq thus reducing the function call overhead immensely. map on the other hand will create a new tuple (or fill a C array from CPython 3.6 on) every time operator.eq is called. So what is actually the speed difference is just the tuple creation overhead.

Instead of linking to the source code I'll provide some Cython code that can be used to verify this:

In [1]: %load_ext cython

In [2]: %%cython
   ...:
   ...: from cpython.ref cimport Py_DECREF
   ...:
   ...: cpdef func(zipper):
   ...:     a = next(zipper)
   ...:     print('a', a)
   ...:     Py_DECREF(a)
   ...:     b = next(zipper)
   ...:     print('a', a)

In [3]: func(zip([1, 2], [1, 2]))
a (1, 1)
a (2, 2)

Yes, tuples aren't really immutable, a simple Py_DECREF was sufficient to "trick" zip into believing noone else holds a reference to the returned tuple!

As for the "tuple-pass-thru":

In [4]: %%cython
   ...:
   ...: def func_inner(*args):
   ...:     print(id(args))
   ...:
   ...: def func(*args):
   ...:     print(id(args))
   ...:     func_inner(*args)

In [5]: func(1, 2)
1404350461320
1404350461320

So the tuple is passed right through (just because these are defined as C functions!) This doesn't happen for pure Python functions:

In [6]: def func_inner(*args):
   ...:     print(id(args))
   ...:
   ...: def func(*args):
   ...:     print(id(args))
   ...:     func_inner(*args)
   ...:

In [7]: func(1, 2)
1404350436488
1404352833800

Note that it also doesn't happen if the called function isn't a C function even if called from a C function:

In [8]: %%cython
   ...: 
   ...: def func_inner_c(*args):
   ...:     print(id(args))
   ...: 
   ...: def func(inner, *args):
   ...:     print(id(args))
   ...:     inner(*args)
   ...:

In [9]: def func_inner_py(*args):
    ...:     print(id(args))
    ...:
    ...:

In [10]: func(func_inner_py, 1, 2)
1404350471944
1404353010184

In [11]: func(func_inner_c, 1, 2)
1404344354824
1404344354824

So there are a lot of "coincidences" leading up to the point that starmap with zip is faster than calling map with multiple arguments when the called function is also a C function...

like image 119
MSeifert Avatar answered Sep 23 '22 05:09

MSeifert


One difference I can notice is the how map retrieves items from the iterables. Both map and zip create a tuple of iterators from each iterable passed. Now zip maintains a result tuple internally that is populated every time next is called and on the other hand, map creates a new array* with each next call and deallocates it.


*As pointed out by MSeifert till 3.5.4 map_next used to allocate a new Python tuple everytime. This changed in 3.6 and till 5 iterables C stack is used and for anything larger than that heap is used. Related PRs: Issue #27809: map_next() uses fast call and Add _PY_FASTCALL_SMALL_STACK constant | Issue: https://bugs.python.org/issue27809

like image 22
Ashwini Chaudhary Avatar answered Sep 21 '22 05:09

Ashwini Chaudhary