This is somehow a follow-up to this question
So first, you'll notice that you cannot perform a sum
on a list of strings to concatenate them, python tells you to use str.join
instead, and that's good advice because no matter how you use +
on strings, the performance is bad.
The "cannot use sum
" restriction doesn't apply to list
, and though, itertools.chain.from_iterable
is the preferred way to perform such list flattening.
But sum(x,[])
when x
is a list of lists is definitively bad.
But should it stay that way?
I compared 3 approaches
import time
import itertools
a = [list(range(1,1000)) for _ in range(1000)]
start=time.time()
sum(a,[])
print(time.time()-start)
start=time.time()
list(itertools.chain.from_iterable(a))
print(time.time()-start)
start=time.time()
z=[]
for s in a:
z += s
print(time.time()-start)
results:
sum()
on the list of lists: 10.46647310256958. Okay, we knew.itertools.chain
: 0.07705187797546387itertools.chain
as you see)So sum
is way behind because it performs result = result + b
instead of result += b
So now my question:
Why can't sum
use this accumulative approach when available?
(That would be transparent for already existing applications and would make possible the use of the sum
built-in to flatten lists efficiently)
We could try to make sum() smarter, but Alex Martelli and Guido van Rossum wanted to keep it focused on arithmetic summations.
FWIW, you should get reasonable performance with this simple code:
result = []
for seq in mylists:
result += seq
For your other question, "why can't sum use this accumulative approach when available?", see this comment for builtin_sum() in Python/bltinmodule.c:
/* It's tempting to use PyNumber_InPlaceAdd instead of
PyNumber_Add here, to avoid quadratic running time
when doing 'sum(list_of_lists, [])'. However, this
would produce a change in behaviour: a snippet like
empty = []
sum([[x] for x in range(10)], empty)
would change the value of empty. */
/* It's tempting to use PyNumber_InPlaceAdd instead of
PyNumber_Add here, to avoid quadratic running time
when doing 'sum(list_of_lists, [])'. However, this
would produce a change in behaviour: a snippet like
empty = []
sum([[x] for x in range(10)], empty)
would change the value of empty. */
temp = PyNumber_Add(result, item);
Taken from Python's built-in source code https://github.com/python/cpython/blob/master/Python/bltinmodule.c#L2146 Line:2342
FWIW, we can trick the interpreter into letting us use sum
on strings by passing an appropriate custom class instance as the start
arg to sum
.
class Q(object):
def __init__(self, data=''):
self.data = str(data)
def __str__(self):
return self.data
def __add__(self, other):
return Q(self.data + str(other))
print(sum(['abc', 'def', 'ghi'], Q()))
output
abcdefghi
Of course, this is a rather silly thing to do. :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With