
Concatenate tuples using sum()


From this post I learned that you can concatenate tuples with sum():

    >>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
    >>> sum(tuples, ())
    ('hello', 'these', 'are', 'my', 'tuples!')

Which looks pretty nice. But why does this work? And, is this optimal, or is there something from itertools that would be preferable to this construct?

777

asked Feb 06 '17 02:02

Stephen Rauch

2 Answers

The addition operator concatenates tuples in Python:

    ('a', 'b')+('c', 'd')
    Out[34]: ('a', 'b', 'c', 'd')

From the docstring of sum:

Return the sum of a 'start' value (default: 0) plus an iterable of numbers

This means sum doesn't start with the first element of your iterable, but with an initial value that is passed as the start argument.

sum is normally used with numbers, so the default start value is 0. Summing an iterable of tuples therefore requires starting with an empty tuple, since 0 + ('hello',) would raise a TypeError. () is an empty tuple:

    type(())
    Out[36]: tuple

That is why the concatenation works.
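A minimal sketch of the role of the start value, using only the built-in sum and the tuples from the question. Without a start value the first addition is 0 + ('hello',), which fails. With an empty tuple as start, every step is a tuple concatenation.

```python
tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

# With the default start value of 0, the first step is 0 + ('hello',),
# which raises TypeError because an int and a tuple cannot be added.
try:
    sum(tuples)
    failed = False
except TypeError:
    failed = True

# With an empty tuple as the start value, every step is tuple + tuple.
result = sum(tuples, ())

print(failed)
print(result)
```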

As for performance, here is a comparison (with itertools imported as it):

    %timeit sum(tuples, ())
    The slowest run took 9.40 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 285 ns per loop

    %timeit tuple(it.chain.from_iterable(tuples))
    The slowest run took 5.00 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 625 ns per loop

Now with t2 of size 10000:

    %timeit sum(t2, ())
    10 loops, best of 3: 188 ms per loop

    %timeit tuple(it.chain.from_iterable(t2))
    1000 loops, best of 3: 526 µs per loop

So if your collection of tuples is small, don't bother. If it's medium-sized or larger, you should use itertools.
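For readers without IPython, a comparable measurement can be sketched with the standard-library timeit module. The input t2 below is an assumption (10000 one-element tuples), since the answer does not show how its t2 was built. The helper names with_sum and with_chain are made up for this sketch, and absolute timings will differ by machine and Python version.

```python
import itertools
import timeit

# Assumed input: the answer does not show how t2 was constructed.
t2 = tuple((i,) for i in range(10000))

def with_sum(tuples):
    return sum(tuples, ())

def with_chain(tuples):
    return tuple(itertools.chain.from_iterable(tuples))

# Both approaches must produce the same flattened tuple.
assert with_sum(t2) == with_chain(t2)

# Time a few calls of each approach; the numbers vary by machine.
runs = 3
sum_seconds = timeit.timeit(lambda: with_sum(t2), number=runs)
chain_seconds = timeit.timeit(lambda: with_chain(t2), number=runs)

print(f"sum:   {sum_seconds / runs:.6f} s per call")
print(f"chain: {chain_seconds / runs:.6f} s per call")
```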

176

answered Oct 01 '22 00:10

Zeugma

It works because addition is overloaded (on tuples) to return the concatenated tuple:

    >>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
    ('hello', 'these', 'are', 'my', 'tuples!')

That's basically what sum is doing: you give it an initial value of an empty tuple and it then adds the tuples to that, one after another.
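Roughly, and ignoring the details of the real built-in, sum(tuples, ()) behaves like the loop below. The name sum_like is hypothetical and exists only for this illustration.

```python
def sum_like(iterable, start):
    # Hypothetical stand-in for the built-in sum, for illustration only.
    total = start
    for item in iterable:
        # For tuples, + builds a brand-new tuple on every iteration.
        total = total + item
    return total

tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

print(sum_like(tuples, ()))
print(sum_like(tuples, ()) == sum(tuples, ()))
```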

However, this is generally a bad idea because adding tuples creates a new tuple each time, so you create several intermediate tuples just to copy them into the concatenated tuple:

    ()
    ('hello',)
    ('hello', 'these', 'are')
    ('hello', 'these', 'are', 'my', 'tuples!')

This implementation has quadratic runtime behavior: the total number of elements copied grows with the square of the number of tuples. That can be avoided by not creating the intermediate tuples.
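To make the quadratic cost concrete, the sketch below counts how many elements end up in the intermediate and final tuples when tuples are added one at a time. It assumes that each addition copies every element of its result. The function name copies_for_repeated_addition is made up for this illustration.

```python
def copies_for_repeated_addition(tuples):
    # Count elements written into intermediate and final tuples,
    # assuming each addition copies all elements of its result.
    copied = 0
    total = ()
    for tup in tuples:
        total = total + tup
        copied += len(total)
    return copied

for n in (10, 100, 1000):
    data = tuple((1,) for _ in range(n))
    # For n one-element tuples this is 1 + 2 + ... + n = n * (n + 1) / 2.
    print(n, copies_for_repeated_addition(data))
```

Multiplying the number of tuples by ten multiplies the count by roughly a hundred, which is the quadratic growth described above.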

>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

Using nested generator expressions:

    >>> tuple(tuple_item for tup in tuples for tuple_item in tup)
    ('hello', 'these', 'are', 'my', 'tuples!')

Or using a generator function:

    def flatten(it):
        for seq in it:
            for item in seq:
                yield item

    >>> tuple(flatten(tuples))
    ('hello', 'these', 'are', 'my', 'tuples!')

Or using itertools.chain.from_iterable:

    >>> import itertools
    >>> tuple(itertools.chain.from_iterable(tuples))
    ('hello', 'these', 'are', 'my', 'tuples!')

And if you're interested in how these perform (using my simple_benchmark package):

    import itertools
    import simple_benchmark

    def flatten(it):
        for seq in it:
            for item in seq:
                yield item

    def sum_approach(tuples):
        return sum(tuples, ())

    def generator_expression_approach(tuples):
        return tuple(tuple_item for tup in tuples for tuple_item in tup)

    def generator_function_approach(tuples):
        return tuple(flatten(tuples))

    def itertools_approach(tuples):
        return tuple(itertools.chain.from_iterable(tuples))

    funcs = [sum_approach, generator_expression_approach, generator_function_approach, itertools_approach]
    arguments = {(2**i): tuple((1,) for _ in range(1, 2**i)) for i in range(1, 13)}
    b = simple_benchmark.benchmark(funcs, arguments, argument_name='number of tuples to concatenate')

    b.plot()

[Benchmark plot: runtime of the four approaches against the number of tuples to concatenate]

(Python 3.7.2 64bit, Windows 10 64bit)

So while the sum approach is very fast if you concatenate only a few tuples, it will be really slow if you try to concatenate lots of tuples. The fastest of the tested approaches for many tuples is itertools.chain.from_iterable.

answered Sep 30 '22 22:09

MSeifert
