
What is the preferred way to concatenate sequences in Python 3?


Right now, I'm doing:

import functools
import operator

def concatenate(sequences):
    return functools.reduce(operator.add, sequences)

print(concatenate([['spam', 'eggs'], ['ham']]))
# ['spam', 'eggs', 'ham']

Needing to import two separate modules to do this seems clunky.

An alternative could be:

def concatenate(sequences):
    concatenated_sequence = []
    for sequence in sequences:
        concatenated_sequence += sequence
    return concatenated_sequence

However, this is incorrect because the sequences are not guaranteed to be lists: the result is always a list, whatever type the inputs are.

You could do:

import copy

def concatenate(sequences):
    head, *tail = sequences
    concatenated_sequence = copy.copy(head)
    for sequence in tail:
        concatenated_sequence += sequence
    return concatenated_sequence

But that seems horribly bug-prone. A direct call to copy? (I know head.copy() works for lists, but copy isn't part of the Sequence ABC, so you can't rely on it... what if you get handed tuples or strings?) You have to copy to prevent mutation in case you get handed a MutableSequence. Moreover, this solution forces you to unpack the entire set of sequences first. Trying again:

import copy

def concatenate(sequences):
    iterable = iter(sequences)
    head = next(iterable)
    concatenated_sequence = copy.copy(head)
    for sequence in iterable:
        concatenated_sequence += sequence
    return concatenated_sequence
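
To make the mutation concern concrete, here is a minimal sketch comparing the loop with and without the copy. The name concatenate_no_copy is hypothetical and exists only for this illustration; it is not part of the question's code.

```python
import copy

def concatenate_no_copy(sequences):
    # Hypothetical variant that skips the copy, for illustration only.
    iterable = iter(sequences)
    result = next(iterable)
    for sequence in iterable:
        result += sequence  # for a list, += extends the first list in place
    return result

def concatenate(sequences):
    iterable = iter(sequences)
    result = copy.copy(next(iterable))  # shallow copy protects the caller's object
    for sequence in iterable:
        result += sequence
    return result

first = ['spam', 'eggs']
concatenate([first, ['ham']])
print(first)  # ['spam', 'eggs']

concatenate_no_copy([first, ['ham']])
print(first)  # ['spam', 'eggs', 'ham']
```

For immutable inputs such as tuples and strings, += rebinds the name to a new object, so the copy only matters for mutable sequences.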

But come on... this is Python! So... what is the preferred way to do this?

ToBeReplaced asked Jan 15 '13 16:01



3 Answers

I'd use itertools.chain.from_iterable() instead:

import itertools

def chained(sequences):
    return itertools.chain.from_iterable(sequences)

or, since you tagged this with python-3.3, you could use the new yield from syntax (look ma, no imports!):

def chained(sequences):
    for seq in sequences:
        yield from seq

Both of these return iterators (use list() on them if you must materialize the full list). Most of the time you do not really need to construct a whole new sequence from the concatenated sequences; you just want to loop over them to process them or search for something.
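
As a rough illustration of the point about iterators, the sketch below shows that both versions yield the same items, and that list() is only needed when a concrete list is required. The sample data is made up for this example.

```python
import itertools

def chained(sequences):
    for seq in sequences:
        yield from seq

# Made-up sample input mixing a list, a tuple and a string.
data = [['spam', 'eggs'], ('ham',), 'ab']

# Looping or searching needs no intermediate list.
print('ham' in chained(data))  # True

# Materialize only when a real list is required.
print(list(chained(data)))                        # ['spam', 'eggs', 'ham', 'a', 'b']
print(list(itertools.chain.from_iterable(data)))  # ['spam', 'eggs', 'ham', 'a', 'b']
```

Note that a string is itself iterable, so it contributes its individual characters.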

Note that for strings, you should use str.join() instead of any of the techniques described either in my answer or your question:

concatenated = ''.join(sequence_of_strings)

Combined, to handle sequences quickly and correctly, I'd use:

def chained(sequences):
    for seq in sequences:
        yield from seq

def concatenate(sequences):
    sequences = iter(sequences)
    first = next(sequences)
    if hasattr(first, 'join'):
        return first + ''.join(sequences)
    return first + type(first)(chained(sequences))

This works for tuples, lists and strings:

>>> concatenate(['abcd', 'efgh', 'ijkl'])
'abcdefghijkl'
>>> concatenate([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> concatenate([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
(1, 2, 3, 4, 5, 6, 7, 8, 9)

and uses the faster ''.join() for a sequence of strings.
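
Two edge cases may be worth guarding against, depending on your inputs: an empty outer iterable makes next() raise StopIteration, and bytes objects also have a join attribute, so ''.join() would fail on them. The following is only a sketch of one possible variant; the name concatenate_safe and its default parameter are assumptions introduced for this example.

```python
def chained(sequences):
    for seq in sequences:
        yield from seq

def concatenate_safe(sequences, default=()):
    # Hypothetical variant of concatenate(); `default` is an assumed
    # parameter returned when there is nothing to concatenate.
    sequences = iter(sequences)
    try:
        first = next(sequences)
    except StopIteration:
        return default  # empty input: no element to infer a type from
    if isinstance(first, (str, bytes)):
        # first[:0] is '' for str and b'' for bytes, so join matches the type
        return first + first[:0].join(sequences)
    return first + type(first)(chained(sequences))

print(concatenate_safe([]))                # ()
print(concatenate_safe([b'ab', b'cd']))    # b'abcd'
print(concatenate_safe(['ab', 'cd']))      # abcd
print(concatenate_safe([[1], [2, 3]]))     # [1, 2, 3]
```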

Martijn Pieters answered Nov 20 '22 16:11



What is wrong with:

from itertools import chain

def chain_sequences(*sequences):
    return chain(*sequences)
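
One difference worth noting, sketched below: chain(*sequences) has to unpack the outer iterable completely before the call, while chain.from_iterable() pulls the inner sequences one at a time. The generator produce() and the log list are hypothetical, used only to make the evaluation order visible.

```python
from itertools import chain

log = []

def produce():
    # Hypothetical lazy source of sequences; records when each one is produced.
    for i in range(3):
        log.append(i)
        yield [i]

lazy = chain.from_iterable(produce())
print(log)         # []  (nothing consumed yet)
print(next(lazy))  # 0
print(log)         # [0]

log.clear()
eager = chain(*produce())
print(log)         # [0, 1, 2]  (outer iterable fully unpacked by the call)
```

For a small, already-built list of sequences the two behave the same; the difference only matters when the outer iterable is large or lazily generated.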
Samantha Atkins answered Nov 20 '22 17:11



Use itertools.chain.from_iterable.

import itertools

def concatenate(sequences):
    return list(itertools.chain.from_iterable(sequences))

The call to list is needed only if you need an actual new list, so skip it if you just iterate over this new sequence once.
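
A small sketch of why the "once" matters: the chain object is a one-shot iterator, so a second pass over it yields nothing, whereas a list can be reused and indexed. The sample data is taken from the question.

```python
import itertools

sequences = [['spam', 'eggs'], ['ham']]

flat = itertools.chain.from_iterable(sequences)
print(list(flat))  # ['spam', 'eggs', 'ham']
print(list(flat))  # []  (the iterator is already exhausted)

as_list = list(itertools.chain.from_iterable(sequences))
print(len(as_list), as_list[0])  # 3 spam
```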

Oleh Prypin answered Nov 20 '22 16:11
