Apologies if this has been asked before, but I couldn't find it. If I have something like:
lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]
and I want to obtain a shorter list:
new = [(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), 5, 6)]
so that it groups together the other elements in a tuple by first matching element, what is the fastest way to go about it?
You are grouping based on a key. If your input groups are always consecutive, you can use itertools.groupby(); otherwise use a dictionary to group the elements. If order matters, use a dictionary that preserves insertion order (a plain dict in Python 3.7 or newer, or in CPython 3.6; otherwise collections.OrderedDict).
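To illustrate why the input must be consecutive for groupby(), here is a small sketch using a made-up list where the ('a', 'b') entries are not adjacent. groupby() starts a new group whenever the key changes, so ('a', 'b') shows up twice. Sorting by the key first merges the runs, but gives up the original order.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: the ('a', 'b') entries are NOT adjacent.
scattered = [(('a', 'b'), 1, 2), (('b', 'c'), 5, 6), (('a', 'b'), 3, 4)]

# groupby() only merges runs of equal keys, so ('a', 'b') appears twice.
keys_unsorted = [k for k, _ in groupby(scattered, key=itemgetter(0))]
print(keys_unsorted)  # [('a', 'b'), ('b', 'c'), ('a', 'b')]

# Sorting by the key first makes equal keys adjacent (input order is lost).
ordered = sorted(scattered, key=itemgetter(0))
keys_sorted = [k for k, _ in groupby(ordered, key=itemgetter(0))]
print(keys_sorted)  # [('a', 'b'), ('b', 'c')]
```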
Using groupby():
from itertools import groupby
from operator import itemgetter
new = [(k, *zip(*(t[1:] for t in g))) for k, g in groupby(lst, key=itemgetter(0))]
The above uses Python 3.5+ syntax to unpack an iterable into a tuple display: (..., *iterable).
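To make that one-liner easier to follow, the sketch below unpacks it one step at a time for the first group. t[1:] drops the key from each tuple, zip(*...) transposes the rows into columns, and the starred expression splices those columns into the result tuple. The last comparison shows that (k, *cols) builds the same tuple as the older (k,) + tuple(cols) form.

```python
from itertools import groupby
from operator import itemgetter

lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]

# Take only the first (key, group) pair produced by groupby().
k, g = next(groupby(lst, key=itemgetter(0)))

# Drop the key from every tuple in the group: [(1, 2), (3, 4)]
tails = [t[1:] for t in g]

# Transpose rows into columns: [(1, 3), (2, 4)]
cols = list(zip(*tails))

# Star-unpacking inside a tuple display splices the columns in.
starred = (k, *cols)
# Equivalent concatenation form, which also works in Python 2.
concatenated = (k,) + tuple(cols)
print(starred)  # (('a', 'b'), (1, 3), (2, 4))
```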
Using a dictionary:
groups = {}
for key, *values in lst:
    groups.setdefault(key, []).append(values)
new = [(k, *zip(*v)) for k, v in groups.items()]
In Python 3.7 or newer (and in CPython 3.6), a plain dict preserves the order in which the groups first appear in the input.
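Unlike groupby(), the dictionary approach does not care whether matching keys are adjacent. A minimal sketch wrapping it in a function (the name group_by_first is made up for this example) and feeding it input where the ('a', 'b') entries are separated:

```python
def group_by_first(rows):
    """Group rows by their first element (hypothetical helper name)."""
    groups = {}
    for key, *values in rows:
        # Each key keeps the position where it was first seen (dicts are ordered in 3.7+).
        groups.setdefault(key, []).append(values)
    # Transpose each group's value rows into per-position tuples.
    return [(k, *zip(*v)) for k, v in groups.items()]

scattered = [(('a', 'b'), 1, 2), (('b', 'c'), 5, 6), (('a', 'b'), 3, 4)]
result = group_by_first(scattered)
print(result)  # [(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
```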
Demo:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]
>>> [(k, *zip(*(t[1:] for t in g))) for k, g in groupby(lst, key=itemgetter(0))]
[(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
>>> groups = {}
>>> for key, *values in lst:
...     groups.setdefault(key, []).append(values)
...
>>> [(k, *zip(*v)) for k, v in groups.items()]
[(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
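Both versions give (5,), (6,) for a key that occurs only once, while the question asked for plain 5, 6. If that exact shape matters, one option is to unwrap single-element columns afterwards. A sketch (unwrap_singletons is an illustrative name, not a library function):

```python
def unwrap_singletons(grouped):
    """Replace 1-tuples like (5,) with their sole element (hypothetical helper)."""
    out = []
    for key, *cols in grouped:
        # A group with a single row produces columns of length 1; unwrap those.
        out.append((key, *(c[0] if len(c) == 1 else c for c in cols)))
    return out

grouped = [(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
print(unwrap_singletons(grouped))  # [(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), 5, 6)]
```

Keeping the 1-tuples is usually easier to process downstream, since every column then has the same type.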
If you are using Python 2, you'd have to use:
new = [(k,) + tuple(zip(*(t[1:] for t in g))) for k, g in groupby(lst, key=itemgetter(0))]
or
from collections import OrderedDict
groups = OrderedDict()
for entry in lst:
    groups.setdefault(entry[0], []).append(entry[1:])
new = [(k,) + tuple(zip(*v)) for k, v in groups.items()]
You could also use a collections.defaultdict to group your tuple keys:
from collections import defaultdict
lst = [(('a', 'b'), 1, 2), (('a', 'b'), 3, 4), (('b', 'c'), 5, 6)]
d = defaultdict(tuple)
for tup, fst, snd in lst:
    d[tup] += fst, snd
# defaultdict(<class 'tuple'>, {('a', 'b'): (1, 2, 3, 4), ('b', 'c'): (5, 6)})
for key, value in d.items():
    d[key] = value[0::2], value[1::2]
# defaultdict(<class 'tuple'>, {('a', 'b'): ((1, 3), (2, 4)), ('b', 'c'): ((5,), (6,))})
result = [(k, v1, v2) for k, (v1, v2) in d.items()]
Which outputs:
[(('a', 'b'), (1, 3), (2, 4)), (('b', 'c'), (5,), (6,))]
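The logic of the above code:
- Group the trailing values under each tuple key in a defaultdict of tuples, concatenating them as you go.
- Split each accumulated tuple into its first and second elements with the slices [0::2] and [1::2].
- Build the result list from the dictionary's items.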
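The defaultdict(tuple) version above hard-codes exactly two trailing values (fst, snd) and the [0::2] and [1::2] slices. A sketch of one way to generalise the same slicing trick to n trailing values, using a stride of n; the three-value input and the assumption that every row has the same length are made up for this example:

```python
from collections import defaultdict

# Hypothetical input with three trailing values per row instead of two.
lst = [(('a', 'b'), 1, 2, 'x'), (('a', 'b'), 3, 4, 'y'), (('b', 'c'), 5, 6, 'z')]

n = len(lst[0]) - 1  # number of trailing values per row (assumed uniform)

d = defaultdict(tuple)
for key, *values in lst:
    # Concatenate every row's trailing values onto the key's flat tuple.
    d[key] += tuple(values)

# Slice with stride n: position i of every row ends up in v[i::n].
result = [(k, *(v[i::n] for i in range(n))) for k, v in d.items()]
print(result)
# [(('a', 'b'), (1, 3), (2, 4), ('x', 'y')), (('b', 'c'), (5,), (6,), ('z',))]
```

Because += on a tuple builds a new tuple each time, this slows down as groups grow; the list-based dictionary approach avoids that cost.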