I have a list of tuples, and each tuple contains three values. I want to 'roll them up', or group them, so that all tuples sharing the same first two values are combined. The result should be a list of lists where each component list contains: 1: the first value, 2: the second value, 3: a list of all the third values that go with those first two.
Because I am writing the whole script here I have some flexibility on data types so if I am approaching it in a completely wrong manner please let me know. I did wonder if there was an easier way to accomplish it using Pandas.
I am wondering if, using itertools.groupby(), it may be possible to accomplish this. I think it would probably need to be combined with operator.itemgetter() to access the correct parts of the various tuples.
    import itertools
    import operator

    list = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]
    list = sorted(list)

    def sorter(list):
        grouper = itertools.groupby(list, operator.itemgetter(0))
        for key, subiter in grouper:
            l = []
            grouper2 = itertools.groupby(subiter, operator.itemgetter(0))
            for key, subiter in grouper2:
                l.append(subiter)
            yield key, l
This code represents the general direction I was thinking, but it will not yield the desired output. The desired output for this would be:
[[1, 1, [4, 9, 14]], [2, 1, [12, 99]], [2, 6, [14, 19]]]
Again I have significant flexibility in terms of the datatypes here so if I am approaching this wrong I am willing to try something completely different.
No need to use two nested groupby calls, each grouping by a single field. Instead use itemgetter with two parameters, or a lambda, to group by both of the first two values at once, then a list comprehension to get the final elements.
>>> from itertools import groupby
>>> from operator import itemgetter
>>> lst = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]
>>> [(*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1))]
[(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]
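The lambda alternative mentioned above could look something like the sketch below. It also builds inner lists instead of tuples, to match the list-of-lists shape asked for in the question. The name `rolled` is only illustrative, and the input is assumed to be sorted by its first two fields already.

```python
from itertools import groupby

lst = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]

# Group on the first two fields using a lambda instead of itemgetter(0, 1).
# t[:2] is the (first, second) pair, so k is a 2-tuple that is unpacked
# into the front of each result list.
rolled = [[*k, [x[2] for x in g]] for k, g in groupby(lst, key=lambda t: t[:2])]

print(rolled)
# [[1, 1, [4, 9, 14]], [2, 1, [12, 99]], [2, 6, [14, 19]]]
```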
If, for whatever reason, you want to use two separate groupby calls, you can use this:
>>> [(k1, k2, [x[2] for x in g2]) for k1, g1 in groupby(lst, key=itemgetter(0))
... for k2, g2 in groupby(g1, key=itemgetter(1))]
[(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]
Of course, this also works as a regular (nested) loop, more in line with your original code:
    def sorter(lst):
        for k1, g1 in groupby(lst, key=itemgetter(0)):
            for k2, g2 in groupby(g1, key=itemgetter(1)):
                yield (k1, k2, [x[2] for x in g2])
Or with the single groupby, returning a generator object:
    def sorter(lst):
        return ((*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1)))
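One thing to keep in mind with the generator versions: the result can only be consumed once. A short sketch of how this might be used, repeating the function so the snippet runs on its own (the names `gen`, `first` and `second` are just for illustration):

```python
from itertools import groupby
from operator import itemgetter

def sorter(lst):
    return ((*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1)))

lst = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]

gen = sorter(lst)
first = list(gen)   # materialises all groups
second = list(gen)  # the generator is now exhausted, so this is empty

print(first)   # [(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]
print(second)  # []
```

If the result is needed more than once, wrap the call in list() and keep the list.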
As always, this assumes that lst is already sorted by the same key. If it is not, sort it first.
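To illustrate why the sorting matters, here is a small sketch with a shuffled version of the same data. The names `unsorted`, `fragmented` and `grouped` are made up for this example. groupby only merges consecutive items with equal keys, so unsorted input produces fragmented groups.

```python
from itertools import groupby
from operator import itemgetter

# Same tuples as before, but in shuffled order (illustrative data).
unsorted = [(2, 6, 14), (1, 1, 4), (2, 1, 12), (1, 1, 9), (2, 6, 19), (1, 1, 14), (2, 1, 99)]
key = itemgetter(0, 1)

# Without sorting, a new group starts every time the key changes,
# so the same (first, second) pair shows up several times.
fragmented = [(*k, [x[2] for x in g]) for k, g in groupby(unsorted, key=key)]

# Sorting by the same key first puts equal keys next to each other.
# sorted() is stable, so third values keep their original relative order.
grouped = [(*k, [x[2] for x in g]) for k, g in groupby(sorted(unsorted, key=key), key=key)]

print(len(fragmented))  # 7
print(grouped)          # [(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]
```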