
Group a list of tuples on two values, and return a list of all the third value

I have a list of tuples, each containing three values. I want to 'roll them up' by grouping on the first two values: for every set of tuples that share the same first two values, the result should contain one list holding (1) the first value, (2) the second value, and (3) a list of all the third values from those tuples.

Because I am writing the whole script myself, I have some flexibility on data types, so if I am approaching this in completely the wrong way, please let me know. I also wondered whether there is an easier way to do it with Pandas.

I am wondering whether this can be done with itertools.groupby(), probably combined with operator.itemgetter() to access the right parts of each tuple.

import itertools
import operator

list = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]

list=sorted(list)

def sorter(list):
     grouper = itertools.groupby(list, operator.itemgetter(0))
     for key, subiter in grouper:
          l = []
          grouper2 = itertools.groupby(subiter, operator.itemgetter(0))
          for key, subiter in grouper2: 
               l.append(subiter)
               yield key, l

This code represents the general direction I was thinking, but it will not yield the desired output. The desired output for this would be:

[[1, 1, [4, 9, 14]], [2, 1, [12, 99]], [2, 6, [14, 19]]]

Again I have significant flexibility in terms of the datatypes here so if I am approaching this wrong I am willing to try something completely different.

Bennett Tomlin asked Dec 07 '22 11:12


1 Answer

There is no need for two nested groupby calls that each group by a single field. Instead, use itemgetter with two arguments (or a lambda) to group by the first two values at once, then a list comprehension to collect the third values.

>>> from itertools import groupby
>>> from operator import itemgetter
>>> lst = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]
>>> [(*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1))]
[(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]
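To see why a single groupby is enough here: itemgetter(0, 1) returns the first two elements of each tuple as a pair, and groupby compares that pair as one key. The following is only a small sketch of that idea; the names pair_key, first_key, keys and as_lists are made up for illustration. It also shows one way to get inner lists instead of tuples, which is what the question's desired output uses.

```python
from itertools import groupby
from operator import itemgetter

lst = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]

# itemgetter(0, 1) picks out the first two items of a tuple as a pair
pair_key = itemgetter(0, 1)
first_key = pair_key(lst[0])  # the pair taken from (1, 1, 4)

# Each run of consecutive tuples sharing the same pair becomes one group
keys = [k for k, _ in groupby(lst, key=pair_key)]

# Same grouping, but building lists rather than tuples for each result row
as_lists = [[*k, [x[2] for x in g]] for k, g in groupby(lst, key=pair_key)]
```

Whether the rows are tuples or lists does not change the grouping itself, so pick whichever fits the rest of the script.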

If, for whatever reason, you want to use two separate groupby calls, you can use this:

>>> [(k1, k2, [x[2] for x in g2]) for k1, g1 in groupby(lst, key=itemgetter(0))
...                               for k2, g2 in groupby(g1,  key=itemgetter(1))]
[(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]

Of course, this also works as a regular (nested) loop, more in line with your original code:

def sorter(lst):
    for k1, g1 in groupby(lst, key=itemgetter(0)):
        for k2, g2 in groupby(g1, key=itemgetter(1)):
            yield (k1, k2, [x[2] for x in g2])

Or with the single groupby, returning a generator object:

def sorter(lst):
    return ((*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1)))

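As always with groupby, this assumes that lst is already sorted by the same key, because groupby only merges consecutive items with equal keys. If it is not, sort it first, e.g. with sorted(lst, key=itemgetter(0, 1)).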
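If the input might arrive unsorted, the sort can be folded into the function. This is a rough sketch under that assumption; group_by_first_two and the shuffled sample data are hypothetical and not part of the original question.

```python
from itertools import groupby
from operator import itemgetter

def group_by_first_two(tuples):
    """Group 3-tuples on their first two values (hypothetical helper name)."""
    key = itemgetter(0, 1)
    # Sort by the same key that groupby uses. sorted() is stable, so the
    # third values keep their original relative order within each group.
    ordered = sorted(tuples, key=key)
    return [(*k, [x[2] for x in g]) for k, g in groupby(ordered, key=key)]

# Shuffled version of the sample data, made up for this example
unsorted = [(2, 6, 14), (1, 1, 4), (2, 1, 12), (1, 1, 9), (2, 6, 19), (2, 1, 99), (1, 1, 14)]
result = group_by_first_two(unsorted)

# Without sorting, groupby starts a new group every time the key changes,
# so the same key can show up more than once
unsorted_keys = [k for k, _ in groupby(unsorted, key=itemgetter(0, 1))]
```

Sorting on only the first two values, rather than on the whole tuple, leaves the third values in the order they appeared in the input, which may or may not be what you want.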

tobias_k answered Dec 28 '22 05:12