
Python: group tuples in a list that share the same first element

I have a list of tuples like this:

[
(379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1), 
(4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), 
(4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
]

I would like to get this instead:

[
(379146591, (('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),)), 
(4746004, (('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)))
]

So, for each element, everything after the first item goes into a sub-tuple, and if a later element has the same first item, its remaining items are added as another sub-tuple under that same first item.

That way I can do:

for i in data:
    key = i[0]  # the shared first element
    for sub_i in i[1]:
        print(key, sub_i)  # access each of the tuples inside

Are there any functions to do this?

91DarioDev asked Sep 29 '17 16:09



2 Answers

This is pretty simple with defaultdict: initialize the default value to be a list, then append the remaining items of each tuple to the list stored under its key:

lst = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1), 
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), 
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
]

from collections import defaultdict
d = defaultdict(list)

for k, *v in lst:
    d[k].append(v)

list(d.items())
# On Python 3.7+, where dicts keep insertion order:
#[(379146591, [['it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1]]),
# (4746004,
#  [['it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2],
#   ['it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3]])]
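
Star-unpacking (k, *v) always collects the remaining items into a list, so each grouped value produced by the loop above is a list, not a tuple. A minimal sketch, using a shortened made-up row, of what the unpacking produces and how to get a tuple back if that matters:

```python
# Shortened, made-up row for illustration only.
row = (4746004, 'it', 28, 'NON ENTRARE')

k, *v = row
print(k)         # 4746004
print(v)         # ['it', 28, 'NON ENTRARE']  (a list, not a tuple)
print(tuple(v))  # ('it', 28, 'NON ENTRARE')
```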

If order is important and you are on a Python version older than 3.7 (where plain dicts do not guarantee insertion order), use an OrderedDict, which remembers insertion order:

from collections import OrderedDict

d = OrderedDict()
for k, *v in lst:
    # setdefault creates the empty list the first time k is seen
    d.setdefault(k, []).append(v)

list(d.items())
#[(379146591, [['it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1]]),
# (4746004,
#  [['it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2],
#   ['it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3]])]
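
To get exactly the nested-tuple shape from the question, one option is to slice each row instead of star-unpacking it, and convert the groups to tuples at the end. This is only a sketch: it assumes Python 3.7 or later, where a plain dict keeps insertion order, and the names groups and result are made up for the example.

```python
lst = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3),
]

groups = {}  # key -> list of row tails; insertion-ordered on Python 3.7+
for row in lst:
    # row[1:] is a tuple slice, so the tails stay tuples
    groups.setdefault(row[0], []).append(row[1:])

# Freeze each group into a tuple of tuples, as in the question
result = [(key, tuple(tails)) for key, tails in groups.items()]

# Iterate the way the question describes
for key, tails in result:
    for tail in tails:
        print(key, tail)
```

Slicing avoids the list that star-unpacking would create, so no per-row conversion is needed.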
Psidom answered Oct 21 '22 13:10


Use itertools.groupby (with operator.itemgetter to get the first item). The one caveat is that groupby only groups consecutive items, so your data needs to be sorted (or at least already grouped) by key first. It's the same idea as piping sort into uniq in bash. You can use sorted() for this:

import operator
from itertools import groupby

data = [
    (379146591, "it", 55, 1, 1, "NON ENTRARE", "NonEntrate", 55, 1),
    (4746004, "it", 28, 2, 2, "NON ENTRARE", "NonEntrate", 26, 2),
    (4746004, "it", 28, 2, 2, "TheBestTroll Group", "TheBestTrollGroup", 2, 3),
]

data = sorted(data, key=operator.itemgetter(0))  # not needed if rows with equal keys are already adjacent
for k, g in groupby(data, operator.itemgetter(0)):
    print(k, list(g))

This will output:

4746004 [(4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)]
379146591 [(379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]

In your case, you also need to remove the first element from each grouped item. Change the last two lines of the above to:

for k, g in groupby(data, operator.itemgetter(0)):
    print(k, [item[1:] for item in g])

Output:

4746004 [('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)]
379146591 [('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]
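
The sort matters because groupby only merges rows whose keys are next to each other. A small sketch with made-up rows, where the same key appears twice but not consecutively, shows the difference (the names rows, first, unsorted_groups and sorted_groups are only for illustration):

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows: key 1 appears twice, separated by key 2.
rows = [(1, 'a'), (2, 'b'), (1, 'c')]

first = itemgetter(0)

# Without sorting, key 1 ends up in two separate groups.
unsorted_groups = [(k, [r[1:] for r in g]) for k, g in groupby(rows, first)]
print(unsorted_groups)
# [(1, [('a',)]), (2, [('b',)]), (1, [('c',)])]

# After sorting by key, each key forms exactly one group.
sorted_groups = [
    (k, [r[1:] for r in g])
    for k, g in groupby(sorted(rows, key=first), first)
]
print(sorted_groups)
# [(1, [('a',), ('c',)]), (2, [('b',)])]
```

Note that sorting also changes the order of the keys in the result, so if the original order of first appearance matters, a dict-based grouping may be the better fit.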
Boris answered Oct 21 '22 12:10