python: group tuples in a list by their first element

Tags:

python

list

tuples

I have a list of tuples like this:

[
(379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1), 
(4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), 
(4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
]

I would like to get this instead:

[
(379146591, (('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),)), 
(4746004, (('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)))
]

That is, for each tuple, everything except the first element should go into a sub-tuple, and if a later tuple has the same first element, its remaining elements should be added as another sub-tuple under that same key.

That way I can do:

for i in data:
    key = i[0]  # the first element of each tuple
    for sub_i in i[1]:
        print(key, sub_i)  # access every sub-tuple grouped under that key

Is there a function to do this?


asked Sep 29 '17 16:09

91DarioDev

2 Answers

It's pretty simple with defaultdict: make the default value a list, then append the rest of each tuple to the list stored under its first element:

from collections import defaultdict

lst = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1), 
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), 
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
]

d = defaultdict(list)

for k, *v in lst:
    # *v collects the remaining fields as a list, so convert it to a tuple
    d[k].append(tuple(v))

list(d.items())
#[(379146591, [('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]),
# (4746004,
#  [('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
#   ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)])]
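If you need exactly the shape asked for in the question, a list of (key, tuple_of_tuples) pairs, a minimal sketch (assuming Python 3.7+, where dicts keep insertion order) is to freeze each group after collecting it. Note that in `for k, *v in lst` the starred name always receives a list, which is why `tuple(v)` is used. The names `result`, `key`, `rows` and `row` are just illustrative.

```python
from collections import defaultdict

lst = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3),
]

d = defaultdict(list)
for k, *v in lst:
    # extended unpacking makes v a list; store it as a tuple instead
    d[k].append(tuple(v))

# turn each list of rows into a tuple of tuples, keeping first-seen key order
result = [(k, tuple(rows)) for k, rows in d.items()]

# the nested loop from the question now works on result
for key, rows in result:
    for row in rows:
        print(key, row)
```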

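If order is important and you are on a Python version older than 3.7 (where plain dicts are not guaranteed to keep insertion order), use an OrderedDict, which remembers insertion order: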

from collections import OrderedDict
d = OrderedDict()

for k, *v in lst:
    # setdefault creates the list the first time a key is seen
    d.setdefault(k, []).append(tuple(v))

list(d.items())
#[(379146591, [('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]),
# (4746004,
#  [('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
#   ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)])]
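On Python 3.7 and later, insertion order is a language guarantee for plain dicts, so the same setdefault approach works without OrderedDict. A small sketch of that variant (the name `groups` is just illustrative):

```python
lst = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3),
]

groups = {}
for k, *v in lst:
    # setdefault inserts an empty list the first time a key is seen
    groups.setdefault(k, []).append(tuple(v))

# keys come back in first-seen order on Python 3.7+
print(list(groups))  # [379146591, 4746004]
```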

answered Oct 21 '22 13:10

Psidom

Use itertools.groupby (with operator.itemgetter to extract the first item). The one catch is that groupby only merges consecutive items with the same key, so your data needs to be sorted (or at least have equal keys next to each other) for each key to come out as a single group. It's the same idea as piping sort into uniq in a shell. You can use sorted() for this:

import operator
from itertools import groupby

data = [
    (379146591, "it", 55, 1, 1, "NON ENTRARE", "NonEntrate", 55, 1),
    (4746004, "it", 28, 2, 2, "NON ENTRARE", "NonEntrate", 26, 2),
    (4746004, "it", 28, 2, 2, "TheBestTroll Group", "TheBestTrollGroup", 2, 3),
]

data = sorted(data, key=operator.itemgetter(0))  # unnecessary if equal keys are already adjacent, as here
for k, g in groupby(data, operator.itemgetter(0)):
    print(k, list(g))

This outputs:

4746004 [(4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)]
379146591 [(379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]
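To see why the sort matters, here is a small sketch on a hypothetical reordering of the same rows in which the two 4746004 tuples are not adjacent. Because groupby only merges consecutive runs, that key is reported twice unless the data is sorted first:

```python
import operator
from itertools import groupby

# hypothetical input: the two 4746004 rows are separated by another key
data = [
    (4746004, "it", 28, 2, 2, "NON ENTRARE", "NonEntrate", 26, 2),
    (379146591, "it", 55, 1, 1, "NON ENTRARE", "NonEntrate", 55, 1),
    (4746004, "it", 28, 2, 2, "TheBestTroll Group", "TheBestTrollGroup", 2, 3),
]

first = operator.itemgetter(0)

# without sorting, each consecutive run becomes its own group
unsorted_keys = [k for k, _ in groupby(data, first)]

# sorting on the same key first puts equal keys next to each other
sorted_keys = [k for k, _ in groupby(sorted(data, key=first), first)]

print(unsorted_keys)  # [4746004, 379146591, 4746004]
print(sorted_keys)    # [4746004, 379146591]
```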

In your case, you also need to remove the first element from each tuple of values. Change the last two lines of the above to:

for k, g in groupby(data, operator.itemgetter(0)):
    print(k, [item[1:] for item in g])

Output:

4746004 [('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)]
379146591 [('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]
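To collect the result into the exact structure from the question, a list of (key, tuple_of_tuples) pairs, one possible sketch is a list comprehension. Calling tuple(...) consumes each group immediately, which matters because every g is a lazy iterator that stops yielding once groupby advances to the next key. The names `first` and `result` are just illustrative.

```python
import operator
from itertools import groupby

data = [
    (379146591, "it", 55, 1, 1, "NON ENTRARE", "NonEntrate", 55, 1),
    (4746004, "it", 28, 2, 2, "NON ENTRARE", "NonEntrate", 26, 2),
    (4746004, "it", 28, 2, 2, "TheBestTroll Group", "TheBestTrollGroup", 2, 3),
]

first = operator.itemgetter(0)

result = [
    # drop the key from every row and freeze the group as a tuple of tuples
    (k, tuple(item[1:] for item in g))
    for k, g in groupby(sorted(data, key=first), key=first)
]

for key, rows in result:
    for row in rows:  # the loop shape from the question
        print(key, row)
```

Because of sorted(), the keys come out in numeric order (4746004 first) rather than in input order; if equal keys are already adjacent and you want to keep the original order, drop the sorted() call.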

answered Oct 21 '22 12:10

Boris
