Comparing first element of the consecutive lists of tuples in Python

Tags:

python

list

append

compare

python-2.7

I have a list of tuples, each containing two elements. Several tuples share the same first element. I want to compare the first elements of these tuples and collect the second elements of each group into one sequence. Here is my list:

myList=[(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

I would like to build a new list from it that looks something like this:

NewList=[(2,3,4,5),(6,7,8),(9,10)]

Is there an efficient way to do this?


asked Sep 19 '15 10:09

PythonNoob

2 Answers

You can use an OrderedDict to group the elements by the first subelement of each tuple:

from collections import OrderedDict

myList = [(1,2),(1,3),(1,4),(1,5),(2,6),(2,7),(2,8),(3,9),(3,10)]

od = OrderedDict()

for a, b in myList:
    od.setdefault(a, []).append(b)

print(list(od.values()))
# [[2, 3, 4, 5], [6, 7, 8], [9, 10]]

If you really want tuples:

print(list(map(tuple, od.values())))
# [(2, 3, 4, 5), (6, 7, 8), (9, 10)]
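A minimal sketch, assuming Python 3.7 or later: from 3.7 on, the built-in dict is guaranteed to preserve insertion order, so a plain dict with setdefault keeps the groups in first-seen order without needing OrderedDict. The function name group_pairs is hypothetical. On Python 2.7 (the version in the question's tags) a plain dict is unordered, so stick with OrderedDict there.

```python
def group_pairs(pairs):
    """Group second elements by first element, keeping first-seen key order.

    Assumes Python 3.7+, where plain dicts preserve insertion order.
    """
    groups = {}
    for key, value in pairs:
        # setdefault returns the existing list, or inserts [] and returns it
        groups.setdefault(key, []).append(value)
    # Convert each group to a tuple to match the requested output
    return [tuple(values) for values in groups.values()]


myList = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (2, 8), (3, 9), (3, 10)]
print(group_pairs(myList))
# [(2, 3, 4, 5), (6, 7, 8), (9, 10)]
```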

If you don't care about the order in which the groups appear and just want the most efficient way to group, you could use a collections.defaultdict:

from collections import defaultdict

od = defaultdict(list)

for a,b in myList:
    od[a].append(b)

print(list(od.values()))
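If you use defaultdict but still want a predictable result, one option is to sort the keys at the end. This is a minimal sketch; the helper name grouped_by_key is hypothetical. It orders groups by key value, not by first appearance, which matches the question's expected output only because the keys there are ascending. (On Python 3.7+, defaultdict, being a dict subclass, also keeps insertion order; on 2.7 its order is arbitrary.)

```python
from collections import defaultdict


def grouped_by_key(pairs):
    """Group second elements under their first element, ordered by key."""
    groups = defaultdict(list)  # a missing key starts as an empty list
    for key, value in pairs:
        groups[key].append(value)
    # Sorting the keys gives a deterministic order on any Python version
    return [tuple(groups[key]) for key in sorted(groups)]


mixed_pairs = [(3, 9), (1, 2), (2, 6), (1, 3), (3, 10), (2, 7)]
print(grouped_by_key(mixed_pairs))
# [(2, 3), (6, 7), (9, 10)]
```

Sorting only the distinct keys is usually cheaper than sorting the whole input, since there are typically far fewer keys than pairs.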

Lastly, if your data is already sorted, as in your example, you could simply use itertools.groupby to group by the first element of each tuple and then extract the second element from each group:

from itertools import groupby
from operator import itemgetter
print([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])

Output:

[(2, 3, 4, 5), (6, 7, 8), (9, 10)]

Again, groupby only works as intended if your data is sorted (or at least grouped) by the first element, because it starts a new group every time the key changes.
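To illustrate the sorting requirement: groupby only merges *consecutive* items with the same key, so unsorted input produces split groups. Sorting by the first element beforehand fixes that. A small sketch with made-up input:

```python
from itertools import groupby
from operator import itemgetter

unsorted_pairs = [(1, 2), (2, 6), (1, 3)]

# Without sorting, key 1 appears in two separate runs, so it is split
split = [tuple(t[1] for t in g)
         for _, g in groupby(unsorted_pairs, key=itemgetter(0))]
print(split)
# [(2,), (6,), (3,)]

# Sorting by the first element makes each key's items consecutive
merged = [tuple(t[1] for t in g)
          for _, g in groupby(sorted(unsorted_pairs, key=itemgetter(0)),
                              key=itemgetter(0))]
print(merged)
# [(2, 3), (6,)]
```

Sorting adds O(n log n) work, so for unsorted data the dictionary-based approaches are usually the better fit.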

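Some IPython timings on a reasonably sized list of 100,000 random tuples (randint is random.randint; the other names are imported as in the snippets above):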

In [33]: myList = [(randint(1,10000),randint(1,10000)) for _ in range(100000)]

In [34]: myList.sort()

In [35]: timeit ([tuple(t[1] for t in v) for k,v in groupby(myList,key=itemgetter(0))])
10 loops, best of 3: 44.5 ms per loop

In [36]: %%timeit
od = defaultdict(list)
for a,b in myList:
    od[a].append(b)
   ....: 
10 loops, best of 3: 33.8 ms per loop

In [37]: %%timeit
dictionary = OrderedDict()
for x, y in myList:
    if x not in dictionary:
        dictionary[x] = [] # new empty list
    dictionary[x].append(y)
   ....: 
10 loops, best of 3: 63.3 ms per loop

In [38]: %%timeit
od = OrderedDict()
for a,b in myList:
    od.setdefault(a,[]).append(b)
   ....: 
10 loops, best of 3: 80.3 ms per loop

If order matters and the data is sorted, go with groupby. Note that the groupby timing already includes building the tuples while the defaultdict timing does not, so the gap narrows further if you also need to convert the defaultdict lists to tuples.

If the data is not sorted, or you don't care about order, the defaultdict approach is the fastest of the options timed here.
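To reproduce a comparison like this outside IPython, the standard-library timeit module can run the same approaches from a plain script. This is a sketch under assumptions: Python 3.7+ (so the defaultdict result comes out in key order for sorted input), and the data size, seed, repeat counts and function names are arbitrary or hypothetical. No numbers are given because results depend on the machine and Python version. Every approach converts its groups to tuples so they are compared on equal terms.

```python
import random
import timeit
from collections import OrderedDict, defaultdict
from itertools import groupby
from operator import itemgetter


def with_groupby(pairs):
    # Requires pairs to be sorted (or grouped) by the first element
    return [tuple(t[1] for t in g) for _, g in groupby(pairs, key=itemgetter(0))]


def with_defaultdict(pairs):
    groups = defaultdict(list)
    for a, b in pairs:
        groups[a].append(b)
    return [tuple(v) for v in groups.values()]


def with_ordereddict(pairs):
    groups = OrderedDict()
    for a, b in pairs:
        groups.setdefault(a, []).append(b)
    return [tuple(v) for v in groups.values()]


if __name__ == "__main__":
    random.seed(0)  # arbitrary seed so runs are repeatable
    data = sorted((random.randint(1, 10000), random.randint(1, 10000))
                  for _ in range(100000))

    # On sorted input (and Python 3.7+) all three give identical groups
    assert with_groupby(data) == with_defaultdict(data) == with_ordereddict(data)

    for func in (with_groupby, with_defaultdict, with_ordereddict):
        # Best of 3 repeats of 5 calls each, reported per call
        best = min(timeit.repeat(lambda: func(data), repeat=3, number=5))
        print("%s: %.1f ms per loop" % (func.__name__, best / 5 * 1000))
```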

answered Nov 14 '22 23:11

Padraic Cunningham

This feels like a task for a dictionary (if you don't know dictionaries yet, look them up on python.org). This is a very verbose example, so it's not what I'd write in everyday coding, but it's better to be verbose than unclear:

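import collections

dictionary = collections.OrderedDict()
for x, y in myList:
    # 'x not in dictionary' works in Python 2 and 3; dict.has_key() was removed in Python 3
    if x not in dictionary:
        dictionary[x] = [] # new empty list
    # append y to that list
    dictionary[x].append(y)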

answered Nov 15 '22 00:11

Marcus Müller

