Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split a list of tuples into sub-lists of the same tuple field [duplicate]

Tags:

I have a huge list of tuples in this format. The second field of the each tuple is the category field.

    [(1, 'A', 'foo'),
    (2, 'A', 'bar'),
    (100, 'A', 'foo-bar'),

    ('xx', 'B', 'foobar'),
    ('yy', 'B', 'foo'),

    (1000, 'C', 'py'),
    (200, 'C', 'foo'),
    ..]

What is the most efficient way to break it down into sub-lists of the same category ( A, B, C .,etc)?

like image 381
Kaung Htet Avatar asked Nov 11 '11 10:11

Kaung Htet


People also ask

How do you split a list of tuples?

To split a tuple, just list the variable names separated by commas on the left-hand side of an equals sign, and then a tuple on the right-hand side.

Can list and tuple have duplicate values?

Allows duplicate members. Use square brackets [] for lists. Tuple is a collection which is ordered and unchangeable. Allows duplicate members.

How do I separate a list into a sublist?

Method #1: Using islice to split a list into sublists of given length, is the most elegant way. # into sublists of given length. Method #2: Using zip is another way to split a list into sublists of given length.

Can we convert list into tuple and tuple into list?

To convert the Python list to a tuple, use the tuple() function. The tuple() is a built-in function that passes the list as an argument and returns the tuple. The list elements will not change when it converts into a tuple.


1 Answers

Use itertools.groupby:

import itertools
import operator

data=[(1, 'A', 'foo'),
    (2, 'A', 'bar'),
    (100, 'A', 'foo-bar'),

    ('xx', 'B', 'foobar'),
    ('yy', 'B', 'foo'),

    (1000, 'C', 'py'),
    (200, 'C', 'foo'),
    ]

for key,group in itertools.groupby(data,operator.itemgetter(1)):
    print(list(group))

yields

[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')]
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
[(1000, 'C', 'py'), (200, 'C', 'foo')]

Or, to create one list with each group as a sublist, you could use a list comprehension:

[list(group) for key,group in itertools.groupby(data,operator.itemgetter(1))]

The second argument to itertools.groupby is a function which itertools.groupby applies to each item in data (the first argument). It is expected to return a key. itertools.groupby then groups together all contiguous items with the same key.

operator.itemgetter(1) picks off the second item in a sequence.

For example, if

row=(1, 'A', 'foo')

then

operator.itemgetter(1)(row)

equals 'A'.


As @eryksun points out in the comments, if the categories of the tuples appear in some random order, then you must sort data first before applying itertools.groupby. This is because itertools.groupy only collects contiguous items with the same key into groups.

To sort the tuples by category, use:

data2=sorted(data,key=operator.itemgetter(1))
like image 126
unutbu Avatar answered Oct 06 '22 01:10

unutbu