
Splitting a list of tuples to several lists by the same tuple items [duplicate]

I am presented with a list made entirely of tuples, such as:

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

How can I split lst into as many lists as there are colours? In this case, that would be 3 lists:

[("hello", "Blue"), ("hey", "Blue")]
[("hi", "Red")]
[("yo", "Green")]

I need to be able to work with these lists later, so I don't want to just print them to the screen.

Details about the list

I know that every element of lst is always a two-element tuple, and the colour is always the second element of each tuple.

The problem

The problem is that lst depends on user input, so I won't always know how many colours there are in total or what they are. That is why I can't predefine variables to store these lists in.

So how can this be done?

TGamer asked May 09 '20 09:05



3 Answers

You could use a collections.defaultdict to group by colour:

from collections import defaultdict

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

colours = defaultdict(list)
for word, colour in lst:
    colours[colour].append((word, colour))

print(colours)
# defaultdict(<class 'list'>, {'Blue': [('hello', 'Blue'), ('hey', 'Blue')], 'Red': [('hi', 'Red')], 'Green': [('yo', 'Green')]})

Or, if you prefer to avoid the import, dict.setdefault is an option:

colours = {}
for word, colour in lst:
    colours.setdefault(colour, []).append((word, colour))

print(colours)
# {'Blue': [('hello', 'Blue'), ('hey', 'Blue')], 'Red': [('hi', 'Red')], 'Green': [('yo', 'Green')]}

If you just want the tuples separated into a nested list of lists, one list per colour, convert values() to a list:

print(list(colours.values()))
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]

The benefit of both approaches is that they automatically initialize an empty list for each new key as you add it, so you don't have to do that yourself.
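Since the question asks for lists that can be worked with later rather than printed, here is a minimal sketch of how the grouped dict might be used afterwards. The names blue_items and words_by_colour are illustrative and not part of the original answer:

```python
from collections import defaultdict

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

colours = defaultdict(list)
for word, colour in lst:
    colours[colour].append((word, colour))

# Look up a single group by its colour
blue_items = colours["Blue"]

# Walk over every group without knowing the colours in advance
words_by_colour = {
    colour: [word for word, _ in items]
    for colour, items in colours.items()
}

# Convert to a plain dict so that looking up a missing colour raises
# KeyError instead of silently creating an empty list
colours = dict(colours)
```

Converting back to a plain dict is optional; it only matters if later code might look up a colour that was never seen.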

RoadRunner answered Oct 17 '22 12:10


This can be done relatively efficiently with a supporting dict:

def split_by_idx(items, idx=1):
    result = {}
    for item in items:
        key = item[idx]
        if key not in result:
            result[key] = []
        result[key].append(item)
    return result

and the lists can be collected from result with dict.values():

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]


d = split_by_idx(lst)
print(list(d.values()))
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]

This could also be implemented with dict.setdefault() or a defaultdict. Both are fundamentally the same as the version above, except that you do not have to handle the "key not present" case explicitly:

def split_by_idx_sd(items, idx=1):
    result = {}
    for item in items:
        result.setdefault(item[idx], []).append(item)
    return result

import collections


def split_by_idx_dd(items, idx=1):
    result = collections.defaultdict(list)
    for item in items:
        result[item[idx]].append(item)
    return result

Timewise, the version with the explicit key check, split_by_idx(), is the fastest for your input:

%timeit split_by_idx(lst)
# 1000000 loops, best of 3: 776 ns per loop
%timeit split_by_idx_sd(lst)
# 1000000 loops, best of 3: 866 ns per loop
%timeit split_by_idx_dd(lst)
# 1000000 loops, best of 3: 1.16 µs per loop

but you would get different timings depending on the "collision rate" of your input. In general, you should expect split_by_idx() to be the fastest with a low collision rate (i.e. most of the entries create a new key in the dict), while split_by_idx_dd() should be the fastest with a high collision rate (i.e. most of the entries get appended to the list of an existing key).
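One way to check the collision-rate claim on your own machine is sketched below. The two inputs are made up for illustration (every key unique versus only three distinct keys), and the loop counts are arbitrary. No timings are shown because they vary by machine and Python version:

```python
import collections
import timeit


def split_by_idx(items, idx=1):
    result = {}
    for item in items:
        key = item[idx]
        if key not in result:
            result[key] = []
        result[key].append(item)
    return result


def split_by_idx_dd(items, idx=1):
    result = collections.defaultdict(list)
    for item in items:
        result[item[idx]].append(item)
    return result


n = 1000
# Hypothetical inputs: all keys unique (low collision rate)
# versus only 3 distinct keys (high collision rate)
low_collision = [(i, f"colour{i}") for i in range(n)]
high_collision = [(i, f"colour{i % 3}") for i in range(n)]

for name, data in [("low", low_collision), ("high", high_collision)]:
    for func in (split_by_idx, split_by_idx_dd):
        seconds = timeit.timeit(lambda: func(data), number=200)
        print(f"{name:>4} collision  {func.__name__:<16} {seconds:.4f} s")
```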

norok2 answered Oct 17 '22 10:10


from itertools import groupby
from operator import itemgetter

indexer = itemgetter(1)
desired = [list(gr) for _, gr in groupby(sorted(lst, key=indexer), key=indexer)]
# [[('hello', 'Blue'), ('hey', 'Blue')], [('yo', 'Green')], [('hi', 'Red')]]

We sort the list by the second item of each tuple (the colour) and then group by that same item. Because the same key function is needed twice, it is stored in the indexer variable.
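The sort is not optional: itertools.groupby only groups consecutive items that share a key. A small sketch of the difference, using the list from the question (unsorted_groups and sorted_groups are illustrative names):

```python
from itertools import groupby
from operator import itemgetter

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]
indexer = itemgetter(1)

# Without sorting, "Blue" ends up in two separate groups because its
# tuples are not adjacent in lst
unsorted_groups = [list(gr) for _, gr in groupby(lst, key=indexer)]
# [[('hello', 'Blue')], [('hi', 'Red')], [('hey', 'Blue')], [('yo', 'Green')]]

# Sorting first puts equal keys next to each other
sorted_groups = [list(gr) for _, gr in groupby(sorted(lst, key=indexer), key=indexer)]
# [[('hello', 'Blue'), ('hey', 'Blue')], [('yo', 'Green')], [('hi', 'Red')]]
```

Note that the groups come out in sorted key order (Blue, Green, Red) rather than in order of first appearance, and that the sort makes this approach O(n log n) rather than a single linear pass.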

Mustafa Aydın answered Oct 17 '22 10:10