Splitting a list of tuples to several lists by the same tuple items [duplicate]

I am presented with a list made entirely of tuples, such as:

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

How can I split lst into as many lists as there are colours? In this case, that would be 3 lists:

[("hello", "Blue"), ("hey", "Blue")]
[("hi", "Red")]
[("yo", "Green")]

I just need to be able to work with these lists later, so I don't want to just output them to screen.

Details about the list

I know that every element of lst is strictly a two-element tuple, and the colour is always the second element of each tuple.

The problem

The problem is that lst depends on user input, so I won't always know how many colours there are in total or what they are. That is why I can't predefine variables to store these lists in.

So how can this be done?

You could use a collections.defaultdict to group by colour:

from collections import defaultdict

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

colours = defaultdict(list)
for word, colour in lst:
    colours[colour].append((word, colour))

print(colours)
# defaultdict(<class 'list'>, {'Blue': [('hello', 'Blue'), ('hey', 'Blue')], 'Red': [('hi', 'Red')], 'Green': [('yo', 'Green')]})

Or, if you prefer not to import anything, dict.setdefault is an option:

colours = {}
for word, colour in lst:
    colours.setdefault(colour, []).append((word, colour))

print(colours)
# {'Blue': [('hello', 'Blue'), ('hey', 'Blue')], 'Red': [('hi', 'Red')], 'Green': [('yo', 'Green')]}

If you just want the colour tuples separated into nested lists of tuples, print the values() as a list:

print(list(colours.values()))
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]

The benefit of both approaches is that they automatically create an empty list for each new key as you add it, so you don't have to initialize it yourself.
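Since the goal is to work with the groups later rather than print them, here is a minimal sketch of how the grouped result might be used. The names groups, blue and counts are only illustrative and do not come from the question; converting to a plain dict is an optional choice, made here so that a missing colour raises KeyError instead of silently creating an empty list.

```python
from collections import defaultdict

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

colours = defaultdict(list)
for word, colour in lst:
    colours[colour].append((word, colour))

# Optional: freeze the result into a plain dict so that looking up an
# unknown colour raises KeyError instead of adding a new empty list.
groups = dict(colours)

# Fetch a single group by its colour.
blue = groups["Blue"]

# Walk over every group without knowing the colours in advance.
counts = {colour: len(items) for colour, items in groups.items()}
```

Each colour acts as the "variable name" you could not predefine, so no separate variables are needed.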

This can be done relatively efficiently with a supporting dict:

def split_by_idx(items, idx=1):
    result = {}
    for item in items:
        key = item[idx]
        if key not in result:
            result[key] = []
        result[key].append(item)
    return result

and the lists can be collected from result with dict.values():

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]


d = split_by_idx(lst)
print(list(d.values()))
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]

This could also be implemented with dict.setdefault() or a defaultdict, which are fundamentally the same except that you do not have to handle the "key not present" case explicitly:

def split_by_idx_sd(items, idx=1):
    result = {}
    for item in items:
        result.setdefault(item[idx], []).append(item)
    return result

import collections


def split_by_idx_dd(items, idx=1):
    result = collections.defaultdict(list)
    for item in items:
        result[item[idx]].append(item)
    return result

Timewise, the plain dict solution with the explicit membership check (split_by_idx) is the fastest for your input:

%timeit split_by_idx(lst)
# 1000000 loops, best of 3: 776 ns per loop
%timeit split_by_idx_sd(lst)
# 1000000 loops, best of 3: 866 ns per loop
%timeit split_by_idx_dd(lst)
# 1000000 loops, best of 3: 1.16 µs per loop

but you would get different timings depending on the "collision rate" of your input. In general, you should expect split_by_idx() to be the fastest with a low collision rate (i.e. most of the entries create a new key in the dict), while split_by_idx_dd() should be the fastest with a high collision rate (i.e. most of the entries get appended to the list of an existing defaultdict key).
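If you want to check the collision-rate effect on your own machine without IPython's %timeit, a rough sketch with the standard timeit module could look like this. The two test inputs (low_collision, high_collision), their size, and the repeat count are arbitrary assumptions chosen for illustration; the absolute numbers will differ between machines and Python versions.

```python
import collections
import timeit


def split_by_idx(items, idx=1):
    result = {}
    for item in items:
        key = item[idx]
        if key not in result:
            result[key] = []
        result[key].append(item)
    return result


def split_by_idx_dd(items, idx=1):
    result = collections.defaultdict(list)
    for item in items:
        result[item[idx]].append(item)
    return result


# Hypothetical inputs: same length, very different numbers of distinct keys.
low_collision = [(i, i) for i in range(10_000)]       # every key is new
high_collision = [(i, i % 3) for i in range(10_000)]  # only three keys

for label, data in (("low", low_collision), ("high", high_collision)):
    for func in (split_by_idx, split_by_idx_dd):
        # Total time for 20 calls; the lambda is invoked inside this iteration.
        seconds = timeit.timeit(lambda: func(data), number=20)
        print(f"{label:>4} collision  {func.__name__:<16} {seconds:.4f} s")
```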

from itertools import groupby
from operator import itemgetter

indexer = itemgetter(1)
desired = [list(gr) for _, gr in groupby(sorted(lst, key=indexer), key=indexer)]
# [[('hello', 'Blue'), ('hey', 'Blue')], [('yo', 'Green')], [('hi', 'Red')]]

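We sort the list by the second item of each tuple (the colour) and then group it by that same item. Because the same key function is needed twice, it is stored in the indexer variable. Note that the groups come out in sorted colour order rather than in order of first appearance.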
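The sort is not optional: itertools.groupby only merges adjacent items with equal keys. A small sketch of the difference, where unsorted_groups and sorted_groups are just illustrative names:

```python
from itertools import groupby
from operator import itemgetter

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]
indexer = itemgetter(1)

# Without sorting, the two "Blue" tuples are not adjacent, so they end up
# in separate groups.
unsorted_groups = [list(gr) for _, gr in groupby(lst, key=indexer)]
# [[('hello', 'Blue')], [('hi', 'Red')], [('hey', 'Blue')], [('yo', 'Green')]]

# Sorting by the same key first makes equal colours adjacent.
sorted_groups = [list(gr) for _, gr in groupby(sorted(lst, key=indexer), key=indexer)]
# [[('hello', 'Blue'), ('hey', 'Blue')], [('yo', 'Green')], [('hi', 'Red')]]
```

The sort makes this approach O(n log n), whereas the dict-based answers above make a single pass over the list.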

Tags:

python

list

split

tuples

TGamer

3 Answers

RoadRunner

norok2

Mustafa Aydın
