For a given list of tuples, if multiple tuples in the list have the first element of tuple the same - among them select only the tuple with the maximum last element.
For example:
sample_list = [(5,16,2),(5,10,3),(5,8,1),(21,24,1)]
In the sample_list
above since the first 3 tuples has the similar first element 5
in this case among them only the 2nd tuple should be retained since it has the max last element => 3
.
Expected op:
op = [(5,10,3),(21,24,1)]
Code:
op = []
for m in range(len(sample_list)):
li = [sample_list[m]]
for n in range(len(sample_list)):
if(sample_list[m][0] == sample_list[n][0]
and sample_list[m][2] != sample_list[n][2]):
li.append(sample_list[n])
op.append(sorted(li,key=lambda dd:dd[2],reverse=True)[0])
print (list(set(op)))
This works. But it is very slow for long list. Is there a more pythonic or efficient way to do this?
To filter items of a Tuple in Python, call filter() builtin function and pass the filtering function and tuple as arguments. The filtering function should take an argument and return True if the item has to be in the result, or False if the item has to be not in the result.
Python has a built-in function called filter() that allows you to filter a list (or a tuple) in a more beautiful way. The filter() function iterates over the elements of the list and applies the fn() function to each element. It returns an iterator for the elements where the fn() returns True .
The third technique we are going to use is by using the set () function. Set () functions help us in filtering the list of tuples. We pass a list to this function and get the desired results. view source print?
For a given list of tuples, if multiple tuples in the list have the first element of tuple the same - among them select only the tuple with the maximum last element. In the sample_list above since the first 3 tuples has the similar first element 5 in this case among them only the 2nd tuple should be retained since it has the max last element => 3.
You want to filter based on some condition, then display a representation of those items. There are a few ways to do this. List comprehension with filtering. This is usually considered idiomatic or “pythonic” Filter and map. This is lazy and useful if your list is very large and you don’t want to hold it all in memory at once.
A list can be used to store heterogeneous values (i.e data of any data type like integer, floating point, strings, and so on). A list of tuple basically contains tuples enclosed in a list.
Use collections.defaultdict
is the fastest alternative and arguably the most pythonic:
from collections import defaultdict
sample_list = [(5, 16, 2), (5, 10, 3), (5, 8, 1), (21, 24, 1)]
d = defaultdict(lambda: (0, 0, float("-inf")))
for e in sample_list:
first, _, last = e
if d[first][2] < last:
d[first] = e
res = [*d.values()]
print(res)
Output
[(5, 10, 3), (21, 24, 1)]
This is a single pass O(n)
which is not only asymptotically optimal but also performant in practice.
To show that is performant one could design an experiment considering the two main variables of the problem, the number of unique keys (values in the firs position of the tuple) and the length of the input list and the following alternatives approaches:
def defaultdict_max_approach(lst):
d = defaultdict(lambda: (0, 0, float("-inf")))
for e in lst:
first, _, last = e
if d[first][2] < last:
d[first] = e
return [*d.values()]
def dict_max_approach(lst):
# https://stackoverflow.com/a/69025193/4001592
d = {}
for tpl in lst:
first, *_, last = tpl
if first not in d or last > d[first][-1]:
d[first] = tpl
return [*d.values()]
def groupby_max_approach(lst):
# https://stackoverflow.com/a/69025193/4001592
return [max(g, key=ig(-1)) for _, g in groupby(sorted(lst), key=ig(0))]
As shown in the plots below the approach using defaultdict is the most performant method for a varying number of unique keys (500, 1000, 5000, 10000) and also for collections up to 1000000 elements (note that the x axis in is in thousands).
The above experiments are in concordance with experiments done by others (1, 2). The code for reproducing the experiments can be found here.
Stating that is the most pythonic is subjective, but here are the main arguments in favor:
Is a well known Python idiom
Using a defaultdict for grouping a sequence key-value pairs, and aggregating afterwards, is a well known Python idiom. Read the defaultdict examples in the Python documentation.
In the PyCon 2013 talk Transforming Code into Beautiful, Idiomatic Python by Raymond Hettinger also says that using defaultdict for such operations is the better way.
Is compliant with the Zen of Python
In the Zen of Python it can be read that
Flat is better than nested.
Sparse is better than dense.
Using a defaultdict is as flat as using a plain dict only a for-loop
and a simple if
statement. In the case of defaultdict the if condition is even simpler.
Both solutions are sparser than using itertools.groupby
, notice this approach also involves calling sorted
, itemgetter
and max
all inside a list comprehension.
You could use a collections.defaultdict
to group tuples that have the same first element and then take the maximum of each group based on the third:
from collections import defaultdict
sample_list = [(5,16,2),(5,10,3),(5,8,1),(21,24,1)]
d = defaultdict(list)
for e in sample_list:
d[e[0]].append(e)
res = [max(val, key=lambda x: x[2]) for val in d.values()]
print(res)
Output
[(5, 10, 3), (21, 24, 1)]
This approach is O(n)
.
Try itertools.groupby
:
from itertools import groupby
sample_list.sort()
print([max(l, key=lambda x: x[-1]) for _, l in groupby(sample_list, key=lambda x: x[0])])
Or also with operator.itemgetter
:
from itertools import groupby
from operator import itemgetter
sample_list.sort()
print([max(l, key=itemgetter(-1)) for _, l in groupby(sample_list, key=itemgetter(0))])
For performance try:
from operator import itemgetter
dct = {}
for i in sample_list:
if i[0] in dct:
dct[i[0]].append(i)
else:
dct[i[0]] = [i]
print([max(v, key=itemgetter(-1)) for v in dct.values()])
All output:
[(5, 10, 3), (21, 24, 1)]
Use itertools.groupby
and operator.itemgetter
for readability. Within the groups, apply max
with an appropriate key function, again using itemgetter
for brevity:
from itertools import groupby
from operator import itemgetter as ig
lst = [(5, 10, 3), (21, 24, 1), (5, 8, 1), (5, 16, 2)]
[max(g, key=ig(-1)) for _, g in groupby(sorted(lst), key=ig(0))]
# [(5, 10, 3), (21, 24, 1)]
For a linear-time solution, with extra-space only bound the number of unique first elements, you may use a dict
:
d = {}
for tpl in lst:
first, *_, last = tpl
if first not in d or last > d[first][-1]:
d[first] = tpl
[*d.values()]
# [(5, 10, 3), (21, 24, 1)]
Here is a linear-time method which I think qualifies as more Pythonic:
highest = dict()
for a, b, c in sample_list:
if a not in highest or c >= highest[a][2]:
highest[a] = (a, b, c)
op = list(highest.values())
You can change the >=
to >
if you care about how to choose between triples with the same first and last elements but different middle elements.
As pointed out by @AlexWaygood, dict
s have yielded their elements according to insertion order since Python 3.7. The code above therefore causes the elements of op
to be in the same order the elements of sample_list
.
In Python 3.6 or older, on the other hand, the order may change. If you want a solution that works in Python 3.6 too, you will need to use an OrderedDict
, as in:
from collections import OrderedDict
highest = OrderedDict()
for a, b, c in sample_list:
if a not in highest or c >= highest[a][2]:
highest[a] = (a, b, c)
op = list(highest.values())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With