I have this data:
self.data = [(1, 1, 5.0),
             (1, 2, 3.0),
             (1, 3, 4.0),
             (2, 1, 4.0),
             (2, 2, 2.0)]
When I run this code:
for mid, group in itertools.groupby(self.data, key=operator.itemgetter(0)):
    print(list(group))
I get:
[(1, 1, 5.0),
 (1, 2, 3.0),
 (1, 3, 4.0)]
which is what I want.
But if I use 1 instead of 0
for mid, group in itertools.groupby(self.data, key=operator.itemgetter(1)):
to group by the second number in the tuples, I only get:
[(1, 1, 5.0)]
even though there are other tuples that have 1 at index 1 (the second position).
itertools.groupby collects together contiguous items with the same key.
If you want all items with the same key, you have to sort self.data by that key first.
for mid, group in itertools.groupby(
        sorted(self.data, key=operator.itemgetter(1)),
        key=operator.itemgetter(1)):
    print(list(group))
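To make the difference concrete, here is a small self-contained sketch that runs groupby on the question's data both unsorted and sorted. The names data, by_second, unsorted_groups and sorted_groups are only for illustration and are not from the original post.

```python
import itertools
import operator

# Same tuples as in the question, held in a plain list.
data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]
by_second = operator.itemgetter(1)

# Unsorted input: groupby starts a new group every time the key changes,
# so the keys 1 and 2 each show up twice.
unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(data, key=by_second)]

# Sorted by the same key: equal keys are adjacent, so each key appears once.
sorted_groups = [
    (k, list(g))
    for k, g in itertools.groupby(sorted(data, key=by_second), key=by_second)
]

print([k for k, _ in unsorted_groups])  # keys seen without sorting
for k, g in sorted_groups:
    print(k, g)
```

Because sorted is stable, items inside each group keep their original relative order.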
Variant without sorting (via a dictionary). It should perform better, since it makes a single pass over the data instead of sorting it first.
from collections import defaultdict

def full_group_by(l, key=lambda x: x):
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()
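As a quick check, the dictionary variant can be applied to the data from the question. This is a sketch only: the function is repeated so the snippet runs on its own, and data and result are illustrative names. It assumes Python 3.7 or later, where dictionaries keep insertion order, so groups come back in the order their keys are first seen rather than in sorted order.

```python
from collections import defaultdict
from operator import itemgetter


def full_group_by(items, key=lambda x: x):
    # Collect every item under its key in one pass; no sorting needed.
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return groups.items()


data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]

# Group by the second element of each tuple.
result = dict(full_group_by(data, key=itemgetter(1)))
for k, group in result.items():
    print(k, group)
```

Note that the keys must be hashable for this to work, whereas the sort-based approach needs them to be orderable.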
The function below "fixes" several annoyances with Python's itertools.groupby.
def groupby2(l, key=lambda x: x, val=lambda x: x, agg=lambda x: x, sort=True):
    if sort:
        l = sorted(l, key=key)
    return ((k, agg(val(x) for x in v))
            for k, v in itertools.groupby(l, key=key))
Specifically:
key as named parameter only.
The output is a generator of tuple(key, grouped_values), where the values are specified by the 3rd parameter.

Example Usage
import itertools
from operator import itemgetter
from statistics import *
t = [('a',1), ('b',2), ('a',3)]
for k,v in groupby2(t, itemgetter(0), itemgetter(1), sum):
    print(k, v)
This prints:
a 4
b 2
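The agg parameter is not limited to sum. As a sketch, the same call with list or statistics.mean as the aggregator collects or averages the values per key. groupby2 is repeated here so the snippet runs on its own; as_lists and as_means are illustrative names.

```python
import itertools
from operator import itemgetter
from statistics import mean


def groupby2(l, key=lambda x: x, val=lambda x: x, agg=lambda x: x, sort=True):
    # Sort first (by default) so that groupby sees equal keys side by side.
    if sort:
        l = sorted(l, key=key)
    return ((k, agg(val(x) for x in v))
            for k, v in itertools.groupby(l, key=key))


t = [('a', 1), ('b', 2), ('a', 3)]

# Collect the grouped values into a list per key.
as_lists = list(groupby2(t, itemgetter(0), itemgetter(1), list))

# Average the grouped values per key.
as_means = list(groupby2(t, itemgetter(0), itemgetter(1), mean))

print(as_lists)
print(as_means)
```

With the default agg, each value is a lazy generator over the underlying groupby group, so it has to be consumed before the outer generator moves on to the next key. Passing sort=False brings back the contiguous-only behaviour of plain itertools.groupby.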