I'm looking for a way to limit the size of the groups created by itertools.groupby.
Currently I have something like this:
>>> from itertools import groupby
>>> s = '555'
>>> grouped = groupby(s)
>>> print([(k, len(list(g))) for k, g in grouped])
[('5', 3)]
What I would like to achieve is a maximum group size of 2, so my output would be:
[('5', 2), ('5', 1)]
Is there an easy and efficient way to do this? Maybe via the key argument to groupby?
Here is a solution using groupby and a defaultdict.
from itertools import groupby
from collections import defaultdict

s = "5555444"
desired_length = 2
counts = defaultdict(int)

def count(x):
    # Return how many times x has been seen so far, then bump the counter.
    global counts
    c = counts[x]
    counts[x] += 1
    return c

# Key on (value, occurrence_index // desired_length) so that a new group
# starts after every desired_length occurrences of the same value.
grouped = groupby(s, key=lambda x: (x, count(x) // desired_length))
print([(k[0], len(list(g))) for k, g in grouped])
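For s = "5555444" and desired_length = 2, this should print [('5', 2), ('5', 2), ('4', 2), ('4', 1)].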
I honestly think this solution is unacceptable, as it requires that you keep track of the global state at all times, but here it is. I would personally just use a buffer-like thing.
from collections import defaultdict

s = "5555444"

def my_buffer_function(sequence, desired_length):
    # Count occurrences; emit a group as soon as a value reaches desired_length.
    buffer = defaultdict(int)
    for item in sequence:
        buffer[item] += 1
        if buffer[item] == desired_length:
            yield (item, buffer.pop(item))
    # Flush whatever is left over as smaller groups.
    for k, v in buffer.items():
        yield k, v

print(list(my_buffer_function(s, 2)))
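For this input it should print the same list as above: [('5', 2), ('5', 2), ('4', 2), ('4', 1)].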
This is also a generator. Note, though, that it counts occurrences across the whole sequence rather than per consecutive run, so non-adjacent equal items get pooled together, whereas groupby would keep them in separate groups.
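If keeping groupby's consecutive-run behaviour matters, another option is to keep plain groupby and slice each group's iterator into chunks of at most max_size using itertools.islice. This is just a minimal sketch; limited_groups and its parameter names are my own, not anything from the question:

from itertools import groupby, islice

def limited_groups(iterable, max_size):
    # For each run of equal items, pull at most max_size items at a time
    # until that run's iterator is exhausted.
    for key, group in groupby(iterable):
        while True:
            chunk = list(islice(group, max_size))
            if not chunk:
                break
            yield key, len(chunk)

print(list(limited_groups('555', 2)))      # expected: [('5', 2), ('5', 1)]
print(list(limited_groups('5555444', 2)))  # expected: [('5', 2), ('5', 2), ('4', 2), ('4', 1)]

Because each group iterator is fully consumed before groupby advances to the next run, the chunks come out in order without any extra bookkeeping or global state.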