 

Regarding the correct usage of groupby(): Python 3

Tags:

python

I wasn't having any issues figuring out how to apply the functions in itertools until I reached groupby(iterable, key=None). When I read the example, it didn't quite click with me, which led me to research it on Google. I ended up finding an example; however, it didn't really break it down to where it all made sense. To my understanding, groupby() sub-iterates a sorted iterable?

My question: Can anyone provide an updated, Python 3 explanation of the groupby() function broken down "Barney-style"?

TimLayne asked Mar 22 '23 03:03


1 Answer

Groupby groups consecutive items together based on some user-specified characteristic. Each element in the resulting iterator is a tuple, where the first element (group in my example) is the "key", which is a label for that group. The second element (items in my example) is an iterator over the items in that group.
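To make the "chunking" concrete, here is a simplified pure-Python sketch of the idea. The name simple_groupby is hypothetical and this is not how itertools actually implements groupby: it builds each group as a list, whereas the real groupby hands out a lazy iterator for each group. What it does show is the core rule: walk the input once, and start a new group every time the key changes from the previous item's key.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple


def simple_groupby(iterable: Iterable[Any],
                   key: Optional[Callable[[Any], Any]] = None
                   ) -> Iterator[Tuple[Any, List[Any]]]:
    """Yield (key, items) pairs for runs of consecutive items with equal keys.

    Simplified sketch: each group is an eager list, unlike
    itertools.groupby, which yields a lazy iterator per group.
    """
    key_func = key if key is not None else (lambda x: x)  # identity by default
    sentinel = object()  # marks "no group started yet"
    current_key = sentinel
    current_items: List[Any] = []
    for item in iterable:
        k = key_func(item)
        if current_key is not sentinel and k != current_key:
            # The key changed, so the previous run is finished.
            yield current_key, current_items
            current_items = []
        current_key = k
        current_items.append(item)
    if current_key is not sentinel:
        # Emit the final run (only if the input was non-empty).
        yield current_key, current_items


print(list(simple_groupby('aabbbc')))
# [('a', ['a', 'a']), ('b', ['b', 'b', 'b']), ('c', ['c'])]
```

Because the sketch only ever compares each item's key with the previous one, it never "looks back" at earlier groups. That is also why the real groupby only groups consecutive items.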

In the simplest case, the characteristic is just identity, which means it groups together "runs" of the same thing:

>>> import itertools
>>> for group, items in itertools.groupby('aabbbccdddee'):
...     print(group, list(items))
a ['a', 'a']
b ['b', 'b', 'b']
c ['c', 'c']
d ['d', 'd', 'd']
e ['e', 'e']
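One everyday use of this identity grouping is run-length encoding, i.e. recording how long each run is. A minimal sketch (run_lengths is a hypothetical helper name, not part of itertools):

```python
import itertools


def run_lengths(s):
    """Run-length encode s as (item, count) pairs using itertools.groupby."""
    # sum(1 for _ in items) counts the items in a group without building a list.
    return [(group, sum(1 for _ in items)) for group, items in itertools.groupby(s)]


print(run_lengths('aabbbccdddee'))
# [('a', 2), ('b', 3), ('c', 2), ('d', 3), ('e', 2)]
```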

However, you can also pass a key function to group not just into runs of the same element, but runs of elements that are "the same" in some way you specify:

>>> for group, items in itertools.groupby('aaAaAAbBbcCdDdDeE', lambda x: x.lower()):
...     print(group, list(items))
a ['a', 'a', 'A', 'a', 'A', 'A']
b ['b', 'B', 'b']
c ['c', 'C']
d ['d', 'D', 'd', 'D']
e ['e', 'E']

Here I used a key function that returns the lowercase form of its input. This means that items are grouped if their lowercase forms are the same. Without the key function, items would only be grouped if they were exactly the same:

>>> for group, items in itertools.groupby('aaAaAAbBbcCdDdDeE'):
...     print(group, list(items))
a ['a', 'a']
A ['A']
a ['a']
A ['A', 'A']
b ['b']
B ['B']
b ['b']
c ['c']
C ['C']
d ['d']
D ['D']
d ['d']
D ['D']
e ['e']
E ['E']

Here we get a lot of one-element groups, because even a change in case starts a new group.

The last example also shows an important gotcha: groupby only groups consecutive elements that fall into the same group. So even though there are many "a"s in my data, they aren't all grouped together because they're not consecutive.
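A common way around this (an idiom you apply yourself; groupby does not sort for you) is to sort the data by the same key first, so that items with equal keys become adjacent. The sketch below uses a made-up list of numbers and a hypothetical parity key function to show both the problem and the fix:

```python
import itertools


def parity(n):
    """Key function: 1 for odd numbers, 0 for even numbers."""
    return n % 2


numbers = [1, 3, 5, 2, 4, 7]  # made-up sample data

# Unsorted: the trailing 7 is not next to the other odd numbers,
# so it ends up in a separate group with the same key.
unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(numbers, key=parity)]
print(unsorted_groups)  # [(1, [1, 3, 5]), (0, [2, 4]), (1, [7])]

# Sorted by the same key first: equal keys are now adjacent,
# so each key appears exactly once.
sorted_groups = [(k, list(g))
                 for k, g in itertools.groupby(sorted(numbers, key=parity), key=parity)]
print(sorted_groups)  # [(0, [2, 4]), (1, [1, 3, 5, 7])]
```

Keep in mind that sorting costs O(n log n) and changes the original order of the data; sorted() is stable, so items with equal keys keep their relative order.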

In these examples, I just used list to show you everything in each group. Actually, though, each group is an iterator, so the items are only generated as you need them. There is a potential gotcha here as well, in that the contents of a group "disappear" if you iterate past that group:

>>> grouped = itertools.groupby('aabbbccdddee')
>>> group, items = next(grouped)
>>> print(group, items)   # the "_grouper" object is an iterator over the items in group "a"
a <itertools._grouper object at 0x0000000002648DA0>
>>> next(grouped)   # we move to the next group
('b', <itertools._grouper object at 0x0000000002648630>)
>>> print(list(items))   # oops, the items vanished!
[]

What happened is that calling next(grouped) moved us past group "a", and when that happened, the grouper "forgot" what was in that group. This usually isn't a problem, because you'll usually use each group as soon as you get to it. But if you want to store a group for later, you need to copy its items into a list with list() instead of just storing the _grouper object.
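A minimal sketch of storing groups safely, by copying each one into a list while it is still the current group. It also shows one pitfall worth knowing: if you collect the groups into a dict keyed on the group label, a label that appears in more than one non-consecutive run keeps only its last run.

```python
import itertools

# Copy each group into a list while it is still the current group;
# the lists stay valid after groupby has moved on.
saved = [(group, list(items)) for group, items in itertools.groupby('aabbbccdddee')]
print(saved[0])   # ('a', ['a', 'a'])
print(saved[-1])  # ('e', ['e', 'e'])

# Caution: 'a' appears in two separate runs here.
runs = [(group, list(items)) for group, items in itertools.groupby('aabaaa')]
print(runs)     # [('a', ['a', 'a']), ('b', ['b']), ('a', ['a', 'a', 'a'])]

# A dict keeps only the LAST run for each label, so the first 'a' run is lost.
as_dict = {group: list(items) for group, items in itertools.groupby('aabaaa')}
print(as_dict)  # {'a': ['a', 'a', 'a'], 'b': ['b']}
```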

By using the key function, you can usefully group on all sorts of things, but the basic idea is the same: groupby "chunks" sequences of items that are "the same" in a way that you specify.
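For example, a key function can pull one field out of each record. The sketch below groups some made-up (department, name) records by department using operator.itemgetter(0) as the key; the records are assumed to be already sorted by department, so each department forms a single run.

```python
import itertools
from operator import itemgetter

# Made-up records: (department, name), already sorted by department.
employees = [
    ('eng', 'Ana'),
    ('eng', 'Bo'),
    ('ops', 'Cy'),
    ('sales', 'Di'),
    ('sales', 'Ed'),
]

by_dept = {}
for dept, rows in itertools.groupby(employees, key=itemgetter(0)):
    # Keep only the names from each (department, name) record in this run.
    by_dept[dept] = [name for _, name in rows]

print(by_dept)
# {'eng': ['Ana', 'Bo'], 'ops': ['Cy'], 'sales': ['Di', 'Ed']}
```

If the records were not sorted by department, you would sort them first with sorted(employees, key=itemgetter(0)), for the same reason as in the case-insensitive and parity examples.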

BrenBarn answered Apr 01 '23 09:04