While reading the python documentation I came across the itertools.groupby()
function. It was not very straightforward so I decided to look up some info here on stackoverflow. I found something from How do I use Python's itertools.groupby()?.
There seems to be little info about it here and in the documentation so I decided I to post my observations for comments.
Thanks
In Python, you can group consecutive elements of the same value in an iterable object such as a list with itertools. groupby() .
This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. Each has been recast in a form suitable for Python.
The itertools. combinations() function takes two arguments—an iterable inputs and a positive integer n —and produces an iterator over tuples of all combinations of n elements in inputs .
To start with, you may read the documentation here.
I will place what I consider to be the most important point first. I hope the reason will become clear after the examples.
ALWAYS SORT ITEMS WITH THE SAME KEY TO BE USED FOR GROUPING SO AS TO AVOID UNEXPECTED RESULTS
itertools.groupby(iterable, key=None or some func)
takes a list of iterables and groups them based on a specified key. The key specifies what action to apply to each individual iterable, the result of which is then used as the heading for each grouping the items; items which end up having same 'key' value will end up in the same group.
The return value is an iterable similar to a dictionary in that it is of the form {key : value}
.
Example 1
# note here that the tuple counts as one item in this list. I did not
# specify any key, so each item in the list is a key on its own.
c = groupby(['goat', 'dog', 'cow', 1, 1, 2, 3, 11, 10, ('persons', 'man', 'woman')])
dic = {}
for k, v in c:
dic[k] = list(v)
dic
results in
{1: [1, 1],
'goat': ['goat'],
3: [3],
'cow': ['cow'],
('persons', 'man', 'woman'): [('persons', 'man', 'woman')],
10: [10],
11: [11],
2: [2],
'dog': ['dog']}
Example 2
# notice here that mulato and camel don't show up. only the last element with a certain key shows up, like replacing earlier result
# the last result for c actually wipes out two previous results.
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'), \
'wombat', 'mongoose', 'malloo', 'camel']
c = groupby(list_things, key=lambda x: x[0])
dic = {}
for k, v in c:
dic[k] = list(v)
dic
results in
{'c': ['camel'],
'd': ['dog', 'donkey'],
'g': ['goat'],
'm': ['mongoose', 'malloo'],
'persons': [('persons', 'man', 'woman')],
'w': ['wombat']}
Now for the sorted version
# but observe the sorted version where I have the data sorted first on same key I used for grouping
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'), \
'wombat', 'mongoose', 'malloo', 'camel']
sorted_list = sorted(list_things, key = lambda x: x[0])
print(sorted_list)
print()
c = groupby(sorted_list, key=lambda x: x[0])
dic = {}
for k, v in c:
dic[k] = list(v)
dic
results in
['cow', 'cat', 'camel', 'dog', 'donkey', 'goat', 'mulato', 'mongoose', 'malloo', ('persons', 'man', 'woman'), 'wombat']
{'c': ['cow', 'cat', 'camel'],
'd': ['dog', 'donkey'],
'g': ['goat'],
'm': ['mulato', 'mongoose', 'malloo'],
'persons': [('persons', 'man', 'woman')],
'w': ['wombat']}
Example 3
things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "harley"), \
("vehicle", "speed boat"), ("vehicle", "school bus")]
dic = {}
f = lambda x: x[0]
for key, group in groupby(sorted(things, key=f), f):
dic[key] = list(group)
dic
results in
{'animal': [('animal', 'bear'), ('animal', 'duck')],
'plant': [('plant', 'cactus')],
'vehicle': [('vehicle', 'harley'),
('vehicle', 'speed boat'),
('vehicle', 'school bus')]}
Now for the sorted version. I changed the tuples to lists here. Same results either way.
things = [["animal", "bear"], ["animal", "duck"], ["vehicle", "harley"], ["plant", "cactus"], \
["vehicle", "speed boat"], ["vehicle", "school bus"]]
dic = {}
f = lambda x: x[0]
for key, group in groupby(sorted(things, key=f), f):
dic[key] = list(group)
dic
results in
{'animal': [['animal', 'bear'], ['animal', 'duck']],
'plant': [['plant', 'cactus']],
'vehicle': [['vehicle', 'harley'],
['vehicle', 'speed boat'],
['vehicle', 'school bus']]}
As always the documentation of the function should be the first place to check. However itertools.groupby
is certainly one of the trickiest itertools
because it has some possible pitfalls:
It only groups the items if their key
-result is the same for successive items:
from itertools import groupby
for key, group in groupby([1,1,1,1,5,1,1,1,1,4]):
print(key, list(group))
# 1 [1, 1, 1, 1]
# 5 [5]
# 1 [1, 1, 1, 1]
# 4 [4]
One could use sorted
before - if one wants to do an overall groupby
.
It yields two items, and the second one is an iterator (so one needs to iterate over the second item!). I explicitly needed to cast these to a list
in the previous example.
The second yielded element is discarded if one advances the groupby
-iterator:
it = groupby([1,1,1,1,5,1,1,1,1,4])
key1, group1 = next(it)
key2, group2 = next(it)
print(key1, list(group1))
# 1 []
Even if group1
isn't empty!
As already mentioned one can use sorted
to do an overall groupby
operation but that's extremely inefficient (and throws away the memory-efficiency if you want to use groupby on generators). There are better alternatives available if you can't guarantee that the input is sorted
(which also don't require the O(n log(n))
sorting time overhead):
collections.defaultdict
iteration_utilities.groupedby
However it's great to check local properties. There are two recipes in the itertools
-recipes section:
def all_equal(iterable):
"Returns True if all the elements are equal to each other"
g = groupby(iterable)
return next(g, True) and not next(g, False)
and:
def unique_justseen(iterable, key=None):
"List unique elements, preserving order. Remember only the element just seen."
# unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
# unique_justseen('ABBCcAD', str.lower) --> A B C A D
return map(next, map(itemgetter(1), groupby(iterable, key)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With