 

How do I use itertools.groupby()?

I haven't been able to find an understandable explanation of how to actually use Python's itertools.groupby() function. What I'm trying to do is this:

  • Take a list - in this case, the children of an objectified lxml element
  • Divide it into groups based on some criteria
  • Then later iterate over each of these groups separately.

I've reviewed the documentation, but I've had trouble trying to apply it beyond a simple list of numbers.

So, how do I use itertools.groupby()? Is there another technique I should be using? Pointers to good "prerequisite" reading would also be appreciated.

asked Aug 03 '08 18:08 by James Sulak




2 Answers

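IMPORTANT NOTE: groupby() only groups consecutive items that share a key, so if you want exactly one group per key, you have to sort your data (by that same key) first.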
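A minimal sketch of why the sort matters: groupby() starts a new group every time the key changes, so unsorted input can produce the same key more than once. The `pets` list and the `kind` helper are made-up illustration data, not from the docs.

```python
from itertools import groupby

# Hypothetical sample data: (kind, name) pairs, deliberately unsorted.
pets = [("cat", "Tom"), ("dog", "Rex"), ("cat", "Felix")]

def kind(pet):
    """Key function: the first field of each tuple."""
    return pet[0]

# Without sorting, "cat" appears as two separate groups.
unsorted_keys = [k for k, _ in groupby(pets, key=kind)]

# Sorting by the SAME key function first gives one group per key.
# sorted() is stable, so Tom still comes before Felix.
sorted_groups = {k: [name for _, name in g]
                 for k, g in groupby(sorted(pets, key=kind), key=kind)}

print(unsorted_keys)   # ['cat', 'dog', 'cat']
print(sorted_groups)   # {'cat': ['Tom', 'Felix'], 'dog': ['Rex']}
```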


The part I didn't get is that in the example construction

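from itertools import groupby

groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)  # the docs sort by the same key first
for k, g in groupby(data, keyfunc):
    groups.append(list(g))    # Store group iterator as a list
    uniquekeys.append(k)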

k is the current grouping key, and g is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the groupby iterator itself yields iterators.
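Each group iterator reads from the same underlying iterator as groupby() itself, so a group is only usable until you advance to the next key. That is why the docs example stores `list(g)` inside the loop. A small sketch of the difference:

```python
from itertools import groupby

# Keep the (key, group) pairs around without consuming the groups first.
pairs = list(groupby("AAABBC"))
stale = [(k, list(g)) for k, g in pairs]

# Materialize each group while it is still the current one.
fresh = [(k, list(g)) for k, g in groupby("AAABBC")]

print(stale[0])  # ('A', []): the "A" items were skipped when groupby advanced
print(fresh)     # [('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('C', ['C'])]
```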

Here's an example of that, using clearer variable names:

from itertools import groupby

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]

for key, group in groupby(things, lambda x: x[0]):
    for thing in group:
        print("A %s is a %s." % (thing[1], key))
    print("")

This will give you the output:

A bear is a animal.
A duck is a animal.

A cactus is a plant.

A speed boat is a vehicle.
A school bus is a vehicle.

In this example, things is a list of tuples where the first item in each tuple is the group the second item belongs to.

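The groupby() function takes two arguments: (1) the data to group and (2) a key function that computes each item's grouping key. The second argument is optional; without it, items are grouped by their own value.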

Here, lambda x: x[0] tells groupby() to use the first item in each tuple as the grouping key.

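In the above for statement, groupby returns three (key, group iterator) pairs, one for each unique key. (Strictly, groupby() yields one pair per run of consecutive items with the same key; because things is already ordered by key, each key forms a single run.) You can use the returned iterator to iterate over each individual item in that group.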
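For tuple data like this, `operator.itemgetter(0)` from the standard library is a common stand-in for `lambda x: x[0]`, and omitting the key groups items by their own value. A short sketch reusing the same `things` data:

```python
from itertools import groupby
from operator import itemgetter

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]

# itemgetter(0) builds a callable equivalent to lambda x: x[0].
by_lambda = [(k, list(g)) for k, g in groupby(things, lambda x: x[0])]
by_getter = [(k, list(g)) for k, g in groupby(things, itemgetter(0))]
print(by_lambda == by_getter)  # True

# With no key function, groupby compares the items themselves.
runs = [(k, len(list(g))) for k, g in groupby([1, 1, 2, 3, 3, 3])]
print(runs)  # [(1, 2), (2, 1), (3, 3)]
```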

Here's a slightly different example with the same data, using a list comprehension:

for key, group in groupby(things, lambda x: x[0]):
    list_of_things = " and ".join([thing[1] for thing in group])
    print(key + "s: " + list_of_things + ".")

This will give you the output:

animals: bear and duck.
plants: cactus.
vehicles: speed boat and school bus.
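If you need the groups again later (as in the question, where each group is iterated separately afterwards), one option is to copy them into a dict of lists while iterating. A sketch with the same data; the name `grouped` is just illustrative:

```python
from itertools import groupby
from operator import itemgetter

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]

# Sort by the key first so every key forms exactly one group,
# then store each group as a list before groupby moves on.
grouped = {key: [name for _, name in group]
           for key, group in groupby(sorted(things, key=itemgetter(0)),
                                     key=itemgetter(0))}

# The lists can now be revisited in any order, as often as needed.
for key in grouped:
    print(key, grouped[key])
```

Sorting matters here too: without it, a key that appeared in two separate runs would have its first list silently overwritten by the second in the dict.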

answered Sep 17 '22 21:09 by James Sulak


itertools.groupby is a tool for grouping items.

From the docs, we can glean more about what it does:

# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B

# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D

groupby objects yield key-group pairs where the group is a lazy iterator over the consecutive items sharing that key, not a list.
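The two doc examples can be checked directly. Because each group is a lazy iterator, something like `list(g)` or `"".join(g)` is needed to see its contents:

```python
from itertools import groupby

keys = [k for k, g in groupby("AAAABBBCCDAABBB")]
groups = ["".join(g) for k, g in groupby("AAAABBBCCD")]

print(keys)    # ['A', 'B', 'C', 'D', 'A', 'B']
print(groups)  # ['AAAA', 'BBB', 'CC', 'D']

# A group is an iterator over the original items, not a list.
first_key, first_group = next(groupby("AAB"))
print(first_key, next(first_group))  # A A
```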

Features

  • A. Group consecutive items together
  • B. Group all occurrences of an item, given a sorted iterable
  • C. Specify how to group items with a key function *

Comparisons

# Define a printer for comparing outputs
>>> import itertools as it
>>> def print_groupby(iterable, keyfunc=None):
...     for k, g in it.groupby(iterable, keyfunc):
...         print("key: '{}'--> group: {}".format(k, list(g)))
# Feature A: group consecutive occurrences
>>> print_groupby("BCAACACAADBBB")
key: 'B'--> group: ['B']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A']
key: 'C'--> group: ['C']
key: 'A'--> group: ['A', 'A']
key: 'D'--> group: ['D']
key: 'B'--> group: ['B', 'B', 'B']

# Feature B: group all occurrences
>>> print_groupby(sorted("BCAACACAADBBB"))
key: 'A'--> group: ['A', 'A', 'A', 'A', 'A']
key: 'B'--> group: ['B', 'B', 'B', 'B']
key: 'C'--> group: ['C', 'C', 'C']
key: 'D'--> group: ['D']

# Feature C: group by a key function
>>> # islower = lambda s: s.islower()                      # equivalent
>>> def islower(s):
...     """Return True if a string is lowercase, else False."""
...     return s.islower()
>>> print_groupby(sorted("bCAaCacAADBbB"), keyfunc=islower)
key: 'False'--> group: ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D']
key: 'True'--> group: ['a', 'a', 'b', 'b', 'c']

Uses

  • Anagrams (see notebook)
  • Binning
  • Group odd and even numbers
  • Group a list by values
  • Remove duplicate elements
  • Find indices of repeated elements in an array
  • Split an array into n-sized chunks
  • Find corresponding elements between two lists
  • Compression algorithm (see notebook)/Run Length Encoding
  • Grouping letters by length, key function (see notebook)
  • Consecutive values over a threshold (see notebook)
  • Find ranges of numbers in a list or continuous items (see docs)
  • Find all related longest sequences
  • Take consecutive sequences that meet a condition (see related post)
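Two of the uses listed above, run-length encoding and finding ranges of consecutive numbers, fit in a few lines each. These are sketches, not the notebook's or the docs' exact code, and the function names are made up:

```python
from itertools import groupby

def run_length_encode(text):
    """Compress 'AAABCC' into [('A', 3), ('B', 1), ('C', 2)]."""
    return [(char, sum(1 for _ in group)) for char, group in groupby(text)]

def consecutive_ranges(numbers):
    """Turn sorted, distinct ints like [1, 2, 3, 7, 8, 10] into (start, end) pairs.

    Within a run of consecutive integers, value - index stays constant,
    so that difference works as the grouping key.
    """
    ranges = []
    for _, run in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in run]
        ranges.append((values[0], values[-1]))
    return ranges

print(run_length_encode("AAABCC"))              # [('A', 3), ('B', 1), ('C', 2)]
print(consecutive_ranges([1, 2, 3, 7, 8, 10]))  # [(1, 3), (7, 8), (10, 10)]
```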

Note: Several of the latter examples derive from Víctor Terrón's PyCon talk (in Spanish), "Kung Fu at Dawn with Itertools". See also the groupby source code, written in C.

* A key function is applied to each item, and its return value, rather than the item itself, is what gets compared. Other built-ins that accept key functions include sorted(), max() and min().
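The same key-function idea in the built-ins mentioned above, plus groupby() with the same key; the word list is arbitrary sample data:

```python
from itertools import groupby

words = ["kiwi", "banana", "fig", "cherry"]

# Each call compares len(word) instead of the words themselves.
print(sorted(words, key=len))  # ['fig', 'kiwi', 'banana', 'cherry'] (stable for ties)
print(max(words, key=len))     # banana (the first of the longest)
print(min(words, key=len))     # fig

# groupby() takes the same kind of key; sort by it first.
by_length = {n: list(g) for n, g in groupby(sorted(words, key=len), key=len)}
print(by_length)  # {3: ['fig'], 4: ['kiwi'], 6: ['banana', 'cherry']}
```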


Response

# OP: Yes, you can use `groupby`, e.g.
# (sort lxml_elements by criteria_func first if matching items are not already adjacent)
for _, g in groupby(lxml_elements, criteria_func):
    do_something(list(g))
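Applied to the question's case, here is a sketch that uses the standard library's xml.etree.ElementTree instead of lxml.objectify (an assumption made to keep it dependency-free). The XML document and the choice of grouping by tag name are made up for illustration:

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical document; the question's real data comes from lxml.objectify.
root = ET.fromstring(
    "<library><book>A</book><dvd>B</dvd><book>C</book><cd>D</cd></library>"
)

def criteria(element):
    """Hypothetical grouping criterion: the element's tag name."""
    return element.tag

# Iterating an Element yields its children; sort them by the criterion
# so each tag forms a single group, then keep each group as a list.
children = sorted(root, key=criteria)
groups = {tag: [el.text for el in group]
          for tag, group in groupby(children, key=criteria)}

for tag, texts in groups.items():
    print(tag, texts)
```

If preserving the original order matters more than single-pass grouping, `collections.defaultdict(list)` is a common alternative that needs no sort.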
answered Sep 20 '22 21:09 by pylang