 

itertools.groupby() not grouping correctly

I have this data:

self.data = [(1, 1, 5.0),
             (1, 2, 3.0),
             (1, 3, 4.0),
             (2, 1, 4.0),
             (2, 2, 2.0)]

When I run this code:

for mid, group in itertools.groupby(self.data, key=operator.itemgetter(0)):
    print(list(group))

the first list(group) is:

[(1, 1, 5.0),
 (1, 2, 3.0),
 (1, 3, 4.0)]

which is what I want.

But if I use 1 instead of 0

for mid, group in itertools.groupby(self.data, key=operator.itemgetter(1)):

to group by the second number in the tuples, I only get:

[(1, 1, 5.0)]

even though another tuple, (2, 1, 4.0), also has 1 at index 1 (the second element).
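Printing every group that groupby yields makes the behavior visible. This is a small sketch that uses a plain module-level list named data in place of self.data, and a hypothetical helper groups() that materializes each group before moving on:

```python
import itertools
import operator

# Same tuples as self.data above, as a plain list
data = [(1, 1, 5.0),
        (1, 2, 3.0),
        (1, 3, 4.0),
        (2, 1, 4.0),
        (2, 2, 2.0)]

def groups(rows, index):
    """Return (key, group) pairs exactly as itertools.groupby yields them."""
    key = operator.itemgetter(index)
    # list(g) consumes each group before groupby advances to the next one
    return [(k, list(g)) for k, g in itertools.groupby(rows, key=key)]

for k, g in groups(data, 1):
    print(k, g)
```

With index 1 the key 1 shows up in two separate groups, (1, 1, 5.0) first and (2, 1, 4.0) later, because those rows are not adjacent. Grouping by index 0 only appears to work because the data already happens to be ordered by the first element.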

user994165 asked Nov 14 '11 at 02:11


3 Answers

itertools.groupby only collects together contiguous items with the same key; it starts a new group every time the key changes. If you want all items with the same key in one group, sort self.data by that same key first.

key = operator.itemgetter(1)
for mid, group in itertools.groupby(sorted(self.data, key=key), key=key):
    print(mid, list(group))
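A self-contained sketch of the sort-then-group approach, assuming the same tuples in a plain list named data. After sorting by the second element, each key forms exactly one contiguous run, so groupby yields it once. Each group is turned into a list right away, because a group shares its underlying iterator with the groupby object and is no longer usable once groupby advances.

```python
import itertools
import operator

data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]

key = operator.itemgetter(1)  # group by the second element
# sorted() with the same key makes equal keys contiguous; sorted() is stable,
# so rows with equal keys keep their original relative order
result = {mid: list(group)
          for mid, group in itertools.groupby(sorted(data, key=key), key=key)}

for mid, rows in result.items():
    print(mid, rows)
```

Here key 1 collects both (1, 1, 5.0) and (2, 1, 4.0). Using the same key function for sorted() and groupby() matters: sorting by a different key would not guarantee that equal groupby keys end up adjacent.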
unutbu answered Jan 25 '23 at 23:01


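A variant that avoids sorting by collecting items into a dictionary. It makes a single pass over the data (O(n)) instead of sorting it first (O(n log n)), so it should be faster on large inputs.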

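from collections import defaultdict

def full_group_by(l, key=lambda x: x):
    # One pass: append each item to the list for its key (no sorting needed)
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()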
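A usage sketch with the question's tuples (the function is repeated so the block runs on its own). Since Python 3.7, dicts preserve insertion order, so the groups come out in the order each key first appears rather than in sorted key order.

```python
from collections import defaultdict
from operator import itemgetter

def full_group_by(l, key=lambda x: x):
    # One pass: append each item to the list for its key (no sorting needed)
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()

data = [(1, 1, 5.0), (1, 2, 3.0), (1, 3, 4.0), (2, 1, 4.0), (2, 2, 2.0)]
groups = dict(full_group_by(data, key=itemgetter(1)))
for k, rows in groups.items():
    print(k, rows)
```

One trade-off worth noting: this approach needs hashable keys, while the sort-based approach needs keys that can be ordered.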
Konstantine Rybnikov answered Jan 26 '23 at 00:01


The function below "fixes" several annoyances with Python's itertools.groupby:

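import itertools

def groupby2(l, key=lambda x: x, val=lambda x: x, agg=list, sort=True):
    # Sort first so equal keys are contiguous (groupby only merges adjacent items).
    # agg=list materializes each group; a lazy default would be invalidated
    # as soon as the outer generator advances to the next key.
    if sort:
        l = sorted(l, key=key)
    return ((k, agg(val(x) for x in v))
            for k, v in itertools.groupby(l, key=key))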

Specifically,

  1. You don't have to sort your data yourself; it is sorted by key unless you pass sort=False.
  2. You can pass key, val, and agg positionally, e.g. groupby2(t, itemgetter(0), itemgetter(1), sum).
  3. The output is a clean generator of (key, grouped_values) tuples, where the values are extracted by the third parameter, val.
  4. You can easily apply an aggregation function such as sum or statistics.mean through agg.

Example Usage

import itertools
from operator import itemgetter
from statistics import mean  # optional: another aggregator to pass as agg

t = [('a', 1), ('b', 2), ('a', 3)]
for k, v in groupby2(t, itemgetter(0), itemgetter(1), sum):
    print(k, v)

This prints:

a 4
b 2
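Two more things the parameters allow: any function that accepts an iterable can serve as agg (statistics.mean here instead of sum), and sort=False falls back to plain groupby behavior, merging only adjacent equal keys. This sketch repeats groupby2 so it runs on its own, here with agg defaulting to list:

```python
import itertools
from operator import itemgetter
from statistics import mean

def groupby2(l, key=lambda x: x, val=lambda x: x, agg=list, sort=True):
    # agg=list materializes each group before the outer generator advances
    if sort:
        l = sorted(l, key=key)
    return ((k, agg(val(x) for x in v))
            for k, v in itertools.groupby(l, key=key))

t = [('a', 1), ('b', 2), ('a', 3)]

# Mean of the values for each key
means = list(groupby2(t, itemgetter(0), itemgetter(1), mean))
print(means)

# Without sorting, only adjacent equal keys are merged, so 'a' appears twice
unsorted = list(groupby2(t, itemgetter(0), itemgetter(1), sort=False))
print(unsorted)
```

The first call averages 1 and 3 for 'a'. The second call leaves the two 'a' rows in separate groups because they are not adjacent, which is exactly the behavior from the question.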


Shital Shah answered Jan 26 '23 at 01:01