
Simple way to group items into buckets

Tags:

python

I often want to bucket an unordered collection in Python. itertools.groupby does the right sort of thing, but it almost always requires massaging: sorting the items first and capturing each group's iterator before it is consumed.

Is there any quick way to get this behavior, either through a standard Python module or a simple Python idiom?

>>> bucket('thequickbrownfoxjumpsoverthelazydog', lambda x: x in 'aeiou')
{False: ['t', 'h', 'q', 'c', 'k', 'b', 'r', 'w', 'n', 'f', 'x', 'j', 'm', 'p',
    's', 'v', 'r', 't', 'h', 'l', 'z', 'y', 'd', 'g'],
 True: ['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o']}
>>> bucket(range(21), lambda x: x % 10)
{0: [0, 10, 20],
 1: [1, 11],
 2: [2, 12],
 3: [3, 13],
 4: [4, 14],
 5: [5, 15],
 6: [6, 16],
 7: [7, 17],
 8: [8, 18],
 9: [9, 19]}
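
For reference, the groupby massaging described above might look roughly like the sketch below. The name bucket_with_groupby is made up for illustration, and the approach assumes the key values can be compared with each other so that sorting works.

```python
from itertools import groupby

def bucket_with_groupby(items, key):
    # groupby only merges adjacent items, so sort by the same key first.
    # sorted() is stable, so items keep their original order within a bucket.
    ordered = sorted(items, key=key)
    # Each group is a lazy iterator that becomes invalid once groupby advances,
    # so materialize it with list() right away.
    return {k: list(group) for k, group in groupby(ordered, key=key)}

print(bucket_with_groupby(range(21), lambda x: x % 10))
```

The sort makes this O(n log n), and forgetting either the sort or the list() call gives wrong or empty buckets, which is the friction the question is about.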
Mu Mind asked Oct 04 '12 04:10


4 Answers

This has come up several times before -- (1), (2), (3) -- and there's a partition recipe in the itertools recipes, but to my knowledge there's nothing in the standard library... although I was surprised a few weeks ago by itertools.accumulate, so who knows what's lurking there these days? :^)

When I need this behaviour, I use

from collections import defaultdict

def partition(seq, key):
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d

and get on with my day.
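
As a quick check, here is how that function could be applied to the two inputs from the question. The variable names by_vowel and by_last_digit are only illustrative.

```python
from collections import defaultdict

def partition(seq, key):
    d = defaultdict(list)
    for x in seq:
        d[key(x)].append(x)
    return d

by_vowel = partition('thequickbrownfoxjumpsoverthelazydog', lambda x: x in 'aeiou')
by_last_digit = partition(range(21), lambda x: x % 10)

# Converting to a plain dict means a lookup of a missing key raises KeyError
# instead of silently creating an empty bucket.
print(dict(by_last_digit))
print(by_vowel[True])
```

This needs no sorting, makes a single pass, and works on one-shot iterators as well as sequences.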

DSM answered Oct 16 '22 01:10


Here is a simple two-liner:

d = {}
for x in "thequickbrownfoxjumpsoverthelazydog": d.setdefault(x in 'aeiou', []).append(x)

Edit:

Just adding your other case for completeness.

d = {}
for x in range(21): d.setdefault(x % 10, []).append(x)
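
If you want the call signature from the question, the same setdefault idiom can be wrapped in a small function. This is only a sketch; the name bucket is taken from the question's example.

```python
def bucket(items, key):
    d = {}
    for x in items:
        # setdefault returns the existing list for this key,
        # or inserts and returns a new empty one.
        d.setdefault(key(x), []).append(x)
    return d

print(bucket(range(21), lambda x: x % 10))
```

Unlike the defaultdict version, this returns a plain dict, at the cost of building a throwaway empty list on every iteration.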
grieve answered Oct 16 '22 01:10


Here's a variant of partition() from above when the predicate is boolean, avoiding the cost of a dict/defaultdict:

def boolpartition(seq, pred):
    passing, failing = [], []
    for item in seq:
        (passing if pred(item) else failing).append(item)
    return passing, failing

Example usage:

>>> even, odd = boolpartition([1, 2, 3, 4, 5], lambda x: x % 2 == 0)
>>> even
[2, 4]
>>> odd
[1, 3, 5]
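
For comparison, the partition recipe from the itertools documentation mentioned in an earlier answer takes a lazy approach. The sketch below follows that recipe as I remember it, renamed lazy_partition here (a made-up name) to avoid clashing with the other functions on this page. Note the differences: the predicate comes first, the failing items are returned first, and both results are iterators rather than lists.

```python
from itertools import filterfalse, tee

def lazy_partition(pred, iterable):
    # tee() splits one iterable into two independent iterators,
    # buffering items that one side has seen and the other has not.
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

odd, even = lazy_partition(lambda x: x % 2 == 0, [1, 2, 3, 4, 5])
print(list(even), list(odd))
```

The lazy version calls the predicate twice per item (once per side), so boolpartition is usually the better fit when the predicate is expensive or you want lists anyway.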
Thomas Perl answered Oct 16 '22 02:10


If it's a pandas.DataFrame, the following also works, using pd.cut():

from sklearn import datasets
import pandas as pd

# import some data to play with
iris = datasets.load_iris()
df_data = pd.DataFrame(iris.data[:,0])  # we'll just take the first feature

# bucketize
n_bins = 5
feature_name = iris.feature_names[0].replace(" ", "_")
my_labels = [str(feature_name) + "_" + str(num) for num in range(0,n_bins)]
pd.cut(df_data[0], bins=n_bins, labels=my_labels)

yielding

0      sepal_length_(cm)_1
1      sepal_length_(cm)_0
2      sepal_length_(cm)_0
[...]

In case you don't set the labels, the output is going to look like this:

0       (5.02, 5.74]
1      (4.296, 5.02]
2      (4.296, 5.02]
[...]
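
If pandas isn't available, equal-width binning in the same spirit can be sketched with the standard library. This is a rough approximation and not how pd.cut is implemented: the function name equal_width_buckets is made up, and it uses left-closed intervals with the maximum folded into the last bin, whereas pd.cut defaults to right-closed intervals.

```python
from collections import defaultdict

def equal_width_buckets(values, n_bins):
    values = list(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    buckets = defaultdict(list)
    for v in values:
        if width == 0:
            # All values are equal, so everything lands in the first bin.
            index = 0
        else:
            # Clamp so that the maximum value falls into the last bin.
            index = min(int((v - lo) / width), n_bins - 1)
        buckets[index].append(v)
    return dict(buckets)

print(equal_width_buckets([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))
```

The result maps a bin index to the list of values in that bin, which matches the dict-of-lists shape asked for in the question.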
Boern answered Oct 16 '22 01:10