
In pandas, is there something like GroupBy.get_group, but with an optional default value?




I've got a DataFrame df that I've grouped with groupby. I'm looking for a function similar to get_group(name), except that instead of raising a KeyError when the name doesn't exist, it returns an empty DataFrame (or some other value), similar to how dict.get works:

g = df.groupby('x')

# doesn't work, but would be nice:
i = g.get_group(1, default=[])

# does work, but is hard to read:
i = g.obj.take(g.indices.get(1, []))

Is there already a function which provides this?


In many ways, the GroupBy object behaves like a dict (.indices, .groups), and this 'get with default' functionality is core enough to the concept of a dict that it is built into the Python language itself. If a dict-like object doesn't have a get with default, maybe I'm not understanding it correctly? Why would a dict-like object not have a 'get with default'?
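To make the dict analogy concrete: .indices maps each group key to an array of integer row positions, so membership tests and dict.get already work on it. A minimal sketch with made-up data; passing an explicitly typed empty position array as the default is an assumption chosen so that take() receives integer positions.

```python
import numpy as np
import pandas as pd

# Made-up example data
df = pd.DataFrame({"x": [1, 1, 2], "y": ["a", "b", "c"]})
g = df.groupby("x")

# .indices is a dict: group key -> array of integer row positions
has_one = 1 in g.indices
has_three = 3 in g.indices

# dict.get with a default: an empty position array when the key is missing
positions = g.indices.get(3, np.array([], dtype=np.intp))

# take() with no positions returns a zero-row frame that keeps df's columns
empty = df.take(positions)
```

This is the same idea as the "hard to read" one-liner in the question, just spelled out step by step.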

An abbreviated example of what I want to do is:

import itertools

df1_bymid = df1.groupby('mid')
df2_bymid = df2.groupby('mid')

for mid in set(df1_bymid.groups) | set(df2_bymid.groups):
    # wished-for API: get_group(name, default); not an existing pandas signature
    rows1 = df1_bymid.get_group(mid, [])
    rows2 = df2_bymid.get_group(mid, [])
    for row1, row2 in itertools.product(rows1, rows2):
        yield row1, row2
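A runnable sketch of the loop above, assuming both frames have a 'mid' column. rows_for and paired_rows are hypothetical helper names, and the example data is made up. It wraps the g.indices.get(...) trick from the question, and it iterates rows with itertuples(), because iterating a DataFrame directly yields its column labels rather than its rows.

```python
import itertools

import numpy as np
import pandas as pd


def rows_for(g, key):
    # Hypothetical helper: the question's g.indices.get(...) trick wrapped in a function.
    # Missing keys map to an empty position array, so take() returns zero rows.
    positions = g.indices.get(key, np.array([], dtype=np.intp))
    return g.obj.take(positions)


def paired_rows(df1, df2):
    # Yield every (row1, row2) pair whose rows share the same 'mid' value
    df1_bymid = df1.groupby("mid")
    df2_bymid = df2.groupby("mid")
    for mid in set(df1_bymid.groups) | set(df2_bymid.groups):
        rows1 = rows_for(df1_bymid, mid)
        rows2 = rows_for(df2_bymid, mid)
        # Iterating a DataFrame directly yields column labels; itertuples() yields rows
        for row1, row2 in itertools.product(rows1.itertuples(), rows2.itertuples()):
            yield row1, row2


# Made-up example data
df1 = pd.DataFrame({"mid": [1, 1, 2], "a": [10, 11, 12]})
df2 = pd.DataFrame({"mid": [1, 3], "b": [20, 30]})
pairs = list(paired_rows(df1, df2))
```

Keys present in only one frame (2 and 3 here) contribute no pairs, since one side of the product is empty.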

Of course I could create a function, and I might; it just seemed that if I have to go this far out of my way, maybe I'm not using the GroupBy object the way it was intended:

def get_group(g, name, obj=None, default=None):
    # g is a GroupBy; obj defaults to the frame it was created from
    if obj is None:
        obj = g.obj

    try:
        inds = g.indices[name]
    except KeyError:
        if default is None:
            raise
        inds = default

    return obj.take(inds)
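A shorter variant that follows dict.get semantics more literally: catch the KeyError that get_group raises for a missing key and return whatever default the caller passes, instead of treating the default as a list of row positions. get_group_or is a hypothetical name and this is only a sketch, not a pandas API.

```python
import pandas as pd


def get_group_or(g, name, default=None):
    # Mirror dict.get: return the group if it exists, otherwise `default` unchanged
    try:
        return g.get_group(name)
    except KeyError:
        return default


# Made-up example data
df = pd.DataFrame({"x": [1, 1, 2], "y": ["a", "b", "c"]})
g = df.groupby("x")

present = get_group_or(g, 1)              # the rows where x == 1
missing = get_group_or(g, 3, default=[])  # the supplied default
```

Because the default is returned as-is, the caller decides whether a missing group means None, [], or an empty frame.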
asked Nov 06 '13 04:11 by Zach Dwiel


1 Answer

I might define my own get_group() as follows:

In [55]: def get_group(g, key):
   ....:     if key in g.groups: return g.get_group(key)
   ....:     return pd.DataFrame()

In [52]: get_group(g, 's1')
   Mt Sp  Value  count
0  s1  a      1      3
1  s1  b      2      2

In [54]: get_group(g, 's4')
Empty DataFrame
Columns: []
Index: []   
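One possible refinement, offered as a sketch: pd.DataFrame() has no columns, so code that later selects columns from the result would fail for a missing key. Slicing zero rows off the grouped frame keeps the original columns and dtypes. This assumes g.obj (the frame the GroupBy was created from, as used in the question) is available; the data below is made up to resemble the output above.

```python
import pandas as pd


def get_group(g, key):
    # Return the group for key, or a zero-row frame with the original columns
    if key in g.groups:
        return g.get_group(key)
    return g.obj.iloc[:0]


# Made-up data shaped like the session output above
df = pd.DataFrame({
    "Mt": ["s1", "s1", "s2"],
    "Sp": ["a", "b", "c"],
    "Value": [1, 2, 3],
    "count": [3, 2, 5],
})
g = df.groupby("Mt")

found = get_group(g, "s1")
empty = get_group(g, "s4")
```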
answered Oct 01 '22 08:10 by waitingkuo
