I have a DataFrame df that I've grouped with groupby. I'm looking for a function similar to get_group(name), except that instead of raising a KeyError when the name doesn't exist, it returns an empty DataFrame (or some other default value), similar to how dict.get works:
g = df.groupby('x')
# doesn't work, but would be nice:
i = g.get_group(1, default=[])
# does work, but is hard to read:
i = g.obj.take(g.indices.get(1, []), g.axis)
Is there already a function which provides this?
Edit:
In many ways, the GroupBy object is represented by a dict (.indices, .groups), and 'get with default' is core enough to the concept of a dict that it is built into the Python language itself. If a dict-like thing doesn't have a get with default, maybe I'm not understanding it correctly? Why would a dict-like thing not have a 'get with default'?
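To make the dict analogy concrete, here is a small sketch on a made-up frame (the column names and values are purely illustrative). Both attributes behave like ordinary mappings, so a lookup with a default works on them even though get_group has no such option:

```python
import pandas as pd

# Illustrative data only; not taken from the real problem.
df = pd.DataFrame({"x": [1, 1, 2], "y": ["a", "b", "c"]})
g = df.groupby("x")

# .indices maps each group key to the integer positions of its rows,
# so dict.get with a default works on it directly.
positions = g.indices.get(1, [])
missing = g.indices.get(99, [])

print(list(positions))  # positions of the rows where x == 1
print(list(missing))    # nothing: 99 is not a group key

# .groups maps each group key to the index labels of its rows,
# and supports the usual membership test.
print(99 in g.groups)
```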
An abbreviated example of what I want to do is:
df1_bymid = df1.groupby('mid')
df2_bymid = df2.groupby('mid')
for mid in set(df1_bymid.groups) | set(df2_bymid.groups):
    rows1 = df1_bymid.get_group(mid, [])
    rows2 = df2_bymid.get_group(mid, [])
    for row1, row2 in itertools.product(rows1, rows2):
        yield row1, row2
Of course I could create a function, and I might. It just seemed that if I have to go this far out of my way, maybe I'm not using the GroupBy object the way it was intended:
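def get_group(g, name, obj=None, default=None):
    # g is the GroupBy object; obj is the frame to take rows from
    # (defaults to the frame the GroupBy was built on).
    if obj is None:
        obj = g.obj
    try:
        inds = g.indices[name]
    except KeyError:
        # No default supplied: behave like the built-in get_group.
        if default is None:
            raise
        inds = default
    return obj.take(inds, axis=g.axis)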
I would define my own get_group() as follows:
In [55]: def get_group(g, key):
   ....:     if key in g.groups: return g.get_group(key)
   ....:     return pd.DataFrame()
   ....:
In [52]: get_group(g, 's1')
Out[52]:
   Mt Sp  Value  count
0  s1  a      1      3
1  s1  b      2      2
In [54]: get_group(g, 's4')
Out[54]:
Empty DataFrame
Columns: []
Index: []
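One caveat: pd.DataFrame() has no columns, so code that expects the original columns may trip over the empty result. Below is a minimal sketch of a variant that returns an empty slice of the original frame instead, followed by the pairing loop from the question. The name get_group_or_empty and the sample frames are made up for illustration, and the sketch assumes the GroupBy was built directly on a DataFrame without selecting a subset of columns.

```python
import itertools
import pandas as pd

def get_group_or_empty(g, key):
    """Return the rows for `key`, or an empty frame with the same columns.

    Hypothetical helper, not part of pandas.
    """
    if key in g.groups:
        return g.get_group(key)
    # iloc[0:0] selects no rows but keeps the column names and dtypes.
    return g.obj.iloc[0:0]

# Illustrative data only.
df1 = pd.DataFrame({"mid": [1, 1, 2], "a": [10, 11, 12]})
df2 = pd.DataFrame({"mid": [1, 3], "b": [20, 21]})
g1 = df1.groupby("mid")
g2 = df2.groupby("mid")

pairs = []
for mid in sorted(set(g1.groups) | set(g2.groups)):
    rows1 = get_group_or_empty(g1, mid)
    rows2 = get_group_or_empty(g2, mid)
    # itertuples yields one namedtuple per row; an empty frame yields none,
    # so keys missing on either side contribute no pairs.
    for r1, r2 in itertools.product(rows1.itertuples(index=False),
                                    rows2.itertuples(index=False)):
        pairs.append((r1.a, r2.b))

print(pairs)
```

Iterating over a DataFrame directly yields its column names rather than its rows, which is why the loop goes through itertuples here.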