I have a Pandas DataFrame with 3 columns: target, pred, and conf_bin. If I run a groupby(by='conf_bin').apply(...), my apply function gets called with empty DataFrames for values that do not appear in the conf_bin column. How is this possible?
Details
The DataFrame looks something like this:
target pred conf_bin
0 5 6 0.50
1 4 4 0.60
2 4 4 0.50
3 4 3 0.50
4 4 5 0.50
5 5 5 0.55
6 5 5 0.55
7 5 5 0.55
Obviously, conf_bin is a numeric bin with values in the range np.arange(0, 1, 0.05). However, not all values are present in the data:
In [224]: grp = tp.groupby(by='conf_bin')
In [225]: grp.groups.keys()
Out[225]: dict_keys([0.5, 0.60000000000000009, 0.35000000000000003, 0.75, 0.85000000000000009, 0.65000000000000002, 0.55000000000000004, 0.80000000000000004, 0.20000000000000001, 0.45000000000000001, 0.40000000000000002, 0.30000000000000004, 0.70000000000000007, 0.25])
So, for example, the values 0 and 0.05 do not appear. However, when I run an apply on the group, my function does get called for these values:
In [226]: grp.apply(lambda x: x.shape)
Out[226]:
conf_bin
0.00 (0, 3)
0.05 (0, 3)
0.10 (0, 3)
0.15 (0, 3)
0.20 (22, 3)
0.25 (75, 3)
0.30 (95, 3)
0.35 (870, 3)
0.40 (8505, 3)
0.45 (40068, 3)
0.50 (51238, 3)
0.55 (54305, 3)
0.60 (47191, 3)
0.65 (38977, 3)
0.70 (34444, 3)
0.75 (20435, 3)
0.80 (3352, 3)
0.85 (4, 3)
0.90 (0, 3)
dtype: object
Questions:
How can my apply function be called with an empty DataFrame?
Why does it receive DataFrame objects for values that do not appear in grp.groups?

I too was having this problem, which popped up when trying to create subplots for every category in my dataframe.
I came up with the following workaround (based on this SO post), by pulling out the non-empty groups into a list.
groups = df.groupby('conf_bin')
group_list = [(index, group) for index, group in groups if len(group) > 0]
This does break the implicit contract that "you wrangle your data in pandas", and it holds every non-empty group in a Python list at once, which is probably not memory-friendly, but it works.
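Here is a minimal, self-contained sketch of the workaround on a toy DataFrame. It assumes conf_bin is a categorical column that declares more bins than actually occur (the question does not say how the column was built, so that is a guess), and the name all_bins is made up for the illustration.

```python
import pandas as pd

# Toy data modeled on the sample rows in the question.
df = pd.DataFrame({
    "target": [5, 4, 4, 4, 4, 5, 5, 5],
    "pred": [6, 4, 4, 3, 5, 5, 5, 5],
    "conf_bin": [0.50, 0.60, 0.50, 0.50, 0.50, 0.55, 0.55, 0.55],
})

# Assumption: conf_bin is categorical and declares bins that never occur.
all_bins = [0.45, 0.50, 0.55, 0.60, 0.65]  # hypothetical list of bins
df["conf_bin"] = pd.Categorical(df["conf_bin"], categories=all_bins)

# observed=False keeps the unobserved categories as groups.
groups = df.groupby("conf_bin", observed=False)

# size() reports a count of 0 for the declared-but-unused bins.
print(groups.size())

# The workaround: keep only the groups that actually contain rows.
group_list = [(index, group) for index, group in groups if len(group) > 0]
print([index for index, _ in group_list])
```

The filter does not depend on how the empty groups got there, so it should behave the same if your column was built differently.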
Now you can iterate through your groupby list with the same interface as with a groupby object, e.g.
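import matplotlib.pyplot as plt

# squeeze=False keeps axes a 2-D array even when there is only one group
fig, axes = plt.subplots(nrows=len(group_list), ncols=1, squeeze=False)
for (index, group), ax in zip(group_list, axes.flatten()):
    group['target'].plot(ax=ax, title=str(index))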
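If conf_bin is indeed a categorical column (again an assumption, since the question does not confirm it), another option might be to let pandas skip the unused categories itself by passing observed=True to groupby. A rough sketch, using the same made-up toy data and the hypothetical all_bins list:

```python
import pandas as pd

# Toy data modeled on the sample rows in the question.
df = pd.DataFrame({
    "target": [5, 4, 4, 4, 4, 5, 5, 5],
    "pred": [6, 4, 4, 3, 5, 5, 5, 5],
    "conf_bin": [0.50, 0.60, 0.50, 0.50, 0.50, 0.55, 0.55, 0.55],
})

# Assumption: conf_bin is categorical with more declared bins than observed ones.
all_bins = [0.45, 0.50, 0.55, 0.60, 0.65]  # hypothetical list of bins
df["conf_bin"] = pd.Categorical(df["conf_bin"], categories=all_bins)

# observed=True restricts the groups to categories that occur in the data.
sizes = df.groupby("conf_bin", observed=True).size()
print(sizes)
```

This only helps when the grouping key is categorical; for a plain float column the observed flag has no effect.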