pandas 0.25 introduced NamedAgg, which lets you name the output columns of a groupby aggregation. It is a very nice feature (see the NamedAgg documentation).
However, I can't seem to get it working with lambda functions. I don't know if this is a bug or by design.
Setup:
import numpy as np
import pandas as pd

df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]})
Using a lambda in a dict works fine. This is the old way:
(
    df.groupby(by='kind')
    .height.agg({'height_min': lambda x: np.min(x**2), 'height_max': 'max'})
)
Using a lambda with the new NamedAgg doesn't work:
(
    df.groupby(by='kind')
    .agg(height_min=pd.NamedAgg(column='height', aggfunc=lambda x: np.min(x**2)),
         height_max=pd.NamedAgg(column='height', aggfunc='max'))
)
Using a lambda with the implicit (tuple) form of NamedAgg doesn't work either:
(
    df.groupby(by='kind')
    .agg(height_min=('height', lambda x: np.min(x**2)),
         height_max=('height', 'max'))
)
Can anyone explain why a lambda function doesn't work here?
Here is one way to do this using the 0.25 syntax with a single aggregation column:
df.groupby('kind')['height'].agg(height_min=lambda x: np.min(x**2),
                                 height_max='max')
Output:
      height_min  height_max
kind
cat        82.81         9.5
dog        36.00        34.0
However, I do think this is a bug.
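For the DataFrame-level call from the question, one possible workaround is to pass a named function instead of a lambda. This is only a sketch: the helper name `min_of_squares` is made up for illustration, and it assumes the failure is tied to the anonymous `<lambda>` name, which I have not confirmed on 0.25.0. It does run on later pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]})


def min_of_squares(x):
    """Hypothetical helper: smallest squared value within a group."""
    return np.min(x ** 2)


# Same named aggregation as in the question, but with a named function
# in place of the lambda (assumption: this avoids the '<lambda>' name issue).
result = df.groupby('kind').agg(
    height_min=('height', min_of_squares),
    height_max=('height', 'max'),
)
print(result)
```

If this works on your version, the result should have the same two columns, `height_min` and `height_max`, as the single-column approach above.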