
Does the aggregate function in pandas groupby treat builtin functions differently?

Came across this seemingly odd behaviour while discussing https://stackoverflow.com/a/47543066/9017455.

The OP had this dataframe:

import pandas as pd

x = pd.DataFrame.from_dict({
    'cat1':['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cat2':['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

and wanted to find unique cat2 values for each group of cat1 values.

One option is to aggregate and use a lambda to create a set of unique values:

x.groupby('cat1').agg(lambda x: set(x))

# Returns
        cat2
cat1        
A     {X, Y}
B        {Y}
C     {Z, Y}

I assumed that passing set on its own would be equivalent to the lambda here, since it is also a callable. However:

x.groupby('cat1').agg(set)

# Returns
              cat2
cat1              
A     {cat1, cat2}
B     {cat1, cat2}
C     {cat1, cat2}

I get the same behaviour as the lambda method if I define a named function, and doing so shows that pandas calls that function with a Series. set, however, appears to be called with a DataFrame, and since iterating over a DataFrame yields its column labels, the result is the set of column names.
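The difference between the two inputs can be reproduced without groupby at all. The following is a minimal sketch; the names df, frame_set and series_set are illustrative only and do not come from the original post:

```python
import pandas as pd

df = pd.DataFrame({
    'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

# Iterating over a DataFrame yields its column labels,
# so set() collects the column names.
frame_set = set(df)

# Iterating over a Series yields its values,
# so set() collects the unique entries of the column.
series_set = set(df['cat2'])

print(frame_set)
print(series_set)
```

This matches the two outputs above: the lambda result looks like set applied to a Series, while the bare set result looks like set applied to a DataFrame.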

This seems like inconsistent behaviour. Can anyone shed some light on why Pandas treats the builtin functions differently?

Edit

Looking at how SeriesGroupBy.agg behaves might provide some more insight. Passing any type to this function raises "TypeError: 'type' object is not iterable", for example:

x.groupby('cat1')['cat2'].agg(set)
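One way to sidestep the question of how agg dispatches on its argument is to select the column and use apply instead. This is a different technique from the agg calls above, shown only as a sketch; the name unique_per_group is hypothetical, and it is an assumption that your pandas version behaves the same way:

```python
import pandas as pd

x = pd.DataFrame({
    'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

# Select the single column first, then let apply call set() once per group.
# Each call receives that group's Series, so set() iterates over the values
# rather than over column labels.
unique_per_group = x.groupby('cat1')['cat2'].apply(set)

print(unique_per_group)
```

The result is a Series indexed by cat1 whose entries are Python sets.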
Simon Bowly asked Nov 07 '22 13:11



1 Answer

This behaviour seems to have changed since then. At least in pandas 0.23.0, both lambda x: set(x) and set behave identically:

In [6]: x.groupby('cat1').agg(set)
Out[6]:
        cat2
cat1
A     {Y, X}
B        {Y}
C     {Y, Z}

In [7]: x.groupby('cat1').agg(lambda x: set(x))
Out[7]:
        cat2
cat1
A     {Y, X}
B        {Y}
C     {Y, Z}

I could not positively identify the change, but bug #16405 looks suspiciously related (although the fix was already released with 0.20.2 in June 2017, long before this question...).
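To check which behaviour a given installation shows, a small script along these lines may help. It is only a sketch; the names via_builtin, via_lambda and same are made up for illustration:

```python
import pandas as pd

x = pd.DataFrame({
    'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

# Report the installed version, since the behaviour depends on it.
print(pd.__version__)

# Aggregate once with the builtin and once with the equivalent lambda.
via_builtin = x.groupby('cat1').agg(set)
via_lambda = x.groupby('cat1').agg(lambda s: set(s))

# Compare the per-group sets directly; display order inside a set is arbitrary.
same = via_builtin['cat2'].to_dict() == via_lambda['cat2'].to_dict()
print(same)
```

If same is False, the installed version presumably still shows the behaviour described in the question.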

ojdo answered Nov 14 '22 22:11
