
python pandas custom agg function

Dataframe:
  one two
a  1  x
b  1  y
c  2  y
d  2  z
e  3  z

grp = df.groupby('one')
grp.agg(lambda x: ???) #or equivalent function

Desired output from grp.agg:

one two
1   x|y
2   y|z
3   z

My agg function before integrating dataframes was "|".join(sorted(set(x))). Ideally, I want to allow any number of columns in the group, with agg returning "|".join(sorted(set(x))) for each column, as shown for column two above. I also tried np.char.join().

I love pandas: it has taken me from an 800-line complicated program to a 400-line walk in the park that zooms. Thank you :)

brian_the_bungler asked Jan 09 '13 21:01



2 Answers

Just an elaboration on the accepted answer:

df.groupby('one').agg(lambda x: "|".join(x.tolist()))

Note that df.groupby('one') is a DataFrameGroupBy, and agg is defined on this type. When agg is given a plain function like the one above, it calls that function once per group for each non-grouping column, passing that column's values as a Series. This means that x in the lambda above is a Series.
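A quick way to check what agg actually hands to the function is to record the argument type on each call. This is only a sketch: the frame is the one from the question, and join_values and seen_types are names made up for illustration.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz")},
    index=list("abcde"),
)

grouped = df.groupby("one")
print(type(grouped).__name__)  # class name of the groupby object

seen_types = set()  # hypothetical helper: collects the type of each argument


def join_values(x):
    # Record what agg passed in, then join the values with "|"
    seen_types.add(type(x).__name__)
    return "|".join(x.tolist())


result = grouped.agg(join_values)
print(seen_types)
print(result)
```

If the recorded type is Series, any Series method (tolist, unique, sort_values, and so on) is available inside the aggregation function.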

Another note: the agg function does not have to be a lambda. If the aggregation is complex, it can be defined separately as a regular function, as below. The only constraint is that it must accept a Series (or something compatible with one) as x:

def myfun1(x):
    return "|".join(x.tolist())

and then:

df.groupby('one').agg(myfun1)

qartal answered Oct 30 '22 10:10


You were so close:

In [1]: df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Out[1]:
     two
one
1    x|y
2    y|z
3      z

Expanded answer to handle sorting and take only the set:

In [1]: df = DataFrame({'one':[1,1,2,2,3], 'two':list('xyyzz'), 'three':list('eecba')}, index=list('abcde'), columns=['one','two','three'])

In [2]: df
Out[2]:
   one two three
a    1   x     e
b    1   y     e
c    2   y     c
d    2   z     b
e    3   z     a

In [3]: df.groupby('one').agg(lambda x: "|".join(x.sort_values().unique().tolist()))
Out[3]:
     two three
one
1    x|y     e
2    y|z   b|c
3      z     a
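
For anyone who wants to paste and run this outside an IPython session, here is a self-contained sketch of the same idea. It reuses the asker's original "|".join(sorted(set(x))) expression as a named function; join_unique is just an illustrative name, and the frame is the three-column example above.

```python
import pandas as pd

# Same three-column example as in the session above
df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz"), "three": list("eecba")},
    index=list("abcde"),
)


def join_unique(values):
    # Drop duplicates with set(), sort, then join with "|"
    return "|".join(sorted(set(values)))


# agg applies join_unique to every non-grouping column of every group
result = df.groupby("one").agg(join_unique)
print(result)
```

Because the function is applied column by column, adding more string columns to df needs no change to join_unique.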

Zelazny7 answered Oct 30 '22 09:10