Dataframe:
one two
a 1 x
b 1 y
c 2 y
d 2 z
e 3 z
grp = DataFrame.groupby('one')
grp.agg(lambda x: ???) #or equivalent function
Desired output from grp.agg:
one two
1 x|y
2 y|z
3 z
My agg function before integrating dataframes was "|".join(sorted(set(x)))
. Ideally I want to have any number of columns in the group and agg returns the "|".join(sorted(set())
for each column item like two above. I also tried np.char.join()
.
Love Pandas and it has taken me from a 800 line complicated program to a 400 line walk in the park that zooms. Thank you :)
Simply use the apply method to each dataframe in the groupby object. This is the most straightforward way and the easiest to understand. Notice that the function takes a dataframe as its only argument, so any code within the custom function needs to work on a pandas dataframe.
To apply aggregations to multiple columns, just add additional key:value pairs to the dictionary. Applying multiple aggregation functions to a single column will result in a multiindex. Working with multi-indexed columns is a pain and I'd recommend flattening this after aggregating by renaming the new columns.
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame. apply. For a DataFrame, can pass a dict, if the keys are DataFrame column names.
Just an elaboration on the accepted answer:
df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Note that the type of df.groupby('one')
is SeriesGroupBy
. And the function agg
defined on this type. If you check the documentation of this function, it says its input is a function that works on Series. This means that x
type in the above lambda is Series.
Another note is that defining the agg function as lambda is not necessary. If the aggregation function is complex, it can be defined separately as a regular function like below. The only constraint is that the x type should be of Series (or compatible with it):
def myfun1(x):
return "|".join(x.tolist())
and then:
df.groupby('one').agg(myfun1)
You were so close:
In [1]: df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Out[1]:
two
one
1 x|y
2 y|z
3 z
Expanded answer to handle sorting and take only the set:
In [1]: df = DataFrame({'one':[1,1,2,2,3], 'two':list('xyyzz'), 'three':list('eecba')}, index=list('abcde'), columns=['one','two','three'])
In [2]: df
Out[2]:
one two three
a 1 x e
b 1 y e
c 2 y c
d 2 z b
e 3 z a
In [3]: df.groupby('one').agg(lambda x: "|".join(x.order().unique().tolist()))
Out[3]:
two three
one
1 x|y e
2 y|z b|c
3 z a
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With