Starting from this dataframe df:
df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

   c l1 l2
0  1  a  b
1  1  a  d
2  1  b  d
3  2  c  f
4  2  c  e
5  2  b  f
I would like to perform a groupby over the c column to get the unique values of the l1 and l2 columns combined. For one column I can do:
g = df.groupby('c')['l1'].unique()
that correctly returns:
c
1    [a, b]
2    [c, b]
Name: l1, dtype: object
but using:
g = df.groupby('c')['l1','l2'].unique()
returns:
AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'
I know I can get the unique values for the two columns with (among others):
In [12]: np.unique(df[['l1','l2']])
Out[12]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype=object)
Is there a way to apply this method to the groupby in order to get something like:
c
1    [a, b, d]
2    [c, b, e, f]
Name: l1, dtype: object
You can do it with apply:

import numpy as np
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))

(Note the double brackets: selecting multiple columns with df.groupby('c')['l1', 'l2'] is deprecated and raises an error in recent pandas versions.)
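Put together with the sample frame, the apply approach runs like this. One caveat: np.unique also sorts, so group 2 comes back as [b, c, e, f] rather than the [c, b, e, f] ordering shown in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# Each group is passed to the lambda as a two-column sub-DataFrame;
# np.unique flattens it and returns the sorted distinct values.
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))

print(g.loc[1])  # ['a', 'b', 'd']
print(g.loc[2])  # ['b', 'c', 'e', 'f']
```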
Alternatively, you can use agg:

g = df.groupby('c')[['l1', 'l2']].agg(['unique'])
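Note that agg(['unique']) behaves differently from the apply version: it keeps the columns separate, returning one unique-list per column per group (in order of first appearance) instead of a single combined list. A sketch with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# The result is a DataFrame whose columns form a (column, 'unique')
# MultiIndex; each cell holds an array of that column's unique values
# within the group.
g = df.groupby('c')[['l1', 'l2']].agg(['unique'])

print(list(g.loc[1, ('l1', 'unique')]))  # ['a', 'b']
print(list(g.loc[2, ('l2', 'unique')]))  # ['f', 'e']
```

Choose this form when you want per-column uniques; use the apply version when you want them merged across columns.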