
How to get unique values from multiple columns in a pandas groupby

Tags: python, pandas

Starting from this dataframe df:

df = pd.DataFrame({'c':[1,1,1,2,2,2],'l1':['a','a','b','c','c','b'],'l2':['b','d','d','f','e','f']})

   c l1 l2
0  1  a  b
1  1  a  d
2  1  b  d
3  2  c  f
4  2  c  e
5  2  b  f

I would like to group by the c column and get the unique values of the l1 and l2 columns combined. For one column I can do:

g = df.groupby('c')['l1'].unique() 

that correctly returns:

c
1    [a, b]
2    [c, b]
Name: l1, dtype: object

but using:

g = df.groupby('c')['l1','l2'].unique() 

returns:

AttributeError: 'DataFrameGroupBy' object has no attribute 'unique' 

I know I can get the unique values for the two columns with (among others):

In [12]: np.unique(df[['l1','l2']])
Out[12]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype=object)

Is there a way to apply this method to the groupby in order to get something like:

c
1    [a, b, d]
2    [c, b, e, f]
Name: l1, dtype: object
Fabio Lamanna asked Mar 19 '16 20:03




2 Answers

You can do it with apply:

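import numpy as np

# select the columns with a list (double brackets); recent pandas versions reject the tuple form ['l1','l2']
g = df.groupby('c')[['l1','l2']].apply(lambda x: list(np.unique(x)))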
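As a self-contained sketch of the apply approach on the question's data: it assumes a recent pandas release, where (as far as I know, from pandas 2.0 on) the columns of a groupby have to be selected with a list rather than the tuple form ['l1','l2']. Note that np.unique returns sorted values, so the order can differ from the order of first appearance shown in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# Select both columns with a list (double brackets). For each group,
# np.unique flattens the two-column sub-frame and returns its sorted
# distinct values, which are then converted to a plain list.
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))

print(g.loc[1])  # ['a', 'b', 'd']
print(g.loc[2])  # ['b', 'c', 'e', 'f']
```

The result is a Series indexed by c, holding one list per group.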
ayhan answered Sep 16 '22 12:09


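Alternatively, you can use agg, which returns the unique values of each column separately rather than one combined list per group:

g = df.groupby('c')[['l1','l2']].agg(['unique'])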
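A short sketch of what agg gives on the question's data, plus one possible way to get a single combined result per group. The melt step is not part of the answer above; it is an alternative I am adding for illustration, and the names per_col and combined are hypothetical. It assumes a recent pandas release with list-style column selection.

```python
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# agg(['unique']) keeps the columns apart: the result has one
# ('l1', 'unique') column and one ('l2', 'unique') column.
per_col = df.groupby('c')[['l1', 'l2']].agg(['unique'])
print(per_col)

# To combine both columns, stack l1 and l2 into a single 'value'
# column with melt, then take the unique values per group.
# Values come out in order of first appearance (l1 rows first, then l2).
combined = (
    df.melt(id_vars='c', value_vars=['l1', 'l2'])
      .groupby('c')['value']
      .unique()
)
print(combined)
```

Unlike np.unique, Series.unique does not sort, so group 2 comes out as c, b, f, e here.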
Yaakov Bressler answered Sep 16 '22 12:09