How to get unique values from multiple columns in a pandas groupby

Tags:

python

pandas

Starting from this dataframe df:

df = pd.DataFrame({'c':[1,1,1,2,2,2],'l1':['a','a','b','c','c','b'],'l2':['b','d','d','f','e','f']})

   c l1 l2
0  1  a  b
1  1  a  d
2  1  b  d
3  2  c  f
4  2  c  e
5  2  b  f

I would like to perform a groupby over the c column to get unique values of the l1 and l2 columns. For one column I can do:

g = df.groupby('c')['l1'].unique()

that correctly returns:

c
1    [a, b]
2    [c, b]
Name: l1, dtype: object

but using:

g = df.groupby('c')[['l1','l2']].unique()

returns:

AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'

I know I can get the unique values for the two columns with (among others):

In [12]: np.unique(df[['l1','l2']])
Out[12]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype=object)

Is there a way to apply this method to the groupby in order to get something like:

c
1       [a, b, d]
2    [c, b, e, f]
Name: l1, dtype: object

595

asked Mar 19 '16 20:03

Fabio Lamanna

2 Answers

You can do it with apply:

import numpy as np
g = df.groupby('c')[['l1','l2']].apply(lambda x: list(np.unique(x)))
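For reference, here is a self-contained sketch of the apply approach using the dataframe from the question. It assumes a recent pandas version, where several columns have to be selected with a list (double brackets) rather than a bare tuple.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# Each group reaches the lambda as a sub-DataFrame holding l1 and l2.
# np.unique flattens it and returns the distinct values in sorted order.
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
print(g)
```

Note that np.unique sorts its result, so group 2 comes back as b, c, e, f rather than in the order shown in the question.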

191

answered Sep 16 '22 12:09

ayhan

Alternatively, you can use agg, which returns the unique values of each column separately rather than combined:

g = df.groupby('c')[['l1','l2']].agg(['unique'])
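A rough sketch of what the agg result looks like and one possible way to merge it into a single list per group, as the question asks. The names per_column and combined are made up for illustration, and the merging step is an add-on, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# Passing a list to agg gives two-level columns: ('l1', 'unique') and
# ('l2', 'unique'). Each cell holds an array of that column's unique values.
per_column = df.groupby('c')[['l1', 'l2']].agg(['unique'])

# Merge the two arrays of each group, dropping repeats while keeping
# first-seen order (dict keys preserve insertion order).
combined = pd.Series(
    [
        list(dict.fromkeys([*a, *b]))
        for a, b in zip(per_column[('l1', 'unique')], per_column[('l2', 'unique')])
    ],
    index=per_column.index,
)
print(combined)
```

Unlike the np.unique version, this keeps the values in order of first appearance (l1 first, then l2) instead of sorting them.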

answered Sep 16 '22 12:09

Yaakov Bressler
