Get unique values of multiple columns as a new dataframe in pandas

Tags:

python

pandas

pandas-groupby

Given a pandas DataFrame df with at least the columns C1, C2 and C3, how would you get all the unique (C1, C2, C3) combinations as a new DataFrame?

In other words, something similar to:

SELECT C1,C2,C3
FROM T
GROUP BY C1,C2,C3

I tried this:

print(df.groupby(by=['C1','C2','C3']))

but I'm getting:

<pandas.core.groupby.DataFrameGroupBy object at 0x000000000769A9E8>

649

asked Jan 06 '18 20:01

Ofek Ron

1 Answer

I believe you need drop_duplicates if you want all unique triples:

df = df.drop_duplicates(subset=['C1','C2','C3'])
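A minimal sketch on a made-up frame (the data and the extra column val are assumptions for illustration only). With subset=, the other columns are kept from the first row of each triple; selecting the three columns before dropping duplicates returns only C1, C2, C3, which is probably closer to the SQL query in the question:

```python
import pandas as pd

# Hypothetical sample data; "val" stands in for any additional column.
df = pd.DataFrame({
    "C1": [1, 1, 2, 2],
    "C2": ["a", "a", "b", "b"],
    "C3": [True, True, False, True],
    "val": [10, 20, 30, 40],
})

# Keep every column, but only the first row of each (C1, C2, C3) triple.
first_rows = df.drop_duplicates(subset=["C1", "C2", "C3"])

# Keep only the three key columns, one row per unique triple.
unique_triples = df[["C1", "C2", "C3"]].drop_duplicates().reset_index(drop=True)

print(first_rows)
print(unique_triples)
```

reset_index(drop=True) is optional; it just renumbers the rows of the result from 0.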

If you want to use groupby, add first:

df = df.groupby(by=['C1','C2','C3'], as_index=False).first()
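The two approaches are not always interchangeable. As a hedged sketch (sample data and the val column are invented for illustration): groupby sorts the result by the key columns by default, and first() returns the first non-null value per column, so the non-key columns can differ from what drop_duplicates keeps. groupby also makes it easy to get a row count per triple, like COUNT(*) in SQL:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value in the non-key column.
df = pd.DataFrame({
    "C1": [2, 1, 1, 2],
    "C2": ["b", "a", "a", "b"],
    "C3": ["x", "y", "y", "x"],
    "val": [np.nan, 10.0, 20.0, 40.0],
})
keys = ["C1", "C2", "C3"]

# One row per triple, sorted by the keys; first() skips NaN within each group.
by_group = df.groupby(keys, as_index=False).first()

# One row per triple, in original order; keeps the first row as-is (NaN included).
by_dedup = df.drop_duplicates(subset=keys)

# Number of rows per triple.
counts = df.groupby(keys).size().reset_index(name="count")

print(by_group)
print(by_dedup)
print(counts)
```

If only the unique key combinations matter, both give the same set of triples; the differences show up in row order and in the other columns.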

140

answered Oct 18 '22 06:10

jezrael
