Get Rows based on distinct values from Column 2

Tags:

python

pandas

How can I get the rows by distinct values in COL2?

For example, I have the dataframe below:

```
COL1   COL2
a.com  22
b.com  45
c.com  34
e.com  45
f.com  56
g.com  22
h.com  45
```

I want to get one row for each unique value in COL2:

```
COL1   COL2
a.com  22
b.com  45
c.com  34
f.com  56
```

So, how can I do that? I would appreciate any help.


asked Apr 29 '17 11:04

import.zee

1 Answer

Use `drop_duplicates` and specify `COL2` as the column to check for duplicates:

```
df1 = df.drop_duplicates('COL2')
# same as
# df1 = df.drop_duplicates('COL2', keep='first')
print(df1)
    COL1  COL2
0  a.com    22
1  b.com    45
2  c.com    34
4  f.com    56
```
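For a fully reproducible version of the first example, here is a minimal sketch that assumes the sample data from the question. It builds the DataFrame, passes the column through the `subset` keyword (the first parameter of `drop_duplicates`), and also shows `ignore_index=True`, which renumbers the result instead of keeping the original row labels (this option may not exist in very old pandas releases).

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
    'COL2': [22, 45, 34, 45, 56, 22, 45],
})

# Keep the first row for each distinct COL2 value; original index labels are kept
first = df.drop_duplicates(subset='COL2', keep='first')
print(first)

# Same rows, but the index is renumbered 0..n-1
first_renumbered = df.drop_duplicates(subset='COL2', ignore_index=True)
print(first_renumbered)
```

The variable names `first` and `first_renumbered` are only for illustration; `df` itself is left unchanged because `drop_duplicates` returns a new DataFrame unless `inplace=True` is passed.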

You can also keep the last occurrence of each value instead:

```
df1 = df.drop_duplicates('COL2', keep='last')
print(df1)
    COL1  COL2
2  c.com    34
4  f.com    56
5  g.com    22
6  h.com    45
```
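If you think of the task as "one row per group", `groupby(...).tail(1)` should give the same rows as `keep='last'`, because `GroupBy.tail` returns a subset of the original rows with their original index and order preserved. A sketch, again assuming the question's sample data:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
    'COL2': [22, 45, 34, 45, 56, 22, 45],
})

# Last row of each COL2 group, returned in the original row order
last_by_group = df.groupby('COL2').tail(1)

# Reference result from drop_duplicates for comparison
last_by_drop = df.drop_duplicates('COL2', keep='last')

print(last_by_group)
print(last_by_group.equals(last_by_drop))
```

By the same reasoning, `df.groupby('COL2').head(1)` corresponds to `keep='first'`. The grouping form is mainly useful when you later want more than one row per value, for example `tail(2)`.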

Or remove every row whose `COL2` value appears more than once:

```
df1 = df.drop_duplicates('COL2', keep=False)
print(df1)
    COL1  COL2
2  c.com    34
4  f.com    56
```
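The `keep=False` case can also be expressed by counting how often each value occurs and keeping the values seen exactly once. The sketch below (assuming the question's sample data) uses `value_counts` together with `map`, and also shows the boolean-mask form built from `Series.duplicated(keep=False)`; both should select the same rows as `drop_duplicates('COL2', keep=False)`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
    'COL2': [22, 45, 34, 45, 56, 22, 45],
})

# Number of occurrences of each COL2 value, aligned back to every row
counts = df['COL2'].map(df['COL2'].value_counts())

# Keep only rows whose COL2 value occurs exactly once
singletons = df[counts == 1]

# Alternative: duplicated(keep=False) marks every occurrence of a repeated value
singletons_mask = df[~df['COL2'].duplicated(keep=False)]

print(singletons)
```

The counting form is handy if the threshold might change later (for example, keeping values that occur at most twice with `counts <= 2`).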


answered Oct 17 '22 23:10

jezrael
