How can I get the rows by distinct values in COL2?
For example, I have the dataframe below:

    COL1   COL2
    a.com  22
    b.com  45
    c.com  34
    e.com  45
    f.com  56
    g.com  22
    h.com  45

I want to get the rows based on unique values in COL2:

    COL1  COL2
    a.com 22
    b.com 45
    c.com 34
    f.com 56

So, how can I get that? I would appreciate it very much if anyone can provide any help.
You can use the following syntax to select unique rows across specific columns in a pandas DataFrame: df = df.drop_duplicates(subset=['col1', 'col2', ...])
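As a minimal sketch of the multi-column form, the snippet below deduplicates on two columns at once. The frame and its column names (col1, col2, col3) are made up for illustration; a row is dropped only when its values in both subset columns match an earlier row.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share the same (col1, col2) pair.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": [1, 1, 2, 3],
    "col3": [10, 20, 30, 40],
})

# Keep the first row for each distinct (col1, col2) combination.
deduped = df.drop_duplicates(subset=["col1", "col2"])
print(deduped)
```

Columns outside the subset (col3 here) are ignored when comparing rows but are kept in the result.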
A pandas Series (that is, a single DataFrame column) has a unique() method that returns only the distinct values of that column. Applied to a FirstName column, for example, it returns each first name once. We can extend this approach with the pandas concat() function: concatenate all the desired columns into one single column, then call unique() on the result.
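A rough sketch of both steps follows. The FirstName column name comes from the text above; the Nickname column and all of the values are invented for the example.

```python
import pandas as pd

# Hypothetical data; only the FirstName column name comes from the text.
people = pd.DataFrame({
    "FirstName": ["Ann", "Bob", "Ann", "Cara"],
    "Nickname": ["Bob", "Dee", "Ann", "Cara"],
})

# Distinct values of a single column, in order of first appearance.
first_names = people["FirstName"].unique()

# Stack several columns into one Series, then take the distinct values.
stacked = pd.concat([people["FirstName"], people["Nickname"]], ignore_index=True)
all_names = stacked.unique()

print(list(first_names))
print(list(all_names))
```

Note that unique() returns bare values rather than rows, so it answers "which values occur?" and not "which rows should I keep?".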
In SQL, adding the DISTINCT keyword to a SELECT query causes it to return only unique combinations of values for the specified column list, so that duplicate rows are removed from the result set. Since DISTINCT operates on all of the fields in SELECT's column list, it can't be applied to an individual field that is part of a larger group.
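To illustrate that DISTINCT behavior, here is a small sketch using Python's built-in sqlite3 module with the question's data loaded into an in-memory table. The table name sites is an assumption made for the example.

```python
import sqlite3

rows = [
    ("a.com", 22), ("b.com", 45), ("c.com", 34), ("e.com", 45),
    ("f.com", 56), ("g.com", 22), ("h.com", 45),
]

conn = sqlite3.connect(":memory:")
# "sites" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE sites (COL1 TEXT, COL2 INTEGER)")
conn.executemany("INSERT INTO sites VALUES (?, ?)", rows)

# DISTINCT on one column: each COL2 value appears once.
distinct_col2 = conn.execute(
    "SELECT DISTINCT COL2 FROM sites ORDER BY COL2"
).fetchall()

# DISTINCT on both columns: it applies to the whole (COL1, COL2) pair,
# so every row survives because every pair is already unique.
distinct_pairs = conn.execute("SELECT DISTINCT COL1, COL2 FROM sites").fetchall()
conn.close()

print(distinct_col2)
print(len(distinct_pairs))
```

This is why a plain DISTINCT cannot return "one full row per COL2 value", which is what drop_duplicates on a single column does in pandas.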
Use drop_duplicates and specify column COL2 as the column to check for duplicates:

    df = df.drop_duplicates('COL2')
    # same as
    # df = df.drop_duplicates('COL2', keep='first')
    print(df)
        COL1  COL2
    0  a.com    22
    1  b.com    45
    2  c.com    34
    4  f.com    56

You can also keep only last values:

    df = df.drop_duplicates('COL2', keep='last')
    print(df)
        COL1  COL2
    2  c.com    34
    4  f.com    56
    5  g.com    22
    6  h.com    45

Or remove all duplicates:

    df = df.drop_duplicates('COL2', keep=False)
    print(df)
        COL1  COL2
    2  c.com    34
    4  f.com    56
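For anyone who wants to run the three variants side by side, here is a self-contained sketch that rebuilds the question's DataFrame. The variable names (keep_first, keep_last, keep_none) are only illustrative, and each result is assigned to its own name so that the original df is not overwritten between calls.

```python
import pandas as pd

# The data from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Keep the first row seen for each COL2 value (the default).
keep_first = df.drop_duplicates(subset="COL2")

# Keep the last row seen for each COL2 value.
keep_last = df.drop_duplicates(subset="COL2", keep="last")

# Drop every row whose COL2 value occurs more than once.
keep_none = df.drop_duplicates(subset="COL2", keep=False)

# Optional: renumber the index 0..n-1 after dropping rows.
keep_first_renumbered = keep_first.reset_index(drop=True)

print(keep_first)
print(keep_last)
print(keep_none)
```

The surviving rows keep their original index labels, which is why the outputs above show gaps such as 0, 1, 2, 4; reset_index(drop=True) removes those gaps if they are unwanted.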