How can I get the rows by distinct values in COL2?
For example, I have the dataframe below:
COL1   COL2
a.com  22
b.com  45
c.com  34
e.com  45
f.com  56
g.com  22
h.com  45
I want to get the rows based on unique values in COL2:
COL1   COL2
a.com  22
b.com  45
c.com  34
f.com  56
So, how can I get that? I would appreciate it very much if anyone can provide any help.
You can use the following syntax to select unique rows across specific columns in a pandas DataFrame: df = df.drop_duplicates(subset=['col1', 'col2', ...])
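As a minimal sketch of the subset form, assuming a small made-up frame with hypothetical columns col1, col2, and col3: two rows count as duplicates only when they match on every column listed in subset, and the remaining columns are ignored for the comparison.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": [1, 1, 1, 2],
    "col3": [10, 20, 30, 40],
})

# Rows are duplicates only if BOTH col1 and col2 match; col3 is not compared.
# By default the first row of each duplicate group is kept.
deduped = df.drop_duplicates(subset=["col1", "col2"])
print(deduped)
```

Here the second row is dropped because it repeats the (col1, col2) pair of the first row, even though its col3 value differs.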
A pandas Series (that is, a DataFrame column) has a unique() method that returns only the distinct values of that column. For example, calling it on a FirstName column returns only the unique first names. We can extend this approach with the pandas concat() function: concatenate all the desired columns into a single column, then call unique() on the result.
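A short sketch of the unique() and concat() approach, assuming a hypothetical frame with FirstName and LastName columns (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical data, for illustration only.
people = pd.DataFrame({
    "FirstName": ["Ann", "Bob", "Ann", "Cid"],
    "LastName": ["Lee", "Ann", "Lee", "Bob"],
})

# Distinct values of a single column, in order of first appearance.
first_names = people["FirstName"].unique()

# Stack both columns into one Series, then take the distinct values.
stacked = pd.concat([people["FirstName"], people["LastName"]], ignore_index=True)
all_names = stacked.unique()

print(list(first_names))
print(list(all_names))
```

Note that unique() returns an array of values rather than rows, so it answers "which values occur" but does not by itself give back the matching rows of the DataFrame.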
In SQL, adding the DISTINCT keyword to a SELECT query causes it to return only unique combinations of values for the specified column list, so duplicate rows are removed from the result set. Because DISTINCT operates on all of the fields in the SELECT column list, it can't be applied to an individual field that is part of a larger group.
Use drop_duplicates and specify the column COL2 to check for duplicates:
df = df.drop_duplicates('COL2')
# same as
# df = df.drop_duplicates('COL2', keep='first')
print(df)
    COL1  COL2
0  a.com    22
1  b.com    45
2  c.com    34
4  f.com    56
You can also keep only last values:
df = df.drop_duplicates('COL2', keep='last')
print(df)
    COL1  COL2
2  c.com    34
4  f.com    56
5  g.com    22
6  h.com    45
Or remove all duplicates:
df = df.drop_duplicates('COL2', keep=False)
print(df)
    COL1  COL2
2  c.com    34
4  f.com    56
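For reference, here is a self-contained sketch that rebuilds the DataFrame from the question and applies all three keep options to the same original frame. The variable names first, last, and unique_only are chosen here for illustration; each result is stored separately so that one call does not affect the next.

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Keep the first row for each distinct COL2 value (the default).
first = df.drop_duplicates(subset="COL2", keep="first")

# Keep the last row for each distinct COL2 value.
last = df.drop_duplicates(subset="COL2", keep="last")

# Drop every row whose COL2 value occurs more than once.
unique_only = df.drop_duplicates(subset="COL2", keep=False)

print(first)
print(last)
print(unique_only)
```

The original index labels are preserved in each result; if a fresh 0..n-1 index is wanted, chaining .reset_index(drop=True) is one option.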