Select first row when there are multiple rows with repeated values in a column [duplicate]

Question

I want to select the first row when there are multiple rows with repeated values in a column.

For example:

import pandas as pd
df = pd.DataFrame({'col1':['one', 'one', 'one', 'one', 'one', 'one', 'one', 'one'], 
                   'col2':['ID=ABCD1234', 'ID=ABCD1234', 'ID=ABCD1234', 'ID=ABCD5678', 
                           'ID=ABCD5678', 'ID=ABCD5678', 'ID=ABCD9102', 'ID=ABCD9102']})

The pandas dataframe looks like this:

print(df)
  col1         col2
0  one  ID=ABCD1234
1  one  ID=ABCD1234
2  one  ID=ABCD1234
3  one  ID=ABCD5678
4  one  ID=ABCD5678
5  one  ID=ABCD5678
6  one  ID=ABCD9102
7  one  ID=ABCD9102

I want rows 0, 3, and 6 to be selected and returned as a new dataframe.

Expected output:

  col1         col2
0  one  ID=ABCD1234
3  one  ID=ABCD5678
6  one  ID=ABCD9102

Joe · Accepted Answer

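You can use drop_duplicates with keep='first', which keeps the first row for each distinct value in col2. Note that inplace=True modifies df itself and returns None, so it does not give you a separate dataframe:

df.drop_duplicates(subset=['col2'], keep='first', inplace=True)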
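Since the question asks for the result as a new dataframe, here is a minimal sketch of the same call without inplace, so the original df is left untouched. It assumes the df from the question; the name first_rows is only illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'col1': ['one'] * 8,
    'col2': ['ID=ABCD1234'] * 3 + ['ID=ABCD5678'] * 3 + ['ID=ABCD9102'] * 2,
})

# Without inplace=True, drop_duplicates returns a new DataFrame
# and does not modify df.
first_rows = df.drop_duplicates(subset=['col2'], keep='first')

print(first_rows)
```

The result keeps the original row labels 0, 3, and 6. If a fresh 0..n-1 index is preferred, chaining .reset_index(drop=True) should do it.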

filbranden · Answer

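Group by the values of the column and use first() to pick the first entry of each group. Note that the grouping column becomes the index of the result, so the original row labels (0, 3, 6) are not preserved: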

df.groupby('col2').first()
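To compare this with the expected output in the question, the sketch below puts first() next to head(1), which is a swapped-in alternative rather than what this answer proposes. It assumes the df from the question; by_first and by_head are illustrative names.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'col1': ['one'] * 8,
    'col2': ['ID=ABCD1234'] * 3 + ['ID=ABCD5678'] * 3 + ['ID=ABCD9102'] * 2,
})

# first() moves col2 into the index and returns, per group,
# the first non-null value of each remaining column.
by_first = df.groupby('col2').first()

# head(1) keeps the first physical row of each group,
# with its original row label and all columns.
by_head = df.groupby('col2').head(1)

print(by_first)
print(by_head)
```

If the original row labels matter, head(1) is probably the closer match. first() is more convenient when a result indexed by col2 is what you want.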

You can also group by multiple columns, so that rows count as repeated only when they match on every listed column:

df.groupby(['col1', 'col2']).first()
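As a rough illustration of the multi-column case, the sketch below adds a hypothetical value column (not part of the question) so there is something left to inspect after grouping on both col1 and col2. It compares drop_duplicates with a two-column subset against groupby(...).head(1); via_drop and via_head are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['one'] * 8,
    'col2': ['ID=ABCD1234'] * 3 + ['ID=ABCD5678'] * 3 + ['ID=ABCD9102'] * 2,
    'value': range(8),  # hypothetical extra column, not in the question
})

# Both calls keep the first row of every distinct (col1, col2) pair.
via_drop = df.drop_duplicates(subset=['col1', 'col2'], keep='first')
via_head = df.groupby(['col1', 'col2']).head(1)

# The two results should contain the same rows with the same labels.
print(via_drop.equals(via_head))
print(via_drop)
```

On this data, both approaches select the rows labelled 0, 3, and 6.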

Tags: python, pandas, dataframe

botloggy
