I have a pandas DataFrame that looks like this:
  Column1  Column2 Column3
0     cat        1       C
1     dog        1       A
2     cat        1       B
I want to detect that a value (here, cat in Column1) has been repeated, remove the duplicate record, and preserve only the first one. The resulting DataFrame should contain only:
  Column1  Column2 Column3
0     cat        1       C
1     dog        1       A
Use the pandas DataFrame.drop() method to delete rows that match one or more conditions.
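A minimal sketch of that approach applied to the question's data, assuming the condition is "this Column1 value already appeared in an earlier row". drop() removes rows by index label, so the sketch builds a boolean mask with DataFrame.duplicated() and passes the labels of the flagged rows; the names mask and result are illustrative only.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"Column1": ["cat", "dog", "cat"],
                   "Column2": [1, 1, 1],
                   "Column3": ["C", "A", "B"]})

# True for every repeat of a Column1 value after its first occurrence
mask = df.duplicated(subset=["Column1"], keep="first")

# drop() works on index labels, so pass the labels of the flagged rows
result = df.drop(df[mask].index)
print(result)
```

For this particular condition, drop_duplicates() gives the same result in one call; the drop() route is more useful when the condition is something other than duplication.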
The keep parameter of pandas drop_duplicates() controls which duplicate survives: if 'first', all duplicate rows except the first are deleted; if 'last', all duplicate rows except the last are deleted.
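To make the difference between the keep options concrete, here is a small comparison on the question's data, checking duplicates on Column1 only. It also shows keep=False, a third option not mentioned above, which drops every row whose value is repeated and keeps none of them.

```python
import pandas as pd

df = pd.DataFrame({"Column1": ["cat", "dog", "cat"],
                   "Column2": [1, 1, 1],
                   "Column3": ["C", "A", "B"]})

# keep='first' (the default): the first "cat" row (index 0) survives
first = df.drop_duplicates(subset=["Column1"], keep="first")
# keep='last': the last "cat" row (index 2) survives
last = df.drop_duplicates(subset=["Column1"], keep="last")
# keep=False: every row whose Column1 value is repeated is dropped
none_kept = df.drop_duplicates(subset=["Column1"], keep=False)

print(first.index.tolist())      # [0, 1]
print(last.index.tolist())       # [1, 2]
print(none_kept.index.tolist())  # [1]
```

Note that the surviving rows keep their original index labels in every case.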
Use drop_duplicates with subset set to the list of columns to check for duplicates, and keep='first' to keep the first of the duplicates.
If the dataframe is:
import pandas as pd

df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})
print(df)
Result:
  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
2   'cat'     'bat'   'lmn'
Then:
result_df = df.drop_duplicates(subset=['Column1', 'Column2'], keep='first')
print(result_df)
Result:
  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
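Since the question also asks how to identify the repeated records, a possible extra step is to inspect them before dropping anything. The sketch below uses DataFrame.duplicated() with the same subset and keep arguments as the answer; the variable name dup_mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})

# True marks a row whose (Column1, Column2) pair already appeared in an earlier row
dup_mask = df.duplicated(subset=['Column1', 'Column2'], keep='first')
print(dup_mask.tolist())  # [False, False, True]

# The rows that drop_duplicates(subset=..., keep='first') would remove
print(df[dup_mask])
```

Keeping the rows where the mask is False (df[~dup_mask]) yields the same frame as result_df.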
import pandas as pd

df = pd.DataFrame({"Column1": ["cat", "dog", "cat"],
                   "Column2": [1, 1, 1],
                   "Column3": ["C", "A", "B"]})
# Keep only the first row for each distinct Column1 value
df = df.drop_duplicates(subset=['Column1'], keep='first')
print(df)
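The subset argument matters here. A short sketch on the question's data showing why: without subset, pandas compares all columns, and the two cat rows differ in Column3, so nothing is removed.

```python
import pandas as pd

df = pd.DataFrame({"Column1": ["cat", "dog", "cat"],
                   "Column2": [1, 1, 1],
                   "Column3": ["C", "A", "B"]})

# No subset: a row is a duplicate only if ALL columns match.
# The two "cat" rows differ in Column3 ("C" vs "B"), so nothing is dropped.
all_cols = df.drop_duplicates()
print(len(all_cols))  # 3

# subset=['Column1']: only Column1 is compared, so the second "cat" row is dropped.
by_col1 = df.drop_duplicates(subset=['Column1'], keep='first')
print(len(by_col1))   # 2
```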