I have a DataFrame df in which the Id column contains duplicates:
Index  Id  Type
0      a1  A
1      a2  A
2      b1  B
3      b3  B
4      a1  A
...
When I use:
uniqueId = df["Id"].unique()
I get a list of unique IDs.
How can I apply this filtering to the whole DataFrame, so that it keeps its structure but the duplicate rows (based on "Id") are removed?
unique() returns only a NumPy array of the distinct values in the column, so it cannot preserve the rest of each row. It seems you need DataFrame.drop_duplicates with the parameter subset, which specifies which column(s) to test for duplicates:
# keep first duplicate value
df = df.drop_duplicates(subset=['Id'])
print(df)

      Id Type
Index
0     a1    A
1     a2    A
2     b1    B
3     b3    B
# keep last duplicate value
df = df.drop_duplicates(subset=['Id'], keep='last')
print(df)

      Id Type
Index
1     a2    A
2     b1    B
3     b3    B
4     a1    A
# remove all duplicate values
df = df.drop_duplicates(subset=['Id'], keep=False)
print(df)

      Id Type
Index
1     a2    A
2     b1    B
3     b3    B
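For reference, here is a minimal, self-contained sketch that rebuilds the example DataFrame from the question and runs all three variants; naming the index "Index" is an assumption taken from the printed output above.

import pandas as pd

# Rebuild the example data from the question; naming the index
# "Index" is an assumption based on the printed output above.
df = pd.DataFrame({'Id': ['a1', 'a2', 'b1', 'b3', 'a1'],
                   'Type': ['A', 'A', 'B', 'B', 'A']})
df.index.name = 'Index'

print(df.drop_duplicates(subset=['Id']))               # keep='first' is the default
print(df.drop_duplicates(subset=['Id'], keep='last'))  # keep the last occurrence
print(df.drop_duplicates(subset=['Id'], keep=False))   # drop every duplicated Id

An equivalent way to get the default keep='first' behaviour is boolean masking with df[~df['Id'].duplicated()], which can be handy when you want to combine the duplicate test with other row filters.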