I have a pandas data frame with a column uniqueid. I would like to remove all duplicates from the data frame based on this column, such that all remaining observations are unique.
You can get the unique values of a single column with Series.unique() or the top-level pandas.unique() function. Both operate on one-dimensional data, and a DataFrame has no unique() method of its own, so for multiple columns you would first combine their values into a single 1-D array.
Series.unique() returns the unique values of a Series object in order of appearance. It is hash table-based and therefore does NOT sort. For NumPy-backed data, the unique values are returned as a NumPy array.
With np.unique(), we can get the unique values of an array passed as its argument. Unlike Series.unique(), np.unique() returns the values sorted.
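The ordering difference between the two functions is easy to miss, so here is a minimal sketch on a made-up Series (the values are illustrative only, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen so that appearance order differs from sorted order.
s = pd.Series([3, 1, 3, 2, 1])

# Series.unique() keeps first-appearance order and does not sort.
in_order = s.unique().tolist()

# np.unique() returns the unique values in sorted order.
sorted_vals = np.unique(s.to_numpy()).tolist()

print(in_order)     # [3, 1, 2]
print(sorted_vals)  # [1, 2, 3]
```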
We can get unique rows in a pandas DataFrame using the drop_duplicates() function. By default it keeps the first occurrence of each row and drops the later duplicates, comparing all columns unless a subset is given. If you want to get the duplicate rows themselves, you can use DataFrame.duplicated(), which returns a boolean mask marking them.
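To look at the duplicated rows rather than drop them, one option is to index with the mask from DataFrame.duplicated(). This is a small sketch on a made-up frame; the uniqueid column name comes from the question, while the value column and the data are assumptions:

```python
import pandas as pd

# Hypothetical frame: uniqueid 2 appears twice.
df = pd.DataFrame({'uniqueid': [1, 2, 2, 3],
                   'value': ['w', 'x', 'y', 'z']})

# keep=False marks every member of a duplicated group as True,
# not just the second and later occurrences.
mask = df.duplicated(subset='uniqueid', keep=False)

# Rows whose uniqueid occurs more than once.
dupes = df[mask]
print(dupes)
```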
There is also the drop_duplicates() method for any data frame (see the pandas documentation). You can pass the specific columns to check for duplicates as the subset argument.
df.drop_duplicates(subset='uniqueid', inplace=True)
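Which of the duplicated rows survives depends on the keep argument. The sketch below, on a made-up frame (only the uniqueid column name is from the question), compares the three settings and returns new frames instead of using inplace=True:

```python
import pandas as pd

# Hypothetical frame: uniqueid 2 appears twice.
df = pd.DataFrame({'uniqueid': [1, 2, 2, 3],
                   'value': ['w', 'x', 'y', 'z']})

# Default keep='first': the first row of each duplicated group survives.
first = df.drop_duplicates(subset='uniqueid')

# keep='last': the last row of each duplicated group survives.
last = df.drop_duplicates(subset='uniqueid', keep='last')

# keep=False: every row belonging to a duplicated group is dropped.
none = df.drop_duplicates(subset='uniqueid', keep=False)

print(first.index.tolist())  # [0, 1, 3]
print(last.index.tolist())   # [0, 2, 3]
print(none.index.tolist())   # [0, 3]
```

Without inplace=True the original df is left unchanged, so the result has to be assigned to a name.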
Use the duplicated method. Since we only care if uniqueid (A in my example) is duplicated, select that and call duplicated on that series. Then use the ~ to flip the bools.
In [89]: import pandas as pd

In [90]: df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

In [91]: df
Out[91]:
   A  B
0  a  1
1  b  2
2  b  3
3  c  4

In [92]: df['A'].duplicated()
Out[92]:
0    False
1    False
2     True
3    False
Name: A, dtype: bool

In [93]: df.loc[~df['A'].duplicated()]
Out[93]:
   A  B
0  a  1
1  b  2
3  c  4
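As a quick sanity check, the boolean-mask approach and drop_duplicates(subset=...) should select the same rows when both keep the first occurrence. A minimal sketch reusing the example frame from the session above:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

# Mask approach: keep rows whose 'A' value has not been seen before.
via_mask = df.loc[~df['A'].duplicated()]

# Method approach: drop later rows that repeat an 'A' value.
via_method = df.drop_duplicates(subset='A')

# Both keep the first occurrence, so the frames should be identical.
same = via_mask.equals(via_method)
print(same)
```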