Say my dataframe is:
import pandas

df = pandas.DataFrame([[[1,0]],[[0,0]],[[1,0]]])
which yields:
0
0 [1, 0]
1 [0, 0]
2 [1, 0]
I want to drop duplicates and keep only the elements [1, 0] and [0, 0]. If I write:
df.drop_duplicates()
I get the following error: TypeError: unhashable type: 'list'
How can I call drop_duplicates()?
More generally:
df = pandas.DataFrame([[[1,0],"a"],[[0,0],"b"],[[1,0],"c"]], columns=["list", "letter"])
How can I call df["list"].drop_duplicates(), so that drop_duplicates is applied to a Series rather than a DataFrame?
You can use DataFrame.drop_duplicates() without any arguments to drop rows that have the same values in all columns, keeping the first occurrence of each.
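For instance, a minimal illustration of that default behaviour on hashable scalar values (a made-up DataFrame, not the list case from the question):

import pandas

df_scalar = pandas.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
# Rows 0 and 1 are identical in every column, so only the first one is kept.
print(df_scalar.drop_duplicates())
#    a  b
# 0  1  x
# 2  2  y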
To drop only consecutive duplicates with Pandas, we can use shift(): keep the rows where the value is not equal to the next one, i.e. build a mask like a.shift(-1) != a.
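A sketch of that consecutive-duplicates idea on a made-up Series (unrelated to the list problem above):

import pandas

s = pandas.Series([1, 1, 2, 2, 1])
# s.shift(-1) != s is True on the last row of each consecutive run (and on the
# final row, since comparing with the NaN introduced by shift is never equal),
# so this keeps exactly one row per run.
print(s[s.shift(-1) != s])
# 1    1
# 3    2
# 4    1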
From the pandas documentation: drop_duplicates() returns a DataFrame with duplicate rows removed; considering certain columns is optional, and indexes, including time indexes, are ignored.
I tried the other answers, but they didn't solve what I needed (a large DataFrame with multiple list columns).
I solved it this way:
df = df[~df.astype(str).duplicated()]
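Applied to the more general DataFrame from the question, the same trick can be limited to the "list" column; a minimal sketch (mask and df_unique are just illustrative names):

import pandas

df = pandas.DataFrame([[[1,0],"a"],[[0,0],"b"],[[1,0],"c"]], columns=["list", "letter"])
# Compare the lists through their string representation, which is hashable,
# and keep only the first occurrence of each.
mask = df["list"].astype(str).duplicated()
df_unique = df[~mask]
# df_unique keeps the [1, 0]/"a" and [0, 0]/"b" rows.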
You can use the numpy.unique() function:
>>> import numpy as np
>>> import pandas
>>> df = pandas.DataFrame([[[1,0]],[[0,0]],[[1,0]]])
>>> pandas.DataFrame(np.unique(df), columns=df.columns)
0
0 [0, 0]
1 [1, 0]
If you want to preserve the original order, check out: numpy.unique with order preserved
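A minimal sketch of that order-preserving variant, using numpy.unique with return_index=True (first_idx and ordered_unique are just illustrative names):

import numpy as np
import pandas

df = pandas.DataFrame([[[1,0]],[[0,0]],[[1,0]]])
values = df[0].to_numpy()
# return_index gives the position of the first occurrence of each unique value;
# sorting those positions restores the original order, i.e. [1, 0] before [0, 0].
_, first_idx = np.unique(values, return_index=True)
ordered_unique = values[np.sort(first_idx)]
print(list(ordered_unique))  # [[1, 0], [0, 0]]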