
Pandas drop duplicates on elements made of lists

Say my dataframe is:

import pandas

df = pandas.DataFrame([[[1,0]],[[0,0]],[[1,0]]])

which yields:

        0
0  [1, 0]
1  [0, 0]
2  [1, 0]

I want to drop duplicates and keep only the elements [1,0] and [0,0]. If I write:

df.drop_duplicates()

I get the following error: TypeError: unhashable type: 'list'

How can I call drop_duplicates()?

More generally:

df = pandas.DataFrame([[[1,0],"a"],[[0,0],"b"],[[1,0],"c"]], columns=["list", "letter"])

Here I want to call df["list"].drop_duplicates(), so that drop_duplicates applies to a Series rather than a DataFrame. How can I do that?

user asked May 18 '18 19:05




2 Answers

I tried the other answers but they didn't solve what I needed (large dataframe with multiple list columns).

I solved it this way:

df = df[~df.astype(str).duplicated()]
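
The one-liner works because casting every cell to a string makes it hashable, which is what duplicated() needs. As a rough sketch (not the answerer's own code), here is the same idea applied to the two-column frame from the question, alongside a per-column variant that maps each list to a tuple. The variable names whole, mask and by_list are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[[1, 0], "a"], [[0, 0], "b"], [[1, 0], "c"]],
    columns=["list", "letter"],
)

# Whole-frame deduplication: compare the string form of every cell.
# All three rows differ in "letter", so nothing is dropped here.
whole = df[~df.astype(str).duplicated()]

# Single-column deduplication: tuples are hashable, so duplicated()
# can be called on the mapped Series; the original lists are kept.
mask = df["list"].map(tuple).duplicated()
by_list = df[~mask]

print(whole)
print(by_list)
```

The tuple variant avoids relying on string representations, but it assumes every cell in the column really is a list (or another iterable).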
Andreas answered Sep 18 '22 16:09


You can use the numpy.unique() function:

>>> import numpy as np
>>> import pandas
>>> df = pandas.DataFrame([[[1,0]],[[0,0]],[[1,0]]])
>>> pandas.DataFrame(np.unique(df), columns=df.columns)
        0
0  [0, 0]
1  [1, 0]

Note that numpy.unique() returns the values sorted, so the original row order is lost. If you want to preserve the order, check out: numpy.unique with order preserved
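
One possible way to keep the order of first appearance, sketched here under the assumption that every list in the column has the same length, is to ask numpy.unique() for the index of each first occurrence and sort those indices. The names rows, first_idx and ordered are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[[1, 0]], [[0, 0]], [[1, 0]]])

# Build a plain 2-D integer array so np.unique can compare whole rows.
# Assumes all lists in column 0 have equal length.
rows = np.array(df[0].tolist())

# return_index gives the position of the first occurrence of each unique row.
_, first_idx = np.unique(rows, axis=0, return_index=True)

# Sorting those positions restores the original order of appearance.
ordered = df.iloc[np.sort(first_idx)]
print(ordered)
```

This keeps the original index labels of the surviving rows; call reset_index(drop=True) on the result if a fresh index is wanted.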

Mazdak answered Sep 18 '22 16:09