Pandas drop duplicates on elements made of lists

Say my dataframe is:

import pandas
df = pandas.DataFrame([[[1,0]],[[0,0]],[[1,0]]])

which yields:

        0
0  [1, 0]
1  [0, 0]
2  [1, 0]

I want to drop duplicates and keep only the elements [1,0] and [0,0]. If I write:

df.drop_duplicates()

I get the following error: TypeError: unhashable type: 'list'

How can I call drop_duplicates()?

More generally, say I have:

df = pandas.DataFrame([[[1,0],"a"],[[0,0],"b"],[[1,0],"c"]], columns=["list", "letter"])

How can I call df["list"].drop_duplicates(), so that drop_duplicates applies to a Series rather than a DataFrame?

I tried the other answers but they didn't solve what I needed (large dataframe with multiple list columns).

I solved it this way:

df = df[~df.astype(str).duplicated()]
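
As a rough sketch of how this one-liner behaves on the two-column frame from the question (the variable names `whole_rows`, `by_list` and `unique_lists` are made up for illustration), the same string-based mask can be built from the whole frame or from a single column:

```python
import pandas as pd

df = pd.DataFrame(
    [[[1, 0], "a"], [[0, 0], "b"], [[1, 0], "c"]],
    columns=["list", "letter"],
)

# Whole-row comparison: every cell is converted to a string first, so the
# rows still differ through the "letter" column and nothing is dropped.
whole_rows = df[~df.astype(str).duplicated()]

# Comparison on the list column only: stringify that Series, then use the
# resulting boolean mask to filter the original frame.
by_list = df[~df["list"].astype(str).duplicated()]

# Series-only result; the surviving values are still the original lists.
unique_lists = df.loc[~df["list"].astype(str).duplicated(), "list"]

print(by_list)
print(unique_lists.tolist())
```

Because the comparison is done on the string form, lists that are equal in Python but print differently, such as [1, 0] and [1.0, 0], are treated as distinct.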

You can use the numpy.unique() function:

>>> import numpy as np
>>> import pandas
>>> df = pandas.DataFrame([[[1,0]],[[0,0]],[[1,0]]])
>>> pandas.DataFrame(np.unique(df), columns=df.columns)
        0
0  [0, 0]
1  [1, 0]

Note that numpy.unique() returns the unique elements sorted, so the original row order is lost. If you want to preserve the order, check out: numpy.unique with order preserved
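
One possible way to keep the original order, sketched here under the assumption that every list in the column has the same length (so the column can be stacked into a regular 2-D array), is to ask numpy.unique() for the index of each first occurrence and sort those indices. The names `rows`, `first_idx` and `deduped` are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[[1, 0]], [[0, 0]], [[1, 0]]])

# Assumption: all lists in column 0 have equal length, so they stack into
# a regular 2-D integer array with one list per row.
rows = np.array(df[0].tolist())

# With axis=0, unique works on whole rows; return_index yields the position
# of the first occurrence of each unique row.
_, first_idx = np.unique(rows, axis=0, return_index=True)

# Sorting those positions restores the order in which rows first appeared.
deduped = df.iloc[np.sort(first_idx)]

print(deduped)
```

This would not work as written for ragged lists of different lengths; in that case the string-based mask from the other answer is the simpler route.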

Tags: python, python-3.x, pandas

2 Answers

Andreas

Mazdak
