Pandas drop_duplicates method not working on dataframe containing lists

Tags: python, list, pandas, duplicates, drop-duplicates

I am trying to use the drop_duplicates method on my dataframe, but I am getting an error:

error: TypeError: unhashable type: 'list'

The code I am using:

df = db.drop_duplicates()

My dataframe (db) is huge and contains strings, floats, dates, NaNs, booleans, integers... Any help is appreciated.


asked May 08 '17 19:05

SLack A

4 Answers

drop_duplicates won't work with lists in your dataframe, as the error message implies: lists are unhashable. However, you can drop duplicates on a copy of the dataframe cast to str, then select the surviving rows from the original df using the index of the result.

Setup

df = pd.DataFrame({'Keyword': {0: 'apply', 1: 'apply', 2: 'apply', 3: 'terms', 4: 'terms'},
 'X': {0: [1, 2], 1: [1, 2], 2: 'xy', 3: 'xx', 4: 'yy'},
 'Y': {0: 'yy', 1: 'yy', 2: 'yx', 3: 'ix', 4: 'xi'}})

# Dropping duplicates directly raises the same error
df.drop_duplicates()
Traceback (most recent call last):
...
TypeError: unhashable type: 'list'

Solution

# Convert the df to str type, drop duplicates, then select those rows from the original df.

df.loc[df.astype(str).drop_duplicates().index]
Out[205]: 
  Keyword       X   Y
0   apply  [1, 2]  yy
2   apply      xy  yx
3   terms      xx  ix
4   terms      yy  xi

# The list elements are still lists in the final result.
df.loc[df.astype(str).drop_duplicates().index].loc[0,'X']
Out[207]: [1, 2]

Edit: replaced iloc with loc. In this particular case both work because the index labels match the positional index, but that does not hold in general.


answered Oct 16 '22 13:10

Allen

@Allen's answer is great, but it has one small problem:

df.iloc[df.astype(str).drop_duplicates().index]

It should be loc, not iloc. Look at this example, where the index labels do not match the row positions:

a = pd.DataFrame([['a',18],['b',11],['a',18]],index=[4,6,8])
a
Out[52]: 
   0   1
4  a  18
6  b  11
8  a  18

a.iloc[a.astype(str).drop_duplicates().index]
Out[53]:
...
IndexError: positional indexers are out-of-bounds

a.loc[a.astype(str).drop_duplicates().index]
Out[54]: 
   0   1
4  a  18
6  b  11
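A possible variation, sketched here as an assumption rather than taken from the answers above: filtering with the boolean mask from duplicated() avoids the loc/iloc question entirely, because no labels or positions are looked up. It also behaves sensibly when index labels repeat, a case where selecting by label returns every row that shares a label. The frame `b` below is a made-up example.

```python
import pandas as pd

# Same frame as in the example above: labels 4, 6, 8 instead of 0, 1, 2.
a = pd.DataFrame([['a', 18], ['b', 11], ['a', 18]], index=[4, 6, 8])

# duplicated() flags every repeat of an earlier row; inverting the mask
# keeps the first occurrences without any label or position lookup.
deduped = a[~a.astype(str).duplicated()]
print(deduped)

# Hypothetical frame with a repeated index label and list cells.
b = pd.DataFrame([['a', [1]], ['a', [1]], ['b', [2]]], index=[0, 0, 1])

# Label-based selection returns both rows labelled 0, so nothing is dropped.
by_label = b.loc[b.astype(str).drop_duplicates().index]

# The mask is aligned row by row, so the repeated row is removed.
by_mask = b[~b.astype(str).duplicated()]
print(len(by_label), len(by_mask))
```

If the index is known to be unique, the loc version and the mask version should select the same rows.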

answered Oct 16 '22 13:10

Hsgao

I also just want to mention (in case someone else is as stupid as I was) that you will get the same error if you mistakenly pass a list of lists as the 'subset' argument of the drop_duplicates function.

Turns out I spent hours looking for a list that wasn't in my dataframe, all because I put one too many brackets in my parameters.
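To make the bracket mistake concrete, here is a minimal sketch with a made-up two-column frame. The exact exception raised by the wrong call may depend on the pandas version, so the code only reports it instead of assuming a particular type.

```python
import pandas as pd

# Hypothetical frame with no lists in it at all.
df = pd.DataFrame({'Keyword': ['apply', 'apply', 'terms'],
                   'Y': ['yy', 'yy', 'ix']})

# Correct: subset is a flat list of column labels.
deduped = df.drop_duplicates(subset=['Keyword', 'Y'])
print(deduped)

# Mistake: one pair of brackets too many, so subset contains a list
# instead of column labels. The error is caught and printed only.
try:
    df.drop_duplicates(subset=[['Keyword', 'Y']])
except Exception as exc:
    print(type(exc).__name__, exc)
```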

answered Oct 16 '22 12:10

Peter Erichsen

Overview: you can see which rows are duplicated by flagging every occurrence with duplicated(keep=False).

Method 1:

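# Work on a copy so the lists in the original df stay intact.
df2 = df.copy()

# Rows 0 and 1 of column 'X' hold lists; join each into a hashable string.
mylist = df2.iloc[0, 1]
df2.iloc[0, 1] = ' '.join(map(str, mylist))

mylist = df2.iloc[1, 1]
df2.iloc[1, 1] = ' '.join(map(str, mylist))

# keep=False marks every member of a duplicate group, not only the repeats.
duplicates = df2.duplicated(keep=False)
print(df2[duplicates])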

Method 2:

print(df.astype(str).duplicated(keep=False))
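Method 1 converts two known cells by hand. A generalized sketch, which is an assumption on my part and not part of the original answer, converts every list cell to a tuple so that it becomes hashable. It assumes the lists are flat (a list nested inside a list would still be unhashable), and `to_hashable` is a hypothetical helper name. Unlike astype(str), this keeps values such as 1 and '1' distinct.

```python
import pandas as pd

# Same data as the setup in the first answer.
df = pd.DataFrame({'Keyword': ['apply', 'apply', 'apply', 'terms', 'terms'],
                   'X': [[1, 2], [1, 2], 'xy', 'xx', 'yy'],
                   'Y': ['yy', 'yy', 'yx', 'ix', 'xi']})


def to_hashable(value):
    """Return a tuple for list cells; leave every other value unchanged."""
    return tuple(value) if isinstance(value, list) else value


# Series.map applies the helper to each cell; apply runs it per column.
hashable = df.apply(lambda col: col.map(to_hashable))

# Flag every member of a duplicate group, then show the original rows.
duplicates = hashable.duplicated(keep=False)
print(df[duplicates])
```

The mask is computed on the converted copy but applied to the original df, so the displayed rows still contain real lists.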

answered Oct 16 '22 13:10

Golden Lion
