Pandas Drop Duplicates Series Hashing Error




I have created a pandas DataFrame, but when I drop duplicate rows I get this error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

This happens when I run:

print(type(data)) # <class 'pandas.core.frame.DataFrame'> check that it's not a series
data.drop_duplicates(subset=['statement'], inplace=True)

Info returns this:

> <class 'pandas.core.frame.DataFrame'>
> Int64Index: 39671 entries, 0 to 39670
> Data columns (total 4 columns):
> statement          39671 non-null object
> topic_direction    39671 non-null object
> topic              39671 non-null object
> direction          39671 non-null object
> dtypes: object(4)
> memory usage: 1.5+ MB
> None
1 Answer

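The individual elements in your 'statement' column are pandas.Series objects, which is a clear sign that something went astray when the DataFrame was built. drop_duplicates has to hash the values in the subset columns, and a Series is mutable and therefore unhashable, hence the TypeError. You can validate my claim by running data['statement'].apply(type); you should see a column of <class 'pandas.core.series.Series'> or something similar.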

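If you're stuck with the situation, try converting each element of the 'statement' column to a tuple with data['statement'].apply(tuple) and assigning the result back to the column.

This forces each element of the 'statement' column to be a tuple, which is hashable. Then you can find the duplicate rows and filter them out.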
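A minimal sketch of the tuple workaround, again on made-up toy data (the names cells, statement and deduped are illustrative, not from the question). It assumes the values inside each inner Series are themselves hashable, and note that converting to a tuple discards the inner Series' index, so two cells with the same values but different indexes are treated as duplicates.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every cell of 'statement' holds a whole Series.
cells = [pd.Series(["a", "b"]), pd.Series(["a", "b"]), pd.Series(["c"])]
statement = np.empty(len(cells), dtype=object)
for i, cell in enumerate(cells):
    statement[i] = cell  # store the Series object itself in the cell

data = pd.DataFrame({"topic": ["x", "x", "y"]})
data["statement"] = statement

# Turn each Series cell into a tuple of its values; tuples are hashable.
data["statement"] = data["statement"].apply(tuple)

# Duplicate detection now works on the tuple values.
deduped = data.drop_duplicates(subset=["statement"])
print(deduped)
```

In the longer run it is usually better to find out why Series objects ended up inside the cells and fix that at the point where the DataFrame is built.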

