I have created a pandas DataFrame, but when dropping duplicate rows I get the error:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
This happens when I run:
print(type(data)) # <class 'pandas.core.frame.DataFrame'> check that it's not a series
data.drop_duplicates(subset=['statement'], inplace=True)
print(data.info())
Info returns this:
> class 'pandas.core.frame.DataFrame'
> Int64Index: 39671 entries, 0 to 39670
> Data columns (total 4 columns):
> statement 39671 non-null object
> topic_direction 39671 non-null object
> topic 39671 non-null object
> direction 39671 non-null object
> dtypes: object(4)
> memory usage: 1.5+ MB
> None
For reference, the drop_duplicates() method removes duplicate rows. Its keep parameter accepts {'first', 'last', False}, with 'first' as the default: with 'first', all duplicate rows except the first are deleted; with 'last', all except the last are deleted; with False, every row that has a duplicate is deleted. Use the subset parameter if only some columns should be considered when looking for duplicates. By default the surviving rows keep their original index; pass ignore_index=True to drop_duplicates() to reset the index of the result.
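A quick sketch of these parameters on a toy frame (the data here is hypothetical, just to show the behavior):

```python
import pandas as pd

# Toy frame with a duplicated 'statement' value
df = pd.DataFrame({
    "statement": ["a", "b", "a"],
    "topic": ["x", "y", "z"],
})

# keep='first' (the default) keeps row 0 and drops row 2
first = df.drop_duplicates(subset=["statement"], keep="first")

# keep=False drops every row whose 'statement' is duplicated
none = df.drop_duplicates(subset=["statement"], keep=False)

# ignore_index=True renumbers the surviving rows 0..n-1
reset = df.drop_duplicates(subset=["statement"], ignore_index=True)
```

Note that ignore_index was added in pandas 1.0; on older versions use .reset_index(drop=True) instead.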
The individual elements in your 'statement' column are pandas.Series objects. That is a clear sign that things have gone astray. You can verify this by running data['statement'].apply(type); you should see <class 'pandas.core.series.Series'> (or something similar) for each row.
If you're stuck with the situation, try:
df[~df['statement'].apply(tuple).duplicated()]
This forces each element of the 'statement' column to be a tuple, which is hashable. Then you can find the duplicate rows and filter them out.
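The workaround in action, on the same kind of hypothetical frame with Series-valued cells:

```python
import pandas as pd

# Hypothetical frame where each 'statement' cell is an unhashable Series
df = pd.DataFrame({
    "statement": [pd.Series(["hello"]), pd.Series(["hello"]), pd.Series(["world"])],
    "topic": ["a", "b", "c"],
})

# Convert each cell to a hashable tuple, mark repeats with duplicated(),
# and keep only the rows that are not repeats
deduped = df[~df["statement"].apply(tuple).duplicated()]
print(deduped["topic"].tolist())  # → ['a', 'c']
```

This keeps the first occurrence of each value, mirroring keep='first'. The real fix, though, is to track down why the cells contain Series in the first place.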