I have created a pandas DataFrame, but when dropping duplicate rows I get the error:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
This happens when I run:
print(type(data)) # <class 'pandas.core.frame.DataFrame'> check that it's not a series
data.drop_duplicates(subset=['statement'], inplace=True)
print(data.info())
Info returns this:
> class 'pandas.core.frame.DataFrame'
> Int64Index: 39671 entries, 0 to 39670
> Data columns (total 4 columns):
> statement 39671 non-null object
> topic_direction 39671 non-null object
> topic 39671 non-null object
> direction 39671 non-null object
> dtypes: object(4)
> memory usage: 1.5+ MB
> None
For reference, the drop_duplicates() method removes duplicate rows. Its keep parameter accepts {'first', 'last', False}, with 'first' as the default: with 'first', all duplicate rows except the first are deleted; with 'last', all except the last are deleted; with False, every row that has a duplicate is deleted. Use the subset parameter if only some columns should be considered when looking for duplicates. By default the surviving rows keep their original index; pass ignore_index=True to drop_duplicates() to reset the index of the result.
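A quick sketch of these parameters on a toy frame (the data here is hypothetical, just to show the behavior):

```python
import pandas as pd

# Toy frame with a duplicated 'statement' value
df = pd.DataFrame({
    "statement": ["a", "b", "a"],
    "topic": ["x", "y", "z"],
})

# keep='first' (the default) keeps row 0 and drops row 2
first = df.drop_duplicates(subset=["statement"], keep="first")

# keep=False drops every row whose 'statement' is duplicated
none = df.drop_duplicates(subset=["statement"], keep=False)

# ignore_index=True renumbers the surviving rows 0..n-1
reset = df.drop_duplicates(subset=["statement"], ignore_index=True)
```

Note that ignore_index was added in pandas 1.0; on older versions use .reset_index(drop=True) instead.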
The individual elements in your 'statement' column are pandas.Series objects. That is a clear sign that things have gone astray. You can verify this by running data['statement'].apply(type); you should see <class 'pandas.core.series.Series'> (or something similar) for each row.
If you're stuck with the situation, try:
df[~df['statement'].apply(tuple).duplicated()]
This forces each element of the 'statement' column to be a tuple, which is hashable. Then you can find the duplicate rows and filter them out.
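The workaround in action, on the same kind of hypothetical frame with Series-valued cells:

```python
import pandas as pd

# Hypothetical frame where each 'statement' cell is an unhashable Series
df = pd.DataFrame({
    "statement": [pd.Series(["hello"]), pd.Series(["hello"]), pd.Series(["world"])],
    "topic": ["a", "b", "c"],
})

# Convert each cell to a hashable tuple, mark repeats with duplicated(),
# and keep only the rows that are not repeats
deduped = df[~df["statement"].apply(tuple).duplicated()]
print(deduped["topic"].tolist())  # → ['a', 'c']
```

This keeps the first occurrence of each value, mirroring keep='first'. The real fix, though, is to track down why the cells contain Series in the first place.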