
Pandas DataFrame.nunique(): ("unhashable type: 'list'", 'occurred at index columns')

Tags:

python

pandas

I want to apply the .nunique() method to an entire DataFrame.

As the screenshot shows, it contains 130 features. [Screenshot: shape and columns of the DataFrame] The goal is to get the number of distinct values per feature. I use the following code, which worked on another DataFrame.

import pandas as pd

def nbDifferentValues(data):
    # Number of distinct values in each column
    total = data.nunique()
    total = total.sort_values(ascending=False)
    # Distinct values as a percentage of the number of rows
    percent = total / data.shape[0] * 100
    return pd.concat([total, percent], axis=1, keys=['Total', 'Pourcentage'])

diffValues = nbDifferentValues(dataFrame)

The code fails at the first line, and I get the following error, which I don't know how to solve: ("unhashable type: 'list'", 'occurred at index columns'). [Screenshot: trace of the error]

Thomas Coquereau asked Jan 28 '23 23:01


1 Answer

You probably have a column whose values are lists.

Lists in Python are mutable, so they are unhashable, and nunique() needs hashable values to count the distinct ones.
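
You can check hashability directly in plain Python, without pandas. This is only a small sketch; the helper name is_hashable is made up for illustration and is not part of pandas or the standard library.

```python
def is_hashable(value):
    """Return True if Python can hash the value, False otherwise."""
    try:
        hash(value)
    except TypeError:
        # hash() raises TypeError for unhashable objects such as lists
        return False
    return True

list_ok = is_hashable([1, 2])    # a list is mutable
tuple_ok = is_hashable((1, 2))   # a tuple of ints is immutable

print(list_ok, tuple_ok)
```

A tuple is only hashable if everything inside it is hashable too, so a tuple that contains a list would still fail.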

import pandas as pd

df = pd.DataFrame([
    (0, [1,2]),
    (1, [2,3])    
])

# Raises TypeError: unhashable type: 'list'
df.nunique()
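
With 130 columns it may not be obvious which one holds the lists. One possible way to find out is to test each column for list-like values. This is a rough sketch: the function name columns_with_unhashable_values and the example DataFrame are hypothetical, and it only looks for list, dict and set values.

```python
import pandas as pd

def columns_with_unhashable_values(data):
    """Return names of columns with at least one list, dict or set value."""
    unhashable_types = (list, dict, set)
    return [
        col for col in data.columns
        # True if any cell in this column is one of the unhashable types
        if data[col].map(lambda v: isinstance(v, unhashable_types)).any()
    ]

# Hypothetical data: "tags" holds lists, "id" holds plain integers
example = pd.DataFrame({"id": [0, 1], "tags": [[1, 2], [2, 3]]})
bad_columns = columns_with_unhashable_values(example)
print(bad_columns)
```

This assumes column names are unique; with duplicated names, data[col] would return a DataFrame instead of a Series.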

SOLUTION: Don't store mutable structures (like lists) in your DataFrame; use tuples instead:

df = pd.DataFrame([
    (0, (1,2)),
    (1, (2,3))    
])

df.nunique()

#  0    2
#  1    2
#  dtype: int64
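
If the DataFrame already exists and you can't change how it is built, you could convert the list cells to tuples before calling nunique(). This is a sketch under assumptions: lists_to_tuples is a hypothetical helper, the example data is invented, and nested lists inside a list are not handled.

```python
import pandas as pd

def lists_to_tuples(data):
    """Return a copy where every list cell is replaced by an equivalent tuple."""
    converted = data.copy()
    for col in converted.columns:
        # Leave non-list values untouched
        converted[col] = converted[col].map(
            lambda v: tuple(v) if isinstance(v, list) else v
        )
    return converted

# Hypothetical data: the first and third rows hold the same list
example = pd.DataFrame({"id": [0, 1, 2], "tags": [[1, 2], [2, 3], [1, 2]]})
counts = lists_to_tuples(example).nunique()
print(counts)
```

The original DataFrame is left unchanged, so you can keep the lists for other processing and use the converted copy only for counting.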

raul ferreira answered Jan 30 '23 13:01