Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

df ['X'].unique() and TypeError: unhashable type: 'numpy.ndarray'

all,

I have a column in a dataframe that looks like this:

allHoldingsFund['BrokerMixed']
Out[419]: 
78         ML
81       CITI
92         ML
173      CITI
235        ML
262        ML
264        ML
25617      GS
25621    CITI
25644    CITI
25723      GS
25778    CITI
25786    CITI
25793      GS
25797    CITI
Name: BrokerMixed, Length: 2554, dtype: object

Although the column is an object. I am not able to group by that column or even extract the unique values of that column. For example when I do:

allHoldingsFund['BrokerMixed'].unique()

I get an error

     uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'

I also get an error when I do group by.

Any help is welcome. Thank you

like image 264
SBad Avatar asked Jun 21 '26 23:06

SBad


1 Answers

Looks like you have a NumPy array in your series. But you can't hash NumPy arrays and pd.Series.unique, like set, relies on hashing.

If you can't ensure your series data only consists of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:

s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()

print(res)

array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)

Of course, this means your data type information is lost in the result, but at least you get to see your "unique" NumPy arrays, provided they are 1-dimensional.

like image 179
jpp Avatar answered Jun 23 '26 12:06

jpp