df['X'].unique() and TypeError: unhashable type: 'numpy.ndarray'

Question

all,

I have a column in a dataframe that looks like this:

allHoldingsFund['BrokerMixed']
Out[419]: 
78         ML
81       CITI
92         ML
173      CITI
235        ML
262        ML
264        ML
25617      GS
25621    CITI
25644    CITI
25723      GS
25778    CITI
25786    CITI
25793      GS
25797    CITI
Name: BrokerMixed, Length: 2554, dtype: object

Although the column has dtype object, I am not able to group by it or even extract its unique values. For example, when I do:

allHoldingsFund['BrokerMixed'].unique()

I get an error:

     uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'

I also get an error when I group by this column.

Any help is welcome. Thank you

jpp · Accepted Answer

It looks like at least one element of your series is a NumPy array. NumPy arrays are not hashable, and pd.Series.unique, like set, relies on hashing its values.
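
One way to check this is to scan the column for values that cannot be hashed. The following is a minimal sketch, not part of the original answer: the small series stands in for the real column, and the helper name is_hashable is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the problem column: mostly strings, one stray array.
s = pd.Series(["ML", "CITI", np.array(["GS", "ML"]), "ML"], index=[78, 81, 92, 173])

def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

# Boolean mask marking the rows whose value cannot be hashed.
bad_mask = ~s.map(is_hashable)

# Index labels and Python types of the offending entries.
bad_labels = s[bad_mask].index.tolist()
bad_types = s[bad_mask].map(type).tolist()
print(bad_labels, bad_types)
```

Applied to the real column, the same mask shows which rows hold something other than a plain string, which may be enough to fix the data at its source.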

If you can't ensure your series data only consists of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:

import numpy as np
import pandas as pd

s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()

print(repr(res))

array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)

Of course, the arrays come back as tuples, so their original data type is lost in the result, but at least you get to see your "unique" NumPy arrays, provided they are 1-dimensional.
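
The 1-dimensional restriction exists because calling tuple on a 2-D array yields a tuple of 1-D arrays, which is still unhashable. If multi-dimensional arrays are possible, a recursive conversion to nested tuples is one option. This is a sketch under that assumption rather than part of the original answer; the helper name to_hashable and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def to_hashable(x):
    """Recursively convert arrays and lists to nested tuples; leave other values as-is."""
    if isinstance(x, np.ndarray):
        x = x.tolist()  # nested Python lists (or a plain scalar for a 0-d array)
    if isinstance(x, list):
        return tuple(to_hashable(item) for item in x)
    return x

# Hypothetical sample: a 2-D array that appears twice, a 1-D array, and strings.
s = pd.Series([
    "test",
    np.array([[1, 2], [3, 4]]),
    np.array([1, 2, 3]),
    np.array([[1, 2], [3, 4]]),
    "test",
])

# Every value is hashable after conversion, so unique() no longer raises.
res = s.map(to_hashable).unique()
print(res)
```

The same converted series can be assigned back to the dataframe column before grouping, since group-by also needs hashable keys.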

Tags:

python

pandas

group-by

SBad
