all,
I have a column in a dataframe that looks like this:
allHoldingsFund['BrokerMixed']
Out[419]:
78 ML
81 CITI
92 ML
173 CITI
235 ML
262 ML
264 ML
25617 GS
25621 CITI
25644 CITI
25723 GS
25778 CITI
25786 CITI
25793 GS
25797 CITI
Name: BrokerMixed, Length: 2554, dtype: object
Although the column has dtype object, I am not able to group by it or even extract its unique values. For example, when I do:
allHoldingsFund['BrokerMixed'].unique()
I get an error:
uniques = table.unique(values)
File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'
I also get an error when I do group by.
Any help is welcome. Thank you
It looks like at least one element of your series is a NumPy array. NumPy arrays are not hashable, and pd.Series.unique, like set, relies on hashing.
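To confirm that diagnosis, you can check which Python types are actually stored in the column. The snippet below is only a sketch: the series `s` is a made-up stand-in for allHoldingsFund['BrokerMixed'], and the names `type_counts`, `is_array` and `bad_labels` are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the problem column: mostly strings, one stray array.
s = pd.Series(["ML", "CITI", np.array(["GS", "CITI"]), "ML"], name="BrokerMixed")

# Count how many elements of each Python type the object column holds.
type_counts = s.map(type).value_counts()
print(type_counts)

# Boolean mask of the rows that hold a NumPy array, and their index labels.
is_array = s.map(lambda x: isinstance(x, np.ndarray))
bad_labels = s.index[is_array.to_numpy()].tolist()
print(bad_labels)
```

Once you know the offending index labels, you can inspect those rows and decide whether to fix them upstream or convert them as described next.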
If you can't ensure your series data only consists of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:
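import numpy as np
import pandas as pd

s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    # arrays and lists are unhashable, so turn them into tuples; leave the rest alone
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()
print(res)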
array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)
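Of course, this means the original array type is lost in the result (arrays come back as tuples), but at least you get to see your "unique" NumPy arrays, provided they are 1-dimensional. Calling tuple() on a multi-dimensional array gives a tuple of arrays, which is still unhashable.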
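If some of your arrays might have more than one dimension, a recursive variant can get around that limitation. This is a rough sketch rather than a tested recipe for your data: `to_hashable` is a hypothetical helper name, and the sample series is invented for illustration.

```python
import numpy as np
import pandas as pd

def to_hashable(x):
    """Recursively turn arrays and lists into nested tuples; leave other values alone."""
    if isinstance(x, np.ndarray):
        x = x.tolist()  # nested lists of plain Python values (a scalar for 0-d arrays)
    if isinstance(x, list):
        return tuple(to_hashable(item) for item in x)
    return x

# Invented sample: two identical 2-D arrays mixed in with broker strings.
s = pd.Series([np.array([[1, 2], [3, 4]]), "ML", np.array([[1, 2], [3, 4]]), "ML", "GS"])

clean = s.map(to_hashable)
uniques = clean.unique().tolist()
print(uniques)
```

Every element of `clean` is hashable, so unique no longer raises. If you assign the converted values back to your column, hashing-based operations on it should stop failing for the same reason.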