I am working with a pd.Series where each entry is a list. I would like to find the mode of the series, that is, the most common list in this series. I have tried using both pandas.Series.value_counts and pandas.Series.mode. However, both of these approaches lead to the following exception being raised:
TypeError: unhashable type: 'list'
Here is a simple example of such a series:
pd.Series([[1,2,3], [4,5,6], [1,2,3]])
I am looking for a function that will return [1,2,3].
You need to convert each list to a tuple, then use mode:
pd.Series([[1,2,3], [4,5,6], [1,2,3]]).apply(tuple).mode().apply(list)
Out[192]:
0 [1, 2, 3]
dtype: object
A slight improvement:
list(pd.Series([[1,2,3], [4,5,6], [1,2,3]]).apply(tuple).mode().iloc[0])
Out[210]: [1, 2, 3]
Since calling apply twice is ugly, you can compare string representations instead:
s=pd.Series([[1,2,3], [4,5,6], [1,2,3]])
s[s.astype(str)==s.astype(str).mode()[0]].iloc[0]
Out[205]: [1, 2, 3]
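Note that mode() returns every value tied for the highest count, so taking .iloc[0] or [0] keeps only one of them. Here is a minimal sketch, assuming you want all tied lists back; the helper name list_modes is hypothetical and not part of pandas:

```python
import pandas as pd


def list_modes(s):
    """Return every most-common list in a Series of lists (hypothetical helper)."""
    # Lists are unhashable, so convert each entry to a tuple before calling mode().
    modes = s.apply(tuple).mode()
    # mode() returns all values tied for the highest count; turn each back into a list.
    return [list(t) for t in modes]


s = pd.Series([[1, 2, 3], [4, 5, 6], [1, 2, 3]])
print(list_modes(s))  # [[1, 2, 3]]
```

If you only ever need a single answer, the one-liners above are enough; this version just makes ties visible.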
Lists are not hashable, so you will need to transform your Series of lists to a Series of tuples.
Once you do that, you can use a Counter to quickly and efficiently generate a multi-set of tuples, and then use Counter.most_common to extract the most common element (AKA, the mode).
from collections import Counter
s = pd.Series([[1, 2, 3], [4, 5, 6], [1, 2, 3]])
c = Counter(tuple(l) for l in s)
list(c.most_common(1)[0][0])
[1, 2, 3]
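Since the question asks for a function, the Counter approach can be wrapped up as follows. This is only a sketch: the name most_common_list and the empty-input check are my own assumptions, not something from pandas or the answer above.

```python
from collections import Counter


def most_common_list(lists):
    """Return the most frequent list in an iterable of lists (hypothetical helper)."""
    # Tuples are hashable, so they can serve as Counter keys.
    counts = Counter(tuple(item) for item in lists)
    if not counts:
        raise ValueError("cannot take the mode of an empty sequence")
    # most_common(1) returns [(value, count)]; ties go to the value seen first.
    value, _ = counts.most_common(1)[0]
    return list(value)


# A plain list stands in for the Series here; iterating a pd.Series
# yields its values, so the same call should work on one.
data = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
print(most_common_list(data))  # [1, 2, 3]
```

Because it only iterates over its argument, the helper does not depend on pandas at all.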