How can I get the most frequent item in a pandas series?
Consider the series s:
import pandas as pd
s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)
The returned value should be 3, since it occurs most often.
You can just use pd.Series.mode and extract the first value:
res = s.mode().iloc[0]
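One detail worth keeping in mind: Series.mode returns every value that ties for the highest count, sorted in ascending order, so .iloc[0] picks the smallest of the tied values. A minimal sketch with a made-up series (the name `tied` is hypothetical and not part of the question):

```python
import pandas as pd

# Hypothetical series in which 1 and 2 both occur twice.
tied = pd.Series([2, 2, 1, 1, 3])

# mode() returns all tied values, sorted ascending.
modes = tied.mode()

# Taking the first entry therefore yields the smallest tied value.
first = modes.iloc[0]
print(modes.tolist(), first)
```

If you need a different tie-breaking rule (for example, first occurrence), you would have to apply it yourself on top of the returned modes.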
This is not necessarily inefficient. As always, test with your own data to see what suits it best.
import numpy as np, pandas as pd
from scipy.stats.mstats import mode
from collections import Counter
np.random.seed(0)
s = pd.Series(np.random.randint(0, 100, 100000))
def jez_np(s):
    _, idx, counts = np.unique(s, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    val = s[index]
    return val
def pir(s):
    i, r = s.factorize()
    return r[np.bincount(i).argmax()]
%timeit s.mode().iloc[0] # 1.82 ms
%timeit pir(s) # 2.21 ms
%timeit s.value_counts().index[0] # 2.52 ms
%timeit mode(s).mode[0] # 5.64 ms
%timeit jez_np(s) # 8.26 ms
%timeit Counter(s).most_common(1)[0][0] # 8.27 ms
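Note that %timeit is an IPython magic, so the lines above only run in IPython or Jupyter. As a rough sketch of how a similar comparison could be done in a plain Python script, the standard-library timeit module can be used. The `candidates` dictionary and the repeat counts are illustrative choices, not part of the original benchmark, and the timings will differ by machine and library version:

```python
import timeit
from collections import Counter

import numpy as np
import pandas as pd

np.random.seed(0)
s = pd.Series(np.random.randint(0, 100, 100000))

# Illustrative subset of the approaches benchmarked above.
candidates = {
    "mode": lambda: s.mode().iloc[0],
    "value_counts": lambda: s.value_counts().index[0],
    "Counter": lambda: Counter(s).most_common(1)[0][0],
}

# Run each candidate once to capture the value it returns.
results = {name: fn() for name, fn in candidates.items()}

# Best of 3 repeats, 10 calls each, reported as seconds per call.
timings = {
    name: min(timeit.repeat(fn, number=10, repeat=3)) / 10
    for name, fn in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```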
Use value_counts and select the first value by index:
val = s.value_counts().index[0]
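Since value_counts sorts by count in descending order by default, the same result also gives you the frequency of the most common item. A small sketch using the series from the question (the name `freq` is introduced here for illustration):

```python
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# Counts per distinct value, most frequent first.
counts = s.value_counts()

val = counts.index[0]   # the most frequent value
freq = counts.iloc[0]   # how many times it occurs
print(val, freq)
```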
Or Counter.most_common:
from collections import Counter
val = Counter(s).most_common(1)[0][0]
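Or a NumPy solution:
# idx holds the position of the first occurrence of each unique value
_, idx, counts = np.unique(s, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
# index is positional; s[index] assumes the default RangeIndex, otherwise use s.iloc[index]
val = s[index]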