
Obtain mode from column in groupby [duplicate]

Tags: python, pandas

I'm trying to obtain the mode of a column in a groupby object, but I'm getting this error: incompatible index of inserted column with frame index.

This is the line that raises the error, and I'm not sure how to fix it. Any help would be appreciated.

dfBitSeq['KMeans'] = df.groupby('OnBitSeq')['KMeans'].apply(lambda x: x.mode())
John asked Jan 28 '23 17:01


2 Answers

Pandas mode returns a Series rather than a scalar, because a group can have more than one mode; mean and median return a scalar. So you just need to select the first value using x.mode().iloc[0]

dfBitSeq['KMeans'] = df.groupby('OnBitSeq')['KMeans'].apply(lambda x: x.mode().iloc[0])
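To make the difference concrete, here is a minimal sketch that assumes the small sample frame from the scipy answer in this thread (the asker's real data is not shown). The names grouped, raw, first_mode and tied are illustrative only. It compares the index of the apply result with and without .iloc[0], and shows what mode() gives back on a tie.

```python
import pandas as pd

# Sample data borrowed from the scipy answer in this thread (an assumption;
# the asker's actual frames are not shown).
df = pd.DataFrame(
    [[1, 5], [2, 3], [3, 5], [2, 4], [2, 3], [1, 4], [1, 5]],
    columns=["OnBitSeq", "KMeans"],
)
grouped = df.groupby("OnBitSeq")["KMeans"]

# Each group yields a Series of modes, so the combined result
# carries a second index level in addition to OnBitSeq.
raw = grouped.apply(lambda x: x.mode())

# Taking the first mode yields one scalar per group,
# so the result is indexed by OnBitSeq only.
first_mode = grouped.apply(lambda x: x.mode().iloc[0])

# On a tie, mode() returns every tied value, not just one.
tied = pd.Series([4, 3, 4, 3]).mode()

print(raw.index.nlevels, first_mode.index.nlevels)
print(first_mode)
print(tied.tolist())
```

The extra index level on raw is the likely reason the assignment in the question fails to align with the target frame. Note that .iloc[0] silently picks one value when a group has several modes.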
Vaishali answered Jan 31 '23 09:01


You can use scipy.stats.mode. Example below.

import pandas as pd
from scipy.stats import mode

df = pd.DataFrame([[1, 5], [2, 3], [3, 5], [2, 4], [2, 3], [1, 4], [1, 5]],
                  columns=['OnBitSeq', 'KMeans'])

#    OnBitSeq  KMeans
# 0         1       5
# 1         2       3
# 2         3       5
# 3         2       4
# 4         2       3
# 5         1       4
# 6         1       5

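# keepdims=True (SciPy 1.9+) keeps the result array-like so [0][0] still works;
# newer SciPy versions otherwise return a scalar mode for 1-D input.
modes = df.groupby('OnBitSeq')['KMeans'].apply(lambda x: mode(x, keepdims=True)[0][0]).reset_index()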

#    OnBitSeq  KMeans
# 0         1       5
# 1         2       3
# 2         3       5

If you need to add this back to the original dataframe:

df['Mode'] = df['OnBitSeq'].map(modes.set_index('OnBitSeq')['KMeans'])
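If the goal is only to attach the per-group mode to every row, a pandas-only alternative is groupby(...).transform, which skips the intermediate modes frame and the map step. This is a sketch under the same sample data, not the answerer's method; the column name Mode follows the line above.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 5], [2, 3], [3, 5], [2, 4], [2, 3], [1, 4], [1, 5]],
    columns=["OnBitSeq", "KMeans"],
)

# transform broadcasts each group's scalar back to that group's rows,
# so the result is aligned with df's own index and can be assigned directly.
df["Mode"] = df.groupby("OnBitSeq")["KMeans"].transform(lambda s: s.mode().iloc[0])

print(df)
```

Because the result shares df's index, no set_index or map call is needed. As with the other approaches, a group with tied modes contributes only its first one.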
jpp answered Jan 31 '23 09:01