I'm trying to obtain the mode of a column in a groupby object, but I'm getting this error: incompatible index of inserted column with frame index.
This is the line I'm getting this on, and I'm not sure how to fix it. Any help would be appreciated.
dfBitSeq['KMeans'] = df.groupby('OnBitSeq')['KMeans'].apply(lambda x: x.mode())
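Unlike mean and median, which return a scalar, pandas mode returns a Series, because several values can tie for the mode. Applied per group, that produces a result with a multi-level index, which does not line up with the index of the frame you are assigning to. So you just need to select the first mode using x.mode().iloc[0]: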
dfBitSeq['KMeans'] = df.groupby('OnBitSeq')['KMeans'].apply(lambda x: x.mode().iloc[0])
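To see the difference, here is a minimal sketch on made-up data (the values are assumptions for illustration, not the asker's data). It compares the result of the plain x.mode() call with the .iloc[0] version; note that .iloc[0] silently keeps only the smallest value when several values tie.

```python
import pandas as pd

# Toy data, assumed for illustration only
df = pd.DataFrame({
    "OnBitSeq": [1, 2, 3, 2, 2, 1, 1],
    "KMeans":   [5, 3, 5, 4, 3, 4, 5],
})

# Series.mode() returns a Series per group, so the combined result
# is indexed by (group key, position within the mode Series)
raw = df.groupby("OnBitSeq")["KMeans"].apply(lambda x: x.mode())

# Taking the first entry gives one scalar per group,
# so the result is indexed by the group key alone
first_mode = df.groupby("OnBitSeq")["KMeans"].apply(lambda x: x.mode().iloc[0])

print(raw.index.nlevels)
print(first_mode)
```

The second result has one row per OnBitSeq value, so it can be assigned to a frame that is indexed by OnBitSeq.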
You can use scipy.stats.mode. Example below.
import pandas as pd
from scipy.stats import mode
df = pd.DataFrame([[1, 5], [2, 3], [3, 5], [2, 4], [2, 3], [1, 4], [1, 5]],
columns=['OnBitSeq', 'KMeans'])
# OnBitSeq KMeans
# 0 1 5
# 1 2 3
# 2 3 5
# 3 2 4
# 4 2 3
# 5 1 4
# 6 1 5
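# keepdims=True (available from SciPy 1.9) keeps the result as an array, so .mode[0] is the modal value;
# on recent SciPy versions mode(x)[0][0] fails because the mode of 1-D input is returned as a scalar
modes = df.groupby('OnBitSeq')['KMeans'].apply(lambda x: mode(x, keepdims=True).mode[0]).reset_index()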
# OnBitSeq KMeans
# 0 1 5
# 1 2 3
# 2 3 5
If you need to add this back to the original dataframe:
df['Mode'] = df['OnBitSeq'].map(modes.set_index('OnBitSeq')['KMeans'])
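If the goal is only to attach the per-group mode to every row, a pandas-only alternative is groupby with transform, which broadcasts each group's value back onto the original rows. This is a sketch using the same example data as above, not necessarily what the asker needs:

```python
import pandas as pd

df = pd.DataFrame([[1, 5], [2, 3], [3, 5], [2, 4], [2, 3], [1, 4], [1, 5]],
                  columns=['OnBitSeq', 'KMeans'])

# transform returns a result aligned with df's own index,
# so it can be assigned directly as a new column
df['Mode'] = df.groupby('OnBitSeq')['KMeans'].transform(lambda x: x.mode().iloc[0])

print(df)
```

This avoids the intermediate modes frame and the map step, at the cost of not having the one-row-per-group summary available.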