Let's say I have a dataframe like the one below. For each row, I want the value that appears most often among columns a, b, and c; if all three values are different, I want the value from column a. For example, in the first row, 1 appears twice among 1, 5, 1, so the output in d is 1. In the second row, the three values 11, 2, 7 are all different, so the output in d is the value of column a, which is 11.
list  a   b  c
1     1   5  1
11    11  2  7
0     0   0  0
9     5   9  5
8     8   2  7
Expected output:
list  a   b  c  d
1     1   5  1  1
11    11  2  7  11
0     0   0  0  0
9     5   9  5  5
8     8   2  7  8
scipy can calculate the row-wise mode together with its count; I am surprised not to find an equivalent in numpy.
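import pandas as pd
import numpy as np
from scipy import stats

df = pd.DataFrame([[1, 1, 5, 1],
                   [11, 11, 2, 7],
                   [0, 0, 0, 0],
                   [9, 5, 9, 5],
                   [8, 8, 2, 7]],
                  columns=['list', 'a', 'b', 'c'])

# Row-wise mode of a, b, c and the number of times it occurs.
mode, count = stats.mode(df[['a', 'b', 'c']].values, axis=1)
# Older scipy returns shape (n, 1), newer scipy shape (n,); ravel handles both.
df['d'] = np.ravel(mode)
df['count'] = np.ravel(count)
# A count of 1 means all three values differ, so fall back to column a.
df.loc[df['count'] == 1, 'd'] = df['a']
# Drop the helper column (the positional axis argument is gone in pandas 2.x).
df = df.drop(columns='count')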
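If you would rather not depend on scipy, here is a minimal sketch that relies on there being exactly three columns. In that case a is the answer unless b and c agree with each other, so a single np.where is enough. This shortcut is my own simplification for this specific layout, not a general mode function, and it would not carry over to more columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 5, 1],
                   [11, 11, 2, 7],
                   [0, 0, 0, 0],
                   [9, 5, 9, 5],
                   [8, 8, 2, 7]],
                  columns=['list', 'a', 'b', 'c'])

# Assumption: exactly three candidate columns.
# If b == c, that shared value occurs at least twice, so it is the mode
# (and it equals a when all three match). Otherwise either a matches one
# of b/c, or all three differ; in both cases the answer is a.
df['d'] = np.where(df['b'] == df['c'], df['b'], df['a'])

print(df['d'].tolist())  # [1, 11, 0, 5, 8]
```

This gives the same d column as the expected output in the question.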