
Pandas: check if a number appears multiple times in a row

Tags:

python

pandas

Let's say I have a dataframe like the one below. For each row, if one number appears more often than the others among columns a, b and c, column d should hold that number; if all three numbers are different, d should take the value of a. For example, in the first row, 1 appears most often among 1, 5 and 1, so d is 1. In the second row, the three numbers 11, 2 and 7 are all different, so d is the value of column a, which is 11.

list   a  b   c  
 1     1  5   1 
11    11  2   7 
 0     0  0   0 
 9     5  9   5 
 8     8  2   7  

Expected output

list   a  b   c  d 
 1     1  5   1  1
11    11  2   7  11
 0     0  0   0  0
 9     5  9   5  5
 8     8  2   7  8 
user1670773 asked Jan 23 '18 00:01



1 Answer

SciPy can compute the row-wise mode with scipy.stats.mode; I am surprised NumPy has no equivalent.

import pandas as pd
import numpy as np
from scipy import stats

df = pd.DataFrame([[1, 1, 5, 1],
                   [11, 11, 2, 7],
                   [0, 0, 0, 0],
                   [9, 5, 9, 5],
                   [8, 8, 2, 7]],
                  columns=['list', 'a', 'b', 'c'])

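# Row-wise mode of a, b, c and the number of times it occurs.
# np.ravel flattens the result, whose shape differs between SciPy versions.
mode, count = stats.mode(df[['a', 'b', 'c']].values, axis=1)
df['d'] = np.ravel(mode)
df['count'] = np.ravel(count)

# A count of 1 means all three values differ, so fall back to column a.
df.loc[df['count'] == 1, 'd'] = df['a']
df = df.drop(columns='count')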
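
If SciPy is not available, the same rule can be sketched with NumPy alone. This relies on an observation that holds only for exactly three columns: when b and c are equal, that shared value is the most frequent one, whatever a is; otherwise either a matches one of them or all three differ, and in both cases the answer is a. The column names follow the question; treat this as a sketch for this specific layout, not a general row-wise mode.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 5, 1],
                   [11, 11, 2, 7],
                   [0, 0, 0, 0],
                   [9, 5, 9, 5],
                   [8, 8, 2, 7]],
                  columns=['list', 'a', 'b', 'c'])

# Assumption: exactly three value columns (a, b, c).
# If b == c, that value occurs at least twice, so it is the mode.
# Otherwise a is either part of a matching pair or the fallback value.
df['d'] = np.where(df['b'] == df['c'], df['b'], df['a'])

print(df['d'].tolist())  # [1, 11, 0, 5, 8]
```

With more than three columns this shortcut no longer holds, and an actual mode computation such as the scipy.stats.mode call above is needed.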
jpp answered Sep 22 '22 02:09
