
Selecting frequent values from multiple columns from R table

Tags: r, data.table

I have a data.table res which has data as follows:

            V1 V2 V3 V4
  1:     Day_1  4  4  4
  2:     Day_2  1  1  2
  3:     Day_3  4  5  4
  4:     Day_4  3  4  4
  5:     Day_5  3  2  3

I need to select the most frequent value across the columns V2, V3 and V4 combined, row by row. That is, I need the following result:

Day_1 4
Day_2 1
Day_3 4
Day_4 4
Day_5 3

I'm not expecting any ties, since there will always be an odd number of columns. Is it possible to manipulate the data.table to do this? Or should I convert it to some other data type?

Thanks - V

visakh asked Dec 19 '22 02:12

2 Answers

I'm posting this as a data.table version of this old question until something better is offered:

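# Return the most frequent value of x (the first one encountered on ties)
Mode <- function(x) {
  ux <- unique(x)
  # match() maps each element to its index in ux; tabulate() counts those indices
  ux[which.max(tabulate(match(x, ux)))]
}

# DT is the question's data.table (called res there);
# .SD holds V2, V3 and V4 for each V1 group
DT[, .(res = Mode(unlist(.SD))), by = V1]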

#       V1 res
# 1: Day_1   4
# 2: Day_2   1
# 3: Day_3   4
# 4: Day_4   4
# 5: Day_5   3
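
A note on ties, since the question assumes they cannot happen: an odd number of columns does not rule them out (three different values tie at one occurrence each). The sketch below repeats the Mode function from above so it runs on its own, and the input vectors are made up for illustration. It shows that which.max returns the first maximum, so the value seen first wins:

```r
# Same Mode function as above, repeated so this snippet is self-contained
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical inputs, not taken from the question's data
clear_winner <- Mode(c(3, 4, 4))  # 4 occurs twice, so the result is 4
tie_first    <- Mode(c(1, 2, 3))  # all values occur once; first one wins: 1
tie_reversed <- Mode(c(3, 2, 1))  # same values, reversed order: 3
```

In the data.table call, unlist(.SD) lists the values in column order, so on a tie the value from V2 would be returned. If that is not the behaviour you want, ties need to be detected separately.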
David Arenburg answered Mar 08 '23 16:03

Convert to long form and then it's trivial to do:

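library(data.table)

dt <- data.table(id=paste("Day",1:5,sep="_"),V2=c(4,1,4,3,3),V3=c(4,1,5,4,2),V4=c(4,2,4,4,3))

# Count each (id, value) pair in long form, then keep the most frequent value per id
melt(dt, id.vars = 'id')[, .N, by = .(id, value)][, value[which.max(N)], by = id]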
#      id V1
#1: Day_1  4
#2: Day_2  1
#3: Day_3  4
#4: Day_4  4
#5: Day_5  3

This is significantly faster than the other options so far, as long as the number of unique (id, value) pairs is not too large.
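
In case the chained one-liner is hard to follow, here is the same idea split into steps. This is only a sketch: the intermediate names long, counts and result, and the output column name res, are my own choices and not part of the original answer.

```r
library(data.table)

dt <- data.table(id = paste("Day", 1:5, sep = "_"),
                 V2 = c(4, 1, 4, 3, 3),
                 V3 = c(4, 1, 5, 4, 2),
                 V4 = c(4, 2, 4, 4, 3))

# Step 1: long form, one row per (id, original column) with columns
# id, variable and value
long <- melt(dt, id.vars = "id")

# Step 2: count how often each value occurs within each id
counts <- long[, .N, by = .(id, value)]

# Step 3: per id, keep the value with the highest count
# (which.max picks the first maximum if counts are tied)
result <- counts[, .(res = value[which.max(N)]), by = id]
```

Naming the result column explicitly avoids the auto-generated V1 header seen in the output above.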

eddi answered Mar 08 '23 18:03
