I have a data.table res which has data as follows:
V1 V2 V3 V4
1: Day_1 4 4 4
2: Day_2 1 1 2
3: Day_3 4 5 4
4: Day_4 3 4 4
5: Day_5 3 2 3
I need to select the most frequent value in each row across the columns V2, V3 and V4 combined. That is, I need the following result:
Day_1 4
Day_2 1
Day_3 4
Day_4 4
Day_5 3
I'm not expecting any ties since there will always be an odd number of columns. Is it possible to manipulate the data.table to do this? Or should I convert it to some other data type?
Thanks - V
I'm posting this as a data.table version of this old question until something better is offered:
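# Return the most frequent value of x (on a tie, the value seen first wins)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
# DT is the table from the question (called res there); .SD holds V2, V3 and V4 for each V1 group
DT[, .(res = Mode(unlist(.SD))), by = V1]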
# V1 res
# 1: Day_1 4
# 2: Day_2 1
# 3: Day_3 4
# 4: Day_4 4
# 5: Day_5 3
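For anyone wondering how Mode() arrives at its answer, here is a rough step-by-step sketch in base R on the Day_3 values (4, 5, 4). The intermediate names idx and counts are mine, added only for illustration; the function above does the same thing in one expression.

```r
# Walk through Mode() on the Day_3 row values
x <- c(4, 5, 4)

ux <- unique(x)            # distinct values in order of first appearance: 4 5
idx <- match(x, ux)        # position of each element of x within ux: 1 2 1
counts <- tabulate(idx)    # number of occurrences per distinct value: 2 1
result <- ux[which.max(counts)]  # value with the highest count
print(result)              # 4
```

Note that which.max() returns the first maximum, so if a tie ever did occur, the value that appears first in the row would be returned.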
Convert to long form and then it's trivial to do:
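library(data.table)
dt <- data.table(id = paste("Day", 1:5, sep = "_"), V2 = c(4, 1, 4, 3, 3), V3 = c(4, 1, 5, 4, 2), V4 = c(4, 2, 4, 4, 3))
# melt to long form, count each (id, value) pair, then keep the most frequent value per id
melt(dt, id.vars = 'id')[, .N, by = .(id, value)][, value[which.max(N)], by = id]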
# id V1
#1: Day_1 4
#2: Day_2 1
#3: Day_3 4
#4: Day_4 4
#5: Day_5 3
This is significantly faster than the other options so far, as long as the number of unique (id, value) pairs is not too large.
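If it helps, the chained long-form call can be split into its intermediate steps and compared against the Mode() approach on the sample data. This is only a sketch: the names long, counts, by_long and by_row are mine and do not come from either answer, and it says nothing about timings on real data.

```r
library(data.table)

dt <- data.table(id = paste("Day", 1:5, sep = "_"),
                 V2 = c(4, 1, 4, 3, 3),
                 V3 = c(4, 1, 5, 4, 2),
                 V4 = c(4, 2, 4, 4, 3))

# Long-form approach, one step at a time
long    <- melt(dt, id.vars = "id")            # one row per (id, variable, value)
counts  <- long[, .N, by = .(id, value)]       # occurrences of each (id, value) pair
by_long <- counts[, .(res = value[which.max(N)]), by = id]

# Row-wise approach with Mode() for comparison
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
by_row <- dt[, .(res = Mode(unlist(.SD))), by = id]

# Both approaches should pick the same value for every day
print(all(by_long$res == by_row$res))
```

The counts table has one row per unique (id, value) pair, which is why the size of that set matters for how well this approach performs.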