Let's say I have a data.table
like
sample<-data.table(id=c(1,1,2,2,3,3,3,4,4),
name=c("apple","apple","orange","orange",
"pear","pear","pear","banana","banana"),
atr=c("pretty","ugly","bruised","delicious",
"pear-shaped","bruised","infested",
"too-ripe","perfect"),
N=c(10,9,15,4,5,7,7,4,12))
I want to return essentially unique(sample[,list(id, name)])
except that I also want the atr
column for the value with the greatest N. In cases where there is a tie for highest N then I don't care which of the two is picked but I want only one to be picked.
This almost works merge(sample[,list(N=max(N)),by=list(id,name1)], sample,by=c("id","name1","N"))
but since pear has two atr
values that tie for max this returns two pears. Besides it not giving the intended result, I also assume/hope there's a way to do this that doesn't involve the join.
You could use atr[N == max(N)][1]
to return only the first in the case of a tie, like this -
library(data.table)
sample[, .(atr = atr[N == max(N)][1]), by = .(id, name)]
# id name atr
# 1: 1 apple pretty
# 2: 2 orange bruised
# 3: 3 pear bruised
# 4: 4 banana perfect
Note: As Frank points out atr[N == max(N)][1]
is also simply atr[which.max(N)]
I would just use order
:
> unique(sample[order(-N), .(id, name, atr)], by = c("id", "name"))
id name atr
1: 2 orange bruised
2: 4 banana perfect
3: 1 apple pretty
4: 3 pear bruised
If you want to maintain overall sorting, just use order(id, name, -N)
instead.
You could also split this to two lines:
setorder(sample, -N) #done by reference, as with all set* functions in data.table
unique(sample[ , .(id, name, atr)], by = c("id", "name"))
Or perhaps better depending on your end goal:
setkey(setorder(sample, -N), id, name)
unique(sample[ , .(id, name, atr)])
(note: order is crucial in this last, as using setorder
first will overwrite keys to NULL
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With