Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lookup in R using data.table

Tags:

r

data.table

I have a dataset that contains repeated observations on an individual in a long format. So each row as a type A or B observation. The following code reproduces the dataset.

library(data.table)
set.seed(1487)
dat <- data.table(id = rep(seq(10), 2), 
                  type = c(rep("A", 10), rep("B", 10)), 
                  x = sample.int(100,20))
dat
#     id type  x
#  1:  1    A 38
#  2:  2    A 58
#  3:  3    A 28
#  4:  4    A 21
#  5:  5    A 19
#  6:  6    A 62
#  7:  7    A 52
#  8:  8    A 86
#  9:  9    A 85
# 10: 10    A 90
# 11:  1    B 15
# 12:  2    B 11
# 13:  3    B 37
# 14:  4    B 93
# 15:  5    B 34
# 16:  6    B 91
# 17:  7    B 79
# 18:  8    B 94
# 19:  9    B 24
# 20: 10    B 41

Then I select out the top 3 individuals ranked by x for both types of observations:

setorderv(dat, c("type", "x"), c(1, -1))
top3 <- dat[, head(.SD, 3), by = list(type)]
top3
#    type id  x
# 1:    A 10 90
# 2:    A  8 86
# 3:    A  9 85
# 4:    B  8 94
# 5:    B  4 93
# 6:    B  6 91

Now I want to add a column that contains the original x value for the opposite observation type. If that makes any sense. So, the following code reproduces what I am looking for:

top3[,x2 := c(41, 94, 24, 86, 21, 62)]
#    type id  x x2
# 1:    A 10 90 41
# 2:    A  8 86 94
# 3:    A  9 85 24
# 4:    B  8 94 86
# 5:    B  4 93 21
# 6:    B  6 91 62

Of course I can go through the whole dataset row-by-row and use an if statement or whatever. The original dataset is quite big and I am looking for an elegant and efficient way to do it. I really like data.table and I have been using it a lot lately. I know there is a simple elegant way to do it. I also tried things using .GRP. I need some help.

Thanks in advance!

MY FINAL SOLUTION

Thanks to those who provided with inspiration. Those interested this is my working solution to my problem that actually works better for the intent of the project.

dat <- dcast.data.table(dat, id~type, value.var = "x")
top3 <- rbind(dat[order(-A), head(.SD, 3L)][,rank_by := "A"],
              dat[order(-B), head(.SD, 3L)][,rank_by := "B"])
#    id  A  B rank_by
# 1: 10 90 41       A
# 2:  8 86 94       A
# 3:  9 85 24       A
# 4:  8 86 94       B
# 5:  4 21 93       B
# 6:  6 62 91       B

Cheers,

tstev

like image 784
tstev Avatar asked Dec 20 '22 01:12

tstev


1 Answers

Seems like you want to merge back by id and opposite type. Depending on your particular circumstances I might just skip changing the type, and merge on both types, and discard the same one (code below assumes version 1.9.5+):

(dat[order(-x), head(.SD, 3), by = type]
    [dat, on = 'id', nomatch = 0][type != i.type]
    [order(type, -id)])
#   type id  x i.type i.x
#1:    A 10 90      B  41
#2:    A  8 86      B  94
#3:    A  9 85      B  24
#4:    B  8 94      A  86
#5:    B  4 93      A  21
#6:    B  6 91      A  62
like image 185
eddi Avatar answered Jan 18 '23 04:01

eddi