I have a dataset that contains repeated observations on an individual in a long format. So each row as a type A
or B
observation. The following code reproduces the dataset.
library(data.table)
set.seed(1487)
dat <- data.table(id = rep(seq(10), 2),
type = c(rep("A", 10), rep("B", 10)),
x = sample.int(100,20))
dat
# id type x
# 1: 1 A 38
# 2: 2 A 58
# 3: 3 A 28
# 4: 4 A 21
# 5: 5 A 19
# 6: 6 A 62
# 7: 7 A 52
# 8: 8 A 86
# 9: 9 A 85
# 10: 10 A 90
# 11: 1 B 15
# 12: 2 B 11
# 13: 3 B 37
# 14: 4 B 93
# 15: 5 B 34
# 16: 6 B 91
# 17: 7 B 79
# 18: 8 B 94
# 19: 9 B 24
# 20: 10 B 41
Then I select out the top 3 individuals ranked by x
for both types of observations:
setorderv(dat, c("type", "x"), c(1, -1))
top3 <- dat[, head(.SD, 3), by = list(type)]
top3
# type id x
# 1: A 10 90
# 2: A 8 86
# 3: A 9 85
# 4: B 8 94
# 5: B 4 93
# 6: B 6 91
Now I want to add a column that contains the original x
value for the opposite observation type. If that makes any sense. So, the following code reproduces what I am looking for:
top3[,x2 := c(41, 94, 24, 86, 21, 62)]
# type id x x2
# 1: A 10 90 41
# 2: A 8 86 94
# 3: A 9 85 24
# 4: B 8 94 86
# 5: B 4 93 21
# 6: B 6 91 62
Of course I can go through the whole dataset row-by-row and use an if
statement or whatever. The original dataset is quite big and I am looking for an elegant and efficient way to do it. I really like data.table
and I have been using it a lot lately. I know there is a simple elegant way to do it. I also tried things using .GRP
. I need some help.
Thanks in advance!
MY FINAL SOLUTION
Thanks to those who provided with inspiration. Those interested this is my working solution to my problem that actually works better for the intent of the project.
dat <- dcast.data.table(dat, id~type, value.var = "x")
top3 <- rbind(dat[order(-A), head(.SD, 3L)][,rank_by := "A"],
dat[order(-B), head(.SD, 3L)][,rank_by := "B"])
# id A B rank_by
# 1: 10 90 41 A
# 2: 8 86 94 A
# 3: 9 85 24 A
# 4: 8 86 94 B
# 5: 4 21 93 B
# 6: 6 62 91 B
Cheers,
tstev
Seems like you want to merge back by id
and opposite type. Depending on your particular circumstances I might just skip changing the type, and merge on both types, and discard the same one (code below assumes version 1.9.5+
):
(dat[order(-x), head(.SD, 3), by = type]
[dat, on = 'id', nomatch = 0][type != i.type]
[order(type, -id)])
# type id x i.type i.x
#1: A 10 90 B 41
#2: A 8 86 B 94
#3: A 9 85 B 24
#4: B 8 94 A 86
#5: B 4 93 A 21
#6: B 6 91 A 62
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With