I am wondering how to perform some basic data manipulation in R. What I want to do is the following.
I have a data table with the following pattern:
V1 V2 V3
ABC X 24
ABC Y 30
EFG X 4
EFG Y 28
HIJ P 40
HIJ Y 41
PKL X 32
Now I want to retrieve all rows for the values of V1 that have no row where V2 is X. In the above dataset this subset would be
HIJ P 40
HIJ Y 41
since neither row of the HIJ pair has a V2 value of X.
I would also like to retrieve all rows whose V1 value appears only once. In the above example it would be
PKL X 32
You mentioned data.table, so here are two possible approaches, one for each request
library(data.table)
For 1.
setDT(df)[, .SD[all(V2 != "X")], by = V1]
# V1 V2 V3
# 1: HIJ P 40
# 2: HIJ Y 41
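To see why this works, it may help to look at the per-group flag on its own: all(V2 != "X") returns a single TRUE or FALSE per V1 group, and subsetting .SD with that single logical keeps either every row of the group or none. The following is only a sketch; the construction of df is an assumption that rebuilds the sample data from the question, and flags and kept are illustrative names.

```r
library(data.table)

# Assumed reconstruction of the sample data from the question
df <- data.table(
  V1 = c("ABC", "ABC", "EFG", "EFG", "HIJ", "HIJ", "PKL"),
  V2 = c("X", "Y", "X", "Y", "P", "Y", "X"),
  V3 = c(24L, 30L, 4L, 28L, 40L, 41L, 32L)
)

# One logical flag per group: TRUE when no row in the group has V2 == "X"
flags <- df[, .(keep = all(V2 != "X")), by = V1]
flags

# Groups flagged TRUE are the ones whose rows .SD[all(V2 != "X")] returns
kept <- flags[keep == TRUE, V1]
df[V1 %in% kept]
```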
For 2.
df[, .SD[.N == 1L], by = V1]
# V1 V2 V3
# 1: PKL X 32
Or (a slightly more optimized version that collects row indices with .I instead of subsetting .SD)
indx <- df[, .(indx = .I[.N == 1L]), by = V1]$indx
df[indx]
# V1 V2 V3
# 1: PKL X 32
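The same row-index idiom should carry over to the first request as well. This is a sketch under the same assumptions as above, not something benchmarked here: df is an assumed reconstruction of the sample data, and indx1 is an illustrative name.

```r
library(data.table)

# Assumed reconstruction of the sample data from the question
df <- data.table(
  V1 = c("ABC", "ABC", "EFG", "EFG", "HIJ", "HIJ", "PKL"),
  V2 = c("X", "Y", "X", "Y", "P", "Y", "X"),
  V3 = c(24L, 30L, 4L, 28L, 40L, 41L, 32L)
)

# .I holds the row numbers of the current group; a single TRUE keeps all of
# them and a single FALSE keeps none, so only groups without an "X" survive
indx1 <- df[, .(indx = .I[all(V2 != "X")]), by = V1]$indx

# Subset the original table once, by row number
df[indx1]
```

Collecting indices first and subsetting once avoids materializing .SD for every group, which is the reason the index-based form is usually preferred on larger tables.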