I have two data frames with a range of cases. One from time 1 and one from time 2. I am looking for a way to quickly identify cases where changes have occurred between time1 and time 2 and I am kinda stuck.
Here is an example. So, I have a data frame from time 1,
df.t1 <- data.frame(id = c(1,1,1,2,2,3,3,5,5,6), ABC = LETTERS[1:10], Num = 101:110)
and it looks like this,
df.t1
id ABC Num
1 1 A 101
2 1 B 102
3 1 C 103
4 2 D 104
5 2 E 105
6 3 F 106
7 3 G 107
8 5 H 108
9 5 I 109
10 6 J 110
time two rolls around
df.t2 <- df.t1
and some changes occur,
df.t2[3,3] <- 104
df.t2[2,2] <- "H"
df.t2[8,3] <- 999
df.t2[10,3] <- NA
df.t2[11,] <- c(3, "J", 107)
this is time 2,
df.t2
id ABC Num
1 1 A 101
2 1 H 102
3 1 C 104
4 2 D 104
5 2 E 105
6 3 F 106
7 3 G 107
8 5 H 999
9 5 I 109
10 6 J <NA>
11 3 J 107
I'm now looking for a quick way to remove cases, all id's, for cases where NO changes have occurred in the case (any row) between time1 and time 2. In the specific example it is only with id # 2 that no changes occurred between time 1 and time 2.
I am looking for an end result that looks like this,
(df <- subset(df.t2, id != 2))
id ABC Num
1 1 A 101
2 1 H 102
3 1 C 104
6 3 F 106
7 3 G 107
8 5 H 999
9 5 I 109
10 6 J <NA>
11 3 J 107
any help would be appreciated.
I think this would work. A solution using data.table
:
require(data.table)
dt1 <- data.table(df.t1)
dt2 <- data.table(df.t2)
# your conversion made them characters
dt2[, `:=`(id = as.numeric(id), Num = as.numeric(Num))]
setkey(dt1, "id", "ABC")
setkey(dt2, "id", "ABC")
dt <- dt1[dt2]
dt2[id %in% dt[, !(all(Num == Num.1)) | any(is.na(c(Num, Num.1))), by=id][V1 == TRUE]$id]
# id ABC Num
# 1: 1 A 101
# 2: 1 C 104
# 3: 1 H 102
# 4: 3 F 106
# 5: 3 G 107
# 6: 3 J 107
# 7: 5 H 999
# 8: 5 I 109
# 9: 6 J NA
Alternatively, after obtaining dt = dt1[dt2]
:
dt2[id %in% dt[, ctr := Num %in% Num.1, by=1:nrow(dt)][ctr == FALSE, unique(id)]]
What about using data.table and keying by all of id
, ABC
and NUM
require(data.table)
dt1 <- data.table(df.t1)
dt2 <- data.table(df.t2)
# your conversion made them characters
dt2[, `:=`(id = as.numeric(id), Num = as.numeric(Num))]
setkey(dt1, "id", "ABC", "Num")
setkey(dt2, "id", "ABC", "Num")
# then it is just
dt2[dt2[!dt1][,list(unique(id))]]
id ABC Num
1: 1 A 101
2: 1 C 104
3: 1 H 102
4: 3 F 106
5: 3 G 107
6: 3 J 107
7: 5 H 999
8: 5 I 109
9: 6 J NA
This uses a non-join, between dt2 and dt1, selects the unique id values from these, and then subsets the dt2
data appropriately.
with a caveat about NA values in keys being problematic....
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With