Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

remove cases, all id's, for cases where NO changes have occured between time 1 and time 2

I have two data frames with a range of cases. One from time 1 and one from time 2. I am looking for a way to quickly identify cases where changes have occurred between time1 and time 2 and I am kinda stuck.

Here is an example. So, I have a data frame from time 1,

df.t1 <- data.frame(id = c(1,1,1,2,2,3,3,5,5,6), ABC = LETTERS[1:10], Num = 101:110)

and it looks like this,

df.t1
   id ABC Num
1   1   A 101
2   1   B 102
3   1   C 103
4   2   D 104
5   2   E 105
6   3   F 106
7   3   G 107
8   5   H 108
9   5   I 109
10  6   J 110

time two rolls around

df.t2 <- df.t1

and some changes occur,

df.t2[3,3] <- 104
df.t2[2,2] <- "H"
df.t2[8,3] <- 999
df.t2[10,3] <- NA
df.t2[11,] <- c(3, "J", 107)

this is time 2,

df.t2
   id ABC  Num
1   1   A  101
2   1   H  102
3   1   C  104
4   2   D  104
5   2   E  105
6   3   F  106
7   3   G  107
8   5   H  999
9   5   I  109
10  6   J <NA>
11  3   J  107

I'm now looking for a quick way to remove cases, all id's, for cases where NO changes have occurred in the case (any row) between time1 and time 2. In the specific example it is only with id # 2 that no changes occurred between time 1 and time 2.

I am looking for an end result that looks like this,

(df <- subset(df.t2, id != 2))
   id ABC  Num
1   1   A  101
2   1   H  102
3   1   C  104
6   3   F  106
7   3   G  107
8   5   H  999
9   5   I  109
10  6   J <NA>
11  3   J  107

any help would be appreciated.

like image 368
Eric Fail Avatar asked Feb 14 '13 00:02

Eric Fail


2 Answers

I think this would work. A solution using data.table:

require(data.table)
dt1 <- data.table(df.t1)
dt2 <- data.table(df.t2)
# your conversion made them characters
dt2[, `:=`(id = as.numeric(id), Num = as.numeric(Num))]
setkey(dt1, "id", "ABC")
setkey(dt2, "id", "ABC")
dt <- dt1[dt2]
dt2[id %in% dt[, !(all(Num == Num.1)) | any(is.na(c(Num, Num.1))), by=id][V1 == TRUE]$id]

#    id ABC Num
# 1:  1   A 101
# 2:  1   C 104
# 3:  1   H 102
# 4:  3   F 106
# 5:  3   G 107
# 6:  3   J 107
# 7:  5   H 999
# 8:  5   I 109
# 9:  6   J  NA

Alternatively, after obtaining dt = dt1[dt2]:

dt2[id %in% dt[, ctr := Num %in% Num.1, by=1:nrow(dt)][ctr == FALSE, unique(id)]]
like image 54
Arun Avatar answered Sep 28 '22 15:09

Arun


What about using data.table and keying by all of id, ABC and NUM

require(data.table)
dt1 <- data.table(df.t1)
dt2 <- data.table(df.t2)
# your conversion made them characters
dt2[, `:=`(id = as.numeric(id), Num = as.numeric(Num))]
setkey(dt1, "id", "ABC", "Num")
setkey(dt2, "id", "ABC", "Num")
# then it is just

dt2[dt2[!dt1][,list(unique(id))]]


   id ABC Num
1:  1   A 101
2:  1   C 104
3:  1   H 102
4:  3   F 106
5:  3   G 107
6:  3   J 107
7:  5   H 999
8:  5   I 109
9:  6   J  NA

This uses a non-join, between dt2 and dt1, selects the unique id values from these, and then subsets the dt2 data appropriately.

with a caveat about NA values in keys being problematic....

like image 37
mnel Avatar answered Sep 28 '22 14:09

mnel