remove cases, all id's, for cases where NO changes have occured between time 1 and time 2

Question

I have two data frames with a range of cases. One from time 1 and one from time 2. I am looking for a way to quickly identify cases where changes have occurred between time1 and time 2 and I am kinda stuck.

Here is an example. So, I have a data frame from time 1,

df.t1 <- data.frame(id = c(1,1,1,2,2,3,3,5,5,6), ABC = LETTERS[1:10], Num = 101:110)

and it looks like this,

df.t1
   id ABC Num
1   1   A 101
2   1   B 102
3   1   C 103
4   2   D 104
5   2   E 105
6   3   F 106
7   3   G 107
8   5   H 108
9   5   I 109
10  6   J 110

time two rolls around

df.t2 <- df.t1

and some changes occur,

df.t2[3,3] <- 104
df.t2[2,2] <- "H"
df.t2[8,3] <- 999
df.t2[10,3] <- NA
df.t2[11,] <- c(3, "J", 107)

this is time 2,

df.t2
   id ABC  Num
1   1   A  101
2   1   H  102
3   1   C  104
4   2   D  104
5   2   E  105
6   3   F  106
7   3   G  107
8   5   H  999
9   5   I  109
10  6   J <NA>
11  3   J  107

I'm now looking for a quick way to remove cases, all id's, for cases where NO changes have occurred in the case (any row) between time1 and time 2. In the specific example it is only with id # 2 that no changes occurred between time 1 and time 2.

I am looking for an end result that looks like this,

(df <- subset(df.t2, id != 2))
   id ABC  Num
1   1   A  101
2   1   H  102
3   1   C  104
6   3   F  106
7   3   G  107
8   5   H  999
9   5   I  109
10  6   J <NA>
11  3   J  107

any help would be appreciated.

Arun · Accepted Answer

I think this would work. A solution using data.table:

require(data.table)
dt1 <- data.table(df.t1)
dt2 <- data.table(df.t2)
# your conversion made them characters
dt2[, `:=`(id = as.numeric(id), Num = as.numeric(Num))]
setkey(dt1, "id", "ABC")
setkey(dt2, "id", "ABC")
dt <- dt1[dt2]
dt2[id %in% dt[, !(all(Num == Num.1)) | any(is.na(c(Num, Num.1))), by=id][V1 == TRUE]$id]

#    id ABC Num
# 1:  1   A 101
# 2:  1   C 104
# 3:  1   H 102
# 4:  3   F 106
# 5:  3   G 107
# 6:  3   J 107
# 7:  5   H 999
# 8:  5   I 109
# 9:  6   J  NA

Alternatively, after obtaining dt = dt1[dt2]:

dt2[id %in% dt[, ctr := Num %in% Num.1, by=1:nrow(dt)][ctr == FALSE, unique(id)]]

mnel · Answer

What about using data.table and keying by all of id, ABC and NUM

require(data.table)
dt1 <- data.table(df.t1)
dt2 <- data.table(df.t2)
# your conversion made them characters
dt2[, `:=`(id = as.numeric(id), Num = as.numeric(Num))]
setkey(dt1, "id", "ABC", "Num")
setkey(dt2, "id", "ABC", "Num")
# then it is just

dt2[dt2[!dt1][,list(unique(id))]]


   id ABC Num
1:  1   A 101
2:  1   C 104
3:  1   H 102
4:  3   F 106
5:  3   G 107
6:  3   J 107
7:  5   H 999
8:  5   I 109
9:  6   J  NA

This uses a non-join, between dt2 and dt1, selects the unique id values from these, and then subsets the dt2 data appropriately.

with a caveat about NA values in keys being problematic....

remove cases, all id's, for cases where NO changes have occured between time 1 and time 2

Tags:

r

data-management

Eric Fail

2 Answers

Arun

mnel

Recent Activity

Donate For Us

remove cases, all id's, for cases where NO changes have occured between time 1 and time 2

Tags:

r

data-management

Eric Fail

2 Answers

Arun

mnel

Related questions

Recent Activity

Donate For Us