I extracted these duplicated records from a large data set. Now I need to choose one row from each set of duplicated rows.
ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <-c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")
data <- data.frame(ID,date,type,level)
The data frame will look like this:

I want to apply the following comparison for each ID: if the dates are different, keep all of the rows in df.right; if the dates are the same, compare type and choose in the order LC > MC > YA > ST (e.g. choose MC over YA), then put the chosen row into df.right; if the types are also the same, compare level and choose in the order active > new > firsttime (e.g. choose new over firsttime), then put the chosen row into df.right.
I tried to use foreach, but this only covers the first step, and it does not work for IDs that have three duplicated rows.
foreach (i=unique(data$ID), .combine='rbind') %do% {data[data$ID==i, "date"][1] == data[data$ID==i, "date"][2])
b <- data[data$ID==i,]}
The result should be like this:
Does anybody know how to do this? I would really appreciate it. Thank you.
The dplyr package is good for this sort of thing.
Using factors, you can specify how you want your categories ordered. Then you can pick the first of each type and level for each unique ID/date pair.
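As a quick standalone illustration of the ordering idea (the three-element vector here is made up for the demo and is not part of the question's data), sorting a factor follows the order of its levels rather than alphabetical order:

```r
# Hypothetical demo vector; levels are listed from highest to lowest priority
demo_type <- factor(c("YA", "ST", "MC"), levels = c("LC", "MC", "YA", "ST"))

# sort() on a factor orders by level position, so the first element
# is the highest-priority type that is actually present
top_type <- sort(demo_type)[1]
print(as.character(top_type))
```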
library(dplyr)
ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <-c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")
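# Order the factor levels by priority, highest first
type <- factor(type, levels = c("LC", "MC", "YA", "ST"))
level <- factor(level, levels = c("active", "new", "firsttime"))
data <- data.frame(ID, date, type, level)
# Within each ID/date pair, keep the highest-priority type,
# then the highest-priority level among the remaining rows
df.right <- data %>%
  group_by(ID, date) %>%
  filter(type == sort(type)[1]) %>%
  filter(level == sort(level)[1])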
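One thing to be aware of: the two filter() calls keep every row that ties on the best type and level, so rows that are identical in all four columns would all survive. If exactly one row per ID/date pair is wanted, a possible variant (a sketch under that assumption, not the answer's original method) is to sort by priority and take the first row of each group with slice():

```r
library(dplyr)

# Same sample data as in the question
data <- data.frame(
  ID = c("6820","6820","17413","17413","38553","38553","52760","52760",
         "717841","717841","717841","747187","747187","747187"),
  date = c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12",
           "2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01",
           "2014-12-02","2014-03-01","2014-05-12","2014-05-12"),
  type = c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC"),
  level = c("firsttime","new","new","active","active","active","firsttime",
            "new","active","new","active","new","active","active"),
  stringsAsFactors = FALSE
)

df.right <- data %>%
  # Encode the priorities as factor levels, highest priority first
  mutate(
    type  = factor(type,  levels = c("LC", "MC", "YA", "ST")),
    level = factor(level, levels = c("active", "new", "firsttime"))
  ) %>%
  # Sort so the preferred row comes first within each ID/date pair
  arrange(ID, date, type, level) %>%
  group_by(ID, date) %>%
  # Keep exactly one row per ID/date pair, even if rows tie completely
  slice(1) %>%
  ungroup()

print(df.right)
```

With the sample data this leaves one row for each of the ten distinct ID/date pairs. Rows with different dates for the same ID are all kept, because they fall into different groups.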