Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combining/merging a column in R

Tags:

r

I am probably missing something quite easy with this problem. I can't find the proper answer anywhere - and I really need to move on. So I have oversimplified my data:

eventID <- c(2,4)
Time <- c("09:32","09:43")
df1 <- data.frame(eventID,Time)
eventID <- rep(c(1:4),rep(3,4))
Time <- rep(c("09:30",NA,"09:40",NA),rep(3,4))
df2 <- data.frame(eventID,Time)

What I would like is to combine the Time column. So the NAs in df2 should be filled up with Time values from df1 matching the eventID. My original data is quite large so a for-loop is not what I am looking for really. I was hoping this would work:

> (res1 <- merge(df1,df2, by = "eventID", all = T))
#   eventID Time.x Time.y
#1        1   <NA>  09:30
#2        1   <NA>  09:30
#3        1   <NA>  09:30
#4        2  09:32   <NA>
#5        2  09:32   <NA>
#6        2  09:32   <NA>
#7        3   <NA>  09:40
#8        3   <NA>  09:40
#9        3   <NA>  09:40
#10       4  09:43   <NA>
#11       4  09:43   <NA>
#12       4  09:43   <NA>

Desired output:

> eventID <- rep(c(1:4),rep(3,4))
> Time <- rep(c("09:30","09:32","09:40","09:43"), rep(3,4))
> (res2 <- data.frame(eventID,Time))
#   eventID  Time
#1        1 09:30
#2        1 09:30
#3        1 09:30
#4        2 09:32
#5        2 09:32
#6        2 09:32
#7        3 09:40
#8        3 09:40
#9        3 09:40
#10       4 09:43
#11       4 09:43
#12       4 09:43

Suggestions? If possible in base I would prefer that, or data.table package.

like image 238
tstev Avatar asked Nov 23 '25 00:11

tstev


1 Answers

Try

library(data.table)
setkey(setDT(df2), eventID)[df1, Time:= i.Time][]
#   eventID  Time
#1:       1 09:30
#2:       1 09:30
#3:       1 09:30
#4:       2 09:32
#5:       2 09:32
#6:       2 09:32
#7:       3 09:40
#8:       3 09:40
#9:       3 09:40
#10:      4 09:43
#11:      4 09:43
#12:      4 09:43

Or a base R option after the merge would be

 data.frame(eventID= res1[,1], 
     Time=res1[-1][cbind(1:nrow(res1),max.col(!is.na(res1[-1])))])

Update

Suppose there are some 'eventID' in the first dataset are not in the second ('df2'), then we can specify nomatch=0

 df3 <- data.frame(eventID= c(2,4,5), Time=c('09:32', '09:43', '09:45'))

 setkey(setDT(df2), eventID)[df3, Time:= i.Time,nomatch=0][]
  #    eventID  Time
  # 1:       1 09:30
  # 2:       1 09:30
  # 3:       1 09:30
  # 4:       2 09:32
  # 5:       2 09:32
  # 6:       2 09:32
  # 7:       3 09:40
  # 8:       3 09:40
  # 9:       3 09:40
  #10:       4 09:43
  #11:       4 09:43
  #12:       4 09:43
like image 109
akrun Avatar answered Nov 24 '25 14:11

akrun



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!