I have two data frames:
> a
    box        hits
1 px085 agx|amx|app
2 px075 gxz|gpx|amr
3 px065 abc|apr|ppy
4 rx055 alo|amx|bbc
5 rx088 ppy|pxg|ptr
6 rx099 prt|ppm|zee
> b
  hitcode appid
1     agx 12485
2     abc 18550
3     bbc 19225
4     ppy 15260
5     zee 16880
I'm trying to get output:
    box        hits appcode
1 px085 agx|amx|app   12485
2 px075 gxz|gpx|amr
3 px065 abc|apr|ppy   18550
4 rx055 alo|amx|bbc   19225
5 rx088 ppy|pxg|ptr   15260
6 rx099 prt|ppm|zee   16880
I tried:
gcode <- function(x) {
  b[grep(x, b$hitcode, ignore.case = TRUE, perl = TRUE), c('appid')]
}
Which is giving me:
> gcode(a$hits)
#[1] 12485
#Warning message:
#In grep(x, b$hitcode, ignore.case = TRUE, perl = TRUE) :
# argument 'pattern' has length > 1 and only the first element will be used
What am I missing here?
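grep() accepts only a single pattern, so passing the whole a$hits column uses just its first element ("agx|amx|app"). That is why you get one value and the warning. Also, as noted in the comments, one row of hits can match several hitcodes in your example (row 3 contains both abc and ppy). Here's a solution using a loop that appends each additional appid instead of overwriting the earlier one.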
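The loop iterates over the rows of b and converts each hitcode with as.character(), so it works whether your character variables are stored as factors or as plain character vectors.
a$appid <- NA_character_
for (i in seq_len(nrow(b))) {
  cur <- as.character(b$hitcode[i])        # hitcode to look for
  hit <- grep(cur, a$hits, fixed = TRUE)   # rows of a whose hits contain it
  app <- b$appid[i]
  na  <- is.na(a$appid[hit])               # matched rows with no appid yet
  a$appid[hit[na]]  <- app                 # first match: store the appid
  a$appid[hit[!na]] <- paste(a$appid[hit[!na]], app, sep = ";")  # later matches: append
}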
This gives:
# > a
#     box        hits       appid
# 1 px085 agx|amx|app       12485
# 2 px075 gxz|gpx|amr        <NA>
# 3 px065 abc|apr|ppy 18550;15260
# 4 rx055 alo|amx|bbc       19225
# 5 rx088 ppy|pxg|ptr       15260
# 6 rx099 prt|ppm|zee       16880
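The loop relies on grep(), which does substring matching, so a hitcode would also match any longer code that happens to contain it. If you need exact matches on the "|"-separated codes, something along these lines might work instead. This is only a sketch: the data frames are rebuilt from your printout (stringsAsFactors = FALSE is my assumption), and lookup_appids() is a hypothetical helper name, not a function from any package.

```r
# Example data rebuilt from the question; column types are an assumption.
a <- data.frame(
  box  = c("px085", "px075", "px065", "rx055", "rx088", "rx099"),
  hits = c("agx|amx|app", "gxz|gpx|amr", "abc|apr|ppy",
           "alo|amx|bbc", "ppy|pxg|ptr", "prt|ppm|zee"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  hitcode = c("agx", "abc", "bbc", "ppy", "zee"),
  appid   = c(12485L, 18550L, 19225L, 15260L, 16880L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each hits string on the literal "|",
# look every code up by exact match, and join the found appids with ";".
lookup_appids <- function(hits, hitcode, appid) {
  tokens <- strsplit(as.character(hits), "|", fixed = TRUE)
  vapply(tokens, function(tok) {
    # nomatch = 0L drops codes that are absent from hitcode;
    # match() returns only the first position if hitcode has duplicates.
    ids <- appid[match(tok, as.character(hitcode), nomatch = 0L)]
    if (length(ids) == 0L) NA_character_ else paste(ids, collapse = ";")
  }, character(1))
}

a$appid <- lookup_appids(a$hits, b$hitcode, b$appid)
```

On the example data this should give the same appid column as the loop. The difference would only show up when one hitcode is a substring of another code.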