 

R grep for each element in vector

Tags:

r

I have two data frames:

> a
    box        hits
1 px085 agx|amx|app
2 px075 gxz|gpx|amr
3 px065 abc|apr|ppy
4 rx055 alo|amx|bbc
5 rx088 ppy|pxg|ptr
6 rx099 prt|ppm|zee

> b
  hitcode appid
1     agx 12485
2     abc 18550
3     bbc 19225
4     ppy 15260
5     zee 16880

I'm trying to get this output:

    box        hits appcode
1 px085 agx|amx|app   12485
2 px075 gxz|gpx|amr       
3 px065 abc|apr|ppy   18550
4 rx055 alo|amx|bbc   19225
5 rx088 ppy|pxg|ptr   15260
6 rx099 prt|ppm|zee   16880

I tried:

gcode <- function(x){
  b[grep(x, b$hitcode, ignore.case = TRUE, perl = TRUE), c('appid')]
}

Which is giving me:

> gcode(a$hits)
#[1] 12485
#Warning message:
#In grep(x, b$hitcode, ignore.case = TRUE, perl = TRUE) :
#  argument 'pattern' has length > 1 and only the first element will be used

What am I missing here?

asked Dec 04 '25 10:12 by nsr


1 Answer

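Your attempt fails because grep() takes a single regular expression as its pattern. Passing the whole a$hits vector means only its first element, "agx|amx|app", is used (hence the warning). The roles are also reversed: the codes to look for are in b$hitcode, and the strings to search are in a$hits. As per the comments, your example also allows several apps to match one row (row 3 contains both abc and ppy). Here's a solution using a loop over the hitcodes, in which an existing appid is not overwritten when there are multiple matches; the extra IDs are appended, separated by ";".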
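The core idea, in isolation: grep() searches many strings for one pattern, so turn the call around and apply it once per hitcode. A minimal sketch, rebuilding the question's data as plain character columns (an assumption; the originals may be factors):

```r
# The question's data, rebuilt as character columns
a <- data.frame(
  box  = c("px085", "px075", "px065", "rx055", "rx088", "rx099"),
  hits = c("agx|amx|app", "gxz|gpx|amr", "abc|apr|ppy",
           "alo|amx|bbc", "ppy|pxg|ptr", "prt|ppm|zee"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  hitcode = c("agx", "abc", "bbc", "ppy", "zee"),
  appid   = c(12485, 18550, 19225, 15260, 16880),
  stringsAsFactors = FALSE
)

# One grep() call per hitcode; naming x = a$hits makes each
# hitcode the pattern argument
rows_per_code <- lapply(b$hitcode, grep, x = a$hits)
names(rows_per_code) <- b$hitcode
rows_per_code$ppy
# [1] 3 5
```

Each list element holds the rows of a that contain that code; the loop below does the same lookup and writes the matching appid into those rows.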

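The loop runs over the rows of b with seq_len(nrow(b)), so it works whether your columns are stored as factors or as plain character vectors (the default in data.frame() since R 4.0.0). Looping over 1:nlevels(b$hitcode) would only work for factors: nlevels() of a character vector is 0, and 1:0 is c(1, 0), so the loop would also run with i = 0, which is not a valid row.

# Column to hold the matched IDs, initially all NA
a$appid <- NA_character_

# For each hitcode in b, find the rows of a whose hits contain it
for (i in seq_len(nrow(b))) {
   cur <- as.character(b$hitcode[i])
   hit <- grep(cur, a$hits)
   app <- b$appid[i]

   # Fill rows that have no ID yet; append to rows that already have one
   na <- is.na(a$appid[hit])
   a$appid[ hit[na] ] <- app
   a$appid[ hit[!na] ] <- paste(a$appid[ hit[!na] ], app, sep = ";")
}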

This gives:

# > a
#     box        hits       appid
# 1 px085 agx|amx|app       12485
# 2 px075 gxz|gpx|amr        <NA>
# 3 px065 abc|apr|ppy 18550;15260
# 4 rx055 alo|amx|bbc       19225
# 5 rx088 ppy|pxg|ptr       15260
# 6 rx099 prt|ppm|zee       16880
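If the codes in hits are always separated by "|" (an assumption based on the example), an alternative that avoids both the explicit loop and grep() is to split each hits string and look up the pieces exactly with match(). This is a different technique from the loop above: exact matching means a hitcode such as "pp" would not accidentally match "app", which a substring search with grep() would. A sketch on the question's data, rebuilt as character columns:

```r
a <- data.frame(
  box  = c("px085", "px075", "px065", "rx055", "rx088", "rx099"),
  hits = c("agx|amx|app", "gxz|gpx|amr", "abc|apr|ppy",
           "alo|amx|bbc", "ppy|pxg|ptr", "prt|ppm|zee"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  hitcode = c("agx", "abc", "bbc", "ppy", "zee"),
  appid   = c(12485, 18550, 19225, 15260, 16880),
  stringsAsFactors = FALSE
)

# Split each hits string into its codes; "|" is a regex metacharacter,
# so split literally with fixed = TRUE
codes <- strsplit(a$hits, "|", fixed = TRUE)

# For each row, match every code exactly against b$hitcode (0 = no match,
# which drops out when indexing) and collapse the found IDs with ";"
a$appid <- vapply(codes, function(x) {
  ids <- b$appid[match(x, b$hitcode, nomatch = 0)]
  if (length(ids) == 0) NA_character_ else paste(ids, collapse = ";")
}, character(1))
```

Rows without any match get NA, and multiple IDs come out in the order the codes appear in hits, which for this data reproduces the loop's output.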
answered Dec 07 '25 02:12 by SimonG


