gsub return an empty string when no match is found

Tags: regex, r, gsub

I'm using the gsub function in R to return occurrences of my pattern (reference numbers) on a list of text. This works great unless no match is found, in which case I get the entire string back, instead of an empty string. Consider the example:

data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")

sapply(data, function(x) gsub(".*(Ref. (\\d+)).*", "\\1", x))

Returns:

[1] "Ref. 12"                            "another sentence without reference"

But I'd like to get

[1] "Ref. 12"                            ""

Thanks!

cboettig asked Apr 18 '12 17:04


2 Answers

I'd probably go a different route, since the sapply doesn't seem necessary to me as these functions are vectorized already:

fun <- function(x){
    pattern <- ".*(Ref. (\\d+)).*"
    # Logical index: also correct when no element matches at all
    matched <- grepl(pattern, x)
    x <- gsub(pattern, "\\1", x)
    x[!matched] <- ""
    x
}

fun(data)
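
One caveat worth checking when indexing with the result of grep, sketched here with made-up input: if no element matches at all, grep returns integer(0), and a negative empty index selects nothing, so no element would be blanked. A logical index from grepl does not have this problem.

```r
# Hypothetical input in which no element contains a reference.
x <- c("no reference here", "none here either")
pattern <- ".*(Ref. (\\d+)).*"

ind <- grep(pattern, x)   # integer(0), because nothing matches
n_selected <- length(x[-ind])
print(n_selected)         # a negative empty index selects no elements

# A logical index picks out every non-matching element, even when all fail.
res <- gsub(pattern, "\\1", x)
res[!grepl(pattern, x)] <- ""
print(res)
```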
joran answered Sep 22 '22 15:09


According to the documentation, this is a feature of gsub: if there are no matches to the supplied pattern, it returns the input string unchanged.
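
A quick way to see this behaviour, as a minimal sketch (the example strings are made up for illustration):

```r
# Elements that do not match the pattern come back unchanged from gsub().
x <- c("a sentence with citation (Ref. 12)", "no reference here")
out <- gsub(".*(Ref. (\\d+)).*", "\\1", x)
print(out)

# The second element is identical to its input because nothing was substituted.
unchanged <- out[2] == x[2]
print(unchanged)
```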

Here, I first use grepl to get a logical vector indicating whether the pattern is present in each string, and only keep the gsub result where it is:

ifelse(grepl(".*(Ref. (\\d+)).*", data), 
      gsub(".*(Ref. (\\d+)).*", "\\1", data), 
      "")

Embedding this in a function:

mygsub <- function(x){
     ans <- ifelse(grepl(".*(Ref. (\\d+)).*", x), 
              gsub(".*(Ref. (\\d+)).*", "\\1", x), 
              "")
     return(ans)
}

mygsub(data)
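
A different technique that may also fit here, offered only as a sketch: regexpr together with regmatches extracts the matched text directly, so there is no need to wrap the pattern in `.*`. The function name `extract_first` and its `pattern` argument are hypothetical, and the sketch assumes the input contains no NA values.

```r
# Hypothetical helper: return the first match per element, or "" if none.
extract_first <- function(x, pattern) {
  x <- as.character(x)            # accept a list of single strings as well
  out <- character(length(x))     # pre-filled with empty strings
  m <- regexpr(pattern, x)        # -1 marks elements without a match
  out[m != -1] <- regmatches(x, m)
  out
}

data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")
result <- extract_first(data, "Ref. \\d+")
print(result)
```

Because regmatches returns only the matched substrings, the non-matching elements keep the empty string they were initialised with.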
David LeBauer answered Sep 24 '22 15:09