Filter/grep functions behaving oddly

Tags: regex, r

Take the following code to select only alphanumeric strings from a list of strings:

isValid = function(string){
  return(grep("^[A-z0-9]+$", string))
}

strings = c("aaa", "[email protected]", "", "valid")

print(Filter(isValid, strings))

The output is [1] "aaa" "[email protected]".

Why is "valid" missing from the output, and why is "[email protected]" included?

clb asked Mar 11 '23 09:03


1 Answer

Filter expects the function you pass it to return a single logical value (TRUE or FALSE) for each element. grep returns integer indices instead. Use grepl:

isValid = function(string){
  return(grepl("^[A-z0-9]+$", string))
}

strings = c("aaa", "[email protected]", "", "valid")

print(Filter(isValid, strings))
[1] "aaa"   "valid"
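
A side note on the pattern itself, separate from the Filter problem: the range A-z is usually interpreted by code point, and in ASCII it also covers [, \, ], ^, _ and the backtick, so strings containing those characters may slip through. Here is a minimal sketch, assuming the goal is strictly ASCII letters and digits. The name isAlnum and the test strings are hypothetical and chosen only for illustration:

```r
# Hypothetical stricter predicate: spell out both letter ranges explicitly.
isAlnum <- function(string) {
  grepl("^[A-Za-z0-9]+$", string)
}

# Illustrative inputs (not from the question).
candidates <- c("aaa", "a_b", "abc123", "")

# How a range like A-z is interpreted depends on locale and regex engine,
# so the loose pattern may accept "a_b"; the strict pattern does not.
loose  <- grepl("^[A-z0-9]+$", candidates)
strict <- isAlnum(candidates)

print(data.frame(candidates, loose, strict))
```

If letters outside ASCII should count as well, a class such as [[:alnum:]] may be a better fit, though its meaning depends on the locale.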

Why didn't grep work? It is due to R's coercion of numeric values to logical and the weirdness of Filter.

Here's what happened. Called on the whole vector, grep("^[A-z0-9]+$", strings) correctly returns 1 4, the indices of the matching first and fourth elements.
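
To see the difference between the two functions on a whole vector, here is a small sketch. The second element of the question's vector is not reproduced here; "not valid!" is a stand-in assumed for illustration:

```r
# "not valid!" is a stand-in for the question's non-alphanumeric string.
strings <- c("aaa", "not valid!", "", "valid")
pattern <- "^[A-z0-9]+$"

idx  <- grep(pattern, strings)   # integer positions of the matches
flag <- grepl(pattern, strings)  # one TRUE/FALSE per element

print(idx)   # 1 4
print(flag)  # TRUE FALSE FALSE TRUE

# Either result can subset the whole vector directly.
print(strings[idx])
print(strings[flag])
```

Both forms work for subsetting the full vector. The trouble only starts when the index-returning form is used as a per-element predicate.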

But Filter never passes the whole vector to your function. It calls the function on each element separately and combines the results with as.logical(unlist(lapply(x, f))).
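
As a rough sketch of that logic (the name my_filter is hypothetical, and the function lookup that base R performs first is left out), Filter behaves roughly like this:

```r
# Simplified stand-in for base::Filter, for illustration only.
my_filter <- function(f, x) {
  # Call f once per element, flatten the results, coerce to logical.
  ind <- as.logical(unlist(lapply(x, f)))
  # Keep the positions where the flattened vector is TRUE.
  x[which(ind)]
}

# A well-behaved predicate returns exactly one TRUE/FALSE per call.
is_even <- function(n) n %% 2 == 0

print(my_filter(is_even, 1:6))  # 2 4 6
```

Note that nothing in this sketch checks that f returned exactly one value per element, which is why a predicate with zero-length results can go wrong silently.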

So it ran isValid(strings[1]), then isValid(strings[2]), and so on, which produced this list:

[[1]]
[1] 1

[[2]]
integer(0)

[[3]]
integer(0)

[[4]]
[1] 1

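It then called unlist on that list. The two integer(0) results have length zero, so they vanish and the four-element list collapses to the two-element vector 1 1, which as.logical turns into TRUE TRUE. The positions no longer line up with the original strings. So in the end you got: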

strings[which(c(TRUE, TRUE))]

which turned into

strings[c(1,2)]
[1] "aaa"           "[email protected]"
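
The steps above can be replayed by hand. This is a sketch under one assumption: "not valid!" stands in for the question's second string, which is not reproduced here.

```r
# The question's predicate, returning indices instead of TRUE/FALSE.
isValid <- function(string) grep("^[A-z0-9]+$", string)
strings <- c("aaa", "not valid!", "", "valid")  # stand-in second element

per_element <- lapply(strings, isValid)  # list: 1, integer(0), integer(0), 1
flat <- unlist(per_element)              # zero-length pieces vanish
ind  <- as.logical(flat)

print(length(per_element))  # 4
print(flat)                 # 1 1
print(ind)                  # TRUE TRUE

# Two TRUEs select positions 1 and 2, whatever actually matched.
print(strings[which(ind)])
```

The last line returns the first two elements, which matches what Filter(isValid, strings) gives with this predicate.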

Moral of the story: don't use Filter :)
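
For completeness, here are a few ways to get the intended result, sketched with the same stand-in string ("not valid!" is an assumption, not the question's original value). The first two avoid Filter entirely, and the third keeps it but with a predicate that returns one logical per element:

```r
strings <- c("aaa", "not valid!", "", "valid")  # stand-in second element
pattern <- "^[A-z0-9]+$"

# Vectorised logical subsetting, no Filter needed.
by_flag <- strings[grepl(pattern, strings)]

# grep can return the matching values directly.
by_value <- grep(pattern, strings, value = TRUE)

# If Filter is kept, the predicate must yield one TRUE/FALSE per element.
by_filter <- Filter(function(s) grepl(pattern, s), strings)

print(by_flag)
print(by_value)
print(by_filter)
```

All three should print the same two strings. Which one reads best is largely a matter of taste.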

Pierre L answered Mar 20 '23 01:03