Take the following code to select only alphanumeric strings from a list of strings:
isValid = function(string){
  return(grep("^[A-z0-9]+$", string))
}
strings = c("aaa", "[email protected]", "", "valid")
print(Filter(isValid, strings))
The output is [1] "aaa" "[email protected]".
Why is "valid" not outputted, and why is "[email protected]" outputted?
Filter expects its predicate to return a single logical value (TRUE or FALSE) for each element, but grep returns numeric match indices. Use grepl instead:
isValid = function(string){
  return(grepl("^[A-z0-9]+$", string))
}
strings = c("aaa", "[email protected]", "", "valid")
print(Filter(isValid, strings))
[1] "aaa" "valid"
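Because grepl is vectorised and already returns one logical value per input string, the same selection can also be written as plain logical indexing, without Filter. This is a minimal sketch that reuses the pattern and data from the question; the variable names pattern, keep and result are only illustrative.

```r
# Pattern and data copied from the question.
pattern <- "^[A-z0-9]+$"
strings <- c("aaa", "[email protected]", "", "valid")

# grepl returns a logical vector with one entry per input string.
keep <- grepl(pattern, strings)
print(keep)

# Logical indexing keeps exactly the elements where keep is TRUE.
result <- strings[keep]
print(result)
```

Here the length of keep always equals the length of strings, so positions cannot drift the way they do in the grep version.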
Why didn't grep work? It is due to R's coercion of numeric values to logical and the weirdness of Filter.
Here's what happened: when called on the whole vector, grep("^[A-z0-9]+$", strings) correctly returns 1 4, the indices of the matching first and fourth elements.
But that is not how Filter works. It calls the predicate on each element separately, via as.logical(unlist(lapply(x, f))).
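To make that expression concrete, here is a rough sketch of a Filter-like function built around it. The name my_filter and the is_even predicate are hypothetical stand-ins for illustration; this mirrors the expression quoted above and is not meant as a copy of the base R source.

```r
# my_filter is a hypothetical stand-in built around the quoted expression.
my_filter <- function(f, x) {
  # Call the predicate once per element, then flatten the results
  # and coerce them to logical.
  ind <- as.logical(unlist(lapply(x, f)))
  # Keep the positions where the flattened vector is TRUE.
  x[which(ind)]
}

# A well-behaved predicate: exactly one logical value per element.
is_even <- function(n) n %% 2 == 0
evens <- my_filter(is_even, 1:6)
print(evens)
```

With a predicate like is_even, the flattened vector has one entry per element, so the positions line up. The trouble only starts when the predicate can return a result of length zero.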
So it ran isValid(strings[1]), then isValid(strings[2]), and so on. It created this:
[[1]]
[1] 1
[[2]]
integer(0)
[[3]]
integer(0)
[[4]]
[1] 1
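It then called unlist on that list, which drops the two empty integer(0) entries and leaves 1 1, and turned that into the logical vector TRUE TRUE. The result has only two elements, so it no longer lines up with the four input strings. So in the end you got: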
strings[which(c(TRUE, TRUE))]
which turned into
strings[c(1,2)]
[1] "aaa" "[email protected]"
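The steps above can be replayed one at a time in the console. This is a sketch using the grep-based predicate from the question; the intermediate names per_element, flat, ind and picked are made up for illustration.

```r
# The grep-based predicate and the data from the question.
isValid <- function(string) grep("^[A-z0-9]+$", string)
strings <- c("aaa", "[email protected]", "", "valid")

# Step 1: one grep call per element; non-matches give integer(0).
per_element <- lapply(strings, isValid)
print(lengths(per_element))

# Step 2: unlist silently drops the empty results.
flat <- unlist(per_element)
print(flat)

# Step 3: the remaining 1s are coerced to TRUE.
ind <- as.logical(flat)
print(ind)

# Step 4: which() turns TRUE TRUE into positions 1 and 2.
picked <- which(ind)
print(strings[picked])
```

The lengths line shows the root cause: two of the four results are empty, so after unlist the positions of the surviving values no longer correspond to positions in strings.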
Moral of the story: don't use Filter :)
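If a Filter-style helper is still wanted, one possible safeguard is to build it on vapply, which refuses any predicate result that is not exactly one logical value. This is only a sketch under that assumption; strict_filter, good and bad are hypothetical names, not part of base R.

```r
# strict_filter is a hypothetical helper, not part of base R.
strict_filter <- function(f, x) {
  # vapply raises an error unless f returns exactly one logical per element.
  ind <- vapply(x, f, logical(1), USE.NAMES = FALSE)
  x[ind]
}

strings <- c("aaa", "[email protected]", "", "valid")
good <- function(s) grepl("^[A-z0-9]+$", s)  # one TRUE/FALSE per element
bad  <- function(s) grep("^[A-z0-9]+$", s)   # integer index or integer(0)

kept <- strict_filter(good, strings)
print(kept)

# The grep-based predicate is rejected instead of silently misfiltering.
outcome <- tryCatch(strict_filter(bad, strings), error = function(e) "error")
print(outcome)
```

The design choice here is to fail loudly: a predicate of the wrong type or length stops with an error rather than producing a plausible-looking but wrong subset.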