I have a vector composed of entries such as "ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0", and so on, and I want to subset this vector based on conditions such as:
I tried playing around with strsplit and grep, but I couldn't figure out a way to restrict my conditions based on the position of a character in the string. Any suggestions?
Many thanks!
You can do this with regular expressions (see ?regexp for details). grep returns the indices of the elements that match, or a zero-length vector if nothing matches. You may want to use grepl instead, since it returns a logical vector you can use to subset.
z <- c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0")
# 3rd character is Z ("^" is start of string, "." is any character)
grep("^..Z", z)
# 3rd and 7th characters are Z
grep("^..Z...Z", z)
# 3rd and 7th characters are Z, no other characters are Z
# "[]" defines a "character class" and "^" inside a character class negates the match
# "{n}" repeats the preceding match n times, "+" repeats it one or more times
# "$" is end of string; without it, a Z after the 8th character would still match
grep("^[^Z]{2}Z[^Z]{3}Z[^Z]+$", z)
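To make the grep versus grepl difference concrete, here is a minimal sketch that subsets with grepl. The last two entries of z_demo are made up for illustration (they are not from the question), and the strict pattern assumes that "no other characters are Z" should hold for the whole string, hence the trailing "$".

```r
z <- c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0")

# Hypothetical extra entries, added only so that every pattern has a match:
# "10Z001Z01100" has Z at positions 3 and 7 only
# "10Z001Z0Z100" has Z at positions 3, 7 and 9
z_demo <- c(z, "10Z001Z01100", "10Z001Z0Z100")

# grep() gives integer indices, grepl() gives one TRUE/FALSE per element
idx  <- grep("^..Z", z_demo)
keep <- grepl("^..Z", z_demo)

# Subset with the logical vector
third_is_z <- z_demo[keep]

# 3rd and 7th characters are Z (other positions unconstrained)
third_and_seventh <- z_demo[grepl("^..Z...Z", z_demo)]

# Z at positions 3 and 7 and nowhere else; "$" anchors the end of the string
only_third_and_seventh <- z_demo[grepl("^[^Z]{2}Z[^Z]{3}Z[^Z]*$", z_demo)]

# The same pattern without "$" also accepts the entry with a Z at position 9
unanchored <- z_demo[grepl("^[^Z]{2}Z[^Z]{3}Z[^Z]+", z_demo)]
```

With the anchored pattern only the first made-up entry should survive, while the unanchored one lets both through.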
Expanding on Josh's answer, you can apply each pattern to a column of a data frame:
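your_dataset <- data.frame(
  z = c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0")
)
# One pattern per condition; "$" keeps the last one from matching a later Z
regexes <- c("^..Z", "^..Z...Z", "^[^Z]{2}Z[^Z]{3}Z[^Z]+$")
# Returns a list with one subsetted data frame per pattern
lapply(regexes, function(rx)
{
  subset(your_dataset, grepl(rx, z))
})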
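One optional variation, sketched here under the assumption that you want to look results up by condition: give the patterns names, since lapply carries the names of its input over to the returned list. The names below are invented for illustration.

```r
your_dataset <- data.frame(
  z = c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0"),
  stringsAsFactors = FALSE
)

# Hypothetical labels for the three conditions
regexes <- c(
  third_is_z             = "^..Z",
  third_and_seventh      = "^..Z...Z",
  only_third_and_seventh = "^[^Z]{2}Z[^Z]{3}Z[^Z]+$"
)

# lapply() keeps the names of regexes on the result
subsets <- lapply(regexes, function(rx) {
  subset(your_dataset, grepl(rx, z))
})

# Number of rows matched by each condition
n_matches <- vapply(subsets, nrow, integer(1))
```

Each subset is then available as, for example, subsets$third_is_z.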
Also consider replacing grepl(rx, z) with str_detect(z, rx), using the stringr package. (There's no real difference except for slightly more readable code.)
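As a rough sketch of that swap: str_detect takes the string first and the pattern second, the reverse of grepl. The requireNamespace guard and the base R fallback are additions for this example only, so that it still runs when stringr is not installed.

```r
your_dataset <- data.frame(
  z = c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0"),
  stringsAsFactors = FALSE
)

rx <- "^..Z"

if (requireNamespace("stringr", quietly = TRUE)) {
  # stringr version: string first, pattern second
  matched <- subset(your_dataset, stringr::str_detect(z, rx))
} else {
  # Base R fallback: pattern first, string second
  matched <- subset(your_dataset, grepl(rx, z))
}
```

Both branches should keep the same single row here, since only the first entry has Z as its third character.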