How to subset vector based on string character?

Question

I have a vector composed of entries such as "ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0", and so on, and I want to subset this vector based on conditions such as:

The third character is a Z
The third AND seventh characters are Z
The third AND seventh characters are Z, AND none of the other characters are Z

I tried playing around with strsplit and grep, but I couldn't figure out a way to restrict my conditions based on the position of the character in the string. Any suggestions?

Many thanks!

Joshua Ulrich · Accepted Answer

You can do this with regular expressions (see ?regexp for details on regular expressions).

grep returns the indices of the elements that match, or a zero-length vector if nothing matches. You may want to use grepl instead, since it returns a logical vector (one TRUE/FALSE per element) that you can use directly to subset.
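As a small illustration of why the logical vector is usually the safer choice, here is a sketch in which the vector `x` and the pattern "Q" are made up for the example. Negating a grep result goes wrong when nothing matches, whereas negating a grepl result does not.

```r
x <- c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0")

grep("^..Z", x)    # integer indices of the matching elements
grepl("^..Z", x)   # one TRUE/FALSE per element

# No element contains "Q", so grep() returns integer(0)
no_hit <- grep("Q", x)

# Pitfall: x[-integer(0)] selects nothing, not everything
dropped_by_index <- x[-no_hit]

# The logical form keeps both elements, as intended
dropped_by_logical <- x[!grepl("Q", x)]
```

If you only ever keep the matches (never drop them), `x[grep(...)]` and `x[grepl(...)]` give the same result.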

z <- c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0")
# 3rd character is Z ("^" is start of string, "." is any character)
grep("^..Z", z)
# 3rd and 7th characters are Z
grep("^..Z...Z", z)
# 3rd and 7th characters are Z, no other characters are Z
# "[]" defines a "character class" and "^" in a character class negates the match
# "{n}" repeats the preceding match n times, "+" repeats it one or more times
# "$" is end of string; without it, a Z after the matched prefix would still be accepted
grep("^[^Z]{2}Z[^Z]{3}Z[^Z]+$", z)
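To see the three conditions select different elements, here is a sketch using grepl to subset. The strings in `z2` are made up for illustration (neither string in the question has a Z in the seventh position). For the third condition the pattern ends in `$`, so that `[^Z]+` has to run all the way to the end of the string; the unanchored variant is included only to show what it lets through.

```r
# Made-up test strings (not from the question), all 12 characters long
z2 <- c(
  "10Z100Z01000",  # Z at positions 3 and 7 only
  "10Z100Z0Z000",  # Z at positions 3, 7 and 9
  "ZZZ1Z0ZZ0ZZ0",  # Z at positions 3 and 7, plus several others
  "1001ZZ0Z00Z0"   # no Z at position 3
)

# Condition 1: 3rd character is Z
third <- z2[grepl("^..Z", z2)]

# Condition 2: 3rd and 7th characters are Z
third_seventh <- z2[grepl("^..Z...Z", z2)]

# Condition 3: 3rd and 7th are Z and no other character is Z
only_those <- z2[grepl("^[^Z]{2}Z[^Z]{3}Z[^Z]+$", z2)]

# Same pattern without "$": also accepts the string with a Z at position 9
unanchored <- z2[grepl("^[^Z]{2}Z[^Z]{3}Z[^Z]+", z2)]
```

Note that `+` requires at least one character after the seventh; if your strings can be exactly seven characters long, `*` may be the better quantifier.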

Richie Cotton · Answer

Expanding on Josh's answer, you can apply each pattern in turn to a column of a data frame:

your_dataset <- data.frame(
  z = c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0")
)
regexes <- c("^..Z", "^..Z...Z", "^[^Z]{2}Z[^Z]{3}Z[^Z]+$")

lapply(regexes, function(rx)
{
  subset(your_dataset, grepl(rx, z))
})
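The lapply call returns an unnamed list, which can make it hard to tell which subset belongs to which pattern. One possible variation (a sketch, not part of the original answer) names each result after its pattern; `results` is a name chosen here, and the third pattern is anchored with `$` so that it rejects any extra Z.

```r
your_dataset <- data.frame(
  z = c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0"),
  stringsAsFactors = FALSE
)
regexes <- c("^..Z", "^..Z...Z", "^[^Z]{2}Z[^Z]{3}Z[^Z]+$")

# lapply() keeps the names of its input, so name the patterns by themselves
results <- lapply(setNames(regexes, regexes), function(rx) {
  # drop = FALSE keeps a data frame even though there is a single column
  your_dataset[grepl(rx, your_dataset$z), , drop = FALSE]
})

# Number of rows matched by each pattern
sapply(results, nrow)
```

With the two example strings, only the first pattern matches anything, because neither string has a Z in the seventh position.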

Also consider replacing grepl(rx, z) with str_detect(z, rx), using the stringr package. (There's no real difference except for slightly more readable code.)
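One thing to watch when switching is the argument order: grepl takes the pattern first, while str_detect takes the strings first. A minimal sketch, which only runs the stringr comparison if the package happens to be installed:

```r
z <- c("ZZZ1Z01Z0ZZ0", "1001ZZ0Z00Z0")
rx <- "^..Z"

# Base R: pattern first, then the strings
base_result <- grepl(rx, z)

# stringr: strings first, then the pattern (skipped if stringr is missing)
if (requireNamespace("stringr", quietly = TRUE)) {
  stringr_result <- stringr::str_detect(z, rx)
  stopifnot(identical(base_result, stringr_result))
}

z[base_result]
```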

Tags: string, r

Rafael Maia
