I'm really putting time into learning regex and I'm playing with different toy scenarios. One setup I can't get to work is grabbing everything from the beginning of a string up to the nth occurrence of a character, where n > 1.
Here I can grab from the beginning of the string to the first underscore but I can't generalize this to the second or third underscore.
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
gsub("_.*$", "", x)
Here's what I'm trying to achieve with regex (`sub`/`gsub`):
## > sapply(lapply(strsplit(x, "_"), "[", 1:2), paste, collapse="_")
## [1] "a_b" "1_2" "<_?"
#or
## > sapply(lapply(strsplit(x, "_"), "[", 1:3), paste, collapse="_")
## [1] "a_b_c" "1_2_3" "<_?_."
Related post: regex from first character to the end of the string
This regex means characters “l”, “m”, “n”, “o”, “p” would match in a string. Subtraction of ranges also works in character classes. This regex means vowels are subtracted from the range “a-z”. Regex patterns discussed so far require that each position in the input string match a specific character class.
In general, regex consists of normal characters, character classes, wildcard characters, and quantifiers. We will talk specifically about character classes here. At times there’s a need to match any sequence that contains one or more characters, in any order, that is part of a set of characters.
Notice that you can also match non-printable characters like tabs, new-lines, and carriage returns. We are learning how to construct a regex but forgetting a fundamental concept: flags. A regex usually comes in the form /abc/, where the search pattern is delimited by two slash characters /.
However, this expression will include the letter “a” in the match. To extract everything after the letter a, we need to introduce a capture group using parentheses. The contents of the parentheses are now capture group 1, and can thus be extracted from the regex return array.
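To make the capture-group idea concrete, here is a minimal sketch in R. The input string `"xyzabc"` and the pattern `"a(.*)"` are made up for illustration; `regexec` and `regmatches` are base R functions that return the full match followed by each capture group.

```r
s <- "xyzabc"                      # hypothetical input, for illustration only

# regexec() records the full match and every capture group;
# regmatches() pulls the matched substrings out as a character vector.
m <- regmatches(s, regexec("a(.*)", s))[[1]]

m[1]   # the full match, which still includes the leading "a"
m[2]   # capture group 1: everything after the "a"
```

Here `m[1]` is `"abc"` and `m[2]` is `"bc"`, so indexing the second element is what drops the "a".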
Here's a start. To make this safe for general use, you'll need to have it properly escape any regex special characters passed in as `char`:
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:", "", "abcd", "____abcd")
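matchToNth <- function(char, n) {
    others <- paste0("[^", char, "]*")   ## is "[^_]*" if char is "_"
    ## n-1 repetitions of "run of non-char, then char", followed by one more run of non-char
    mainPat <- paste0(c(rep(c(others, char), n-1), others), collapse="")
    ## group 1: start of string up to (not including) the nth char; group 2: the rest
    paste0("(^", mainPat, ")", "(.*$)")
}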
gsub(matchToNth("_", 2), "\\1", x)
# [1] "a_b" "1_2" "<_?" "" "abcd" "_"
gsub(matchToNth("_", 3), "\\1", x)
# [1] "a_b_c" "1_2_3" "<_?_." "" "abcd" "__"
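As for the escaping caveat, one possible way to handle it (a sketch, not the only option) is to switch to `perl = TRUE` and wrap the delimiter in `\Q...\E` so it is always taken literally. `matchToNthLiteral` is a hypothetical name, and the negated character class is swapped for a negative lookahead so that characters like `^` or `]` can't break the bracket expression. It assumes a single-character delimiter.

```r
## Hypothetical variant of matchToNth: treats `char` literally.
## The returned pattern must be used with perl = TRUE.
matchToNthLiteral <- function(char, n) {
    stopifnot(nchar(char) == 1, n >= 1)
    lit    <- paste0("\\Q", char, "\\E")      ## delimiter, quoted so it is literal
    others <- paste0("(?:(?!", lit, ").)*")   ## run of characters that are not the delimiter
    mainPat <- paste0(c(rep(c(others, lit), n - 1), others), collapse = "")
    paste0("(^", mainPat, ")", "(.*$)")
}

y <- c("a.b.c.d", "1.2.3.4", "abcd")
sub(matchToNthLiteral(".", 2), "\\1", y, perl = TRUE)
# [1] "a.b" "1.2" "abcd"
sub(matchToNthLiteral("|", 3), "\\1", "a|b|c|d", perl = TRUE)
# [1] "a|b|c"
```

With the original `matchToNth`, a delimiter such as `"."` or `"|"` would be read as a regex operator outside the brackets, which is the case this variant is meant to cover.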