I am trying to remove all digits in a string except the first set of digits. In other words, the string may contain anywhere from one to ten or more sets of digits, and I want to keep only the first set along with the rest of the string.
For example, the following string:
x <- 'foo123bar123baz123456abc1111def123456789'
The result would be:
foo123barbazabcdef
I have tried using gsub and replacing \d+ with an empty string, but this removes all digits in the string. I have also tried using groups to capture some of the results, but had no luck.
You could do this with the PCRE verbs (*SKIP)(*F).
^\D*\d+(*SKIP)(*F)|\d+
^\D*\d+ matches everything from the start of the string up to and including the first number. (*SKIP)(*F) then forces that match to fail and stops the engine from retrying inside the skipped text, so the engine moves on and tries the pattern on the right side of the |, that is \d+, against the remaining string. Because (*SKIP)(*F) are PCRE verbs, you need to pass the perl=TRUE parameter.
Code:
> x <- 'foo123bar123baz123456abc1111def123456789'
> gsub("^\\D*\\d+(*SKIP)(*F)|\\d+", "", x, perl=TRUE)
[1] "foo123barbazabcdef"
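As a rough sketch of how this behaves on other inputs, the same pattern can be wrapped in a small helper and applied to a character vector. The name keep_first_digits and the extra test strings are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: remove every run of digits except the first one.
# perl = TRUE is required because (*SKIP)(*F) are PCRE verbs.
keep_first_digits <- function(s) {
  gsub("^\\D*\\d+(*SKIP)(*F)|\\d+", "", s, perl = TRUE)
}

inputs <- c(
  "foo123bar123baz123456abc1111def123456789",  # string from the question
  "abc",                                       # no digits at all
  "12ab34",                                    # starts with digits
  "a1b"                                        # only one set of digits
)

result <- keep_first_digits(inputs)
print(result)
# [1] "foo123barbazabcdef" "abc"                "12ab"               "a1b"
```

Since gsub is vectorized over its input, each element is processed independently, so the "first set" is determined per string.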
With gsub you can also use \G, an anchor that can match at one of two positions: the start of the string or the end of the previous match.
x <- 'foo123bar123baz123456abc1111def123456789'
gsub('(?:\\d+|\\G(?<!^)\\D*)\\K\\d*', '', x, perl=TRUE)
# [1] "foo123barbazabcdef"
Explanation:
(?:             # group, but do not capture:
  \d+           #   digits (0-9) (1 or more times)
 |              #  OR
  \G(?<!^)      #   contiguous to the preceding match, not at the start of the string
  \D*           #   non-digits (all but 0-9) (0 or more times)
)\K             # end of grouping; \K resets the start of the reported match
\d*             # digits (0-9) (0 or more times)
Alternatively, you can use an optional group:
gsub('(?:^\\D*\\d+)?\\K\\d*', '', x, perl=TRUE)
Another way that I find useful, and that does not require the (*SKIP)(*F) backtracking verbs or the \G and \K features, is to use the alternation operator in context: place what you want to keep in a capturing group on the left side and what you want to discard on the right side (saying, in effect, throw this away, it's garbage).
gsub('^(\\D*\\d+)|\\d+', '\\1', x)
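To see why replacing with \\1 works, it may help to make the capture group visible. The sketch below wraps \\1 in square brackets purely for illustration (the brackets are not part of the answer): the left alternative fills group 1, while the right alternative leaves it empty, so those digit runs are replaced with nothing.

```r
x <- 'foo123bar123baz123456abc1111def123456789'

# Illustration only: show what group 1 holds for each match.
marked <- gsub('^(\\D*\\d+)|\\d+', '[\\1]', x)
print(marked)
# [1] "[foo123]bar[]baz[]abc[]def[]"

# The actual replacement keeps group 1 and drops everything else matched.
cleaned <- gsub('^(\\D*\\d+)|\\d+', '\\1', x)
print(cleaned)
# [1] "foo123barbazabcdef"
```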