I'm using R and need a regex for a block of N characters that starts with zero or more whitespace characters and continues with one or more digits.
For N = 9, here are examples of valid strings:
123456789
kfasdf  3456789asdf
a        1
and examples of invalid strings:
12345 789
1 9
a 678a
In a pattern such as ‹ ^[A-Z]{1,10}$ ›, the ‹ ^ › and ‹ $ › anchors ensure that the regex matches the entire subject string; without them, it could match up to 10 characters within longer text. The ‹ [A-Z] › character class matches any single uppercase letter from A to Z, and the interval quantifier ‹ {1,10} › repeats the character class 1 to 10 times.
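A minimal R sketch of the anchored pattern ‹ ^[A-Z]{1,10}$ › next to its unanchored form. The test vector is made up for illustration, and perl = TRUE is used to sidestep the locale-dependent interpretation of character ranges that R's own regex documentation warns about.

```r
# Illustrative test vector (made up for this sketch)
words <- c("HELLO", "ABCDEFGHIJK", "Hi", "xYZ")

# Anchored: the whole string must be 1 to 10 uppercase letters
anchored <- grepl("^[A-Z]{1,10}$", words, perl = TRUE)

# Unanchored: a run of uppercase letters anywhere in the string is enough
unanchored <- grepl("[A-Z]{1,10}", words, perl = TRUE)

print(anchored)    # [1]  TRUE FALSE FALSE FALSE
print(unanchored)  # [1] TRUE TRUE TRUE TRUE

# Without anchors, the 11-letter string still yields a 10-character match
first10 <- regmatches(words[2], regexpr("[A-Z]{1,10}", words[2], perl = TRUE))
print(first10)     # [1] "ABCDEFGHIJ"
```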
Specific characters can be matched by listing them inside square brackets, which defines a character class. For example, the pattern [abc] matches a single a, b, or c and nothing else.
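A small R illustration of the bracket syntax, using a made-up sample string: each match of [abc] is exactly one character, whereas the unbracketed "abc" is a literal three-character sequence.

```r
x <- "a big cab"

# [abc] matches exactly one character that is a, b or c
hits <- regmatches(x, gregexpr("[abc]", x))[[1]]
print(hits)  # [1] "a" "b" "c" "a" "b"

# Without brackets, "abc" must appear as a contiguous sequence, which x lacks
literal <- grepl("abc", x)
print(literal)  # [1] FALSE
```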
(?i) makes the regex case insensitive. (?s), for "single line mode", makes the dot match all characters, including line breaks.
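A short R check of both inline modifiers, assuming perl = TRUE (PCRE), where the dot does not match a newline unless (?s) is set. The sample string is made up.

```r
txt <- "Hello\nWorld"

# (?i): case-insensitive matching
cs <- grepl("hello", txt, perl = TRUE)
ci <- grepl("(?i)hello", txt, perl = TRUE)
print(c(cs, ci))  # [1] FALSE  TRUE

# (?s): let the dot also match the line break between the two words
dot_plain <- grepl("Hello.World", txt, perl = TRUE)
dot_s <- grepl("(?s)Hello.World", txt, perl = TRUE)
print(c(dot_plain, dot_s))  # [1] FALSE  TRUE
```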
Another option is to match, 8 times, either a digit or a horizontal whitespace char that is not preceded by a digit, and then match a final digit, giving 9 characters in total.
(?<![\d\h])(?>\d|(?<!\d)\h){8}\d
In parts:
(?<![\d\h])  Negative lookbehind, assert that the character to the left is neither a digit nor a horizontal whitespace char
(?>  Atomic group (no backtracking)
  \d  Match a digit
  |  Or
  (?<!\d)\h  Match a horizontal whitespace char, asserting that it is not preceded by a digit
){8}  Close the group and repeat it 8 times
\d  Match the last (9th) digit
Regex demo | R demo
Example code (the lookbehinds, the atomic group and \h require perl=TRUE):
x <- "123456789
kfasdf  3456789asdf
a        1
12345 789
1 9
a 678a"
regmatches(x, gregexpr("(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d", x, perl=TRUE))
Output
[[1]]
[1] "123456789" "  3456789" "        1"
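The question asks for N characters in general. Because the group is repeated N - 1 times before the final digit, a sketch can build the pattern with sprintf(). block_pattern() and extract_blocks() are hypothetical helper names, and the input here is a character vector with one example per element rather than one multi-line string.

```r
# Hypothetical helper: the pattern above with {8} replaced by {n - 1}
block_pattern <- function(n) {
  stopifnot(n >= 1)
  sprintf("(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){%d}\\d", as.integer(n - 1))
}

# Hypothetical helper: all N-character blocks found in each element of x
extract_blocks <- function(x, n) {
  regmatches(x, gregexpr(block_pattern(n), x, perl = TRUE))
}

x <- c("123456789", "kfasdf  3456789asdf", "a        1", "12345 789")
res9 <- extract_blocks(x, 9)
str(res9)

# The same idea for N = 5: two spaces followed by three digits
res5 <- extract_blocks("ab  123cd", 5)
str(res5)
```

Elements without a valid block come back as character(0), so the result always has one list entry per input element.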
If no digit may follow the 9th digit, end the pattern with a negative lookahead, (?!\d), asserting that the next character is not a digit.
(?<![\d\h])(?>\d|(?<!\d)\h){8}\d(?!\d)
Regex demo
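A quick R comparison of the pattern with and without the trailing (?!\d), using a made-up 10-digit string: without the lookahead the first nine digits are returned, with it there is no match at all.

```r
s <- "1234567890"

without_la <- "(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d"
with_la <- "(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d(?!\\d)"

m1 <- regmatches(s, gregexpr(without_la, s, perl = TRUE))[[1]]
m2 <- regmatches(s, gregexpr(with_la, s, perl = TRUE))[[1]]

print(m1)  # [1] "123456789"
print(m2)  # character(0)
```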
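If the only requirement is that no digit appears directly on either side of the block (a preceding whitespace char is allowed), relax the leading lookbehind so that it excludes digits only: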
(?<!\d)(?>\d|(?<!\d)\h){8}\d(?!\d)
Regex demo
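To see how the relaxed lookbehind differs, this R sketch runs the (?<![\d\h]) version and the (?<!\d) version (both with the trailing lookahead) on a made-up string with nine spaces before a single digit. Only the relaxed version matches, taking the last eight spaces plus the digit.

```r
s <- paste0("x", strrep(" ", 9), "1")  # "x", nine spaces, "1"

strict <- "(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d(?!\\d)"
relaxed <- "(?<!\\d)(?>\\d|(?<!\\d)\\h){8}\\d(?!\\d)"

m_strict <- regmatches(s, gregexpr(strict, s, perl = TRUE))[[1]]
m_relaxed <- regmatches(s, gregexpr(relaxed, s, perl = TRUE))[[1]]

print(m_strict)           # character(0)
print(nchar(m_relaxed))   # [1] 9
```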
Using the vector s from @d.b's answer, extract optional whitespace followed by digits:
library(stringr)
str_extract(s, '(\\s+)?\\d+')
#[1] "123456789" "  3456789" "        1" "12345" "1" " 678"
Then check their length using nchar:
nchar(str_extract(s, '(\\s+)?\\d+')) == 9
#[1] TRUE TRUE TRUE FALSE FALSE FALSE
The same logic using base R functions:
nchar(regmatches(s, regexpr('(\\s+)?\\d+', s))) == 9
#[1] TRUE TRUE TRUE FALSE FALSE FALSE
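One caveat with the base R version: regmatches() with regexpr() drops elements that contain no match, so its result can be shorter than s and no longer line up with it. A sketch that keeps one result per element reads the match.length attribute instead, which is -1 when there is no match. The vector s below is an assumption: @d.b's exact strings are not reproduced here, so the whitespace in the invalid examples is illustrative, and an extra element without digits is added.

```r
# Assumed example vector; the last element has no digits at all
s <- c("123456789", "kfasdf  3456789asdf", "a        1",
       "12345 789", "1       9", "a     678a", "no digits")

m <- regexpr("(\\s+)?\\d+", s)

# regmatches() keeps only the 6 matches, so it no longer lines up with s
n_kept <- length(regmatches(s, m))
print(n_kept)  # [1] 6

# match.length is -1 for elements without a match, so they simply give FALSE
ok <- attr(m, "match.length") == 9
print(ok)  # [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
```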
If a string can contain several such blocks, use str_extract_all:
sapply(str_extract_all(s, '(\\s+)?\\d+'), function(x) any(nchar(x) == 9))
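Without stringr, a base R sketch of the same check uses gregexpr() with regmatches(), which return one character vector of matches per element (character(0) when there are none). The vector s2 is made up for illustration.

```r
s2 <- c("ab 12 cd  3456789", "only 12 here", "none")

# All runs of optional whitespace followed by digits, per element
blocks <- regmatches(s2, gregexpr("(\\s+)?\\d+", s2))

# TRUE if any run in that element is exactly 9 characters long
has9 <- vapply(blocks, function(b) any(nchar(b) == 9), logical(1))
print(has9)  # [1]  TRUE FALSE FALSE
```

vapply() is used instead of sapply() so the result is guaranteed to be a logical vector, even for elements with no match.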