
Regex force length of specific regex [closed]

Tags:

regex

r

I'm using R and need a regex for

a block of exactly N characters that starts with zero or more whitespace characters and continues with one or more digits

For N = 9 here are

examples of valid strings

  • 123456789
  • kfasdf  3456789asdf
  • a        1

and examples of invalid strings

  • 12345 789
  • 1       9
  • a     678a

s1624210 asked Mar 20 '20 15:03



2 Answers

Another option is to match, 8 times, either a digit or a horizontal whitespace character that is not preceded by a digit, and then match one final digit.

(?<![\d\h])(?>\d|(?<!\d)\h){8}\d

In parts

  • (?<![\d\h]) Negative lookbehind, assert what is on the left is not a horizontal whitespace char or digit
  • (?> Atomic group (no backtracking)
    • \d Match a digit
    • | Or
    • (?<!\d)\h Assert that what is directly to the left is not a digit, then match a horizontal whitespace char
  • ){8} Close the group and repeat 8 times
  • \d Match the last digit
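
The pattern above hard-codes N = 9 as {8} plus one final digit. Here is a minimal sketch of building the same pattern for an arbitrary N. The helper name block_pattern is hypothetical and not part of the original answer.

```r
# Hypothetical helper (name is an assumption): build the pattern for a block
# of n characters = zero or more horizontal whitespace chars, then digits.
block_pattern <- function(n) {
  stopifnot(n >= 1)
  # n - 1 repetitions of "digit, or whitespace not preceded by a digit",
  # followed by one mandatory final digit.
  sprintf("(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){%d}\\d", as.integer(n - 1))
}

pat9 <- block_pattern(9)
txt  <- "kfasdf  3456789asdf"

# perl = TRUE is required for lookbehind, atomic groups and \h.
hit <- regmatches(txt, regexpr(pat9, txt, perl = TRUE))
print(hit)
```

For n = 9 this should produce the same pattern string as the one used in the answer, so it can be dropped into the regmatches() call unchanged.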


Example code, using perl=TRUE

x <- "123456789
kfasdf  3456789asdf
a        1

12345 789
1       9
a     678a"
regmatches(x, gregexpr("(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d", x, perl=TRUE))

Output

[[1]]
[1] "123456789" "  3456789" "        1"

If the match must not be followed by another digit (that is, the 9th character has to be the last digit of the run), you can end the pattern with a negative lookahead asserting that the next character is not a digit.

(?<![\d\h])(?>\d|(?<!\d)\h){8}\d(?!\d)


If digits are not allowed directly on either side of the match (leading whitespace before it is then permitted):

 (?<!\d)(?>\d|(?<!\d)\h){8}\d(?!\d)
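
To see where the three variants differ, here is a small sketch in R. The two input strings and the helper first_match are illustrative assumptions, not taken from the original post.

```r
# The three variants from the answer, written as R string literals.
p_base   <- "(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d"
p_nodig  <- "(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d(?!\\d)"
p_either <- "(?<!\\d)(?>\\d|(?<!\\d)\\h){8}\\d(?!\\d)"

# Illustrative inputs (assumptions, not from the original post).
ten_digits <- "1234567890"
padded     <- "   123456789"   # 3 spaces followed by 9 digits

# Hypothetical helper: first match, or character(0) when nothing matches.
first_match <- function(pattern, text) {
  regmatches(text, regexpr(pattern, text, perl = TRUE))
}

first_match(p_base,   ten_digits)  # "123456789" (the trailing 0 is ignored)
first_match(p_nodig,  ten_digits)  # character(0): a 10th digit follows
first_match(p_base,   padded)      # "   123456"
first_match(p_nodig,  padded)      # character(0)
first_match(p_either, padded)      # "123456789"
```

The last two calls show the effect of relaxing the lookbehind: with (?<![\d\h]) a match may not start right after whitespace, while (?<!\d) lets the block begin at the first digit of the padded run.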


The fourth bird answered Oct 18 '22 20:10


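Using the character vector s from @d.b's answer (judging by the output below, the six example strings from the question, one per element).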

Extract the first run of optional whitespace followed by digits from each string.

library(stringr)
str_extract(s, '(\\s+)?\\d+')
#[1] "123456789" "  3456789" "        1" "12345"     "1"         "     678" 

Check their length using nchar.

nchar(str_extract(s, '(\\s+)?\\d+')) == 9
#[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

Using the same logic in base R function.

nchar(regmatches(s, regexpr('(\\s+)?\\d+', s))) == 9
#[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

If there could be multiple such runs in one string, we can use str_extract_all:

sapply(str_extract_all(s, '(\\s+)?\\d+'), function(x) any(nchar(x) == 9))
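
For completeness, a self-contained base R sketch of the same multi-match check. It assumes s holds the six example strings from the question (the original definition of s lives in another answer), and the names runs and has_block are made up for illustration.

```r
# Assumption: `s` is the six example strings from the question.
s <- c("123456789", "kfasdf  3456789asdf", "a        1",
       "12345 789", "1       9", "a     678a")

# All runs of optional whitespace followed by digits, one list entry per string.
runs <- regmatches(s, gregexpr("(\\s+)?\\d+", s))

# TRUE when at least one run in a string is exactly 9 characters long.
has_block <- vapply(runs, function(x) any(nchar(x) == 9), logical(1))
print(has_block)
```

vapply is used here instead of sapply so the result is guaranteed to be a logical vector even if s is empty.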

Ronak Shah answered Oct 18 '22 21:10