I'm using R and need a regex for <blockquote> a block of N characters starting with zero or more whitespace characters and continuing with one or more digits afterwards </blockquote> For N = 9, here are examples of valid strings: <ul> <li><code>123456789</code></li> <li><code>kfasdf  3456789asdf</code></li> <li><code>a        1</code></li> </ul> and examples of invalid strings: <ul> <li><code>12345 789</code></li> <li><code>1       9</code></li> <li><code>a     678a</code></li> </ul>



Regex force length of specific regex [closed]

Q: How do you restrict length in regex?

The ‹^› and ‹$› anchors ensure that the regex matches the entire subject string; otherwise, it could match 10 characters within longer text. The ‹[A-Z]› character class matches any single uppercase character from A to Z, and the interval quantifier ‹{1,10}› repeats the character class from 1 to 10 times.
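The pattern being described is not shown on the page; from the description it appears to be ^[A-Z]{1,10}$. A minimal sketch of how the anchors change the result in R, with made-up input strings and perl = TRUE so that [A-Z] does not depend on the locale:

```r
# Hypothetical inputs, chosen only to illustrate the anchored pattern.
ids <- c("ABC", "ABCDEFGHIJ", "ABCDEFGHIJK", "abc", "")

# Anchored: the whole string must consist of 1 to 10 uppercase letters.
anchored <- grepl("^[A-Z]{1,10}$", ids, perl = TRUE)

# Unanchored: an 11-letter string still contains a run of 1 to 10 letters.
unanchored_long <- grepl("[A-Z]{1,10}", "ABCDEFGHIJK", perl = TRUE)

print(anchored)
print(unanchored_long)
```

Only the first two strings pass the anchored check, while the unanchored pattern accepts the 11-letter string because it finds a shorter run inside it.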

2 Answers

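Another option is to match, 8 times, either a digit or a whitespace character that is not preceded by a digit, and then match a digit at the end. That gives 9 characters in total, and whitespace can never follow a digit inside the block.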

(?<![\d\h])(?>\d|(?<!\d)\h){8}\d

In parts

(?<![\d\h]) Negative lookbehind, assert what is on the left is not a horizontal whitespace char or digit
(?> Atomic group (no backtracking)
- \d Match a digit
- | Or
- (?<!\d)\h Match a horizontal whitespace char, asserting that it is not preceded by a digit
){8} Close the group and repeat 8 times
\d Match the last digit
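The question asks for a block of N characters, while the pattern hard-codes N = 9 as {8} plus a final digit. One way to generalise it, as a sketch: build_pattern() below is a hypothetical helper (not part of base R or of the original answer) that inserts N - 1 as the repeat count.

```r
# build_pattern() is a hypothetical helper, not part of base R.
# It repeats the "digit or whitespace-not-after-digit" group n - 1 times
# and then requires one final digit, for a block of n characters.
build_pattern <- function(n) {
  n <- as.integer(n)
  stopifnot(n >= 1L)
  sprintf("(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){%d}\\d", n - 1L)
}

# The six examples from the question, one string per element.
x <- c("123456789", "kfasdf  3456789asdf", "a        1",
       "12345 789", "1       9", "a     678a")

pattern <- build_pattern(9)
found <- grepl(pattern, x, perl = TRUE)
print(found)
```

With n = 9 the helper reproduces the pattern above, and grepl() reports the first three strings as valid and the last three as invalid.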


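Example code, using perl=TRUE (lookbehinds, atomic groups and \h need the PCRE engine)

# One multi-line string holding all six examples from the question
x <- "123456789
kfasdf  3456789asdf
a        1

12345 789
1       9
a     678a"
# gregexpr() locates every match; regmatches() extracts the matched text
regmatches(x, gregexpr("(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d", x, perl=TRUE))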

Output

[[1]]
[1] "123456789" "  3456789" "        1"

If no digit may follow the ninth character, end the pattern with a negative lookahead asserting that the next character is not a digit.

(?<![\d\h])(?>\d|(?<!\d)\h){8}\d(?!\d)


If there may be no digit on either side of the match:

(?<!\d)(?>\d|(?<!\d)\h){8}\d(?!\d)
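To see where the three variants differ, here is a small comparison sketch. The input strings are made up for illustration: ten digits in a row, ten spaces before a single digit, and an exact nine-digit block.

```r
# The three pattern variants, copied from the answer text.
p_plain     <- "(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d"
p_lookahead <- "(?<![\\d\\h])(?>\\d|(?<!\\d)\\h){8}\\d(?!\\d)"
p_no_digit  <- "(?<!\\d)(?>\\d|(?<!\\d)\\h){8}\\d(?!\\d)"

# Hypothetical inputs; strrep() avoids miscounting the ten spaces.
x <- c("1234567890", paste0("a", strrep(" ", 10), "1"), "123456789")

# One column per variant, one row per input string.
res <- sapply(
  c(plain = p_plain, lookahead = p_lookahead, no_digit = p_no_digit),
  function(p) grepl(p, x, perl = TRUE)
)
rownames(res) <- x
print(res)
```

The first pattern accepts the first nine of the ten digits, while both lookahead variants reject that string. The last variant only forbids a digit on the left, so it also accepts a block taken from the end of a longer run of whitespace, which the first two reject.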


answered Oct 18 '22 20:10

The fourth bird

Using the character vector s from @d.b's answer (one example string per element).

Extract optional whitespace followed by digits.

library(stringr)
str_extract(s, '(\\s+)?\\d+')
#[1] "123456789" "  3456789" "        1" "12345"     "1"         "     678"

Check the length of each match using nchar.

nchar(str_extract(s, '(\\s+)?\\d+')) == 9
#[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

The same logic using base R functions.

nchar(regmatches(s, regexpr('(\\s+)?\\d+', s))) == 9
#[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

If a string can contain several such runs, use str_extract_all and check whether any of them has length 9:

sapply(str_extract_all(s, '(\\s+)?\\d+'), function(x) any(nchar(x) == 9))
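The vector s is defined in another answer that is not reproduced here. As a self-contained sketch, assume it holds the six example strings from the question; the same "all matches, any of length 9" check can then be written in base R with gregexpr() instead of stringr:

```r
# Assumption: s holds the six example strings from the question.
s <- c("123456789", "kfasdf  3456789asdf", "a        1",
       "12345 789", "1       9", "a     678a")

# All runs of optional whitespace followed by digits, per string.
hits <- regmatches(s, gregexpr("\\s*\\d+", s, perl = TRUE))

# TRUE when at least one run in a string is exactly 9 characters long.
has_block <- vapply(hits, function(m) any(nchar(m) == 9), logical(1))
print(has_block)
```

Here "\\s*" is used as an equivalent of "(\\s+)?", and vapply() is chosen over sapply() so that the result is always a logical vector, even for empty input.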

answered Oct 18 '22 21:10

Ronak Shah
