
How should I write a Python regex for sequences of one- or two-digit numbers?

Tags: python, regex

I'm trying to create a Python regex for "sequences of one- or two-digit numbers, separated by optional multiple spaces or an optional single comma."

For example:

"   1" Should test good
"    1  2     3 3  4 5 7 17" Should test good
" 1, 2,3,11,74" Should test good
"1,11,14, 15" Should test good

"111, 101" Should not test good
"1 2 3  a" Should not test good
"1, 25, 5.0 " Should not test good
"1,, 7, 80" Should not test good
"1,11,14," Should not test good

Commas should only appear between numbers (optionally surrounded by whitespace). That's why the last example shouldn't test good.

I tried with this:

^\s*\d{1,2}(\s*\,?\d{1,2}\s*\,?)*\s*$

But the results were not good: for example, "11111" tests good. How should I write my regex?
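
The behaviour described above can be reproduced with Python's built-in re module. This is only a small sketch; the helper name attempt_accepts is made up for illustration. Because both the comma and the whitespace inside the repeated group are optional, the group can consume further digits with no separator at all, so "11111" is read as "11", "11", "1".

```python
import re

# The pattern from the question, copied unchanged.
ATTEMPT = re.compile(r"^\s*\d{1,2}(\s*\,?\d{1,2}\s*\,?)*\s*$")

def attempt_accepts(text):
    """Return True if the attempted pattern matches the whole string."""
    return ATTEMPT.match(text) is not None

# Both inputs are supposed to be rejected, yet the attempt accepts them:
# every separator inside the repeated group is optional.
print(attempt_accepts("11111"))
print(attempt_accepts("1,11,14,"))
```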

diegoaguilar asked Dec 25 '22 21:12


2 Answers

Using the third-party regex module for Python (which, unlike the built-in re module, supports \G and \K), you can use this (rather convoluted!) regex:

(?:^\s*|\G)\s*(?:,\s*)?\K(\b\d{1,2}\b)(?=(?:\s*(?:,\s*)?\b\d{1,2}\b)*$)

regex101 demo

(?:^\s*|\G)                    # Matches beginning of line and any spaces, or at the end of the previous match
\s*(?:,\s*)?                   # Spaces and optional comma
\K                             # Resets the match
(\b\d{1,2}\b)                  # Match and capture 1-2 digits
(?=                            # Makes sure there is (ahead) ...
  (?:
     \s*(?:,\s*)?\b\d{1,2}\b   # A sequence of spaces (with optional comma) and 1-2 digits...
  )*                           # ... any number of times until...
$)                             # ... the end of the line

This one should be faster:

(?:^(?=(?:\s*(?:,\s*)?\b\d{1,2}\b)*$)|\G)\s*(?:,\s*)?\K(\b\d{1,2}\b)
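
If installing the regex module is not an option, a similar result can probably be had with the built-in re module by validating the whole string first and extracting the numbers afterwards. The sketch below is not the pattern from this answer: the names SEQUENCE, is_valid and parse are illustrative, and it assumes that whitespace is allowed on both sides of a comma and at both ends of the string.

```python
import re

# Assumed grammar: one- or two-digit numbers separated either by
# whitespace or by a single comma with optional whitespace around it.
# Leading and trailing whitespace is tolerated (an assumption).
SEQUENCE = re.compile(r"\s*\d{1,2}(?:(?:\s*,\s*|\s+)\d{1,2})*\s*")

def is_valid(text):
    """Return True if the whole string is a well-formed sequence."""
    return SEQUENCE.fullmatch(text) is not None

def parse(text):
    """Validate the string, then return its numbers as integers."""
    if not is_valid(text):
        raise ValueError("not a valid number sequence: %r" % text)
    # Validation guarantees every digit run has one or two digits.
    return [int(n) for n in re.findall(r"\d+", text)]

print(parse(" 1, 2,3,11,74"))
print(is_valid("1,11,14,"))
```

Requiring a separator before every number after the first is what rules out inputs such as "111, 101" and "1,, 7, 80".
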

Jerry answered Dec 27 '22 11:12


This regex should work:

^(\s*\d{1,2}\s*$)|^((\s*\d{1,2}\s*[\,\s]\s*\d{1,2}\s*))+([\,\s]\s*\d{1,2}\b\s*)*$

Note that to match between one and two repetitions you use {1,2}, where the number before the comma is the lower bound and the number after the comma is the upper bound.

The pattern matches either ^(\s*\d{1,2}\s*$) or ^((\s*\d{1,2}\s*[\,\s]\s*\d{1,2}\s*))+([\,\s]\s*\d{1,2}\b\s*)*$.

For the first option, we first look for the beginning of the string ^. Next, we look for any amount of optional whitespace \s*, followed by a number of one or two digits \d{1,2}, followed by any amount of optional whitespace, then the end of the string $.

For the second option, we allow optional whitespace \s*, followed by a one- or two-digit number \d{1,2}, followed by optional whitespace \s*. Next we allow either a comma or a whitespace character [\,\s]. Then we allow optional whitespace again \s*, followed by one or two digits \d{1,2}, followed by optional whitespace \s*. This group must occur at least once (hence the +) to be considered a match, so whitespace alone or anything starting with a comma will not match.

It can be followed by a comma or whitespace character [\,\s], followed by any amount of whitespace \s*, followed by a one- or two-digit number \d{1,2}. This is followed by a word boundary \b and any amount of optional whitespace \s*. This last group can occur any number of times, including zero (hence the *), and is followed by $, the end of the string.
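
A quick way to check this pattern against the examples from the question is a small harness using the built-in re module. This is only a sketch; the names PATTERN and matches are illustrative. It also probes one input that is not in the question, "1 234 5": since the repeated first group has no required separator between its repetitions, that input appears to be accepted as well, which is worth keeping in mind.

```python
import re

# The pattern from this answer, copied unchanged.
PATTERN = re.compile(
    r"^(\s*\d{1,2}\s*$)|^((\s*\d{1,2}\s*[\,\s]\s*\d{1,2}\s*))+([\,\s]\s*\d{1,2}\b\s*)*$"
)

def matches(text):
    """Return True if the pattern accepts the string."""
    return PATTERN.match(text) is not None

should_pass = ["   1", "    1  2     3 3  4 5 7 17", " 1, 2,3,11,74", "1,11,14, 15"]
should_fail = ["111, 101", "1 2 3  a", "1, 25, 5.0 ", "1,, 7, 80", "1,11,14,"]

for text in should_pass + should_fail:
    print(repr(text), matches(text))

# Extra probe, not one of the question's examples: the first group can
# repeat back to back ("1 23" then "4 5"), so a three-digit run slips through.
print(repr("1 234 5"), matches("1 234 5"))
```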

Moishe Lipsker answered Dec 27 '22 11:12