
Regular Expression for whole numbers and integers?

Tags: python, regex

I am trying to detect all integers and whole numbers (among a lot of other things) from a string. Here are the regular expressions I am currently using:

Whole numbers: r"[0-9]+"

Integers: r"[+,-]?[0-9]+"

Here are the issues:

  1. The whole-number regex also matches the digits of negative numbers, which I cannot have. How do I solve this? If I put a space at the start of the regex I get only positive numbers, but then the output starts with a space!
  2. For whole numbers, I would like to detect positive numbers written in the format +[0-9] but store them without the sign.
  3. For integers, I would like to store every positive integer with its sign, irrespective of whether the sign is present in the original string.

Almost done now. One last thing: I have a string that says "Add 10 and -15". I want to store the integers in a list, which I do using findall(). While storing the numbers, is it possible to store '10' as '+10'?

Sahil Thapar asked May 27 '13 13:05



2 Answers

For positive integers, use

r"(?<![-.])\b[0-9]+\b(?!\.[0-9])"

Explanation:

(?<![-.])   # Assert that the previous character isn't a minus sign or a dot.
\b          # Anchor the match to the start of a number.
[0-9]+      # Match a number.
\b          # Anchor the match to the end of the number.
(?!\.[0-9]) # Assert that no decimal part follows.
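
To see how the lookarounds behave, here is a minimal sketch that runs the pattern above through re.findall(). The sample string and the names WHOLE and find_whole_numbers are made up for illustration. Because the sign is never part of the match, a number written as +7 comes back as 7.

```python
import re

# Pattern from the answer: digits not preceded by '-' or '.',
# and not followed by a decimal part.
WHOLE = re.compile(r"(?<![-.])\b[0-9]+\b(?!\.[0-9])")

def find_whole_numbers(text):
    """Return all unsigned whole numbers in text as strings."""
    return WHOLE.findall(text)

# -15 is rejected by the lookbehind, 3.14 by the lookahead and lookbehind.
print(find_whole_numbers("Add 10 and -15, then 3.14 and +7"))  # ['10', '7']
```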

For signed/unsigned integers, use

r"[+-]?(?<!\.)\b[0-9]+\b(?!\.[0-9])"

The word boundaries \b are crucial to make sure that the entire number is matched: without them, the engine could match just part of a number, such as the 4 in 3.14.
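
A short sketch of the signed pattern in use, again with an invented sample string and hypothetical names (SIGNED, find_integers). A sign is kept only when the input has one.

```python
import re

# Signed/unsigned pattern from the answer: optional sign, then digits
# that are not part of a decimal number.
SIGNED = re.compile(r"[+-]?(?<!\.)\b[0-9]+\b(?!\.[0-9])")

def find_integers(text):
    """Return all integers in text as strings, signs preserved as written."""
    return SIGNED.findall(text)

print(find_integers("Add 10 and -15, then 3.14 and +7"))  # ['10', '-15', '+7']
```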

Tim Pietzcker answered Oct 10 '22 10:10



You almost had it.

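import re

# Group 1 captures unsigned numbers, group 2 captures signed ones.
regex = re.compile(r'(\d+)|([\+-]?\d+)')

s = "1 2 3 4 5 6 +1 +2 +3 -1 -2 -3 +654 -789 321"
# findall() yields one (group1, group2) tuple per match; the unused group is ''.
for r in regex.findall(s):
    if r[0]:
        # whole (unsigned)
        print('whole', r[0])
    elif r[1]:
        # a signed integer
        print('signed', r[1])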

Results:

>>> 
whole 1
whole 2
whole 3
whole 4
whole 5
whole 6
signed +1
signed +2
signed +3
signed -1
signed -2
signed -3
signed +654
signed -789
whole 321

Or, you could use "or" to get the actual result in a "nicer" way:

print([r[0] or r[1] for r in regex.findall(s)])
>>> 
['1', '2', '3', '4', '5', '6', '+1', '+2', '+3', '-1', '-2', '-3', '+654', '-789', '321']

Edit: As per your question "is it possible to store '10' as '+10'":

import re

def _sign(num):
    # num is the (unsigned, signed) tuple that findall() returns for one match
    if num[0]:
        return '+%s' % num[0]
    else:
        return num[1]

regex = re.compile(r'(\d+)|([\+-]?\d+)')
s = "1 2 3 4 5 6 +1 +2 +3 -1 -2 -3 +654 -789 321"
print([_sign(r) for r in regex.findall(s)])
>>>
['+1', '+2', '+3', '+4', '+5', '+6', '+1', '+2', '+3', '-1', '-2', '-3', '+654', '-789', '+321']

Or in one line:

print(['+%s' % r[0] if r[0] else r[1] for r in regex.findall(s)])
>>> 
['+1', '+2', '+3', '+4', '+5', '+6', '+1', '+2', '+3', '-1', '-2', '-3', '+654', '-789', '+321']
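
Another way to force the sign, instead of tracking which group matched, is to convert each match to int and format it with the "+" flag. This is only a sketch: it assumes leading zeros do not need to be preserved (int() drops them), and the helper name signed_strings is hypothetical.

```python
import re

def signed_strings(text):
    """Return every integer in text as a string with an explicit sign."""
    # [+-]?\d+ is the integer pattern from the question without the comma
    # in the character class.
    matches = re.findall(r"[+-]?\d+", text)
    # format(n, "+d") always emits a sign: '+' for n >= 0, '-' otherwise.
    return [format(int(m), "+d") for m in matches]

print(signed_strings("Add 10 and -15"))  # ['+10', '-15']
```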
Inbar Rose answered Oct 10 '22 10:10
