I'm just learning regex and now I'm trying to match a number which more or less represents this:
[zero or more numbers][possibly a dot or comma][zero or more numbers]
No dot or comma is also okay. So it should match the following:
1
123
123.
123.4
123.456
.456
123, # From here it's the same but with commas instead of dot separators
123,4
123,456
,456
But it should not match the following:
0.,1
0a,1
0..1
1.1.2
100,000.99 # I know this and the one below are valid in many languages, but I simply want to reject these
100.000,99
So far I've come up with [0-9]*[.,][0-9]* but it doesn't seem to work so well:
>>> import re
>>> r = re.compile("[0-9]*[.,][0-9]*")
>>> if r.match('0.1.'): print 'it matches!'
...
it matches!
>>> if r.match('0.abc'): print 'it matches!'
...
it matches!
I have the feeling I'm doing two things wrong: I don't use match correctly AND my regex is not correct. Could anybody enlighten me on what I'm doing wrong? All tips are welcome!
The dot . in regex is a metacharacter: it matches any character (except a newline, by default). To match a literal dot in a raw Python string ( r"" or r'' ), you need to escape it, so r"\." . If the regular expression is stored in a regular (non-raw) Python string, use a double backslash ( \\ ) instead. Inside a character class such as [.,] the dot is already literal and needs no escaping.
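A small sketch of the escaping rules, using only the standard re module. The sample strings are made up for illustration, and re.fullmatch is used so that each pattern has to cover the whole string.

```python
import re

# Unescaped dot: matches any single character (except a newline).
any_char = re.fullmatch(r".", "a") is not None

# Escaped dot in a raw string: matches only a literal dot.
escaped_vs_letter = re.fullmatch(r"\.", "a") is not None
escaped_vs_dot = re.fullmatch(r"\.", ".") is not None

# Same pattern written in a normal (non-raw) string needs a doubled backslash.
non_raw = re.fullmatch("\\.", ".") is not None

# Inside a character class the dot is literal, so [.,] does not match "a".
in_class = re.fullmatch(r"[.,]", "a") is not None

print(any_char, escaped_vs_letter, escaped_vs_dot, non_raw, in_class)
```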
You need to make the [.,] part optional by adding ? after that character class, and don't forget to add anchors: ^ asserts that we are at the start and $ asserts that we are at the end.
^\d*[.,]?\d*$
>>> import re
>>> r = re.compile(r"^\d*[.,]?\d*$")
>>> if r.match('0.1.'): print 'it matches!'
...
>>> if r.match('0.abc'): print 'it matches!'
...
>>> if r.match('0.'): print 'it matches!'
...
it matches!
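To see why the original attempt accepted '0.1.' and '0.abc', it may help to print what was actually matched: match() only pins the start of the string, so a matching prefix is enough. The sketch below also shows re.fullmatch (Python 3.4+) as one possible alternative to writing ^ and $ by hand; the variable names are just for illustration.

```python
import re

unanchored = re.compile(r"[0-9]*[.,][0-9]*")  # the pattern from the question
optional_sep = re.compile(r"\d*[.,]?\d*")     # separator optional, no anchors

# match() succeeds as soon as a prefix fits; group() shows that prefix.
for text in ["0.1.", "0.abc"]:
    m = unanchored.match(text)
    print(repr(text), "matched prefix:", repr(m.group()))

# fullmatch() requires the pattern to cover the entire string.
for text in ["123,4", "0.1.", "0.abc", "123\n"]:
    print(repr(text), optional_sep.fullmatch(text) is not None)
```

One difference worth knowing: $ also matches just before a trailing newline, so the anchored pattern with match() accepts '123\n', while fullmatch() rejects it.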
If you don't want to allow a single comma or dot then use a lookahead.
^(?=.*?\d)\d*[.,]?\d*$
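A quick comparison of the two patterns on edge cases. Because every part of the first pattern is optional, it also accepts an empty string and a bare separator; the lookahead (?=.*?\d) demands at least one digit somewhere. The names plain and with_digit and the sample inputs are chosen here only for illustration.

```python
import re

plain = re.compile(r"^\d*[.,]?\d*$")
with_digit = re.compile(r"^(?=.*?\d)\d*[.,]?\d*$")

samples = ["", ".", ",", "0.", ".5", "123,456"]

# Collect (plain result, lookahead result) for each sample string.
results = {
    s: (plain.match(s) is not None, with_digit.match(s) is not None)
    for s in samples
}

for s, (p, w) in results.items():
    print(repr(s), "plain:", p, "with lookahead:", w)
```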
Your regex would work fine if you add ^ at the front and $ at the back, so that the regex engine knows where the string must begin and end, and make the separator optional with {0,1}.
Try this:
^[0-9]*[.,]{0,1}[0-9]*$
import re

# Strings from the question: the first ten should match, the rest should not.
checklist = ['1', '123', '123.', '123.4', '123.456', '.456', '123,', '123,4', '123,456', ',456', '0.,1', '0a,1', '0..1', '1.1.2', '100,000.99', '100.000,99', '0.1.', '0.abc']
pat = re.compile(r'^[0-9]*[.,]{0,1}[0-9]*$')
for c in checklist:
    if pat.match(c):
        print('%s : it matches' % c)
    else:
        print('%s : it does not match' % c)
1 : it matches
123 : it matches
123. : it matches
123.4 : it matches
123.456 : it matches
.456 : it matches
123, : it matches
123,4 : it matches
123,456 : it matches
,456 : it matches
0.,1 : it does not match
0a,1 : it does not match
0..1 : it does not match
1.1.2 : it does not match
100,000.99 : it does not match
100.000,99 : it does not match
0.1. : it does not match
0.abc : it does not match