Can somebody explain a money regex that just checks if the value matches some pattern?

There are multiple posts on here that capture the value, but I'm just looking to check whether the value matches a pattern. Put more vaguely: I'm looking to understand the difference between checking a value and "capturing" a value.

Here is a post that explains some of a money regex, but I don't understand it at all.

In the current case, the acceptable money formats are:

.50
50
50.00
50.0
$5000.00
$.50

I don't want commas (people should know that's ridiculous).

The things I'm having trouble with are:

  1. Allowing a $ at the start of the value (but still optional)
  2. Allowing only 1 decimal point (but not allowing it at the end)
  3. Understanding how the regex works internally
  4. Understanding how to get a normalized version out of it (only digits and the optional decimal point) that strips the dollar sign.

My current regex (which obviously doesn't work right) is:

import re

# I'm checking the Boolean of the following:
re.compile(r'^[\$][\d\.]$').search(value)

(Note: I'm working in Python)
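For reference, a quick check of what this pattern actually accepts (the sample inputs are chosen only for illustration). [\$] is a character class holding just the dollar sign, so the $ is required rather than optional, and [\d\.] has no quantifier, so it matches exactly one digit or dot. Between ^ and $, the whole value therefore has to be exactly two characters long:

```python
import re

# The regex from the question, unchanged.
current = re.compile(r'^[\$][\d\.]$')

# [\$]   -> exactly one literal "$" (required, not optional)
# [\d\.] -> exactly ONE digit or dot (no quantifier such as + or *)
# ^...$  -> nothing else may appear before or after
for value in ['$5', '$.', '$50', '50', '.50', '$5000.00']:
    print(repr(value), bool(current.search(value)))
```

Only '$5' and '$.' come back True; every format from the list above is rejected.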

orokusaki asked Jan 27 '10 at 21:01



1 Answer

Assuming you want to allow $5. but not 5., the following will accept your language:

import re

money = re.compile('|'.join([
  r'^\$?(\d*\.\d{1,2})$',  # e.g., $.50, .50, $1.50, $.5, .5
  r'^\$?(\d+)$',           # e.g., $500, $5, 500, 5
  r'^\$(\d+\.?)$',         # e.g., $5.
]))

Important pieces to understand:

  • ^ and $ match only at the beginning and end of the input string, respectively (in Python, $ also matches just before a trailing newline; use \Z if that matters)
  • \. matches a literal dot
  • \$ matches a literal dollar sign
    • \$? matches a dollar sign or nothing (i.e., an optional dollar sign)
  • \d matches any single digit (0-9)
    • \d* matches runs of zero or more digits
    • \d+ matches runs of one or more digits
    • \d{1,2} matches one or two digits
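Each piece in the list can be checked in isolation. This is a small sketch with patterns made up for illustration, where each assertion exercises one item:

```python
import re

# \$? : the dollar sign is optional (zero or one occurrence).
optional_dollar = re.compile(r'^\$?\d+$')
assert optional_dollar.match('$5') and optional_dollar.match('5')

# \. is a literal dot; an unescaped . matches any character.
assert re.match(r'^5\.0$', '5.0') and not re.match(r'^5\.0$', '5x0')
assert re.match(r'^5.0$', '5x0')

# \d* accepts zero digits; \d+ needs at least one.
assert re.match(r'^\d*$', '') and not re.match(r'^\d+$', '')

# \d{1,2} matches one or two digits, never three.
assert re.match(r'^\d{1,2}$', '50') and not re.match(r'^\d{1,2}$', '500')

# $ also matches just before a trailing newline; \Z only at the very end.
assert re.match(r'^\d+$', '50\n') and not re.match(r'^\d+\Z', '50\n')
```

If every assertion holds, the script prints nothing.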

The parenthesized subpatterns are capture groups: all text in the input matched by the subexpression in a capture group will be available in matchobj.group(index). The dollar sign won't be captured because it's outside the parentheses.
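To make the checking-versus-capturing distinction concrete, here is a sketch using a simplified whole-dollar pattern (chosen for illustration; it is not the full money regex above):

```python
import re

# One capture group around the digits; \$? sits outside the parentheses.
pattern = re.compile(r'^\$?(\d+)$')

# Checking: all you need is whether match() returned a Match or None.
is_valid = pattern.match('$500') is not None
print(is_valid)                          # True
print(pattern.match('5x') is not None)   # False

# Capturing: the Match object also remembers what each group matched.
m = pattern.match('$500')
print(m.group(0))   # $500  (the whole match)
print(m.group(1))   # 500   (only the text inside the parentheses)
```

Checking only ever looks at "match or no match"; capturing is the extra information you get from the same match through group().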

Because Python's re module doesn't support multiple capture groups with the same name (!!!), we must search through matchobj.groups() for the one that isn't None. This also means that, when modifying the pattern, you have to be careful to use non-capturing (?:...) groups for every group except the amount.
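A short sketch of what matchobj.groups() looks like with the three-alternative pattern, plus two ways to pull out the amount. The helper name amount_of is hypothetical; Match.lastindex is a standard re attribute, and it works here only because exactly one group participates in any match:

```python
import re

money = re.compile('|'.join([
    r'^\$?(\d*\.\d{1,2})$',
    r'^\$?(\d+)$',
    r'^\$(\d+\.?)$',
]))

m = money.match('$500')
# One slot per capture group; only the alternative that matched is filled in.
print(m.groups())   # (None, '500', None)


def amount_of(match):
    """Hypothetical helper: return the single group that is not None."""
    return next(g for g in match.groups() if g is not None)


print(amount_of(m))           # 500
# lastindex is the index of the last group that matched (2 here).
print(m.group(m.lastindex))   # 500

# (?:...) groups are not counted as capture groups, (...) groups are.
print(re.compile(r'^(?:\$)?(\d+)$').groups)   # 1
print(re.compile(r'^(\$)?(\d+)$').groups)     # 2
```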

Tweaking Mark's nice test harness, we get

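# Reconstructed from the output below: (input, should_match) pairs.
tests = [
    ('.50', True), ('50', True), ('50.00', True), ('50.0', True),
    ('$5000', True), ('$.50', True), ('$5.', True), ('5.', False),
    ('$5.000', False), ('5000$', False), ('$5.00$', False),
    ('$-5.00', False), ('$5,00', False), ('', False), ('$', False),
    ('.', False), ('.5', True),
]

for test, expected in tests:
    result = money.match(test)
    is_match = result is not None
    if is_match == expected:
        status = 'OK'
        if result:
            # exactly one group is non-None: the alternative that matched
            amt = [x for x in result.groups() if x is not None].pop()
            status += ' (%s)' % amt
    else:
        status = 'Fail'
    print(test + '\t' + status)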

Output:

.50     OK (.50)
50      OK (50)
50.00   OK (50.00)
50.0    OK (50.0)
$5000   OK (5000)
$.50    OK (.50)
$5.     OK (5.)
5.      OK
$5.000  OK
5000$   OK
$5.00$  OK
$-5.00  OK
$5,00   OK
        OK
$       OK
.       OK
.5      OK (.5)
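Tying this back to item 4 of the question (a normalized value with the dollar sign stripped), here is a minimal sketch that wraps the answer's pattern. normalize_money is a hypothetical name, not something from the original posts:

```python
import re

# Same three alternatives as the answer's pattern.
money = re.compile('|'.join([
    r'^\$?(\d*\.\d{1,2})$',
    r'^\$?(\d+)$',
    r'^\$(\d+\.?)$',
]))


def normalize_money(value):
    """Hypothetical helper: return the amount without the leading $,
    or None if value is not an accepted money format."""
    m = money.match(value)
    if m is None:
        return None
    # Only one group participates in a match, so take the non-None one.
    return next(g for g in m.groups() if g is not None)


# The formats listed in the question.
for value in ['.50', '50', '50.00', '50.0', '$5000.00', '$.50']:
    print(value, '->', normalize_money(value))
```

With this pattern, "$5." normalizes to "5." because the third alternative captures the trailing dot; call .rstrip('.') on the result if you would rather drop it. Inputs such as "5." return None.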
Greg Bacon answered Sep 30 '22 at 02:09
