Using regular expression to comma separate a large number in south asian numbering system

I am trying to find a regular expression to comma-separate a large number according to the South Asian numbering system.

A few examples:

  • 1,000,000 (international) is 10,00,000 (Indian/Hindu/South Asian)
  • 1,000,000,000 (international) is 100,00,00,000 (Indian/Hindu/South Asian)

The comma pattern repeats every 7 digits, for example 1,00,00,000,00,00,000.
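It can help to pin the rule down without a regex first, which also gives you something to test a pattern against. Here is a minimal sketch in plain Python. It assumes the input is a string of digits with no sign or decimal point, and `group_south_asian` is a hypothetical name:

```python
def group_south_asian(digits: str) -> str:
    """Insert commas using the repeating 3-2-2 grouping described above.

    Reading from the right, the group sizes are 3, 2, 2, 3, 2, 2, ...
    so the pattern repeats every 7 digits.
    """
    sizes = [3, 2, 2]  # cycle of group sizes, rightmost group first
    groups = []
    end = len(digits)
    i = 0
    while end > 0:
        start = max(0, end - sizes[i % 3])  # the last group may be shorter
        groups.append(digits[start:end])
        end = start
        i += 1
    return ",".join(reversed(groups))

print(group_south_asian(str(10**6)))   # 10,00,000
print(group_south_asian(str(10**9)))   # 100,00,00,000
print(group_south_asian(str(10**14)))  # 1,00,00,000,00,00,000
```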

From the book Mastering Regular Expressions by Friedl, I have the following regular expression for the international numbering system:

r'(?<=\d)(?=(\d{3})+(?!\d))'
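With `re.sub`, each zero-width match of this pattern is replaced by a comma, so a comma is effectively inserted at that position. Here is a short sketch (`international_commas` is an illustrative name):

```python
import re

# Friedl's pattern: a zero-width match at every position that has a digit
# before it and a multiple of three digits (then a non-digit) after it.
INTERNATIONAL = re.compile(r'(?<=\d)(?=(\d{3})+(?!\d))')

def international_commas(text: str) -> str:
    # Replacing each zero-width match with "," inserts a comma there.
    return INTERNATIONAL.sub(",", text)

print(international_commas("1000000"))     # 1,000,000
print(international_commas("1000000000"))  # 1,000,000,000
```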

For the Indian numbering system, I have come up with the following expression, but it doesn't work for numbers with more than 10 digits:

r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'

With this pattern, a 16-digit number such as 1000000000000000 comes out as 100000000,00,00,000.
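The limit comes from the lookahead. `(\d{2}){0,2}\d{3}` followed by `\b` can only match when exactly 3, 5 or 7 digits remain, so commas appear only within the last seven digits and nothing repeats beyond them. Here is a quick check using a few illustrative input lengths:

```python
import re

# The attempted pattern from the question, copied unchanged.
ATTEMPT = r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'

# Try inputs of 8, 10, 11 and 16 digits (a 1 followed by zeros).
for length in (8, 10, 11, 16):
    number = "1" + "0" * (length - 1)
    print(length, re.sub(ATTEMPT, ",", number))
# 8 1,00,00,000
# 10 100,00,00,000
# 11 1000,00,00,000
# 16 100000000,00,00,000
```

The fix is to let a 7-digit block repeat before the end-of-number check, for example with a `(\d{7})*` term.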

I am using the Python re module (re.sub()). Any ideas?

asked Jan 09 '13 14:01 by newbie


2 Answers

I know Tim has answered the question you asked, but if you start with numbers rather than strings, have you considered whether you need a regular expression at all? If your machine supports an Indian locale, you can just use the locale module:

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, "en_IN")
'en_IN'
>>> locale.format_string("%d", 10000000, grouping=True)
'1,00,00,000'

That interpreter session was copied from an Ubuntu system. Be aware that Windows systems may not provide a suitable locale (mine doesn't), so although this is in some ways a cleaner solution, whether you can use it depends on your environment.
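When the locale is missing, `locale.setlocale` raises `locale.Error`. A script that has to run on several machines may want to detect that case rather than crash. Here is a minimal sketch of one way to do that. It assumes the candidate locale names shown, which are guesses and not a portable list, and the function name is hypothetical:

```python
import locale

def indian_grouping_via_locale(n, candidates=("en_IN", "en_IN.UTF-8")):
    """Format n using an Indian locale's digit grouping, or return None.

    Assumption: the names in `candidates` are guesses; which locale names
    are installed (if any) varies by operating system and configuration.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting only
    for name in candidates:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
        except locale.Error:
            continue  # not installed under this name; try the next one
        try:
            # format_string is used because locale.format was removed in Python 3.12
            return locale.format_string("%d", n, grouping=True)
        finally:
            locale.setlocale(locale.LC_NUMERIC, previous)  # undo the global change
    return None

result = indian_grouping_via_locale(10000000)
print(result if result is not None else "no Indian locale available")
```

`locale.setlocale` changes process-wide state, so this is not safe to call from several threads at once. The sketch restores the previous `LC_NUMERIC` setting before returning. A caller that gets `None` can fall back to a regex.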

answered Oct 22 '22 20:10 by Duncan


Try this:

(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))

For example:

>>> import re
>>> inp = ["1" + "0"*i for i in range(20)]
>>> [re.sub(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))", ",", i)
...      for i in inp]
['1', '10', '100', '1,000', '10,000', '1,00,000', '10,00,000', '1,00,00,000', 
 '10,00,00,000', '100,00,00,000', '1,000,00,00,000', '10,000,00,00,000', 
 '1,00,000,00,00,000', '10,00,000,00,00,000', '1,00,00,000,00,00,000', 
 '10,00,00,000,00,00,000', '100,00,00,000,00,00,000', 
 '1,000,00,00,000,00,00,000', '10,000,00,00,000,00,00,000',
 '1,00,000,00,00,000,00,00,000']
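The pattern only looks at digits and what follows them, so `re.sub` can be applied to a whole sentence or to a signed number. Here is a short sketch that wraps it in a reusable compiled pattern (`indian_commas` is an illustrative name, not part of the answer):

```python
import re

# The pattern from this answer, compiled once so it can be reused.
INDIAN = re.compile(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))")

def indian_commas(text: str) -> str:
    """Insert commas into every run of digits found in `text`."""
    return INDIAN.sub(",", text)

print(indian_commas("Total: 12345678 rupees"))  # Total: 1,23,45,678 rupees
print(indian_commas(str(-1000000)))             # -10,00,000
```

One caveat: every run of digits is treated as a number, so a fractional part with four or more digits gets grouped too. For example, `indian_commas("3.14159")` returns `"3.14,159"`. If that matters, split off the part after the decimal point first.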

As a commented regex:

import re

subject = "1000000000000"  # sample input: 1 followed by 12 zeros
result = re.sub(
    r"""(?x)       # Enable verbose mode (comments)
    (?<=\d)        # Assert that we're not at the start of the number.
    (?=            # Assert that it's possible to match:
     (\d{2}){0,2}  # 0, 2 or 4 digits,
     \d{3}         # followed by 3 digits,
     (\d{7})*      # followed by 0, 7, 14, 21 ... digits,
     (?!\d)        # and no more digits after that.
    )              # End of lookahead assertion.""",
    ",", subject)
print(result)  # 1,00,000,00,00,000
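The inline `(?x)` flag can also be passed as `re.VERBOSE` when compiling, so the pattern is built once and reused. The sketch below also assumes inputs may be decimal strings such as "1234567.891011" and applies the pattern only to the integer part. `format_decimal` is a hypothetical helper, not part of the answer:

```python
import re

# The same pattern, compiled once with re.VERBOSE instead of (?x).
INDIAN = re.compile(
    r"""
    (?<=\d)        # not at the start of the number
    (?=            # lookahead:
     (\d{2}){0,2}  #   0, 2 or 4 digits,
     \d{3}         #   then 3 digits,
     (\d{7})*      #   then 0, 7, 14, ... digits,
     (?!\d)        #   and no further digit
    )
    """,
    re.VERBOSE,
)

def format_decimal(number: str) -> str:
    """Group only the integer part, leaving any fractional part untouched."""
    integer, dot, fraction = number.partition(".")
    return INDIAN.sub(",", integer) + dot + fraction

print(format_decimal("1234567.891011"))  # 12,34,567.891011
print(format_decimal("100000000"))       # 10,00,00,000
```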

answered Oct 22 '22 20:10 by Tim Pietzcker