I am trying to find a regular expression to comma-separate a large number according to the South Asian numbering system.
A few examples:
1,000,000 (Arabic) is 10,00,000 (Indian/Hindu/South Asian)
1,000,000,000 (Arabic) is 100,00,00,000 (Indian/H/SA)
The comma pattern repeats every 7 digits. For example: 1,00,00,000,00,00,000.
From the book Mastering Regular Expressions by Friedl, I have the following regular expression for the Arabic numbering system:
r'(?<=\d)(?=(\d{3})+(?!\d))'
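For reference, this is how that pattern behaves with re.sub (a minimal sketch; the sample value is arbitrary). It inserts a comma at every position that has a digit on its left and a non-zero multiple of three digits on its right.

```python
import re

# Friedl's pattern: a position preceded by a digit and followed by
# one or more groups of exactly three digits, then no further digit.
WESTERN = r'(?<=\d)(?=(\d{3})+(?!\d))'

print(re.sub(WESTERN, ",", "1000000000"))  # 1,000,000,000
```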
For the Indian numbering system, I have come up with the following expression, but it doesn't work for numbers with more than 10 digits:
r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'
For example, applying it to 1 followed by 15 zeros gives 100000000,00,00,000.
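A quick sketch of why it breaks down: the lookahead in that pattern only accepts 3, 5 or 7 remaining digits, so no comma can ever be placed further left than 7 digits from the end. The digit counts tried below are arbitrary illustrations.

```python
import re

# The pattern from the question: (\d{2}){0,2}\d{3} matches 3, 5 or 7 digits,
# and (?=\b) requires the number to end right there.
ATTEMPT = r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'

for digits in (8, 10, 11, 16):
    # "1" followed by zeros, e.g. 8 digits -> "10000000"
    print(digits, re.sub(ATTEMPT, ",", "1" + "0" * (digits - 1)))
# 8 1,00,00,000
# 10 100,00,00,000
# 11 1000,00,00,000
# 16 100000000,00,00,000
```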
I am using the Python re module (re.sub()). Any ideas?
I know Tim has answered the question you asked, but assuming you start with numbers rather than strings, have you considered whether you need a regular expression at all? If the machine you are using supports an Indian locale, you could just use the locale module:
>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, "en_IN")
'en_IN'
>>> locale.format_string("%d", 10000000, grouping=True)
'1,00,00,000'
That interpreter session is from an Ubuntu system. Be aware that Windows systems may not support a suitable locale (at least mine doesn't), in which case locale.setlocale raises locale.Error. So while this is in some ways a 'cleaner' solution, whether it is usable depends on your environment. Use locale.format_string rather than the older locale.format, which was deprecated in Python 3.7 and removed in 3.12.
Try this:
(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))
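To see why this works, it helps to check which suffix lengths the lookahead accepts. The sketch below (the LOOKAHEAD name is only for illustration) matches the lookahead's body against strings of r zeros. A comma is inserted wherever the number of digits to its right is 3, 5 or 7 plus a multiple of 7.

```python
import re

# Body of the lookahead from the pattern above; (?!\d) succeeds at end of string.
LOOKAHEAD = re.compile(r"(\d{2}){0,2}\d{3}(\d{7})*(?!\d)")

# Digit counts r (to the right of a candidate comma) that the lookahead accepts.
accepted = [r for r in range(1, 22) if LOOKAHEAD.fullmatch("0" * r)]
print(accepted)  # [3, 5, 7, 10, 12, 14, 17, 19, 21]
```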
For example:
>>> import re
>>> inp = ["1" + "0"*i for i in range(20)]
>>> [re.sub(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))", ",", i)
...  for i in inp]
['1', '10', '100', '1,000', '10,000', '1,00,000', '10,00,000', '1,00,00,000',
'10,00,00,000', '100,00,00,000', '1,000,00,00,000', '10,000,00,00,000',
'1,00,000,00,00,000', '10,00,000,00,00,000', '1,00,00,000,00,00,000',
'10,00,00,000,00,00,000', '100,00,00,000,00,00,000',
'1,000,00,00,000,00,00,000', '10,000,00,00,000,00,00,000',
'1,00,000,00,00,000,00,00,000']
As a commented regex:
import re

subject = "100000000000000"  # example input: 10**14 as a digit string
result = re.sub(
    r"""(?x)          # Enable verbose mode (comments)
    (?<=\d)           # Assert that we're not at the start of the number.
    (?=               # Assert that it's possible to match:
     (\d{2}){0,2}     # 0, 2 or 4 digits,
     \d{3}            # followed by 3 digits,
     (\d{7})*         # followed by 0, 7, 14, 21 ... digits,
     (?!\d)           # and no more digits after that.
    )                 # End of lookahead assertion.""",
    ",", subject)
print(result)  # 1,00,00,000,00,00,000
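One caveat worth checking: the pattern inserts a comma anywhere a digit is followed by a suitable run of digits, so on a string with a decimal part it also groups the fractional digits ("1234567.891234" becomes "12,34,567.8,91,234"). A minimal sketch that applies it to the integer part only; format_south_asian is a hypothetical helper name, and it assumes the input is a plain number string with at most one '.'.

```python
import re

# The answer's pattern, compiled once for reuse.
SOUTH_ASIAN_COMMAS = re.compile(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))")


def format_south_asian(text: str) -> str:
    """Hypothetical helper: group only the integer part of a number string."""
    # partition returns (text, "", "") when there is no '.'.
    integer, dot, fraction = text.partition(".")
    return SOUTH_ASIAN_COMMAS.sub(",", integer) + dot + fraction


print(format_south_asian("1234567.891234"))  # 12,34,567.891234
```

A leading minus sign needs no special handling, because the (?<=\d) lookbehind never matches right after the '-'.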