I want to match amounts like Rs. 2000, Rs.2000, Rs 20,000.00, 20,000 INR and 200.25 INR.
The output should be 2000, 2000, 20000.00, 20000, 200.25.
The regular expression I have tried is this:
(?:(?:(?:rs)|(?:inr))(?:!-{0,}|\.{1}|\ {0,}|\.{1}\ {0,}))(-?[\d,]+ (?:\.\d+)?)(?:[^/^-^X^x])|(?:(-?[\d,]+(?:\.\d+)?)(?:(?:\ {0,}rs)|(?:\ {0,}rs)|(?:\ {0,}(inr))))
But it is not matching numbers with inr or rs after the amount.
I want to match it using the re library in Python.
I suggest using an alternation group with capture groups inside, so that only the numbers before or after your constant string values are captured:
(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)
See the regex demo.
Pattern explanation:
- (?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*) - Branch 1:
  - (?:Rs\.?|INR) - matches Rs, Rs., or INR...
  - \s* - followed with 0+ whitespaces
  - (\d+(?:[.,]\d+)*) - Group 1: one or more digits followed with 0+ sequences of a comma or a dot followed with 1+ digits
- | - or
- (\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR) - Branch 2:
  - (\d+(?:[.,]\d+)*) - Group 2 capturing the same number as in Branch 1
  - \s* - zero or more whitespaces
  - (?:Rs\.?|INR) - followed with Rs, Rs. or INR.

Sample code:
import re
p = re.compile(r'(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)')
s = "Rs. 2000 , Rs.3000 , Rs 40,000.00 ,50,000 INR 600.25 INR"
print([x if x else y for x,y in p.findall(s)])
See the IDEONE demo
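The question asks for the amounts without thousands separators (20000.00 rather than 20,000.00), while the pattern above returns the commas as matched. A minimal sketch of one way to post-process the matches, assuming that a comma is always a thousands separator and a dot is always the decimal point. The helper name extract_amounts is hypothetical and not part of the original answer:

```python
import re

# Same pattern as in the answer above: the currency marker may come
# before the number (group 1) or after it (group 2).
AMOUNT_RE = re.compile(
    r'(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)'
)

def extract_amounts(text):
    """Hypothetical helper: return the matched amounts with commas removed."""
    # For every match exactly one of the two groups is non-empty.
    # Assumption: commas are thousands separators, so they can simply be dropped.
    return [(g1 or g2).replace(',', '') for g1, g2 in AMOUNT_RE.findall(text)]

s = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."
print(extract_amounts(s))
# ['2000', '2000', '20000.00', '20000', '200.25']
```

If the input may also contain lowercase rs or inr, as in the pattern from the question, passing re.IGNORECASE to re.compile should cover that.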
Alternatively, if you can use the PyPI regex module, you may leverage the branch reset construct (?|...|...), where capture group IDs are reset within each branch:
>>> import regex as re
>>> rx = re.compile(r'(?|(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR))')
>>> teststring = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."
>>> prices = [match.group(1) for match in rx.finditer(teststring)]
>>> print(prices)
['2000', '2000', '20,000.00', '20,000', '200.25']
You can access the capture group in each branch by ID=1 (see match.group(1)).
Though slightly out of scope, here's a little exercise with the newer and far superior regex module by Matthew Barnett (which supports subroutines and branch resets):
import regex as re
rx = re.compile(r"""
    (?(DEFINE)
        (?<amount>\d[\d.,]+)     # amount, starting with a digit
        (?<currency1>Rs\.?\ ?)   # Rs, Rs. or Rs with space
        (?<currency2>INR)        # just INR
    )
    (?|
        (?&currency1)
        (?P<money>(?&amount))
    |
        (?P<money>(?&amount))
        (?=\ (?&currency2))
    )
""", re.VERBOSE)
teststring = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."
prices = [m.group('money') for m in rx.finditer(teststring)]
print(prices)
# ['2000', '2000', '20,000.00', '20,000', '200.25']