I have a piece of text:
randomtext 1150,25 USD randomtext
and a simple regex to extract the amount of money in different currencies:
(((\d+)(,?\s?|.)(\d{1,2}))\s?(PLN|EUR|USD|CHF|GBP))
Which gives me these groups:
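One way to list them is to run the regex through java.util.regex and print each group. The following is only a minimal sketch: the class name GroupDump and the helper groups() are made up for illustration, and the regex is copied unchanged from above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDump {
    // The regex from the question, unchanged.
    static final Pattern AMOUNT = Pattern.compile(
        "(((\\d+)(,?\\s?|.)(\\d{1,2}))\\s?(PLN|EUR|USD|CHF|GBP))");

    // Returns groups 0..groupCount() of the first match, or an empty array if nothing matches.
    static String[] groups(String text) {
        Matcher m = AMOUNT.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("randomtext 1150,25 USD randomtext");
        for (int i = 1; i < g.length; i++) {
            System.out.println(i + ": \"" + g[i] + "\"");
        }
        // prints:
        // 1: "1150,25 USD"
        // 2: "1150,25"
        // 3: "1150"
        // 4: ","
        // 5: "25"
        // 6: "USD"
    }
}
```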
However, the number and the currency may swap their positions:
randomtext USD 1150,25 randomtext
or
randomtext USD1150,25 randomtext
How should I improve my regex to satisfy that condition without repeating whole groups (AB|BA) while keeping the current grouping?
You can use this kind of pattern:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CurrencyDemo {
    public static void main(String[] args) {
        // With Pattern.COMMENTS, whitespace in the pattern is ignored and # starts a comment
        String p = "\\b (?=[\\dPEUCG])                        # jump quickly to interesting positions \n" +
                   "(?=                                       # open a lookahead \n" +
                   "  (?> [\\d,]+ \\s* )?                     # the value may come first \n" +
                   "  (?<currency> PLN|EUR|USD|CHF|GBP )      # capture the currency \n" +
                   "  (?:\\b|\\d)                             # a word boundary or a digit \n" +
                   ")                                         # close the lookahead \n" +
                   "(?> [B-HLNPRSU]{3} \\s* )? (?<value> \\d+(?:,\\d+)? ) ";
        Pattern regComp = Pattern.compile(p, Pattern.COMMENTS);
        // Each of the six lines below yields "1150,25 : USD" with this pattern
        String s = "USD 1150,25 randomtext \n" +
                   "Non works randomtext 1150,25 USD randomtext\n" +
                   "Works randomtextUSD 1150,25 USD randomtext\n" +
                   "Works randomtext USD 1150,25 randomtext\n" +
                   "Works randomtext USD1150,25 randomtext\n" +
                   "Non work randomtext 1150,25 USD randomtext";
        Matcher m = regComp.matcher(s);
        while (m.find()) {
            System.out.println(m.group("value") + " : " + m.group("currency"));
        }
    }
}
The idea is to capture the currency inside a lookahead, which is a zero-width assertion. Because the lookahead only asserts and does not consume characters, and the subpattern inside it allows for an optional value before the currency, the position of the currency relative to the value no longer matters. The value itself is captured outside the lookahead.
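To make the "doesn't consume characters" point concrete, here is a small sketch that prints where the overall match and the currency group sit in the input. It uses a compact form of the same pattern (no Pattern.COMMENTS); the class name LookaheadSpans and the helper spans() are hypothetical, and Matcher.start(String) assumes Java 8 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSpans {
    // Compact form of the pattern from the answer, written without Pattern.COMMENTS.
    static final Pattern MONEY = Pattern.compile(
        "\\b(?=[\\dPEUCG])"
        + "(?=(?>[\\d,]+\\s*)?(?<currency>PLN|EUR|USD|CHF|GBP)(?:\\b|\\d))"
        + "(?>[B-HLNPRSU]{3}\\s*)?(?<value>\\d+(?:,\\d+)?)");

    // Returns {matchStart, matchEnd, currencyStart, currencyEnd} for the first match, or null.
    static int[] spans(String text) {
        Matcher m = MONEY.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(), m.end(), m.start("currency"), m.end("currency") };
    }

    public static void main(String[] args) {
        String[] samples = {
            "randomtext 1150,25 USD randomtext",
            "randomtext USD 1150,25 randomtext"
        };
        for (String text : samples) {
            int[] s = spans(text);
            System.out.println("match [" + s[0] + "," + s[1] + ") currency [" + s[2] + "," + s[3] + ")");
        }
        // prints:
        // match [11,18) currency [19,22)
        // match [11,22) currency [11,14)
    }
}
```

In the first sample the currency group lies entirely after the end of the overall match: it was captured by the lookahead, which peeked ahead without moving the match position.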
About \\b (?=[\\dPEUCG]):
The goal of this subpattern is to quickly discard every position in the string that is not the beginning of a word starting with a digit or with the first letter of one of the currencies, without having to test the whole pattern.
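As a rough illustration of that filter, the sketch below runs only the prefix \b(?=[\dPEUCG]) over a sample string and collects the positions it lets through. The class name CandidatePositions and the helper candidateStarts() are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CandidatePositions {
    // Only the prefix of the full pattern: a word boundary followed by
    // a digit or the first letter of PLN, EUR, USD, CHF or GBP.
    static final Pattern PREFIX = Pattern.compile("\\b(?=[\\dPEUCG])");

    // Returns every index at which the zero-width prefix matches.
    static List<Integer> candidateStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = PREFIX.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(candidateStarts("randomtext USD 1150,25 randomtext"));
        // prints [11, 15, 20]
    }
}
```

Out of 33 characters, only three positions (the U of USD, the first digit of 1150, and the 2 after the comma) are handed to the rest of the pattern; every other position is rejected after a single boundary check and one character-class test.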