Regex: ignoring order of groups

Question

I have a piece of text:

randomtext 1150,25 USD randomtext

and a simple regex to extract the amount of money in different currencies:

(((\d+)(,?\s?|.)(\d{1,2}))\s?(PLN|EUR|USD|CHF|GBP))

Which gives me these groups:

1150,25 USD
1150,25
1150
,
25
USD

However, the number and the currency may swap their positions:

randomtext USD 1150,25 randomtext

or

randomtext USD1150,25 randomtext

How should I improve my regex to satisfy that condition without repeating whole groups (AB|BA) while keeping the current grouping?
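To reproduce the problem, here is a minimal sketch in Java (the question's tag). It runs the regex above, transcribed into a Java string literal, on the three sample texts. The names OriginalRegexDemo and groups are illustrative only. Only the first text matches. The two texts where the currency comes first find no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexDemo {
    // The regex from the question as a Java string literal.
    // Note: the unescaped "." in (,?\s?|.) matches any character
    // (except a line terminator), not only a dot.
    static final Pattern ORIGINAL = Pattern.compile(
        "(((\\d+)(,?\\s?|.)(\\d{1,2}))\\s?(PLN|EUR|USD|CHF|GBP))");

    // Returns groups 1..6 of the first match, or null when nothing matches.
    static String[] groups(String text) {
        Matcher m = ORIGINAL.matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] result = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            result[i - 1] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "randomtext 1150,25 USD randomtext",
            "randomtext USD 1150,25 randomtext",
            "randomtext USD1150,25 randomtext"
        };
        for (String input : inputs) {
            String[] g = groups(input);
            System.out.println(input + " -> "
                + (g == null ? "no match" : String.join(" | ", g)));
        }
    }
}
```

The first input yields exactly the six groups listed above. The regex only allows the currency after the digits, so the other two inputs have no match.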

Casimir et Hippolyte · Accepted Answer

You can use this kind of pattern:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CurrencyRegex {
    public static void main(String[] args) {
        // With Pattern.COMMENTS, whitespace is ignored and "#" starts a comment that runs
        // to the end of the line, so every commented line must end with "\n".
        String p = "\\b (?=[\\dPEUCG])  # to jump quickly to interesting positions\n" +
                   "(?=     # open a lookahead\n" +
                   "    (?> [\\d,]+ \\s* )? # perhaps the value is before\n" +
                   "    (?<currency> PLN|EUR|USD|CHF|GBP )  # capture the currency\n" +
                   "    (?:\\b|\\d) # a word boundary or a digit\n" +
                   ")       # close the lookahead\n" +
                   "(?> [B-HLNPRSU]{3} \\s* )? (?<value> \\d+(?:,\\d+)? )";

        Pattern regComp = Pattern.compile(p, Pattern.COMMENTS);

        String s = "USD 1150,25 randomtext\n" +
                   "Non works randomtext 1150,25 USD randomtext\n" +
                   "Works randomtextUSD 1150,25 USD randomtext\n" +
                   "Works randomtext USD 1150,25 randomtext\n" +
                   "Works randomtext USD1150,25 randomtext\n" +
                   "Non work randomtext 1150,25 USD randomtext";

        Matcher m = regComp.matcher(s);

        // Prints "value : currency" for each amount, whichever of the two comes first.
        while (m.find()) {
            System.out.println(m.group("value") + " : " + m.group("currency"));
        }
    }
}

The idea is to capture the currency inside a lookahead, which is a zero-width assertion. It doesn't consume any characters, and the subpattern inside it allows for an optional value before the currency. As a result, the position of the currency relative to the value doesn't matter. The value itself is captured outside the lookahead.
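The following sketch shows what "zero-width" means in practice. It assumes the answer's pattern is written on one line without Pattern.COMMENTS. The class name LookaheadCaptureDemo and the describe helper are hypothetical. When the currency follows the value, m.group() (the consumed text) holds only the number, yet the currency group is still filled from the lookahead, at a position past m.end().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadCaptureDemo {
    // The answer's pattern on a single line (so Pattern.COMMENTS is not needed).
    static final Pattern AMOUNT = Pattern.compile(
        "\\b(?=[\\dPEUCG])"
        + "(?=(?>[\\d,]+\\s*)?(?<currency>PLN|EUR|USD|CHF|GBP)(?:\\b|\\d))"
        + "(?>[B-HLNPRSU]{3}\\s*)?(?<value>\\d+(?:,\\d+)?)");

    static String describe(String text) {
        Matcher m = AMOUNT.matcher(text);
        if (!m.find()) {
            return "no match";
        }
        // The currency group is set inside the lookahead, so it can lie
        // outside the consumed span [m.start(), m.end()).
        boolean currencyConsumed = m.start("currency") < m.end();
        return "matched=[" + m.group() + "] value=" + m.group("value")
            + " currency=" + m.group("currency")
            + " currencyConsumed=" + currencyConsumed;
    }

    public static void main(String[] args) {
        System.out.println(describe("randomtext 1150,25 USD randomtext"));
        System.out.println(describe("randomtext USD 1150,25 randomtext"));
        System.out.println(describe("randomtext USD1150,25 randomtext"));
    }
}
```

The first line reports matched=[1150,25] with currency=USD and currencyConsumed=false. The other two consume "USD 1150,25" and "USD1150,25" respectively, with currencyConsumed=true. In every case, value and currency come out the same.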

About \b (?=[\dPEUCG]): the purpose of this subpattern is to quickly discard positions in the string that are not the start of a word beginning with a digit or with the first letter of one of the currencies, without testing the whole pattern there.
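The sketch below shows which positions survive this prefilter. It runs only \b(?=[\dPEUCG]) on one of the sample texts and collects the indices where it matches. PrefilterPositions and candidatePositions are illustrative names. The full pattern can only start at these indices. Everywhere else, such as inside randomtext or at the spaces, the attempt fails on this first cheap test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefilterPositions {
    // Only the prefilter from the answer: a word boundary followed by a digit
    // or the first letter of one of the currencies (PLN, EUR, USD, CHF, GBP).
    static final Pattern PREFILTER = Pattern.compile("\\b(?=[\\dPEUCG])");

    // Collects the start index of every (zero-width) prefilter match.
    static List<Integer> candidatePositions(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = PREFILTER.matcher(text);
        // After an empty match, find() resumes one character further on.
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "randomtext USD 1150,25 randomtext";
        System.out.println(candidatePositions(text)); // [11, 15, 20]
    }
}
```

Index 11 is the "U" of USD, 15 the start of 1150, and 20 the "25" after the comma (a comma is not a word character, so there is a word boundary before 2).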

Tags: java, regex

EyesClear
