Regular Expression: Extract pairs of values from strings that do not start with "#"

Question

I have a text file that contains the following:

# This is a comment, do not parse this: U20:%x[-2,1]
U01:%x[-2,1]
U02:%x[-2,1]/%x[-1,1]/%x[0,1]

I need to extract the value pairs inside each pair of square brackets on each line.

For example, for the U01 line I expect to get the pair -2 and 1. For the U02 line, I expect 3 pairs of values.

The line should start with a "U" followed by at least 1 digit, followed by a colon ":".

If the line is empty or starts with "#", then it should be ignored.

This was the regex I used, but it's not ignoring lines starting with the "#".

(?:U\d+:|/)\%x\[(?:(-?\d+),(\d+))\]

How can I change the regex to make it work?

Sujith PS · Accepted Answer

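You can use the regex (?:[^\[]*)(?:\[)(-?\d+),(\d+)(?=\]) to find the value pairs between [ and ].

Explanation:

(?:[^\[]*)    # skip any characters that are not "["
(?:\[)        # the opening "["
(-?\d+)       # group 1: first number, with an optional minus sign
,             # the separating comma
(\d+)         # group 2: second number
(?=\])        # lookahead: the next character must be "]"

This pattern does not reject comment lines by itself. The code below skips every line that does not start with "U", digits and a colon before searching it.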

CODE:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String ar[] = { "# This is a comment, do not parse this: U20:%x[-2,1]",
                "U01:%x[-2,1]",
                "U02:%x[-2,1]/%x[-1,1]/%x[0,1]" };

// Backslashes are doubled because this is a Java string literal.
String REGEX = "(?:[^\\[]*)(?:\\[)(-?\\d+),(\\d+)(?=\\])";
Pattern p = Pattern.compile(REGEX);
for (String theString : ar) {
    // Skip comments, empty lines and anything else that does not start with U<digits>:
    if (!theString.matches("^U\\d+:.*"))
        continue;

    Matcher m = p.matcher(theString);
    while (m.find()) {
        String matched = m.group(1);
        String matched1 = m.group(2);
        System.out.println("Matched :  " + matched + ", " + matched1);
    }
}

OUTPUT:

Matched :  -2, 1
Matched :  -2, 1
Matched :  -1, 1
Matched :  0, 1

Casimir et Hippolyte · Answer

You can use this pattern with a global search (repeated calls to find()):

(?m:^U\d+:|\G/)%x\[(-?\d+),(-?\d+)\]

pattern details:

(?m:                # non-capturing group with the multiline modifier
    ^               # anchor: start of the line
    U\d+:           # literal "U" followed by digits and ":"
  |                 # OR
    \G/             # literal "/" contiguous to the previous match
)
%x\[(-?\d+),(-?\d+)\]

example:

Pattern p = Pattern.compile("(?m:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");
Matcher m = p.matcher(s); // s is the whole content of your text file
while (m.find()) {
    System.out.println(m.group(1) + "," + m.group(2));
}
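For anyone who wants to run this as is, here is a minimal self-contained sketch of the same idea. The class name PairExtractor, the method name extract and the choice to return int[] pairs are my own assumptions for illustration, not part of the original snippet; the sample text is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairExtractor {
    // ^U\d+: ties the first pair to the start of a line (multiline mode);
    // \G/ only allows further pairs directly after the previous match.
    private static final Pattern PAIR =
            Pattern.compile("(?m:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");

    /** Returns every pair found in the whole text as {first, second}. */
    public static List<int[]> extract(String text) {
        List<int[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.add(new int[] {
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2))
            });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String s = "# This is a comment, do not parse this: U20:%x[-2,1]\n"
                 + "U01:%x[-2,1]\n"
                 + "U02:%x[-2,1]/%x[-1,1]/%x[0,1]\n";
        for (int[] p : extract(s)) {
            System.out.println(p[0] + "," + p[1]);
        }
    }
}
```

With the sample text this should print four pairs (-2,1 then -2,1 then -1,1 then 0,1), and the U20 pair inside the comment is skipped because it is neither at the start of a line nor contiguous to a previous match.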

If something else between two %x[a,b] is allowed, you can change the pattern to:

(?m:^U\d+:|\G(?>[^#\n/]++|/(?!%x\[))*/)%x\[(-?\d+),(-?\d+)\]

or

(?m:^U\d+:|\G[^#\n]*?/)%x\[(-?\d+),(-?\d+)\]

Note that the added subpattern can't match a comment, since the character # is excluded from the character class.
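To illustrate the relaxed variant, here is a small sketch using the second (lazy) form of the pattern. The class name LoosePairExtractor, the method name extract and the sample lines U03 and U04 are made up for this example; they are assumptions, not data from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoosePairExtractor {
    // After a previous match, [^#\n]*? may skip other text before the next "/%x[",
    // but it can never step over a "#" or a line break.
    private static final Pattern PAIR =
            Pattern.compile("(?m:^U\\d+:|\\G[^#\\n]*?/)%x\\[(-?\\d+),(-?\\d+)\\]");

    /** Returns each pair as the string "first,second". */
    public static List<String> extract(String text) {
        List<String> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.add(m.group(1) + "," + m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical input: extra text between pairs, and a trailing comment.
        String s = "# skip /%x[9,9]\n"
                 + "U03:%x[-2,1]/foo/%x[0,1]\n"
                 + "U04:%x[1,2] # note /%x[9,9]\n";
        System.out.println(extract(s));
    }
}
```

On this input the sketch should return -2,1 and 0,1 for the U03 line and 1,2 for the U04 line. Both 9,9 pairs are ignored, because reaching them would require the pattern to cross a # character.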

Another way: since your data is in a text file, you can read the file line by line and extract the numbers with one of the previous patterns (in this case you can remove the m modifier). The advantage is that you know which line each pair of numbers comes from.
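A rough sketch of the line-by-line approach could look like this. The class name LineByLineExtractor, the method name extract and the "line N: a,b" output format are assumptions chosen for illustration. The method takes the lines as a List<String>, which you could obtain for instance with java.nio.file.Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineByLineExtractor {
    // Same pattern without the m modifier: each line is matched on its own,
    // so ^ already means "start of this line".
    private static final Pattern PAIR =
            Pattern.compile("(?:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");

    /** Returns entries of the form "line N: first,second" (N is 1-based). */
    public static List<String> extract(List<String> lines) {
        List<String> found = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            Matcher m = PAIR.matcher(line); // fresh matcher for every line
            while (m.find()) {
                found.add("line " + lineNumber + ": " + m.group(1) + "," + m.group(2));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# This is a comment, do not parse this: U20:%x[-2,1]",
                "U01:%x[-2,1]",
                "",
                "U02:%x[-2,1]/%x[-1,1]/%x[0,1]");
        for (String entry : extract(lines)) {
            System.out.println(entry);
        }
    }
}
```

Comment lines and empty lines simply produce no match, so no separate filtering step is needed in this sketch.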
