 

Regular Expression: Extract pairs of values from strings that do not start with "#"

Tags:

java

regex

I have a text file that contains the following:

# This is a comment, do not parse this: U20:%x[-2,1]
U01:%x[-2,1]
U02:%x[-2,1]/%x[-1,1]/%x[0,1]

The requirement is that I need to extract the value pairs inside each pair of square brackets on each line.

For example, for the first data line (U01) I expect to get the pair -2 and 1. For the second data line (U02), I expect 3 pairs of values.

The line should start with a "U" followed by at least 1 digit, followed by a colon ":".

If the line is empty or starts with "#", then it should be ignored.

This is the regex I used, but it does not ignore lines starting with "#".

(?:U\d+:|/)\%x\[(?:(-?\d+),(\d+))\]

How can I change the regex to make it work?
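A quick way to see the problem: nothing in that regex ties U\d+: to the start of a line, so find() also matches the "U20:%x[-2,1]" that sits inside the comment. A minimal check, assuming the three sample lines are joined with "\n" into one string (the class name WhyItFails and the helper pairs are only for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhyItFails {
    // Collects every "a,b" pair that the question's regex finds in the text.
    static List<String> pairs(String text) {
        Pattern p = Pattern.compile("(?:U\\d+:|/)\\%x\\[(?:(-?\\d+),(\\d+))\\]");
        Matcher m = p.matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1) + "," + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "# This is a comment, do not parse this: U20:%x[-2,1]\n"
                + "U01:%x[-2,1]\n"
                + "U02:%x[-2,1]/%x[-1,1]/%x[0,1]";
        // The first pair comes from the comment line, which should have been ignored.
        System.out.println(pairs(text)); // [-2,1, -2,1, -2,1, -1,1, 0,1]
    }
}
```

Five pairs come back instead of the expected four, because U\d+: is free to match anywhere in a line.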

swirlobt asked Jan 21 '14 08:01


2 Answers

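You can use the regex (?:[^\\[]*)(?:\\[)(-?\\d+),(\\d+)(?=\\]) (written as a Java string literal) to find the number pairs between [ and ], and skip every line that does not start with U, digits and a colon before applying it.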

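Explanation:

(?:[^\[]*)     # non-capturing: skip any characters that are not "["
(?:\[)         # non-capturing: the opening "["
(-?\d+)        # group 1: an optional "-" followed by digits
,              # literal comma
(\d+)          # group 2: digits
(?=\])         # lookahead: a closing "]" must follow (it is not consumed)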

CODE:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractPairs {
    public static void main(String[] args) {
        String[] ar = { "# This is a comment, do not parse this: U20:%x[-2,1]",
                        "U01:%x[-2,1]",
                        "U02:%x[-2,1]/%x[-1,1]/%x[0,1]" };

        String REGEX = "(?:[^\\[]*)(?:\\[)(-?\\d+),(\\d+)(?=\\])";
        Pattern p = Pattern.compile(REGEX);
        for (String theString : ar) {
            // Skip comments, empty lines and anything not starting with U<digits>:
            if (!theString.matches("^U\\d+:.*"))
                continue;

            Matcher m = p.matcher(theString);
            while (m.find()) {
                String matched = m.group(1);   // first number
                String matched1 = m.group(2);  // second number
                System.out.println("Matched :  " + matched + ", " + matched1);
            }
        }
    }
}

OUTPUT:

Matched :  -2, 1
Matched :  -2, 1
Matched :  -1, 1
Matched :  0, 1
Sujith PS answered Nov 02 '22 12:11


You can use this pattern with a global search (calling find() repeatedly over the whole content):

(?m:^U\d+:|\G/)%x\[(-?\d+),(-?\d+)\]

pattern details:

(?m:                # non capturing group with the multiline modifier
    ^               # anchor: start of the line
    U\d+:           # literal "U" followed by digits and : 
  |                 # OR
    \G/             # literal "/" immediately after the previous match
)
%x\[(-?\d+),(-?\d+)\]   # literal "%x[", the two numbers (groups 1 and 2), then "]"

example:

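import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("(?m:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");
Matcher m = p.matcher(s); // s is all the content of your txt file
while (m.find()) {
    // group(1) and group(2) are the two numbers of one %x[a,b]
    System.out.println(m.group(1) + "," + m.group(2));
}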
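For a complete, runnable version, here is a minimal sketch that applies the same pattern to the sample content from the question. In real use s would be the whole file content (for example read with Files.readString on Java 11+); the class name ExtractWithG and the helper pairs are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractWithG {
    // ^U\d+: starts a chain at the beginning of a line; \G/ continues it only
    // directly after the previous match, so a comment line never starts a chain.
    private static final Pattern P =
            Pattern.compile("(?m:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");

    static List<String> pairs(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group(1) + "," + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "# This is a comment, do not parse this: U20:%x[-2,1]\n"
                + "U01:%x[-2,1]\n"
                + "\n"
                + "U02:%x[-2,1]/%x[-1,1]/%x[0,1]\n";
        System.out.println(pairs(s)); // [-2,1, -2,1, -1,1, 0,1]
    }
}
```

The comment line and the empty line produce nothing, and the three pairs of the U02 line are picked up one after another through \G.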

If other text is allowed between two %x[a,b] items, you can change the pattern to:

(?m:^U\d+:|\G(?>[^#\n/]++|/(?!%x\[))*/)%x\[(-?\d+),(-?\d+)\]

or

(?m:^U\d+:|\G[^#\n]*?/)%x\[(-?\d+),(-?\d+)\]

Note that the added subpattern cannot run into a comment, since the character # is excluded from the character class.
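To check the relaxed version, here is a small sketch that runs the last pattern (the one with the lazy [^#\n]*?) on two made-up lines: one with extra text before the "/", and one where a "#" comment follows the first item. The input lines and the class name RelaxedPattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelaxedPattern {
    // \G[^#\n]*?/ lets other text sit between two %x[a,b] items, but the
    // class excludes '#' and '\n', so it cannot cross a comment or a line end.
    private static final Pattern P =
            Pattern.compile("(?m:^U\\d+:|\\G[^#\\n]*?/)%x\\[(-?\\d+),(-?\\d+)\\]");

    static List<String> pairs(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group(1) + "," + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "U03:%x[1,2] abc/%x[3,4]\n"
                + "U04:%x[5,6] # /%x[7,8]\n";
        System.out.println(pairs(s)); // [1,2, 3,4, 5,6]
    }
}
```

The pair 7,8 is not reported because it sits after the "#".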

Another way: since your data is in a text file, you can read the file line by line and extract the numbers with one of the previous patterns (in that case you can remove the m modifier). The advantage is that you know which line the numbers come from.
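A minimal sketch of the line-by-line variant, assuming the file is read through a BufferedReader (here a StringReader stands in for Files.newBufferedReader(path) so the example runs on its own). Without the m modifier, ^ only matches at the start of each line string. The class name LineByLine and the "line N: a,b" output format are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineByLine {
    // Same pattern as the first one, without the m modifier:
    // each line is handed to its own Matcher.
    private static final Pattern P =
            Pattern.compile("(?:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");

    static List<String> extract(BufferedReader reader) throws IOException {
        List<String> out = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            Matcher m = P.matcher(line);
            while (m.find()) {
                out.add("line " + lineNo + ": " + m.group(1) + "," + m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String content = "# This is a comment, do not parse this: U20:%x[-2,1]\n"
                + "U01:%x[-2,1]\n"
                + "\n"
                + "U02:%x[-2,1]/%x[-1,1]/%x[0,1]\n";
        // prints: line 2: -2,1 / line 4: -2,1 / line 4: -1,1 / line 4: 0,1 (one per line)
        extract(new BufferedReader(new StringReader(content))).forEach(System.out::println);
    }
}
```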

Casimir et Hippolyte answered Nov 02 '22 10:11