I have a text file that contains the following:
# This is a comment, do not parse this: U20:%x[-2,1]
U01:%x[-2,1]
U02:%x[-2,1]/%x[-1,1]/%x[0,1]
I need to extract the value pairs inside each pair of square brackets on each line.
For example, for the first valid line (U01) I expect the pair -2 and 1. For the second valid line (U02), I expect 3 pairs of values.
The line should start with a "U" followed by at least 1 digit, followed by a colon ":".
If the line is empty or starts with "#", then it should be ignored.
This was the regex I used, but it's not ignoring lines starting with the "#".
(?:U\d+:|/)\%x\[(?:(-?\d+),(\d+))\]
How can I change the regex to make it work?
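You can use the regex (?:[^\\[]*)(?:\\[)(-?\\d+),(\\d+)(?=\\]) (written here with Java string escaping) to find the pairs between [ and ].
Explanation: (?:[^\\[]*) skips everything up to the next [, (?:\\[) consumes the [, (-?\\d+),(\\d+) captures the two numbers, and the lookahead (?=\\]) requires a closing ] right after them. The regex itself does not check how the line starts, so the code tests each line against ^U\\d+:.* first and skips the lines that fail.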
CODE:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] ar = { "# This is a comment, do not parse this: U20:%x[-2,1]",
        "U01:%x[-2,1]",
        "U02:%x[-2,1]/%x[-1,1]/%x[0,1]" };
// Skip up to the next "[", then capture the two numbers in front of "]".
String REGEX = "(?:[^\\[]*)(?:\\[)(-?\\d+),(\\d+)(?=\\])";
Pattern p = Pattern.compile(REGEX);
for (String theString : ar) {
    // Ignore comments, empty lines and anything not starting with "U<digits>:".
    if (!theString.matches("^U\\d+:.*"))
        continue;
    Matcher m = p.matcher(theString);
    while (m.find()) {
        String matched = m.group(1);
        String matched1 = m.group(2);
        System.out.println("Matched : " + matched + ", " + matched1);
    }
}
OUTPUT:
Matched : -2, 1
Matched : -2, 1
Matched : -1, 1
Matched : 0, 1
You can use this pattern with a global search (that is, by calling find() repeatedly):
(?m:^U\d+:|\G/)%x\[(-?\d+),(-?\d+)\]
pattern details:
(?m: # non-capturing group with the multiline modifier
^ # anchor: start of the line
U\d+: # literal "U" followed by digits and :
| # OR
\G/ # literal "/" contiguous to the previous match
)
%x\[(-?\d+),(-?\d+)\]
example:
Pattern p = Pattern.compile("(?m:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");
Matcher m = p.matcher(s); // s is all the content of your txt file
while (m.find()) {
System.out.println(m.group(1) + "," + m.group(2)); // one pair per line
}
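To try this on the sample from the question, the snippet can be wrapped in a small class. This is only a sketch: the class name StrictPairs and the helper extract are made up for the illustration, and the file content is inlined as a string instead of being read from disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictPairs {
    // A pair must follow "U<digits>:" at the start of a line,
    // or a "/" that sits right after the previous match (\G).
    static final Pattern PAIR =
        Pattern.compile("(?m:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");

    // Hypothetical helper: returns every pair as "(a,b)", in order of appearance.
    static List<String> extract(String content) {
        List<String> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(content);
        while (m.find()) {
            pairs.add("(" + m.group(1) + "," + m.group(2) + ")");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample from the question, plus an empty line.
        String content = String.join("\n",
            "# This is a comment, do not parse this: U20:%x[-2,1]",
            "U01:%x[-2,1]",
            "",
            "U02:%x[-2,1]/%x[-1,1]/%x[0,1]");
        System.out.println(extract(content)); // [(-2,1), (-2,1), (-1,1), (0,1)]
    }
}
```

The comment line contributes nothing because its U20: is not at the start of a line, and the empty line is skipped for the same reason.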
If something else is allowed between two %x[a,b] items, you can change the pattern to:
(?m:^U\d+:|\G(?>[^#\n/]++|/(?!%x\[))*/)%x\[(-?\d+),(-?\d+)\]
or
(?m:^U\d+:|\G[^#\n]*?/)%x\[(-?\d+),(-?\d+)\]
Note that the added subpattern can't match a comment, since the character # is excluded from the character class.
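The difference between the strict pattern and the second relaxed one can be seen on a line with extra text between two pairs. The sketch below is illustrative only: the class name LoosePairs, the helper extract, and the two sample lines (U03, U04) are invented for the demonstration and do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoosePairs {
    // Strict: the "/" must directly follow the previous match.
    static final Pattern STRICT =
        Pattern.compile("(?m:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");
    // Relaxed: any text without "#" or a newline may precede the "/".
    static final Pattern LOOSE =
        Pattern.compile("(?m:^U\\d+:|\\G[^#\\n]*?/)%x\\[(-?\\d+),(-?\\d+)\\]");

    // Hypothetical helper: returns every pair as "(a,b)", in order of appearance.
    static List<String> extract(Pattern pattern, String content) {
        List<String> pairs = new ArrayList<>();
        Matcher m = pattern.matcher(content);
        while (m.find()) {
            pairs.add("(" + m.group(1) + "," + m.group(2) + ")");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Invented sample: filler text on the first line, a trailing comment on the second.
        String content = "U03:%x[-2,1] foo /%x[4,5]\n"
                       + "U04:%x[1,2] # note /%x[9,9]";
        System.out.println(extract(STRICT, content)); // [(-2,1), (1,2)]
        System.out.println(extract(LOOSE, content));  // [(-2,1), (4,5), (1,2)]
    }
}
```

The relaxed pattern picks up the pair after " foo ", but still stops at the # on the second line, so the pair inside the trailing comment is not returned.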
Another way: since your data is in a text file, you can read the file line by line and extract the numbers with one of the previous patterns (in this case you can remove the m modifier). The advantage is that you know which line each pair of numbers comes from.
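A minimal sketch of the line-by-line variant, assuming the lines have already been loaded into a list (for a real file, Files.readAllLines(path) would supply them). The class name PairsByLine and the helper extract are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairsByLine {
    // Strict pattern without the m modifier: each line is matched on its own,
    // so ^ only needs to mean "start of this string".
    static final Pattern PAIR =
        Pattern.compile("(?:^U\\d+:|\\G/)%x\\[(-?\\d+),(-?\\d+)\\]");

    // Hypothetical helper: returns entries such as "line 2: -2,1".
    static List<String> extract(List<String> lines) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = PAIR.matcher(lines.get(i));
            while (m.find()) {
                // Line numbers are reported 1-based.
                found.add("line " + (i + 1) + ": " + m.group(1) + "," + m.group(2));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Inlined stand-in for the file content.
        List<String> lines = Arrays.asList(
            "# This is a comment, do not parse this: U20:%x[-2,1]",
            "U01:%x[-2,1]",
            "",
            "U02:%x[-2,1]/%x[-1,1]/%x[0,1]");
        extract(lines).forEach(System.out::println);
    }
}
```

One thing to watch for: with a fresh matcher per line, \G can also match at position 0 of the line, so a line that begins directly with /%x[a,b] would be accepted. If such lines can occur, test the line against ^U\d+: before extracting.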