{
    Main Block
    {
        Nested Block
    }
}
{
    Main Block
    {
        Nested Block
    }
    {
        Nested Block
    }
}
I want to get the data within each Main Block, including its Nested Blocks, using Java regex. Is this possible?
Thanks in advance.
A regular expression probably isn't the best tool for the job, since it appears that you can have arbitrarily nested braces. I think you might be better off writing a parser based on a grammar (which you'll have to define).
Here is an EBNF grammar to get you started; it's incomplete because I don't know what else can be inside your blocks (other than more blocks):
blocks ::= { block }
block ::= "{", block-content, "}"
block-content ::= blocks | things-other-than-blocks
For some resources on parsing, take a look at this answer.
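As a rough sketch of that approach, here is a small hand-written recursive-descent parser in Java that follows the grammar above, one method per rule. It assumes that block-content is any mix of plain text and nested blocks, and that whitespace between top-level blocks is ignored; the names BlockParser, Block and parseBlocks are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockParser {

    // A parsed block: its own text (braces and nested blocks excluded) plus its child blocks.
    public static final class Block {
        public final StringBuilder text = new StringBuilder();
        public final List<Block> children = new ArrayList<>();

        @Override
        public String toString() {
            String own = text.toString().trim();
            return "{" + own + (children.isEmpty() ? "" : " " + children) + "}";
        }
    }

    private final String input;
    private int pos = 0;

    public BlockParser(String input) {
        this.input = input;
    }

    // blocks ::= { block }   (whitespace between top-level blocks is skipped)
    public List<Block> parseBlocks() {
        List<Block> blocks = new ArrayList<>();
        skipWhitespace();
        while (pos < input.length()) {
            blocks.add(parseBlock());
            skipWhitespace();
        }
        return blocks;
    }

    // block ::= "{", block-content, "}"
    private Block parseBlock() {
        expect('{');
        Block block = new Block();
        // block-content: assumed to be any mix of plain text and nested blocks
        while (pos < input.length() && input.charAt(pos) != '}') {
            if (input.charAt(pos) == '{') {
                block.children.add(parseBlock()); // recurse for a nested block
            } else {
                block.text.append(input.charAt(pos++));
            }
        }
        expect('}');
        return block;
    }

    // Consumes the expected character or fails with the current position.
    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    private void skipWhitespace() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        String text = "{ Main Block { Nested Block } }\n"
                + "{ Main Block { Nested Block } { Nested Block { Deeper } } }";
        for (Block b : new BlockParser(text).parseBlocks()) {
            System.out.println(b);
        }
        // {Main Block [{Nested Block}]}
        // {Main Block [{Nested Block}, {Nested Block [{Deeper}]}]}
    }
}
```

Because each rule calls the others recursively, nesting depth is limited only by the call stack, and unbalanced braces are reported as an exception instead of silently producing a wrong match.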
If there can be at most one level of nesting, and the brace characters cannot be escaped, then the regex pattern for this is in fact quite simple.
Essentially the structure we have, in some abstract notation, is:
{…(?:{…}…)*…}
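Here's a breakdown: the whole pattern is the top-level block, opened by the first { and closed by the last }; inside it, the non-capturing group (?:{…}…)* matches zero or more nested blocks {…}, each followed by more content.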
This is not quite regex, of course, because { and } need to be escaped, since they're metacharacters, and … needs to be replaced with the actual pattern for content.
For content, [^{}]*+ would be a fine pattern. The […] is a character class, and [^…] is a negated character class. The * is zero-or-more repetition, and the + following the repetition specifier makes it a possessive quantifier, which never gives back what it has matched when backtracking.
So a meta-regexing technique is used to programmatically transform this abstract pattern (which is readable) into a valid regex pattern (which can be ugly at times, like this one). Here's an example (also on ideone.com):
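import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaRegexDemo {
    public static void main(String[] args) {
        // Abstract pattern: { and } are literal braces, … stands for block content.
        // Escape the braces, then expand … into [^{}]*+ (a possessive run of non-brace characters).
        Pattern block = Pattern.compile(
            "{…(?:{…}…)*…}"
                .replaceAll("[{}]", "\\\\$0")
                .replace("…", "[^{}]*+")
        );
        System.out.println(block.pattern());
        // \{[^{}]*+(?:\{[^{}]*+\}[^{}]*+)*[^{}]*+\}

        String text
            = "{ main1 { sub1a } { sub1b } { sub1c } }\n"
            + "{ main2\n"
            + " { sub2a }\n"
            + " { sub2c }\n"
            + "}"
            + " { last one, promise } ";
        Matcher m = block.matcher(text);
        // Print every top-level block found, delimited by >>> and <<<
        while (m.find()) {
            System.out.printf(">>> %s <<<%n", m.group());
        }
        // >>> { main1 { sub1a } { sub1b } { sub1c } } <<<
        // >>> { main2
        //  { sub2a }
        //  { sub2c }
        // } <<<
        // >>> { last one, promise } <<<
    }
}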
As you can see, the actual regex pattern is therefore:
\{[^{}]*+(?:\{[^{}]*+\}[^{}]*+)*[^{}]*+\}
Which, as a Java string literal, is:
"\\{[^{}]*+(?:\\{[^{}]*+\\}[^{}]*+)*[^{}]*+\\}"
If the nesting can go deeper, but only up to some known maximum level, then regex can still be used. You can also allow { and } to be "escaped" (i.e. used in the content part but not as block delimiters).
The final regex pattern will be quite complicated, but depending on how comfortable you are with meta-regexing (which requires you to be comfortable with regex itself), the code can be quite readable and manageable.
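A minimal sketch of that meta-regexing idea for deeper, but bounded, nesting: the helper below starts from a block with no nesting and wraps it one level at a time, so the pattern is generated rather than written by hand. The names NestedBlockRegex, CONTENT, blockRegex and findBlocks are hypothetical, and it assumes a backslash is the escape character, so \{ and \} inside a block count as content. The extra content pattern before the closing brace in the one-level version is dropped here because each repetition of the group already ends with content. It also does not prevent find() from starting a match at an escaped brace that sits outside any block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedBlockRegex {
    // Content: any character except a brace or backslash, or a backslash followed by any character.
    // (Assumption: "\" is the escape character, so "\{" and "\}" count as content.)
    static final String CONTENT = "(?:[^{}\\\\]|\\\\.)*+";

    // Builds a pattern for a block with at most maxDepth levels of nesting.
    // Depth 0: \{content\}
    // Depth k: \{content(?:<depth k-1 block>content)*\}
    public static String blockRegex(int maxDepth) {
        String block = "\\{" + CONTENT + "\\}";
        for (int d = 1; d <= maxDepth; d++) {
            block = "\\{" + CONTENT + "(?:" + block + CONTENT + ")*\\}";
        }
        return block;
    }

    // Returns every block that find() reports, in order of appearance.
    public static List<String> findBlocks(String text, int maxDepth) {
        // DOTALL lets an escaped newline count as content too
        Matcher m = Pattern.compile(blockRegex(maxDepth), Pattern.DOTALL).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "{ a { b { c } } } { d \\} still d }";
        System.out.println(findBlocks(text, 2));
        // [{ a { b { c } } }, { d \} still d }]
    }
}
```

Apart from the escape handling, maxDepth = 1 gives the same matches as the one-level pattern. If the input nests deeper than maxDepth, the outer block fails to match and find() reports the inner blocks instead, e.g. findBlocks("{ a { b { c } } }", 1) yields only "{ b { c } }".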
If the nesting level can be arbitrarily deep, then some flavors can still handle it (e.g. Perl, with recursive patterns, or .NET, with balancing groups), but Java regex is not powerful enough.
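Since Java regex can't track nesting depth on its own, one common workaround (swapped in here, not part of the answer above) is to let a trivial regex find the braces and keep the depth count in ordinary Java code. This is a minimal sketch, assuming braces are never escaped; TopLevelBlocks and topLevelBlocks are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopLevelBlocks {
    // Matches a single brace; the nesting logic lives in the loop below.
    private static final Pattern BRACE = Pattern.compile("[{}]");

    // Returns the full text of every top-level block, nested blocks included, at any depth.
    public static List<String> topLevelBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BRACE.matcher(text);
        int depth = 0;
        int start = -1;
        while (m.find()) {
            if (m.group().equals("{")) {
                if (depth == 0) {
                    start = m.start(); // opening brace of a new top-level block
                }
                depth++;
            } else if (depth > 0) {
                depth--;
                if (depth == 0) {
                    blocks.add(text.substring(start, m.end())); // block closed
                }
            }
            // A "}" at depth 0 is unbalanced; this sketch simply ignores it.
        }
        return blocks; // an unclosed trailing block is likewise dropped
    }

    public static void main(String[] args) {
        String text = "{ Main Block { Nested Block } }\n"
                + "{ Main Block { Nested Block } { Nested Block { Deeper } } }";
        for (String block : topLevelBlocks(text)) {
            System.out.println(block);
        }
        // { Main Block { Nested Block } }
        // { Main Block { Nested Block } { Nested Block { Deeper } } }
    }
}
```

Each returned string is a whole Main Block with its Nested Blocks included, which is what the question asks for. Unlike the grammar-based parser, unbalanced braces are silently skipped rather than reported.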