I need some help building a regular expression, and I think it will be easier to explain with an example. I need a regular expression that matches a comma, but only if it's not inside this structure: "( )", like this:
,a,b,c,d,"("x","y",z)",e,f,g,
Here the first five and the last four commas should match the expression; the two between x, y and z, inside the ( ) section, shouldn't.
I tried a lot of combinations, but regular expressions are still a little foggy for me.
I want to use it with the split method in Java. The example is short, but the real input can be much longer and have more than one section between "( and )". The split method receives an expression, and any text that matches it (in this case the comma) is treated as a separator.
So I want to do something like this:
String keys[] = row.split(expr);
System.out.println(keys[0]); // print a
System.out.println(keys[1]); // print b
System.out.println(keys[2]); // print c
System.out.println(keys[3]); // print d
System.out.println(keys[4]); // print "("x","y",z)"
System.out.println(keys[5]); // print e
System.out.println(keys[6]); // print f
System.out.println(keys[7]); // print g
Thanks!
You can do this with a negative lookahead. Here's a slightly simplified problem to illustrate the idea:
String text = "a;b;c;d;<x;y;z>;e;f;g;<p;q;r;s>;h;i;j";
String[] parts = text.split(";(?![^<>]*>)");
System.out.println(java.util.Arrays.toString(parts));
// _ _ _ _ _______ _ _ _ _________ _ _ _
// [a, b, c, d, <x;y;z>, e, f, g, <p;q;r;s>, h, i, j]
Note that instead of , the delimiter is now ; and instead of "( and )" the brackets are simply < and >, but the idea still works.
[…] is a character class. Something like [aeiou] matches any one of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches any one character except the lowercase vowels.
The * repetition specifier matches the preceding pattern zero or more times.
(?!…) is a negative lookahead; it asserts that a certain pattern DOES NOT match, looking ahead (i.e. to the right) of the current position.
The pattern [^<>]*> matches a (possibly empty) sequence of anything except angle brackets, followed by a closing bracket.
Putting all of the above together, we get ;(?![^<>]*>), which matches a ; but only if the first bracket to its right is not a closing one. If the first bracket to the right is a closing one, the ; must be "inside" the brackets.
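To see which delimiters the pattern actually selects, here is a minimal sketch that lists the index of every ; it matches. The class name, method name and sample string are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class DelimiterPositions {
    // A ';' that is NOT followed by "any non-brackets, then '>'".
    private static final Pattern OUTSIDE = Pattern.compile(";(?![^<>]*>)");

    // Returns the index of every ';' that the pattern matches.
    static List<Integer> matchPositions(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = OUTSIDE.matcher(text);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // The ';' at index 6 sits between '<' and '>' and is skipped.
        System.out.println(matchPositions("a;b;<x;y>;c")); // [1, 3, 9]
    }
}
```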
This technique, with some modifications, can be adapted to the original problem. Remember to escape the regex metacharacters ( and ) as necessary, and of course " as well as \ in a Java string literal must be escaped by preceding them with a \.
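One possible adaptation to the original row is sketched below. It assumes the parentheses are balanced, never nested, and appear only as part of the "( ... )" sections; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical wrapper class, for illustration only.
class SplitOutsideParens {
    // As a regex: ,(?![^()]*\))
    // The backslash is doubled because this is a Java string literal.
    static final String EXPR = ",(?![^()]*\\))";

    static String[] splitRow(String row) {
        return row.split(EXPR);
    }

    public static void main(String[] args) {
        // The row from the question; each " is escaped as \" in the literal.
        String row = ",a,b,c,d,\"(\"x\",\"y\",z)\",e,f,g,";
        System.out.println(Arrays.toString(splitRow(row)));
    }
}
```

Because the row starts with a comma, split returns an empty string as the first element, so a ends up at index 1 rather than 0. The trailing empty string after the last comma is dropped by split.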
You can also make the * possessive to try to improve performance, i.e. ;(?![^<>]*+>).
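The possessive form should not change what gets matched, since [^<>] can never consume a >, so giving characters back could not produce a match anyway. The sketch below only checks that both forms split the sample text identically; it does not measure performance, and the class and method names are invented for the example.

```java
import java.util.Arrays;

// Hypothetical comparison class, for illustration only.
class PossessiveVariant {
    static final String GREEDY = ";(?![^<>]*>)";
    static final String POSSESSIVE = ";(?![^<>]*+>)";

    // True if both patterns split the text into the same parts.
    static boolean sameResult(String text) {
        return Arrays.equals(text.split(GREEDY), text.split(POSSESSIVE));
    }

    public static void main(String[] args) {
        String text = "a;b;c;d;<x;y;z>;e;f;g;<p;q;r;s>;h;i;j";
        System.out.println(sameResult(text)); // true
    }
}
```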
Try this one:
(?![^(]*\)),
It worked for me in my testing; it grabbed all commas not inside parentheses.
Edit: Gopi pointed out the need to escape the backslashes in Java:
(?![^(]*\\)),
Edit: Alan Moore pointed out some unnecessary complexity. Fixed.
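For reference, here is a small sketch of how that escaped pattern might be used with split. The sample input and the class and method names are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical wrapper class, for illustration only.
class LookaheadFirst {
    // The lookahead comes first here, so it is checked at the position
    // just before the comma; the matched text is still only the comma.
    static final String EXPR = "(?![^(]*\\)),";

    static String[] splitRow(String row) {
        return row.split(EXPR);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRow("a,b,(x,y),c"))); // [a, b, (x,y), c]
    }
}
```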
If the parens are paired correctly and cannot be nested, you can split the text first at parens, then process the chunks.
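// Assumes java.util.List and java.util.ArrayList are imported.
List<String> result = new ArrayList<String>();
// Even-indexed chunks lie outside the parens, odd-indexed chunks inside.
String[] chunks = text.split("[()]");
for (int i = 0; i < chunks.length; i++) {
    if ((i % 2) == 0) {
        // Outside the parens: split on commas as usual.
        String[] atoms = chunks[i].split(",");
        for (int j = 0; j < atoms.length; j++) {
            result.add(atoms[j]);
        }
    } else {
        // Inside the parens: keep the chunk whole.
        result.add(chunks[i]);
    }
}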
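Wrapped in a method, the same approach can be tried on a small input. This is only a sketch; the class name, method name and sample string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class, for illustration only.
class ChunkSplitter {
    static List<String> splitOutsideParens(String text) {
        List<String> result = new ArrayList<>();
        // Even-indexed chunks are outside the parens, odd-indexed inside.
        String[] chunks = text.split("[()]");
        for (int i = 0; i < chunks.length; i++) {
            if (i % 2 == 0) {
                for (String atom : chunks[i].split(",")) {
                    result.add(atom);
                }
            } else {
                result.add(chunks[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitOutsideParens("a,b,(x,y),c")); // [a, b, x,y, , c]
    }
}
```

Two things are worth noticing in the output: the parens themselves are consumed by the first split, and the comma directly after the closing paren leaves an empty string in the result. Depending on the data, both may need extra handling.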