 

Match first occurrence of semicolon in string, only if not preceded by '--'

Tags:

java

regex

I'm trying to write a regular expression for Java that matches if there is a semicolon that does not have two (or more) leading '-' characters.

I'm only able to get the opposite working: A semicolon that has at least two leading '-' characters.

([\-]{2,}.*?;.*)

But I need something like

([^([\-]{2,})])*?;.*

I'm somehow not able to express 'not at least two - characters'.

Here are some examples I need to evaluate with the expression:

; -- a           : should match
-- a ;           : should not match
-- ;             : should not match
--;              : should not match
-;-              : should match
---;             : should not match
-- semicolon ;   : should not match
bla ; bla        : should match
bla              : should not match (; is mandatory)
-;--;            : should match (the first occurring semicolon must not have two or more consecutive leading '-')
Richard asked Nov 01 '22 20:11


1 Answer

It seems that this regex matches what you want:

String regex = "[^-]*(-[^-]+)*-?;.*";


Explanation: matches will accept a string that:

  • [^-]* can start with non-dash characters
  • (-[^-]+)*-?; is a bit tricky: before we match ; we need to make sure that no - has another - right after it, so:
    • (-[^-]+)* each - has at least one non-dash character after it
    • -? or a single - is placed right before ;
  • ;.* if the earlier conditions were fulfilled, we can accept ; and any characters after it (.*).
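
To check the pattern against the examples from the question, a small harness along these lines could be used. This is only a sketch: the class name FirstSemicolonCheck and the helper accepts are made up for illustration, and the expected values are copied from the question's list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical harness; the class and method names are illustrative only.
class FirstSemicolonCheck {
    // Regex from the answer. String.matches requires the WHOLE string to match.
    static final String REGEX = "[^-]*(-[^-]+)*-?;.*";

    static boolean accepts(String input) {
        return input.matches(REGEX);
    }

    public static void main(String[] args) {
        // Inputs and expected results taken from the question.
        Map<String, Boolean> expected = new LinkedHashMap<>();
        expected.put("; -- a", true);
        expected.put("-- a ;", false);
        expected.put("-- ;", false);
        expected.put("--;", false);
        expected.put("-;-", true);
        expected.put("---;", false);
        expected.put("-- semicolon ;", false);
        expected.put("bla ; bla", true);
        expected.put("bla", false);
        expected.put("-;--;", true);

        for (Map.Entry<String, Boolean> e : expected.entrySet()) {
            boolean actual = accepts(e.getKey());
            System.out.printf("%-16s expected=%-5b actual=%b%n",
                    e.getKey(), e.getValue(), actual);
        }
    }
}
```

Because matches anchors the pattern at both ends, no explicit ^ or $ is needed.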

A more readable version, but probably a little slower:

((?!--)[^;])*;.*

Explanation:

To make sure that there is a ; in the string we can use .*;.* with matches.
But we need to add some conditions on the characters before the first ;.

So, to make sure that the matched ; is the first one, we can write the regex as

[^;]*;.*

which means:

  • [^;]* zero or more non-semicolon characters
  • ; the first semicolon
  • .* zero or more of any characters (by default . can't match line separators like \n or \r)
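
The baseline pattern on its own can be tried out as follows. This is a sketch with made-up names (BaselineFirstSemicolon, accepts, acceptsDotAll); it shows that the baseline still accepts a string whose first semicolon follows --, and that . only crosses line separators when Pattern.DOTALL is set.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class BaselineFirstSemicolon {
    // Baseline from the text: no ';' before the first ';', then anything.
    static final String BASELINE = "[^;]*;.*";

    static boolean accepts(String input) {
        return input.matches(BASELINE);
    }

    // Same regex, but with DOTALL so that '.' also matches line separators.
    static boolean acceptsDotAll(String input) {
        return Pattern.compile(BASELINE, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("bla ; bla"));    // a semicolon is present
        System.out.println(accepts("bla"));          // no semicolon at all
        System.out.println(accepts("-- a ;"));       // "--" is not excluded yet
        System.out.println(accepts("a;\nb"));        // '.' stops at the line separator
        System.out.println(acceptsDotAll("a;\nb"));  // DOTALL lets '.' cross it
    }
}
```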

So now all we need to do is make sure that the character matched by [^;] is not part of --. To do so we can use look-around mechanisms, for instance:

  • (?!--)[^;] before matching [^;], (?!--) checks that the next two characters are not --; in other words, the character matched by [^;] can't be the first - in a series of two --
  • [^;](?<!--) after matching [^;], (?<!--) checks that the two characters just before the current position are not --; in other words, the character matched by [^;] can't be the last - in a series of two --
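
Both look-around forms can be compared side by side. In the sketch below, the look-ahead pattern is the one given above, while the full look-behind pattern ([^;](?<!--))*;.* is an assumption: it simply puts the second bullet into the same overall shape. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the two look-around variants.
class LookaroundVariants {
    // Look-ahead form, as given in the answer.
    static final Pattern LOOKAHEAD = Pattern.compile("((?!--)[^;])*;.*");
    // Look-behind form: an assumed full pattern built from the second bullet.
    static final Pattern LOOKBEHIND = Pattern.compile("([^;](?<!--))*;.*");

    static boolean viaLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    static boolean viaLookbehind(String s) {
        return LOOKBEHIND.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Sample inputs taken from the question.
        String[] samples = {
            "; -- a", "-- a ;", "-- ;", "--;", "-;-",
            "---;", "-- semicolon ;", "bla ; bla", "bla", "-;--;"
        };
        for (String s : samples) {
            System.out.printf("%-16s lookahead=%-5b lookbehind=%b%n",
                    s, viaLookahead(s), viaLookbehind(s));
        }
    }
}
```

Either form stops consuming characters as soon as a -- is reached, so a ; that comes after -- can never be the one that gets matched.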
Pshemo answered Nov 08 '22 13:11