
Best delimiter to separate multiple regexes

I need to put multiple regular expressions in a single string and then parse that string back into separate regexes, like this:

regex1<!>regex2<!>regex3

The problem is that I am not sure which delimiter to use in place of the <!> shown in the example, so that I can safely split the string when parsing it back.

The constraints are that I cannot spread the string over multiple lines or use an XML or JSON string, because this string of expressions should be easy to configure.

Looking forward to any suggestions.

Edited:

Q: Why does it have to be a single string?

A: The system has a configuration manager that loads its config from a properties file, and the properties contain lines like

com.some.package.Class1.Field1: value
com.some.package.Class1.Expressions: exp1<!>exp2<!>exp3

There is no way to write the value in multiple lines in the properties file. That's why.

Samiron asked Jul 02 '13 05:07




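1 Answer

The best way would be to use a sequence that is not valid regex syntax as the delimiter, such as **. If ** were used as a quantifier in a normal regex, the regex would not work and would throw an exception when compiled. (Note: ++ is valid, so it cannot be used this way.)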

regex1+"**"+regex2

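Now you can split the combined string with this regex (written as it would appear inside a string literal, where \\\\ stands for a single literal backslash):

(?<!\\\\)[*][*](?![*])
---------      -------
    |             |-> ensures no further * follows, so "A*"+"**"+"n+" is split at the last two stars
    |-> ensures the first * is not escaped by a backslash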
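
The split described above can be sketched in Java as follows. This is a minimal sketch, assuming java.util.regex and a combined string built as regex1+"**"+regex2; the class name RegexListSplitter and the method parse are illustrative and not part of the original answer. It does not handle ** inside a character class or a \Q...\E quoted section, and a regex that ends with an escaped backslash would be mistaken for an escaped delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class RegexListSplitter {
    // As a regex this reads (?<!\\)[*][*](?![*]): two stars that are not
    // preceded by a backslash and not followed by a third star.
    private static final Pattern DELIMITER =
            Pattern.compile("(?<!\\\\)[*][*](?![*])");

    /** Splits the combined value on "**" and compiles every part. */
    static List<Pattern> parse(String joined) {
        List<Pattern> patterns = new ArrayList<>();
        for (String part : DELIMITER.split(joined)) {
            // Pattern.compile throws PatternSyntaxException if a part is not a valid regex.
            patterns.add(Pattern.compile(part));
        }
        return patterns;
    }

    public static void main(String[] args) {
        String joined = "A*" + "**" + "n+" + "**" + "\\d{3}";
        for (Pattern p : parse(joined)) {
            System.out.println(p.pattern());
        }
    }
}
```

For the sample input, the parts come back as A*, n+ and \d{3}, while a value such as a\**b (an escaped star followed by a quantifier) is left in one piece.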

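The following sequences are not valid regexes on their own:

  • [+
  • (+
  • [*
  • (*
  • [?
  • *+
  • ** (delimiter would be (?<!\\\\)[*][*](?![*]))
  • ?? (delimiter would be (?<!\\\\)[?][?](?![?]))

Not all of them are equally safe as delimiters. After an atom, ?? is a valid reluctant quantifier (a??), and in flavors with possessive quantifiers, such as Java, so is *+ (a*+). Likewise [+, [* and [? can open a character class, as in [*]. Any of these sequences, including **, can also appear literally inside a character class such as [(+] or [**].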
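
Which candidates an engine actually rejects depends on the regex flavor, so it is worth checking them against the engine in use. The following is a minimal sketch, assuming Java's java.util.regex; the class name DelimiterCheck and the method compiles are illustrative. It places each candidate after an atom, which is how the candidate would look if it occurred inside a real expression.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class DelimiterCheck {
    /** Returns true if the engine accepts the given regex. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Each candidate delimiter is placed between two atoms.
        for (String regex : new String[] {"a**b", "a++b", "a*+b", "a??b"}) {
            System.out.println(regex + " compiles: " + compiles(regex));
        }
    }
}
```

With java.util.regex, only a**b is reported as not compiling; the other three are accepted as possessive or reluctant quantifiers.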

When splitting, you need to check that the delimiter is not escaped by a preceding backslash:

(?<!\\\\)delimiter
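
The escape check generalizes to any delimiter. Below is a minimal sketch, assuming Java; the class name EscapeAwareSplit and the method unescaped are hypothetical helpers. It uses Pattern.quote so that the delimiter's own characters are matched literally. A single lookbehind cannot tell an escaping backslash from an escaped one, so a regex ending in \\ directly before the delimiter would not be split.

```java
import java.util.regex.Pattern;

final class EscapeAwareSplit {
    /** Builds (?<!\\)delimiter, with the delimiter quoted so it is matched literally. */
    static Pattern unescaped(String delimiter) {
        return Pattern.compile("(?<!\\\\)" + Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // The first "(+" is preceded by a backslash, so only the second one splits.
        String joined = "a\\(+b" + "(+" + "c";
        for (String part : unescaped("(+").split(joined)) {
            System.out.println(part);
        }
    }
}
```

Here the input a\(+b(+c is split into a\(+b and c.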
Anirudha answered Oct 16 '22 19:10
