I have several regexes (actually several thousand), and I must check whether a string matches any of them. Testing them one by one is not very efficient, so I would like to merge all of them into a single regex.
For example, if I have these regexes:
I would like to obtain something like 'foo *(bar|zip)|zap *bar'.
Is there some algorithm, library or tool to do this?
To combine two or more expressions, put every expression in brackets, and use: *?
Pattern pattern = Pattern.compile(String.join("|", regexes));
Chaining regular expressions
Regular expressions can be chained together using the pipe character (|). This allows multiple search options to be accepted by a single regex string.
Python's re.compile() method compiles a regular expression pattern, provided as a string, into a regex pattern object (re.Pattern). We can later use this pattern object to search for a match inside different target strings using regex methods such as re.match() or re.search().
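A minimal Python sketch of the alternation approach with re.compile(), assuming the patterns are held in a list. The three patterns and the helper name compile_any are illustrative assumptions, chosen only to be consistent with the merged result quoted in the question. Each pattern is wrapped in a non-capturing group so that a | inside one pattern cannot leak into its neighbors.

```python
import re

# Illustrative patterns; assumed for this example, not taken from the question.
regexes = [r"foo *bar", r"foo *zip", r"zap *bar"]

def compile_any(patterns):
    """Compile one pattern that matches wherever any input pattern matches."""
    # (?:...) keeps each alternative self-contained without adding capture groups.
    return re.compile("|".join(f"(?:{p})" for p in patterns))

combined = compile_any(regexes)

for text in ["foo   zip", "zapbar", "foo baz"]:
    print(text, "->", bool(combined.search(text)))
```

Patterns that contain numbered backreferences or reuse the same named group would need extra care, since group numbers and names are shared across the whole combined pattern.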
You can just concatenate the regexes using or (|) (and anchors for the beginning/end of string).
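If the whole string must match, the alternation has to be grouped before it is anchored; otherwise the start anchor binds only to the first alternative and the end anchor only to the last. A hedged Python sketch (the patterns and the name compile_any_anchored are illustrative assumptions, not from the question):

```python
import re

# Illustrative patterns; assumed for this example.
regexes = [r"foo *bar", r"foo *zip", r"zap *bar"]

def compile_any_anchored(patterns):
    """Compile a pattern that matches only if the entire string matches one input pattern."""
    alternation = "|".join(f"(?:{p})" for p in patterns)
    # The outer group makes both anchors apply to every alternative.
    return re.compile(rf"\A(?:{alternation})\Z")

anchored = compile_any_anchored(regexes)

print(bool(anchored.match("foo bar")))      # the whole string is one alternative
print(bool(anchored.match("foo bar baz")))  # trailing text prevents a match
```

In Python, calling re.fullmatch with the unanchored alternation should give the same effect without explicit anchors.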
Most good regex libraries optimize the finite-state automaton after they build it from your regex. PCRE does that, for instance.
This step usually takes care of your optimization problem, i.e., the library applies most of the transformations you would otherwise have to do "by hand".
In theory, a regex is equivalent to a (nondeterministic) finite-state automaton; thus several regexes can be merged into one automaton and minimized. You can take a look at this as a starting point.
Beware, though, that this might not be the best answer. Why do you have to deal with several thousand regular expressions? I can only imagine the maintenance hell of such a thing. Perhaps you should consider writing a grammar and a parser -- that is much more easily done (and grammars are more powerful than regexps anyway).