Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge several regexes to a single one

I have several regexes (actually several thousands), and I must check if one string matches any of these regexes. It is not very efficient, so I would like to merge all these regexes as a single regex.

For example, if a have these regexes:

  • 'foo *bar'
  • 'foo *zip'
  • 'zap *bar'

I would like to obtain something like 'foo *(bar|zip)|zap *bar'.

Is there some algorithm, library or tool to do this?

like image 309
Jazz Avatar asked Dec 11 '09 15:12

Jazz


People also ask

How do I combine two regex expressions?

to combine two expressions or more, put every expression in brackets, and use: *?

How to combine multiple regex into one in java?

Pattern pattern = Pattern. compile( String. join( "|" , regexes ) );

How do you chain in regex?

Chaining regular expressions Regular expressions can be chained together using the pipe character (|). This allows for multiple search options to be acceptable in a single regex string.

What does re compile to Python?

Python's re. compile() method is used to compile a regular expression pattern provided as a string into a regex pattern object ( re. Pattern ). Later we can use this pattern object to search for a match inside different target strings using regex methods such as a re. match() or re.search() .


2 Answers

You can just concatenate the regexes using or (|) (and anchors for the beginning/end of string).

Most good regex libraries optimize their finite state automata after they build it from your regex. PCRE does that, for instance.

This step usually takes care of your optimization problem, ie. they apply most of the transformations you would have to do "by hand".

like image 115
Vlad Valceanu Avatar answered Sep 27 '22 23:09

Vlad Valceanu


In theory a regex is a (nondeterministic)finite-state automata; thus they can be merged and minimized. You can take a look at this as a starting point.

Beware, though, that this might not be the most correct answer. Why do you have to deal with several thousands regular expressions? I can only fathom the maintentance hell of such a thing. Perhaps you should consider writing a parser and a grammar -- much easily done (and grammars are more powerful than regexps anyways).

like image 24
lorenzog Avatar answered Sep 28 '22 00:09

lorenzog