Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simplify a complex regular expression

Tags:

java

regex

I am looking for a way to simplify a regular expression which consists of values (e.g. 12345), relation signs (<,>,<=,>=) and junctors (&,!). E.g. the expression:

>= 12345 & <=99999 & !55555 

should be matched. I have this regular expression:

(^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*

I am especially unhappy with the repetition of <=, >=, <, > at the beginning and end of the expression. I would be glad to get a hint how to make it simpler e.g. look ahead, look back.

like image 257
user1413457 Avatar asked May 23 '12 20:05

user1413457


People also ask

How do you simplify in regular expressions?

② Simplify regexClick Simplify step to perform one simplification step, and Simplify full to perform simplification until the end. Using set algebra and FSM equivalence laws, regex simplification reduces the length of the regex definition string while not changing the language that the regex defines.

What is {} in regular expression?

Integer values enclosed in {} indicate the number of times to apply the preceding regular expression. n is the minimum number, and u is the maximum number. If you specify only n, it indicates the exact number of times to apply the regular expression.

What does A+ mean in regular expression?

The character + in a regular expression means "match the preceding character one or more times". For example A+ matches one or more of character A. The plus character, used in a regular expression, is called a Kleene plus . Regular Expression.


1 Answers

Starting from your regex, you can do this simplification steps:

 (^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
  1. Move the anchor out of the alternation

    ^(<=|<= |>= |>= |<|>|< |> |)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
    

    Why has there been whitespace before the anchor? (removed that)

  2. Move the following whitespace outside and make it optional

    ^(<=|<=|>=|>=|<|>|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
    
  3. Remove the duplicates in the alternations

    ^(<=|>=|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
    
  4. The empty alternative at the end would match the empty string ==> this alternation is optional

    ^((<=|>=|<|>)? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
    
  5. Make the equal sign optional and remove the duplicates

    ^((<|>)=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
    
  6. The alternation with single characters can be replaced with a character class

    ^([<>]=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
    
  7. Do similar things with the alternation at the end and you end up with something like this:

    ^([<>]=? ?)?((!|)([0-9]{1,5}))( ?(& ?([<>]=?)?)?|$)
    

This is untested, I did not change the semantic (I think so), but I did this just here in the editor.

like image 167
stema Avatar answered Oct 28 '22 18:10

stema