I am looking for a way to simplify a regular expression which consists of values (e.g. 12345), relation signs (<,>,<=,>=) and junctors (&,!). E.g. the expression:
>= 12345 & <=99999 & !55555
should be matched. I have this regular expression:
(^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
I am especially unhappy with the repetition of <=, >=, <, > at the beginning and end of the expression. I would be glad to get a hint how to make it simpler e.g. look ahead, look back.
② Simplify regexClick Simplify step to perform one simplification step, and Simplify full to perform simplification until the end. Using set algebra and FSM equivalence laws, regex simplification reduces the length of the regex definition string while not changing the language that the regex defines.
Integer values enclosed in {} indicate the number of times to apply the preceding regular expression. n is the minimum number, and u is the maximum number. If you specify only n, it indicates the exact number of times to apply the regular expression.
The character + in a regular expression means "match the preceding character one or more times". For example A+ matches one or more of character A. The plus character, used in a regular expression, is called a Kleene plus . Regular Expression.
Starting from your regex, you can do this simplification steps:
(^<=|^<= | ^>= | ^>= |^<|^>|^< |^> |^)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
Move the anchor out of the alternation
^(<=|<= |>= |>= |<|>|< |> |)((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
Why has there been whitespace before the anchor? (removed that)
Move the following whitespace outside and make it optional
^(<=|<=|>=|>=|<|>|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
Remove the duplicates in the alternations
^(<=|>=|<|>|) ?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
The empty alternative at the end would match the empty string ==> this alternation is optional
^((<=|>=|<|>)? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
Make the equal sign optional and remove the duplicates
^((<|>)=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
The alternation with single characters can be replaced with a character class
^([<>]=? ?)?((!|)([0-9]{1,5}))( & > | & < |& >=|&>=|&<=||&<=|&>=|&<|&>|&| &| & |$))*
Do similar things with the alternation at the end and you end up with something like this:
^([<>]=? ?)?((!|)([0-9]{1,5}))( ?(& ?([<>]=?)?)?|$)
This is untested, I did not change the semantic (I think so), but I did this just here in the editor.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With