Any code I've seen that uses regexes tends to use them as a black box:

1. Put in string
2. Magic regex
3. Get out string

This doesn't seem like a particularly good idea for production code, as even a small change can often result in a completely different regex. Apart from cases where the standard is permanent and unchanging, are regexes the way to do things, or is it better to try different methods?

Are regexes really maintainable?

5 Answers

If regexes are long and impenetrable, making them hard to maintain, then they should be commented.

A lot of regex implementations allow you to pad regexes with whitespace and comments.
See https://www.regular-expressions.info/freespacing.html#parenscomment
and Coding Horror: Regular Expressions: Now You Have Two Problems
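As a minimal sketch of the comment syntax described on that page, assuming Python's `re` module: a `(?#...)` group is a comment whose contents the engine ignores, and it works even without free-spacing mode. The `ISO_DATE` pattern and `parse_iso_date` helper are hypothetical names chosen for illustration.

```python
import re

# (?#...) groups are comments: the regex engine skips their contents entirely.
# Unlike verbose mode, this does not change how whitespace elsewhere is treated.
ISO_DATE = re.compile(
    r"(\d{4})(?# year)"
    r"-(\d{2})(?# month)"
    r"-(\d{2})(?# day)"
)

def parse_iso_date(text):
    """Return (year, month, day) as strings, or None if text is not YYYY-MM-DD."""
    m = ISO_DATE.fullmatch(text)
    return m.groups() if m else None

print(parse_iso_date("2022-10-05"))  # ('2022', '10', '05')
print(parse_iso_date("22-10-05"))    # None
```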

Any code I've seen that uses Regexes tends to use them as a black box:

If by black box you mean abstraction, that's what all programming is: abstracting away the difficult part (parsing strings) so that you can concentrate on the problem domain (what kind of strings you want to match).

even a small change can often result in a completely different regex.

That's true of any code. As long as you test your regex to make sure it matches the strings you expect, ideally with unit tests, you can be confident changing it.
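A minimal sketch of such unit tests, assuming Python's `unittest` and a hypothetical ZIP-code pattern (`ZIP_CODE` and `ZipCodeRegexTest` are illustrative names). Listing near-misses that must be rejected pins down the edges of the pattern, so a later edit that loosens or tightens it fails loudly.

```python
import re
import unittest

# Hypothetical pattern under test: a US ZIP code, optionally with the +4 suffix.
ZIP_CODE = re.compile(r"\d{5}(?:-\d{4})?")

class ZipCodeRegexTest(unittest.TestCase):
    # Strings the pattern must accept in full.
    SHOULD_MATCH = ["12345", "12345-6789"]
    # Near-misses the pattern must reject.
    SHOULD_NOT_MATCH = ["1234", "123456", "12345-678", "12345-", "abcde"]

    def test_accepts_valid_codes(self):
        for s in self.SHOULD_MATCH:
            with self.subTest(s=s):
                self.assertIsNotNone(ZIP_CODE.fullmatch(s))

    def test_rejects_invalid_codes(self):
        for s in self.SHOULD_NOT_MATCH:
            with self.subTest(s=s):
                self.assertIsNone(ZIP_CODE.fullmatch(s))
```

Saved in a file, this runs with `python -m unittest <file>`; each failing sample is reported separately thanks to `subTest`.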

Edit: please also read Jeff's comment to this answer about production code.

answered Oct 05 '22 19:10

Sam Hasler

Obligatory.

It really comes down to the regex. If it's one huge monolithic expression, then yes, it's a maintainability problem. If you can express it succinctly (perhaps by breaking it up), or if you have good comments and tools to help you understand it, then a regex can be a powerful tool.
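One way to break a regex up, sketched here under the assumption of Python's `re` module: build it from small, named sub-patterns, each simple enough to check by eye. The 24-hour `HH:MM` format and the names `HOUR`, `MINUTE`, `TIME_24H` and `parse_time` are hypothetical.

```python
import re

# Each piece is small and named after what it matches.
HOUR = r"(?:[01]\d|2[0-3])"   # 00-19 or 20-23
MINUTE = r"[0-5]\d"           # 00-59

# The full pattern is assembled from the pieces; named groups label the captures.
TIME_24H = re.compile(rf"(?P<hour>{HOUR}):(?P<minute>{MINUTE})")

def parse_time(text):
    """Return (hour, minute) as ints, or None if text is not a valid HH:MM time."""
    m = TIME_24H.fullmatch(text)
    if m is None:
        return None
    return int(m["hour"]), int(m["minute"])

print(parse_time("23:59"))  # (23, 59)
print(parse_time("24:00"))  # None
```

If the requirements change (say, to allow single-digit hours), only `HOUR` needs editing, and the rest of the expression stays untouched.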

answered Oct 05 '22 19:10

Joel Coehoorn

I don't know which language you're using, but Perl, for example, supports the x flag: whitespace in the pattern is ignored unless escaped, so you can break the regex across several lines and comment each part inline:

```perl
$foo =~ m{
    (some-thing)              # matches the literal text "some-thing"
    \s*                       # matches any amount of whitespace, including none
    (match\ another\ thing)   # spaces escaped, since /x ignores bare whitespace
}x;
```

This helps make long regexes more readable.
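For readers not using Perl, a rough Python counterpart of the snippet above is sketched below, assuming the standard `re` module: `re.VERBOSE` (alias `re.X`) plays the role of `/x`. As with `/x`, literal spaces must be escaped, otherwise verbose mode silently drops them. The sample input string is made up.

```python
import re

# re.VERBOSE: unescaped whitespace in the pattern is ignored and "#" starts a comment.
PATTERN = re.compile(r"""
    (some-thing)              # group 1: the literal text "some-thing"
    \s*                       # any amount of whitespace, including none
    (match\ another\ thing)   # group 2: spaces escaped, or verbose mode would drop them
""", re.VERBOSE)

m = PATTERN.search("prefix some-thing   match another thing suffix")
print(m.groups())  # ('some-thing', 'match another thing')
```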

answered Oct 05 '22 21:10

jkramer

It only seems like magic if you don't understand the regex. Any number of small changes in production code can cause major problems, so that is not, in my opinion, a good reason to avoid regexes. Thorough testing should expose any problems.
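One testing technique that fits the "small change" worry is differential testing: run the old and new pattern over the same sample inputs and list every input where they disagree. A minimal sketch, assuming Python's `re` and full-match semantics; `regex_diff` is a hypothetical helper and the samples are made up.

```python
import re

def regex_diff(old_pattern, new_pattern, samples):
    """Return (sample, old_matches, new_matches) for each sample where the patterns disagree."""
    old_re = re.compile(old_pattern)
    new_re = re.compile(new_pattern)
    diffs = []
    for s in samples:
        old_ok = old_re.fullmatch(s) is not None
        new_ok = new_re.fullmatch(s) is not None
        if old_ok != new_ok:
            diffs.append((s, old_ok, new_ok))
    return diffs

# Hypothetical change: widening a ZIP-code pattern to accept the optional +4 suffix.
samples = ["12345", "12345-6789", "1234", "12345-"]
for sample, before, after in regex_diff(r"\d{5}", r"\d{5}(?:-\d{4})?", samples):
    print(f"{sample!r}: {before} -> {after}")
# prints: '12345-6789': False -> True
```

Every reported difference should be one you intended; anything else is a regression the change introduced.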

answered Oct 05 '22 19:10

DMKing

Small changes to any code in any language can result in completely different results. Some of them even prevent compilation.

Substitute regex with "C" or "C#" or "Java" or "Python" or "Perl" or "SQL" or "Ruby" or "awk" or ... anything, really, and you get the same question.

Regex is just another language, Huffman-coded to be efficient at string matching. Like Java, Perl, PHP, or especially SQL, it has strengths and weaknesses, and you need to know the language you're writing in (or maintaining) to have any hope of being productive.

Edit: Mike, regexes are Huffman-coded in the sense that common operations are shorter than rarer ones. A literal match is generally a single character (the one you want to match). Special characters exist, and the common ones are short. Special constructs, such as (?:), are longer. What is common in string matching is not what would be common in general-purpose languages like Perl or C++, so the Huffman coding was targeted at this specialisation.
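A rough illustration of that point, assuming Python (an illustration, not a measurement): in the pattern below, the literal hyphen costs one character and the common digit class `\d` costs two, while the equivalent general-purpose code needs a line per step. The seven-digit format and the names `LOCAL_NUMBER` and `matches_by_hand` are hypothetical.

```python
import re

# Hypothetical format: a seven-digit local number such as "555-0123".
# re.ASCII restricts \d to 0-9 so it agrees with the hand-written check below.
LOCAL_NUMBER = re.compile(r"\d{3}-\d{4}", re.ASCII)

ASCII_DIGITS = set("0123456789")

def matches_by_hand(s):
    """The same check written out in general-purpose code."""
    return (
        len(s) == 8
        and all(c in ASCII_DIGITS for c in s[:3])  # three digits...
        and s[3] == "-"                            # ...a literal hyphen...
        and all(c in ASCII_DIGITS for c in s[4:])  # ...then four digits
    )

for sample in ["555-0123", "5550123", "55-50123", "555-012a"]:
    by_regex = LOCAL_NUMBER.fullmatch(sample) is not None
    assert by_regex == matches_by_hand(sample)  # both versions must agree
    print(sample, by_regex)
```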

answered Oct 05 '22 19:10

Tanktalus

Tags:

regex

coding-style

Rich Bradshaw
