
To use or not to use regular expressions?

I just asked a question about using a regular expression to allow numbers between -90.0 and +90.0. I got some answers on how to implement the regular expression, but most of them also mentioned that this would be better handled without a regular expression, or that a regular expression would be overkill. So how do you decide when to use a regular expression and when not to? Is there a checklist you can follow?

Xaisoft asked Nov 04 '10 15:11



2 Answers

Regular expressions are a text processing tool for character-based tests. More formally, regular expressions are good at handling regular languages and bad at almost anything else.

In practice, this means that regular expressions are not well suited for tasks that require discovering meaning (semantics) in text that goes beyond the character level. This would require a full-blown parser.

In your particular case: recognizing a number in a text is an exercise that regular expressions are good at (decimal numbers can be trivially described using a regular language). This works on the character level.

But doing more advanced stuff with the number that requires knowledge of its numerical value (i.e. its semantics) requires interpretation. Regular expressions are bad at this. So finding a number in text is easy. Finding a number in text that is greater than 11 but smaller than 1004 (or that is divisible by 3) is hard: it requires recognizing the meaning of the number.
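One way to picture this split is to let the regular expression do only the character-level work (finding things that look like numbers) and leave the numeric comparison to ordinary code. The following is a minimal sketch under stated assumptions: the pattern and the helper name `find_latitudes` are illustrative and not taken from the original question, and the pattern only covers plain signed decimals (no exponents or thousands separators).

```python
import re

# Character-level pattern for a signed decimal number such as "12", "-3.5" or "+90.0".
# This pattern is an assumption made for illustration only.
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")

def find_latitudes(text):
    """Return the numbers found in text whose value lies between -90.0 and 90.0."""
    # Step 1 (regex): locate number-shaped substrings.
    # Step 2 (plain code): interpret each one as a float and compare its value.
    values = (float(match.group()) for match in NUMBER.finditer(text))
    return [value for value in values if -90.0 <= value <= 90.0]
```

The range check is a single readable comparison here, whereas encoding the same bounds purely in the pattern would require enumerating digit combinations.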

Konrad Rudolph answered Oct 26 '22 17:10


I would say that regular expressions are most effective on strings. For other data types, operations native to that data type will usually be more intuitive and give better results.

For example, if you know that you're dealing with a DateTime, then you can use the Parse and TryParse methods with the different formats, and that will usually be more reliable than your own regular expressions.
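The answer refers to .NET's Parse and TryParse. A rough Python analogue, sketched here with the standard library's `datetime.strptime`, shows the same idea. The helper name `try_parse_date` and the list of accepted formats are assumptions made for illustration.

```python
from datetime import datetime

# Formats accepted by this sketch (an assumption; adjust to the data at hand).
FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def try_parse_date(text, formats=FORMATS):
    """Return a datetime if text matches one of the formats, otherwise None."""
    for fmt in formats:
        try:
            # strptime checks the layout and also rejects impossible dates,
            # which a hand-written pattern would have to encode explicitly.
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None
```

A string such as "31/02/2010" has the right shape for the second format but is rejected because February has no 31st day, which is the kind of rule that is awkward to express in a pattern.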

In your example, you are dealing with numbers so deal with them accordingly.
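For a single input value, dealing with the number as a number can be as small as the sketch below. It assumes the input arrives as a string, and the function name `is_valid_latitude` is hypothetical. Note that Python's `float` also accepts forms such as surrounding whitespace and exponents like "1e1", which may or may not be desirable.

```python
def is_valid_latitude(text):
    """Return True if text parses as a number between -90.0 and 90.0 inclusive."""
    try:
        value = float(text)
    except ValueError:
        # Not a number at all.
        return False
    # "nan" and "inf" parse successfully but fail this comparison.
    return -90.0 <= value <= 90.0
```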

Regex is very powerful, but it isn't the easiest code to read and to debug. When another reliable solution is at hand, you should probably go for that.

Hugo Migneron answered Oct 26 '22 17:10