
When not to use Regex in C# (or Java, C++, etc.)

Tags:

java

c#

regex

There are clearly lots of problems that look as if a simple regular expression will solve them, but that prove very hard to solve with regex.

So how does someone who is not a regex expert know whether he or she should be learning regex to solve a given problem?

(See "Regex to parse C# source code to find all strings" for why I am asking this question.)

This seems to sum it up well:

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems...

(I have just changed the title of the question to make it more specific, as some of the problems with regex in C# are solved in Perl and JScript, for example the fact that the two levels of quoting make a regex so unreadable.)
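To illustrate the "two levels of quoting" point, here is a minimal Java sketch. The class and method names (`QuotingDemo`, `isDecimal`, `splitOnBackslash`) are made up for this example. Every backslash in the regex has to be escaped again for the string literal, so the pattern in the source no longer looks like the regex you had in mind.

```java
import java.util.regex.Pattern;

public class QuotingDemo {
    // The regex  \d+\.\d+  (a simple decimal number) needs every
    // backslash doubled inside a Java string literal.
    static final Pattern DECIMAL = Pattern.compile("\\d+\\.\\d+");

    // The regex  \\  (one literal backslash) becomes four backslashes
    // in the source: two for the regex level, doubled for the string level.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    static boolean isDecimal(String s) {
        return DECIMAL.matcher(s).matches();
    }

    static String[] splitOnBackslash(String s) {
        return BACKSLASH.split(s);
    }

    public static void main(String[] args) {
        System.out.println(isDecimal("3.14"));  // true
        System.out.println(isDecimal("3x14"));  // false
        // The literal below is the 16-character string C:\temp\file.txt
        System.out.println(splitOnBackslash("C:\\temp\\file.txt").length);  // 3
    }
}
```

C# verbatim strings (`@"\d+\.\d+"`) remove one of the two levels, but plain string literals in C# have the same problem as the Java code above.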

Ian Ringrose asked Jun 09 '09 08:06



2 Answers

Don't try to use regex to parse hierarchical text like program source (or nested XML): regular expressions are provably not powerful enough for that. For example, given a string of parentheses, they can't figure out whether the parentheses are balanced.

Use parser generators (or similar technologies) for that.
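For the balanced-parentheses example, the missing ingredient is a counter (or a stack), which a classical regular expression doesn't have. The following is only a hand-written sketch, not a parser generator, and the names `BalancedParens` and `isBalanced` are hypothetical. It shows how little code the programmatic version needs:

```java
public class BalancedParens {
    // Returns true if every '(' is closed by a later ')' and no ')'
    // appears before its matching '('. Other characters are ignored.
    static boolean isBalanced(String text) {
        int depth = 0;  // number of currently open parentheses
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false;  // a ')' with nothing left to close
                }
            }
        }
        return depth == 0;  // anything still open means unbalanced
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a(b)c)"));  // true
        System.out.println(isBalanced("(()"));      // false
        System.out.println(isBalanced(")("));       // false
    }
}
```

The unbounded `depth` counter is exactly the state a finite automaton can't keep. Real source code also needs string literals, comments and several bracket types handled, which is where a proper parser comes in.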

Also, I wouldn't recommend using regex to validate data with strict formal standards, like e-mail addresses. They're harder than you'd expect, and you'll end up with either an inaccurate regex or a very long one.
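One possible alternative, sketched here as an assumption rather than a recommendation from the standard, is to do only a deliberately loose structural check and not pretend to validate the address. The helper name `looksLikeEmail` is hypothetical:

```java
public class LooseEmailCheck {
    // Deliberately loose: only checks that there is something before
    // and something after the last '@'. It does NOT implement the
    // e-mail address standard and will accept many invalid addresses.
    static boolean looksLikeEmail(String s) {
        int at = s.lastIndexOf('@');
        return at > 0 && at < s.length() - 1;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com"));  // true
        System.out.println(looksLikeEmail("no-at-sign"));        // false
        System.out.println(looksLikeEmail("user@"));             // false
    }
}
```

A check like this only catches obvious typos. It is honest about being incomplete, which a medium-sized regex usually is not.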

alamar answered Oct 14 '22 14:10


There are two aspects to consider:

  • Capability: is the language you are trying to recognize a Type-3 (regular) language? If so, you might use regex; if not, you need a more powerful tool.

  • Maintainability: if it takes more time to write, test and understand a regular expression than its programmatic counterpart, then it's not appropriate. Checking this is complicated. I'd recommend peer review with your colleagues (if they say "what the ..." when they see it, it's too complicated), or just leave it undocumented for a few days, then look at it again yourself and measure how long it takes you to understand it.
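As a rough illustration of both points, take a fixed `YYYY-MM-DD` shape (digits and dashes only, no calendar checks). That is a regular language, so regex is capable of it, and you can put the regex next to its programmatic counterpart to compare them. This is a sketch under those assumptions; the names `DateShape`, `matchesWithRegex` and `matchesByHand` are hypothetical.

```java
import java.util.regex.Pattern;

public class DateShape {
    // Pattern.COMMENTS makes the engine ignore whitespace in the pattern
    // and treat '#' to end of line as a comment, so it can be documented.
    static final Pattern DATE = Pattern.compile(
            "\\d{4}   # four-digit year\n"
          + "-        # separator\n"
          + "\\d{2}   # two-digit month\n"
          + "-        # separator\n"
          + "\\d{2}   # two-digit day\n",
            Pattern.COMMENTS);

    static boolean matchesWithRegex(String s) {
        return DATE.matcher(s).matches();
    }

    // The programmatic counterpart: same check, written by hand.
    static boolean matchesByHand(String s) {
        if (s.length() != 10) {
            return false;
        }
        for (int i = 0; i < 10; i++) {
            char c = s.charAt(i);
            if (i == 4 || i == 7) {
                if (c != '-') return false;      // dashes at fixed positions
            } else if (c < '0' || c > '9') {
                return false;                    // ASCII digits elsewhere
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesWithRegex("2009-06-09"));  // true
        System.out.println(matchesByHand("2009-06-09"));     // true
        System.out.println(matchesWithRegex("2009-6-9"));    // false
        System.out.println(matchesByHand("2009-6-9"));       // false
    }
}
```

Here the regex is arguably the shorter and clearer of the two. Once you also want to reject month 13 or February 30, the balance tips towards ordinary code.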

fortran answered Oct 14 '22 14:10