String manipulation vs Regexps

Tags:

We are often told that Regexps are slow and should be avoided whenever possible.

However, taking into account the overhead of doing some string manipulation oneself (not talking about algorithm mistakes - this is a different matter), especially in PHP or Perl (maybe Java) what is the limit, in which case can we consider string manipulation to be a better alternative? What regexps are particularly CPU greedy?

For instance, for the following, in C++, Java, PHP or Perl, what would you recommend

The regexps would probably be faster:

s/abc/def/g or a ... while((i=index("abc",$x)>=0) ...$y .= substr()... based solution?
s/(\d)+/N/g or a scanning algorithm

But what about

an email validation regexp?
s/((0|\w)+?[xy]*[^xy]){2,7}/u/g

wouldn't a handmade and specific algorithm be faster (while longer to write)?

edit

The point of the question is to determine what kind of regexp would better be rewritten specifically for a given problem via string manipulation?

edit2

A common implementation is Perl regexp. For instance in Perl - that requires to know how they are implemented - what kind of regexp is to be avoided, because the implementation will make the process lengthy and ineffective? It may not be a complex regexp...

edit July 2011 (based on comments)

I'm not saying all regexps are slow. Some particular regexps patterns are known to be slow, due to the particular processing their and due to their implementation.
In recent Perl / PHP implementations for instance, what is known to be rather slow - and should be avoided?
The answer is expected from people who did already their own research (profiler...) and who are able to provide a kind of general guidelines about what is recommended/to be avoided.

927

asked Aug 31 '10 15:08

Déjà vu

1 Answers

Who said regexes were slow? At least in Perl they tend to be the preferred method of manipulating strings.

Regexes are bad at some things like email validation because the subject is too complex, not because they are slow. A proper email validation regex is over 6,000 characters long, and it doesn't even handle all of the cases (you have to strip out comments first).

At least in Perl 5, if it has a grammar it probably shouldn't be parsed with one regex.

You should also rewrite a regex as a custom function if the regex has grown to the point it can no longer be easily maintained (see the previous email validation example) or profiling shows that the regex is the slow component of your code.

You seem to be concerned with the speed of the regex vs the custom algorithm, but that is not a valid concern until you prove that it is with a profiler. Write the code in the most maintainable way. If a regex is clear, then use a regex. If a custom algorithm is clear, then use a custom algorithm. If you find that either is eating up a lot of time after profiling your code, then start looking for alternatives.

108

answered Oct 05 '22 16:10

Chas. Owens

Related questions
                            
                                Android Google Map, Polylines and zoom
                            
                                Invert Map with list value Map<Key, List<Value>> to Map <Value, Key> in Java 8
                            
                                Total of all numbers from 1 to N will always be zero
                            
                                Java Optional and orElse [duplicate]
                            
                                SMTP with CRAM-MD5 in Java
                            
                                How do you decide when to upgrade a library in your project?
                            
                                Java Object Serialization Performance tips
                            
                                How to fill color on triangle
                            
                                Java Generics and Infinity (Comparable)
                            
                                Transitioning from Java to C and then C++? [closed]
                            
                                Alternatives to decorator pattern
                            
                                Java circular Generics
                            
                                Can someone steal a password from a Java application?
                            
                                Cosine Similarity
                            
                                Getting all mapped Entities from EntityManager
                            
                                Why are (almost) all the on-line games written in ActionScript (Flash) not Java?
                            
                                How to use mkdir and rmdir commands in a java program
                            
                                Does form with enctype="multipart/form-data" cause problems accessing a hidden field
                            
                                Java initializing classes without repeating myself
                            
                                Testing methodologies

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

String manipulation vs Regexps

Tags:

java

string

regex

php

perl

Déjà vu

People also ask

1 Answers

Chas. Owens

Recent Activity

Donate For Us