Why is the regular expression .* slower in one place and faster in another?

Tags:

regex

Lately I have been using a lot of regular expressions in Java/Groovy. For testing I routinely use regex101.com. Naturally, I am also looking at regular expression performance.

One thing I noticed is that where .* is placed can significantly affect overall performance. In particular, using .* in the middle of a regular expression (or, better said, anywhere but the end) is a performance killer.

For example, in this regular expression the required number of steps is 27:

[image: regex101 screenshot]

If I change the first .* to \s*, the number of steps required drops significantly, to 16:

[image: regex101 screenshot]

However, if I change the second .* to \s*, the number of steps does not drop any further:

[image: regex101 screenshot]

I have a few questions:

Why does the above happen? I don't want to compare \s and .*; I know the difference. I want to know why \s and .* cost different amounts depending on their position in the complete regex, and which characteristics of a regex make it cost more or less depending on position (or on any other aspect, if there is one).
Does the steps counter on this site really give any indication of regex performance?
What other simple or similar (position-related) regex performance observations do you have?

548

asked Nov 06 '15 13:11

Mahesha999

1 Answer

The following is output from the debugger.

pattern 1

pattern 2

pattern 3

The main reason for the difference in performance is that .* is greedy: it consumes everything up to the end of the string (except newlines). The rest of the pattern still has to match, so the regex engine is forced to backtrack, giving characters back one at a time (as seen in the first image).
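The greedy behaviour can be observed directly in Java by capturing what the quantifier ends up matching. This is only a small sketch: the input string and the `GreedyDemo` / `firstGroup` names are made up for illustration, and the capture groups are added purely so the consumed text can be printed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Returns what capture group 1 matched, or null if the pattern does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input containing "mahesh" twice.
        String input = "myname mahesh and mahesh again";

        // Greedy .* first runs to the end of the input, then backtracks
        // until "mahesh" can match, so it stops at the LAST occurrence.
        System.out.println("[" + firstGroup("^myname(.*)mahesh", input) + "]");

        // \s* can only consume the single space, so no backtracking is needed.
        System.out.println("[" + firstGroup("^myname(\\s*)mahesh", input) + "]");
    }
}
```

With this input the first group is " mahesh and " and the second is a single space, which shows how much further .* travels before the engine has to hand characters back.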

The reason that \s* and .* perform equally well at the end of the pattern is that, when nothing but whitespace is left to match, the greedy .* and \s* consume exactly the same characters, so neither needs to backtrack.

If your test string didn't end in whitespace, there would be a difference in performance, much like you saw with the first pattern: the regex would be forced to backtrack.

EDIT

You can see the performance difference if you end with something besides whitespace:

Bad:

^myname.*mahesh.*hiworld

bad

Better:

^myname.*mahesh\s*hiworld

little better

Even better:

^myname\s*mahesh\s*hiworld

Much better
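The ordering of these three patterns can be reproduced with a toy backtracking matcher that counts its own attempts. This is a rough sketch under stated assumptions: it only understands literals, greedy .* and greedy \s*, the input string is hypothetical, and its step counts are not the ones regex101 or java.util.regex would report. All names (`BacktrackSteps`, `stepsFor`, and so on) are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

public class BacktrackSteps {
    // Token kinds of the toy pattern language: literal text, greedy ".*", greedy "\s*".
    enum Kind { LITERAL, DOT_STAR, WS_STAR }

    static final class Token {
        final Kind kind;
        final String text; // only used for LITERAL tokens
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    private int steps; // one step per attempt to match a token at a position

    // Tries to match tokens[ti..] against input starting at pos.
    private boolean match(List<Token> tokens, int ti, String input, int pos) {
        steps++;
        if (ti == tokens.size()) return true; // whole pattern matched
        Token t = tokens.get(ti);
        if (t.kind == Kind.LITERAL) {
            return input.startsWith(t.text, pos)
                    && match(tokens, ti + 1, input, pos + t.text.length());
        }
        // Greedy star: consume as many characters as the token accepts...
        int end = pos;
        while (end < input.length() && accepts(t.kind, input.charAt(end))) end++;
        // ...then give them back one at a time until the rest of the pattern matches.
        for (int p = end; p >= pos; p--) {
            if (match(tokens, ti + 1, input, p)) return true;
        }
        return false;
    }

    private static boolean accepts(Kind kind, char c) {
        return kind == Kind.DOT_STAR ? c != '\n' : Character.isWhitespace(c);
    }

    // Builds ^myname<gap1>mahesh<gap2>hiworld, where each gap is .* or \s*,
    // and returns how many steps the toy matcher needs on the given input.
    static int stepsFor(boolean firstIsDotStar, boolean secondIsDotStar, String input) {
        List<Token> tokens = Arrays.asList(
                new Token(Kind.LITERAL, "myname"),
                new Token(firstIsDotStar ? Kind.DOT_STAR : Kind.WS_STAR, null),
                new Token(Kind.LITERAL, "mahesh"),
                new Token(secondIsDotStar ? Kind.DOT_STAR : Kind.WS_STAR, null),
                new Token(Kind.LITERAL, "hiworld"));
        BacktrackSteps m = new BacktrackSteps();
        m.match(tokens, 0, input, 0);
        return m.steps;
    }

    public static void main(String[] args) {
        String input = "myname mahesh hiworld"; // hypothetical test string
        System.out.println(".* / .*   : " + stepsFor(true, true, input));
        System.out.println(".* / \\s* : " + stepsFor(true, false, input));
        System.out.println("\\s* / \\s*: " + stepsFor(false, false, input));
    }
}
```

Every character that a .* grabs beyond what the following literal needs costs one extra failed attempt, so the count shrinks each time a .* is replaced by \s*. The absolute numbers depend entirely on how this toy defines a step; only the ordering is meant to carry over.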

159

answered Oct 01 '22 11:10

erip
