Which regular expression is more efficient?

I'm parsing some big log files and have some very simple string matches, for example:

if(m/Some String Pattern/o){
    #Do something
}

It seems simple enough, but in fact most of my matches could be anchored to the start of the line, although the pattern would then be longer. For example:

if(m/^Initial static string that matches Some String Pattern/o){
    #Do something
}

Obviously this is a longer regular expression, so there is more to match. However, I can use the start-of-line anchor, which would allow a failed match to be discarded sooner.

It is my hunch that the latter would be more efficient. Can anyone back me up/shoot me down :-)

Vagnerr asked Dec 01 '22 08:12


2 Answers

I think you'll find that starting your regex with ^ will definitely be faster, because the regex engine doesn't have to look any further than the left edge of the string for a match.

This is something that you could easily test and measure, of course. Do a regex match 10 million times or so, measure how long it takes, then try again with a different regex.
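As a rough sketch of that kind of measurement, the core Benchmark module can run both patterns over the same data and print a comparison table. The sample lines below are made up for illustration (an assumption, not real log data), and the helper names `count_unanchored` and `count_anchored` are hypothetical; real results will depend on your own logs and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: replace with lines read from a real log file.
my @lines = (
    'Initial static string that matches Some String Pattern, then more text',
    'An unrelated line of reasonable length that matches neither pattern',
    'A line that mentions Some String Pattern somewhere in the middle',
) x 1000;

# Count lines matching the unanchored pattern from the question.
sub count_unanchored {
    my ($lines) = @_;
    my $hits = 0;
    for (@$lines) { $hits++ if m/Some String Pattern/ }
    return $hits;
}

# Count lines matching the longer, anchored pattern.
sub count_anchored {
    my ($lines) = @_;
    my $hits = 0;
    for (@$lines) { $hits++ if m/^Initial static string that matches Some String Pattern/ }
    return $hits;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    unanchored => sub { count_unanchored(\@lines) },
    anchored   => sub { count_anchored(\@lines) },
});
```

Note that the two patterns do not match the same set of lines, so this only compares matching cost, not equivalent behaviour.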

Greg Hewgill answered Dec 17 '22 22:12


The line anchor makes it faster. I have to add, though, that the //o modifier is not necessary here. In fact it does nothing, because the pattern contains no interpolated variables. That's a code smell to me.

There used to be valid uses for //o, but these days that is provided by qr//.
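For illustration, here is a minimal sketch of the qr// approach, assuming the string to search for is only known at run time. The variable `$needle` and the helper `matches_line` are hypothetical names, not anything from the question.

```perl
use strict;
use warnings;

# Hypothetical: the literal to look for arrives at run time (e.g. from config).
my $needle = 'Some String Pattern';

# Compile the pattern once. \Q...\E quotes any regex metacharacters in $needle.
my $re = qr/^Initial static string that matches \Q$needle\E/;

# Return 1 if the line matches the precompiled pattern, 0 otherwise.
sub matches_line {
    my ($line) = @_;
    return $line =~ $re ? 1 : 0;
}

# Illustrative input lines, not real log data.
for my $line (
    'Initial static string that matches Some String Pattern, then more',
    'Something else entirely',
) {
    print matches_line($line) ? "match:    $line\n" : "no match: $line\n";
}
```

With a pattern that is a plain literal, as in the question, neither //o nor qr// is needed, since there is nothing to interpolate.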

Leon Timmermans answered Dec 17 '22 22:12