
How to measure regex re performance properly?

I'm trying some regex performance tests (I've heard rumors that Erlang is slow):

> Fun = fun F(X) -> case X > 1000000 of true -> ok; false -> Y = X + 1, re:run(<<"1ab1jgjggghjgjgjhhhhhhhhhhhhhjgdfgfdgdfgdfgdfgdfgdfgdfgdfgdfgfgv">>, "^[a-zA-Z0-9_]+$"), F(Y) end end.
#Fun<erl_eval.30.128620087>
> timer:tc(Fun, [0]).                                                         
{17233982,ok}                                                                   
> timer:tc(Fun, [0]).   
{17155982,ok}

And some tests after compiling the regex:

> {ok, MP} = re:compile("^[a-zA-Z0-9_]+$").
{ok,{re_pattern,0,0,0,                                                          
            <<69,82,67,80,107,0,0,0,16,0,0,0,1,0,0,0,255,255,255,
              255,255,255,...>>}}
> Fun = fun F(X) -> case X > 1000000 of true -> ok; false -> Y = X + 1, re:run(<<"1ab1jgjggghjgjgjhhhhhhhhhhhhhjgdfgfdgdfgdfgdfgdfgdfgdfgdfgdfgfgv">>, MP), F(Y) end end.               
#Fun<erl_eval.30.128620087>
> timer:tc(Fun, [0]).                                                         
{15796985,ok}                                                                   
>        
> timer:tc(Fun, [0]).
{15921984,ok}

http://erlang.org/doc/man/timer.html :

Unless otherwise stated, time is always measured in milliseconds.

http://erlang.org/doc/man/re.html#compile-1 :

Compiling the regular expression before matching is useful if the same expression is to be used in matching against multiple subjects during the lifetime of the program. Compiling once and executing many times is far more efficient than compiling each time one wants to match.

Questions

  1. Why is it returning microseconds to me? (Shouldn't it be milliseconds?)
  2. Compiling the regex doesn't make much difference. Why?
  3. Should I bother compiling it?
asked Sep 04 '25 17:09 by asim


1 Answer

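  1. The millisecond default in the timer module applies only "unless otherwise stated", and the documentation for tc states otherwise: it returns the time in microseconds.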
tc(Fun) -> {Time, Value}
tc(Fun, Arguments) -> {Time, Value}
tc(Module, Function, Arguments) -> {Time, Value}
    Types
    Module = module()
    Function = atom()
    Arguments = [term()]
    Time = integer()
      In microseconds
    Value = term()
  2. In case 1, re:run has to compile the string "^[a-zA-Z0-9_]+$" on every call, i.e. 1 million times. In case 2, you compile the pattern once up front and pass the compiled result into the recursion, which is why case 2 takes less time than case 1.

run(Subject, RE) -> {match, Captured} | nomatch

Subject = iodata() | unicode:charlist()

RE = mp() | iodata()

The regular expression can be specified either as iodata() in which case it is automatically compiled (as by compile/2) and executed, or as a precompiled mp() in which case it is executed against the subject directly.

  3. Yes, you should compile the pattern first and then pass the compiled result into the recursion.
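
As a rough sketch of how the comparison could be repeated outside the shell: funs typed into the shell are evaluated by erl_eval (note the #Fun<erl_eval...> in the output above), which may add overhead that has nothing to do with re. The module below runs the same subject and pattern from compiled code and converts the timer:tc result from microseconds to milliseconds. The module name re_bench and the functions run/0, run/1 and loop/3 are made up for illustration; only re:compile/1, re:run/2 and timer:tc/3 are taken from the Erlang documentation.

```erlang
-module(re_bench).
-export([run/0, run/1, loop/3]).

%% Hypothetical benchmark module: the names here are illustrative
%% and do not come from the original post.
-define(SUBJECT, <<"1ab1jgjggghjgjgjhhhhhhhhhhhhhjgdfgfdgdfgdfgdfgdfgdfgdfgdfgdfgfgv">>).
-define(PATTERN, "^[a-zA-Z0-9_]+$").

%% Use roughly the same iteration count as the question.
run() ->
    run(1000000).

run(N) ->
    %% Compile once, outside the timed loops.
    {ok, MP} = re:compile(?PATTERN),
    %% timer:tc/3 returns {Microseconds, Result}.
    {RawUs, ok} = timer:tc(?MODULE, loop, [N, ?SUBJECT, ?PATTERN]),
    {CompiledUs, ok} = timer:tc(?MODULE, loop, [N, ?SUBJECT, MP]),
    %% Convert microseconds to milliseconds for printing.
    io:format("uncompiled:  ~p ms~n", [RawUs div 1000]),
    io:format("precompiled: ~p ms~n", [CompiledUs div 1000]),
    {RawUs, CompiledUs}.

%% Call re:run/2 N times; RE is either a pattern string or a compiled mp().
loop(0, _Subject, _RE) ->
    ok;
loop(N, Subject, RE) when N > 0 ->
    {match, _} = re:run(Subject, RE),
    loop(N - 1, Subject, RE).
```

To try it, save the code as re_bench.erl, then run c(re_bench). followed by re_bench:run(). in the shell. The numbers will depend on the machine and the OTP release, so none are claimed here.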
answered Sep 06 '25 09:09 by bxdoan