Regular expression - Ruby vs Perl

Tags: regex, ruby, perl

I noticed some extreme delays in my Ruby (1.9) scripts and after some digging it boiled down to regular expression matching. I'm using the following test scripts in Perl and in Ruby:

Perl:

$fname = shift(@ARGV);
open(FILE, "<$fname" );
while (<FILE>) {
    if ( /(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/ ) {
        print "$1: $2\n";
    }
}

Ruby:

f = File.open( ARGV.shift )
while ( line = f.gets )
    if /(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/.match(line)
        puts "#{$1}: #{$2}"
    end
end

I use the same input for both scripts, a file with only 44290 lines. The timing for each one is:

Perl:

xenofon@cpm:~/bin/local/project$ time ./try.pl input >/dev/null

real    0m0.049s
user    0m0.040s
sys     0m0.000s

Ruby:

xenofon@cpm:~/bin/local/project$ time ./try.rb input >/dev/null

real    1m5.106s
user    1m4.910s
sys     0m0.010s

I guess I'm doing something awfully stupid, any suggestions?

Thank you

asked Apr 20 '12 09:04

xpapad

2 Answers

regex = Regexp.new(/(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/)
f = File.open( ARGV.shift ).each do |line|
    if regex.match(line)
        puts "#{$1}: #{$2}"
    end
end

Or

regex = Regexp.new(/(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/)
f = File.open( ARGV.shift )
f.each_line do |line|
    if regex.match(line)
        puts "#{$1}: #{$2}"
    end
end

answered Oct 06 '22 23:10

LaGrandMere

One possible difference is the amount of backtracking being performed. Perl might do a better job of pruning the search tree when backtracking (i.e. noticing when part of a pattern can't possibly match). Its regex engine is highly optimised.

First, adding a leading «^» could make a huge difference. If the pattern doesn't match starting at position 0, it's not going to match at starting position 1 either! So don't try to match at position 1.
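To see what the anchor buys you, here is a rough timing sketch in Ruby. The log lines are made up (the question doesn't show the real input format), and `time_matches` is just a helper name I picked. The actual numbers will depend on your Ruby version and its regex engine, so treat this as a way to measure, not as a result.

```ruby
require "benchmark"

# Hypothetical lines; the real input format is not shown in the question.
NON_MATCHING = "2012-04-20 09:04:00 | RECEIVED RESPONSE TID=42, " + "x" * 200
MATCHING     = "09:04:00 | SENDING REQUEST TID=42, ok"

UNANCHORED = /(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/
ANCHORED   = /^(.*?) \|.*?SENDING REQUEST.*?TID=(.*?),/

# Time n match attempts of regex against line, in seconds.
def time_matches(regex, line, n = 200)
  Benchmark.realtime { n.times { regex.match(line) } }
end

# On a line that cannot match, the unanchored pattern may be retried at
# every start position; the anchored one only at the start of the line.
puts "unanchored: %.4fs" % time_matches(UNANCHORED, NON_MATCHING)
puts "anchored:   %.4fs" % time_matches(ANCHORED, NON_MATCHING)

# On a line that does match, both patterns capture the same fields.
p UNANCHORED.match(MATCHING).captures
p ANCHORED.match(MATCHING).captures
```

Note that in Ruby «^» matches at the start of any line within the string, while «\A» matches only at the very start of the string. For input read one line at a time the two behave the same.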

Along the same lines, «.*?» isn't as limiting as you might think, and replacing each instance of it with a more limiting pattern could prevent a lot of backtracking.

Why don't you try:

/
    ^
    (.*?)                       [ ]\|
    (?:(?!SENDING[ ]REQUEST).)* SENDING[ ]REQUEST
    (?:(?!TID=).)*              TID=
    ([^,]*)                     ,
/x
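As far as I can tell the pattern above works unchanged as a Ruby literal. A minimal sketch, assuming a log line shaped like the one below (the sample line and the `extract` helper are my own inventions, not taken from the question):

```ruby
PATTERN = /
    ^
    (.*?)                       [ ]\|
    (?:(?!SENDING[ ]REQUEST).)* SENDING[ ]REQUEST
    (?:(?!TID=).)*              TID=
    ([^,]*)                     ,
/x

# Hypothetical sample line; adjust to whatever the real log looks like.
SAMPLE = "09:04:00 | worker-1 SENDING REQUEST to host TID=abc123, size=10\n"

# Return "prefix: tid" for a matching line, or nil when the line doesn't match.
def extract(line)
  m = PATTERN.match(line)
  m && "#{m[1]}: #{m[2]}"
end

puts extract(SAMPLE)
```

Using the MatchData object instead of `$1` and `$2` is only a style choice here; the pattern is the part that matters.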

(Not sure if it was safe to replace the first «.*?» with «[^|]*», so I didn't.)

(At least for patterns that match a single string, (?:(?!PAT).) is to PAT as [^CHAR] is to CHAR.)
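A small Ruby check of that analogy, using a made-up string. For a single character the two forms consume the same text; for a multi-character string only the lookahead form applies. One caveat: «.» doesn't match a newline by default while «[^,]» does, so the two are only equivalent on text without newlines.

```ruby
# Hypothetical text, chosen only to illustrate the analogy.
TEXT = "alpha beta TID=42, gamma"

UP_TO_COMMA_CLASS    = /\A[^,]*/           # [^CHAR] form
UP_TO_COMMA_TEMPERED = /\A(?:(?!,).)*/     # (?:(?!PAT).) form, PAT = ","
UP_TO_TID            = /\A(?:(?!TID=).)*/  # same idea, PAT = "TID="

# Return the text consumed by regex at the start of TEXT.
def consumed(regex)
  regex.match(TEXT)[0]
end

p consumed(UP_TO_COMMA_CLASS)
p consumed(UP_TO_COMMA_TEMPERED)
p consumed(UP_TO_TID)
```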

Using /s could possibly speed things up if «.» is allowed to match newlines, but I think it's pretty minor.
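One Ruby-specific caveat if you try this: the Perl /s behaviour is spelled /m in Ruby, and Ruby's own /s flag selects an encoding rather than changing what «.» matches. A quick check (constant names are mine):

```ruby
DOT_DEFAULT   = /a.b/    # "." does not match a newline
DOT_MULTILINE = /a.b/m   # Ruby's /m lets "." match a newline (Perl's /s)

WITH_NEWLINE = "a\nb"

p DOT_DEFAULT.match?(WITH_NEWLINE)
p DOT_MULTILINE.match?(WITH_NEWLINE)
```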

Using «\space» instead of «[space]» to match a space under /x might be slightly faster in Ruby. (They're the same in recent versions of Perl.) I used the latter because it's far more readable.
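For reference, both spellings match a literal space under /x in Ruby, while a bare space is ignored. This sketch only checks behaviour and makes no claim about which is faster; the constant names are mine.

```ruby
ESCAPED   = /SENDING\ REQUEST/x   # backslash-space: literal space
BRACKETED = /SENDING[ ]REQUEST/x  # space in a character class: literal space
BARE      = /SENDING REQUEST/x    # bare space is ignored under /x

p ESCAPED.match?("SENDING REQUEST")
p BRACKETED.match?("SENDING REQUEST")
p BARE.match?("SENDING REQUEST")
p BARE.match?("SENDINGREQUEST")
```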

answered Oct 07 '22 01:10

ikegami
