I'm working on some text conversion routines that parse time values in different formats in Ruby. This routine is growing in complexity, and I'm currently testing a better approach to this problem. I'm currently testing a way to use <code>scanf</code>. Why? I always thought that was faster than a regex, but what happened in Ruby? It was much slower! What am I doing wrong? Note: I'm using ruby-1.9.2-p290 [ x86_64 ] (MRI) First Ruby test: <pre class="prettyprint"><code>require "scanf" require 'benchmark' def duration_in_seconds_regex(duration) if duration =~ /^\d{2,}\:\d{2}:\d{2}$/ h, m, s = duration.split(":").map{ |n| n.to_i } h * 3600 + m * 60 + s end end def duration_in_seconds_scanf(duration) a = duration.scanf("%d:%d:%d") a[0] * 3600 + a[1] * 60 + a[2] end n = 500000 Benchmark.bm do |x| x.report { for i in 1..n; duration_in_seconds_scanf("00:10:30"); end } end Benchmark.bm do |x| x.report { for i in 1..n; duration_in_seconds_regex("00:10:30"); end } end </code></pre> This is what I got using <code>scanf</code> first and a regex second: <pre class="prettyprint"><code> user system total real 95.020000 0.280000 95.300000 ( 96.364077) user system total real 2.820000 0.000000 2.820000 ( 2.835170) </code></pre> Second test using C: <pre class="prettyprint"><code>#include <stdio.h> #include <stdlib.h> #include <time.h> #include <sys/types.h> #include <string.h> #include <regex.h> char *regexp(char *string, char *patrn, int *begin, int *end) { int i, w = 0, len; char *word = NULL; regex_t rgT; regmatch_t match; regcomp(&rgT, patrn, REG_EXTENDED); if ((regexec(&rgT, string, 1, &match, 0)) == 0) { *begin = (int) match.rm_so; *end = (int) match.rm_eo; len = *end - *begin; word = malloc(len + 1); for (i = *begin; i<*end; i++) { word[w] = string[i]; w++; } word[w] = 0; } regfree(&rgT); return word; } int main(int argc, char** argv) { char * str = "00:01:30"; int h, m, s; int i, b, e; float start_time, end_time, time_elapsed; regex_t regex; regmatch_t * pmatch; char msgbuf[100]; char *pch; char *str2; char delims[] = ":"; char *result = NULL; start_time = (float) clock() / CLOCKS_PER_SEC; for (i = 0; i < 500000; i++) { if (sscanf(str, "%d:%d:%d", &h, &m, &s) == 3) { s = h * 3600L + m * 60L + s; } } end_time = (float) clock() / CLOCKS_PER_SEC; time_elapsed = end_time - start_time; printf("sscanf_time (500k iterations): %.4f", time_elapsed); start_time = (float) clock() / CLOCKS_PER_SEC; for (i = 0; i < 500000; i++) { char * match = regexp(str, "[0-9]{2,}:[0-9]{2}:[0-9]{2}", &b, &e); if (strcmp(match, str) == 0) { str2 = (char*) malloc(sizeof (str)); strcpy(str2, str); h = strtok(str2, delims); m = strtok(NULL, delims); s = strtok(NULL, delims); s = h * 3600L + m * 60L + s; } } end_time = (float) clock() / CLOCKS_PER_SEC; time_elapsed = end_time - start_time; printf("\n\nregex_time (500k iterations): %.4f", time_elapsed); return (EXIT_SUCCESS); } </code></pre> The C code results are obviously faster, and the regex results are slower than <code>scanf</code> results as expected: <pre class="prettyprint"><code>sscanf_time (500k iterations): 0.1774 regex_time (500k iterations): 3.9692 </code></pre> It is obvious that the C running time is faster, so please don't comment that Ruby is interpreted and stuff like that please. This is the related gist.

The problem is not that it's interpreted, but that everything in Ruby is an object. You can explore "scanf.rb" in your Ruby distribution and compare it to scanf implementation in C. Ruby implementation of scanf based on RegExp matching. Every atom like "%d" is an object in ruby, while it's only one case item in C. So, to my mind, the reason of such execution time is lots of object allocation/deallocation.

why ruby scanf is so slow?

Tags:

c

regex

ruby

scanf

I'm working on some text conversion routines that parse time values in different formats in Ruby. This routine is growing in complexity, and I'm currently testing a better approach to this problem.

I'm currently testing a way to use scanf. Why? I always thought that was faster than a regex, but what happened in Ruby? It was much slower!

What am I doing wrong?

Note: I'm using ruby-1.9.2-p290 [ x86_64 ] (MRI)

First Ruby test:

require "scanf"
require 'benchmark'

def duration_in_seconds_regex(duration)
  if duration =~ /^\d{2,}\:\d{2}:\d{2}$/
    h, m, s = duration.split(":").map{ |n| n.to_i }
    h * 3600 + m * 60 + s
  end
end

def duration_in_seconds_scanf(duration)
  a = duration.scanf("%d:%d:%d")
  a[0] * 3600 + a[1] * 60 + a[2]
end

n = 500000
Benchmark.bm do |x|
  x.report { for i in 1..n; duration_in_seconds_scanf("00:10:30"); end }
end

Benchmark.bm do |x|
  x.report { for i in 1..n; duration_in_seconds_regex("00:10:30"); end }
end

This is what I got using scanf first and a regex second:

      user     system      total        real
  95.020000   0.280000  95.300000 ( 96.364077)
       user     system      total        real
   2.820000   0.000000   2.820000 (  2.835170)

Second test using C:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/types.h>
#include <string.h>
#include <regex.h>

char *regexp(char *string, char *patrn, int *begin, int *end) {
    int i, w = 0, len;
    char *word = NULL;
    regex_t rgT;
    regmatch_t match;
    regcomp(&rgT, patrn, REG_EXTENDED);
    if ((regexec(&rgT, string, 1, &match, 0)) == 0) {
        *begin = (int) match.rm_so;
        *end = (int) match.rm_eo;
        len = *end - *begin;
        word = malloc(len + 1);
        for (i = *begin; i<*end; i++) {
            word[w] = string[i];
            w++;
        }
        word[w] = 0;
    }
    regfree(&rgT);
    return word;
}

int main(int argc, char** argv) {
    char * str = "00:01:30";
    int h, m, s;
    int i, b, e;
    float start_time, end_time, time_elapsed;
    regex_t regex;
    regmatch_t * pmatch;
    char msgbuf[100];
    char *pch;
    char *str2;
    char delims[] = ":";
    char *result = NULL;

    start_time = (float) clock() / CLOCKS_PER_SEC;
    for (i = 0; i < 500000; i++) {
        if (sscanf(str, "%d:%d:%d", &h, &m, &s) == 3) {
            s = h * 3600L + m * 60L + s;
        }
    }
    end_time = (float) clock() / CLOCKS_PER_SEC;
    time_elapsed = end_time - start_time;
    printf("sscanf_time (500k iterations): %.4f", time_elapsed);

    start_time = (float) clock() / CLOCKS_PER_SEC;
    for (i = 0; i < 500000; i++) {
        char * match = regexp(str, "[0-9]{2,}:[0-9]{2}:[0-9]{2}", &b, &e);
        if (strcmp(match, str) == 0) {
            str2 = (char*) malloc(sizeof (str));
            strcpy(str2, str);
            h = strtok(str2, delims);
            m = strtok(NULL, delims);
            s = strtok(NULL, delims);
            s = h * 3600L + m * 60L + s;
        }
    }
    end_time = (float) clock() / CLOCKS_PER_SEC;
    time_elapsed = end_time - start_time;
    printf("\n\nregex_time (500k iterations): %.4f", time_elapsed);

    return (EXIT_SUCCESS);
}

The C code results are obviously faster, and the regex results are slower than scanf results as expected:

sscanf_time (500k iterations): 0.1774

regex_time (500k iterations): 3.9692

It is obvious that the C running time is faster, so please don't comment that Ruby is interpreted and stuff like that please.

This is the related gist.

739

asked Mar 01 '12 19:03

AndreDurao

1 Answers

The problem is not that it's interpreted, but that everything in Ruby is an object. You can explore "scanf.rb" in your Ruby distribution and compare it to scanf implementation in C.

Ruby implementation of scanf based on RegExp matching. Every atom like "%d" is an object in ruby, while it's only one case item in C. So, to my mind, the reason of such execution time is lots of object allocation/deallocation.

100

answered Sep 30 '22 21:09

Volodymyr Rudyi

Related questions
                            
                                Generate Ascii art text in C
                            
                                Comparing the Checksums of Two executables built from the same exact source
                            
                                How to get execution time using msp430?
                            
                                detecting bind mounts on linux
                            
                                Does the Python 3 interpreter leak memory when embedded?
                            
                                struct alignment
                            
                                User-defined structs with a MATLAB mex function
                            
                                How to manage memory alignments and generic pointer arithmetics in a portable way in C?
                            
                                How do I find segmentation fault from multiple files using GDB
                            
                                No warning or error indication when variable defined as static but declared as extern
                            
                                How do you share a constant array of strings between files?
                            
                                traversing C string: get the last word of a string
                            
                                How can I prove that fread calls ReadFile and fopen calls CreateFile
                            
                                New to C programming on linux, stuck at typecasting
                            
                                malloc pointer to array
                            
                                Euler 160 : Find the non trivial 5 digits of the factorial
                            
                                Expected project/directory structure for Automake?
                            
                                How can I inspect a static library to see if the debug symbols are being exported?
                            
                                What is the difference between devm_kzalloc() and kzalloc() in linux driver programming
                            
                                The meaning of period in ALSA

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With