When accessing individual characters in a string in Perl, is substr or splitting to an array faster?

2 Answers

It really depends on exactly what you're doing with your data -- but hey, you're headed the right way with your last question! Don't guess, benchmark.

Perl provides the Benchmark module for exactly this kind of thing, and using it is really pretty straightforward. Here's a little sample code to get started with:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $dna;
$dna .= [qw(G A T C)]->[rand 4] for 1 .. 100;

sub frequency_substr {
  my $length = length $dna;
  my %hist;

  for my $pos (0 .. $length - 1) {
    $hist{$pos}{substr $dna, $pos, 1} ++;
  }

  \%hist;
}

sub frequency_split {
  my %hist;
  my $pos = 0;
  for my $char (split //, $dna) {
    $hist{$pos ++}{$char} ++;
  }

  \%hist;
}

sub frequency_regmatch {
  my %hist;

  while ($dna =~ /(.)/g) {
    # pos() is the offset just past the match, so subtract 1 for a zero-based index
    $hist{pos($dna) - 1}{$1} ++;
  }

  \%hist;
}


cmpthese(-5, # Run each for at least 5 seconds
  { 
    substr => \&frequency_substr,
    split => \&frequency_split,
    regex => \&frequency_regmatch
  }
);

And a sample result:

         Rate  regex  split substr
regex  6254/s     --   -26%   -32%
split  8421/s    35%     --    -9%
substr 9240/s    48%    10%     --

Turns out substr is surprisingly fast. :)
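
A benchmark only means something if the candidates compute the same result, so it is worth checking that before timing anything. Here is a minimal sketch of such a check. The helper names (hist_substr, hist_split, hist_regex, flatten) and the sample string are made up for illustration, and each helper takes the string as an argument instead of reading a global. Note that pos() reports the offset just past the match, so the regex version subtracts 1 to line up with the zero-based positions the other two use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: each returns a hash ref of
# { zero-based position => { character => count } }.
sub hist_substr {
  my ($str) = @_;
  my %hist;
  $hist{$_}{ substr $str, $_, 1 }++ for 0 .. length($str) - 1;
  return \%hist;
}

sub hist_split {
  my ($str) = @_;
  my %hist;
  my $pos = 0;
  $hist{ $pos++ }{$_}++ for split //, $str;
  return \%hist;
}

sub hist_regex {
  my ($str) = @_;
  my %hist;
  # /s lets . match newlines as well, so no character is skipped
  $hist{ pos($str) - 1 }{$1}++ while $str =~ /(.)/sg;
  return \%hist;
}

# Flatten a histogram into a canonical string so two results can be compared
sub flatten {
  my ($hist) = @_;
  return join ',', map {
    my $pos = $_;
    map { "${pos}:$_=$hist->{$pos}{$_}" } sort keys %{ $hist->{$pos} };
  } sort { $a <=> $b } keys %$hist;
}

my $sample   = 'GATTACA';
my $expected = flatten(hist_substr($sample));
for my $impl (\&hist_split, \&hist_regex) {
  die "implementations disagree\n"
    unless flatten($impl->($sample)) eq $expected;
}
print "all three agree: $expected\n";
```

If the script dies, at least one variant is counting something different, and its timing should not be compared with the others.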

answered Nov 15 '22 15:11

hobbs

Here is what I would do instead of first trying to choose between substr and split:

#!/usr/bin/perl

use strict; use warnings;

my %dist;
while ( my $s = <> ) {
    while ( $s =~ /(.)/g ) {
        # pos() is the offset just past the match; subtract 1 for a zero-based index
        ++ $dist{ pos($s) - 1 }{ $1 };
    }
}
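
To see what the resulting structure looks like, here is a small sketch that fills %dist from a hard-coded list instead of <> and builds one row per position. The input strings and the @report array are assumptions for illustration only. It subtracts 1 from pos() so that positions are zero-based.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input; a real script would read lines from <> instead.
my @lines = qw(GATC GATT GCTT);

my %dist;
for my $s (@lines) {
    while ( $s =~ /(.)/g ) {
        ++ $dist{ pos($s) - 1 }{ $1 };   # zero-based position
    }
}

# One row per position, characters in alphabetical order
my @report;
for my $i ( sort { $a <=> $b } keys %dist ) {
    my $counts = $dist{$i};
    push @report, join ' ', $i, map { "$_=$counts->{$_}" } sort keys %$counts;
}
print "$_\n" for @report;
```

The outer hash is keyed by position and each inner hash by character, so adding a column to the report is just another lookup on $dist{$i}.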

Update:

My curiosity got the best of me. Here is a benchmark:

#!/usr/bin/perl

use strict; use warnings;
use Benchmark qw( cmpthese );

my @chars = qw(A C G T);
my @to_split = my @to_substr = my @to_match = map {
    join '', map $chars[rand @chars], 1 .. 100
} 1 .. 1_000;

cmpthese -1, {
    'split'  => \&bench_split,
    'substr' => \&bench_substr,
    'match'  => \&bench_match,
};

sub bench_split {
    my %dist;
    for my $s ( @to_split ) {
        my @s = split //, $s;
        for my $i ( 0 .. $#s ) {
            ++ $dist{ $i }{ $s[$i] };
        }
    }
}

sub bench_substr {
    my %dist;
    for my $s ( @to_substr ) {
        my $u = length($s) - 1;
        for my $i (0 .. $u) {
            ++ $dist{ $i }{ substr($s, $i, 1) };
        }
    }
}

sub bench_match {
    my %dist;
    for my $s ( @to_match ) {
        while ( $s =~ /(.)/g ) {
            # subtract 1 so positions are zero-based, like the split and substr variants
            ++ $dist{ pos($s) - 1 }{ $1 };
        }
    }
}

Output:

         Rate  split  match substr
split  4.93/s     --   -31%   -65%
match  7.11/s    44%     --   -49%
substr 14.0/s   184%    97%     --

answered Nov 15 '22 15:11

Sinan Ünür


Tags:

performance

string

character

perl

Ryan C. Thompson
