I have a string similar to: but much longer
my $a = "000000001111111111000000011111111111111111111111111111111";
I am counting the number of "1"'s with:
my $total_1_available = $a =~ tr/1//;
And that works amazingly well and is really FAST.
HOWEVER, I also wish to count (in a fast way), the total number of consecutive 1's in a row. The MAX COUNT of "1's" consecutively.
IN the example above, it would return the count of:
11111111111111111111111111111111
As this is the maximum in a row.
So, I end up with the TOTAL_COUNT and also the TOTAL_CONSECUTIVE_COUNT.
I have it working with a REGEXP which basically replaces the 1's and then counts what was replaced and loops around...which actually is totally fine and works... but it doesn't "feel" right.
Ideally I don't want to replace the string at all as I am looking for the max consecutive count.
But, I know in Perl this probably isn't the fastest or cleanest way.
Could you teach me a better way please and increase my learning?
AS requested this is my current code:
my $a= "0110011001101111";
my $total_1_available = $a =~ tr/1//;
print "Total number of 1's = $total_1_available\n";
my $max_c = 0;
while ( $a=~s/(1+)/ / ) {
$max_c = length($1) if length($1) > $max_c;
}
print "Consecutive count = $max_c\n";
And FINAL CODE:
use strict;
use warnings;
use Benchmark ':all';
use String::Random;
## We test 525,600 as this is the length of the string.
## Actually each 0 or 1 represents a minute of the year.
## And these represent engineer minues available in a 24 hr / 365 day year.
## And there are lots and lots of engineers.
## Hence my wish to improve the performance and I wish to thank everyone whom responded.
## there are a lot more 0's than 1's so hack to sort of simulate
my $test_regex = '[0][0][0][0][0][0-1][0-1][0-1][0-1][0-1]' x 52560;
my $pass = String::Random->new;
my $string = $pass->randregex($test_regex);
cmpthese(-1, {
org => sub { my $max = 0; while ($string =~ /(1+)/g) { my $len = length($1); if ($max < $len) { $max = $len } } },
hack => sub { my $match = ""; while ($string =~ /(${match}1+)/g) { $match = $1; } length $match }
});
# BLOWN AWAY !!!!!!
# BLOWN AWAY !!!!!!
# BLOWN AWAY !!!!!!
# BLOWN AWAY !!!!!!
Remarkable speed improvements can be made with a dynamic regex. We can use a variable to store the max length string, then search for a string that is that long, plus one or more. The theory being that we only need to look for strings longer than the one we already have.
I used a solution that looks like this
sub hack {
my $match = ""; # original search string
while ($string =~ /(${match}1+)/g) { # search for $match plus 1 or more 1s
$match = $1; # when found, change to new match
}
length $match; # return max length
}
And compared it to the original method described by the OP, with the following result
use strict;
use warnings;
use Benchmark ':all';
my $string = '0100100101111011010010101101101110101011111111101010100100100001011101010100' x 10_000;
cmpthese(-1, {
org => sub { my $max = 0; while ($string =~ /(1+)/g) { my $len = length($1); if ($max < $len) { $max = $len } } },
hack => sub { my $match = ""; while ($string =~ /(${match}1+)/g) { $match = $1; } length $match }
});
Output:
Rate org hack
org 7.31/s -- -99%
hack 1372/s 18669% --
Which seems astonishingly high, 19000% faster. It makes me think I've made a mistake, but I can't think what that would be. Maybe I am missing something in the regex machine internals, but this would be quite the improvement on the original solution.
For short strings, the following is faster than all previously-presented solutions:
use List::Util qw( max );
max 0, map length, split /[^1]+/, $s
Rate hack sort org max mxsp xs
hack 76879/s -- -12% -34% -37% -48% -98% <-- TLP
sort 87664/s 14% -- -24% -28% -41% -98% <-- Jim Davis
org 115660/s 50% 32% -- -6% -22% -98% <-- OP
max 122504/s 59% 40% 6% -- -17% -98% <-- Jim Davis
mxsp 147867/s 92% 69% 28% 21% -- -97% <-- ikegami (above)
xs 4950278/s 6339% 5547% 4180% 3941% 3248% -- <-- ikegami (below)
Benchmark code:
use Benchmark qw( cmpthese );
my $string = ( '01001001011110110100101011011011101010'
. '11111111101010100100100001011101010100' );
cmpthese(-3, {
org => sub { my $max = 0; while ($string =~ /(1+)/g) { my $len = length($1); if ($max < $len) { $max = $len; } } },
hack => sub { my $match = ""; while ($string =~ /(${match}1+)/g) { $match = $1; } my $max = length($match); },
sort => sub { my $max = ( sort { $b <=> $a } $string =~ /(1+)/g )[0]; },
max => sub { my $max = max 0, map length, $string =~ /(1+)/g; },
mxsp => sub { my $max = max 0, map length, split /[^1]+/, $string; },
xs => sub { my $max = longuest_ones_count($string); },
});
That said, the fastest solution would involve XS. The following is my stab at it:
IV longuest_ones_count(SV* sv) {
IV max = 0;
IV count = 0;
// This code works whether the string is upgraded or downgraded.
STRLEN len;
char *s = SvPV(sv, len);
while (len--) {
if (*(s++) == '1') {
++count;
}
else if (count) {
if (max < count)
max = count;
count = 0;
}
}
if (max < count)
max = count;
return max;
}
One way of using it:
use 5.014;
use warnings;
use Inline C => <<'__';
...above code here...
__
say "$_: ", longuest_ones_count($_)
for qw(
0
11111
011111
111110
01110111110
01111101110
);
You saw this beat the other solutions out of the water for short strings. But you don't have short strings. For long strings, this is not as fast as TLP's version!!!
Same benchmark as above but using
my $string = ( '01001001011110110100101011011011101010'
. '11111111101010100100100001011101010100' ) x 10_000;
Rate sort org max mxsp xs hack
sort 8.61/s -- -25% -31% -44% -99% -99%
org 11.6/s 34% -- -8% -24% -99% -99%
max 12.5/s 46% 9% -- -18% -99% -99%
mxsp 15.3/s 77% 32% 22% -- -99% -99% <-- ikegami (Perl)
xs 1031/s 11870% 8822% 8118% 6653% -- -25% <-- ikegami (XS)
hack 1366/s 15772% 11731% 10797% 8855% 33% -- <-- TLP
Wow the regex engine is good! It can obviously be beat using XS (by eliminating the time needed to compile the pattern), but what's the point?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With