
In Perl, how to speed up regex to modify a really big string?

Tags: string, regex, perl

I have a string comprising 300 million bases:

$str = "ATCGTAGCTAGXCTAGCTAGCTGATXXXXATCGTAGCTAGCTGXTGCTAGCXXXXA...A";

I want to replace every character in the string that is not in [ATGC] with something else, say "A", and at the same time record the positions of the characters that were replaced.

I tried this:

while ($str=~/[^ATGC]/ig)
{
  $pos = pos($str);
  substr($str, $pos-1,1) = "A";
}

but it is too slow.

Does anyone know a better way to do this?

Shichen Wang asked Dec 17 '25 07:12


2 Answers

Regexes can substitute as well as match.

$str =~ s/X/A/g;

If you're only replacing single characters, you can even use the tr operator.

$str =~ tr/X/A/;

which may even be faster. Note that tr takes no /g modifier: it always translates every occurrence in the string.
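Neither snippet above records where the replacements happened or handles characters other than X. A minimal sketch of one way to cover both, assuming uppercase input and that a plain array of 0-based offsets is an acceptable way to store the positions (the sample string and variable names are made up for illustration):

```perl
use strict;
use warnings;

# Short made-up sample; the real string would hold ~300 million bases.
my $str = "ATCGTAGXCTAGXXXXATCGN";

# Pass 1: record the 0-based offset of every character outside A, T, G, C.
# Matching whole runs ([^ATGC]+) keeps the number of loop iterations low;
# @- and @+ hold the start and end offsets of the last successful match.
my @positions;
while ($str =~ /[^ATGC]+/g) {
    push @positions, $-[0] .. $+[0] - 1;
}

# Pass 2: replace them all in place. The /c flag complements the search
# list, so every character that is NOT A, T, G or C is translated to "A".
# tr returns the number of characters it translated.
my $count = ($str =~ tr/ATGC/A/c);

print "replaced $count characters at: @positions\n";
print "$str\n";
```

If lowercase bases should count as valid, as the /i flag in the question suggests, the search list could be written as tr/ATGCatgc/A/c with a matching [^ATGCatgc]+ in the loop. For input with very long runs of invalid characters, storing each run as a start/end pair instead of every individual offset would likely use far less memory.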

Andy Lester answered Dec 19 '25 23:12


You can perform the replacement directly with a regex substitution:

$str =~ s/X/A/ig;
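
The same substitution can also collect the positions in a single pass. This is only a sketch, assuming the goal from the question (any character other than A, T, G, C, case-insensitively) and using a made-up sample string; the /e flag evaluates the replacement as Perl code, and $-[0] is the start offset of the current match:

```perl
use strict;
use warnings;

# Short made-up sample with mixed case, since the pattern uses /i.
my $str = "ATcgXAnT";

# For each character that is not A, T, G or C (in either case), remember
# its 0-based offset and substitute "A". The last expression in the /e
# block is the replacement text.
my @positions;
$str =~ s/[^ATGC]/push @positions, $-[0]; "A"/gie;

print "replaced at: @positions\n";
print "$str\n";
```

Because Perl code runs for every match, this is likely to be slower than tr on a 300-million-character string, so it is worth benchmarking against the tr approach before relying on it.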

Andrew Clark answered Dec 19 '25 23:12


