Given pairs of string like this.
my $s1 = "ACTGGA";
my $s2 = "AGTG-A";
# Note the string can be longer than this.
I would like to find position and character in in $s1
where it differs with $s2
.
In this case the answer would be:
#String Position 0-based
# First col = Base in S1
# Second col = Base in S2
# Third col = Position in S1 where they differ
C G 1
G - 4
I can achieve that easily with substr()
. But it is horribly slow.
Typically I need to compare millions of such pairs.
Is there a fast way to achieve that?
'eq' operator in Perl is one of the string comparison operators used to check for the equality of the two strings. It is used to check if the string to its left is stringwise equal to the string to its right.
"== does a numeric comparison: it converts both arguments to a number and then compares them."
difference() returns the difference between two strings, returning the portion of the second string, which starts to differ from the first. StringUtils. indexOfDifference() returns the index at which the second string starts to diverge from the first.
length() function in Perl finds length (number of characters) of a given string, or $_ if not specified. Return: Returns the size of the string.
Stringwise ^ is your friend:
use strict;
use warnings;
my $s1 = "ACTGGA";
my $s2 = "AGTG-A";
my $mask = $s1 ^ $s2;
while ($mask =~ /[^\0]/g) {
print substr($s1,$-[0],1), ' ', substr($s2,$-[0],1), ' ', $-[0], "\n";
}
EXPLANATION:
The ^
(exclusive or) operator, when used on strings, returns a string composed of the result of an exclusive or on each bit of the numeric value of each character. Breaking down an example into equivalent code:
"AB" ^ "ab"
( "A" ^ "a" ) . ( "B" ^ "b" )
chr( ord("A") ^ ord("a") ) . chr( ord("B") ^ ord("b") )
chr( 65 ^ 97 ) . chr( 66 ^ 98 )
chr(32) . chr(32)
" " . " "
" "
The useful feature of this here is that a nul character ("\0"
) occurs when and only when the two strings have the same character at a given position. So ^
can be used to efficiently compare every character of the two strings in one quick operation, and the result can be searched for non-nul characters (indicating a difference). The search can be repeated using the /g regex flag in scalar context, and the position of each character difference found using $-[0]
, which gives the offset of the beginning of the last successful match.
Use binary bit ops on the complete strings.
Things like $s1 & $s2
or $s1 ^ $s2
run incredibly fast, and work with strings of arbitrary length.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With