Checking the edit distance between $barcode and two strings: the first string starts with the same 12 characters as $barcode and the second does not, yet both give the same distance. Why?
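#!/usr/bin/perl
use warnings;
use strict;
use Text::Fuzzy;

my $barcode = "TCCCTTGTCTCC";
# The Text::Fuzzy object depends only on $barcode, so build it once.
my $tf = Text::Fuzzy->new($barcode);

while (my $line1 = <DATA>) {
    # $line1 is not chomped, so its trailing newline is counted in
    # both the length and the edit distance.
    print "New string\n";
    print "Barcode length:", length $barcode, "\nSequence length:",
        length $line1, "\n";
    # Levenshtein distance between the barcode and the whole line
    my $ed = $tf->distance($line1);
    print "Edit distance: ", $ed, "\n\n";
}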
__DATA__
TCCCTTGTCTCCCCTGATATCCTGTAAAATCCTTTTCTTCTGATGGGTGCCATTTGCCACTAGAGGAAGCTGAACAGACCTGACTACCTGGA
GACGAGACTGATCACCTGATATCCTGTAAAATCCTTTTCTTCTGATGGGTGCCATTTGCCACTAGAGGAAGCTGCAGACCTGACTACCTGGA
Output:
New string
Barcode length:12
Sequence length:93
Edit distance: 81
New string
Barcode length:12
Sequence length:93
Edit distance: 81
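That result is correct. Each sequence contains the barcode as a subsequence: its 12 characters appear in the same order, though not necessarily next to each other. For the first sequence this is trivially true because it starts with the barcode, but it also holds for the second one. Either longer sequence can therefore be transformed into the barcode by deletions alone, and a Levenshtein distance can never be smaller than the difference in lengths, so both distances are exactly 93 - 12 = 81. (The 93 includes the trailing newline, because the lines read from DATA are not chomped.) The matching prefix of the first sequence does not lower its distance, since the remaining 81 characters have to be deleted either way.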
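To check this without relying on Text::Fuzzy, here is a minimal sketch of the standard dynamic-programming Levenshtein distance in plain Perl. The levenshtein helper is our own illustration, not part of any module, and the two sequences are written out without the trailing newline. Under those assumptions both sequences should report a distance equal to the length difference, 92 - 12 = 80, which is one less than the 81 above because the newline is no longer counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: classic dynamic-programming Levenshtein distance
# (insertion, deletion and substitution each cost 1), keeping two rows.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);    # distance from "" to each prefix of $t
    for my $i (1 .. length $s) {
        my @curr = ($i);            # distance from a prefix of $s to ""
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,          # deletion
                $curr[$j - 1] + 1,      # insertion
                $prev[$j - 1] + $cost,  # substitution or match
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

my $barcode = "TCCCTTGTCTCC";
# The two sequences from __DATA__, without the trailing newline
my @seqs = (
    "TCCCTTGTCTCCCCTGATATCCTGTAAAATCCTTTTCTTCTGATGGGTGCCATTTGCCACTAGAGGAAGCTGAACAGACCTGACTACCTGGA",
    "GACGAGACTGATCACCTGATATCCTGTAAAATCCTTTTCTTCTGATGGGTGCCATTTGCCACTAGAGGAAGCTGCAGACCTGACTACCTGGA",
);

for my $seq (@seqs) {
    printf "distance=%d length difference=%d\n",
        levenshtein($barcode, $seq), length($seq) - length($barcode);
}
```

Both lines print "distance=80 length difference=80": the distance sits exactly at its lower bound because no substitution or insertion is ever needed.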
Example:
artic => arc: edit distance 2, i.e. 2 deletions (t and i)
arche => arc: the same edit distance 2, i.e. 2 deletions (h and e)
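The condition behind both examples can be checked directly: if the shorter string is a subsequence of the longer one, deletions alone suffice and the edit distance equals the length difference. The sketch below uses a hypothetical is_subsequence helper (our own, not from any module) that scans the longer string with index.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if every character of $short appears in
# $long in the same order (not necessarily contiguously), i.e. $long
# can be turned into $short by deletions alone.
sub is_subsequence {
    my ($short, $long) = @_;
    my $pos = 0;
    for my $ch (split //, $short) {
        $pos = index($long, $ch, $pos);   # next occurrence at or after $pos
        return 0 if $pos < 0;
        $pos++;
    }
    return 1;
}

for my $word (qw(artic arche)) {
    if (is_subsequence("arc", $word)) {
        printf "%s => arc: %d deletions\n",
            $word, length($word) - length("arc");
    }
}
```

This prints "artic => arc: 2 deletions" and "arche => arc: 2 deletions". Where the helper returns false, the length difference is only a lower bound and the actual distance can be larger.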