Given two strings, I want to find all common substrings of a specified length, while allowing at most one character to differ.
For example, if s1 is 'ATCAGC', s2 is 'ATAATCGAC', and the specified length is 3, then I'd want output along these lines:
ATC from s1 matches ATA, ATC from s2
TCA from s1 matches TAA, TCG from s2
First, a Google search for "perl hamming distance" turns up a PerlMonks thread that mentions Text::LevenshteinXS, several typical implementations, and a cute XOR trick:
sub hd{ length( $_[ 0 ] ) - ( ( $_[ 0 ] ^ $_[ 1 ] ) =~ tr[\0][\0] ) }
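In case the one-liner looks opaque: XOR-ing two strings byte by byte gives a NUL byte ("\0") wherever the characters agree, tr counts those NULs, and subtracting that count from the length leaves the number of mismatches. Here is a spelled-out sketch of the same idea. It assumes both strings have the same length and that string XOR is in effect (under `use v5.28` or later, where the `bitwise` feature is on, you would need `^.` instead of `^`).

```perl
use strict;
use warnings;

# Hamming distance via string XOR (the one-liner above, spelled out).
# Assumes $s and $t have the same length.
sub hd {
    my ($s, $t) = @_;
    my $xor  = $s ^ $t;              # "\0" at every position where the bytes are equal
    my $same = ($xor =~ tr/\0//);    # count-only tr: number of matching positions
    return length($s) - $same;       # what is left is the number of mismatches
}

print hd('ATC', 'ATA'), "\n";   # 1
print hd('ATC', 'ATC'), "\n";   # 0
print hd('TCA', 'ATA'), "\n";   # 2
```

Using a temporary variable for the XOR result keeps tr away from a non-lvalue expression, which is a little easier to read than the compact form.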
If Levenshtein distance or Hamming distance isn't familiar, skim the Wikipedia article on string metrics first.
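For the question as asked, a brute-force sketch might look like the following: take every substring of the given length from each string and keep the pairs whose Hamming distance is at most 1. The names `hamming`, `kmers`, and `near_matches` are made up for this example, and the approach compares every substring of s1 with every substring of s2, so it is only meant for short inputs.

```perl
use strict;
use warnings;

# Hamming distance of two equal-length strings, using the XOR trick.
sub hamming {
    my ($s, $t) = @_;
    my $xor = $s ^ $t;
    return length($s) - ($xor =~ tr/\0//);
}

# All distinct substrings of length $k, in order of first appearance.
sub kmers {
    my ($str, $k) = @_;
    my (%seen, @out);
    for my $i (0 .. length($str) - $k) {
        my $sub = substr($str, $i, $k);
        push @out, $sub unless $seen{$sub}++;
    }
    return @out;
}

# For each length-$k substring of $s1, collect the length-$k substrings
# of $s2 that differ in at most $max_diff positions.
# Returns a list of array refs: [ substring_of_s1, match1, match2, ... ].
sub near_matches {
    my ($s1, $s2, $k, $max_diff) = @_;
    my @candidates = kmers($s2, $k);
    my @result;
    for my $left (kmers($s1, $k)) {
        my @hits = grep { hamming($left, $_) <= $max_diff } @candidates;
        push @result, [ $left, @hits ] if @hits;
    }
    return @result;
}

for my $row (near_matches('ATCAGC', 'ATAATCGAC', 3, 1)) {
    my ($from, @hits) = @$row;
    print "$from from s1 matches ", join(', ', @hits), " from s2\n";
}
```

On the strings from the question this prints the two lines shown there, plus a third one: AGC from s1 matches ATC from s2, since those two differ only in the middle character.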