
How do I make inexact string comparisons with Perl?

Given two strings, I want to find all common substrings of a specified length, but allowing one character to be different.

For example, if s1 is 'ATCAGC', s2 is 'ATAATCGAC', and the specified length is 3, then I'd want output along these lines:

ATC from s1 matches ATA, ATC from s2
TCA from s1 matches TAA, TCG from s2

Questions

  • Can I do so with a simple regex?
  • If not, is there a module for this in Perl?
Mariya asked Dec 11 '25 19:12



1 Answer

A Google search for "perl hamming distance" turns up a PerlMonks thread that mentions Text::LevenshteinXS, several typical implementations, and a cute XOR trick for computing the Hamming distance between two equal-length strings:

sub hd{ length( $_[ 0 ] ) - ( ( $_[ 0 ] ^ $_[ 1 ] ) =~ tr[\0][\0] ) }
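
To connect this back to the question, here is a minimal sketch of how hd could be used to compare every window of one string against every window of the other. The helper name fuzzy_common_substrings and its parameters are made up for illustration and do not come from the PerlMonks thread. It assumes equal-length windows and that string XOR is in effect (no `use feature 'bitwise'`).

```perl
use strict;
use warnings;

# Hamming distance via the XOR trick: identical characters XOR to "\0",
# so count the NULs and subtract that count from the length.
# Assumes both arguments are strings of the same length.
sub hd { length( $_[0] ) - ( ( $_[0] ^ $_[1] ) =~ tr[\0][\0] ) }

# Hypothetical helper: for each length-$k window of $s1, collect the
# distinct length-$k windows of $s2 that differ in at most $max positions.
# Returns a list of array refs: [ window_from_s1, matches_from_s2... ].
sub fuzzy_common_substrings {
    my ( $s1, $s2, $k, $max ) = @_;
    my @results;
    for my $i ( 0 .. length($s1) - $k ) {
        my $sub1 = substr $s1, $i, $k;
        my ( %seen, @matches );
        for my $j ( 0 .. length($s2) - $k ) {
            my $sub2 = substr $s2, $j, $k;
            next if $seen{$sub2}++;    # report each distinct window once
            push @matches, $sub2 if hd( $sub1, $sub2 ) <= $max;
        }
        push @results, [ $sub1, @matches ] if @matches;
    }
    return @results;
}

for my $r ( fuzzy_common_substrings( 'ATCAGC', 'ATAATCGAC', 3, 1 ) ) {
    my ( $sub1, @matches ) = @$r;
    print "$sub1 from s1 matches ", join( ', ', @matches ), " from s2\n";
}
```

On the strings from the question this prints the two lines shown there, plus a third one: AGC from s1 also matches ATC from s2, since they differ only in the middle character. The double loop is quadratic in the string lengths, which is fine for short inputs but may need a smarter approach for long sequences.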

If Levenshtein distance or Hamming distance is unfamiliar, skim the Wikipedia article on string metrics.

Jeff Burdges answered Dec 13 '25 07:12



