
Perl - how can I match strings that are not exactly the same?

Tags: string, regex, perl

I have a list of strings I want to find within a file. This would be fairly simple if the strings in my list and in the file matched exactly. Unfortunately, the file contains typos and variations on the names. Here's an example of how some of these strings differ:

List          File
B-Arrestin    Beta-Arrestin
Becn-1        BECN 1
CRM-E4        CRME4

Note that each of those pairs should count as a match despite being different strings. I know that I could categorize every kind of variation and write a separate regex for each one, but that is cumbersome enough that I might be better off looking for matches manually. I think the best solution for my problem would be some kind of expression that says:

"Match this string exactly but still count it as a match if there are X characters that do not match"

Does something like this exist? Is there another way to match strings that are not exactly the same but close?

Slavatron asked Oct 20 '25 19:10



1 Answer

As 200_success pointed out, you can do fuzzy matching with Text::Fuzzy, which computes the Levenshtein distance between two strings, that is, the number of single-character insertions, deletions, and substitutions needed to turn one into the other. You will have to experiment with the maximum Levenshtein distance you want to allow, but if you do a case-insensitive comparison, the maximum distance in your sample data is three:

use strict;
use warnings;
use 5.010;

use Text::Fuzzy;

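# Largest edit distance that still counts as a match; tune this for your data.
my $max_dist = 3;

while (<DATA>) {
    chomp;
    # Split on the first run of whitespace only, so 'BECN 1' stays in one piece.
    my ($string1, $string2) = split ' ', $_, 2;

    # Lowercase both sides so differences in case do not add to the distance.
    my $tf = Text::Fuzzy->new(lc $string1);
    say "'$string1' matches '$string2'" if $tf->distance(lc $string2) <= $max_dist;
}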

__DATA__
B-Arrestin    Beta-Arrestin
Becn-1        BECN 1
CRM-E4        CRME4

Output:

'B-Arrestin' matches 'Beta-Arrestin'
'Becn-1' matches 'BECN 1'
'CRM-E4' matches 'CRME4'
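
To see where the threshold of three comes from, it can help to compute the distances by hand. The following is a minimal, unoptimized sketch of Levenshtein distance in core Perl. It is not how Text::Fuzzy is implemented, and the names `levenshtein` and `is_close` are made up for this illustration rather than taken from any module.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance, keeping only two rows
# of the table at a time. (Illustrative helper, not part of Text::Fuzzy.)
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # Distance from the empty prefix of $s to each prefix of $t.
    my @prev = (0 .. scalar(@t));

    for my $i (1 .. scalar(@s)) {
        my @curr = ($i);    # distance from the first $i chars of $s to ''
        for my $j (1 .. scalar(@t)) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,            # delete a character from $s
                $curr[$j - 1] + 1,        # insert a character into $s
                $prev[$j - 1] + $cost,    # substitute (free if chars match)
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Hypothetical helper: case-insensitive "close enough" test.
sub is_close {
    my ($x, $y, $max_dist) = @_;
    return levenshtein(lc $x, lc $y) <= $max_dist;
}

# Print the case-insensitive distance for each sample pair.
for my $pair (['B-Arrestin', 'Beta-Arrestin'],
              ['Becn-1',     'BECN 1'],
              ['CRM-E4',     'CRME4']) {
    my ($x, $y) = @$pair;
    printf "%-12s %-14s distance=%d\n", $x, $y, levenshtein(lc $x, lc $y);
}
```

The three sample pairs come out at distances 3, 1, and 1. A threshold that is large relative to the length of the names will also accept unrelated short strings, so it is worth checking the results against real data before settling on a value.
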
ThisSuitIsBlackNot answered Oct 22 '25 09:10



