I'm trying to align strings in PHP using Levenshtein distance algorithm. The problem is that my back tracing code does not work properly for all cases. For example when the second array has inserted lines at the beginning. Then the back tracing will only go as far as when i=0. How to properly implement back tracing for Levenshtein distance? Levenshtein distance, $s and $t are arrays of strings (rows) <pre class="prettyprint"><code>function match_rows($s, $t) { $m = count($s); $n = count($t); for($i = 0; $i <= $m; $i++) $d[$i][0] = $i; for($j = 0; $j <= $n; $j++) $d[0][$j] = $j; for($i = 1; $i <= $m; $i++) { for($j = 1; $j <= $n; $j++) { if($s[$i-1] == $t[$j-1]) { $d[$i][$j] = $d[$i-1][$j-1]; } else { $d[$i][$j] = min($d[$i-1][$j], $d[$i][$j-1], $d[$i-1][$j-1]) + 1; } } } // backtrace $i = $m; $j = $n; while($i > 0 && $j > 0) { $min = min($d[$i-1][$j], $d[$i][$j-1], $d[$i-1][$j-1]); switch($min) { // equal or substitution case($d[$i-1][$j-1]): if($d[$i][$j] != $d[$i-1][$j-1]) { // substitution $sub['i'][] = $i; $sub['j'][] = $j; } $i = $i - 1; $j = $j - 1; break; // insertion case($d[$i][$j-1]): $ins[] = $j; $j = $j - 1; break; // deletion case($d[$i-1][$j]): $del[] = $i; $i = $i - 1; break; } } </code></pre>

This is not to be nit-picky, but to help you find the answers you want (and improve your implementation). The algorithm you are using is the Wagner-Fischer algorithm, not the Levenshtein algorithm. Also, Levenshtein distance is not use to align strings. It is strictly a distance measurement. There are two types of alignment: global and local. Global alignment is used to minimize the distance between two entire strings. Example: global align "RACE" on "REACH", you get "RxACx". The x's are gaps. In general, this is the Needleman-Wunsch algorithm, which is very similar to the Wagner-Fischer algorithm. Local alignment finds a substring in a long string and minimizes the difference between a short string and a the substring of the long string. Example: local align "BELL" on "UMBRELLA" and you get "BxELL" aligned on "BRELL". It is the Smith-Waterman algorithm which, again, is very similar to the Wagner-Fischer algorithm. I hope that this is helpful in allowing you to better define the exact kind of alignment you want.

Levenshtein distance with back tracing in PHP

Tags:

php

levenshtein-distance

I'm trying to align strings in PHP using Levenshtein distance algorithm. The problem is that my back tracing code does not work properly for all cases. For example when the second array has inserted lines at the beginning. Then the back tracing will only go as far as when i=0.

How to properly implement back tracing for Levenshtein distance?

Levenshtein distance, $s and $t are arrays of strings (rows)

function match_rows($s, $t)
{
$m = count($s);
$n = count($t);

for($i = 0; $i <= $m; $i++) $d[$i][0] = $i;
for($j = 0; $j <= $n; $j++) $d[0][$j] = $j;

for($i = 1; $i <= $m; $i++)
{
    for($j = 1; $j <= $n; $j++)
    {
        if($s[$i-1] == $t[$j-1])
        {
            $d[$i][$j] = $d[$i-1][$j-1];
        }
        else
        {
            $d[$i][$j] = min($d[$i-1][$j], $d[$i][$j-1], $d[$i-1][$j-1]) + 1;
        }
    }
}

// backtrace
$i = $m;
$j = $n;
while($i > 0 && $j > 0)
{
    $min = min($d[$i-1][$j], $d[$i][$j-1], $d[$i-1][$j-1]);

    switch($min)
    {
        // equal or substitution
        case($d[$i-1][$j-1]):
            if($d[$i][$j] != $d[$i-1][$j-1])
            {
                // substitution
                $sub['i'][] = $i;
                $sub['j'][] = $j;
            }
            $i = $i - 1;
            $j = $j - 1;
            break;

        // insertion
        case($d[$i][$j-1]):
            $ins[] = $j;
            $j = $j - 1;
            break;

        // deletion
        case($d[$i-1][$j]):
            $del[] = $i;
            $i = $i - 1;
            break;
    }
}

812

asked Oct 18 '12 17:10

mixxu

1 Answers

This is not to be nit-picky, but to help you find the answers you want (and improve your implementation).

The algorithm you are using is the Wagner-Fischer algorithm, not the Levenshtein algorithm. Also, Levenshtein distance is not use to align strings. It is strictly a distance measurement.

There are two types of alignment: global and local. Global alignment is used to minimize the distance between two entire strings. Example: global align "RACE" on "REACH", you get "RxACx". The x's are gaps.

In general, this is the Needleman-Wunsch algorithm, which is very similar to the Wagner-Fischer algorithm. Local alignment finds a substring in a long string and minimizes the difference between a short string and a the substring of the long string.

Example: local align "BELL" on "UMBRELLA" and you get "BxELL" aligned on "BRELL". It is the Smith-Waterman algorithm which, again, is very similar to the Wagner-Fischer algorithm.

I hope that this is helpful in allowing you to better define the exact kind of alignment you want.

answered Oct 16 '22 18:10

kainaw

Related questions
                            
                                Curl: Showing "Can't resolve domain name" error of the all localhost sites from PHP
                            
                                Laravel beyondcode websockets do not connect
                            
                                How should I start with OAuth server at PHP
                            
                                Pass variable number of variables to a class in PHP
                            
                                PHP connect via SSH tunnel to LDAP in other network
                            
                                Browser-based yaml editor, preferably in PHP?
                            
                                Session hijacking from another angle
                            
                                How to dynamically set array keys in php
                            
                                How can I get Wordpress to save comments in markdown format?
                            
                                Different results using Bing.com and Bing Search API
                            
                                Using dependency injection in a PHP web application
                            
                                PHP GD resizing transparent image giving black border
                            
                                How to get number of friends added by facebook user in the specific time of period?
                            
                                What is the reason for having authorization rules in the database?
                            
                                Error When Using Symfony Forms / Silex & Twig
                            
                                Is the Visitor pattern useful for dynamically typed languages?
                            
                                PHP Weird Undefined index error
                            
                                Default Europe timezone set in php.ini, but date_default_timezone_get() returns 'UTC'
                            
                                PHP/JS Text Diff
                            
                                MySQL Database Connection Management In PDO

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With