Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Highlight the difference between two strings in PHP

People also ask

How do you find the difference between two strings?

To find the difference between 2 Strings you can use the StringUtils class and the difference method. It compares the two Strings, and returns the portion where they differ.

Can you use == to compare strings in PHP?

The assignment operator assigns the variable on the left to have a new value as the variable on right, while the equal operator == tests for equality and returns true or false as per the comparison results. Example: This example describes the string comparison using the == operator.

What is strcmp function in PHP?

PHP | strcmp() Function This function compares two strings and tells us that whether the first string is greater or smaller than the second string or equals to the second string.


Just wrote a class to compute smallest (not to be taken literally) number of edits to transform one string into another string:

http://www.raymondhill.net/finediff/

It has a static function to render a HTML version of the diff.

It's a first version, and likely to be improved, but it works just fine as of now, so I am throwing it out there in case someone needs to generate a compact diff efficiently, like I needed.

Edit: It's on Github now: https://github.com/gorhill/PHP-FineDiff


You were able to use the PHP Horde_Text_Diff package.

However this package is no longer available.


This is a nice one, also http://paulbutler.org/archives/a-simple-diff-algorithm-in-php/

Solving the problem is not as simple as it seems, and the problem bothered me for about a year before I figured it out. I managed to write my algorithm in PHP, in 18 lines of code. It is not the most efficient way to do a diff, but it is probably the easiest to understand.

It works by finding the longest sequence of words common to both strings, and recursively finding the longest sequences of the remainders of the string until the substrings have no words in common. At this point it adds the remaining new words as an insertion and the remaining old words as a deletion.

You can download the source here: PHP SimpleDiff...


If you want a robust library, Text_Diff (a PEAR package) looks to be pretty good. It has some pretty cool features.


Here is a short function you can use to diff two arrays. It implements the LCS algorithm:

function computeDiff($from, $to)
{
    $diffValues = array();
    $diffMask = array();

    $dm = array();
    $n1 = count($from);
    $n2 = count($to);

    for ($j = -1; $j < $n2; $j++) $dm[-1][$j] = 0;
    for ($i = -1; $i < $n1; $i++) $dm[$i][-1] = 0;
    for ($i = 0; $i < $n1; $i++)
    {
        for ($j = 0; $j < $n2; $j++)
        {
            if ($from[$i] == $to[$j])
            {
                $ad = $dm[$i - 1][$j - 1];
                $dm[$i][$j] = $ad + 1;
            }
            else
            {
                $a1 = $dm[$i - 1][$j];
                $a2 = $dm[$i][$j - 1];
                $dm[$i][$j] = max($a1, $a2);
            }
        }
    }

    $i = $n1 - 1;
    $j = $n2 - 1;
    while (($i > -1) || ($j > -1))
    {
        if ($j > -1)
        {
            if ($dm[$i][$j - 1] == $dm[$i][$j])
            {
                $diffValues[] = $to[$j];
                $diffMask[] = 1;
                $j--;  
                continue;              
            }
        }
        if ($i > -1)
        {
            if ($dm[$i - 1][$j] == $dm[$i][$j])
            {
                $diffValues[] = $from[$i];
                $diffMask[] = -1;
                $i--;
                continue;              
            }
        }
        {
            $diffValues[] = $from[$i];
            $diffMask[] = 0;
            $i--;
            $j--;
        }
    }    

    $diffValues = array_reverse($diffValues);
    $diffMask = array_reverse($diffMask);

    return array('values' => $diffValues, 'mask' => $diffMask);
}

It generates two arrays:

  • values array: a list of elements as they appear in the diff.
  • mask array: contains numbers. 0: unchanged, -1: removed, 1: added.

If you populate an array with characters, it can be used to compute inline difference. Now just a single step to highlight the differences:

function diffline($line1, $line2)
{
    $diff = computeDiff(str_split($line1), str_split($line2));
    $diffval = $diff['values'];
    $diffmask = $diff['mask'];

    $n = count($diffval);
    $pmc = 0;
    $result = '';
    for ($i = 0; $i < $n; $i++)
    {
        $mc = $diffmask[$i];
        if ($mc != $pmc)
        {
            switch ($pmc)
            {
                case -1: $result .= '</del>'; break;
                case 1: $result .= '</ins>'; break;
            }
            switch ($mc)
            {
                case -1: $result .= '<del>'; break;
                case 1: $result .= '<ins>'; break;
            }
        }
        $result .= $diffval[$i];

        $pmc = $mc;
    }
    switch ($pmc)
    {
        case -1: $result .= '</del>'; break;
        case 1: $result .= '</ins>'; break;
    }

    return $result;
}

Eg.:

echo diffline('StackOverflow', 'ServerFault')

Will output:

S<del>tackO</del><ins>er</ins>ver<del>f</del><ins>Fau</ins>l<del>ow</del><ins>t</ins> 

StackOerverfFaulowt

Additional notes:

  • The diff matrix requires (m+1)*(n+1) elements. So you can run into out of memory errors if you try to diff long sequences. In this case diff larger chunks (eg. lines) first, then diff their contents in a second pass.
  • The algorithm can be improved if you trim the matching elements from the beginning and the end, then run the algorithm on the differing middle only. A latter (more bloated) version contains these modifications too.

There is also a PECL extension for xdiff:

  • http://php.net/manual/en/book.xdiff.php
  • http://pecl.php.net/package/xdiff

In particular:

  • xdiff_string_diff — Make unified diff of two strings

Example from PHP Manual:

<?php
$old_article = file_get_contents('./old_article.txt');
$new_article = $_POST['article'];

$diff = xdiff_string_diff($old_article, $new_article, 1);
if (is_string($diff)) {
    echo "Differences between two articles:\n";
    echo $diff;
}

I had terrible trouble with the both the PEAR-based and the simpler alternatives shown. So here's a solution that leverages the Unix diff command (obviously, you have to be on a Unix system or have a working Windows diff command for it to work). Choose your favourite temporary directory, and change the exceptions to return codes if you prefer.

/**
 * @brief Find the difference between two strings, lines assumed to be separated by "\n|
 * @param $new string The new string
 * @param $old string The old string
 * @return string Human-readable output as produced by the Unix diff command,
 * or "No changes" if the strings are the same.
 * @throws Exception
 */
public static function diff($new, $old) {
  $tempdir = '/var/somewhere/tmp'; // Your favourite temporary directory
  $oldfile = tempnam($tempdir,'OLD');
  $newfile = tempnam($tempdir,'NEW');
  if (!@file_put_contents($oldfile,$old)) {
    throw new Exception('diff failed to write temporary file: ' . 
         print_r(error_get_last(),true));
  }
  if (!@file_put_contents($newfile,$new)) {
    throw new Exception('diff failed to write temporary file: ' . 
         print_r(error_get_last(),true));
  }
  $answer = array();
  $cmd = "diff $newfile $oldfile";
  exec($cmd, $answer, $retcode);
  unlink($newfile);
  unlink($oldfile);
  if ($retcode != 1) {
    throw new Exception('diff failed with return code ' . $retcode);
  }
  if (empty($answer)) {
    return 'No changes';
  } else {
    return implode("\n", $answer);
  }
}