Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP LZW Binary Decompression Function

I've been looking on the internets and couldn't find an LZW decompression implementation in PHP that works with the data outputted by these javascript functions:

function lzw_encode(s) {
    var dict = {};
    var data = (s + "").split("");
    var out = [];
    var currChar;
    var phrase = data[0];
    var code = 256;
    for (var i=1; i<data.length; i++) {
        currChar=data[i];
        if (dict[phrase + currChar] != null) {
            phrase += currChar;
        }
        else {
            out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
            dict[phrase + currChar] = code;
            code++;
            phrase=currChar;
        }
    }
    out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
    for (var i=0; i<out.length; i++) {
        out[i] = String.fromCharCode(out[i]);
    }
    return out.join("");
}

function lzw_decode(s) {
    var dict = {};
    var data = (s + "").split("");
    var currChar = data[0];
    var oldPhrase = currChar;
    var out = [currChar];
    var code = 256;
    var phrase;
    debugger;
    for (var i=1; i<data.length; i++) {
        var currCode = data[i].charCodeAt(0);
        if (currCode < 256) {
            phrase = data[i];
        }
        else {
           phrase = dict[currCode] ? dict[currCode] : (oldPhrase + currChar);
        }
        out.push(phrase);
        currChar = phrase.charAt(0);
        dict[code] = oldPhrase + currChar;
        code++;
        oldPhrase = phrase;
    }
    return out.join("");
}

I really just need a decompression algorithm in PHP that can work with the compression javascript function above.

The lzw_encode function above encodes "This is a test of the compression function" as "This Ă a test ofĈhe comprĊsion functěn"

The libraries I've found are either buggy (http://code.google.com/p/php-lzw/) or don't take input of UTC characters.

Any help would be greatly appreciated,

Thanks!

like image 517
xd44 Avatar asked Apr 26 '12 23:04

xd44


1 Answers

I've ported and tested it for you to PHP:

function lzw_decode($s) {
  mb_internal_encoding('UTF-8');

  $dict = array();
  $currChar = mb_substr($s, 0, 1);
  $oldPhrase = $currChar;
  $out = array($currChar);
  $code = 256;
  $phrase = '';

  for ($i=1; $i < mb_strlen($s); $i++) {
      $currCode = implode(unpack('N*', str_pad(iconv('UTF-8', 'UTF-16BE', mb_substr($s, $i, 1)), 4, "\x00", STR_PAD_LEFT)));
      if($currCode < 256) {
          $phrase = mb_substr($s, $i, 1);
      } else {
         $phrase = $dict[$currCode] ? $dict[$currCode] : ($oldPhrase.$currChar);
      }
      $out[] = $phrase;
      $currChar = mb_substr($phrase, 0, 1);
      $dict[$code] = $oldPhrase.$currChar;
      $code++;
      $oldPhrase = $phrase;
  }
  var_dump($dict);
  return(implode($out));
}
like image 135
clover Avatar answered Oct 12 '22 23:10

clover