Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Encode/compress sequence of repeating integers

I have very long integer sequences that look like this (arbitrary length!):

0000000001110002220033333

Now I need some algorithm to convert this string into something compressed like

a9b3a3c3a2d5

Which means "a 9 times, then b 3 times, then a 3 times" and so on, where "a" stands for 0, "b" for 1, "c" for 2 and "d" for 3.

How would you do that? So far nothing suitable came to my mind, and I had no luck with google because I didn't really know what to search for. What is this kind of encoding / compression called?

PS: I am going to do the encoding with PHP, and the decoding in JavaScript.

Edit: Thank you all!

I ended up with this function for encoding:

protected function numStringToRle($s){          
        $rle    = '';
        $count = 1;
        $len    = strlen($s);
        for($i = 0; $i < $len; $i++){
            if($i != $len && isset($s[$i+1]) && $s[$i] == $s[$i+1]){
                $count++;                
            } else {
                $rle .= chr($s[$i] + 97).( $count == 1 ? '' : $count);                                
                $count = 1;
            }
        }
        return $rle;            
}

And that for decoding:

var decodeCoords = function(str) {

   str = str.replace(/(.)(\d+)/g, function(_, x, n) {
       return new Array(parseInt(n, 10) + 1).join(x);
   });

   return str.
     replace(/a/g, '0').
     replace(/b/g, '1').
     replace(/c/g, '2').
     replace(/d/g, '3');     
};
like image 287
Alex Avatar asked Apr 02 '10 13:04

Alex


People also ask

What is Run Length Encoding best at?

Run-length encoding (RLE) is good for repetitive data, replacing it by a count and one copy of a repeated item. Adaptive dictionary methods build a table of strings and then replace occurrences of them by shorter codes.

Can numbers be compressed?

Well, yes, a big number can be expressed mathematically, and potentially save some space doing so. So if you convert each character to its ASCII value, then every character is expanded from a single byte to 1, 2, or 3 bytes. That is, 'A' becomes '65'. 'z' becomes '122'.

Can you compress strings?

"String Compression Algorithm” or “Run Length Encoding” happens when you compress a string, and the consecutive duplicates of each string are replaced with the character, followed by the consecutive, repeated character count. For example: After string compression, the string “aaaabbcddddd” would return “a4b2c1d5”.

What is run length encoding computer graphics?

Run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.


1 Answers

It is called Run Length Encoding

Basic encoder in PHP:

function numStringToRle($s){
    $rle = '';
    $count = 1;
    $len = strlen($s);
    for ( $i = 0; $i < $len; $i++ ){
        if ( $i != $len && $s[$i] == $s[$i+1] ){
            $count++;                
        }else{
          $rle .= chr($s[$i] + 97).$count;    
          $count = 1;
        }
    }
    return $rle;
}

Be warned it will preform badly issues with a string like

 123456789123456789

If you were going to be handling a string that may have a lot of individual single characters you would be better to add some complexity and not write the length of the run if the length of the run is 1.

//change
$rle .= chr($s[$i] + 97).$count;    

//to
$rle .= chr($s[$i] + 97).( $count == 1 ? '' : $count );   

//or
$rle .= chr($s[$i] + 97)
if ( $count != 1 ){
    $rle .= $count;
}
like image 128
Yacoby Avatar answered Sep 25 '22 20:09

Yacoby