I have very long integer sequences that look like this (arbitrary length!):
0000000001110002220033333
Now I need some algorithm to convert this string into something compressed like
a9b3a3c3a2d5
Which means "a 9 times, then b 3 times, then a 3 times" and so on, where "a" stands for 0, "b" for 1, "c" for 2 and "d" for 3.
How would you do that? So far nothing suitable came to my mind, and I had no luck with google because I didn't really know what to search for. What is this kind of encoding / compression called?
PS: I am going to do the encoding with PHP, and the decoding in JavaScript.
Edit: Thank you all!
I ended up with this function for encoding:
protected function numStringToRle($s){
$rle = '';
$count = 1;
$len = strlen($s);
for($i = 0; $i < $len; $i++){
if($i != $len && isset($s[$i+1]) && $s[$i] == $s[$i+1]){
$count++;
} else {
$rle .= chr($s[$i] + 97).( $count == 1 ? '' : $count);
$count = 1;
}
}
return $rle;
}
And that for decoding:
var decodeCoords = function(str) {
str = str.replace(/(.)(\d+)/g, function(_, x, n) {
return new Array(parseInt(n, 10) + 1).join(x);
});
return str.
replace(/a/g, '0').
replace(/b/g, '1').
replace(/c/g, '2').
replace(/d/g, '3');
};
Run-length encoding (RLE) is good for repetitive data, replacing it by a count and one copy of a repeated item. Adaptive dictionary methods build a table of strings and then replace occurrences of them by shorter codes.
Well, yes, a big number can be expressed mathematically, and potentially save some space doing so. So if you convert each character to its ASCII value, then every character is expanded from a single byte to 1, 2, or 3 bytes. That is, 'A' becomes '65'. 'z' becomes '122'.
"String Compression Algorithm” or “Run Length Encoding” happens when you compress a string, and the consecutive duplicates of each string are replaced with the character, followed by the consecutive, repeated character count. For example: After string compression, the string “aaaabbcddddd” would return “a4b2c1d5”.
Run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.
It is called Run Length Encoding
Basic encoder in PHP:
function numStringToRle($s){
$rle = '';
$count = 1;
$len = strlen($s);
for ( $i = 0; $i < $len; $i++ ){
if ( $i != $len && $s[$i] == $s[$i+1] ){
$count++;
}else{
$rle .= chr($s[$i] + 97).$count;
$count = 1;
}
}
return $rle;
}
Be warned it will preform badly issues with a string like
123456789123456789
If you were going to be handling a string that may have a lot of individual single characters you would be better to add some complexity and not write the length of the run if the length of the run is 1.
//change
$rle .= chr($s[$i] + 97).$count;
//to
$rle .= chr($s[$i] + 97).( $count == 1 ? '' : $count );
//or
$rle .= chr($s[$i] + 97)
if ( $count != 1 ){
$rle .= $count;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With