
Issues with compression in JavaScript

I have an object I am trying to compress. It is of the form

[
  {
    array
    string
  },
  {
    array
    string
  },
  ...
]

The arrays are no more than 10-15 elements long, extremely small in comparison to the strings (the strings are HTML, roughly 170k characters each). The strings, though, are usually repeated, or have huge amounts of overlap. So my intuition tells me the compressed result should be roughly the compressed size of one string, plus a little extra.

I JSON.stringify this object and try to compress it.

Most compression libraries did a bad job of compressing the strings. Since the server sends me a gzip-compressed version at 77 kB, I know the result can be at least that small.

Two of the maybe 15 libraries I tried did a good job:

gzip-js

lzma-js

The issue is that gzip-js output grows linearly in the number of strings, whereas LZMA handles the repetition correctly and only grows slightly.

Unfortunately, lzma-js (level 2) is very slow (20 s vs. 1 s for gzip) when compressing 7 MB (about 30 strings).

Is there a compression library out there that is roughly as quick as gzip, but whose output doesn't grow linearly with repeated strings?

Peter P asked Jul 03 '15 17:07




2 Answers

Pako was useful for me; give it a try.

Instead of strings, use byte arrays, as is done here.

Get pako.js and you can decompress a byte array like so:

<html>
<head>
<title>Gunzipping binary gzipped string</title>
<script type="text/javascript" src="pako.js"></script>
<script type="text/javascript">

// Get datastream as Array, for example:
var charData    = [31,139,8,0,0,0,0,0,0,3,5,193,219,13,0,16,16,4,192,86,214,151,102,52,33,110,35,66,108,226,60,218,55,147,164,238,24,173,19,143,241,18,85,27,58,203,57,46,29,25,198,34,163,193,247,106,179,134,15,50,167,173,148,48,0,0,0];

// Turn number array into byte-array
var binData     = new Uint8Array(charData);

// Pako magic
var data        = pako.inflate(binData);

// Convert gunzipped byte array back to an ASCII string
// (for large outputs, prefer new TextDecoder().decode(data) to avoid
// the argument-count limit of Function.prototype.apply)
var strData     = String.fromCharCode.apply(null, new Uint16Array(data));

// Output to console
console.log(strData);

</script>
</head>
<body>
Open up the developer console.
</body>
</html>

Running example: http://jsfiddle.net/9yH7M/

Alternatively, you can base64-encode the data before sending it, since a plain number array adds a lot of overhead in JSON or XML. Decode like so:

// Get some base64 encoded binary data from the server. Imagine we got this:
var b64Data     = 'H4sIAAAAAAAAAwXB2w0AEBAEwFbWl2Y0IW4jQmziPNo3k6TuGK0Tj/ESVRs6yzkuHRnGIqPB92qzhg8yp62UMAAAAA==';

// Decode base64 (convert ASCII to binary)
var binString   = atob(b64Data);

// Convert binary string to character-number array
var charData    = binString.split('').map(function (x) { return x.charCodeAt(0); });

// Turn number array into byte-array
var binData     = new Uint8Array(charData);

// Pako magic
var data        = pako.inflate(binData);

// Convert gunzipped byteArray back to ascii string:
var strData     = String.fromCharCode.apply(null, new Uint16Array(data));

// Output to console
console.log(strData);

Running example: http://jsfiddle.net/9yH7M/1/

For more advanced features, read the pako API documentation.

Alpha2k answered Oct 12 '22 23:10


Use the gzip-js lib with a high compression level:
https://github.com/beatgammit/gzip-js

var gzip = require('gzip-js'),
    options = {
        level: 9,
        name: 'hello-world.txt',
        timestamp: parseInt(Date.now() / 1000, 10)
    };

// out will be a JavaScript Array of bytes
var out = gzip.zip('Hello world', options);

I found this gave close to the minimum possible size with reasonable speed.

For an LZ-based compression algorithm, I think lz-string is faster; try it on your data sample:
https://github.com/pieroxy/lz-string

Ali.MD answered Oct 13 '22 00:10