 

Compressing large string in ruby


I have a web application (Ruby on Rails) that sends some YAML as the value of a hidden input field.

Now I want to reduce the size of the text that is sent to the browser. Which lossless compression method sends the least data? I'm OK with incurring the additional cost of compression and decompression on the server side.

asked Jul 26 '13 at 13:07 by gnarsi





1 Answer

You could use the zlib implementation in Ruby's standard library to deflate and inflate the data:

```ruby
require "zlib"

data = "some long yaml string" * 100
compressed_data = Zlib::Deflate.deflate(data)
#=> "x\x9C+\xCE\xCFMU\xC8\xC9\xCFKW\xA8L\xCC\xCDQ(.)\xCA\xCCK/\x1E\x15\x1C\x15\x1C\x15\x1C\x15\x1C\x15\x1C\x15\x1C\x15\x1C\x15D\x15\x04\x00\xB3G%\xA6"
```
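Since the question asks for the smallest possible output and server-side CPU is not a concern, it may be worth passing an explicit level to `Zlib::Deflate.deflate`. The sketch below compares the default, fastest and smallest settings on the same repetitive sample string; the gain on real YAML will vary and may be small.

```ruby
require "zlib"

# Same repetitive sample as above; real YAML will compress differently.
data = "some long yaml string" * 100

# Zlib::Deflate.deflate accepts an optional level, from
# Zlib::NO_COMPRESSION (0) up to Zlib::BEST_COMPRESSION (9).
sizes = {
  default:  Zlib::Deflate.deflate(data).bytesize,
  fastest:  Zlib::Deflate.deflate(data, Zlib::BEST_SPEED).bytesize,
  smallest: Zlib::Deflate.deflate(data, Zlib::BEST_COMPRESSION).bytesize
}

sizes.each { |level, size| puts "#{level}: #{size} bytes (from #{data.bytesize})" }

# Whatever level was used, inflating restores the original string.
restored = Zlib::Inflate.inflate(Zlib::Deflate.deflate(data, Zlib::BEST_COMPRESSION))
```

The level only affects how hard the compressor searches for matches; the output format is the same, so the inflating side needs no changes.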

The compressed data is binary, so you should Base64-encode it to make it printable:

```ruby
require "base64"

encoded_data = Base64.encode64(compressed_data)
#=> "eJwrzs9NVcjJz0tXqEzMzVEoLinKzEsvHhUcFRwVHBUcFRwVHBUcFUQVBACz\nRyWm\n"
```
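Note that `Base64.encode64` inserts a line break every 60 characters and at the end, as the `\n` characters in the output show. For a single-line attribute value, `Base64.strict_encode64` avoids those line breaks. The helper names below (`pack_for_hidden_field`, `unpack_from_hidden_field`) are hypothetical; this is a minimal sketch of the full server-side round trip, assuming the YAML is UTF-8.

```ruby
require "zlib"
require "base64"

# Hypothetical helper: compress the YAML and encode it without line breaks.
def pack_for_hidden_field(yaml)
  Base64.strict_encode64(Zlib::Deflate.deflate(yaml, Zlib::BEST_COMPRESSION))
end

# Hypothetical helper: reverse the steps. strict_decode64 raises
# ArgumentError on malformed Base64. Inflate returns binary (ASCII-8BIT)
# data, so the encoding is restored to UTF-8 (assumed) afterwards.
def unpack_from_hidden_field(value)
  Zlib::Inflate.inflate(Base64.strict_decode64(value)).force_encoding(Encoding::UTF_8)
end

yaml = "some long yaml string" * 100
field_value = pack_for_hidden_field(yaml)
restored = unpack_from_hidden_field(field_value)
```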

Later, on the client side, you could use pako (a port of zlib to JavaScript) to get your data back. This answer probably helps you with implementing the JS part.

To give you an idea of how effective this is, here are the sizes of the example strings:

```ruby
data.size            # => 2100
compressed_data.size # =>   48
encoded_data.size    # =>   66
```
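The jump from 48 to 66 bytes is Base64 overhead: every 3 input bytes become 4 output characters, rounded up, so 48 bytes become 64 characters, and `encode64` then adds two newlines (one after the first 60 characters, one at the end). The small helper below (`base64_length` is a hypothetical name) computes the encoded length without line breaks.

```ruby
require "base64"

# Base64 maps each group of 3 bytes to 4 characters; a partial final
# group is padded with "=" up to a full 4 characters.
def base64_length(n)
  4 * ((n + 2) / 3)
end

sample = "x" * 48                          # same length as compressed_data above
strict_size  = Base64.strict_encode64(sample).bytesize
wrapped_size = Base64.encode64(sample).bytesize
puts base64_length(48)                     # prints 64
```

So Base64 costs roughly a third on top of the compressed size, which is usually still far smaller than the raw YAML.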

The same works in reverse when you compress on the client and inflate on the server:

```ruby
Zlib::Inflate.inflate(Base64.decode64(encoded_data))
#=> "some long yaml stringsome long yaml str ... (shortened, as the string is long :)
```
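Because the value comes back from the browser, the server should expect that it may be corrupted. `Base64.decode64` is lenient and silently skips invalid characters, so the error usually surfaces in `Zlib::Inflate.inflate`, which raises a `Zlib::Error` subclass (such as `Zlib::DataError`). A minimal sketch, with `read_hidden_yaml` as a hypothetical name and UTF-8 as an assumed encoding:

```ruby
require "zlib"
require "base64"

# Hypothetical reader for the posted field. Returns nil instead of raising
# when the value is not valid zlib data. This only detects corruption; a
# user can still submit deliberately crafted, valid compressed data.
def read_hidden_yaml(value)
  Zlib::Inflate.inflate(Base64.decode64(value)).force_encoding(Encoding::UTF_8)
rescue Zlib::Error
  nil
end

good = Base64.encode64(Zlib::Deflate.deflate("a: 1\n"))
read_hidden_yaml(good)             # returns "a: 1\n"
read_hidden_yaml("not compressed") # returns nil
```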

Disclaimer:

  • Ruby's zlib implementation should be compatible with pako, but I have not tried it.
  • The string sizes above are a bit of a cheat. Zlib is very effective here because the string repeats a lot; real-life data usually does not repeat as much.
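To get a less flattering but more realistic picture, you can measure your own data. The sketch below builds a made-up YAML document (the record fields are purely illustrative) and, as a worst case, 1000 random bytes, which deflate cannot shrink at all.

```ruby
require "zlib"
require "yaml"
require "securerandom"

# Illustrative data only: 50 small records with repeated keys but
# random tokens, so there is less repetition than in the sample string.
records = (1..50).map do |i|
  { "id" => i, "name" => "user#{i}", "token" => SecureRandom.hex(8) }
end
yaml = records.to_yaml

# Worst case: random bytes have no redundancy, so deflate can only add
# framing overhead on top of the input size.
noise = SecureRandom.random_bytes(1000)

yaml_compressed  = Zlib::Deflate.deflate(yaml).bytesize
noise_compressed = Zlib::Deflate.deflate(noise).bytesize

puts "yaml:  #{yaml.bytesize} -> #{yaml_compressed} bytes"
puts "noise: #{noise.bytesize} -> #{noise_compressed} bytes"
```

Exact numbers change on every run because of the random tokens, which is the point: the ratio depends on how repetitive your YAML actually is.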
answered Sep 24 '22 at 23:09 by tessi
