Obtain a string from compressed data, and vice versa, in Java

I want to compress a string (an XML document) in Java and store it in a Cassandra DB as varchar. I should be able to decompress it when reading from the DB. I looked into GZIP and LZ4, and both return a byte array when compressing.

My goal is to obtain a string from the compressed data that can later be decompressed to get back the original string. What is the best approach?

Sukreet Roy Choudhury asked Apr 21 '17 at 10:04



People also ask

How do you compress a string in Java?

Deflater (in java.util.zip) is one of the most commonly used classes for compression in Java. It uses the popular ZLIB compression library. Deflater works on bytes, so a string must first be encoded (for example as UTF-8). You pass the input with setInput(), call finish(), and then call deflate(), which fills the given buffer with the compressed data. The matching Inflater class reverses the process.
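A minimal sketch of the Deflater workflow described above, using only java.util.zip.Deflater and Inflater from the JDK. The class name DeflaterExample and the 1024-byte buffer size are illustrative choices, not anything prescribed by the API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflaterExample {

    // Encode the string as UTF-8, then compress the bytes with Deflater (ZLIB format).
    static byte[] compress(String text) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
            deflater.finish(); // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer); // fills buffer with compressed bytes
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Inflate the ZLIB bytes and decode them back into a UTF-8 string.
    static String decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                out.write(buffer, 0, n);
                // Guard against looping forever on truncated input.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or incomplete input");
                }
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        StringBuilder sb = new StringBuilder("<doc>");
        for (int i = 0; i < 50; i++) {
            sb.append("<item>hello</item>");
        }
        String xml = sb.append("</doc>").toString();
        byte[] compressed = compress(xml);
        System.out.println(xml.length() + " chars -> " + compressed.length + " compressed bytes");
        System.out.println("round trip ok: " + decompress(compressed).equals(xml));
    }
}
```

Note that the result of compress() is still a byte array, not a string; turning it into storable text needs an extra encoding step.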

What happens when a string is compressed?

It depends on the algorithm. A simple example is run-length encoding (often taught as the "string compression" exercise): each run of consecutive identical characters is replaced with the character followed by the length of the run. For example, the string “aaaabbcddddd” becomes “a4b2c1d5”.

Can string be compressed?

Yes. With run-length encoding, consecutive duplicate characters are replaced with the character followed by the number of repetitions. In a common variant, characters that appear only once are written without a count, so “wwwwaaadexxxxxx” becomes “w4a3dex6”. This only saves space when the input contains long runs of repeated characters.
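A small sketch of the run-length encoding described in the two answers above. The class name RunLengthEncoding and the omitSingleCounts flag are hypothetical; the flag switches between the two output styles shown (“c1” versus a bare “d”). This toy format is only unambiguous for inputs that contain no digits.

```java
public class RunLengthEncoding {

    // Replace each run of identical characters with the character followed by the run length.
    // If omitSingleCounts is true, runs of length 1 are written without a count ("d" instead of "d1").
    static String encode(String s, boolean omitSingleCounts) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int runEnd = i;
            while (runEnd < s.length() && s.charAt(runEnd) == c) {
                runEnd++; // extend the run while the character repeats
            }
            int count = runEnd - i;
            out.append(c);
            if (count > 1 || !omitSingleCounts) {
                out.append(count);
            }
            i = runEnd; // continue after the run
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aaaabbcddddd", false));   // a4b2c1d5
        System.out.println(encode("wwwwaaadexxxxxx", true)); // w4a3dex6
    }
}
```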

What is data compression in Java?

Data compression reduces the size of data by encoding it more compactly. One classic example is Lempel-Ziv-Welch (LZW), a compression algorithm that shrinks a file using a table-based (dictionary) lookup. LZW is mainly used to compress GIF files and optionally PDF and TIFF files. Files compressed with this algorithm are sometimes saved with the .lzw extension.
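The table-based lookup described above can be illustrated with a simplified LZW sketch. It is not a production codec: it assumes every input character has a code below 256 (for example ASCII text), and it returns the codes as a List<Integer> instead of packing them into variable-width bit fields as real GIF or TIFF encoders do. The class name Lzw is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Lzw {

    // Emit the code of the longest known prefix, then add prefix + next char to the table.
    static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(String.valueOf((char) i), i); // single characters get codes 0..255
        }
        int nextCode = 256;
        String w = "";
        List<Integer> codes = new ArrayList<>();
        for (char c : input.toCharArray()) {
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc; // keep growing the current match
            } else {
                codes.add(dict.get(w));
                dict.put(wc, nextCode++);
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) {
            codes.add(dict.get(w));
        }
        return codes;
    }

    // Rebuild the same table on the fly while reading the code stream.
    static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) {
            return "";
        }
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(i, String.valueOf((char) i));
        }
        int nextCode = 256;
        String w = dict.get(codes.get(0));
        StringBuilder out = new StringBuilder(w);
        for (int k = 1; k < codes.size(); k++) {
            int code = codes.get(k);
            String entry;
            if (dict.containsKey(code)) {
                entry = dict.get(code);
            } else if (code == nextCode) {
                entry = w + w.charAt(0); // code refers to the entry being defined right now
            } else {
                throw new IllegalArgumentException("invalid LZW code: " + code);
            }
            out.append(entry);
            dict.put(nextCode++, w + entry.charAt(0));
            w = entry;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "TOBEORNOTTOBEORTOBEORNOT";
        List<Integer> codes = compress(text);
        System.out.println(codes.size() + " codes for " + text.length() + " chars");
        System.out.println("round trip ok: " + decompress(codes).equals(text));
    }
}
```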


1 Answer

I don't see any good reason for you to compress your data yourself: Cassandra can do it for you transparently (by default it compresses table data on disk with LZ4). So if your goal is to reduce your data footprint, you have a non-existent problem, and I'd feed the XML document directly to C* (Cassandra).

By the way, compression algorithms such as GZIP and LZ4 take an array of bytes and produce an array of bytes, and those bytes are generally not a valid string. As a solution, you could apply an encoding such as Base64 to your compressed byte array to turn it into plain text. On decompression, reverse the logic: Base64-decode the string, then apply your decompression algorithm.
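A minimal sketch of this approach, assuming GZIP as the compression algorithm (via the JDK's GZIPOutputStream and GZIPInputStream) and java.util.Base64 for the text encoding. The class name CompressedText and the method names compressToBase64 and decompressFromBase64 are hypothetical. The resulting Base64 string contains only ASCII characters, so it can be stored in a varchar column.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedText {

    // String -> UTF-8 bytes -> GZIP bytes -> Base64 text
    static String compressToBase64(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The GZIP stream is closed (trailer written) before the bytes are read.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Base64 text -> GZIP bytes -> original UTF-8 bytes -> String
    static String decompressFromBase64(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("<doc>");
        for (int i = 0; i < 100; i++) {
            sb.append("<item id=\"").append(i).append("\">hello</item>");
        }
        String xml = sb.append("</doc>").toString();
        String stored = compressToBase64(xml);
        System.out.println("original: " + xml.length() + " chars, stored: " + stored.length() + " chars");
        System.out.println("round trip ok: " + decompressFromBase64(stored).equals(xml));
    }
}
```

Keep in mind that Base64 output is about 4/3 the size of its input, so it eats into the savings from compression; for a short or poorly compressible document the encoded string can even be longer than the original.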

xmas79 answered Nov 08 '22 at 17:11
