I want to compress a string (an XML document) in Java and store it in a Cassandra database as a varchar. I should be able to decompress it when reading it back from the database. I looked into GZIP and LZ4, and both return a byte array when compressing.
My goal is to obtain a string from the compressed data that can later be decompressed to get back the original string. What is the best approach?
Deflater is one of the most commonly used classes for string compression in Java. It uses the popular ZLIB compression library and provides a method called deflate() to compress data. You pass in the input data (the string's bytes, via setInput()), and deflate() compresses it and fills the given buffer with the compressed data.
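A minimal sketch of a Deflater round trip, assuming UTF-8 for the string-to-bytes step and a 1 KB working buffer (both arbitrary choices). The class name DeflaterExample is hypothetical; the matching java.util.zip.Inflater class does the decompression. Note that the compressed result is still a byte array, not a string. The demo in main uses String.repeat, which needs Java 11 or later.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflaterExample {

    // Compresses the UTF-8 bytes of a string with Deflater (zlib format).
    static byte[] compress(String text) {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish(); // signal that no more input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            // deflate() fills the buffer and returns the number of bytes written
            int count = deflater.deflate(buffer);
            out.write(buffer, 0, count);
        }
        deflater.end(); // release native zlib resources
        return out.toByteArray();
    }

    // Reverses compress() with Inflater; wraps the checked exception for convenience.
    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                // no progress and more input or a dictionary needed: data is unusable
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib data");
                }
                out.write(buffer, 0, count);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid zlib data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<doc>" + "<item>hello</item>".repeat(50) + "</doc>";
        byte[] packed = compress(xml);
        System.out.println(xml.length() + " chars -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(xml));
    }
}
```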
The “String Compression Algorithm”, or “Run Length Encoding”, compresses a string by replacing each run of consecutive duplicate characters with the character followed by the number of times it repeats. For example, after compression the string “aaaabbcddddd” becomes “a4b2c1d5”.
In a common variant, consecutive duplicates of a character are replaced with the character followed by the number of consecutive duplicates, while characters that appear only once are written without a count. For example, if the input string is “wwwwaaadexxxxxx”, the function should return “w4a3dex6”. This kind of compression is also called Run Length Encoding.
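A minimal Java sketch of the run-length encoding step described above. The class RunLengthEncoding and the alwaysCount flag are hypothetical names introduced here to cover both variants. Keep in mind that decoding is only unambiguous if the input contains no digits, and RLE only pays off when the input has long runs of repeated characters, which typical text such as XML usually does not.

```java
public class RunLengthEncoding {

    // Encodes runs of identical characters as <char><count>.
    // alwaysCount = true:  every run gets a count ("aaaabbcddddd" -> "a4b2c1d5").
    // alwaysCount = false: runs of length 1 get no count ("wwwwaaadexxxxxx" -> "w4a3dex6").
    static String encode(String input, boolean alwaysCount) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int runLength = 1;
            // extend the run while the next character is the same
            while (i + runLength < input.length() && input.charAt(i + runLength) == c) {
                runLength++;
            }
            sb.append(c);
            if (alwaysCount || runLength > 1) {
                sb.append(runLength);
            }
            i += runLength; // jump to the first character of the next run
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aaaabbcddddd", true));     // a4b2c1d5
        System.out.println(encode("wwwwaaadexxxxxx", false)); // w4a3dex6
    }
}
```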
LZW (Lempel-Ziv-Welch) is a compression algorithm that compresses a file into a smaller one using a table-based lookup. It is mainly used to compress GIF files and optionally to compress PDF and TIFF files. Files compressed using this algorithm are saved with a .lzw extension.
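A minimal sketch of LZW on a Java string, assuming the input only contains characters in the range 0-255 (the initial table holds one entry per 8-bit value). It outputs a list of integer codes rather than a packed bit stream, so it illustrates the table-based lookup but is not a file format. The class name Lzw is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Lzw {

    // Compresses a string into a list of table codes; codes 0-255 are single chars.
    static List<Integer> compress(String input) {
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            table.put(String.valueOf((char) i), i);
        }
        int nextCode = 256;
        String w = "";
        List<Integer> output = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c > 255) {
                throw new IllegalArgumentException("sketch only supports chars 0-255");
            }
            String wc = w + c;
            if (table.containsKey(wc)) {
                w = wc; // keep extending the current match
            } else {
                output.add(table.get(w)); // emit code for the longest known prefix
                table.put(wc, nextCode++); // learn the new sequence
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) {
            output.add(table.get(w));
        }
        return output;
    }

    // Rebuilds the same table on the fly while decoding.
    static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) {
            return "";
        }
        Map<Integer, String> table = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            table.put(i, String.valueOf((char) i));
        }
        int nextCode = 256;
        String w = table.get(codes.get(0));
        StringBuilder result = new StringBuilder(w);
        for (int k = 1; k < codes.size(); k++) {
            int code = codes.get(k);
            String entry;
            if (table.containsKey(code)) {
                entry = table.get(code);
            } else if (code == nextCode) {
                entry = w + w.charAt(0); // code not yet in table: the "cScSc" case
            } else {
                throw new IllegalArgumentException("bad code: " + code);
            }
            result.append(entry);
            table.put(nextCode++, w + entry.charAt(0));
            w = entry;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String text = "TOBEORNOTTOBEORTOBEORNOT";
        List<Integer> codes = compress(text);
        System.out.println(codes.size() + " codes for " + text.length() + " chars");
        System.out.println(decompress(codes).equals(text));
    }
}
```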
I don't see any good reason for you to compress your data yourself: Cassandra can do it for you transparently (it compresses your data with LZ4 by default). So if your goal is to reduce your data footprint, you have a non-existent problem, and I'd feed the XML document directly to C* (Cassandra).
By the way, all compression algorithms take an array of bytes and produce an array of bytes. As a solution, you could apply an encoding such as Base64 to your compressed byte array. On decompression, reverse the logic: Base64-decode your string and then apply your decompression algorithm.
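A minimal sketch of the GZIP plus Base64 approach using only the JDK (java.util.zip and java.util.Base64; readAllBytes needs Java 9+, and String.repeat in main needs Java 11+). The class and method names are hypothetical, and UTF-8 is assumed for the text encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBase64 {

    // String -> UTF-8 bytes -> GZIP -> Base64 text (plain ASCII, safe for a varchar column).
    static String compressToBase64(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // try-with-resources closes the GZIP stream, which writes the trailer
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Base64 text -> compressed bytes -> GUNZIP -> original string.
    static String decompressFromBase64(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc>" + "<item>hello</item>".repeat(50) + "</doc>";
        String stored = compressToBase64(xml);
        System.out.println(xml.length() + " chars -> " + stored.length() + " Base64 chars");
        System.out.println(decompressFromBase64(stored).equals(xml));
    }
}
```

Base64 makes the output about a third larger than the raw compressed bytes. If storing text is not a hard requirement, Cassandra's blob column type can hold the compressed byte array directly.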