 

Effectively compress strings of 10-1000 characters in Java?

I need to compress strings (written in a known but variable language) of anywhere from 10 to 1000 characters into individual UDP packets.

What compression algorithms available in Java are well suited to this task?

Are there maybe open source Java libraries available to do this?

asked Apr 04 '11 by sanity


1 Answer

"It depends".

I would start with just the primary candidates: LZMA ("7-zip"), deflate (direct, zlib: deflate + small wrapper, gzip: deflate + slightly larger wrapper, zip: deflate + even larger wrapper), bzip2 (I doubt it would be that good here; it works best with a relatively large window), perhaps even one of the other LZ* branches like LZS, which has an RFC for IP payload compression, but...

...run some analysis based upon the actual data and the compression/throughput of several different approaches. Java has both GZIPOutputStream ("deflate in a gzip wrapper") and DeflaterOutputStream ("plain deflate", recommended over the gzip or zip "wrappers") in the standard library, and there are LZMA Java implementations (you just need the compressor, not the container), so these should all be trivial to mock up.
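As a rough sketch of such a mock-up (not from the original answer; the sample string is made up and real packet data will give different numbers), the two standard-library options can be compared like this:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

// Quick size comparison of "plain deflate" vs. "deflate in a gzip wrapper".
public class CompressionComparison {

    static byte[] deflate(byte[] input) throws Exception {
        // nowrap = true: raw deflate, no zlib/gzip wrapper around the compressed bytes.
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, deflater)) {
            dos.write(input);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] gzip(byte[] input) throws Exception {
        // Same deflate algorithm, but with the gzip header/trailer added.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(out)) {
            gos.write(input);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = "The quick brown fox jumps over the lazy dog, again and again and again."
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("original: " + sample.length + " bytes");
        System.out.println("deflate : " + deflate(sample).length + " bytes");
        System.out.println("gzip    : " + gzip(sample).length + " bytes");
    }
}
```

Running this kind of comparison over the actual packet data is the whole point of the exercise; the relative results will vary with the language and the shape of the messages.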

If there is regularity between the packets then it is possible this could be utilized -- e.g. build cache mappings, Huffman tables, or just modify the "window" of one of the other algorithms -- but packet loss and "de-compressibility" likely need to be accounted for. Going down this route, though, adds far more complexity. More ideas for helping out the compressor may be found at SO: How to find a good/optimal dictionary for zlib 'setDictionary' when processing a given set of data?.
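To illustrate the "help the compressor" idea, java.util.zip does support preset dictionaries via Deflater.setDictionary / Inflater.setDictionary. The dictionary contents below are invented for the sketch; in practice both ends must agree on exactly the same bytes, chosen from phrases common across the packets:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of compressing with a preset dictionary shared by sender and receiver.
public class DictionaryDemo {
    // Hypothetical dictionary built from strings that often appear in the packets.
    static final byte[] DICTIONARY =
            "username=password=session=timestamp=".getBytes(StandardCharsets.UTF_8);

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setDictionary(DICTIONARY);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] packet, int maxLength) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(packet);
        byte[] buffer = new byte[maxLength];
        int n = inflater.inflate(buffer);
        if (n == 0 && inflater.needsDictionary()) {
            // The stream was produced with a preset dictionary; supply it and retry.
            inflater.setDictionary(DICTIONARY);
            n = inflater.inflate(buffer);
        }
        inflater.end();
        return Arrays.copyOf(buffer, n);
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "username=alice&session=12345&timestamp=1300000000"
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        System.out.println("original " + original.length + " -> compressed " + packed.length);
        System.out.println(new String(decompress(packed, 256), StandardCharsets.UTF_8));
    }
}
```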

Also, the protocol should likely have a simple "fall back" of zero-compression because some [especially small random] data might not be practically compressible or might "compress" to a larger size (zlib actually has this guard, but also has the "wrapper overhead", so it would be better encoded separately for very small data). The overhead of the "wrapper" for the compressed data -- such as gzip or zip -- also needs to be taken into account for such small sizes. This is especially important to consider for string data of less than ~100 characters.
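One way such a fall-back could be framed (a sketch only; the class name and flag values are made up) is to compress, compare sizes, and prefix a one-byte marker so the receiver knows whether to inflate:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// "Fall back to zero-compression": the first byte of each packet says whether
// the rest is zlib-deflated or raw, and the sender keeps whichever is smaller.
public class MaybeCompressed {
    static final byte RAW = 0;
    static final byte DEFLATED = 1;

    static byte[] encode(byte[] payload) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(payload);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            compressed.write(buffer, 0, n);
        }
        deflater.end();
        byte[] body = compressed.toByteArray();

        ByteArrayOutputStream packet = new ByteArrayOutputStream();
        if (body.length < payload.length) {
            packet.write(DEFLATED);
            packet.write(body, 0, body.length);
        } else {
            // Compression did not help (common for very short or random strings).
            packet.write(RAW);
            packet.write(payload, 0, payload.length);
        }
        return packet.toByteArray();
    }

    static byte[] decode(byte[] packet, int maxLength) throws Exception {
        byte[] body = Arrays.copyOfRange(packet, 1, packet.length);
        if (packet[0] == RAW) {
            return body;
        }
        Inflater inflater = new Inflater();
        inflater.setInput(body);
        byte[] buffer = new byte[maxLength];
        int n = inflater.inflate(buffer);
        inflater.end();
        return Arrays.copyOf(buffer, n);
    }

    public static void main(String[] args) throws Exception {
        byte[] msg = "hi".getBytes(StandardCharsets.UTF_8);   // too short to benefit
        byte[] packet = encode(msg);
        System.out.println("flag=" + packet[0] + ", packet size=" + packet.length);
        System.out.println(new String(decode(packet, 1024), StandardCharsets.UTF_8));
    }
}
```

The cost is one byte per packet plus the wasted compression attempt on the sender; for a UDP protocol that is usually a cheap price for never making a packet bigger.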

Happy coding.


Another thing to consider is the encoding used to shove the characters into the output stream. I would start with UTF-8, but that may not always be ideal.
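To see why the charset matters (a tiny, illustrative check; the sample strings are made up), compare the raw byte counts before any compression is applied:

```java
import java.nio.charset.StandardCharsets;

// How the chosen charset changes the pre-compression size of the same text.
public class EncodingSize {
    public static void main(String[] args) {
        String ascii = "plain ascii text";
        String cyrillic = "привет мир";
        System.out.println("ascii    UTF-8 : " + ascii.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("ascii    UTF-16: " + ascii.getBytes(StandardCharsets.UTF_16).length);
        System.out.println("cyrillic UTF-8 : " + cyrillic.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("cyrillic UTF-16: " + cyrillic.getBytes(StandardCharsets.UTF_16).length);
    }
}
```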


See SO: Best compression algorithm for short text strings, which suggests SMAZ, but I do not know how well that algorithm would transfer to Unicode / binary data.


Also consider that not all deflate (or other format) implementations are created equal. I am not privy to how Java's standard deflate compares to a 3rd party implementation (say JZlib) in terms of efficiency for small data, but consider Compressing Small Payloads [.NET], which shows rather negative numbers for "the same compression" format. The article also ends nicely:

...it’s usually most beneficial to compress anyway, and determine which payload (the compressed or the uncompressed one) has the smallest size and include a small token to indicate whether decompression is required.

My final conclusion: always test using real-world data and measure the benefits, or you might be in for a little surprise in the end!

Happy coding. For real this time.

answered Oct 02 '22 by user166390 (15 revs, 2 users)