 

Compression of ASCII strings in C

Tags: c, compression

I have some C code that stores ASCII strings in memory as a four-byte length followed by the string. The string lengths are in the range 10-250 bytes.

To reduce memory occupancy I'd like to compress each string individually on the fly, still storing the length (of the compressed string) followed by the compressed string.

I don't want to compress at a larger scope than individual strings because any string can be read/written at any time.

What libraries/algorithms are available for doing this?

Thanks for your help. NickB

asked Jul 08 '09 10:07 by NickB


2 Answers

zlib is always at your service: it has very little overhead when the string contains incompressible data, it's relatively fast and free, and it can easily be integrated into C and C++ programs.

answered Nov 03 '22 10:11 by sharptooth


Most compression algorithms don't work very well on short strings. Here are a few compression algorithms designed specifically for short English text strings. While they can handle any byte value in the plaintext, such bytes often make the "compressed" data longer than the plaintext. So it's a good idea for the compressor to store incompressible data unchanged and set a "literal" flag on it (as Steve Jessop suggested).

  • "base 40 encoding": maximum compression 3:2
  • "Zork Standard Code for Information Interchange" (ZSCII): maximum compression 3:2
  • byte pair compression: maximum compression 2:1
  • a static Huffman table shared among all the strings (as suggested by cygil).
    • ideally, built from the exact character frequencies of all of your actual data.
    • Varicode: maximum compression 2:1
  • PalmDoc compression (byte pair compression + a simple variant of LZ77).
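A minimal sketch of the base-40 idea combined with the literal flag, assuming a hypothetical 40-symbol alphabet (a padding symbol, a-z, 0-9, space, period, comma) and a one-byte tag in front of every packed string. Three symbols fit in 16 bits because 40^3 = 64000 < 65536, which is where the 3:2 ratio comes from. Any string containing a byte outside the alphabet (uppercase letters, in this alphabet), or one too short to benefit, is stored verbatim behind the literal tag. The names `pack_string`, `unpack_string` and `PACK_BOUND` are illustrative only, not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte tags written in front of every packed string. */
enum { TAG_LITERAL = 0, TAG_BASE40 = 1 };

/* Worst-case packed size for n input bytes: tag byte + verbatim copy. */
#define PACK_BOUND(n) ((size_t)(n) + 1)

/* Map an ASCII byte to a base-40 symbol (1..39); symbol 0 is reserved
 * for padding. Returns -1 if the byte is not in the assumed alphabet. */
static int b40_symbol(unsigned char c)
{
    if (c >= 'a' && c <= 'z') return 1 + (c - 'a');   /* 1..26  */
    if (c >= '0' && c <= '9') return 27 + (c - '0');  /* 27..36 */
    if (c == ' ') return 37;
    if (c == '.') return 38;
    if (c == ',') return 39;
    return -1;
}

/* Inverse of b40_symbol for s in 1..39. */
static char b40_char(int s)
{
    static const char table[] = "abcdefghijklmnopqrstuvwxyz0123456789 .,";
    return table[s - 1];
}

/* Pack n bytes of src into dst (which must hold PACK_BOUND(n) bytes).
 * Returns the packed length, i.e. the value for the 4-byte length field. */
size_t pack_string(const char *src, size_t n, unsigned char *dst)
{
    size_t packed = 1 + 2 * ((n + 2) / 3);   /* tag + 2 bytes per 3 chars */
    int ok = packed < n + 1;                  /* only worth it if smaller  */
    for (size_t i = 0; ok && i < n; i++)
        ok = b40_symbol((unsigned char)src[i]) >= 0;

    if (!ok) {                                /* literal fallback */
        dst[0] = TAG_LITERAL;
        memcpy(dst + 1, src, n);
        return n + 1;
    }

    dst[0] = TAG_BASE40;
    unsigned char *out = dst + 1;
    for (size_t i = 0; i < n; i += 3) {
        unsigned v = 0;                       /* s0*1600 + s1*40 + s2 */
        for (size_t k = 0; k < 3; k++) {
            int s = (i + k < n) ? b40_symbol((unsigned char)src[i + k]) : 0;
            v = v * 40 + (unsigned)s;
        }
        *out++ = (unsigned char)(v >> 8);     /* big-endian 16-bit group */
        *out++ = (unsigned char)(v & 0xFF);
    }
    return (size_t)(out - dst);
}

/* Unpack len bytes from src into dst, which must hold at least
 * 3 * (len - 1) / 2 bytes. Returns the original string length. */
size_t unpack_string(const unsigned char *src, size_t len, char *dst)
{
    assert(len >= 1);                         /* at least the tag byte */
    if (src[0] == TAG_LITERAL) {
        memcpy(dst, src + 1, len - 1);
        return len - 1;
    }
    size_t n = 0;
    for (size_t i = 1; i + 1 < len; i += 2) {
        unsigned v = ((unsigned)src[i] << 8) | src[i + 1];
        int s[3] = { (int)(v / 1600), (int)(v / 40 % 40), (int)(v % 40) };
        for (int k = 0; k < 3; k++)
            if (s[k] != 0)                    /* skip padding symbols */
                dst[n++] = b40_char(s[k]);
    }
    return n;
}
```

With this layout a 16-character lowercase string packs into 13 bytes, while "Hello" (uppercase H) or a 2-byte string falls back to the literal form at one extra byte. The value returned by `pack_string` is what would go into the existing four-byte length field. A real alphabet should be picked from the characters your strings actually contain; case could be handled with a shift symbol, as the Z-machine's text encoding does.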

answered Nov 03 '22 09:11 by David Cary