
Simple/efficient text compression

What's the simplest compression algorithm that is still efficient?

Deflate, LZMA, etc. aren't valid options. I need something that compiles to really small code, such as RLE, LZX, Huffman, etc.

Note: The data is 95% ASCII text.
Edit: The data is ~20 KB at the moment, but I expect it to grow up to 1 MB.

Edit 2: Other interesting options:
smaz https://github.com/antirez/smaz
FastLZ http://fastlz.org/

arthurprs asked Jun 09 '10 01:06


2 Answers

It sounds like LZO was designed to meet your requirements:

  • Decompression is simple and very fast.
  • Requires no memory for decompression.
  • Compression is pretty fast.

Greg Hewgill answered Sep 21 '22 13:09


Something BWT-based would probably be good for this case. http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
It compresses text much better than LZ-family algorithms, is easy to implement from scratch, and there are good libraries:
http://libbsc.com
http://encode.ru/threads/104-libBWT?p=22903&viewfull=1#post22903
http://code.google.com/p/libdivsufsort/
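
To give an idea of how little code the transform itself needs, here is a rough sketch of a naive forward and inverse BWT. The names `bwt`, `inverse_bwt` and the `eos` sentinel are made up for this illustration and are not taken from any of the libraries above. It sorts whole rotations, which is fine for ~20 KB but too slow and memory-hungry for larger inputs; that is the part a suffix-array library such as libdivsufsort is meant to replace. Note that the BWT alone does not shrink anything: it only groups similar characters together, so a real compressor still needs a second stage (for example move-to-front plus an entropy coder).

```python
def bwt(text: str, eos: str = "\x03") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    `eos` is an assumed end-of-string sentinel that must not occur in `text`.
    """
    if eos in text:
        raise ValueError("sentinel character must not occur in the input")
    s = text + eos
    # Build and sort every rotation of s (simple, but quadratic in memory).
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, eos: str = "\x03") -> str:
    """Undo bwt() by following the mapping between first and last columns."""
    n = len(last)
    # Stable sort of positions by character: order[j] is the position in the
    # last column of the character that sits at row j of the first column.
    order = sorted(range(n), key=lambda i: last[i])
    # The row that ends with the sentinel is the original string itself.
    row = last.index(eos)
    out = []
    for _ in range(n - 1):  # n - 1 characters: everything except the sentinel
        row = order[row]
        out.append(last[row])
    return "".join(out)


if __name__ == "__main__":
    sample = "banana"
    transformed = bwt(sample)
    print(repr(transformed))
    print(inverse_bwt(transformed))
```

The sentinel makes every rotation unique, which is what lets the inverse find its starting row without storing an extra index.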

Alternatively, there's PPMd, which is used for text compression in RAR, WinZip, 7-Zip, etc., but it's more complicated:
http://www.compression.ru/ds/ppmdj1.rar
http://www.compression.ru/ds/ppmsj.rar (faster, smaller memory usage)
http://www.ctxmodel.net/files/PPMd/ppmd_Jr1_sh8.rar (alternative port)

Shelwien answered Sep 24 '22 13:09