Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compress small strings

I have an sqlite database full of huge number of URLs and it's taking huge amount of diskspace, and accessing it causes many disk seeks and is slow. Average URL path length is 97 bytes (host names repeat a lot so I moved them to a foreign-keyed table). Is there any good way of compressing them? Most compression algorithms work well with big documents, not "documents" of less that 100 bytes on average, but even 20% reduction would be very useful. Any compression algorithms that would work? Doesn't have to be anything standard.

like image 421
taw Avatar asked Jan 26 '09 09:01

taw


People also ask

How do you compress a string?

Start by taking the first character of the given string and appending it to the compressed string. Next, count the number of occurrences of that specific character and append it to the compressed string. Repeat this process for all the characters until the end of the string is reached.

How do I compress a string in Java?

Deflator is one of most used class for string compression in java. It uses the popular ZLIB compression library. It provides the function called deflator() to compress string. The function first takes the input string data, performs the compression and then fills the given buffer with the compressed data.

How do I compress a large string in Python?

In Python 3, If you change text to a string, compressed = zlib. compress(text) should be compressed = zlib. compress(text. encode()) .

How do I compress a string in C++?

Pick the first character from the input string ( str ). Append it to the compressed string. Count the number of subsequent occurrences of the character (in str ) and append the count to the compressed string if it is more than 1 only. Pick the next character and repeat the steps above until the end of str is reached.


1 Answers

Use the compress algorithm but use a shared dictionary.

I've done something like this before where I used the LZC/LZW algorithm, as used by the Unix compress command.

The trick to get good compression with short strings is to use a dictionary made up of a standard sample of the URLs you are compressing.

You should easily get 20%.

Edit: LZC is a variant of LZW. You only require LZW as you only need a static dictionary. LZC adds support for resetting the dictionary/table when it gets full.

like image 60
ewalshe Avatar answered Oct 20 '22 00:10

ewalshe