I have an sqlite database full of huge number of URLs and it's taking huge amount of diskspace, and accessing it causes many disk seeks and is slow. Average URL path length is 97 bytes (host names repeat a lot so I moved them to a foreign-keyed table). Is there any good way of compressing them? Most compression algorithms work well with big documents, not "documents" of less that 100 bytes on average, but even 20% reduction would be very useful. Any compression algorithms that would work? Doesn't have to be anything standard.
Start by taking the first character of the given string and appending it to the compressed string. Next, count the number of occurrences of that specific character and append it to the compressed string. Repeat this process for all the characters until the end of the string is reached.
Deflator is one of most used class for string compression in java. It uses the popular ZLIB compression library. It provides the function called deflator() to compress string. The function first takes the input string data, performs the compression and then fills the given buffer with the compressed data.
In Python 3, If you change text to a string, compressed = zlib. compress(text) should be compressed = zlib. compress(text. encode()) .
Pick the first character from the input string ( str ). Append it to the compressed string. Count the number of subsequent occurrences of the character (in str ) and append the count to the compressed string if it is more than 1 only. Pick the next character and repeat the steps above until the end of str is reached.
Use the compress algorithm but use a shared dictionary.
I've done something like this before where I used the LZC/LZW algorithm, as used by the Unix compress command.
The trick to get good compression with short strings is to use a dictionary made up of a standard sample of the URLs you are compressing.
You should easily get 20%.
Edit: LZC is a variant of LZW. You only require LZW as you only need a static dictionary. LZC adds support for resetting the dictionary/table when it gets full.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With