Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compressing big number (or string) to small value

My ASP.NET page has following query string parameter:

…?IDs=1000000012,1000000021,1000000013,1000000022&...

Here IDs parameter will always have numbers separated by something, in this case ,. Currently there are 4 numbers but normally they would be in between 3 and 7.

Now, I am looking for method to convert each big number from above into smallest possible value; specifically compressing value of IDs query string parameter. Both, compressing each number algorithm or compressing whole value of IDs query string parameter are welcome.

  1. Encode or decode is not an issue; just compressing the value IDs query string parameter.
  2. Creating some unique small value for IDs and then retrieving its value from some data source is out of scope.

Is there an algorithm to compress such big numbers to small values or to compress value of the IDs query string parameter all together?

like image 692
Dave Avatar asked Aug 30 '09 22:08

Dave


People also ask

Can numbers be compressed?

Well, yes, a big number can be expressed mathematically, and potentially save some space doing so. So if you convert each character to its ASCII value, then every character is expanded from a single byte to 1, 2, or 3 bytes. That is, 'A' becomes '65'. 'z' becomes '122'.

Is it possible to compress a string?

"String Compression Algorithm” or “Run Length Encoding” happens when you compress a string, and the consecutive duplicates of each string are replaced with the character, followed by the consecutive, repeated character count. For example: After string compression, the string “aaaabbcddddd” would return “a4b2c1d5”.

How do I compress a large string in Python?

Algorithm for string compression in pythonPick the first character from the input string ( str ). Append it to the compressed string. Count the number of subsequent occurrences of the character (in str) and append the count to the compressed string if it is more than 1 only​.

How do I compress a string in Java?

string compression in java can be performed using a ZLIB compression library. It offers some distinct features to effectively compress string data in java. Although the compression rate could vary based on the factors such as the amount of compression required, length of data and repetitions in string data.


1 Answers

You basically need so much room for your numbers because you are using base 10 to represent them. An improvement would be to use base 16 (hex). So for example, you could represent 255 (3 digits) as ff (2 digits).

You can take that concept further by using a much larger number base... the set of all characters that are valid query string parameters:

A-Z, a-z, 0-9, '.', '-', '~', '_', '+'

That gives you a base of 67 characters to work with (see Wikipedia on QueryString).

Have a look at this SO post for approaches to converting base 10 to arbitrary number bases.

EDIT:

In the linked SO post, look at this part:

string xx = IntToString(42, 
            new char[] { '0','1','2','3','4','5','6','7','8','9',
            'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
            'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x'});

That's almost what you need. Just expand it by adding the few characters it is missing:

yz.-~_+

That post is missing a method to go back to base 10. I'm not going to write it :-) but the procedure is like this:

Define a counter I'll call TOTAL.

Look at the right most character and find it's position in the array.
TOTAL = (the position of the character in the array) Example: Input is BA1. TOTAL is now 1 (since "1" is in position 1 in the array)

Now look at the next character left of the first one and find it's position in the array. TOTAL += 47 * (the position of the character in the array) Example: Input is BA1. TOTAL is now (47 * 11) + 1 = 518

Now look at the next character left of the previous one and find it's position in the array. TOTAL += 47 * 47 * (the position of the character in the array) Example: Input is BA1. Total is now (47 * 47 * 10) + (47 * 11) + 1 = 243508

And so on.

I suggest you write a unit test that converts a bunch of base 10 numbers into base 47 and then back again to make sure your conversion code works properly.

Note how you represented a 6 digit base 10 number in just 3 digits of base 47 :-)

like image 190
Eric J. Avatar answered Sep 20 '22 23:09

Eric J.