
How can I quickly encode and then compress a short string containing numbers in C#?

Tags:

c#

I have strings that look like this:

000101456890
348324000433
888000033380

The strings are all the same length and contain only digits.

I would like to find a way to encode and then compress (reduce the length of) the strings. The compression algorithm needs to produce only ASCII characters, as the results will be used in web page links.

So for example:

www.stackoverflow.com/000101456890  goes to www.stackoverflow.com/aJks

Is there some method that would do this kind of compression quickly?

Thanks,

czhili asked Jun 15 '11 10:06



1 Answer

The simplest approach is to treat each string as a long (a 64-bit integer has plenty of room for 12 digits) and hex-encode it; that gives you:

60c1bfa
5119ba72b1
cec0ed3264

Base-64 would be shorter, but you'd need to write the value out as big-endian bytes (note that most .NET platforms are little-endian) and drop the leading zero bytes. That gives you:

Bgwb+g==
URm6crE=
zsDtMmQ=
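
Since these values are meant for links, note that standard base-64 text can contain `+`, `/` and `=`, all of which are awkward in a URL. As an addition to the approach above (not something the answer itself does), here is a minimal sketch that maps the text to the URL-safe "base64url" alphabet from RFC 4648 and back. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper (an assumption, not part of the original answer):
// converts between standard base-64 text and the URL-safe alphabet.
static class UrlSafeBase64
{
    // Drops the '=' padding and swaps the two symbols that clash with URLs.
    public static string ToUrlSafe(string base64)
    {
        return base64.TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }

    // Reverses ToUrlSafe so the result can go to Convert.FromBase64String.
    public static string FromUrlSafe(string urlSafe)
    {
        string s = urlSafe.Replace('-', '+').Replace('_', '/');
        // Base-64 text must be a multiple of 4 characters long; restore padding.
        int pad = (4 - s.Length % 4) % 4;
        return s + new string('=', pad);
    }
}
```

With this, `UrlSafeBase64.ToUrlSafe("Bgwb+g==")` gives `Bgwb-g`, and `FromUrlSafe` restores the original text before decoding.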

For example:

    using System;
    using System.Collections.Generic; // List<byte>

    static void Main()
    {
        long x = 000101456890L, y = 348324000433L, z = 888000033380L;

        // hex-encode
        Console.WriteLine(Convert.ToString(x, 16));
        Console.WriteLine(Convert.ToString(y, 16));
        Console.WriteLine(Convert.ToString(z, 16));

        // base-64 encode (big-endian, leading zero bytes dropped)
        Console.WriteLine(Pack(x));
        Console.WriteLine(Pack(y));
        Console.WriteLine(Pack(z));

        // hex-decode, then restore the leading zeros of the 12-digit string
        Console.WriteLine(Convert.ToInt64("60c1bfa", 16).ToString().PadLeft(12, '0'));
        Console.WriteLine(Convert.ToInt64("5119ba72b1", 16).ToString().PadLeft(12, '0'));
        Console.WriteLine(Convert.ToInt64("cec0ed3264", 16).ToString().PadLeft(12, '0'));

        // base-64 decode, then restore the leading zeros
        Console.WriteLine(Unpack("Bgwb+g==").ToString().PadLeft(12, '0'));
        Console.WriteLine(Unpack("URm6crE=").ToString().PadLeft(12, '0'));
        Console.WriteLine(Unpack("zsDtMmQ=").ToString().PadLeft(12, '0'));
    }
    static string Pack(long value)
    {
        ulong a = (ulong)value; // make shift easy
        List<byte> bytes = new List<byte>(8);
        while (a != 0)
        {
            bytes.Add((byte)a);
            a >>= 8;
        }
        bytes.Reverse();
        var chunk = bytes.ToArray();
        return Convert.ToBase64String(chunk);
    }
    static long Unpack(string value)
    {
        var chunk = Convert.FromBase64String(value);
        ulong a = 0;
        for (int i = 0; i < chunk.Length; i++)
        {
            a <<= 8;
            a |= chunk[i];
        }
        return (long)a;
    }
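
If the links should contain only letters and digits, one possible variation (an assumption here, not part of the answer above) is base-62 over `0-9A-Za-z`. Because 62^7 is about 3.5 × 10^12, any 12-digit value fits in at most 7 characters; 62^4 is only about 14.8 million, so a 4-character result like the `aJks` in the question cannot cover every 12-digit input. The `Base62` class and its alphabet ordering are hypothetical choices for this sketch.

```csharp
using System;
using System.Text;

// Hypothetical base-62 codec (an assumption, not from the original answer).
static class Base62
{
    const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static string Encode(long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException("value");
        if (value == 0) return "0";
        var sb = new StringBuilder();
        while (value > 0)
        {
            // Peel off the least significant base-62 digit and prepend it.
            sb.Insert(0, Alphabet[(int)(value % 62)]);
            value /= 62;
        }
        return sb.ToString();
    }

    public static long Decode(string text)
    {
        long value = 0;
        foreach (char c in text)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0) throw new FormatException("Not a base-62 character: " + c);
            value = checked(value * 62 + digit); // throws on overflow
        }
        return value;
    }
}
```

As with the hex and base-64 versions, the decoded number needs `.ToString().PadLeft(12, '0')` to recover the original 12-character string.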
Marc Gravell answered Nov 15 '22 00:11