 

Compress about 1000 bytes of text for a QueryString

I want to create a mechanism (in C#) where text from a QueryString is displayed on a website.

For example, in C# I might literally do:

protected void Page_Load(object sender, EventArgs e)
{
      litSomething.Text = Request.QueryString["msg"];
}

Assume that the message is written in English (allowing UTF-8 would be nice) and is no longer than, say, 1000 characters. I want to compress this text as much as possible and still be able to place it in a QueryString.

We can predefine as many dictionary terms as we like (well, within reason). The server-side code will encode and decode the messages.

(Obviously I'll be adding in all the usual XSS protection, HttpUtility.HtmlEncode etc. Also, pointers to free dictionary sources would be good!)

Any tips, advice, source code? This isn't my homework, before you ask!

Update
Thanks for the suggestions. I want to make this a GET, so people can IM/email URLs. I'm thinking along the lines of bit.ly, which would also be a cheat in itself. I wanted this to be a generic "short text compression" question, though.

asked Sep 18 '09 at 16:09 by Dead account


2 Answers

You can encode the string as UTF-8 to get a byte array that you can compress. The result is also a byte array, so you can use Base64 encoding to get it as a string:

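using System;
using System.IO;
using System.IO.Compression;
using System.Text;

private static string Compress(string data) {
   // Write the UTF-8 byte count, not data.Length (the character count)
   byte[] bytes = Encoding.UTF8.GetBytes(data);
   using (MemoryStream ms = new MemoryStream()) {
      // leaveOpen: true keeps ms usable after the GZipStream is disposed and flushed
      using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true)) {
         zip.Write(bytes, 0, bytes.Length);
      }
      return Convert.ToBase64String(ms.ToArray());
   }
}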

Decompressing is just the other way around:

private static string Decompress(string data) {
   using (MemoryStream ms = new MemoryStream(Convert.FromBase64String(data))) {
      using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress)) {
         using (MemoryStream output = new MemoryStream()) {
            // Read everything rather than a fixed 10000-byte cap, which would
            // silently truncate longer messages
            zip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
         }
      }
   }
}
answered Nov 15 '22 at 08:11 by Guffa


Well, the immediate problems are:

  • The result of compression is basically going to be binary, so you'll need to base64-encode it, which will make it 1/3 bigger again. (You should use a web-safe base64 encoding too.)
  • No compression algorithm will always reduce the size of the text.

This means that if you can't cope with (say) ~1300 characters in the query string, there's no guarantee that it will always work. (As Marc says, use the body of a POST instead if you possibly can... then you can probably ignore compression in the first place.)
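To make the overhead concrete, here is a small measuring sketch. The CompressionOverhead class and its methods are hypothetical names for illustration, and comparing against DeflateStream is an addition not taken from either answer. CompressedLength reports how many bytes a message becomes after compression; GZipStream wraps the deflate data in a gzip header and trailer (at least 18 bytes), so very short messages come out larger than they went in. Base64Length applies the 4-characters-per-3-bytes rule behind the "~1300 characters" estimate for 1000 bytes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressionOverhead
{
    // Compress the UTF-8 bytes of text and return the compressed size in bytes.
    // useGzip = true uses GZipStream (deflate plus gzip framing); false uses raw DeflateStream.
    public static int CompressedLength(string text, bool useGzip)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true so ms can still be measured after the compressor is disposed
            using (Stream z = useGzip
                ? (Stream)new GZipStream(ms, CompressionMode.Compress, true)
                : new DeflateStream(ms, CompressionMode.Compress, true))
            {
                z.Write(input, 0, input.Length);
            }
            return (int)ms.Length;
        }
    }

    // Length of the padded Base64 text for n bytes: 4 * ceil(n / 3).
    public static int Base64Length(int n) => 4 * ((n + 2) / 3);
}
```

For a 5-byte message such as "Hello", the GZipStream output is larger than the message itself, and raw deflate is smaller than gzip by the framing bytes. At the other end, 1000 bytes that don't compress at all still need 1336 Base64 characters.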

If you're happy with those, though, there's nothing particularly different about your situation from any other:

  • Encode the string into bytes
  • Compress
  • Convert the compressed bytes back into text using Convert.ToBase64String (and then replace web-nasty characters)

On the other side, apply the same transformation in reverse.

Given that the compression API is stream-based, you could use StreamWriter to avoid explicitly converting from text to binary first.
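Putting those steps together, a minimal sketch of the round trip might look like this. It is not code from either answer, and several parts are assumptions: QueryStringCodec, Encode and Decode are hypothetical names; it uses a raw DeflateStream rather than GZipStream to skip the gzip header and trailer; StreamWriter and StreamReader do the UTF-8 conversion as suggested above; and the "web-nasty" characters are handled by swapping + for -, / for _ and dropping the = padding, which is the base64url alphabet from RFC 4648.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class QueryStringCodec
{
    // Text -> UTF-8 -> deflate -> Base64 -> web-safe Base64 (base64url, no padding).
    public static string Encode(string text)
    {
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true so ms survives when the writer closes the DeflateStream.
            // UTF8Encoding(false) avoids writing a byte-order mark.
            using (var deflate = new DeflateStream(ms, CompressionMode.Compress, true))
            using (var writer = new StreamWriter(deflate, new UTF8Encoding(false)))
            {
                writer.Write(text);
            }
            return Convert.ToBase64String(ms.ToArray())
                .TrimEnd('=')
                .Replace('+', '-')
                .Replace('/', '_');
        }
    }

    // The same transformation in reverse.
    public static string Decode(string token)
    {
        // Undo the web-safe substitutions and restore the '=' padding to a multiple of 4.
        string b64 = token.Replace('-', '+').Replace('_', '/');
        b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');
        using (var ms = new MemoryStream(Convert.FromBase64String(b64)))
        using (var deflate = new DeflateStream(ms, CompressionMode.Decompress))
        using (var reader = new StreamReader(deflate, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the page from the question you would then decode first and HTML-encode the result, along the lines of litSomething.Text = HttpUtility.HtmlEncode(QueryStringCodec.Decode(Request.QueryString["msg"]));. A malformed or tampered token can make Decode throw, so that call probably wants a try/catch.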

answered Nov 15 '22 at 07:11 by Jon Skeet