Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Serialize an array and compress the string to use it in a url

The problem:

  • We need to serilize an array into a short string, the shorter the better.
  • The importance is more reflected on the bigger array than the smaller one.
  • The string will be used in a get request so it has to be url decoded.

Current code snippet

  /* 
    array (size=3)
      0 => string '/js/dhdbm78hdfb.js' (length=18)
      1 => string '/js/dfg4dg.js' (length=13)
      2 => string '/js/fg8hfhrt.js' (length=15)
      2 => string '/js/yjtdygj.js' (length=14)
      2 => string '/js/2q5g54rvfd.js' (length=17)
   */

  $json = json_encode($data);
  $gz = gzdeflate($json, 9);
  $base64 = base64_encode($gz);
  $paths = urlencode($base64);

  // outputs: i1aK0c8qjtFPyUhJyjW3yEhJS9LLKlbSgQmnpZukpCOLpKVbZKRlFJUgi1VmlaRUpmchCxkVmqabmhSVpaWARGMB

Not very impressive and pretty slow, I think there should be a better way of doing this...

Question

What would be the best approach to this problem? How can we present the smallest string possible?

PS

It's not the biggest issue if it's slow but it's a variable to be considered. The array will be hashed and retrived from cache when possible.

like image 401
superhero Avatar asked Mar 27 '13 18:03

superhero


Video Answer


2 Answers

If you want it as fast and as small as possible, ditch the general purpose tools and use a method suited to the particular data you're sending. Strip out anything duplicated between all of the array values (e.g. "/js/" and ".js"), strip out all of the array syntax from serialization, and just send a string concatenated list of your unique values, and reconstitute it on the receiving end.

Further compress it with gz and base64_encode if you must, but one of the drawbacks of having unique data is that it must be uniquely represented.

See the selected answer here: How to compress/decompress a long query string in PHP?


Edit: do you have the leeway to POST the data instead? That will at least avoid the direct size restrictions associated with the querystring. It'll also avoid the necessity to encode the string for the URL, adding length, and you should be able to send the zipped content directly, for most savings.

like image 81
Jason Avatar answered Sep 23 '22 16:09

Jason


serialization = json_encode
compression = gzdeflate
urlsafe = base64_encode

urlencode isn't needed as base64_encode produces safe characters only. I'd keep the base64_encode in there. You could consider finding a faster serializer or a faster compressor. You could even leave out the compressor if your input is short enough.


Algorithms have performance characteristics in two dimensions: number of instructions and memory usage. Typically you'll find that if you try to reduce one the other goes up. Indexes in a database allow for fast finding but take up a lot of space. Index-less finding is slow but doesn't require extra space for indexes.

Since you want short that means you need to make sacrifices in terms of fast.


Honestly, I think you've got a pretty optimal set here.

like image 26
Halcyon Avatar answered Sep 22 '22 16:09

Halcyon